Data Quality and Data Profiling

A Glossary of Terms

by Brian Marshall

Data Quality is a measure of the value of data in relation to
potential data problems:

Data quality can be analyzed in relation to a particular
Data Quality Domain, making it possible to determine the
importance of different data problems.

A Data Quality Domain is an application or use of data that
imposes a set of
Data Quality Rules, each of which is associated
with a degree-of-importance for the domain.

A Data Quality Rule is a specification of one or more data
quality problems that should not exist in a set of data.

For example, a data quality rule might specify that in the
EMPLOYEE table, EMPLOYEE_NAME must be set and
that it must contain only letters and spaces. Another rule
might specify that EMPLOYEE_NAME should not contain
multiple consecutive spaces. These two rules might be specified
separately so that, in a particular
Data Quality Domain,
they can be assigned different degrees-of-importance.
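
The two example rules, and the idea of a domain that weights them, can be sketched in Python. This is a minimal illustration under stated assumptions: each rule is modelled as a named predicate over a single EMPLOYEE_NAME value, and a domain is simply a mapping from rule name to a numeric degree-of-importance. The `Rule` type, the rule names, the two domains and their weights are all hypothetical; they are not taken from ChkDB or any other tool.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule type: a name plus a check that returns True when a value is acceptable."""
    name: str
    check: Callable[[Optional[str]], bool]


# Rule 1: EMPLOYEE_NAME must be set and contain only letters and spaces.
NAME_SET_LETTERS_SPACES = Rule(
    "name_set_letters_spaces",
    lambda v: v is not None
    and v.strip() != ""
    and re.fullmatch(r"[A-Za-z ]+", v) is not None,
)

# Rule 2: EMPLOYEE_NAME should not contain multiple consecutive spaces.
# A missing value is left to Rule 1, so this rule passes on None.
NAME_NO_DOUBLE_SPACES = Rule(
    "name_no_double_spaces",
    lambda v: v is None or "  " not in v,
)

RULES = [NAME_SET_LETTERS_SPACES, NAME_NO_DOUBLE_SPACES]

# Two hypothetical domains that weight the same rules differently.
PAYROLL_DOMAIN: Dict[str, int] = {"name_set_letters_spaces": 10, "name_no_double_spaces": 1}
MAILING_DOMAIN: Dict[str, int] = {"name_set_letters_spaces": 5, "name_no_double_spaces": 5}


def violations(value: Optional[str], rules: List[Rule]) -> List[str]:
    """Return the names of the rules that the value violates."""
    return [r.name for r in rules if not r.check(value)]


def weighted_score(value: Optional[str], rules: List[Rule], domain: Dict[str, int]) -> int:
    """Sum the domain's degree-of-importance for each violated rule (0 means no problems)."""
    return sum(domain[name] for name in violations(value, rules))


print(violations("Ada  Lovelace", RULES))  # ['name_no_double_spaces']
print(weighted_score("Ada  Lovelace", RULES, PAYROLL_DOMAIN))  # 1
print(weighted_score("Ada  Lovelace", RULES, MAILING_DOMAIN))  # 5
```

Because the two rules are kept separate, the same double-space problem scores 1 in the hypothetical payroll domain but 5 in the hypothetical mailing domain.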

Data Profiling can refer to either Data Quality Profiling or Database Profiling.

Data Quality Profiling is the process of analyzing a database
in relation to a
Data Quality Domain, to identify and prioritize
data quality problems. The results can include:

Data quality profiling can be useful when planning and managing
data cleanup projects.
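
A data quality profiling pass over a table might be sketched as follows. This assumes the table's rows are available as Python dictionaries, that each rule checks one column, and that problems are prioritized by violation count multiplied by the domain's degree-of-importance. That scoring scheme is an illustrative assumption, not a method defined by this glossary, and the rule names and weights are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# Hypothetical rule set: rule name -> (column, predicate returning True when the value is acceptable).
RULES: Dict[str, Tuple[str, Callable[[object], bool]]] = {
    "name_is_set": ("EMPLOYEE_NAME", lambda v: isinstance(v, str) and v.strip() != ""),
    "name_no_double_spaces": ("EMPLOYEE_NAME", lambda v: not isinstance(v, str) or "  " not in v),
}

# Hypothetical Data Quality Domain: rule name -> degree-of-importance.
DOMAIN: Dict[str, int] = {"name_is_set": 10, "name_no_double_spaces": 1}


def profile(rows: List[dict], rules, domain) -> List[Tuple[str, int, int]]:
    """Return (rule, violation_count, priority) tuples, highest priority first.

    Priority is assumed here to be violation_count * importance; rules the
    domain does not mention are ignored.
    """
    counts: Counter = Counter()
    for row in rows:
        for name, (column, ok) in rules.items():
            if name in domain and not ok(row.get(column)):
                counts[name] += 1
    report = [(name, counts[name], counts[name] * domain[name])
              for name in rules if name in domain]
    return sorted(report, key=lambda item: item[2], reverse=True)


employees = [
    {"EMPLOYEE_NAME": "Ada Lovelace"},
    {"EMPLOYEE_NAME": "Alan  Turing"},
    {"EMPLOYEE_NAME": "Grace  Hopper"},
    {"EMPLOYEE_NAME": ""},
]
for rule, count, priority in profile(employees, RULES, DOMAIN):
    print(rule, count, priority)
# name_is_set 1 10
# name_no_double_spaces 2 2
```

Note that the more frequent problem (double spaces) is ranked second, because this domain treats a missing name as far more important.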

Database Profiling is the process of analyzing a database to determine
its structure and internal relationships:

Database Profiling can also include analysis of:

Database profiling can be useful when planning and managing
data conversion and data cleanup projects.
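
For a SQLite database, the structural side of database profiling can be sketched with Python's built-in `sqlite3` module and SQLite's `PRAGMA table_info` and `PRAGMA foreign_key_list`. This is a minimal sketch assuming a SQLite database with trusted table and column names; it is not how ChkDB works, and other database systems expose their catalogs differently. The EMPLOYEE and DEPARTMENT tables in the usage part are hypothetical.

```python
import sqlite3


def profile_database(conn: sqlite3.Connection) -> dict:
    """Summarize tables, columns, foreign keys and simple column statistics (SQLite only)."""
    report = {}
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    for table in tables:
        # Identifiers are interpolated directly, so this assumes trusted schema names.
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = [(r[1], r[2]) for r in conn.execute(f'PRAGMA table_info("{table}")')]
        # PRAGMA foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
        fks = [(r[3], r[2], r[4]) for r in conn.execute(f'PRAGMA foreign_key_list("{table}")')]
        row_count = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
        stats = {}
        for col, _type in columns:
            # SUM over an empty table is NULL, hence the "or 0".
            nulls, distinct = conn.execute(
                f'SELECT SUM("{col}" IS NULL), COUNT(DISTINCT "{col}") FROM "{table}"').fetchone()
            stats[col] = {"nulls": nulls or 0, "distinct": distinct}
        report[table] = {"rows": row_count, "columns": columns,
                         "foreign_keys": fks, "stats": stats}
    return report


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DEPARTMENT (DEPT_ID INTEGER PRIMARY KEY, DEPT_NAME TEXT);
CREATE TABLE EMPLOYEE (
    EMP_ID INTEGER PRIMARY KEY,
    EMPLOYEE_NAME TEXT,
    DEPT_ID INTEGER REFERENCES DEPARTMENT(DEPT_ID)
);
INSERT INTO DEPARTMENT VALUES (1, 'Research');
INSERT INTO EMPLOYEE VALUES (1, 'Ada Lovelace', 1), (2, NULL, 1);
""")
report = profile_database(conn)
print(report["EMPLOYEE"]["foreign_keys"])            # [('DEPT_ID', 'DEPARTMENT', 'DEPT_ID')]
print(report["EMPLOYEE"]["stats"]["EMPLOYEE_NAME"])  # {'nulls': 1, 'distinct': 1}
```

The foreign keys describe internal relationships, while the null and distinct counts are the kind of observation that can later suggest rules for a Data Quality Domain.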

Database profiling can be an initial step in defining a
Data Quality Domain, which is used in Data Quality Profiling.


This page was written by Brian Marshall of Calgary.
Brian started the ChkDB Open Source project.