ChkDB |
Open Source Rule-Based Data Checking |
|
Home Contact Download Releases Open Source License Project Contributing The PL/SQL Using ChkDB Installation Read-Me Rule-Sets ChkDB Tables E-R Diagram
Tiny Sample External Links Data Quality and Data Profiling (a glossary) |
ChkDB READMEThis is the README.txt file that comes with the distribution (without the license section)...
=================================================================
README.txt - Using ChkDB Rule-Based Data Checking
(After it has been installed)
Contents:
Re-Running Checks
www.chkdb.org
Domains and Rule-Sets
Populating a Domain
Running Checks
Errors while Running Checks
Reviewing Results
Running the Sample
Adding Tables to Domains
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
Re-Running Checks
You are a DBA or SQL expert... you didn't setup ChkDB, but
you have now been told to "Run the ChkDB checks again".
This section describes how to do that.
You need to know:
- the database (the Oracle username and password)
- the "data domain" name
- the names of the tables to be checked
Login to SQL*Plus as the appropriate user.
These queries should each result in some rows:
select * from tab where tname like 'CHKDB%';
select * from CHKDB_DOMAIN;
select * from CHKDB_RULESET;
Pick domains from the results of the select from the
CHKDB_DOMAIN table.
Note that the tables specified to be checked might not
be the tables in your database; they might be views
into those tables or temporary tables to which data is
dumped. You might want to...
select DATA_DOMAIN, count(*) from CHKDB_TABLE
group by DATA_DOMAIN;
and then query CHKDB_TABLE to identify the tables to be
checked. In some cases, this could include multiple
views into the same database table.
Checks are run by executing a procedure in a package,
passing the domain and the table name. The table name
can include the wildcard character '%'.
Ex. To check the CLIENT table in the BOOKS domain:
SQL> call CHKDB_CHECK_DB.APPLY_RULES('BOOKS','CLIENT');
Ex. To check all tables beginning with 'ACCT'
in the BOOKS domain:
SQL> call CHKDB_CHECK_DB.APPLY_RULES('BOOKS','ACCT%');
Checking can take time - depending on table size, indexes,
and other factors, maybe a long time. It is very wise to
run checks on a single large table before attempting to
run checks on multiple large tables.
Results are written to the table CHKDB_RESULT. See the
file: sample_queries.sql
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
www.chkdb.org
The ChkDB website www.chkdb.org includes:
- a page where you can download ChkDB (zip or gz)
- data-domain definitions and rule-set summaries
- the .txt files in the release (including this one)
- the ChkDB table-create script and e-r diagram
- pages showing the sample files and how they are used.
- information about the ChkDB project
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
Domains and Rule-Sets
A ChkDB distribution comes with files that define:
- data domains: tables/views that can be checked
- rule-sets: rules to apply to a data domain.
Domain and rule-set files are SQL script files.
To load a domain or rule-set into a database, you (or your
DBA) runs the script file with SQL*Plus.
Existing domains and rule-sets are described at:
http://www.chkdb.org/rulesets.htm
You can open domain or rule-set script files using a
text editor (like Notepad).
Remove a domain from a database by executing the
script remove_domain.sql, specifying the domain name.
Ex. If the domain is CHKDB_SAMP_HR:
sql> @remove_domain.sql CHKDB_SAMP_HR
This will drop tables listed in CHKDB_TABLE, for this
domain, where DROP_ON_REMOVE_YN is set to 'Y'. This
script will also delete all rule-sets in the domain.
Remove a rule-set (along with its rules and results)
by executing the script remove_ruleset.sql, specifying the
domain name and rule-set name. Ex:
sql> @remove_ruleset CHKDB_SAMP_HR SAMP_RULES
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
Populating a Domain
A domain defines a set of tables or views to be checked.
Data checking can be done two ways:
- checking views into your database
- checking tables that are populated from your database
For each 'create table' statement in a domain script,
you can:
- define a view into your database based on the
'create table' statement, or,
- create the table and populate it from your database
with some software before checking is done.
Sometimes, you would like to create a view that is just
a subset of another view or table, but it would be
difficult to specify the 'create view' statement to
select the desired subset. One way to handle this is
to create a table with the same primary key as the
source table. You populate this table with the keys
of the subset using some software. Then you can define
the view simply, maybe something like this:
CREATE VIEW EMP_SUPERVISOR as
SELECT * from EMP e where exists
(select 'x' from SUPERVISOR_LIST s
where s.EMP_ID = e.EMP_ID);
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
Running Checks
To run checks against one or more tables, you run a
procedure in a stored PL/SQL package.
You control which rules are applied with the ACTIVE_YN
column in the ChkDB tables. You can turn checking on or
off for a domain, table, rule-set or individual rule.
You run checks by calling a procedure. You can specify
a data domain and a table name. The table name can include
the wild-card character '%'.
Examples (using SQL*Plus):
To check the CHKDB_SAMP_DEPT table
in the CHKDB_SAMP_HR domain:
SQL> CALL CHKDB_CHECK_DB.APPLY_RULES('CHKDB_SAMP_HR','CHKDB_SAMP_DEPT');
To check all the tables whose names begin with CHKDB_SAMP
in the CHKDB_SAMP_HR domain:
SQL> CALL CHKDB_CHECK_DB.APPLY_RULES('CHKDB_SAMP_HR','CHKDB_SAMP%');
To apply all active rules in the CHKDB_SAMP_HR domain:
SQL> CALL CHKDB_CHECK_DB.APPLY_RULES('CHKDB_SAMP_HR','%');
Note that when a check is run for one or more tables, all
existing results for those tables is first deleted.
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
Errors while Running Checks
If an error occurs when you run a check, it may be a result
of a rule that was not specified correctly in the CHKDB_RULE
table.
See the table CHKDB_RULE_ERR - if an error occurred because
of how a rule is defined, it will be described in this
table.
The table CHKDB_RULE_SQL has the SQL that was generated for
each rule that is executed.
If you create or modify rules, remember:
- Inside a string delimited by single quotes, you specify
a single quote as a pair of them - ex. 'don''t wait'.
- In the SQL generated for a rule, the table to which the
rule applies has an alias of t. You may need this
alias in where clauses that use 'exists' or if the
table being tested and CHKDB tables have columns with
the same name.
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
Reviewing Results
When a check is run and a rule is found to apply to a
row in the table being checked, a row is inserted into
the CHKDB_RESULT table.
You may have an application or a way of running reports to
help you review these results, or you may run queries from
SQL*Plus.
A ChkDB distribution includes sample queries in a file
called: sample_queries.sql
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
Running the Sample
You can execute run_sample.sql to:
- create and populate a sample database of two tables
- define a sample domain and rule-set
- run checks against the sample database
- run sample queries (overwriting sample_results.txt)
Alternatively, you may use the run_sample.sql script as
a guide to executing the various sample scripts one at a
time.
You might want to make a copy of sample_results.txt
before the file is overwritten by the sample queries.
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
|