ChkDB

Open Source Rule-Based Data Checking
Home
Contact
Download
Releases
Open Source
License
Project
Contributing
The PL/SQL
Using ChkDB
Installation
Read-Me
Rule-Sets
ChkDB Tables
E-R Diagram

Tiny Sample
Database
Data Domain
Implementation
Rule-Set
Do Checks
Queries
Results


External Links
Data Quality and
Data Profiling
(a glossary)

ChkDB README

This is the README.txt file that comes with the distribution (without the license section)...


=================================================================
 README.txt   -   Using ChkDB Rule-Based Data Checking
                  (After it has been installed)

 Contents:
    Re-Running Checks
    www.chkdb.org
    Domains and Rule-Sets
    Populating a Domain
    Running Checks
    Errors while Running Checks
    Reviewing Results
    Running the Sample
    Adding Tables to Domains


= = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
    Re-Running Checks

You are a DBA or SQL expert... you didn't setup ChkDB, but
you have now been told to "Run the ChkDB checks again".  
This section describes how to do that.

You need to know:
  - the database (the Oracle username and password)
  - the "data domain" name
  - the names of the tables to be checked

Login to SQL*Plus as the appropriate user.  
These queries should each result in some rows:
   select * from tab where tname like 'CHKDB%';
   select * from CHKDB_DOMAIN;
   select * from CHKDB_RULESET;

Pick domains from the results of the select from the
CHKDB_DOMAIN table.

Note that the tables specified to be checked might not
be the tables in your database; they might be views
into those tables or temporary tables to which data is
dumped.  You might want to...
   select DATA_DOMAIN, count(*) from CHKDB_TABLE
     group by DATA_DOMAIN;
and then query CHKDB_TABLE to identify the tables to be
checked.  In some cases, this could include multiple
views into the same database table.

Checks are run by executing a procedure in a package,
passing the domain and the table name.  The table name
can include the wildcard character '%'.

Ex. To check the CLIENT table in the BOOKS domain:
SQL> call CHKDB_CHECK_DB.APPLY_RULES('BOOKS','CLIENT');

Ex. To check all tables beginning with 'ACCT'
       in the BOOKS domain:
SQL> call CHKDB_CHECK_DB.APPLY_RULES('BOOKS','ACCT%');

Checking can take time - depending on table size, indexes,
and other factors, maybe a long time.  It is very wise to
run checks on a single large table before attempting to
run checks on multiple large tables.

Results are written to the table CHKDB_RESULT.  See the
file:  sample_queries.sql


= = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
    www.chkdb.org

The ChkDB website  www.chkdb.org  includes:

  - a page where you can download ChkDB  (zip or gz)

  - data-domain definitions and rule-set summaries

  - the .txt files in the release (including this one)

  - the ChkDB table-create script and e-r diagram

  - pages showing the sample files and how they are used.
    
  - information about the ChkDB project


= = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
    Domains and Rule-Sets

A ChkDB distribution comes with files that define:
 - data domains: tables/views that can be checked
 - rule-sets: rules to apply to a data domain.

Domain and rule-set files are SQL script files. 

To load a domain or rule-set into a database, you (or your
DBA) runs the script file with SQL*Plus.

Existing domains and rule-sets are described at:
http://www.chkdb.org/rulesets.htm

You can open domain or rule-set script files using a 
text editor (like Notepad).

Remove a domain from a database by executing the
script remove_domain.sql, specifying the domain name. 
Ex.  If the domain is CHKDB_SAMP_HR:
sql> @remove_domain.sql CHKDB_SAMP_HR
This will drop tables listed in CHKDB_TABLE, for this
domain, where DROP_ON_REMOVE_YN is set to 'Y'.  This
script will also delete all rule-sets in the domain.

Remove a rule-set (along with its rules and results)
by executing the script remove_ruleset.sql, specifying the 
domain name and rule-set name. Ex:
sql> @remove_ruleset CHKDB_SAMP_HR SAMP_RULES


= = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
    Populating a Domain

A domain defines a set of tables or views to be checked.  

Data checking can be done two ways:
  - checking views into your database
  - checking tables that are populated from your database

For each 'create table' statement in a domain script,
you can:
  - define a view into your database based on the
    'create table' statement, or,
  - create the table and populate it from your database
    with some software before checking is done.

Sometimes, you would like to create a view that is just
a subset of another view or table, but it would be
difficult to specify the 'create view' statement to
select the desired subset.  One way to handle this is
to create a table with the same primary key as the 
source table.  You populate this table with the keys
of the subset using some software.  Then you can define 
the view simply, maybe something like this:
    CREATE VIEW EMP_SUPERVISOR as
      SELECT * from EMP e where exists 
         (select 'x' from SUPERVISOR_LIST s
            where s.EMP_ID = e.EMP_ID);

    
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
    Running Checks

To run checks against one or more tables, you run a 
procedure in a stored PL/SQL package.

You control which rules are applied with the ACTIVE_YN
column in the ChkDB tables.  You can turn checking on or 
off for a domain, table, rule-set or individual rule.

You run checks by calling a procedure.  You can specify
a data domain and a table name.  The table name can include
the wild-card character '%'.

  Examples (using SQL*Plus):

To check the CHKDB_SAMP_DEPT table 
  in the CHKDB_SAMP_HR domain:
SQL> CALL CHKDB_CHECK_DB.APPLY_RULES('CHKDB_SAMP_HR','CHKDB_SAMP_DEPT');

To check all the tables whose names begin with CHKDB_SAMP
  in the CHKDB_SAMP_HR domain:
SQL> CALL CHKDB_CHECK_DB.APPLY_RULES('CHKDB_SAMP_HR','CHKDB_SAMP%');

To apply all active rules in the CHKDB_SAMP_HR domain:
SQL> CALL CHKDB_CHECK_DB.APPLY_RULES('CHKDB_SAMP_HR','%');

Note that when a check is run for one or more tables, all
existing results for those tables is first deleted.


= = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
    Errors while Running Checks

If an error occurs when you run a check, it may be a result
of a rule that was not specified correctly in the CHKDB_RULE
table.

See the table CHKDB_RULE_ERR - if an error occurred because
of how a rule is defined, it will be described in this
table.

The table CHKDB_RULE_SQL has the SQL that was generated for
each rule that is executed.

If you create or modify rules, remember:
 - Inside a string delimited by single quotes, you specify
     a single quote as a pair of them - ex.  'don''t wait'.
 - In the SQL generated for a rule, the table to which the
     rule applies has an alias of t.  You may need this 
     alias in where clauses that use 'exists' or if the
     table being tested and CHKDB tables have columns with
     the same name.


= = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
    Reviewing Results

When a check is run and a rule is found to apply to a
row in the table being checked, a row is inserted into
the CHKDB_RESULT table.

You may have an application or a way of running reports to 
help you review these results, or you may run queries from
SQL*Plus.

A ChkDB distribution includes sample queries in a file
called:  sample_queries.sql


= = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
    Running the Sample

You can execute  run_sample.sql to:
  - create and populate a sample database of two tables
  - define a sample domain and rule-set
  - run checks against the sample database
  - run sample queries (overwriting  sample_results.txt)

Alternatively, you may use the run_sample.sql script as
a guide to executing the various sample scripts one at a 
time.

You might want to make a copy of sample_results.txt 
before the file is overwritten by the sample queries.


= = = = = = = = = = = = = = = = = = = = = = = = = = = = = =