ct-table-generator.md

Contingency Table Generator for Relational Data

Implements an efficient SQL-based method for computing instantiation counts for conjunctive conditions in a relational database. The method is described in our algorithm paper. See publication list for a brief summary.

Input

  • A relational database datadb
  • A set of first-order terms t1,...,tn. Default: the complete set of terms associated with the relational schema.
  • Optional Expansion: A set of first-order variables A1,...,Ae. Default: empty.
  • Optional Grounding: A set of groundings of first-order variables A1=a1,...,Ag=ag. Default: empty.

Output

A contingency table of the form

count    t1      ...  tn      (A1)  ...  (Ae)
integer  value1  ...  valuen  a1    ...  ae

where

  • the values in each row define a conjunctive query t1=value1,...,tn=valuen
  • count is the number of times that the query is instantiated in the database datadb (i.e., the size of the query's result set).
  • If groundings of the form A = a are specified, the system adds each condition A = a to the query; the contingency table then represents the query instantiation counts for the individuals named a only.
  • If expansion first-order variables are specified, then each ai is a constant denoting an individual from the domain of the given first-order variable Ai. The contingency table represents the query instantiation counts for each tuple (a1,...,ae) of individuals in the respective domains of (A1,...,Ae).
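
The output format described above can be illustrated with a minimal sketch, which is not the system's own code. It assumes a hypothetical toy schema (student, course, registered) with the terms intelligence(S), difficulty(C) and grade(S,C), and uses Python's built-in sqlite3 module in place of the system's database connection. The function name contingency_table and its parameters are also made up for this illustration.

```python
import sqlite3

# Hypothetical toy database standing in for datadb (all names are assumptions).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (s_id TEXT PRIMARY KEY, intelligence INTEGER);
CREATE TABLE course (c_id TEXT PRIMARY KEY, difficulty INTEGER);
CREATE TABLE registered (s_id TEXT, c_id TEXT, grade TEXT);
INSERT INTO student VALUES ('jack', 3), ('kim', 3), ('paul', 1);
INSERT INTO course VALUES ('c101', 1), ('c102', 2);
INSERT INTO registered VALUES
  ('jack', 'c101', 'A'), ('jack', 'c102', 'B'),
  ('kim', 'c101', 'A'), ('paul', 'c102', 'C');
""")


def contingency_table(conn, grounding=None, expand_student=False):
    """Count instantiations of intelligence(S), difficulty(C), grade(S,C).

    grounding: optional student id a, adds the condition S = a to the query.
    expand_student: if True, adds one column for S and counts per individual.
    """
    # Column names are fixed constants here, so building the SQL text is safe.
    cols = ["s.intelligence", "c.difficulty", "r.grade"]
    if expand_student:
        cols.append("s.s_id")
    col_list = ", ".join(cols)
    sql = (
        f"SELECT COUNT(*) AS count, {col_list} "
        "FROM registered r "
        "JOIN student s ON s.s_id = r.s_id "
        "JOIN course c ON c.c_id = r.c_id "
    )
    params = []
    if grounding is not None:
        sql += "WHERE s.s_id = ? "
        params.append(grounding)
    sql += f"GROUP BY {col_list} ORDER BY {col_list}"
    return conn.execute(sql, params).fetchall()


default_ct = contingency_table(conn)                       # no grounding, no expansion
grounded_ct = contingency_table(conn, grounding="jack")    # grounding S = jack
expanded_ct = contingency_table(conn, expand_student=True) # expansion on S

for row in default_ct:
    print(row)
```

Each returned tuple has the shape (count, value1, ..., valuen), followed by the individual's identifier when the expansion is requested. With a grounding, only joins involving the named individual are counted.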

Usage

  1. Specify the input database datadb: Modify jar/config.cfg with your own configuration as explained in the repository readme. See our project website for an explanation of the options. Make sure that the "AutomaticSetUp" option is set to 0.
  2. Run MakeSetup.runMS(). This creates a database named datadb_setup containing metadata. Then edit the following tables using SQL:
    • FunctorSet contains a list of all first-order terms for the database (called Fnodes). Delete all terms that should not be in the contingency table.
    • Expansions is empty by default. Insert first-order variables for expansions. The table Pvariables lists the available first-order variables (called population variables).
    • Groundings is empty by default. Insert first-order variables and constants for groundings. The table Pvariables lists the available first-order variables (called population variables).
  3. Run BayesBaseCT_SortMerge.buildCT(). This writes the contingency table to a database called datadb_ct.

Implementation Notes

  • The system first generates counts for conditions that involve only natural joins among existing tables. These counts correspond to contingency table entries where all relationship indicator variables are set to true. They are computed by automatically generating and executing the appropriate SELECT COUNT(*) SQL queries. If the option "LinkCorrelations" is set to 0, the system stops at this point.
  • If the option "LinkCorrelations" is set to 1, the system uses the Moebius Join Algorithm to compute the correct counts for negated relationships (representing nonexistent links).
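
The two phases in the notes above can be sketched for a single relationship. This is a simplified illustration of the subtraction idea behind counts for negated relationships, not the system's Moebius Join implementation. It assumes a hypothetical toy schema (student, course, registered) in which each (student, course) pair appears at most once in registered, and it uses Python's built-in sqlite3 module.

```python
import sqlite3

# Hypothetical toy database (all table and column names are assumptions).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (s_id TEXT PRIMARY KEY, intelligence INTEGER);
CREATE TABLE course (c_id TEXT PRIMARY KEY, difficulty INTEGER);
CREATE TABLE registered (s_id TEXT, c_id TEXT, PRIMARY KEY (s_id, c_id));
INSERT INTO student VALUES ('jack', 3), ('kim', 3), ('paul', 1);
INSERT INTO course VALUES ('c101', 1), ('c102', 2);
INSERT INTO registered VALUES
  ('jack', 'c101'), ('jack', 'c102'), ('kim', 'c101'), ('paul', 'c102');
""")

# Phase 1: counts with the relationship indicator set to true,
# obtained from a join over the existing tables.
true_counts = {
    (i, d): n
    for i, d, n in conn.execute("""
        SELECT s.intelligence, c.difficulty, COUNT(*)
        FROM registered r
        JOIN student s ON s.s_id = r.s_id
        JOIN course c ON c.c_id = r.c_id
        GROUP BY s.intelligence, c.difficulty
    """)
}

# Counts with the relationship left unspecified: every (student, course) pair.
all_counts = {
    (i, d): n
    for i, d, n in conn.execute("""
        SELECT s.intelligence, c.difficulty, COUNT(*)
        FROM student s CROSS JOIN course c
        GROUP BY s.intelligence, c.difficulty
    """)
}

# Phase 2: counts with the relationship set to false, by subtraction,
# so the nonexistent links are never enumerated directly.
false_counts = {
    key: total - true_counts.get(key, 0) for key, total in all_counts.items()
}

for key in sorted(false_counts):
    print(key, "true:", true_counts.get(key, 0), "false:", false_counts[key])
```

The design point is that the false counts come from two cheap aggregate queries rather than from materializing all unlinked pairs. With several relationships, the same subtraction has to be applied for each combination of true and false indicators, which this sketch does not cover.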