Implements an efficient SQL-based method for computing instantiation counts for conjunctive conditions in a relational database. The method is described in our algorithm paper. See publication list for a brief summary.

Input:
- A relational database `datadb`.
- A set of first-order terms t1,...,tn. Default: the complete set of all terms associated with the relational schema.
- Optional Expansion: a set of first-order variables A1,...,Ae. Default: empty.
- Optional Grounding: a set of groundings of first-order variables A1=a1,...,Ag=ag. Default: empty.

Output: a contingency table of the form

count | t1 | ... | tn | (A1) | ... | (Ae) |
---|---|---|---|---|---|---|
integer | value1 | .... | valuen | a1 | .... | ae |

where
- the values in each row define a conjunctive query t1=value1,...,tn=valuen.
- `count` is the number of times that the query is instantiated in the database `datadb` (i.e. the size of the query's result set).
- If groundings of the form A = a are specified, the system adds each condition A = a to the query; the contingency table then represents the query instantiation counts for the individuals named a only.
- If expansion variables are specified, then each ai is a constant denoting an individual from the domain of the given first-order variable Ai. The contingency table represents the query instantiation counts for each tuple (a1,...,ae) of individuals in the respective domains of (A1,...,Ae).
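
To make the table semantics concrete, here is a minimal sketch in Python using an in-memory SQLite database. The toy schema (`Student`, `Course`, `Registered`), the variable names `S` and `C`, and the helper `contingency_table` are all hypothetical and not part of the system. The sketch covers only rows where the relationship holds and issues a single `GROUP BY` query, which is a simplification for illustration and not necessarily how the system generates its queries.

```python
import sqlite3

# Hypothetical mapping from first-order variables to key columns.
VAR_KEYS = {"S": "s.s_id", "C": "c.c_id"}
# Hypothetical terms: intelligence(S), difficulty(C), grade(S,C).
TERMS = ["s.intelligence", "c.difficulty", "r.grade"]


def build_toy_db():
    """Create a tiny stand-in for the input database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Student (s_id TEXT PRIMARY KEY, intelligence INTEGER);
        CREATE TABLE Course (c_id TEXT PRIMARY KEY, difficulty INTEGER);
        CREATE TABLE Registered (s_id TEXT, c_id TEXT, grade TEXT,
                                 PRIMARY KEY (s_id, c_id));
        INSERT INTO Student VALUES ('jack', 3), ('kim', 2), ('paul', 3);
        INSERT INTO Course VALUES ('c101', 1), ('c102', 2);
        INSERT INTO Registered VALUES
            ('jack', 'c101', 'A'), ('jack', 'c102', 'B'),
            ('kim', 'c101', 'A'), ('paul', 'c101', 'A');
    """)
    return conn


def contingency_table(conn, expansion=(), grounding=None):
    """Return rows (count, t1, ..., tn, a1, ..., ae).

    expansion: variables whose individuals become extra columns.
    grounding: dict variable -> constant, added as conditions A = a.
    """
    grounding = grounding or {}
    group_cols = TERMS + [VAR_KEYS[v] for v in expansion]
    where = " AND ".join(f"{VAR_KEYS[v]} = ?" for v in grounding)
    sql = (
        f"SELECT COUNT(*), {', '.join(group_cols)} "
        "FROM Registered r "
        "JOIN Student s ON s.s_id = r.s_id "
        "JOIN Course c ON c.c_id = r.c_id "
        + (f"WHERE {where} " if where else "")
        + f"GROUP BY {', '.join(group_cols)} "
        + f"ORDER BY {', '.join(group_cols)}"
    )
    return conn.execute(sql, list(grounding.values())).fetchall()


if __name__ == "__main__":
    db = build_toy_db()
    for row in contingency_table(db, expansion=["S"]):
        print(row)
```

Calling `contingency_table(db)` gives one row per distinct value combination, `grounding={"S": "jack"}` restricts the counts to the individual `jack`, and `expansion=["S"]` adds a column holding each student so that counts are reported per individual.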

Usage:
- Specify the input database `datadb`: Modify `jar/config.cfg` with your own configuration as explained in the repository readme. See our project website for an explanation of the options. Make sure that the "AutomaticSetUp" option is set to 0.
- Run `MakeSetup.runMS()`. This creates a database named `datadb_setup` containing metadata. Edit the following tables (using SQL).
    - `FunctorSet` contains a list of all first-order terms for the database (called Fnodes). Delete all terms that should not be in the contingency table.
    - `Expansions` is empty by default. Insert first-order variables for expansions. The table `Pvariables` lists the available first-order variables (called population variables).
    - `Groundings` is empty by default. Insert first-order variables and constants for groundings. The table `Pvariables` lists the available first-order variables (called population variables).
- Run `BayesBaseCT_SortMerge.buildCT()`. This writes the contingency table to a database called `datadb_ct`.
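
The table edits described in the steps above might look roughly like the following sketch. It runs against an in-memory SQLite stand-in, not a real `datadb_setup` database. The table names come from the description above, but the column names (`Fid`, `pvid`, `id`), the term and variable names, and the helper `customize` are assumptions made for illustration; check the actual schema created by `MakeSetup.runMS()` before adapting it.

```python
import sqlite3


def make_setup_stand_in():
    """Stand-in for the metadata database; column names are assumed."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE FunctorSet (Fid TEXT PRIMARY KEY);
        CREATE TABLE Pvariables (pvid TEXT PRIMARY KEY);
        CREATE TABLE Expansions (pvid TEXT PRIMARY KEY);
        CREATE TABLE Groundings (pvid TEXT PRIMARY KEY, id TEXT);
        INSERT INTO FunctorSet VALUES
            ('intelligence(S)'), ('difficulty(C)'), ('grade(S,C)');
        INSERT INTO Pvariables VALUES ('S'), ('C');
    """)
    return conn


def customize(conn, drop_terms=(), expansions=(), groundings=None):
    """Apply the three kinds of edits in one transaction."""
    with conn:  # commits on success, rolls back on error
        # Remove terms that should not appear in the contingency table.
        conn.executemany(
            "DELETE FROM FunctorSet WHERE Fid = ?",
            [(t,) for t in drop_terms],
        )
        # Register first-order variables to expand.
        conn.executemany(
            "INSERT INTO Expansions (pvid) VALUES (?)",
            [(v,) for v in expansions],
        )
        # Register groundings A = a.
        conn.executemany(
            "INSERT INTO Groundings (pvid, id) VALUES (?, ?)",
            list((groundings or {}).items()),
        )


if __name__ == "__main__":
    setup = make_setup_stand_in()
    customize(setup, drop_terms=["grade(S,C)"],
              expansions=["C"], groundings={"S": "jack"})
    print(setup.execute("SELECT Fid FROM FunctorSet ORDER BY Fid").fetchall())
```

Consulting `Pvariables` before inserting into `Expansions` or `Groundings` helps ensure that only variables known to the system are used.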

How it works:
- The system first generates counts for conditions that involve only natural joins among existing tables. These counts correspond to contingency table entries where all relationship indicator variables are set to true. They are computed by automatically generating the appropriate `select count(*)` SQL queries and executing them. If the option "LinkCorrelations" is set to 0, the system stops.
- If the option "LinkCorrelations" is set to 1, the system uses the Moebius Join Algorithm to compute the correct counts for negated relationships (representing nonexistent links).
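
The two phases can be illustrated for the simplest case of a single relationship. The sketch below is not the repository's implementation of the Moebius Join; it only shows the underlying subtraction idea: the count for a negated relationship equals the count over all pairs of individuals minus the count where the relationship holds, so nonexistent links never have to be enumerated. The schema and the helper names `make_db` and `link_counts` are hypothetical.

```python
import sqlite3


def make_db():
    """Toy database: 3 students, 2 courses, 4 registrations."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Student (s_id TEXT PRIMARY KEY, intelligence INTEGER);
        CREATE TABLE Course (c_id TEXT PRIMARY KEY, difficulty INTEGER);
        CREATE TABLE Registered (s_id TEXT, c_id TEXT,
                                 PRIMARY KEY (s_id, c_id));
        INSERT INTO Student VALUES ('jack', 3), ('kim', 2), ('paul', 3);
        INSERT INTO Course VALUES ('c101', 1), ('c102', 2);
        INSERT INTO Registered VALUES
            ('jack', 'c101'), ('jack', 'c102'),
            ('kim', 'c101'), ('paul', 'c101');
    """)
    return conn


def link_counts(conn):
    """Map (intelligence, difficulty, registered?) to a count."""
    # Phase 1: counts where the relationship is true, via a join.
    true_counts = {
        (i, d): n
        for i, d, n in conn.execute(
            "SELECT s.intelligence, c.difficulty, COUNT(*) "
            "FROM Registered r "
            "JOIN Student s ON s.s_id = r.s_id "
            "JOIN Course c ON c.c_id = r.c_id "
            "GROUP BY s.intelligence, c.difficulty"
        )
    }
    # Marginal counts per entity table; their product is the number of
    # (student, course) pairs, linked or not, for each value combination.
    s_counts = conn.execute(
        "SELECT intelligence, COUNT(*) FROM Student GROUP BY intelligence"
    ).fetchall()
    c_counts = conn.execute(
        "SELECT difficulty, COUNT(*) FROM Course GROUP BY difficulty"
    ).fetchall()

    table = {}
    for i, n_s in s_counts:
        for d, n_c in c_counts:
            n_true = true_counts.get((i, d), 0)
            table[(i, d, True)] = n_true
            # Phase 2: negated relationship = all pairs minus true pairs.
            table[(i, d, False)] = n_s * n_c - n_true
    return table


if __name__ == "__main__":
    for key, count in sorted(link_counts(make_db()).items()):
        print(key, count)
```

The counts in the resulting table sum to the number of student-course pairs (3 × 2 = 6 in the toy data), which is a quick sanity check that the true and negated entries partition all pairs.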