Fix constraints with conditional sampling #866

amontanez24 · 2022-07-06T01:45:32Z

Environment Details

Please indicate the following details about the environment in which you found the bug:

SDV version: Master branch
Python version: Any
Operating System: Any

Error Description

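If a column is covered by both a constraint and a condition, the Table class in sdv/metadata/table.py appends the same constraint to the _constraints_to_reverse attribute a second time when it transforms the conditions. Reversing the transforms then runs that constraint's reverse transform twice, and the second call crashes.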
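The failure can be reproduced without SDV. The sketch below is a toy model, not SDV code: ToyInequality and BuggyToyTable are hypothetical stand-ins, tables are plain dicts of column lists instead of DataFrames, and the call order (transform the training data with is_condition=False, then the conditions with is_condition=True, then reverse-transform the sampled rows) is an assumption about how the conditional sampling path uses Table.

```python
class ToyInequality:
    """Hypothetical stand-in for an Inequality constraint (not SDV's class)."""

    def __init__(self, low_column_name, high_column_name):
        self.low = low_column_name
        self.high = high_column_name
        self.diff = f'{low_column_name}#{high_column_name}'

    def transform(self, table):
        # Replace the high column with a 'low#high' difference column.
        table = dict(table)
        high = table.pop(self.high)
        table[self.diff] = [hi - lo for hi, lo in zip(high, table[self.low])]
        return table

    def reverse_transform(self, table):
        # Rebuild the high column and drop the difference column.
        # Raises KeyError if the difference column is already gone.
        table = dict(table)
        diff = table.pop(self.diff)
        table[self.high] = [lo + d for lo, d in zip(table[self.low], diff)]
        return table


class BuggyToyTable:
    """Hypothetical model of the buggy bookkeeping described above."""

    def __init__(self, constraints):
        self._constraints = constraints
        self._constraints_to_reverse = []

    def transform(self, data, is_condition=False):
        if not is_condition:
            self._constraints_to_reverse = []
        for constraint in self._constraints:
            data = constraint.transform(data)
            # Bug: the constraint is also recorded when transforming conditions.
            self._constraints_to_reverse.append(constraint)
        return data

    def reverse_transform(self, data):
        # Undo the recorded constraints in reverse order.
        for constraint in reversed(self._constraints_to_reverse):
            data = constraint.reverse_transform(data)
        return data


table = BuggyToyTable([ToyInequality('low_col', 'mid_col')])
table.transform({'low_col': [0, 1, 2], 'mid_col': [1, 2, 3]})  # fit
table.transform({'low_col': [1], 'mid_col': [2]}, is_condition=True)
print(len(table._constraints_to_reverse))  # 2: the same constraint twice

sampled = {'low_col': [1, 1], 'low_col#mid_col': [1, 1]}
try:
    table.reverse_transform(sampled)
except KeyError as err:
    print('KeyError:', err)  # KeyError: 'low_col#mid_col'
```

The first reverse transform consumes the low_col#mid_col column, so the duplicate entry fails with the same KeyError as the traceback in this issue.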

Constraints should only be added to the _constraints_to_reverse attribute if the is_condition parameter is False. This can be changed here:

SDV/sdv/metadata/table.py

Lines 442 to 449 in c68ac3c

    
    if not is_condition:
        self._constraints_to_reverse = []

    for constraint in self._constraints:
        try:
            data = constraint.transform(data)
            self._constraints_to_reverse.append(constraint)
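A minimal sketch of the proposed change, reusing the same hypothetical toy classes (plain dicts instead of DataFrames, not SDV's actual code; the try block from the excerpt above is left out because its except clause is not shown). Conditions are still transformed, but only a non-condition transform records constraints for reversal.

```python
class ToyInequality:
    """Hypothetical stand-in for an Inequality constraint (not SDV's class)."""

    def __init__(self, low_column_name, high_column_name):
        self.low = low_column_name
        self.high = high_column_name
        self.diff = f'{low_column_name}#{high_column_name}'

    def transform(self, table):
        # Replace the high column with a 'low#high' difference column.
        table = dict(table)
        high = table.pop(self.high)
        table[self.diff] = [hi - lo for hi, lo in zip(high, table[self.low])]
        return table

    def reverse_transform(self, table):
        # Rebuild the high column; raises KeyError if already reversed.
        table = dict(table)
        diff = table.pop(self.diff)
        table[self.high] = [lo + d for lo, d in zip(table[self.low], diff)]
        return table


class FixedToyTable:
    """Hypothetical model of the bookkeeping with the fix applied."""

    def __init__(self, constraints):
        self._constraints = constraints
        self._constraints_to_reverse = []

    def transform(self, data, is_condition=False):
        if not is_condition:
            self._constraints_to_reverse = []
        for constraint in self._constraints:
            data = constraint.transform(data)
            # Only a non-condition transform decides what gets reversed.
            if not is_condition:
                self._constraints_to_reverse.append(constraint)
        return data

    def reverse_transform(self, data):
        # Undo the recorded constraints in reverse order.
        for constraint in reversed(self._constraints_to_reverse):
            data = constraint.reverse_transform(data)
        return data


table = FixedToyTable([ToyInequality('low_col', 'mid_col')])
table.transform({'low_col': [0, 1, 2], 'mid_col': [1, 2, 3]})  # fit
conditions = table.transform({'low_col': [1], 'mid_col': [2]}, is_condition=True)
print(conditions)  # {'low_col': [1], 'low_col#mid_col': [1]}
print(len(table._constraints_to_reverse))  # 1
print(table.reverse_transform({'low_col': [1, 1], 'low_col#mid_col': [1, 1]}))
# {'low_col': [1, 1], 'mid_col': [2, 2]}
```

Transforming conditions any number of times now leaves the reversal list unchanged, so each constraint is reversed exactly once.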

Requirements

Conditional sampling on a column that is constrained should work.
Integration tests should be added for that scenario.

Steps to reproduce

import pandas as pd
from sdv.constraints import Inequality
from sdv.sampling import Condition
from sdv.tabular import GaussianCopula


data = pd.DataFrame(data={
    'low_col': list(range(50)),
    'mid_col': list(range(1, 51)),
    'high_col': list(range(2, 52))
})

i_constraint_1 = Inequality(
    low_column_name='low_col',
    high_column_name='mid_col'
)

model = GaussianCopula(constraints=[i_constraint_1])
model.fit(data)

# Condition on both columns covered by the Inequality constraint.
my_condition = Condition(column_values={'low_col': 1, 'mid_col': 2}, num_rows=2)
model.sample_conditions(conditions=[my_condition])

Sampling conditions:   0%|          | 0/2 [00:00<?, ?it/s]
Error: Sampling terminated. Partial results are stored in a temporary file: .sample.csv.temp. This file will be overridden the next time you sample. Please rename the file if you wish to save these results.
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3360             try:
-> 3361                 return self._engine.get_loc(casted_key)
   3362             except KeyError as err:

17 frames
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'low_col#mid_col'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3361                 return self._engine.get_loc(casted_key)
   3362             except KeyError as err:
-> 3363                 raise KeyError(key) from err
   3364 
   3365         if is_scalar(key) and isna(key) and not self.hasnans:

KeyError: 'low_col#mid_col'


amontanez24 added the bug and new labels Jul 6, 2022

npatki removed the new label Jul 6, 2022

amontanez24 mentioned this issue Jul 6, 2022

Fix constraints with conditional sampling #869

Merged

amontanez24 closed this as completed in #869 Jul 7, 2022

pvk-developer added this to the 0.16.0 milestone Jul 18, 2022
