Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix constraints with conditional sampling #866

Closed
amontanez24 opened this issue Jul 6, 2022 · 0 comments · Fixed by #869
Closed

Fix constraints with conditional sampling #866

amontanez24 opened this issue Jul 6, 2022 · 0 comments · Fixed by #869
Labels
bug Something isn't working
Milestone

Comments

@amontanez24
Copy link
Contributor

Environment Details

Please indicate the following details about the environment in which you found the bug:

  • SDV version: Master branch
  • Python version: Any
  • Operating System: Any

Error Description

If both constraints and conditions are specified on the same column, the Table.py class is incorrectly adding the same constraint to the _constraints_to_reverse attribute and upon reversing, this causes it to crash.

We should only add constraints to the _constraints_to_reverse attribute, if the is_condition parameter is False. This can be changed here

SDV/sdv/metadata/table.py

Lines 442 to 449 in c68ac3c

if not is_condition:
self._constraints_to_reverse = []
for constraint in self._constraints:
try:
data = constraint.transform(data)
self._constraints_to_reverse.append(constraint)

Requirements

  • Conditional sampling on a column that is sconstrained should work
  • Integration tests should be added for that scenario

Steps to reproduce

from sdv.sampling import Condition


data = pd.DataFrame(data={
    'low_col': [i for i in range(50)],
    'mid_col': [i+1 for i in range(50)],
    'high_col': [i+2 for i in range(50)]
})

i_constraint_1 = Inequality(
    low_column_name='low_col',
    high_column_name='mid_col'
)

model = GaussianCopula(constraints=[i_constraint_1])
model.fit(data)

my_condition = Condition(column_values={'low_col': 1, 'mid_col': 2}, num_rows=2)
model.sample_conditions(conditions=[my_condition])
Sampling conditions:   0%|          | 0/2 [00:00<?, ?it/s]
Error: Sampling terminated. Partial results are stored in a temporary file: .sample.csv.temp. This file will be overridden the next time you sample. Please rename the file if you wish to save these results.
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3360             try:
-> 3361                 return self._engine.get_loc(casted_key)
   3362             except KeyError as err:

17 frames
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'low_col#mid_col'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3361                 return self._engine.get_loc(casted_key)
   3362             except KeyError as err:
-> 3363                 raise KeyError(key) from err
   3364 
   3365         if is_scalar(key) and isna(key) and not self.hasnans:

KeyError: 'low_col#mid_col'
@amontanez24 amontanez24 added bug Something isn't working new Automatic label applied to new issues labels Jul 6, 2022
@npatki npatki removed the new Automatic label applied to new issues label Jul 6, 2022
@pvk-developer pvk-developer added this to the 0.16.0 milestone Jul 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants