In my table, I have a datetime column (submission_timestamp) and a date column (due_date). I want to synthesize data with an Inequality constraint showing that submission_timestamp <= due_date.
However, I am unable to apply an Inequality constraint to this data; SDV complains that the data violates the constraint.
The problem is that the date does not have enough granularity. SDV (and Python in general) assumes that a due date such as 2016-10-12 refers to the beginning of the day (2016-10-12 00:00:00). However, in my dataset, a due date of 2016-10-12 refers to the end of the day (2016-10-12 23:59:59), because a submission may be made at any time on that day. This is why the inequality submission_timestamp <= due_date should hold.
Expected Behavior
I expect that when strict_boundaries=False is used with a date column, the assumed timestamp allows the loosest possible interpretation:
- If a date is the high column, it should be assumed to refer to the end of the day.
- If a date is the low column, it should be assumed to refer to the beginning of the day.

The opposite should be assumed when strict_boundaries=True.
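The expected behavior above can be sketched with Python's standard library. This is not SDV's implementation; `interpret_date` and `satisfies_inequality` are hypothetical helpers, and "end of day" is taken as `time.max` (23:59:59.999999) so that every timestamp on that date passes the non-strict check.

```python
from datetime import date, datetime, time


def interpret_date(d: date, role: str, strict_boundaries: bool) -> datetime:
    """Hypothetical helper: map a bare date to a timestamp for an Inequality check.

    strict_boundaries=False (loosest): high -> end of day, low -> start of day.
    strict_boundaries=True (tightest): the opposite.
    """
    if role not in ("low", "high"):
        raise ValueError("role must be 'low' or 'high'")
    # End of day for a non-strict high column or a strict low column.
    use_end_of_day = (role == "high") != strict_boundaries
    return datetime.combine(d, time.max if use_end_of_day else time.min)


def satisfies_inequality(low: datetime, high: date, strict_boundaries: bool) -> bool:
    """Hypothetical check of low <= high (low < high when strict) for a date high column."""
    high_ts = interpret_date(high, "high", strict_boundaries)
    return low < high_ts if strict_boundaries else low <= high_ts


# A same-day submission, taken from the first row of the error output
print(satisfies_inequality(datetime(2016, 7, 10, 17, 4), date(2016, 7, 10), False))  # True
print(satisfies_inequality(datetime(2016, 7, 10, 17, 4), date(2016, 7, 10), True))   # False
```

Under this reading, a same-day submission satisfies the non-strict constraint, while the strict variant compares against the start of the due date and rejects it.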
ConstraintsNotMetError:
Data is not valid for the 'Inequality' constraint:
SUBMISSION_TIMESTAMP DUE_DATE
0 2016-07-10 17:04:00 2016-07-10
1 2016-07-11 13:23:00 2016-07-11
2 2016-07-12 08:45:30 2016-07-12
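The failing rows in the error above can be reproduced without SDV: Python's `datetime.fromisoformat` parses a date-only string as midnight of that day, so each same-day submission compares as later than its due date. This only illustrates the parsing behavior described in the issue; it does not call SDV.

```python
from datetime import datetime

# (submission_timestamp, due_date) pairs copied from the ConstraintsNotMetError output
rows = [
    ("2016-07-10 17:04:00", "2016-07-10"),
    ("2016-07-11 13:23:00", "2016-07-11"),
    ("2016-07-12 08:45:30", "2016-07-12"),
]

# A date-only string parses to 00:00:00 of that day, so every
# same-day submission fails submission_timestamp <= due_date.
results = [
    datetime.fromisoformat(submitted) <= datetime.fromisoformat(due)
    for submitted, due in rows
]
print(results)  # [False, False, False]
```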
For any users encountering this: one workaround is to add 1 day to the date column for the purposes of SDV modeling. After creating synthetic data, the column can be moved back by one day.
```python
import pandas as pd
from sdv.single_table import GaussianCopulaSynthesizer

# assumes `data` (a pandas DataFrame) and `metadata` are already defined
data_copy = data.copy()

# add 1 day to each value in the high column and save it back in the original format
data_copy['DUE_DATE'] = pd.to_datetime(data_copy['DUE_DATE']) + pd.DateOffset(days=1)
data_copy['DUE_DATE'] = data_copy['DUE_DATE'].dt.strftime('%Y-%m-%d')

# now fit and sample as usual
synthesizer = GaussianCopulaSynthesizer(metadata)
constraint = {
    'constraint_class': 'Inequality',
    'constraint_parameters': {
        'low_column_name': 'SUBMISSION_TIMESTAMP',
        'high_column_name': 'DUE_DATE',
        'strict_boundaries': False
    }
}
synthesizer.add_constraints([constraint])
synthesizer.fit(data_copy)
synthetic_data = synthesizer.sample(num_rows=5)

# finally subtract 1 day from the high column and save it back in the original format
synthetic_data['DUE_DATE'] = pd.to_datetime(synthetic_data['DUE_DATE']) - pd.DateOffset(days=1)
synthetic_data['DUE_DATE'] = synthetic_data['DUE_DATE'].dt.strftime('%Y-%m-%d')
```
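Why the workaround works can be sketched with the standard library: shifting the due date forward by one day moves its implied midnight to the end of the real deadline, so same-day submissions pass the non-strict check, and shifting back restores the original date. `shift_date` and `holds_non_strict` are hypothetical helpers for illustration, not SDV functions.

```python
from datetime import date, datetime, time, timedelta


def shift_date(d: date, days: int) -> date:
    """Hypothetical helper mirroring the workaround: move a date column by whole days."""
    return d + timedelta(days=days)


def holds_non_strict(submitted: datetime, due: date) -> bool:
    """low <= high, with the bare due date read as midnight at the start of that day."""
    return submitted <= datetime.combine(due, time.min)


submitted = datetime(2016, 7, 10, 17, 4)
due = date(2016, 7, 10)

shifted = shift_date(due, 1)                 # 2016-07-11, i.e. midnight after the real deadline
print(holds_non_strict(submitted, due))      # False: fails before the shift
print(holds_non_strict(submitted, shifted))  # True: passes after the shift
print(shift_date(shifted, -1) == due)        # True: shifting back restores the date

# Edge case: a submission at exactly midnight of the next day also passes.
print(holds_non_strict(datetime(2016, 7, 11), shifted))  # True
```

One consequence of this approach: with strict_boundaries=False, a timestamp of exactly 00:00:00 on the day after the due date is also accepted, so synthetic rows may occasionally land on that boundary.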
npatki changed the title from "Inequality constraint cannot be applied to compare datetime to date (end-of-day)" to "Inequality constraint cannot be applied to compare datetime to date" on Nov 1, 2024.
This bug was first noticed by a Slack user.