Environment Details
Please indicate the following details about the environment in which you found the bug:
SDV version:
Python version:
Operating System:
Error Description
When running PAR with categorical columns that are floats, PAR does not stick to the original categories when sampling. This leads to a very low diagnostic score for 'Data Validity' due to the CategoryAdherence metric failing.
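To see whether a given column is affected, a quick check along these lines may help. This is only a rough sketch of the idea behind the metric (the share of synthetic values that already exist among the real values), not the SDMetrics implementation. The helper name `category_adherence` is hypothetical, and it assumes missing values are `None` or float NaN.

```python
def category_adherence(real_values, synthetic_values):
    """Fraction of non-missing synthetic values that also occur in the real values.

    Hypothetical helper for illustration only; works on lists or pandas Series.
    """
    # v == v is False only for NaN, so this filter drops missing float values
    real_categories = {v for v in real_values if v is not None and v == v}
    synthetic = [v for v in synthetic_values if v is not None and v == v]
    if not synthetic:
        return 1.0  # nothing to violate
    hits = sum(1 for v in synthetic if v in real_categories)
    return hits / len(synthetic)


# Toy data: a float column whose only valid categories are 1.0, 2.0 and 3.0
real = [1.0, 2.0, 3.0, 2.0]
synthetic = [1.0, 2.4, 3.0, 1.7]  # 2.4 and 1.7 are not original categories
print(category_adherence(real, synthetic))  # 0.5
```

A score well below 1.0 on a column that the metadata marks as categorical would be consistent with the behavior described above.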
If anyone is running into this, here is a suggested workaround:
Identify any categorical columns (in the metadata) that are actually represented as numbers in your data (ints, floats, etc.)
Cast these columns as objects before inputting them into the PARSynthesizer.
At the end when you get synthetic data, cast them back as ints, floats, etc.
Here is a code snippet that accomplishes the above. Replace the list CAT_COLUMN_NAMES with the list of your column names.
from sdv.sequential import PARSynthesizer

CAT_COLUMN_NAMES = ['ColA', 'ColB', ... ]
data = <your pandas DataFrame>
metadata = <your SingleTableMetadata object>

# cast the categorical columns to the object dtype
for col_name in CAT_COLUMN_NAMES:
    data[col_name] = data[col_name].astype('object')

# now proceed with modeling and sampling as usual
synthesizer = PARSynthesizer(metadata)
synthesizer.fit(data)
synthetic_data = synthesizer.sample(num_sequences=10)

# (optional) cast the categorical columns back to floats
for col_name in CAT_COLUMN_NAMES:
    try:
        synthetic_data[col_name] = synthetic_data[col_name].astype('float')
    except (ValueError, TypeError):
        print('Column name', col_name, 'could not be converted back to a float')
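If you have many columns, the first step (finding categorical columns that are stored as numbers) could be automated roughly as follows. This is a sketch under assumptions: the helper name `numeric_categorical_columns` is hypothetical, `data` is expected to be a pandas DataFrame, and `columns_meta` is assumed to be a plain dict that maps each column name to its metadata entry, such as `{'ColA': {'sdtype': 'categorical'}}`.

```python
def numeric_categorical_columns(data, columns_meta):
    """Return the names of columns marked categorical whose dtype is numeric.

    Hypothetical helper: `data` is a pandas DataFrame and `columns_meta` maps
    column names to dicts containing an 'sdtype' key.
    """
    found = []
    for name, spec in columns_meta.items():
        # skip non-categorical columns and columns missing from the data
        if spec.get('sdtype') != 'categorical' or name not in data:
            continue
        # dtype.kind is 'i' or 'u' for integers and 'f' for floats
        if data[name].dtype.kind in 'iuf':
            found.append(name)
    return found
```

Assuming your metadata object exposes its column definitions through `metadata.to_dict()['columns']`, the result of `numeric_categorical_columns(data, metadata.to_dict()['columns'])` could be used as CAT_COLUMN_NAMES. It is worth reviewing the list by hand before casting anything.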