PARSynthesizer errors during fit if sequence_index is numerical sdtype #2079
Comments
Is there a workaround end users can apply in the meantime, until this release drops, @lajohn4747?
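Hi @ryantimjohn, sure thing. The bug only appears when the sequence index is a numerical sdtype, so you can work around it by converting that column to a datetime before fitting:
import pandas as pd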
from sdv.sequential import PARSynthesizer
index_name = 'COLUMN_NAME' # replace with the name of your numerical sequence index column
# convert the sequence index to datetime and update metadata to match
data[index_name] = pd.to_datetime('2000-01-01') + pd.to_timedelta(data[index_name], unit='d')
metadata.update_column(
column_name=index_name,
sdtype='datetime'
)
# now you can model and sample synthetic data using PAR
synthesizer = PARSynthesizer(metadata)
synthesizer.fit(data)
synthetic_data = synthesizer.sample(num_sequences=10)
# be sure to convert the datetimes back into numbers
synthetic_data[index_name] = (synthetic_data[index_name] - pd.to_datetime('2000-01-01')).dt.days
This is a bit hacky, but after the next release, you will not need to apply the workaround. Hope that helps!
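A quick way to check the conversion used in the workaround above is to round-trip a small column without SDV. This is only a sketch: the toy DataFrame and the column name `step` are made up for illustration. It recovers whole days with `.dt.days`, since subtracting the reference date on its own leaves a timedelta column rather than plain numbers.

```python
import pandas as pd

# Hypothetical toy data; 'step' stands in for a numerical sequence index column.
data = pd.DataFrame({'seq_id': ['a', 'a', 'b', 'b'], 'step': [0, 1, 0, 3]})
index_name = 'step'
anchor = pd.to_datetime('2000-01-01')  # arbitrary reference date, as in the workaround

# Forward: numbers -> datetimes (the form that would be passed to the synthesizer).
as_datetime = anchor + pd.to_timedelta(data[index_name], unit='d')

# Backward: datetimes -> whole numbers of days.
# The subtraction alone yields timedeltas, so .dt.days extracts the integers.
restored = (as_datetime - anchor).dt.days

print(restored.tolist())
```

If the original index holds fractional values, dividing by `pd.Timedelta(days=1)` instead of using `.dt.days` would keep the fractions as floats.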
Came up with the same solution, thank you!
@npatki Unfortunately, when I did this I got another error. Sorry to ask for help troubleshooting, but I wondered if you could help because I saw you dealt with a similar error here:
After converting the sequence index column to a date and updating the metadata, when I run the dataframe through the PAR Synthesizer, I get this error:
Is there any reason that comes to mind why this might be? Thanks very much for your help!
Full stack trace:
Cell In[15], line 5
1 from sdv.sequential import PARSynthesizer
2 synthesizer = PARSynthesizer(
3 modified_metadata,context_columns=context_columns,enforce_min_max_values=False,
4 verbose=True)
----> 5 synthesizer.fit(modified_data)
File /opt/conda/lib/python3.10/site-packages/sdv/single_table/base.py:460, in BaseSynthesizer.fit(self, data)
458 self._data_processor.reset_sampling()
459 self._random_state_set = False
--> 460 processed_data = self.preprocess(data)
461 self.fit_processed_data(processed_data)
File /opt/conda/lib/python3.10/site-packages/sdv/single_table/base.py:396, in BaseSynthesizer.preprocess(self, data)
389 warnings.warn(
390 'This model has already been fitted. To use the new preprocessed data, '
391 "please refit the model using 'fit' or 'fit_processed_data'."
392 )
394 is_converted = self._store_and_convert_original_cols(data)
--> 396 preprocess_data = self._preprocess(data)
398 if is_converted:
399 data.columns = self._original_columns
File /opt/conda/lib/python3.10/site-packages/sdv/sequential/par.py:280, in PARSynthesizer._preprocess(self, data)
277 if not self._data_processor._prepared_for_fitting:
278 self.auto_assign_transformers(data)
--> 280 self.update_transformers(sequence_key_transformers)
281 preprocessed = super()._preprocess(data)
283 if self._sequence_index:
File /opt/conda/lib/python3.10/site-packages/sdv/sequential/par.py:303, in PARSynthesizer.update_transformers(self, column_name_to_transformer)
299 if set(column_name_to_transformer).intersection(set(self.context_columns)):
300 raise SynthesizerInputError(
301 'Transformers for context columns are not allowed to be updated.')
--> 303 super().update_transformers(column_name_to_transformer)
File /opt/conda/lib/python3.10/site-packages/sdv/single_table/base.py:228, in BaseSynthesizer.update_transformers(self, column_name_to_transformer)
226 self._validate_transformers(column_name_to_transformer)
227 self._warn_for_update_transformers(column_name_to_transformer)
--> 228 self._data_processor.update_transformers(column_name_to_transformer)
229 if self._fitted:
230 msg = 'For this change to take effect, please refit the synthesizer using `fit`.'
File /opt/conda/lib/python3.10/site-packages/sdv/data_processing/data_processor.py:652, in DataProcessor.update_transformers(self, column_name_to_transformer)
646 raise NotFittedError(
647 'The DataProcessor must be prepared for fitting before the transformers can be '
648 'updated.'
649 )
651 for column, transformer in column_name_to_transformer.items():
--> 652 if column in self._keys and not transformer.is_generator():
653 raise SynthesizerInputError(
654 f"Invalid transformer '{transformer.__class__.__name__}' for a primary "
655 f"or alternate key '{column}'. Please use a generator transformer instead."
656 )
658 with warnings.catch_warnings():
Hi @ryantimjohn, no problem. I suspect this is unrelated to the
Environment Details
Please indicate the following details about the environment in which you found the bug:
Error Description
After #2043, we fixed an issue where enforce_min_max_values was by default being set to True for the sequence_index transformer. However, if no transformer is assigned to the sequence_index (i.e. if the sequence is already a numerical sdtype), fit now errors.
To fix, we should check that (1) a transformer has been assigned (transformer is not None) and (2) that the transformer has the enforce_min_max_values attribute (instead of adding an additional check, we could use getattr with a False default value in place of directly accessing the attribute).
Steps to reproduce
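The failure mode and the proposed guard from the Error Description can be illustrated with a small stand-in. This is not SDV's actual code: the stub classes and the `unguarded`/`guarded` helpers are hypothetical names used only to show why direct attribute access breaks when no transformer is assigned, and how `getattr` with a `False` default covers both conditions at once.

```python
class StubTransformer:
    """Hypothetical stand-in for a transformer that exposes the attribute."""

    def __init__(self, enforce_min_max_values=True):
        self.enforce_min_max_values = enforce_min_max_values


class StubTransformerWithoutAttr:
    """Hypothetical stand-in for a transformer that lacks the attribute."""


def unguarded(transformer):
    # Direct access: raises AttributeError when transformer is None
    # or when the transformer does not define the attribute.
    return transformer.enforce_min_max_values


def guarded(transformer):
    # getattr with a False default handles both cases in one call:
    # None has no such attribute, and neither does a transformer without it.
    return getattr(transformer, 'enforce_min_max_values', False)


try:
    unguarded(None)
except AttributeError as error:
    print('unguarded access failed:', error)

print(guarded(None))
print(guarded(StubTransformer()))
print(guarded(StubTransformerWithoutAttr()))
```

Because `None` simply lacks the attribute, the single `getattr` call stands in for both the `transformer is not None` check and the attribute check described above.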