Situation
If you read data into a pandas DataFrame, datetime columns are often set (by default) to the object (i.e. string) dtype. The user must then take one of the following actions for SDV to synthesize better datetime values:
The user can manually cast the column to a datetime dtype
The user can set a datetime_format when creating the metadata for SDV to use
It's easy for someone (especially a new user) to skip this step entirely, causing issues later.
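As a rough illustration of the two options above, here is a minimal sketch. The column name, the sample values and the '%Y-%m-%d' format are made up for the example, and the plain dict only stands in for SDV metadata (it is not SDV's metadata class).

```python
import pandas as pd

# Hypothetical data: dates stored as strings (object dtype), as often
# happens after reading a CSV. The dtype is set explicitly for clarity.
data = pd.DataFrame({
    'start_date': pd.Series(['2021-01-15', '2021-02-20'], dtype=object),
})

# Option 1: cast the column to a datetime dtype.
casted = data.copy()
casted['start_date'] = pd.to_datetime(casted['start_date'], format='%Y-%m-%d')

# Option 2: leave the strings as-is and record the format in the metadata.
# This dict is an assumption used for illustration only.
column_metadata = {
    'start_date': {'sdtype': 'datetime', 'datetime_format': '%Y-%m-%d'},
}

print(data['start_date'].dtype)    # object
print(casted['start_date'].dtype)  # a datetime64 dtype
```

With real SDV metadata, the format would be added through 'update_column', as the suggested warning text in this issue recommends.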
Suggested Warning
After discussing with Neha, I'm opening this feature request. Ideally, we can add a warning when the user tries to make progress with SDV (e.g. maybe training a synthesizer) that they should add a datetime_format.
Note that metadata auto-detection will generally pick up a datetime_format for most common cases. But there are other ways of creating metadata, for example manually writing a Python dict or JSON file. In such cases, there may not be a datetime_format.
This warning should only appear when:
In metadata, the sdtype is 'datetime' AND
In metadata, there is no datetime_format specified AND
In the data, the dtype (storage type) is 'object'
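The three conditions above could be checked roughly as follows. This is only a sketch: the function name is hypothetical, and the metadata is assumed to be a plain dict mapping column names to their properties rather than an SDV metadata object.

```python
import pandas as pd


def find_datetime_columns_missing_format(data, columns_metadata):
    """Return the columns that meet all three warning conditions.

    `columns_metadata` is assumed to look like
    {'col': {'sdtype': 'datetime', 'datetime_format': '%Y-%m-%d'}, ...}.
    """
    flagged = []
    for name, meta in columns_metadata.items():
        is_datetime_sdtype = meta.get('sdtype') == 'datetime'
        has_no_format = meta.get('datetime_format') is None
        is_object_storage = (
            name in data.columns and pd.api.types.is_object_dtype(data[name])
        )
        if is_datetime_sdtype and has_no_format and is_object_storage:
            flagged.append(name)
    return flagged


# Hypothetical example data and metadata.
example_data = pd.DataFrame({
    'start_date': pd.Series(['2021-01-15', '2021-02-20'], dtype=object),
    'end_date': pd.to_datetime(['2021-03-01', '2021-04-01']),
    'timestamp': pd.Series(['01/15/2021', '02/20/2021'], dtype=object),
})
example_metadata = {
    'start_date': {'sdtype': 'datetime'},
    'end_date': {'sdtype': 'datetime'},
    'timestamp': {'sdtype': 'datetime', 'datetime_format': '%m/%d/%Y'},
}

flagged_columns = find_datetime_columns_missing_format(example_data, example_metadata)
```

Here only 'start_date' is flagged: 'end_date' is already stored as a datetime dtype, and 'timestamp' has a datetime_format.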
Suggested API
This warning should be raised by the metadata.validate_data function. Ideally, the warning pretty-prints a list of the affected columns.
>>> metadata.validate_data(data)
Warning: No 'datetime_format' is present in the metadata for the following columns:
Table Name  Column Name  sdtype    datetime_format
users       start_date   datetime  None
users       end_date     datetime  None
sessions    timestamp    datetime  None
Without this specification, SDV may not be able to accurately parse the data. We recommend adding datetime formats using 'update_column'.
Note: For single-table data, we can omit the Table Name column.
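One possible way to build the message shown above is sketched below. The helper names are hypothetical and the layout is only an approximation of the proposed output, using the standard library's warnings module; it is not how SDV implements validate_data.

```python
import warnings


def format_missing_format_warning(rows, single_table=False):
    """Build the warning text for (table_name, column_name) pairs."""
    header = ['Table Name', 'Column Name', 'sdtype', 'datetime_format']
    body = [[table, column, 'datetime', 'None'] for table, column in rows]
    if single_table:
        # For a single table, drop the Table Name column.
        header = header[1:]
        body = [row[1:] for row in body]

    # Pad every column to the width of its longest cell.
    widths = [max(len(cell) for cell in column) for column in zip(header, *body)]
    lines = [
        '  '.join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()
        for row in [header, *body]
    ]
    return (
        "No 'datetime_format' is present in the metadata for the following columns:\n"
        + '\n'.join(lines)
        + "\nWithout this specification, SDV may not be able to accurately parse "
        "the data. We recommend adding datetime formats using 'update_column'."
    )


def warn_missing_formats(rows, single_table=False):
    """Emit the message as a UserWarning if any columns were flagged."""
    if rows:
        warnings.warn(format_missing_format_warning(rows, single_table), UserWarning)
```

Calling warn_missing_formats([('users', 'start_date'), ('sessions', 'timestamp')]) would emit a single warning listing both columns.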