I want to generate synthetic data that I can append to my original data. Is there a way to exclude the primary keys that are already in the original data when generating synthetic data?
I want to generate a large amount of data, but I have limited resources, so I am planning to generate small sets of data and append them to my original data.
Is there a way to exclude the primary keys that are already in the original data when generating synthetic data?
This isn't explicitly supported, so I've filed #697 for tracking progress.
In the meantime, I have found that the defaults work for most cases: if your primary key is a string, the SDV will generate 'a', 'b', 'c', etc. by default. (If it's numerical, it'll generate 0, 1, 2, ...) Is this causing conflicts for you?
Other options:
Write metadata and specify a different regex for the primary key (see guide)
Manually overwrite the column after sampling with whatever you want
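As a rough illustration of the last option, here is a minimal sketch that overwrites the primary key of sampled rows so it cannot collide with keys already in the original data. It is not SDV code: the helper name `reassign_primary_keys` and the column name `id` are hypothetical, rows are shown as plain dicts to keep the example self-contained, and it assumes integer keys. The same idea applies to a DataFrame column.

```python
from itertools import count


def reassign_primary_keys(original_rows, synthetic_rows, key="id"):
    """Return copies of synthetic_rows with fresh integer keys.

    Hypothetical helper: new keys start just above the largest key
    found in original_rows, so they never clash with existing ones.
    """
    used = {row[key] for row in original_rows}
    fresh_keys = count(start=max(used, default=-1) + 1)

    result = []
    for row in synthetic_rows:
        new_row = dict(row)            # copy, so the input is not mutated
        new_row[key] = next(fresh_keys)
        result.append(new_row)
    return result


original = [{"id": 0, "age": 31}, {"id": 1, "age": 45}]
synthetic = [{"id": 0, "age": 29}, {"id": 1, "age": 52}]  # default keys clash

fixed = reassign_primary_keys(original, synthetic)
combined = original + fixed
```

After this step the combined rows have unique keys, so the synthetic batch can be appended safely.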
I want to generate a large amount of data, but I have limited resources, so I am planning to generate small sets of data and append them to my original data.
I have filed #693 for handling batch sampling internally -- so you would not have to do this manually. You can follow that issue for updates.
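Until that is available, the manual approach might look like the sketch below. It only shows the batching loop: `sample_batch` is a hypothetical callable standing in for whatever sampling call you use (for example, a wrapper around your fitted model's sampling method), and `fake_sampler` exists purely to make the example runnable.

```python
def sample_in_batches(sample_batch, total_rows, batch_size):
    """Yield batches from sample_batch(n) until total_rows have been requested.

    sample_batch is an assumed callable that takes a row count and
    returns that many rows; the last batch may be smaller.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    remaining = total_rows
    while remaining > 0:
        n = min(batch_size, remaining)
        yield sample_batch(n)
        remaining -= n


def fake_sampler(n):
    # Stand-in for a real sampler: returns n placeholder rows.
    return [{"value": i} for i in range(n)]


batches = list(sample_in_batches(fake_sampler, total_rows=10, batch_size=4))
batch_sizes = [len(batch) for batch in batches]
```

Because each batch is produced lazily, you can write it to disk or append it before requesting the next one, which keeps memory use bounded by the batch size.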
Let me know if that helps or if you have any follow ups!
Hi @npatki! Thank you very much for the reply. This is really helpful.
In the meantime, I overwrote the column after sampling and created a mapping for the foreign keys. I will also try the first option you suggested, and I will follow the issue you filed.
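For anyone trying the same workaround, here is a minimal sketch of the overwrite-and-remap idea. It is an illustration rather than the exact code used: the function `remap_keys` and the `user_id` column are hypothetical, rows are plain dicts, and keys are assumed to be integers.

```python
def remap_keys(parent_rows, child_rows, existing_keys, pk="user_id", fk="user_id"):
    """Give sampled parent rows fresh keys and update child foreign keys to match.

    existing_keys holds the primary keys already present in the original
    data. Returns (new_parents, new_children, mapping), where mapping goes
    from each sampled key to its replacement.
    """
    next_key = max(existing_keys, default=-1) + 1
    mapping = {}
    new_parents = []
    for row in parent_rows:
        mapping[row[pk]] = next_key
        new_parents.append({**row, pk: next_key})
        next_key += 1

    # Point every child row at the new key of its parent.
    new_children = [{**row, fk: mapping[row[fk]]} for row in child_rows]
    return new_parents, new_children, mapping


sampled_users = [{"user_id": 0, "age": 29}, {"user_id": 1, "age": 52}]
sampled_orders = [
    {"order_id": 0, "user_id": 1},
    {"order_id": 1, "user_id": 0},
    {"order_id": 2, "user_id": 1},
]

users, orders, mapping = remap_keys(sampled_users, sampled_orders, existing_keys={0, 1, 2})
```

The child table's own primary key (`order_id` here) would need the same treatment before appending it to the original child table.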