Problem Description
HMASynthesizer does not currently support null values in foreign key columns. Adding the ability to handle null values for foreign keys would expand the range of datasets HMA can model.
Expected behavior
HMASynthesizer is able to fit on data that contains null values in foreign key columns and the presence of nulls is reflected in the sampled data.
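One way to make "the presence of nulls is reflected in the sampled data" checkable is to compare the fraction of null foreign keys in the real and synthetic child tables. The sketch below is only an illustration: the helper names and the 0.05 tolerance are hypothetical choices, not part of SDV.

```python
import pandas as pd


def null_fraction(table: pd.DataFrame, column: str) -> float:
    # Share of rows whose foreign key is null (NaN or None)
    return table[column].isna().mean()


def nulls_preserved(real: pd.DataFrame, synthetic: pd.DataFrame,
                    column: str, tolerance: float = 0.05) -> bool:
    """Hypothetical check: synthetic null rate roughly matches the real one."""
    real_frac = null_fraction(real, column)
    synth_frac = null_fraction(synthetic, column)
    if real_frac == 0:
        # No nulls in the real data, so none should appear in the synthetic data
        return synth_frac == 0
    # Nulls must appear, at roughly the same rate as in the real data
    return synth_frac > 0 and abs(real_frac - synth_frac) <= tolerance
```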
Additional context
Changes to the fit process:
When generating extension columns for a child table, treat null as a valid foreign key value and calculate the extension row values for a null parent (the group of child rows whose foreign key is null). Store these null-parent extension values separately from the parent table, but keep them retrievable for sampling.
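The fit-time change above can be sketched with pandas by grouping child rows with `dropna=False`, so that null foreign keys form their own group, and then splitting that group off. This is a simplified illustration under stated assumptions: the function name is hypothetical, and instead of HMA's real extension (child-synthesizer parameters per parent) it only computes a child count and the mean of one numeric column.

```python
import pandas as pd


def extend_parent_with_null(parent, child, parent_key, foreign_key, value_column):
    """Hypothetical, simplified extension step that also handles a null parent.

    Each parent gets `num_rows` (child count) and `value_mean` (mean of one
    child column) as stand-ins for HMA's real extension values.
    """
    # dropna=False keeps child rows with a null foreign key as their own group
    grouped = child.groupby(foreign_key, dropna=False)[value_column]
    stats = grouped.agg(num_rows="size", value_mean="mean")

    # Separate the null-key group from the statistics of real parents
    null_mask = stats.index.isna()
    null_parent_extension = stats[null_mask]
    parent_stats = stats[~null_mask]

    # Attach extension columns to the parent table; childless parents get 0 rows
    extended = parent.merge(parent_stats, how="left",
                            left_on=parent_key, right_index=True)
    extended["num_rows"] = extended["num_rows"].fillna(0).astype(int)

    # Store the null parent separately so it never becomes a real parent row,
    # but return it so sampling can still retrieve it
    null_row = None
    if not null_parent_extension.empty:
        null_row = null_parent_extension.iloc[0].to_dict()
    return extended, null_row
```

For example, with two child rows whose `user_id` is null, `null_row` holds their count and mean while the parent table keeps only its real rows.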
Changes to sampling:
When creating a child table, HMA should reserve a share of the rows to be generated by the null parent's child synthesizer, sized according to the percentage of null foreign keys in the relationship being used to create the child table (perhaps this could be handled in _enforce_table_size).
When finding parent_ids for the other foreign key columns on a child table, treat the stored null-parent extension row as an additional parent candidate and create a corresponding synthesizer to get likelihoods from.