Support nullable foreign keys in HMA #2063

Closed
rwedge opened this issue Jun 13, 2024 · 0 comments
Labels
feature request Request for a new feature

rwedge commented Jun 13, 2024

Problem Description

HMASynthesizer does not currently support null values in foreign key columns. Adding the ability to handle null values for foreign keys would expand the range of datasets HMA can model.

Expected behavior

HMASynthesizer can fit on data that contains null values in foreign key columns, and the sampled data reflects the presence of those nulls.
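
One way to make "reflected in the sampled data" checkable is to compare the share of null foreign keys in the real and sampled child tables. The sketch below is only an illustration in plain Python: the `null_fk_rate` helper, the `sessions` table, and the `user_id` column are hypothetical names, and nulls are represented as `None`.

```python
def null_fk_rate(rows, fk_column):
    """Return the fraction of rows whose foreign key is null (None)."""
    if not rows:
        return 0.0
    null_count = sum(1 for row in rows if row[fk_column] is None)
    return null_count / len(rows)


# Toy child table: two of the five sessions have no parent user.
real_sessions = [
    {"session_id": 0, "user_id": "u1"},
    {"session_id": 1, "user_id": None},
    {"session_id": 2, "user_id": "u2"},
    {"session_id": 3, "user_id": None},
    {"session_id": 4, "user_id": "u1"},
]

real_rate = null_fk_rate(real_sessions, "user_id")
```

The same helper could be applied to the sampled child table, with the expectation that the two rates are close rather than identical.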

Additional context

Changes to the fit process:
When generating extension columns for a child table, treat null as a valid foreign key value and calculate the extension row values for a null parent. Store these null parent extension values separately from the parent table but still have them retrievable for sampling.
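
A minimal sketch of this grouping idea, assuming nulls are represented as `None`. It is not the HMA implementation: the function name `compute_extensions` is hypothetical, and the "extension values" here are reduced to a row count and a mean, as stand-ins for whatever parameters the child synthesizer would actually produce.

```python
from collections import defaultdict
from statistics import mean

NULL_PARENT = None  # key used for child rows that have no parent


def compute_extensions(child_rows, fk_column, value_column):
    """Group child rows by foreign key, treating null as a valid key.

    Returns (parent_extensions, null_extension). The first maps each real
    parent id to its extension values; the second holds the values for the
    null parent, kept apart from the parent table (None if there are no
    null foreign keys).
    """
    groups = defaultdict(list)
    for row in child_rows:
        groups[row[fk_column]].append(row[value_column])

    def summarize(values):
        # Stand-in for the real extension values (an assumption).
        return {"num_rows": len(values), "mean": mean(values)}

    parent_extensions = {
        key: summarize(values)
        for key, values in groups.items()
        if key is not NULL_PARENT
    }
    null_extension = None
    if NULL_PARENT in groups:
        null_extension = summarize(groups[NULL_PARENT])
    return parent_extensions, null_extension


child_rows = [
    {"user_id": "u1", "amount": 10},
    {"user_id": "u1", "amount": 30},
    {"user_id": None, "amount": 5},
    {"user_id": None, "amount": 15},
    {"user_id": "u2", "amount": 7},
]
parent_extensions, null_extension = compute_extensions(
    child_rows, "user_id", "amount"
)
```

Returning the null parent's values as a separate object keeps them out of the parent table while leaving them available at sampling time.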

Changes to sampling:
When creating a child table, HMA should reserve some rows to be generated by the null parent's child synthesizer. The number of reserved rows should be based on the percentage of null foreign keys in the relationship being used to create the child table (perhaps this could be handled in _enforce_table_size).
When finding parent_ids for the other foreign key columns on a child table, treat the stored null parent extension row as another parent candidate and create a corresponding synthesizer from which to get likelihoods.
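
The two sampling changes above can be sketched as follows. This is an illustration under stated assumptions, not SDV code: `split_child_rows` and `pick_parent` are hypothetical helpers, the null parent is represented by the key `None`, and the likelihoods are taken as given rather than computed from a synthesizer.

```python
import random


def split_child_rows(total_rows, null_fk_fraction):
    """Split a child table's row budget between real parents and the null parent.

    Returns (rows_for_real_parents, rows_for_null_parent).
    """
    num_null = round(total_rows * null_fk_fraction)
    return total_rows - num_null, num_null


def pick_parent(likelihoods, rng):
    """Pick one parent candidate, with None standing for the null parent.

    `likelihoods` maps each candidate to a non-negative weight. If every
    weight is zero, fall back to a uniform choice.
    """
    candidates = list(likelihoods)
    weights = [likelihoods[candidate] for candidate in candidates]
    if sum(weights) <= 0:
        return rng.choice(candidates)
    return rng.choices(candidates, weights=weights, k=1)[0]


# 40% of the foreign keys were null in the real data.
rows_for_parents, rows_for_null = split_child_rows(10, 0.4)

# The null parent competes with real parents for a second foreign key.
rng = random.Random(0)
chosen = pick_parent({"p1": 0.2, "p2": 0.3, None: 0.5}, rng)
```

A child row whose chosen parent is `None` would get a null value in that foreign key column.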
