Add final dataset name to the processing_history table after bq-ingest runs #233

Open
theferrit32 opened this issue Oct 10, 2024 · 0 comments

Add a column to processing_history to store the name of the final BQ dataset a set of pipeline output files was loaded into.

For example, ingesting clinvar_vcv_2024_10_10_kyle_dev and clinvar_rcv_2024_10_10_kyle_dev, each with xml_release_date=2024-10-10, creates a dataset called clinvar_2024_10_10_kyle_dev. This name is included in the Slack message but is not persisted in the processing_history table. It can be inferred with the same logic bq-ingest uses to choose it, but persisting it in a column makes it easy to look up from queries.
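
For reference, the naming rule implied by that example could be sketched as below. This is only an illustration: `final_dataset_name` is a hypothetical helper, and the pattern (`clinvar_` + release date with underscores + `_` + pipeline version) is inferred from the single example above, not taken from the bq-ingest code.

```python
from datetime import date


def final_dataset_name(xml_release_date: date, pipeline_version: str) -> str:
    """Build the combined dataset name for one release.

    Assumption: the name is "clinvar_<YYYY_MM_DD>_<pipeline_version>",
    as inferred from the example in this issue. The real bq-ingest
    logic may differ.
    """
    date_part = xml_release_date.strftime("%Y_%m_%d")
    return f"clinvar_{date_part}_{pipeline_version}"


print(final_dataset_name(date(2024, 10, 10), "kyle_dev"))
# prints: clinvar_2024_10_10_kyle_dev
```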

Store it in both the vcv row and the rcv row.

This field can be set during the same UPDATE that runs to set the release_date field at the end of bq-ingest.

    UPDATE `clingen-dev.clinvar_kyle.processing_history`
    SET release_date = '2024-10-10',
        final_dataset = 'clinvar_2024_10_10_kyle_dev'
    WHERE file_type = 'rcv'
    AND pipeline_version = 'kyle_dev'
    AND xml_release_date = '2024-10-10'
    AND bucket_dir = 'clinvar_rcv_2024_10_10_kyle_dev'
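
Since the value should be stored in both rows, one way to produce the matching pair of statements is sketched below. `build_final_dataset_updates` is a hypothetical helper, and the `bucket_dir` pattern is assumed from the example names in this issue. The values are interpolated directly into the SQL text only to keep the sketch short; real code would more likely pass them as query parameters.

```python
def build_final_dataset_updates(table, release_date, pipeline_version, final_dataset):
    """Return one UPDATE statement per file type (vcv, then rcv).

    Assumption: bucket_dir follows
    "clinvar_<file_type>_<YYYY_MM_DD>_<pipeline_version>".
    """
    date_part = release_date.replace("-", "_")
    statements = []
    for file_type in ("vcv", "rcv"):
        bucket_dir = f"clinvar_{file_type}_{date_part}_{pipeline_version}"
        statements.append(
            f"UPDATE `{table}`\n"
            f"SET release_date = '{release_date}',\n"
            f"    final_dataset = '{final_dataset}'\n"
            f"WHERE file_type = '{file_type}'\n"
            f"AND pipeline_version = '{pipeline_version}'\n"
            f"AND xml_release_date = '{release_date}'\n"
            f"AND bucket_dir = '{bucket_dir}'"
        )
    return statements


for stmt in build_final_dataset_updates(
    "clingen-dev.clinvar_kyle.processing_history",
    "2024-10-10",
    "kyle_dev",
    "clinvar_2024_10_10_kyle_dev",
):
    print(stmt, end="\n\n")
```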