Add final dataset name to the processing_history table after bq-ingest runs #233

theferrit32 · 2024-10-10T17:00:56Z

Add a column to processing_history to store the name of the final BQ dataset a set of pipeline output files was loaded into.

For example, ingesting clinvar_vcv_2024_10_10_kyle_dev and clinvar_rcv_2024_10_10_kyle_dev, each with xml_release_date=2024-10-10, creates a dataset called clinvar_2024_10_10_kyle_dev. This name is included in the Slack message but is not persisted in the processing_history table. It can be inferred with the same logic bq-ingest uses to choose it, but persisting it in a column makes it easy to look up from queries.
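As a rough illustration of the inference mentioned above, the naming pattern visible in this example could be sketched as follows. The function name `final_dataset_name` is hypothetical, and the format (`clinvar_`, the release date with underscores, then the pipeline version) is assumed from the single example in this issue, not taken from the bq-ingest code.

```python
from datetime import date


def final_dataset_name(xml_release_date: date, pipeline_version: str) -> str:
    """Hypothetical helper: build a dataset name such as clinvar_2024_10_10_kyle_dev.

    Assumes the pattern clinvar_<YYYY_MM_DD>_<pipeline_version> seen in the issue's example.
    """
    date_part = xml_release_date.strftime("%Y_%m_%d")
    return f"clinvar_{date_part}_{pipeline_version}"


# Values taken from the example in this issue.
name = final_dataset_name(date(2024, 10, 10), "kyle_dev")
print(name)
```

Persisting the value in a column means queries do not need to repeat this logic, and do not silently break if the naming scheme changes later.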

Store it in both the vcv and the rcv rows.

This field can be set during the same UPDATE that runs to set the release_date field at the end of bq-ingest.

    UPDATE clingen-dev.clinvar_kyle.processing_history
    SET release_date = '2024-10-10',
        final_dataset = 'clinvar_2024_10_10_kyle_dev'
    WHERE file_type = 'rcv'
    AND pipeline_version = 'kyle_dev'
    AND xml_release_date = '2024-10-10'
    AND bucket_dir = 'clinvar_rcv_2024_10_10_kyle_dev'
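Since the value should be stored in both the vcv and the rcv rows, one possible way to generate the two statements is sketched below. This is only a sketch under assumptions: `build_final_dataset_updates` is a hypothetical name, the bucket_dir pattern is inferred from the example above, and the `@name` placeholders assume the statements are run as parameterized queries with parameter types matching the table's column types, which this issue does not state.

```python
def build_final_dataset_updates(table, release_date, xml_release_date,
                                pipeline_version, final_dataset):
    """Hypothetical helper: return one (sql, params) pair per file type."""
    statements = []
    for file_type in ("vcv", "rcv"):
        # Assumed pattern: clinvar_<file_type>_<YYYY_MM_DD>_<pipeline_version>
        date_part = xml_release_date.replace("-", "_")
        bucket_dir = f"clinvar_{file_type}_{date_part}_{pipeline_version}"
        sql = (
            f"UPDATE `{table}`\n"
            "SET release_date = @release_date,\n"
            "    final_dataset = @final_dataset\n"
            "WHERE file_type = @file_type\n"
            "AND pipeline_version = @pipeline_version\n"
            "AND xml_release_date = @xml_release_date\n"
            "AND bucket_dir = @bucket_dir"
        )
        params = {
            "release_date": release_date,
            "final_dataset": final_dataset,
            "file_type": file_type,
            "pipeline_version": pipeline_version,
            "xml_release_date": xml_release_date,
            "bucket_dir": bucket_dir,
        }
        statements.append((sql, params))
    return statements


# Values taken from the example in this issue.
updates = build_final_dataset_updates(
    table="clingen-dev.clinvar_kyle.processing_history",
    release_date="2024-10-10",
    xml_release_date="2024-10-10",
    pipeline_version="kyle_dev",
    final_dataset="clinvar_2024_10_10_kyle_dev",
)
for sql, params in updates:
    print(sql)
    print(params)
```

Keeping the values as parameters rather than interpolating them into the SQL text avoids quoting mistakes such as an unterminated string literal.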

theferrit32 self-assigned this Oct 15, 2024

theferrit32 added a commit that referenced this issue Nov 26, 2024

Close #233, add final dataset to processing_history

4e033c3
