Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Output VRS_Error INFO field for VCF sites where VRS computation fails #301

Merged
merged 8 commits into from
Dec 21, 2023

Conversation

ehclark
Copy link
Contributor

@ehclark ehclark commented Dec 20, 2023

Update the implementation that annotates a VCF with VRS IDs such that when an error occurs, a VRS_Error INFO field is added to the site instead of the VRS INFO simply being left blank.

The existing implementation caught and suppressed many exceptions at the translation step. These exceptions are still caught and logged, but then re-thrown to propagate to the top-level VCF site iterator.

With the new behavior, every VCF site will be modified according to the following:

  • If an error occurs processing the site, a VRS_Error INFO field will be added with the relevant error message
  • Otherwise a VRS_Allele_IDs INFO field will be added with the VRS ID for the alleles supported by the available translators and blank for those alleles not supported

Percent, newline, tab, semicolon and comma characters in the error messages are escaped with URL (percent) encoding according to the VCF spec.

The unit tests for VCF annotation have been split out into individual tests to make modifications to the VCR cassette fixtures easier to make and debug. Additional VCR configuration and assertions have been added to enforce a requirement that the cassettes must match exactly what is expected.

See also #297

Add try/except to top-level VCF record iterator
Add error message generation to except block
Re-raise errors in _get_vrs_object()
Separated out each test to make fixing errors easier
Tightened passing criteria by asserting "all played" on the cassette
@ehclark ehclark requested review from a team as code owners December 20, 2023 14:35
Copy link
Contributor

@larrybabb larrybabb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@larrybabb
Copy link
Contributor

@korikuzma let me know if you have time to review and merge or if I should go ahead and do it.

@korikuzma
Copy link
Contributor

@larrybabb I'll take a look now.

Copy link
Contributor

@korikuzma korikuzma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything looks good to me. I made a small comment that can be accepted or rejected

Comment on lines 140 to 149
VCF_ESCAPE_MAP = {
"%": "%25",
":": "%3A",
";": "%3B",
"=": "%3D",
",": "%2C",
"\r": "%0D",
"\n": "%0A",
"\t": "%09"
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:, =, and \r are not used. Can we remove them? If so, then instead of

                for search in [ "%", ",", ";", "\t", "\n" ]:
                     err_msg = err_msg.replace(search, self.VCF_ESCAPE_MAP[search])

we can do

    VCF_ESCAPE_MAP = {
        "%": "%25",
        ";": "%3B",
        ",": "%2C",
        "\n": "%0A",
        "\t": "%09" 
    }
    VCF_ESCAPE_MAP_KEYS = VCF_ESCAPE_MAP.keys()
...
                for search in VCF_ESCAPE_MAP_KEYS:
                     err_msg = err_msg.replace(search, self.VCF_ESCAPE_MAP[search])

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ehclark do you agree with this feedback? If so, please submit the change if not, please explain why and I'll do the merge.

@korikuzma
Copy link
Contributor

@larrybabb You're good to merge

@larrybabb larrybabb merged commit 843422e into 2-alpha Dec 21, 2023
8 checks passed
@korikuzma korikuzma deleted the feature/vcf-vrs-error-field branch January 9, 2024 12:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

VCFAnnotator adds an error message in an INFO field for sites where VRS ID computation fails
3 participants