-
Notifications
You must be signed in to change notification settings - Fork 243
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
- implementing the spec change in #1238
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1238 +/- ##
===============================================
+ Coverage 69.323% 69.676% +0.353%
- Complexity 8105 8321 +216
===============================================
Files 543 547 +4
Lines 32480 33069 +589
Branches 5501 5685 +184
===============================================
+ Hits 22516 23041 +525
- Misses 7755 7803 +48
- Partials 2209 2225 +16
|
* to mean "same reference as RNAME field." | ||
*/ | ||
public static final String RESERVED_MRNM_SEQUENCE_NAME = "="; | ||
public static final String RESERVED_RNEXT_SEQUENCE_NAME = "="; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
deprecate this instead of just renaming it
@@ -70,10 +70,12 @@ | |||
SPECIES_TAG)); | |||
|
|||
// Split on any whitespace | |||
private static final Pattern SEQUENCE_NAME_SPLITTER = Pattern.compile("\\s"); | |||
private static final Pattern SEQUENCE_NAME_SPLITTER = Pattern.compile("[\\s]"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
delete this if unused
// These are the chars matched by \\s. | ||
private static final char[] WHITESPACE_CHARS = {' ', '\t', '\n', '\013', '\f', '\r'}; // \013 is vertical tab | ||
|
||
private static Pattern LEGAL_RNAME_PATTERN = Pattern.compile("[0-9A-Za-z!#$%&+./:;?@^_|~-][0-9A-Za-z!#$%&*+./:;=?@^_|~-]*"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make this final
@@ -85,8 +87,8 @@ public SAMSequenceRecord(final String name) { | |||
|
|||
public SAMSequenceRecord(final String name, final int sequenceLength) { | |||
if (name != null) { | |||
if (SEQUENCE_NAME_SPLITTER.matcher(name).find()) { | |||
throw new SAMException("Sequence name contains invalid character: " + name); | |||
if (!LEGAL_RNAME_PATTERN.matcher(name).useAnchoringBounds(true).matches()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move this into the validate method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SEQUENCE_NAME_SPLITTER
in both code and comments can be removed correct? I still see it elsewhere.
I notice that this enforces RNAME compliance with the regexp regardless of any declared @HD-VN. Did HTSJDK consider making this check dependent on the file's VN header and allowing such RNAMEs in existing files with HD-VN≤1.5 or so? There was discussion in samtools/hts-specs#333 (comment) that it seems important not to gratuitously reject existing data files (if any…) while allowing for diagnostics on newly-produced files. |
hmmm. this is a good point. will look into making is 1.5 and on only (not sure how....) |
@jmarshall, we have decided (in htsjdk) to implement a blanket rejection of non-compliant sam files. if this turns out to be an undue burden on our users, we will consider putting to work to make htsjdk more version-aware. right now we are working on the next version which should include more version-aware features and it should be easier to do such a thing there...I still think that we were right to keep the SPEC version specific and not declare all old SAM files as non-compliant. |
Fair enough. Please take a look at the text added in that new commit on samtools/hts-specs#333 too 😄 |
samtools/hts-specs#333