Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

- implementing the spec change in #1238

Merged
merged 3 commits into from
Dec 10, 2018

Conversation

yfarjoun
Copy link
Contributor

@yfarjoun yfarjoun commented Dec 5, 2018

@codecov-io
Copy link

codecov-io commented Dec 5, 2018

Codecov Report

Merging #1238 into master will increase coverage by 0.353%.
The diff coverage is 100%.

@@               Coverage Diff               @@
##              master     #1238       +/-   ##
===============================================
+ Coverage     69.323%   69.676%   +0.353%     
- Complexity      8105      8321      +216     
===============================================
  Files            543       547        +4     
  Lines          32480     33069      +589     
  Branches        5501      5685      +184     
===============================================
+ Hits           22516     23041      +525     
- Misses          7755      7803       +48     
- Partials        2209      2225       +16
Impacted Files Coverage Δ Complexity Δ
...c/main/java/htsjdk/samtools/SAMSequenceRecord.java 79.167% <100%> (-0.563%) 35 <2> (-1)
...samtools/cram/compression/ExternalCompression.java 79.661% <0%> (-8.91%) 17% <0%> (+6%)
...a/htsjdk/samtools/cram/build/ContainerFactory.java 95.385% <0%> (-2.441%) 38% <0%> (+18%)
...n/java/htsjdk/samtools/cram/structure/SliceIO.java 87.209% <0%> (-1.68%) 19% <0%> (+4%)
...s/cram/structure/block/BlockCompressionMethod.java 100% <0%> (ø) 4% <0%> (?)
...amtools/cram/structure/block/BlockContentType.java 100% <0%> (ø) 4% <0%> (?)
...va/htsjdk/samtools/cram/structure/block/Block.java 75% <0%> (ø) 22% <0%> (?)
...k/samtools/cram/structure/block/ExternalBlock.java 100% <0%> (ø) 2% <0%> (?)
src/main/java/htsjdk/samtools/SAMRecord.java 67.795% <0%> (+0.117%) 315% <0%> (+1%) ⬆️
src/main/java/htsjdk/samtools/SAMUtils.java 59.848% <0%> (+0.505%) 126% <0%> (+1%) ⬆️
... and 10 more

* to mean "same reference as RNAME field."
*/
public static final String RESERVED_MRNM_SEQUENCE_NAME = "=";
public static final String RESERVED_RNEXT_SEQUENCE_NAME = "=";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

deprecate this instead of just renaming it

@@ -70,10 +70,12 @@
SPECIES_TAG));

// Split on any whitespace
private static final Pattern SEQUENCE_NAME_SPLITTER = Pattern.compile("\\s");
private static final Pattern SEQUENCE_NAME_SPLITTER = Pattern.compile("[\\s]");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

delete this if unused

// These are the chars matched by \\s.
private static final char[] WHITESPACE_CHARS = {' ', '\t', '\n', '\013', '\f', '\r'}; // \013 is vertical tab

private static Pattern LEGAL_RNAME_PATTERN = Pattern.compile("[0-9A-Za-z!#$%&+./:;?@^_|~-][0-9A-Za-z!#$%&*+./:;=?@^_|~-]*");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make this final

@@ -85,8 +87,8 @@ public SAMSequenceRecord(final String name) {

public SAMSequenceRecord(final String name, final int sequenceLength) {
if (name != null) {
if (SEQUENCE_NAME_SPLITTER.matcher(name).find()) {
throw new SAMException("Sequence name contains invalid character: " + name);
if (!LEGAL_RNAME_PATTERN.matcher(name).useAnchoringBounds(true).matches()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move this into the validate method

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SEQUENCE_NAME_SPLITTER in both code and comments can be removed correct? I still see it elsewhere.

@lbergelson lbergelson merged commit 8cc1e37 into master Dec 10, 2018
@lbergelson lbergelson deleted the yf_tighten_sequence_name_requirements branch December 10, 2018 20:50
@jmarshall
Copy link
Member

I notice that this enforces RNAME compliance with the regexp regardless of any declared @HD-VN.

Did HTSJDK consider making this check dependent on the file's VN header and allowing such RNAMEs in existing files with HD-VN≤1.5 or so? There was discussion in samtools/hts-specs#333 (comment) that it seems important not to gratuitously reject existing data files (if any…) while allowing for diagnostics on newly-produced files.

@yfarjoun
Copy link
Contributor Author

hmmm. this is a good point. will look into making is 1.5 and on only (not sure how....)

@yfarjoun
Copy link
Contributor Author

@jmarshall, we have decided (in htsjdk) to implement a blanket rejection of non-compliant sam files. if this turns out to be an undue burden on our users, we will consider putting to work to make htsjdk more version-aware. right now we are working on the next version which should include more version-aware features and it should be easier to do such a thing there...I still think that we were right to keep the SPEC version specific and not declare all old SAM files as non-compliant.

@jmarshall
Copy link
Member

Fair enough. Please take a look at the text added in that new commit on samtools/hts-specs#333 too 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants