Transcriptomics Digestion and Fragmentation #801

Merged
merged 19 commits into from
Oct 15, 2024

@nbollis (Member) commented Sep 19, 2024

Implemented the derived classes of the Omics abstract classes and interfaces for RNA.
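This PR adds RNA digestion (`Rnase`, `RnaDigestionParams`, `NucleolyticOligo`). The sketch below is a rough illustration of nucleolytic digestion in general and is not the mzLib implementation. It cleaves 3' of every G, which is the known specificity of RNase T1, and it allows a configurable number of missed cleavages. The function `digest` and its parameters are hypothetical. The cleavage specificity is hard-coded as an assumption rather than read from an enzyme definition.

```python
def digest(sequence: str, cleave_after: str = "G",
           max_missed: int = 0, min_length: int = 1) -> list[str]:
    """Hypothetical RNase-style digestion: cleave 3' of each residue in
    `cleave_after` (RNase T1 cleaves after G), allowing missed cleavages."""
    # Split into fully cleaved fragments. A cleavage residue at the very
    # end of the sequence does not produce an extra empty fragment.
    fragments = []
    start = 0
    for i, base in enumerate(sequence):
        if base in cleave_after and i < len(sequence) - 1:
            fragments.append(sequence[start:i + 1])
            start = i + 1
    fragments.append(sequence[start:])

    # Join runs of up to max_missed + 1 adjacent fragments (missed cleavages).
    products = []
    for i in range(len(fragments)):
        for missed in range(max_missed + 1):
            if i + missed >= len(fragments):
                break
            oligo = "".join(fragments[i:i + missed + 1])
            if len(oligo) >= min_length:
                products.append(oligo)
    return products
```

For example, `digest("AUGCGUAG")` yields `["AUG", "CG", "UAG"]`. Allowing one missed cleavage adds `"AUGCG"` and `"CGUAG"`.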

codecov bot commented Sep 19, 2024

Codecov Report

Attention: Patch coverage is 92.78261% with 83 lines in your changes missing coverage. Please review.

Project coverage is 76.48%. Comparing base (983c3b0) to head (04f7e67).
Report is 1 commit behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...zLib/Transcriptomics/Digestion/OligoWithSetMods.cs | 87.78% | 20 Missing and 7 partials ⚠️ |
| mzLib/UsefulProteomicsDatabases/ProteinDbWriter.cs | 88.23% | 9 Missing and 3 partials ⚠️ |
| mzLib/Transcriptomics/NucleicAcid.cs | 92.08% | 6 Missing and 5 partials ⚠️ |
| ...ProteomicsDatabases/Transcriptomics/RnaDbLoader.cs | 93.71% | 3 Missing and 7 partials ⚠️ |
| mzLib/UsefulProteomicsDatabases/ProteinXmlEntry.cs | 79.54% | 6 Missing and 3 partials ⚠️ |
| mzLib/Transcriptomics/ClassExtensions.cs | 93.84% | 1 Missing and 3 partials ⚠️ |
| ...zLib/Transcriptomics/Digestion/NucleolyticOligo.cs | 96.39% | 1 Missing and 3 partials ⚠️ |
| ...roteolyticDigestion/PeptideWithSetModifications.cs | 0.00% | 3 Missing ⚠️ |
| ...micsDatabases/Transcriptomics/RnaDecoyGenerator.cs | 94.44% | 1 Missing and 1 partial ⚠️ |
| ...UsefulProteomicsDatabases/FastaHeaderFieldRegex.cs | 88.88% | 0 Missing and 1 partial ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #801      +/-   ##
==========================================
+ Coverage   75.52%   76.48%   +0.96%     
==========================================
  Files         202      211       +9     
  Lines       30945    31960    +1015     
  Branches     3129     3286     +157     
==========================================
+ Hits        23371    24446    +1075     
+ Misses       7040     6954      -86     
- Partials      534      560      +26     
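The headline coverage figures can be checked by hand. This sketch rests on two assumptions: that Codecov counts partial lines as uncovered, so coverage = hits / (hits + misses + partials), and that it rounds down to two decimals, which we understand to be its default. Under those assumptions it reproduces the 75.52% (master) and 76.48% (#801) figures in the diff above. `codecov_percent` is a hypothetical helper, not part of any Codecov tooling.

```python
def codecov_percent(hits: int, misses: int, partials: int) -> str:
    """Coverage as a percentage string, truncated (not rounded) to 2 decimals.
    Assumes partial lines count as uncovered."""
    lines = hits + misses + partials
    # Integer arithmetic avoids float rounding artifacts when truncating.
    basis_points = hits * 10000 // lines
    return f"{basis_points // 100}.{basis_points % 100:02d}"
```

Applied to the diff, `codecov_percent(23371, 7040, 534)` gives `"75.52"` and `codecov_percent(24446, 6954, 560)` gives `"76.48"`. The difference is the reported +0.96%.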
| Files with missing lines | Coverage Δ |
|---|---|
| mzLib/Chemistry/ClassExtensions.cs | 100.00% <100.00%> (ø) |
| mzLib/MzLibUtil/MzLibException.cs | 100.00% <100.00%> (ø) |
| .../Fragmentation/Oligo/DissociationTypeCollection.cs | 100.00% <100.00%> (+100.00%) ⬆️ |
| ...ragmentation/Oligo/TerminusSpecificProductTypes.cs | 100.00% <100.00%> (ø) |
| mzLib/Omics/IBioPolymerWithSetMods.cs | 95.23% <ø> (ø) |
| ...ib/Transcriptomics/Digestion/RnaDigestionParams.cs | 100.00% <100.00%> (ø) |
| mzLib/Transcriptomics/Digestion/Rnase.cs | 100.00% <100.00%> (ø) |
| mzLib/Transcriptomics/RNA.cs | 100.00% <100.00%> (ø) |
| mzLib/UsefulProteomicsDatabases/ProteinDbLoader.cs | 95.40% <ø> (ø) |
| ...UsefulProteomicsDatabases/FastaHeaderFieldRegex.cs | 90.00% <88.88%> (-0.91%) ⬇️ |

... and 9 more

... and 4 files with indirect coverage changes

@Alexander-Sol (Contributor) left a comment

Mostly minor quibbles, but a couple of things were confusing enough that I don't want to approve just yet:

  1. NucleicAcid.Equals() doesn't check the sequence
  2. NucleicAcid.ParseSequence() does several things that I wouldn't expect based on its name.
  3. OligoWithSetMods termini setters modify multiple different fields.
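Point 1 concerns equality semantics. The sketch below shows an equality check that includes the sequence, so two records with the same identifier but different sequences are not treated as equal. `NucleicAcidSketch` and its fields (`accession`, `sequence`) are hypothetical stand-ins, not the mzLib `NucleicAcid` class. Whatever `NucleicAcid.Equals()` actually compares today is not shown here.

```python
class NucleicAcidSketch:
    """Hypothetical nucleic acid record; not the mzLib class."""

    def __init__(self, accession: str, sequence: str):
        self.accession = accession
        self.sequence = sequence

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, NucleicAcidSketch):
            return NotImplemented
        # Compare the sequence as well as the identifier.
        return self.accession == other.accession and self.sequence == other.sequence

    def __hash__(self) -> int:
        # Equal objects must hash equally, so hash the same fields __eq__ uses.
        return hash((self.accession, self.sequence))
```

The same rule applies in C#: an `Equals` override should be paired with a `GetHashCode` override over the same fields. Otherwise hash-based collections can treat equal objects as distinct.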

mzLib/Transcriptomics/NucleicAcid.cs (resolved)
mzLib/Transcriptomics/NucleicAcid.cs (outdated, resolved)
mzLib/Transcriptomics/NucleicAcid.cs (outdated, resolved)
mzLib/Transcriptomics/NucleicAcid.cs (outdated, resolved)
mzLib/Transcriptomics/Digestion/NucleolyticOligo.cs (outdated, resolved)
Nic Bollis added 5 commits September 24, 2024 14:30
Introduced support for handling RNA data within the UsefulProteomicsDatabases project. Key changes include:

- Added `Transcriptomics\TestData` folder to `Test.csproj`.
- Changed access modifiers in `ProteinDbLoader.cs` to internal.
- Added `using` directives for `Transcriptomics` in `ProteinXmlEntry.cs`.
- Introduced methods `ParseRnaEndElement` and `ParseRnaEntryEndElement` in `ProteinXmlEntry.cs`.
- Modified `ParseAnnotatedMods` to check for RNA modifications.
- Added project reference to `Transcriptomics.csproj` in `UsefulProteomicsDatabases.csproj`.
- Added `ClassExtensions.cs` with `CreateNew` method for nucleic acids.
- Added `RnaDbLoader.cs` for RNA database loading.
- Added `RnaDecoyGenerator.cs` for generating decoy RNA sequences.
- Updated `using` directives in `TestDigestion.cs` and `OligoWithSetMods.cs` to include necessary namespaces.
- Added assertions in `TestDigestion.cs` for `SequenceWithChemicalFormulas` and `FullSequenceWithMassShift`.
- Changed the `namespace` in `OligoWithSetMods.cs` to `Transcriptomics.Digestion`.
- Implemented and cached the `SequenceWithChemicalFormulas` property in `OligoWithSetMods.cs`.
- Added new files `ModomicsUnmodifiedTrimmed.fasta` and `ModomicsUnmodifiedTrimmed.fasta.gz` to `Test.csproj` with `CopyToOutputDirectory` set to `PreserveNewest`.
- Removed the `Transcriptomics\TestData` folder from `Test.csproj`.
- Introduced `Transcribe` method in `ClassExtensions.cs` for DNA to RNA transcription.
- Added summary comment to `NucleolyticOligo` class in `NucleolyticOligo.cs`.
- Added `ApplyRegex` method in `FastaHeaderFieldRegex.cs`.
- Introduced `ProteinDbWriter` class in `ProteinDbWriter.cs` for writing protein and nucleic acid databases.
- Modified `GetModsForThisProtein` to `GetModsForThisBioPolymer` in `ProteinDbWriter.cs`.
- Added `RnaDbLoader` class in `RnaDbLoader.cs` for RNA FASTA header detection and sequence loading.
- Updated user dictionary in `mzLib.sln.DotSettings` with new terms.
- Added test cases in `TestDbLoader.cs` for RNA database loading and header detection.
- Introduced `TestDecoyGeneration` class in `TestDecoyGenerator.cs` for RNA decoy generation tests.
- Added RNA sequence file `ModomicsUnmodifiedTrimmed.fasta` and its compressed version.
- Added `using` directives for `Transcriptomics.Digestion` and `UsefulProteomicsDatabases.Transcriptomics` in `TestDecoyGenerator.cs`.
- Introduced `TestCreateNew` in `TestDecoyGenerator.cs` to verify RNA and oligo creation.
- Added `using` directive for `MzLibUtil` in `TestDigestion.cs`.
- Added a test in `TestDigestion.cs` for exception handling with invalid sequences.
- Added `using` directives for `Omics` and related namespaces in `TestFragmentation.cs`.
- Modified `TestFragmentation_Modified` in `TestFragmentation.cs` to use `OligoWithSetMods` directly and added assertions.
- Updated `ClassExtensions.cs` to allow setting `isDecoy` in new `RNA` objects.
- Refactored `OligoWithSetMods.cs` to return a dictionary from `GetModsAfterDeserialization`.
- Updated `OligoWithSetMods.cs` to initialize `_allModsOneIsNterminus` using the returned dictionary.
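The commits above introduce a `Transcribe` method for DNA-to-RNA transcription. Its actual signature and behavior are not shown in this PR page. As a minimal sketch of the biology only: coding-strand DNA is transcribed by replacing T with U. Template-strand DNA is first reverse-complemented to recover the coding strand (5'→3'). The function name, the `is_coding_strand` flag, and the restriction to uppercase A/C/G/T input are all assumptions.

```python
# Watson-Crick complements for DNA bases (uppercase only; an assumption).
DNA_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def transcribe(dna: str, is_coding_strand: bool = True) -> str:
    """Hypothetical DNA -> RNA transcription; not the mzLib method."""
    if not is_coding_strand:
        # The template strand pairs antiparallel with the coding strand,
        # so reverse-complement it to get the coding sequence 5'->3'.
        dna = "".join(DNA_COMPLEMENT[base] for base in reversed(dna))
    # RNA carries uracil where the coding strand carries thymine.
    return dna.replace("T", "U")
```

For example, `transcribe("ATGC")` and `transcribe("GCAT", is_coding_strand=False)` both return `"AUGC"`. Any base outside A/C/G/T on the template path raises a `KeyError`.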
@trishorts (Contributor) left a comment

Looking good. All of my previous comments are resolved except one. I added comments on 2 or 3 methods that could/should be unit tested. I don't see anything problematic.

Refactored constructors, improved exception handling, and added comprehensive tests across multiple files. Key changes include:

- `MzLibException.cs`: Updated constructor to include `innerException`.
- `TestDecoyGenerator.cs`: Added assertions for `CreateNew` method.
- `TestDigestion.cs`: Added assertions and new test for RNA digestion exception.
- Refactored modification lists and added various tests for modifications.
- `TestNucleicAcid.cs`: Refactored methods, adjusted precision, and updated terminus assignments.
- `NucleolyticOligo.cs`: Changed parameter types, updated comments, and improved variable names.
- `OligoWithSetMods.cs`: Enhanced exception messages and updated modification location checks.
- `NucleicAcid.cs`: Added `using` directive, changed exception type, and refactored methods.
- `mzLib.sln.DotSettings`: Updated user dictionary entries.
- Added new test data files (`20mer1.fasta`, `20mer1.fasta.gz`, `20mer1.xml`, `20mer1.xml.gz`) to the `Transcriptomics\TestData` directory in `Test.csproj`, ensuring they are copied to the output directory.
- Introduced `TestDbReadingDifferentExtensions` in `TestDbLoader.cs` to verify RNA database reading from various formats.
- Added `TestDigestionMaxIsoforms` in `TestDigestion.cs` to test RNA sequence digestion with max isoforms.
- Updated `WriteNucleicAcidXmlDatabase` in `ProteinDbWriter.cs` with remarks for future implementation.
- Added a TODO in `RnaDecoyGenerator.cs` regarding the impact of palindromic sequences on fragment ions.
- Included new RNA sequence data in test files for validation.
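The TODO about palindromic sequences can be illustrated with a reversal-based decoy. This sketch assumes the decoy is simply the reversed target, which may differ from what `RnaDecoyGenerator` actually does. A palindromic target reverses to itself, so its "decoy" is identical to the target and yields identical fragment ions. The sketch only flags that case and leaves the remedy open, as the TODO does. `reverse_decoy` is a hypothetical name.

```python
def reverse_decoy(target: str) -> tuple[str, bool]:
    """Hypothetical reversal decoy. Returns (decoy, is_palindrome)."""
    decoy = target[::-1]
    # If the sequence reads the same reversed, the decoy is indistinguishable
    # from the target and would produce the same fragment ions.
    return decoy, decoy == target
```

For example, `reverse_decoy("AUGC")` returns `("CGUA", False)`, while `reverse_decoy("GAUUAG")` returns `("GAUUAG", True)`.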
@Alexander-Sol previously approved these changes Oct 3, 2024

@Alexander-Sol (Contributor) left a comment

Looks good; all comments have been addressed.

@elaboy self-requested a review October 15, 2024 21:30
@nbollis merged commit 6c18e9f into smith-chem-wisc:master Oct 15, 2024
3 checks passed
Projects
None yet

4 participants