Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Transcriptomics Digestion and Fragmentation (#801)
* Added in base classes
* Implemented all tests
* Made initial tests pass
* Removed unnecessary namespaces
* Expanded test coverage
* Responded to Alex's comments
* Add RNA support: loading, parsing, and decoy generation

  Introduced support for handling RNA data within the UsefulProteomicsDatabases project. Key changes include:
  - Added `Transcriptomics\TestData` folder to `Test.csproj`.
  - Changed access modifiers in `ProteinDbLoader.cs` to internal.
  - Added `using` directives for `Transcriptomics` in `ProteinXmlEntry.cs`.
  - Introduced methods `ParseRnaEndElement` and `ParseRnaEntryEndElement` in `ProteinXmlEntry.cs`.
  - Modified `ParseAnnotatedMods` to check for RNA modifications.
  - Added project reference to `Transcriptomics.csproj` in `UsefulProteomicsDatabases.csproj`.
  - Added `ClassExtensions.cs` with `CreateNew` method for nucleic acids.
  - Added `RnaDbLoader.cs` for RNA database loading.
  - Added `RnaDecoyGenerator.cs` for generating decoy RNA sequences.
* Add new properties and caching to oligo digestion

  Updated `using` directives in `TestDigestion.cs` and `OligoWithSetMods.cs` to include necessary namespaces. Added assertions in `TestDigestion.cs` for `SequenceWithChemicalFormulas` and `FullSequenceWithMassShift`. Changed `namespace` in `OligoWithSetMods.cs` to `Transcriptomics.Digestion`. Implemented and cached `SequenceWithChemicalFormulas` property in `OligoWithSetMods.cs`.
* Add RNA sequence and database handling and related test cases
  - Added new files `ModomicsUnmodifiedTrimmed.fasta` and `ModomicsUnmodifiedTrimmed.fasta.gz` to `Test.csproj` with `CopyToOutputDirectory` set to `PreserveNewest`.
  - Removed the `Transcriptomics\TestData` folder from `Test.csproj`.
  - Introduced `Transcribe` method in `ClassExtensions.cs` for DNA to RNA transcription.
  - Added summary comment to `NucleolyticOligo` class in `NucleolyticOligo.cs`.
  - Added `ApplyRegex` method in `FastaHeaderFieldRegex.cs`.
  - Introduced `ProteinDbWriter` class in `ProteinDbWriter.cs` for writing protein and nucleic acid databases.
  - Changed `GetModsForThisProtein` to `GetModsForThisBioPolymer` in `ProteinDbWriter.cs`.
  - Added `RnaDbLoader` class in `RnaDbLoader.cs` for RNA FASTA header detection and sequence loading.
  - Updated user dictionary in `mzLib.sln.DotSettings` with new terms.
  - Added test cases in `TestDbLoader.cs` for RNA database loading and header detection.
  - Introduced `TestDecoyGeneration` class in `TestDecoyGenerator.cs` for RNA decoy generation tests.
  - Added RNA sequence file `ModomicsUnmodifiedTrimmed.fasta` and its compressed version.
* Refactor and enhance RNA and oligo handling in tests
  - Added `using` directives for `Transcriptomics.Digestion` and `UsefulProteomicsDatabases.Transcriptomics` in `TestDecoyGenerator.cs`.
  - Introduced `TestCreateNew` in `TestDecoyGenerator.cs` to verify RNA and oligo creation.
  - Added `using` directive for `MzLibUtil` in `TestDigestion.cs`.
  - Added a test in `TestDigestion.cs` for exception handling with invalid sequences.
  - Added `using` directives for `Omics` and related namespaces in `TestFragmentation.cs`.
  - Modified `TestFragmentation_Modified` in `TestFragmentation.cs` to use `OligoWithSetMods` directly and added assertions.
  - Updated `ClassExtensions.cs` to allow setting `isDecoy` in new `RNA` objects.
  - Refactored `OligoWithSetMods.cs` to return a dictionary from `GetModsAfterDeserialization`.
  - Updated `OligoWithSetMods.cs` to initialize `_allModsOneIsNterminus` using the returned dictionary.
* Broke out TerminusSpecificProductTypes class and removed unnecessary namespaces
* Update ProteinXmlEntry.cs
* Added gene name to RNA constructor
* Refactor and enhance exception handling and tests

  Refactored constructors, improved exception handling, and added comprehensive tests across multiple files. Key changes include:
  - `MzLibException.cs`: Updated constructor to include `innerException`.
  - `TestDecoyGenerator.cs`: Added assertions for `CreateNew` method.
  - `TestDigestion.cs`: Added assertions and new test for RNA digestion exception.
  - Refactored modification lists and added various tests for modifications.
  - `TestNucleicAcid.cs`: Refactored methods, adjusted precision, and updated terminus assignments.
  - `NucleolyticOligo.cs`: Changed parameter types, updated comments, and improved variable names.
  - `OligoWithSetMods.cs`: Enhanced exception messages and updated modification location checks.
  - `NucleicAcid.cs`: Added `using` directive, changed exception type, and refactored methods.
  - `mzLib.sln.DotSettings`: Updated user dictionary entries.
* Add test data files and methods for RNA sequence handling

  Added new test data files (`20mer1.fasta`, `20mer1.fasta.gz`, `20mer1.xml`, `20mer1.xml.gz`) to the `Transcriptomics\TestData` directory in the `Test.csproj` file, ensuring they are copied to the output directory. Introduced `TestDbReadingDifferentExtensions` in `TestDbLoader.cs` to verify RNA database reading from various formats. Added `TestDigestionMaxIsoforms` in `TestDigestion.cs` to test RNA sequence digestion with max isoforms. Updated `WriteNucleicAcidXmlDatabase` in `ProteinDbWriter.cs` with remarks for future implementation. Added a TODO in `RnaDecoyGenerator.cs` regarding the impact of palindromic sequences on fragment ions. Included new RNA sequence data in test files for validation.
* Added test coverage to the localize method within BioPolymerWithSetMods

---------

Co-authored-by: Nic Bollis <nbollis@wisc.edu>
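The commit notes a TODO in `RnaDecoyGenerator.cs` about palindromic sequences and fragment ions. The generator's actual strategy is not shown in this diff; the sketch below assumes a plain reversal decoy purely to illustrate the concern. `ReverseDecoySketch`, `ReverseDecoy`, and `IsPalindrome` are hypothetical names, and "palindrome" here means a residue string that reads the same in both directions.

```csharp
using System;

// Hypothetical sketch, NOT mzLib's RnaDecoyGenerator: a plain reversal decoy
// plus a check for the palindrome case mentioned in the commit's TODO.
public static class ReverseDecoySketch
{
    // Reverses the 5'->3' residue string to form a decoy sequence.
    public static string ReverseDecoy(string sequence)
    {
        char[] residues = sequence.ToCharArray();
        Array.Reverse(residues);
        return new string(residues);
    }

    // A sequence that reads the same in both directions reverses to itself,
    // so its reversal decoy (and every fragment ion of that decoy) equals the target.
    public static bool IsPalindrome(string sequence) =>
        ReverseDecoy(sequence) == sequence;
}
```

For example, `ReverseDecoySketch.ReverseDecoy("GUACUGCCUCUAGUGAAGCA")` (the 20mer1 test sequence) gives `"ACGAAGUGAUCUCCGUCAUG"`, whereas `IsPalindrome("GAUUAG")` is `true`, so under this assumed strategy that target would have no distinguishable decoy.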
Showing 40 changed files with 4,678 additions and 62 deletions.
@@ -1,19 +1,12 @@

```
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace Omics.Fragmentation
namespace Omics.Fragmentation
{
    public enum FragmentationTerminus
    {
        Both, //N- and C-terminus
        N, //N-terminus only
        C, //C-terminus only
    {
        Both, //N- and C-terminus
        N, //N-terminus only
        C, //C-terminus only
        None, //used for internal fragments, could be used for top down intact mass?
        FivePrime, // 5' for NucleicAcids
        ThreePrime, // 3' for NucleicAcids
    }

}
}
```
`mzLib/Omics/Fragmentation/Oligo/DissociationTypeCollection.cs`: 162 changes (161 additions, 1 deletion). Large diffs are not rendered by default.
`mzLib/Omics/Fragmentation/Oligo/TerminusSpecificProductTypes.cs`: 141 changes (141 additions, 0 deletions).
@@ -0,0 +1,141 @@

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace Omics.Fragmentation.Oligo
{
    public static class TerminusSpecificProductTypes
    {
        public static List<ProductType> GetRnaTerminusSpecificProductTypes(
            this FragmentationTerminus fragmentationTerminus)
        {
            return ProductIonTypesFromSpecifiedTerminus[fragmentationTerminus];
        }

        /// <summary>
        /// The types of ions that can be generated from an oligo fragment, based on the terminus of the fragment
        /// </summary>
        public static Dictionary<FragmentationTerminus, List<ProductType>> ProductIonTypesFromSpecifiedTerminus = new Dictionary<FragmentationTerminus, List<ProductType>>
        {
            {
                FragmentationTerminus.FivePrime, new List<ProductType>
                {
                    ProductType.a, ProductType.aWaterLoss, ProductType.aBaseLoss,
                    ProductType.b, ProductType.bWaterLoss, ProductType.bBaseLoss,
                    ProductType.c, ProductType.cWaterLoss, ProductType.cBaseLoss,
                    ProductType.d, ProductType.dWaterLoss, ProductType.dBaseLoss,
                }
            },
            {
                FragmentationTerminus.ThreePrime, new List<ProductType>
                {
                    ProductType.w, ProductType.wWaterLoss, ProductType.wBaseLoss,
                    ProductType.x, ProductType.xWaterLoss, ProductType.xBaseLoss,
                    ProductType.y, ProductType.yWaterLoss, ProductType.yBaseLoss,
                    ProductType.z, ProductType.zWaterLoss, ProductType.zBaseLoss,
                }
            },
            {
                FragmentationTerminus.Both, new List<ProductType>
                {

                    ProductType.a, ProductType.aWaterLoss, ProductType.aBaseLoss,
                    ProductType.b, ProductType.bWaterLoss, ProductType.bBaseLoss,
                    ProductType.c, ProductType.cWaterLoss, ProductType.cBaseLoss,
                    ProductType.d, ProductType.dWaterLoss, ProductType.dBaseLoss,
                    ProductType.w, ProductType.wWaterLoss, ProductType.wBaseLoss,
                    ProductType.x, ProductType.xWaterLoss, ProductType.xBaseLoss,
                    ProductType.y, ProductType.yWaterLoss, ProductType.yBaseLoss,
                    ProductType.z, ProductType.zWaterLoss, ProductType.zBaseLoss,
                    ProductType.M
                }

            },
            {
                FragmentationTerminus.None, new List<ProductType>()
            }
        };


        public static FragmentationTerminus GetRnaTerminusType(this ProductType fragmentType)
        {
            switch (fragmentType)
            {
                case ProductType.a:
                case ProductType.aWaterLoss:
                case ProductType.aBaseLoss:
                case ProductType.b:
                case ProductType.bWaterLoss:
                case ProductType.bBaseLoss:
                case ProductType.c:
                case ProductType.cWaterLoss:
                case ProductType.cBaseLoss:
                case ProductType.d:
                case ProductType.dWaterLoss:
                case ProductType.dBaseLoss:
                case ProductType.w:
                case ProductType.wWaterLoss:
                case ProductType.wBaseLoss:
                case ProductType.x:
                case ProductType.xWaterLoss:
                case ProductType.xBaseLoss:
                case ProductType.y:
                case ProductType.yWaterLoss:
                case ProductType.yBaseLoss:
                case ProductType.z:
                case ProductType.zWaterLoss:
                case ProductType.zBaseLoss:
                case ProductType.M:
                    return ProductTypeToFragmentationTerminus[fragmentType];

                case ProductType.aStar:
                case ProductType.aDegree:
                case ProductType.bAmmoniaLoss:
                case ProductType.yAmmoniaLoss:
                case ProductType.zPlusOne:
                case ProductType.D:
                case ProductType.Ycore:
                case ProductType.Y:
                default:
                    throw new ArgumentOutOfRangeException(nameof(fragmentType), fragmentType, null);
            }
        }


        /// <summary>
        /// The terminus of the oligo fragment that the product ion is generated from
        /// </summary>
        public static Dictionary<ProductType, FragmentationTerminus> ProductTypeToFragmentationTerminus = new Dictionary<ProductType, FragmentationTerminus>
        {
            { ProductType.a, FragmentationTerminus.FivePrime },
            { ProductType.aWaterLoss, FragmentationTerminus.FivePrime },
            { ProductType.aBaseLoss, FragmentationTerminus.FivePrime },
            { ProductType.b, FragmentationTerminus.FivePrime },
            { ProductType.bWaterLoss, FragmentationTerminus.FivePrime },
            { ProductType.bBaseLoss, FragmentationTerminus.FivePrime },
            { ProductType.c, FragmentationTerminus.FivePrime },
            { ProductType.cWaterLoss, FragmentationTerminus.FivePrime },
            { ProductType.cBaseLoss, FragmentationTerminus.FivePrime },
            { ProductType.d, FragmentationTerminus.FivePrime },
            { ProductType.dWaterLoss, FragmentationTerminus.FivePrime },
            { ProductType.dBaseLoss, FragmentationTerminus.FivePrime },

            { ProductType.w, FragmentationTerminus.ThreePrime },
            { ProductType.wWaterLoss, FragmentationTerminus.ThreePrime },
            { ProductType.wBaseLoss, FragmentationTerminus.ThreePrime },
            { ProductType.x, FragmentationTerminus.ThreePrime },
            { ProductType.xWaterLoss, FragmentationTerminus.ThreePrime },
            { ProductType.xBaseLoss, FragmentationTerminus.ThreePrime },
            { ProductType.y, FragmentationTerminus.ThreePrime },
            { ProductType.yWaterLoss, FragmentationTerminus.ThreePrime },
            { ProductType.yBaseLoss, FragmentationTerminus.ThreePrime },
            { ProductType.z, FragmentationTerminus.ThreePrime },
            { ProductType.zWaterLoss, FragmentationTerminus.ThreePrime },
            { ProductType.zBaseLoss, FragmentationTerminus.ThreePrime },

            { ProductType.M, FragmentationTerminus.Both }
        };
    }
}
```
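In `TerminusSpecificProductTypes`, `ProductIonTypesFromSpecifiedTerminus` and `ProductTypeToFragmentationTerminus` encode the same mapping twice: the `FivePrime` list holds exactly the types mapped to `FivePrime`, the `ThreePrime` list holds the types mapped to `ThreePrime`, and `Both` holds every RNA type plus `M`. Adding a new ion type therefore means editing both tables (and the switch in `GetRnaTerminusType`). One possible alternative, sketched below under stated assumptions, derives the terminus-to-types table from the type-to-terminus table. `SketchProductType` and `SketchTerminus` are hypothetical, trimmed stand-ins for mzLib's `ProductType` and `FragmentationTerminus` so the sketch compiles on its own; this is not how mzLib currently builds these tables.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical, trimmed stand-ins for mzLib's ProductType and FragmentationTerminus:
// only a few members are reproduced so the sketch is self-contained.
public enum SketchProductType { a, aWaterLoss, aBaseLoss, w, wWaterLoss, wBaseLoss, M }
public enum SketchTerminus { Both, FivePrime, ThreePrime, None }

public static class DerivedTerminusTables
{
    // Single hand-maintained table: product type -> terminus it is generated from.
    public static readonly Dictionary<SketchProductType, SketchTerminus> TypeToTerminus = new()
    {
        { SketchProductType.a, SketchTerminus.FivePrime },
        { SketchProductType.aWaterLoss, SketchTerminus.FivePrime },
        { SketchProductType.aBaseLoss, SketchTerminus.FivePrime },
        { SketchProductType.w, SketchTerminus.ThreePrime },
        { SketchProductType.wWaterLoss, SketchTerminus.ThreePrime },
        { SketchProductType.wBaseLoss, SketchTerminus.ThreePrime },
        { SketchProductType.M, SketchTerminus.Both },
    };

    // Derived table following the same rules as the hand-written one:
    // FivePrime/ThreePrime list the types mapped to that end, Both lists every
    // type (M included), and None is empty. Static fields initialize in textual
    // order, so TypeToTerminus is already populated when this runs.
    public static readonly Dictionary<SketchTerminus, List<SketchProductType>> TerminusToTypes = new()
    {
        { SketchTerminus.FivePrime, TypesMappedTo(SketchTerminus.FivePrime) },
        { SketchTerminus.ThreePrime, TypesMappedTo(SketchTerminus.ThreePrime) },
        { SketchTerminus.Both, TypeToTerminus.Keys.ToList() },
        { SketchTerminus.None, new List<SketchProductType>() },
    };

    private static List<SketchProductType> TypesMappedTo(SketchTerminus terminus) =>
        TypeToTerminus.Where(pair => pair.Value == terminus).Select(pair => pair.Key).ToList();
}
```

With a single source table, `GetRnaTerminusType` could likewise become a `TryGetValue` lookup that throws `ArgumentOutOfRangeException` on a miss. The trade-off is that list order in the derived table follows dictionary enumeration order, which .NET does not formally guarantee, so callers should not rely on it.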
@@ -0,0 +1,2 @@

```
>id:2|Name:20mer1|SOterm:20mer1|Type:tRNA|Subtype:Ala|Feature:VGC|Cellular_Localization:freezer|Species:standard
GUACUGCCUCUAGUGAAGCA
```
Binary file not shown.
@@ -0,0 +1,17 @@

```xml
<?xml version="1.0" encoding="utf-8"?>
<mzLibProteinDb>
  <entry>
    <accession>20mer1</accession>
    <name>20mer1</name>
    <protein>
      <recommendedName>
        <fullName>20mer1</fullName>
      </recommendedName>
    </protein>
    <gene />
    <organism>
      <name type="scientific">standard</name>
    </organism>
    <sequence length="20">GUACUGCCUCUAGUGAAGCA</sequence>
  </entry>
</mzLibProteinDb>
```
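The XML test file above stores one RNA entry in the same `mzLibProteinDb` layout used for proteins, with the sequence length declared as an attribute. The commit adds `ParseRnaEndElement` and `ParseRnaEntryEndElement` to `ProteinXmlEntry.cs` to read such entries; the sketch below is not that parser. It is a hypothetical minimal reader (`MinimalEntryReader` is an illustrative name) using `System.Xml.Linq`, shown only to make the layout concrete and to illustrate cross-checking `length` against the parsed sequence.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Xml.Linq;

// Hypothetical minimal reader for the mzLibProteinDb layout shown above. It ignores
// modifications, gene names, organism, and everything else mzLib's ProteinXmlEntry handles.
public static class MinimalEntryReader
{
    public static List<(string Accession, string Sequence)> ReadEntries(string xml)
    {
        XDocument doc = XDocument.Parse(xml);
        var results = new List<(string Accession, string Sequence)>();
        foreach (XElement entry in doc.Root!.Elements("entry"))
        {
            string accession = (string?)entry.Element("accession") ?? "";
            XElement? sequenceElement = entry.Element("sequence");
            string sequence = sequenceElement?.Value.Trim() ?? "";

            // Cross-check the declared length="..." attribute against the parsed sequence.
            int? declaredLength = (int?)sequenceElement?.Attribute("length");
            if (declaredLength.HasValue && declaredLength.Value != sequence.Length)
                throw new FormatException(
                    $"Entry '{accession}': length attribute {declaredLength.Value} != {sequence.Length}");

            results.Add((accession, sequence));
        }
        return results;
    }
}
```

Applied to the 20mer1 file, this would return a single `("20mer1", "GUACUGCCUCUAGUGAAGCA")` pair.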
Binary file not shown.
`mzLib/Test/Transcriptomics/TestData/ModomicsUnmodifiedTrimmed.fasta`: 10 changes (10 additions, 0 deletions).
@@ -0,0 +1,10 @@

```
>id:1|Name:tdbR00000010|SOterm:SO:0000254|Type:tRNA|Subtype:Ala|Feature:VGC|Cellular_Localization:prokaryotic cytosol|Species:Escherichia coli
GGGGCUAUAGCUCAGCUGGGAGAGCGCCUGCUUUGCACGCAGGAGGUCUGCGGUUCGAUCCCGCAUAGCUCCACCA
>id:2|Name:tdbR00000008|SOterm:SO:0000254|Type:tRNA|Subtype:Ala|Feature:GGC|Cellular_Localization:prokaryotic cytosol|Species:Escherichia coli
GGGGCUAUAGCUCAGCUGGGAGAGCGCUUGCAUGGCAUGCAAGAGGUCAGCGGUUCGAUCCCGCUUAGCUCCACCA
>id:3|Name:tdbR00000356|SOterm:SO:0001036|Type:tRNA|Subtype:Arg|Feature:ICG|Cellular_Localization:prokaryotic cytosol|Species:Escherichia coli
GCAUCCGUAGCUCAGCUGGAUAGAGUACUCGGCUACGAACCGAGCGGUCGGAGGUUCGAAUCCUCCCGGAUGCACCA
>id:4|Name:tdbR00000359|SOterm:SO:0001036|Type:tRNA|Subtype:Arg|Feature:CCG|Cellular_Localization:prokaryotic cytosol|Species:Escherichia coli
GCGCCCGUAGCUCAGCUGGAUAGAGCGCUGCCCUCCGGAGGCAGAGGUCUCAGGUUCGAAUCCUGUCGGGCGCGCCA
>id:5|Name:tdbR00000358|SOterm:SO:0001036|Type:tRNA|Subtype:Arg|Feature:UCU|Cellular_Localization:prokaryotic cytosol|Species:Escherichia coli
GCGCCCUUAGCUCAGUUGGAUAGAGCAACGACCUUCUAAGUCGUGGGCCGCAGGUUCGAAUCCUGCAGGGCGCGCCA
```
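These MODOMICS-style headers are `|`-separated `key:value` fields, and some values themselves contain a colon (for example `SOterm:SO:0000254`). Any header parser therefore has to split each field on its first colon only. The commit's `RnaDbLoader` header detection and the new `FastaHeaderFieldRegex.ApplyRegex` are not shown in this diff, so the following is a hypothetical sketch (`ModomicsHeaderSketch.Parse` is an illustrative name) of that splitting rule, not mzLib's implementation.

```csharp
using System.Collections.Generic;

// Hypothetical sketch of splitting a MODOMICS-style FASTA header into key/value pairs.
// Not mzLib's RnaDbLoader or FastaHeaderFieldRegex.
public static class ModomicsHeaderSketch
{
    public static Dictionary<string, string> Parse(string header)
    {
        var fields = new Dictionary<string, string>();
        // Drop the leading '>' if present.
        string body = header.Length > 0 && header[0] == '>' ? header.Substring(1) : header;
        foreach (string field in body.Split('|'))
        {
            // Split on the FIRST ':' only, so values such as "SO:0000254" stay intact.
            int colon = field.IndexOf(':');
            if (colon < 0) continue; // assumption: fields without a key are skipped
            fields[field.Substring(0, colon)] = field.Substring(colon + 1);
        }
        return fields;
    }
}
```

For the first header above, this yields eight fields, including `SOterm` = `SO:0000254` and `Cellular_Localization` = `prokaryotic cytosol`. A naive `Split(':')` would instead truncate the `SOterm` value to `SO`.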
`mzLib/Test/Transcriptomics/TestData/ModomicsUnmodifiedTrimmed.fasta.gz`: binary file added (+369 bytes). Binary file not shown.