Transcriptomics Digestion and Fragmentation #801
Mostly minor quibbles, but a couple of things were confusing enough that I don't want to approve just yet:
- `NucleicAcid.Equals()` doesn't check the sequence.
- `NucleicAcid.ParseSequence()` does several things that I wouldn't expect based on the name.
- The `OligoWithSetMods` termini setters modify multiple different fields.
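To illustrate the first point, here is a minimal sketch of an equality check that does compare the sequence. `SketchNucleicAcid` and its two properties are hypothetical stand-ins for illustration; the fields of the real `NucleicAcid` class are not shown in this thread, so this is not its actual implementation.

```csharp
using System;

// Hypothetical stand-in for illustration only; not the mzLib NucleicAcid type.
public sealed class SketchNucleicAcid : IEquatable<SketchNucleicAcid>
{
    public string Name { get; }
    public string BaseSequence { get; }

    public SketchNucleicAcid(string name, string baseSequence)
    {
        Name = name ?? throw new ArgumentNullException(nameof(name));
        BaseSequence = baseSequence ?? throw new ArgumentNullException(nameof(baseSequence));
    }

    // Two instances are equal only if both the name and the sequence match.
    public bool Equals(SketchNucleicAcid other) =>
        other is not null
        && string.Equals(Name, other.Name, StringComparison.Ordinal)
        && string.Equals(BaseSequence, other.BaseSequence, StringComparison.Ordinal);

    public override bool Equals(object obj) => Equals(obj as SketchNucleicAcid);

    // Keep GetHashCode consistent with Equals by hashing the same fields.
    public override int GetHashCode() => HashCode.Combine(Name, BaseSequence);
}
```

With this shape, two entries that share a name but differ by a single base compare as unequal, which is the behavior the comment above asks for.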
Introduced support for handling RNA data within the UsefulProteomicsDatabases project. Key changes include:
- Added `Transcriptomics\TestData` folder to `Test.csproj`.
- Changed access modifiers in `ProteinDbLoader.cs` to `internal`.
- Added `using` directives for `Transcriptomics` in `ProteinXmlEntry.cs`.
- Introduced methods `ParseRnaEndElement` and `ParseRnaEntryEndElement` in `ProteinXmlEntry.cs`.
- Modified `ParseAnnotatedMods` to check for RNA modifications.
- Added project reference to `Transcriptomics.csproj` in `UsefulProteomicsDatabases.csproj`.
- Added `ClassExtensions.cs` with `CreateNew` method for nucleic acids.
- Added `RnaDbLoader.cs` for RNA database loading.
- Added `RnaDecoyGenerator.cs` for generating decoy RNA sequences.
Updated `using` directives in `TestDigestion.cs` and `OligoWithSetMods.cs` to include necessary namespaces. Added assertions in `TestDigestion.cs` for `SequenceWithChemicalFormulas` and `FullSequenceWithMassShift`. Changed `namespace` in `OligoWithSetMods.cs` to `Transcriptomics.Digestion`. Implemented and cached `SequenceWithChemicalFormulas` property in `OligoWithSetMods.cs`.
- Added new files `ModomicsUnmodifiedTrimmed.fasta` and `ModomicsUnmodifiedTrimmed.fasta.gz` to `Test.csproj` with `CopyToOutputDirectory` set to `PreserveNewest`.
- Removed the `Transcriptomics\TestData` folder from `Test.csproj`.
- Introduced `Transcribe` method in `ClassExtensions.cs` for DNA to RNA transcription.
- Added summary comment to `NucleolyticOligo` class in `NucleolyticOligo.cs`.
- Added `ApplyRegex` method in `FastaHeaderFieldRegex.cs`.
- Introduced `ProteinDbWriter` class in `ProteinDbWriter.cs` for writing protein and nucleic acid databases.
- Renamed `GetModsForThisProtein` to `GetModsForThisBioPolymer` in `ProteinDbWriter.cs`.
- Added `RnaDbLoader` class in `RnaDbLoader.cs` for RNA FASTA header detection and sequence loading.
- Updated user dictionary in `mzLib.sln.DotSettings` with new terms.
- Added test cases in `TestDbLoader.cs` for RNA database loading and header detection.
- Introduced `TestDecoyGeneration` class in `TestDecoyGenerator.cs` for RNA decoy generation tests.
- Added RNA sequence file `ModomicsUnmodifiedTrimmed.fasta` and its compressed version.
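For readers unfamiliar with the transcription step mentioned above, a minimal sketch follows. The actual signature and behavior of `Transcribe` in `ClassExtensions.cs` are not shown in this thread; `TranscriptionSketch.TranscribeCodingStrand` is a hypothetical name, and the sketch assumes the input is the coding strand, so transcription reduces to replacing T with U.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration; not the ClassExtensions.Transcribe method.
public static class TranscriptionSketch
{
    // Assumes the input is the coding (sense) strand written 5' to 3',
    // so the RNA has the same sequence with thymine replaced by uracil.
    public static string TranscribeCodingStrand(string dna)
    {
        if (dna is null) throw new ArgumentNullException(nameof(dna));

        var rna = new StringBuilder(dna.Length);
        foreach (char nucleotide in dna)
        {
            switch (nucleotide)
            {
                case 'A':
                case 'C':
                case 'G':
                    rna.Append(nucleotide);
                    break;
                case 'T':
                    rna.Append('U');
                    break;
                default:
                    // Reject anything that is not an unambiguous uppercase DNA base.
                    throw new ArgumentException($"Unexpected DNA base '{nucleotide}'.", nameof(dna));
            }
        }
        return rna.ToString();
    }
}
```

A template-strand input would additionally need complementing and reversing, which this sketch deliberately leaves out.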
- Added `using` directives for `Transcriptomics.Digestion` and `UsefulProteomicsDatabases.Transcriptomics` in `TestDecoyGenerator.cs`.
- Introduced `TestCreateNew` in `TestDecoyGenerator.cs` to verify RNA and oligo creation.
- Added `using` directive for `MzLibUtil` in `TestDigestion.cs`.
- Added a test in `TestDigestion.cs` for exception handling with invalid sequences.
- Added `using` directives for `Omics` and related namespaces in `TestFragmentation.cs`.
- Modified `TestFragmentation_Modified` in `TestFragmentation.cs` to use `OligoWithSetMods` directly and added assertions.
- Updated `ClassExtensions.cs` to allow setting `isDecoy` in new `RNA` objects.
- Refactored `OligoWithSetMods.cs` to return a dictionary from `GetModsAfterDeserialization`.
- Updated `OligoWithSetMods.cs` to initialize `_allModsOneIsNterminus` using the returned dictionary.
Looking good. All of my previous comments are resolved except one. This round also added two or three methods that could/should be unit tested. I don't see anything problematic.
Refactored constructors, improved exception handling, and added comprehensive tests across multiple files. Key changes include:
- `MzLibException.cs`: Updated constructor to include `innerException`.
- `TestDecoyGenerator.cs`: Added assertions for `CreateNew` method.
- `TestDigestion.cs`: Added assertions and new test for RNA digestion exception.
- Refactored modification lists and added various tests for modifications.
- `TestNucleicAcid.cs`: Refactored methods, adjusted precision, and updated terminus assignments.
- `NucleolyticOligo.cs`: Changed parameter types, updated comments, and improved variable names.
- `OligoWithSetMods.cs`: Enhanced exception messages and updated modification location checks.
- `NucleicAcid.cs`: Added `using` directive, changed exception type, and refactored methods.
- `mzLib.sln.DotSettings`: Updated user dictionary entries.
…into RnaImplementation
Added new test data files (`20mer1.fasta`, `20mer1.fasta.gz`, `20mer1.xml`, `20mer1.xml.gz`) to the `Transcriptomics\TestData` directory in the `Test.csproj` file, ensuring they are copied to the output directory. Introduced `TestDbReadingDifferentExtensions` in `TestDbLoader.cs` to verify RNA database reading from various formats. Added `TestDigestionMaxIsoforms` in `TestDigestion.cs` to test RNA sequence digestion with max isoforms. Updated `WriteNucleicAcidXmlDatabase` in `ProteinDbWriter.cs` with remarks for future implementation. Added a TODO in `RnaDecoyGenerator.cs` regarding palindromic sequences' impact on fragment ions. Included new RNA sequence data in test files for validation.
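The palindrome concern in that TODO can be illustrated with a short sketch. It assumes the decoy is the target sequence fully reversed and that "palindromic" means the string reads the same in both directions; neither assumption is confirmed by this thread, and `DecoySketch` is a hypothetical class, not the code in `RnaDecoyGenerator.cs`.

```csharp
using System;

// Hypothetical helper for illustration; not the RnaDecoyGenerator implementation.
public static class DecoySketch
{
    // Assumption: the decoy is the target sequence fully reversed.
    public static string ReverseDecoy(string sequence)
    {
        if (sequence is null) throw new ArgumentNullException(nameof(sequence));

        char[] bases = sequence.ToCharArray();
        Array.Reverse(bases);
        return new string(bases);
    }

    // True when reversal reproduces the target, so the decoy is
    // indistinguishable from the target sequence.
    public static bool DecoyMatchesTarget(string sequence) =>
        string.Equals(sequence, ReverseDecoy(sequence), StringComparison.Ordinal);
}
```

Under these assumptions, a sequence such as `GUACAUG` reverses to itself, so its decoy would produce the same fragment ions as the target.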
looks good, all comments have been addressed
Implemented the derived classes of the Omics abstract classes and interfaces for RNA.