Allow import and export of sequences with three-letter amino acid codes #5556
Labels: Export to Sequence (Bucket: Bugs related to Export to Sequence mode); Test Cases Written (Test cases have been written for that issue)
Background
In addition to single-letter codes for amino acids, there are standard three-letter codes that some biochemists, molecular biologists, and others prefer.
This ticket covers import and export of amino acid sequences (Sequence format) with three-letter codes; displaying three-letter codes on the canvas is out of scope.
Three-letter codes
Requirements
1. Import logic:
1.1. On import or paste from the clipboard, an additional drop-down menu should appear when "Sequence" and then "Peptide" are selected. Its options should be "1-letter code" and "3-letter code".
1.2. A valid input string may consist only of the 52 uppercase and lowercase English letters (A-Z, a-z), spaces, and line breaks.
1.3. Spaces should be interpreted as separating different sequences.
1.4. Line breaks should be ignored.
1.5. If an invalid symbol is used, an error message should appear.
1.6. Within one sequence, every (3n+1)-th letter (the 1st, 4th, 7th, and so on, for n = 0, 1, 2, ...) must be uppercase.
1.7. If requirement 1.6 is not fulfilled, an error message should appear with the title "Incorrect Formatting" and the text "Given string cannot be interpreted as a valid three letter sequence because of incorrect formatting."
1.8. Every triplet of letters in the sequence (first letter uppercase, the other two lowercase; see requirement 1.6) should be interpreted as an amino acid using the table above.
1.9. If requirement 1.8 is not fulfilled, an error message should appear with the title "Invalid Sequence" and the text "Given string cannot be interpreted as a valid three letter sequence."
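The import rules 1.2 to 1.9 can be sketched as a small parser that turns a three-letter string into one-letter sequences. This is a minimal Python sketch under stated assumptions, not the project's implementation: the issue's own table is not reproduced here, so the mapping below assumes the standard IUPAC-IUBMB three-letter codes; `parse_three_letter` and `SequenceImportError` are hypothetical names; and because requirement 1.5 gives no message text, the "Invalid symbol" title and text are placeholders.

```python
from __future__ import annotations

import string

# ASSUMPTION: standard IUPAC-IUBMB three-letter codes stand in for the
# issue's table, which is not reproduced in this text.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Sec": "U", "Pyl": "O",
    "Asx": "B", "Glx": "Z", "Xle": "J", "Xaa": "X",  # ambiguous codes
}

# 1.2: English letters, spaces and line breaks only.
ALLOWED = set(string.ascii_letters) | {" ", "\n", "\r"}


class SequenceImportError(ValueError):
    """Hypothetical error type carrying the dialog title and text."""

    def __init__(self, title: str, text: str) -> None:
        super().__init__(f"{title}: {text}")
        self.title = title
        self.text = text


def parse_three_letter(raw: str) -> list[str]:
    """Convert a three-letter peptide string into one-letter sequences."""
    # 1.5: reject any character outside the allowed set.
    if any(ch not in ALLOWED for ch in raw):
        # Placeholder message: the requirements do not specify one.
        raise SequenceImportError("Invalid symbol", "Given string contains an invalid symbol.")
    # 1.4: ignore line breaks; 1.3: spaces separate sequences.
    joined = raw.replace("\r", "").replace("\n", "")
    sequences = [s for s in joined.split(" ") if s]

    result = []
    for seq in sequences:
        # 1.6 / 1.7: letters at indices 0, 3, 6, ... must be uppercase.
        if any(not seq[i].isupper() for i in range(0, len(seq), 3)):
            raise SequenceImportError(
                "Incorrect Formatting",
                "Given string cannot be interpreted as a valid three letter "
                "sequence because of incorrect formatting.",
            )
        # 1.8 / 1.9: each triplet must be a known code; a trailing partial
        # triplet (length not divisible by 3) never matches the table.
        triplets = [seq[i:i + 3] for i in range(0, len(seq), 3)]
        if any(t not in THREE_TO_ONE for t in triplets):
            raise SequenceImportError(
                "Invalid Sequence",
                "Given string cannot be interpreted as a valid three letter sequence.",
            )
        result.append("".join(THREE_TO_ONE[t] for t in triplets))
    return result
```

For example, `parse_three_letter("AlaGlyCys\nLys SerThr")` returns `["AGCK", "ST"]`, because the line break is dropped before splitting on spaces. An input such as `"ALA"` passes the 1.6 check but fails 1.8, since only the first letter of each triplet is checked for case and `"ALA"` is not in the table.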
2. Export logic:
2.1. On export, "Sequence" should be replaced with two options: "Sequence (1-letter code)" and "Sequence (3-letter code)".
2.1.1. The current "Sequence" export becomes the new "Sequence (1-letter code)" export.
2.2. Only pure amino acid sequences that contain no non-standard ambiguous amino acids can be exported to "Sequence (3-letter code)".
2.3. If any amino acid is a non-standard ambiguous amino acid, an error message should appear with the title "Non-standard amino acid" and the text "Non-standard ambiguous amino acids cannot be exported to the selected format".
2.4. If, on export, the sequence is not a pure amino acid sequence, an error message should appear.
2.5. All amino acids should be exported as the three-letter code of their natural analogue.
2.6. All amino acids whose natural analogue is "X" should be exported as "Xun".
2.7. All standard ambiguous amino acids should be exported as the appropriate three-letter code from the table above.
2.8. Different sequences should be separated by a space.
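The export rules 2.2 to 2.8 can be sketched as a function over an already-split list of sequences. This is a minimal Python sketch under stated assumptions, not the project's implementation: the `Monomer` model, its `kind` and `ambiguity` values, `export_three_letter`, and `SequenceExportError` are all hypothetical; the natural-analogue and ambiguous-code tables assume standard IUPAC-IUBMB codes because the issue's table is not reproduced here; and the 2.4 message is a placeholder, since the requirements do not specify its text.

```python
from __future__ import annotations

from dataclasses import dataclass

# ASSUMPTION: standard IUPAC-IUBMB codes for natural analogues.
ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "U": "Sec", "O": "Pyl",
}
# ASSUMPTION: codes for the standard ambiguous amino acids.
AMBIGUOUS_TO_THREE = {"B": "Asx", "Z": "Glx", "J": "Xle", "X": "Xaa"}


@dataclass(frozen=True)
class Monomer:
    """Hypothetical monomer model used only for this sketch."""

    kind: str  # e.g. "amino_acid", "nucleotide", "chem"
    code: str  # natural analogue, or one-letter code of an ambiguous AA
    ambiguity: str = "none"  # "none", "standard" or "non-standard"


class SequenceExportError(ValueError):
    """Hypothetical error type carrying the dialog title and text."""

    def __init__(self, title: str, text: str) -> None:
        super().__init__(f"{title}: {text}")
        self.title = title
        self.text = text


def export_three_letter(sequences: list[list[Monomer]]) -> str:
    monomers = [m for seq in sequences for m in seq]
    # 2.4: every monomer must be an amino acid (placeholder message).
    if any(m.kind != "amino_acid" for m in monomers):
        raise SequenceExportError(
            "Unsupported sequence",
            "Only amino acid sequences can be exported to the selected format.",
        )
    # 2.2 / 2.3: non-standard ambiguous amino acids are rejected.
    if any(m.ambiguity == "non-standard" for m in monomers):
        raise SequenceExportError(
            "Non-standard amino acid",
            "Non-standard ambiguous amino acids cannot be exported to the selected format",
        )

    def three_letter(m: Monomer) -> str:
        if m.ambiguity == "standard":
            return AMBIGUOUS_TO_THREE[m.code]  # 2.7
        if m.code == "X":
            return "Xun"  # 2.6: natural analogue X
        return ONE_TO_THREE[m.code]  # 2.5

    # 2.8: separate sequences with a single space.
    return " ".join("".join(three_letter(m) for m in seq) for seq in sequences)
```

Both validation passes run over all monomers before any code is emitted, so a mixed structure reports the 2.4 error rather than a partial string; checking 2.4 before 2.3 is one reasonable ordering, not one the requirements state.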