Commit

RLE normalization code
ahwagner committed Jul 10, 2023
1 parent a4d53da commit 86cafd4
Showing 1 changed file with 12 additions and 10 deletions.
22 changes: 12 additions & 10 deletions docs/source/impl-guide/normalization.rst
@@ -83,14 +83,16 @@ the following normalization rules apply:
#. Compare the two Allele sequences, if:

a. both are empty, the input Allele is a reference Allele. Return the
- input Allele unmodified.
+ input Allele unmodified. **Discussion point**: should this return a
+ reference length expression?

#. both are non-empty, the input Allele has been normalized to a
substitution. Return a new Allele with the modified `start`, `end`,
and `Alternate Allele Sequence`.

#. one is empty, the input Allele is an insertion (empty `reference
- sequence`) or a deletion (empty `alternate sequence`). Continue to
+ sequence`) or a deletion (empty `alternate sequence`). Store the length
+ of the non-empty sequence: this is the `Repeat Subunit Length`. Continue to
step 3.

#. Determine bounds of ambiguity.
@@ -110,15 +112,15 @@ the following normalization rules apply:

#. Construct a new Allele covering the entire region of ambiguity.

- a. Prepend characters from `left_roll_bound` to `start` to both
- Allele Sequences.
+ a. If the `reference sequence` is empty, this is an unambiguous
+ insertion. Return a new `Allele` with the trimmed `alternate
+ sequence` as a `Literal Sequence Expression`.

- #. Append characters from `start` to `right_roll_bound` to both
- Allele Sequences.
-
- #. Set `start` to `left_roll_bound` and `end` to `right_roll_bound`,
- and return a new Allele with the modified `start`, `end`, and
- `Alternate Allele Sequence`.
+ #. Otherwise, return a new `Allele` using a `reference length
+ expression`, using a `Location` specified by the coordinates
+ of the `left_roll_bound` and `right_roll_bound`, a `length`
+ specified by the length of the `alternate allele`, and a
+ `repeat subunit length` as determined in step 2c.
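
As a minimal sketch of how steps 2 and 4 might fit together after this change, the Python below classifies the trimmed sequences and then builds either a literal or a reference-length state. The class names, the `state` field, and both function names are hypothetical stand-ins, not the VRS library's API. Two assumptions are labeled in the comments: that an "empty reference sequence" in step 4 means the roll bounds coincide, and that the "length of the alternate allele" is measured over the whole region of ambiguity.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass
class LiteralSequenceExpression:  # hypothetical stand-in
    sequence: str


@dataclass
class ReferenceLengthExpression:  # hypothetical stand-in
    length: int
    repeat_subunit_length: int


@dataclass
class Allele:  # hypothetical stand-in; `state` is an assumed field name
    start: int
    end: int
    state: Union[LiteralSequenceExpression, ReferenceLengthExpression]


def classify(ref_seq: str, alt_seq: str) -> Tuple[str, Optional[int]]:
    """Step 2: compare the trimmed reference and alternate sequences."""
    if not ref_seq and not alt_seq:
        return "reference", None      # 2a: return the input unmodified
    if ref_seq and alt_seq:
        return "substitution", None   # 2b: return with the trimmed coordinates
    # 2c: exactly one is empty; the other's length is the repeat subunit length
    kind = "insertion" if not ref_seq else "deletion"
    return kind, len(ref_seq or alt_seq)


def build_allele(ref_seq: str, alt_seq: str, left_roll_bound: int,
                 right_roll_bound: int, repeat_subunit_length: int) -> Allele:
    """Step 4: construct the Allele over the region of ambiguity."""
    region_len = right_roll_bound - left_roll_bound
    # Assumption: "reference sequence is empty" means the bounds coincide.
    if region_len == 0:
        return Allele(left_roll_bound, right_roll_bound,
                      LiteralSequenceExpression(alt_seq))
    # Assumption: the alternate allele spans the whole region, so its length
    # is the region length plus inserted bases or minus deleted bases.
    alt_length = region_len + len(alt_seq) - len(ref_seq)
    return Allele(left_roll_bound, right_roll_bound,
                  ReferenceLengthExpression(alt_length, repeat_subunit_length))
```

For example, deleting one `CAG` unit from a `CAGCAGCAG` repeat with roll bounds 0 and 9 gives `classify("CAG", "")` a subunit length of 3, and `build_allele("CAG", "", 0, 9, 3)` then yields a reference length expression with `length` 6 under the assumptions above.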

.. _normalization-diagram:
