
Commit

Changed Readme NEWS part.
stela2502 committed Mar 5, 2024
1 parent 9c987d8 commit 5122d79
Showing 2 changed files with 16 additions and 2 deletions.
16 changes: 15 additions & 1 deletion README.md
@@ -14,7 +14,21 @@ You can inspect the state of the program using this [detailed comparison between

## News

Transpsable elements like Sine and Line repeats can not be indexed and mapped - both in BD Rhapsody as well as in 10X expression datasets. Analysis of 10x data is very slow at the moment. Problem is currently (08.02.2024) worked on.
### 2.1.0

I finally found some suspicious reads and had to improve the mapping further.
Quantify_rhapsody_multi now gives the user access to these new options:

``--min-matches 1`` If one 8 bp initial match (100% identity) and one 32 bp relaxed match out of at least 5 tries identify exactly one gene, the read is tagged as coming from that gene. This value is also used for filtering when multiple genes are detected, but in that case the nw value is more important.
``--highest-humming-val 0.9`` The humming value is used to quickly filter out really useless reads. It is calculated as the absolute difference between all trimers of the target and the search 32 bp fragment. This value is divided by the total length of the comparison. 0.9 is the default value and is very inclusive.
``--highest-nw-val 0.3`` The nw value is a Needleman-Wunsch inspired value. For all 32 bp fragments that pass the humming test, the initial table of the Needleman-Wunsch algorithm is calculated, and the final value from that comparison is again divided by the total length of the initial (max) 32 bp fragments. In the end, both the number of passing matches and the mean nw value of a read are used to identify the matching gene.
Values above 0.3 will lead to a lot of false positives.
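
The two filter values described above can be illustrated with a minimal sketch. This is not the code used by the mapper: the function names (`humming_val`, `nw_val`, `relaxed_match`), the use of overlapping trimer counts, and the cost scheme (match 0, mismatch 1, gap 1) are all assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Count every overlapping 3-mer (trimer) in `seq`.
fn trimer_counts(seq: &[u8]) -> HashMap<&[u8], i64> {
    let mut counts = HashMap::new();
    for w in seq.windows(3) {
        *counts.entry(w).or_insert(0) += 1;
    }
    counts
}

/// Hypothetical "humming" value: summed absolute difference of the trimer
/// counts of both fragments, divided by the compared length.
fn humming_val(a: &[u8], b: &[u8]) -> f32 {
    let len = a.len().min(b.len());
    if len == 0 {
        return 0.0;
    }
    let ca = trimer_counts(&a[..len]);
    let cb = trimer_counts(&b[..len]);
    let mut diff: i64 = 0;
    // Trimers present in `a` (possibly also in `b`).
    for (k, va) in &ca {
        diff += (*va - cb.get(k).copied().unwrap_or(0)).abs();
    }
    // Trimers present only in `b`.
    for (k, vb) in &cb {
        if !ca.contains_key(k) {
            diff += *vb;
        }
    }
    diff as f32 / len as f32
}

/// Hypothetical nw value: last cell of a Needleman-Wunsch style cost table
/// (assumed costs: match 0, mismatch 1, gap 1), divided by the longer length.
fn nw_val(a: &[u8], b: &[u8]) -> f32 {
    let max_len = a.len().max(b.len());
    if max_len == 0 {
        return 0.0;
    }
    // `prev` holds the previous table row; row 0 is all gaps.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, &x) in a.iter().enumerate() {
        let mut cur = vec![i + 1; b.len() + 1];
        for (j, &y) in b.iter().enumerate() {
            let sub = prev[j] + usize::from(x != y);
            let gap = prev[j + 1].min(cur[j]) + 1;
            cur[j + 1] = sub.min(gap);
        }
        prev = cur;
    }
    prev[b.len()] as f32 / max_len as f32
}

/// Apply the cheap humming filter first, then the nw filter.
/// Returns the nw value if the fragment pair passes both thresholds.
fn relaxed_match(target: &[u8], query: &[u8], highest_humming: f32, highest_nw: f32) -> Option<f32> {
    if humming_val(target, query) > highest_humming {
        return None;
    }
    let nw = nw_val(target, query);
    if nw <= highest_nw {
        Some(nw)
    } else {
        None
    }
}
```

In this sketch lower values mean more similar fragments, which matches the `highest-...` naming of the options: a pair is kept only if both values stay at or below their thresholds.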

In addition, simple DNA fragments such as simple repeats or long stretches of one base are now excluded from the index. They led to the mapping of really dubious reads, e.g. reads containing polyA sequences from some random genes (confirmed with NCBI blastn).
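
A low-complexity check of this kind could look roughly like the following sketch. The function name `is_simple_fragment` and both parameters (`max_base_fraction`, `max_period`) are hypothetical and not taken from the program; the actual exclusion rule used when building the index may differ.

```rust
/// Hypothetical low-complexity check: returns true if `seq` is dominated by
/// a single base or is a tandem repeat with a period of up to `max_period`.
fn is_simple_fragment(seq: &[u8], max_base_fraction: f32, max_period: usize) -> bool {
    if seq.is_empty() {
        return true;
    }
    // Long stretches of one base: the most frequent byte dominates the fragment.
    let mut counts = [0usize; 256];
    for &b in seq {
        counts[b as usize] += 1;
    }
    let top = counts.iter().copied().max().unwrap_or(0);
    if top as f32 / seq.len() as f32 >= max_base_fraction {
        return true;
    }
    // Simple repeats: the fragment equals itself shifted by a short period.
    for p in 1..=max_period.min(seq.len() / 2) {
        if seq.iter().zip(&seq[p..]).all(|(x, y)| x == y) {
            return true;
        }
    }
    false
}
```

With assumed settings of `0.8` and `4`, a polyA stretch or an `ACACAC...` repeat would be rejected, while a fragment with a balanced, non-repetitive base composition would be kept.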

### 2.0.0

Transposable elements like SINE and LINE repeats can now be indexed and mapped, both in BD Rhapsody and in 10x expression datasets. Analysis of 10x data is very slow at the moment. This problem is currently (08.02.2024) being worked on.

The mapper has improved significantly: both the false positive and the false negative rate have improved.
The improvement was made possible by using a Needleman-Wunsch inspired algorithm.
2 changes: 1 addition & 1 deletion src/fast_mapper/fast_mapper.rs
@@ -1011,7 +1011,7 @@ impl FastMapper{
println!("read mapping to {} - should not happen here?: {:?}\n{:?}", bad_gene, self.gene_names_for_ids( &matching_geneids ),String::from_utf8_lossy(seq) );
//println!("This is our total matching set: {:?}", genes);
}*/
println!("gene {matching_geneids:?} detected");
//println!("gene {matching_geneids:?} detected");
if matching_geneids.len() == 1 {
return Ok( matching_geneids )
}else {