Commit

add more notes about scaled
taylorreiter committed Sep 14, 2022
1 parent 3ff8024 commit f43caec
Showing 1 changed file with 7 additions and 1 deletion.
8 changes: 7 additions & 1 deletion notebooks/20220913-fraction-mtx-after-subtracting-mgx.ipynb
@@ -11,9 +11,15 @@
"\n",
"These values were estimated by using FracMinHash sketching. \n",
"[FracMinHash sketches](https://www.biorxiv.org/content/10.1101/2022.01.11.475838v2.abstract) represent a fraction of the k-mers in the original sequence.\n",
"\n",
"K-mers are words of length _k_ in nucleotide sequences.\n",
"Here we used k-mer sizes 21, 31, and 51 because these can be used to estimate genus, species, and strain-level similarity respectively between sets of sequences.\n",
"We also downsampled to 1/200th of all k-mers to facilitate faster comparisons.\n",
"\n",
"We also downsampled to 1/200th (0.5%) of all k-mers to facilitate faster comparisons (`scaled = 200`).\n",
"Importantly, the same fraction of k-mers is retained across samples if the same sequences are contained within those samples.\n",
"Within the sourmash ecosystem, scaled values of 1000 and 2000 are the most standard because they offer a good compromise between comparison speed and retaining enough k-mers to accurately represent microbial genomes.\n",
"Because viral genomes can be very small, we reduced the scaled value to 200 to retain more k-mers with the idea that even small viral genomes would still have a few representative k-mers within the sketch.\n",
"See [this issue](https://github.com/sourmash-bio/sourmash/issues/1340) for more details about how the scaled value and the k-mer size impact the number of k-mers retained for viral genomes of different sizes.\n",
"\n",
"After we sketched both the metatranscriptome and the metagenome, k-mers that occurred in the metagenome were removed from its paired metatranscriptome.\n",
"\n",
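
The notes in this diff describe how `scaled` controls the fraction of k-mers kept and how metagenome k-mers are subtracted from the paired metatranscriptome. The following is a minimal, hypothetical sketch of that idea in plain Python. It is not the sourmash implementation: the function names (`frac_minhash`, `expected_sketch_size`), the use of `hashlib.blake2b` as a stand-in hash, and the random test sequences are all assumptions made for illustration.

```python
import hashlib
import random

MAX_HASH = 2**64  # size of the 64-bit hash space assumed in this sketch

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer[::-1].translate(COMPLEMENT)
    return min(kmer, rc)


def hash_kmer(kmer):
    """Hash a k-mer to a 64-bit integer (blake2b is a stand-in, not sourmash's hash)."""
    digest = hashlib.blake2b(canonical(kmer).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def frac_minhash(seq, ksize, scaled):
    """Keep only k-mer hashes below MAX_HASH / scaled, i.e. about 1/scaled of them."""
    threshold = MAX_HASH // scaled
    hashes = set()
    for i in range(len(seq) - ksize + 1):
        h = hash_kmer(seq[i:i + ksize])
        if h < threshold:
            hashes.add(h)
    return hashes


def expected_sketch_size(genome_length, ksize, scaled):
    """Approximate number of hashes retained for a genome of the given length."""
    return (genome_length - ksize + 1) / scaled


# Hypothetical data: the metagenome sequence also appears in the metatranscriptome.
rng = random.Random(0)
shared = "".join(rng.choice("ACGT") for _ in range(20000))
mtx_only = "".join(rng.choice("ACGT") for _ in range(20000))
mgx_seq = shared
mtx_seq = shared + mtx_only

mgx_sketch = frac_minhash(mgx_seq, ksize=31, scaled=200)
mtx_sketch = frac_minhash(mtx_seq, ksize=31, scaled=200)

# Because the threshold depends only on the hash value, a k-mer kept in one
# sample is kept in every sample that contains it, so set subtraction works.
remaining = mtx_sketch - mgx_sketch
fraction_remaining = len(remaining) / len(mtx_sketch)
print(f"fraction of metatranscriptome sketch left after subtraction: {fraction_remaining:.2f}")
```

Under these assumptions, `expected_sketch_size(5000, 31, 200)` gives roughly 25 hashes for a 5 kb genome, whereas `scaled=1000` would leave roughly 5, which illustrates why a smaller scaled value helps for small viral genomes.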