Question: how to improve cell barcode correction? #74
Comments
Hi @AnnaAMonaco, Thanks for the detailed report, and we're happy to help! To your first question:
Thanks!
Thanks for the reply! Regarding my second question, I was wondering more about what to do when I know the valid barcodes, but providing this whitelist still leaves me with multiple noisy barcodes.
This makes me think that maybe many of these barcodes have more than 1 mismatch? So maybe giving it a list of potential barcodes and working with the min reads threshold could help. But I was really wondering if I would run something like
I hope my question makes sense :)
Anna
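One way to test the "more than 1 mismatch" hypothesis is to measure how far each observed barcode is from its nearest whitelist entry, weighted by read count. The following is a minimal sketch, not part of salmon or alevin-fry: the function names and the toy barcodes are invented for illustration, and the brute-force comparison is only practical on a small sample of barcodes.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatching positions between two equal-length barcodes."""
    if len(a) != len(b):
        raise ValueError("barcodes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def nearest_distance(barcode, whitelist):
    """Smallest Hamming distance from `barcode` to any whitelist entry."""
    return min(hamming(barcode, w) for w in whitelist)


def distance_histogram(observed, whitelist):
    """Total reads per nearest-whitelist distance.

    `observed` maps each raw barcode to its read count.
    """
    hist = Counter()
    for barcode, n_reads in observed.items():
        hist[nearest_distance(barcode, whitelist)] += n_reads
    return hist


# Toy data, invented for illustration only (real barcodes are longer).
whitelist = {"AAAA", "CCCC"}
observed = Counter({"AAAA": 5, "AAAT": 3, "AATT": 2, "CCGG": 1})

hist = distance_histogram(observed, whitelist)
for distance, n_reads in sorted(hist.items()):
    print(f"distance {distance}: {n_reads} reads")
```

If most discarded reads sit at distance 2 or more, a 1-mismatch correction against the whitelist cannot rescue them, whichever tool performs the correction.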
Hi,
I have been having trouble recovering as many cell barcodes after correction with salmon alevin as with Cell Ranger. Maybe alevin-fry can help with this?
What I have
What I need
I have been running salmon alevin to get alignments and do some allele-specific analysis in a single-cell setting. I already use a whitelist containing only the cell barcodes that are passed as true cells by CellRanger (~80% of the total). This is the generic line I usually run:
salmon alevin -lISR -1 $Read1 -2 $Read2 --chromiumV3 -i $index -p 12 -o $outDir --tgMap $tsv --whitelist $whitelist --numCellBootstraps 20 --dumpFeatures
What the issue is
This only gives me a 37% mapping rate (expected is ~70% from running salmon pseudobulk on the data), and troubleshooting shows that a large number of reads are discarded because of "noisy barcodes". From alevin_meta_info.json:
"total_reads": 82603614, "reads_with_N": 0, "noisy_cb_reads": 35104396, "noisy_umi_reads": 7532
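To put these counts in proportion, the quoted fields can be turned into fractions of the total. This is a small sketch that uses only the four values quoted above, wrapped in braces to form valid JSON; the real alevin_meta_info.json contains more keys, and the helper name `read_fractions` is made up for this example.

```python
import json

# Only the four fields quoted in the post; the real file has more keys.
meta_text = (
    '{"total_reads": 82603614, "reads_with_N": 0, '
    '"noisy_cb_reads": 35104396, "noisy_umi_reads": 7532}'
)
meta = json.loads(meta_text)


def read_fractions(meta):
    """Fraction of all reads falling into each discard category."""
    total = meta["total_reads"]
    keys = ("reads_with_N", "noisy_cb_reads", "noisy_umi_reads")
    return {key: meta[key] / total for key in keys}


fractions = read_fractions(meta)
for key, value in fractions.items():
    print(f"{key}: {value:.2%}")
```

With these numbers, noisy cell barcodes account for roughly 42% of all reads, while noisy UMIs and reads containing N are negligible.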
Question
From my understanding, alevin-fry could help with the cell barcode correcting, so I tried following the docs for "generate-permit-list" but I still have some questions.
First, I am actually having trouble generating the RAD directory that the options --rad and --sketch should produce. Running salmon alevin as above with these two options added, either individually or together, generated a "map.rad" file that alevin-fry doesn't take as input.
$ alevin-fry generate-permit-list --input map.rad --output-dir $outDir --expected-ori either --valid-bc $whitelist
error: Invalid value "map.rad" for '--input <INPUT>': No valid directory was found at this path.
For more information try --help
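The error message suggests that --input expects a directory rather than the map.rad file itself. The sketch below builds the same command with the directory that holds map.rad, on the assumption (not confirmed in this thread) that this is what alevin-fry wants. The helper names and the paths salmon_out, fry_out and whitelist.txt are hypothetical, and the command is only printed, not executed.

```python
import shlex
from pathlib import Path


def rad_input_dir(path):
    """Return the directory to pass as --input.

    If `path` points at the map.rad file itself, use its parent directory;
    otherwise assume `path` is already the directory.
    """
    p = Path(path)
    return p.parent if p.name == "map.rad" else p


def permit_list_command(rad_path, out_dir, whitelist):
    """Build (but do not run) the generate-permit-list call from the post."""
    return [
        "alevin-fry", "generate-permit-list",
        "--input", str(rad_input_dir(rad_path)),
        "--output-dir", str(out_dir),
        "--expected-ori", "either",
        "--valid-bc", str(whitelist),
    ]


# Hypothetical paths, for illustration only.
cmd = permit_list_command("salmon_out/map.rad", "fry_out", "whitelist.txt")
print(shlex.join(cmd))
```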
This brings me to my second question: the above command would take my whitelist, which contains barcodes I already know are true cells, and correct against it, right? But in theory salmon alevin also does this in my quantification step. I know the other option is to use a list of all available barcodes and change the --min-reads threshold, but is this actually better than knowing which barcodes are true cells? Why not set this true whitelist as --unfiltered-pl and then --min-reads 10?
I hope I was clear enough, but I would obviously be happy to elaborate on any unclear part or anything I might have left out :)
Cheers,
Anna