provide an explicit list of cell barcodes to whitelist? #642
Comments
Hi @bbimber. Just to clarify, you have a whitelist of cell barcodes and would like UMI-tools to automatically identify the observed cell barcodes which should be error-corrected to these whitelisted cell barcodes. Is that correct? If so, I'm afraid this isn't currently supported by UMI-tools. When running It should be relatively trivial to determine for yourself which error barcodes you wish to correct if you already have a list of whitelisted cell barcodes. However, one issue is that if you specify all the possible error corrections without reference to whether each barcode is actually observed, column 2 of the whitelist will get excessively long. You could run Hmm.. answering your question, I see the issue now!
@IanSudbery, any objections to an option being added to whitelist to accept a pre-defined whitelist and then derive a sensible whitelist + error-corrections from the fastq? It should be a simple addition of a new UMI-tools/umi_tools/whitelist_methods.py Lines 469 to 474 in 9ce3a70
There is an option to define an error correction from just the whitelist CB sequences when reading in the whitelist in extract, but that's going to run into issues by creating an excessively broad set of possible error corrections, since there is no checking that the error CBs are actually present in the data. I imagine the excessively broad whitelist might impact runtime. UMI-tools/umi_tools/whitelist_methods.py Lines 501 to 504 in 9ce3a70
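To illustrate why a correction set derived from the whitelist sequences alone becomes broad, here is a minimal sketch that enumerates every single-substitution variant of a barcode. It is not the UMI-tools implementation: the function name `hamming_neighbours`, the restriction to an `ACGT` alphabet, and the choice of substitutions only (no indels) are assumptions made for illustration.

```python
def hamming_neighbours(barcode, alphabet="ACGT"):
    """Return every sequence exactly one substitution away from `barcode`.

    Hypothetical helper for illustration; assumes substitutions only.
    """
    neighbours = set()
    for i, base in enumerate(barcode):
        for alt in alphabet:
            if alt != base:
                neighbours.add(barcode[:i] + alt + barcode[i + 1:])
    return neighbours


# A barcode of length L has 3 * L single-substitution neighbours,
# whether or not any of them ever appear in the reads.
example = "ACGTACGTACGTACGT"  # made-up 16 nt barcode
candidates = hamming_neighbours(example)
print(len(candidates))
```

With 16 nt barcodes this gives 48 candidate error barcodes per whitelisted barcode, so the unfiltered list grows with the size of the whitelist regardless of what is in the fastq.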
I've no objection, other than to add that I'm not really all that au fait with There is an option already to read a supplied whitelist into
@TomSmithCGAT: yes, your description is pretty accurate. I considered the options you were suggesting, including making the TSV whitelist format myself. Like you said, the utility of having umi-tools generate the error-corrected barcodes is that it would be empirical, based on the data.
I think it would only be empirical in that the list of all possible barcodes that could be corrected would be filtered down to those actually present. I don't think it would make any difference to the results. Where it might have a benefit is that the lists would be smaller, and therefore the extract process might be quicker and use less memory.
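The filtering described here could be sketched roughly as follows, for anyone wanting to build the TSV themselves in the meantime. This is an assumption-laden sketch, not UMI-tools code: `single_substitutions` and `build_whitelist_rows` are hypothetical names, only single substitutions are considered, observed barcodes that sit one substitution from more than one allowed barcode are dropped as ambiguous (a choice made here, not necessarily what UMI-tools does), and the row layout assumes the four tab-separated whitelist columns (barcode, error barcodes, barcode count, error barcode counts), which should be checked against the documentation for your version.

```python
from collections import Counter, defaultdict


def single_substitutions(barcode, alphabet="ACGT"):
    """Yield every sequence one substitution away from `barcode` (hypothetical helper)."""
    for i, base in enumerate(barcode):
        for alt in alphabet:
            if alt != base:
                yield barcode[:i] + alt + barcode[i + 1:]


def build_whitelist_rows(allowed, observed_counts):
    """Map each allowed barcode to the observed barcodes one substitution away."""
    allowed = set(allowed)

    # For each observed, non-allowed barcode, collect the allowed barcodes it could correct to.
    parents = defaultdict(set)
    for cb in allowed:
        for neighbour in single_substitutions(cb):
            if neighbour in observed_counts and neighbour not in allowed:
                parents[neighbour].add(cb)

    # Assumption: keep only unambiguous corrections (exactly one possible parent).
    errors = defaultdict(list)
    for neighbour, cbs in parents.items():
        if len(cbs) == 1:
            errors[next(iter(cbs))].append(neighbour)

    rows = []
    for cb in sorted(allowed):
        errs = sorted(errors[cb])
        rows.append((
            cb,
            ",".join(errs),
            str(observed_counts.get(cb, 0)),
            ",".join(str(observed_counts[e]) for e in errs),
        ))
    return rows


# Made-up counts, as might be tallied from the barcode portion of the reads.
observed = Counter({"AAAA": 100, "AAAT": 3, "AACC": 80, "AAAC": 4, "GGGG": 5, "TACC": 2})
rows = build_whitelist_rows(["AAAA", "AACC"], observed)
for row in rows:
    print("\t".join(row))
```

In this toy input, `AAAC` is one substitution from both allowed barcodes and is therefore left out, and `GGGG` is not near either, so only `AAAT` and `TACC` end up in column 2. The results of extraction should match the unfiltered approach for any barcode that is actually observed; the gain is only a shorter list.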
Hello -
In some workflows that could use umi-tools, we already have an explicit whitelist of the corrected cell barcodes. There is still a need to identify the non-error-corrected cell barcodes.
As I understand umi-tools, one can run the whitelist command and either give it a cell number, or let the tool infer the cell #. Is there any way to provide a list of allowable cell barcodes, and to let umi-tools generate the whitelist TSV to map cellbarcode to error-corrected barcodes?