Confusing option --bc-pattern #673

Open
user-tq opened this issue Dec 23, 2024 · 1 comment

Comments

user-tq commented Dec 23, 2024

My read structure is:

3 bp UMI + 3 bp N + T + insert + A + 3 bp N + 3 bp UMI

so my command is:

umi_tools extract --bc-pattern=XXXNNNN --bc-pattern2=XXXNNNN --ignore-read-pair-suffixes --stdin=raw_1.fq.gz --stdout=r1_pipe.fq.gz --read2-in=raw_2.fq.gz --read2-out=r2_pipe.fq.gz 

but I get:

@E250050783L1C001R0030005922_CATAATGT
GTCTTACATTGGTGAAAGTAACTTTCACATGTTCAAAAACCAAATAAGATGATTTATCTCACCTCCTGCTGATCTTCTTGATTACAACCCAGTAATAGATAAACCAGAATATGTGGAAGAAGATAGATGGTCACTTTGAAATCATG

I don't know why it extracted 4 bp as the UMI:

zcat ./r1_pipe.fq.gz | awk '{print length}' | head
37
146
1
146

As I understand it, the first 7 bp of R1 and of R2 should be excluded from the downstream BWA alignment, while the first 3 bp of each read should be combined as the UMI. How can I achieve this?

@IanSudbery
Member

Dear User-tq,

In barcode patterns entered as strings, Ns are the bases that are taken as the UMI. Your pattern XXXNNNN therefore tells UMI-tools that the read starts with 3 bases that are not part of the UMI but should be retained on the read sequence, followed by 4 bases of UMI. If your reads start with 3 bases of UMI, your bc-pattern needs to start with NNN. X in patterns is for cases where there are bases upstream of the UMI that you nonetheless want to use (for example, for demultiplexing) once the UMI has been removed.
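
To make the N/X distinction concrete, here is a minimal Python sketch of how a string pattern partitions the start of a read. It is an illustration only, not UMI-tools code: `apply_string_pattern` is a hypothetical helper, the read is made up, and only the `N` and `X` characters described above are handled.

```python
def apply_string_pattern(seq, pattern):
    """Split the start of `seq` according to a string-style barcode pattern.

    N -> the base is moved into the UMI and removed from the read
    X -> the base is kept on the read
    Bases beyond the end of the pattern are always kept on the read.
    (Hypothetical helper for illustration; not part of UMI-tools.)
    """
    umi = []
    kept = []
    for base, code in zip(seq, pattern):
        if code == "N":
            umi.append(base)
        elif code == "X":
            kept.append(base)
        else:
            raise ValueError(f"unsupported pattern character: {code!r}")
    # Everything after the patterned prefix stays on the read.
    kept.append(seq[len(pattern):])
    return "".join(umi), "".join(kept)


read = "ACGTTTTGGGGCCCC"  # made-up example read

# Pattern from the question: 3 retained bases, then 4 UMI bases.
umi_a, read_a = apply_string_pattern(read, "XXXNNNN")

# Pattern starting with NNN: the first 3 bases become the UMI,
# but the next 4 bases are still kept on the read.
umi_b, read_b = apply_string_pattern(read, "NNNXXXX")

print(umi_a, read_a)
print(umi_b, read_b)
```

With `XXXNNNN` on both reads, four bases per read go to the UMI, which matches the 8-base `CATAATGT` in the read name above. Note that `NNNXXXX` would take the right 3 bases as the UMI but leave the following 4 bases on the read, so a string pattern alone does not drop the full 7 bp before alignment.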

Alternatively, you can use regular expressions with named UMI and discard groups.

See the description in the documentation: https://umi-tools.readthedocs.io/en/latest/regex.html
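
As a rough illustration of the regex approach for this read layout (3 bp UMI followed by 4 bp to throw away), the Python sketch below uses the standard `re` module with named groups in the style the linked documentation describes (`umi_1`, `discard_1`). The pattern, the helper `split_read`, and the example reads are assumptions for illustration; please check the documentation for the exact group names and for the `--extract-method=regex` option before using such a pattern with `umi_tools extract`.

```python
import re

# Assumed layout at the start of each read: 3 UMI bases, then 4 bases to discard.
PATTERN = re.compile(r"^(?P<umi_1>.{3})(?P<discard_1>.{4})")


def split_read(seq, pattern=PATTERN):
    """Return (umi, remaining_read), or None if the read is too short to match.

    Hypothetical helper for illustration; not part of UMI-tools.
    """
    match = pattern.match(seq)
    if match is None:
        return None
    # The discard group is simply dropped; only the sequence after it is kept.
    return match.group("umi_1"), seq[match.end():]


# Made-up read pair following the layout from the question.
r1 = "CATGGATAAAACCCCGGGG"
r2 = "TGCAACTTTTTGGGGCCCC"

umi1, trimmed_r1 = split_read(r1)
umi2, trimmed_r2 = split_read(r2)

# The UMIs from both reads are joined, as in the read name shown in the question.
combined_umi = umi1 + umi2

print(combined_umi, trimmed_r1, trimmed_r2)
```

Here 7 bases are removed from each read, but only the first 3 of each contribute to the combined 6-base UMI, which is the behaviour asked for in the question.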
