Description of feature
Hi,
I'm testing this pipeline with a view to using it routinely for the many bulk BCR targeted-sequencing datasets generated by the research team where I work.
We have 5' RACE data with UMIs. Our R1 reads consist of a sequence of about 27 nt (slightly variable in length) followed by the UMI + RACE linker, and our R2 reads start directly with the cprimer. I thought of cutting the sequence upstream of the UMI with cutadapt before launching the pipeline, but I realized that this would introduce errors in the analysis: cutadapt would search for the pattern of our 27 nt sequence without taking the UMI + RACE linker into account, which shifts the position of the RACE linker in some of our R1 reads (the same problem occurs if I simply cut 27 nt from the start of every R1).
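
To make the offset problem concrete, here is a toy Python sketch. It is not pRESTO or cutadapt code, and the UMI length (12), the linker sequence, the example reads, and the function names are all placeholders I made up for illustration. It compares cutting a fixed 27 nt with anchoring the cut on the linker and keeping the UMI in front of it.

```python
# Toy illustration only; this does not reproduce MaskPrimers.py or cutadapt.
# UMI_LEN, LINKER and the example reads are hypothetical placeholders.
UMI_LEN = 12
LINKER = "GTACGGG"


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def trim_upstream(read, linker=LINKER, umi_len=UMI_LEN,
                  max_start=50, max_mismatch=1):
    """Locate the linker near the 5' end and return the read starting at the UMI.

    Returns None when no window within max_mismatch of the linker is found.
    """
    best = None  # (mismatches, linker start position)
    last_start = min(max_start, len(read) - len(linker))
    for pos in range(umi_len, last_start + 1):
        d = hamming(read[pos:pos + len(linker)], linker)
        if d <= max_mismatch and (best is None or d < best[0]):
            best = (d, pos)
    if best is None:
        return None
    return read[best[1] - umi_len:]


umi = "TTGCATCGATTC"
insert = "CAGGTGCAGCTGGTG"
read_a = "AC" * 13 + "A" + umi + LINKER + insert  # 27 nt upstream
read_b = "AC" * 12 + "A" + umi + LINKER + insert  # 25 nt upstream

# Fixed cut: correct for read_a, but shifts the UMI by 2 nt for read_b.
fixed_a = read_a[27:]
fixed_b = read_b[27:]

# Anchored cut: both reads now start exactly at the UMI.
anchored_a = trim_upstream(read_a)
anchored_b = trim_upstream(read_b)

print(fixed_b[:UMI_LEN], anchored_b[:UMI_LEN])
```

With a fixed cut, the second read would get a wrong UMI and a shifted linker; with the anchored cut, both reads start at the UMI regardless of the upstream length.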
First, I would like to add the possibility of cutting the sequence upstream of the UMI by searching for the UMI + RACE linker pattern and removing whatever precedes the match. This is possible with `MaskPrimers.py align` in trim mode and a FASTA file containing the UMI + RACE linker pattern. I am new to the analysis of this type of data, and therefore have difficulty understanding the AIRR `library_generation_method` values, but as I understand it we would need to add a new supported `library_generation_method` and a new process, `PRESTO_MASKPRIMERS_ALIGN_TRIM`, specifically for this protocol, which would run `MaskPrimers.py align` in trim mode just before the `PRESTO_MASKPRIMERS_UMI` step. Is that right?
I have already run a successful test: I gave raw reads to the pipeline with the `dt_5p_race_umi` `library_generation_method`, added a `MaskPrimers.py align` trim-mode command directly before the two `MaskPrimers.py score` commands in the `.command.sh` of the `PRESTO_MASKPRIMERS_UMI` cache files, ran the corresponding `.command.run`, and finally resumed the pipeline.
I would also like to add an `AssemblePairs.py join` step to join, end to end, the reads that failed at the `PRESTO_ASSEMBLIES_UMI` step. Likewise, should this new step be reserved for the new supported protocol?
I would also like to know whether it would be easy to add an option for running `AssignGenes.py` with `--format airr` instead of `--format blast`, because there are columns we need in IgBLAST's format 19 (AIRR) output, and I saw that the corresponding process expects a file in `.fmt7` format.
I would be grateful for your advice, and sorry for my poor English.
Justine