include process to estimate error #64

Open
ggabernet opened this issue Jan 25, 2020 · 3 comments
Labels
enhancement New feature or request

Comments

@ggabernet
Member

EstimateError can be used to estimate sequencing and PCR error rates: https://presto.readthedocs.io/en/stable/examples/tasks.html#estimating-sequencing-and-pcr-error-rates-with-umi-data

A process could be added to the pipeline to estimate these error rates per sample.
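To make the idea concrete, here is a minimal Python sketch of how UMI groups can yield an error-rate estimate: reads sharing a UMI are assumed to come from one molecule, so bases that disagree with the group's majority-vote consensus are counted as errors. This is only an illustration of the concept, not pRESTO's actual algorithm; the function name `estimate_error_rate` and the `(umi, sequence)` input format are assumptions made for the example.

```python
from collections import Counter, defaultdict


def estimate_error_rate(reads):
    """Estimate a per-base error rate from (umi, sequence) pairs.

    Hypothetical helper for illustration only; it is not part of pRESTO.
    Sequences are compared position by position, so reads within a UMI
    group are expected to be aligned and of similar length.
    """
    # Group sequences by their UMI.
    groups = defaultdict(list)
    for umi, seq in reads:
        groups[umi].append(seq)

    mismatches = 0
    total = 0
    for seqs in groups.values():
        if len(seqs) < 2:
            continue  # a single read cannot disagree with its own consensus
        # Majority-vote consensus, one column at a time.
        consensus = [Counter(col).most_common(1)[0][0] for col in zip(*seqs)]
        for seq in seqs:
            for base, cons in zip(seq, consensus):
                total += 1
                mismatches += base != cons

    return mismatches / total if total else 0.0
```

A per-sample process would run this kind of estimate once per sample, so that error rates can be compared across samples.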

@ggabernet ggabernet added the enhancement New feature or request label Jul 19, 2021
@StephenWist

StephenWist commented Jan 16, 2025

Hello!

I decided to take a shot at this issue. I've never worked on a pipeline of this complexity, so I'm still figuring out how to correctly define the inputs and outputs, how everything connects, and so on.

So far I've made a process for EstimateError.py and gotten the pipeline to run on a sample of the Stern2014 data. Since it's such a small subset of the data (10K reads) and I'm testing on my wimpy home computer, I did not include the additional EstimateError arguments used in the linked example.

I can keep working on this. I think only R1 is used at the moment, and I still need to get the outputs and logs going where the rest of the Presto outputs/logs go. Currently the output files are being written to the work dirs.

Can you advise whether I put the process in a sensible place? It can run on the FASTQ outputs of PairSeq, PostConsensus_PairSeq, ClusterSets, and Parse_ClusterSets. My thinking was that running EstimateError right after PairSeq is the best choice, but maybe that's not the case.

Here's the code so far, if you'd like to see it: Comparing changes to master branch.

@StephenWist

Logs and outputs are now being sent to a dir like the rest of the Presto outputs, and EstimateError is run on both R1 and R2. However, if I remember correctly, R2 almost never has a UMI. That may mean EstimateError runs unnecessarily on one of the reads, so a flag or check for which read has the UMI could be added.

@StephenWist

EstimateError now runs on R1 or R2 based on the barcode_position variable: https://github.com/nf-core/airrflow/commit/c504ac6c86e77bef7d4eaa23e24cc6a2b2b007b9
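The selection logic can be sketched roughly as follows. This is a Python illustration only; the real change is in the Nextflow code in the linked commit. The function name `select_umi_read`, the example file names, and the assumption that `barcode_position` takes the values `"R1"` or `"R2"` are all hypothetical.

```python
def select_umi_read(r1_path, r2_path, barcode_position):
    """Return the FASTQ path whose reads carry the UMI.

    Hypothetical helper: assumes barcode_position is either "R1" or "R2".
    """
    reads = {"R1": r1_path, "R2": r2_path}
    if barcode_position not in reads:
        raise ValueError(
            f"barcode_position must be 'R1' or 'R2', got {barcode_position!r}"
        )
    return reads[barcode_position]


# Only the read that carries the UMI is passed on to EstimateError.
umi_fastq = select_umi_read("sample_R1.fastq", "sample_R2.fastq", "R1")
```

Failing loudly on an unexpected value avoids silently running EstimateError on a read that has no UMI.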

Still to do (I'm putting this here mainly as notes for myself):
Double-check that running EstimateError where I put it (just after Presto PairSeq UMI) makes sense from a biological perspective.
Also check how running the pipeline with library_generation_method set to any value other than specific_pcr_umi alters the workflow with regard to EstimateError.
