include process to estimate error #64

ggabernet · 2020-01-25T10:45:00Z

EstimateError can be used to estimate sequencing and PCR error rates: https://presto.readthedocs.io/en/stable/examples/tasks.html#estimating-sequencing-and-pcr-error-rates-with-umi-data

A process could be included to estimate these errors per sample.

StephenWist · 2025-01-16T17:26:22Z

Hello!

I decided to take a shot at this issue. I've never worked on a pipeline of this complexity, so I'm still figuring out how to correctly define the inputs and outputs, how everything connects, etc.

What I've managed to do so far is make a process for EstimateError.py and get the pipeline to run on a sample of the Stern2014 data. Since it's such a small subset of the data (10K reads) and I'm testing on my wimpy home computer, I did not include the additional EstimateError arguments used in the example.

I can keep working on this. I think only R1 is used at the moment, and I still need to get the outputs and logs going where the rest of the Presto outputs/logs go. Currently the output files are being written to the work dirs.

Can you advise if I stuck the process in a sensible place? It can run on the fastq outputs of PairSeq, PostConsensus_PairSeq, ClusterSets, and Parse_ClusterSets. My thinking was that running EstimateError right after PairSeq is the best choice, but maybe that's not the case.

Here's the code so far if you'd like to see: Comparing changes to master branch.

StephenWist · 2025-01-22T16:41:22Z

Logs and outputs are now being sent to a dir like the rest of the Presto outputs, and EstimateError is run on both R1 and R2. However, R2 almost never has a UMI, if I remember correctly. That may mean EstimateError runs unnecessarily, so a flag or check for which read has the UMI could be added.

StephenWist · 2025-01-24T15:43:32Z

EstimateError now runs on R1 or R2 based on the barcode_position variable: https://github.com/nf-core/airrflow/commit/c504ac6c86e77bef7d4eaa23e24cc6a2b2b007b9
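For readers who don't want to open the commit, the selection logic can be sketched roughly as below. This is an illustration in Python, not the pipeline's Nextflow code: the function name `select_umi_read` and the accepted values `"R1"` and `"R2"` for `barcode_position` are assumptions made for the example, and the commit above is the authoritative version.

```python
def select_umi_read(barcode_position: str, r1_path: str, r2_path: str) -> str:
    """Return the FASTQ path of the read that carries the UMI.

    Hypothetical helper: assumes barcode_position is "R1" or "R2".
    """
    reads = {"R1": r1_path, "R2": r2_path}
    key = barcode_position.strip().upper()
    if key not in reads:
        # Fail loudly rather than silently running on the wrong read.
        raise ValueError(
            f"barcode_position must be 'R1' or 'R2', got {barcode_position!r}"
        )
    return reads[key]


# Example: only the UMI-bearing read would be passed on to EstimateError.
umi_fastq = select_umi_read("R1", "sample_R1.fastq", "sample_R2.fastq")
print(umi_fastq)
```

Running on a single read avoids the redundant EstimateError pass on the read without a UMI.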

Still to do (I'm putting this here mainly as notes for myself):
- Double check that running EstimateError where I put it (just after Presto PairSeq UMI) makes sense from a biological perspective.
- Check how running the pipeline with library_generation_method set to any value besides specific_pcr_umi alters the workflow with regard to EstimateError.

ggabernet added the enhancement (New feature or request) label on Jul 19, 2021
