Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DMR pair fails with equal number of replicates #165

Closed
dietvin opened this issue Apr 17, 2024 · 3 comments
Closed

DMR pair fails with equal number of replicates #165

dietvin opened this issue Apr 17, 2024 · 3 comments
Labels
bug Something isn't working build-available custom build produced for fix.

Comments

@dietvin
Copy link

dietvin commented Apr 17, 2024

Hello everybody,

using modkit v.0.2.7, I'm currently trying to run DMR pair with three replicates for each sample. I used the following command:

modkit dmr pair \
    -a ${BASEDIR}/modkit_pileups/run1/WT.bed.gz \
    -a ${BASEDIR}/modkit_pileups/run2/WT.bed.gz \
    -a ${BASEDIR}/modkit_pileups/run3/WT.bed.gz \
    -b ${BASEDIR}/modkit_pileups/run1/TREATMENT.bed.gz \
    -b ${BASEDIR}/modkit_pileups/run2/TREATMENT.bed.gz \
    -b ${BASEDIR}/modkit_pileups/run3/TREATMENT.bed.gz \
    -o ${BASEDIR}/modkit_dmr/WT_TREATMENT_replicates.tsv \
    --base C \
    --threads 32 \
    --ref ${REF} \
    -f \
    --log-filepath ${BASEDIR}/modkit_dmr.log

This command fails with the following output:

> reading reference FASTA at "[...]/reference/GRCm39/GRCm39.primary_assembly.genome.fa"
> running single-site analysis
> using default prior, Beta(α: 0.55, β: 0.55)
> estimating max coverages from data
> sampled 536252 a records and 515770 b records, calculating max coverages for 95th percentile
> calculated max coverage for a: 4 and b: 4
> running with replicates and matched samples
thread '<unnamed>' panicked at src/dmr/single_site.rs:457:17:
assertion `left == right` failed: matched samples need to be the same
  left: 2
 right: 1
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread '<unnamed>thread '' panicked at <unnamed>src/dmr/single_site.rs' panicked at :thread 'src/dmr/single_site.rs457thread ':<unnamed>thread ':thread '<unnamed>457thread '' panicked at <unnamed>thread '17<unnamed>thread '' panicked at :thread '<unnamed>src/dmr/single_site.rs' panicked at <unnamed>thread ':
' panicked at thread '<unnamed>src/dmr/single_site.rsthread '17<unnamed>thread '' panicked at :src/dmr/single_site.rs' panicked at assertion `left == right` failed: matched samples need to be the same
  left: 3
 right: 1<unnamed>src/dmr/single_site.rs' panicked at <unnamed>:<unnamed>:
' panicked at <unnamed>src/dmr/single_site.rs457:src/dmr/single_site.rs
thread '' panicked at :src/dmr/single_site.rs' panicked at thread '457' panicked at thread 'assertion `left == right` failed: matched samples need to be the same
  left: 2
 right: 1src/dmr/single_site.rs' panicked at thread '::457:src/dmr/single_site.rs<unnamed>thread '<unnamed>' panicked at src/dmr/single_site.rs:thread '457:<unnamed>thread '17' panicked at :
<unnamed>src/dmr/single_site.rsassertion `left == right` failed: matched samples need to be the same
  left: 2
 right: 1' panicked at :
src/dmr/single_site.rs457::45717::
thread '17assertion `left == right` failed: matched samples need to be the same
  left: 2
 right: 1<unnamed>:
thread '
' panicked at assertion `left == right` failed: matched samples need to be the same
  left: 1
 right: 2thread 'thread '<unnamed>src/dmr/single_site.rsthread '<unnamed>' panicked at src/dmr/single_site.rs:457:17:
assertion `left == right` failed: matched samples need to be the same
  left: 3
 right: 1
thread 'thread '<unnamed><unnamed>' panicked at ' panicked at src/dmr/single_site.rssrc/dmr/single_site.rs::457457::1717:
:
assertion `left == right` failed: matched samples need to be the same
  left: 2
 right: 1assertion `left == right` failed: matched samples need to be the same
  left: 3
 right: 2

[...]

thread '<unnamed>:
assertion `left == right` failed: matched samples need to be the same
  left: 2
 right: 1
' panicked at src/dmr/single_site.rs:45717:
assertion `left == right` failed: matched samples need to be the same
  left: 2
 right: 3
src/dmr/single_site.rs:17:
assertion `left == right` failed: matched samples need to be the same
  left: 2
 right: 3
assertion `left == right` failed: matched samples need to be the same
  left: 3
 right: 2
:17' panicked at src/dmr/single_site.rs:457:17:
assertion `left == right` failed: matched samples need to be the same
  left: 3
 right: 2
:17457:17:
assertion `left == right` failed: matched samples need to be the same
  left: 3
 right: 1
:
assertion `left == right` failed: matched samples need to be the same
  left: 2
 right: 3
:
assertion `left == right` failed: matched samples need to be the same
  left: 2
 right: 3
Rayon: detected unexpected panic; aborting
04_modkit_pileup_dmr.sh: line 62: 902121 Aborted                 (core dumped) modkit dmr pair -a ${BASEDIR}/modkit_pileups/run1/WT.bed.gz -a ${BASEDIR}/modkit_pileups/run2/WT.bed.gz -a ${BASEDIR}/modkit_pileups/run3/WT.bed.gz -b ${BASEDIR}/modkit_pileups/run1/TREATMENT.bed.gz -b ${BASEDIR}/modkit_pileups/run2/TREATMENT.bed.gz -b ${BASEDIR}/modkit_pileups/run3/TREATMENT.bed.gz -o ${BASEDIR}/modkit_dmr/WT_TREATMENT_replicates.tsv --base C --threads 32 --ref ${REF} -f --log-filepath ${BASEDIR}/modkit_dmr.log

From what I saw in the output, it consists only of the "assertion `left == right` failed" message. I can provide the entire output in case it is of use here.

Interestingly, this error happens only if the number of replicates is equal between the samples. Once the number of replicates differ between the samples it works as expected. Here it doesn't matter if I remove a replicate from sample A or B. For example removing the third replicate from sample B:

modkit dmr pair \
    -a ${BASEDIR}/modkit_pileups/run1/WT.bed.gz \
    -a ${BASEDIR}/modkit_pileups/run2/WT.bed.gz \
    -a ${BASEDIR}/modkit_pileups/run3/WT.bed.gz \
    -b ${BASEDIR}/modkit_pileups/run1/TREATMENT.bed.gz \
    -b ${BASEDIR}/modkit_pileups/run2/TREATMENT.bed.gz \
    -o ${BASEDIR}/modkit_dmr/WT_TREATMENT_replicates.tsv \
    --base C \
    --threads 32 \
    --ref ${REF}\
    -f \
    --log-filepath ${BASEDIR}/modkit_dmr.log

This produces the following output:

> reading reference FASTA at "[...]/reference/GRCm39/GRCm39.primary_assembly.genome.fa"
> running single-site analysis
> using default prior, Beta(α: 0.55, β: 0.55)
> estimating max coverages from data
> sampled 536252 a records and 293462 b records, calculating max coverages for 95th percentile
> calculated max coverage for a: 4 and b: 3
> running with replicates, but not matched samples
> finished, processed 29254455 sites successfully, 0 failed

I used the following function to create the bed files used as input here:

function prep_sample {
    bam=$1
    pileup=$2
    echo "Reading from ${bam}... Writing to ${pileup}"
    mkdir -p $(dirname ${pileup})
    modkit pileup ${bam} ${pileup} \
        -t 32 -r ${REF} --max-depth 1000000 --cpg --combine-mods
    bgzip -@ 32 -c ${pileup} > ${pileup}.gz
    tabix -p bed ${pileup}.gz
}

Please let me know if you need more information. Any help is appreciated.

Thanks in advance!
Vincent

@ArtRand
Copy link
Contributor

ArtRand commented Apr 17, 2024

Hello @dietvin,

Unfortunately this is a stale assert that I missed. If you have a different number of a and b samples you won't use the buggy code path (as a temporary work-around). I'll fix the bug and get you a build asap.

@ArtRand ArtRand added the bug Something isn't working label Apr 17, 2024
@ArtRand
Copy link
Contributor

ArtRand commented Apr 18, 2024

Hello @dietvin,

I've started working on a fix for this. In the mean time, I've attached a build to this thread that should at least unblock your work.

The panic comes from an assert firing where the code requires that you have modification counts for all of your samples at a given position. This way, you can calculate a MAP-based p-value and an effect size for all of the "matched pairs" of samples. When one sample doesn't have counts at a position (due to low coverage) the assert fires and you have a problem. The build will still report the per-replicate p-values and per-replicate effect sizes, but the entry will be "-" when at least 1 sample doesn't have counts at that position. The production solution is to calculate the per-replicate statistics when possible, but skip pairs where one member of the pair doesn't have data. I'm working on that next.

modkit_devd5d6015_centos7_x86_64.tar.gz

@ArtRand ArtRand added the build-available custom build produced for fix. label Apr 18, 2024
@dietvin
Copy link
Author

dietvin commented Apr 18, 2024

Thank you for the quick response. With the build you provided it works as described. Much appreciated!

@dietvin dietvin closed this as completed Apr 18, 2024
ArtRand added a commit that referenced this issue Apr 27, 2024
Bug: don't panic when samples are matched but counts cannot be found for all samples. (GH-165)

See merge request machine-learning/modkit!173
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working build-available custom build produced for fix.
Projects
None yet
Development

No branches or pull requests

2 participants