DMR pair fails with equal number of replicates #165

dietvin · 2024-04-17T07:09:44Z

Hello everybody,

using modkit v.0.2.7, I'm currently trying to run DMR pair with three replicates for each sample. I used the following command:

modkit dmr pair \
    -a ${BASEDIR}/modkit_pileups/run1/WT.bed.gz \
    -a ${BASEDIR}/modkit_pileups/run2/WT.bed.gz \
    -a ${BASEDIR}/modkit_pileups/run3/WT.bed.gz \
    -b ${BASEDIR}/modkit_pileups/run1/TREATMENT.bed.gz \
    -b ${BASEDIR}/modkit_pileups/run2/TREATMENT.bed.gz \
    -b ${BASEDIR}/modkit_pileups/run3/TREATMENT.bed.gz \
    -o ${BASEDIR}/modkit_dmr/WT_TREATMENT_replicates.tsv \
    --base C \
    --threads 32 \
    --ref ${REF} \
    -f \
    --log-filepath ${BASEDIR}/modkit_dmr.log

This command fails with the following output:

> reading reference FASTA at "[...]/reference/GRCm39/GRCm39.primary_assembly.genome.fa"
> running single-site analysis
> using default prior, Beta(α: 0.55, β: 0.55)
> estimating max coverages from data
> sampled 536252 a records and 515770 b records, calculating max coverages for 95th percentile
> calculated max coverage for a: 4 and b: 4
> running with replicates and matched samples
thread '<unnamed>' panicked at src/dmr/single_site.rs:457:17:
assertion `left == right` failed: matched samples need to be the same
  left: 2
 right: 1
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread '<unnamed>thread '' panicked at <unnamed>src/dmr/single_site.rs' panicked at :thread 'src/dmr/single_site.rs457thread ':<unnamed>thread ':thread '<unnamed>457thread '' panicked at <unnamed>thread '17<unnamed>thread '' panicked at :thread '<unnamed>src/dmr/single_site.rs' panicked at <unnamed>thread ':
' panicked at thread '<unnamed>src/dmr/single_site.rsthread '17<unnamed>thread '' panicked at :src/dmr/single_site.rs' panicked at assertion `left == right` failed: matched samples need to be the same
  left: 3
 right: 1<unnamed>src/dmr/single_site.rs' panicked at <unnamed>:<unnamed>:
' panicked at <unnamed>src/dmr/single_site.rs457:src/dmr/single_site.rs
thread '' panicked at :src/dmr/single_site.rs' panicked at thread '457' panicked at thread 'assertion `left == right` failed: matched samples need to be the same
  left: 2
 right: 1src/dmr/single_site.rs' panicked at thread '::457:src/dmr/single_site.rs<unnamed>thread '<unnamed>' panicked at src/dmr/single_site.rs:thread '457:<unnamed>thread '17' panicked at :
<unnamed>src/dmr/single_site.rsassertion `left == right` failed: matched samples need to be the same
  left: 2
 right: 1' panicked at :
src/dmr/single_site.rs457::45717::
thread '17assertion `left == right` failed: matched samples need to be the same
  left: 2
 right: 1<unnamed>:
thread '
' panicked at assertion `left == right` failed: matched samples need to be the same
  left: 1
 right: 2thread 'thread '<unnamed>src/dmr/single_site.rsthread '<unnamed>' panicked at src/dmr/single_site.rs:457:17:
assertion `left == right` failed: matched samples need to be the same
  left: 3
 right: 1
thread 'thread '<unnamed><unnamed>' panicked at ' panicked at src/dmr/single_site.rssrc/dmr/single_site.rs::457457::1717:
:
assertion `left == right` failed: matched samples need to be the same
  left: 2
 right: 1assertion `left == right` failed: matched samples need to be the same
  left: 3
 right: 2

[...]

thread '<unnamed>:
assertion `left == right` failed: matched samples need to be the same
  left: 2
 right: 1
' panicked at src/dmr/single_site.rs:45717:
assertion `left == right` failed: matched samples need to be the same
  left: 2
 right: 3
src/dmr/single_site.rs:17:
assertion `left == right` failed: matched samples need to be the same
  left: 2
 right: 3
assertion `left == right` failed: matched samples need to be the same
  left: 3
 right: 2
:17' panicked at src/dmr/single_site.rs:457:17:
assertion `left == right` failed: matched samples need to be the same
  left: 3
 right: 2
:17457:17:
assertion `left == right` failed: matched samples need to be the same
  left: 3
 right: 1
:
assertion `left == right` failed: matched samples need to be the same
  left: 2
 right: 3
:
assertion `left == right` failed: matched samples need to be the same
  left: 2
 right: 3
Rayon: detected unexpected panic; aborting
04_modkit_pileup_dmr.sh: line 62: 902121 Aborted                 (core dumped) modkit dmr pair -a ${BASEDIR}/modkit_pileups/run1/WT.bed.gz -a ${BASEDIR}/modkit_pileups/run2/WT.bed.gz -a ${BASEDIR}/modkit_pileups/run3/WT.bed.gz -b ${BASEDIR}/modkit_pileups/run1/TREATMENT.bed.gz -b ${BASEDIR}/modkit_pileups/run2/TREATMENT.bed.gz -b ${BASEDIR}/modkit_pileups/run3/TREATMENT.bed.gz -o ${BASEDIR}/modkit_dmr/WT_TREATMENT_replicates.tsv --base C --threads 32 --ref ${REF} -f --log-filepath ${BASEDIR}/modkit_dmr.log

From what I saw in the output, it consists only of the "assertion `left == right` failed" message. I can provide the entire output in case it is of use here.

Interestingly, this error happens only if the number of replicates is equal between the samples. Once the number of replicates differ between the samples it works as expected. Here it doesn't matter if I remove a replicate from sample A or B. For example removing the third replicate from sample B:

modkit dmr pair \
    -a ${BASEDIR}/modkit_pileups/run1/WT.bed.gz \
    -a ${BASEDIR}/modkit_pileups/run2/WT.bed.gz \
    -a ${BASEDIR}/modkit_pileups/run3/WT.bed.gz \
    -b ${BASEDIR}/modkit_pileups/run1/TREATMENT.bed.gz \
    -b ${BASEDIR}/modkit_pileups/run2/TREATMENT.bed.gz \
    -o ${BASEDIR}/modkit_dmr/WT_TREATMENT_replicates.tsv \
    --base C \
    --threads 32 \
    --ref ${REF}\
    -f \
    --log-filepath ${BASEDIR}/modkit_dmr.log

This produces the following output:

> reading reference FASTA at "[...]/reference/GRCm39/GRCm39.primary_assembly.genome.fa"
> running single-site analysis
> using default prior, Beta(α: 0.55, β: 0.55)
> estimating max coverages from data
> sampled 536252 a records and 293462 b records, calculating max coverages for 95th percentile
> calculated max coverage for a: 4 and b: 3
> running with replicates, but not matched samples
> finished, processed 29254455 sites successfully, 0 failed

I used the following function to create the bed files used as input here:

function prep_sample {
    bam=$1
    pileup=$2
    echo "Reading from ${bam}... Writing to ${pileup}"
    mkdir -p $(dirname ${pileup})
    modkit pileup ${bam} ${pileup} \
        -t 32 -r ${REF} --max-depth 1000000 --cpg --combine-mods
    bgzip -@ 32 -c ${pileup} > ${pileup}.gz
    tabix -p bed ${pileup}.gz
}

Please let me know if you need more information. Any help is appreciated.

Thanks in advance!
Vincent

The text was updated successfully, but these errors were encountered:

ArtRand · 2024-04-17T13:10:38Z

Hello @dietvin,

Unfortunately this is a stale assert that I missed. If you have a different number of a and b samples you won't use the buggy code path (as a temporary work-around). I'll fix the bug and get you a build asap.

ArtRand · 2024-04-18T00:47:22Z

Hello @dietvin,

I've started working on a fix for this. In the mean time, I've attached a build to this thread that should at least unblock your work.

The panic comes from an assert firing where the code requires that you have modification counts for all of your samples at a given position. This way, you can calculate a MAP-based p-value and an effect size for all of the "matched pairs" of samples. When one sample doesn't have counts at a position (due to low coverage) the assert fires and you have a problem. The build will still report the per-replicate p-values and per-replicate effect sizes, but the entry will be "-" when at least 1 sample doesn't have counts at that position. The production solution is to calculate the per-replicate statistics when possible, but skip pairs where one member of the pair doesn't have data. I'm working on that next.

modkit_devd5d6015_centos7_x86_64.tar.gz

dietvin · 2024-04-18T05:46:06Z

Thank you for the quick response. With the build you provided it works as described. Much appreciated!

…for all samples. (GH-165)

Bug: don't panic when samples are matched but counts cannot be found for all samples. (GH-165) See merge request machine-learning/modkit!173

ArtRand added the bug Something isn't working label Apr 17, 2024

ArtRand added the build-available custom build produced for fix. label Apr 18, 2024

dietvin closed this as completed Apr 18, 2024

ArtRand mentioned this issue Apr 19, 2024

modkit dmr pair - no significance values in output #168

Closed

ArtRand added a commit that referenced this issue Apr 27, 2024

Bug: don't panic when samples are matched but counts cannot be found …

af028b9

…for all samples. (GH-165)

ArtRand added a commit that referenced this issue Apr 27, 2024

Merge branch 'ar/gh-165' into 'master'

3a19d8c

Bug: don't panic when samples are matched but counts cannot be found for all samples. (GH-165) See merge request machine-learning/modkit!173

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DMR pair fails with equal number of replicates #165

DMR pair fails with equal number of replicates #165

dietvin commented Apr 17, 2024

ArtRand commented Apr 17, 2024 •

edited

Loading

ArtRand commented Apr 18, 2024

dietvin commented Apr 18, 2024

DMR pair fails with equal number of replicates #165

DMR pair fails with equal number of replicates #165

Comments

dietvin commented Apr 17, 2024

ArtRand commented Apr 17, 2024 • edited Loading

ArtRand commented Apr 18, 2024

dietvin commented Apr 18, 2024

ArtRand commented Apr 17, 2024 •

edited

Loading