anota2seq fails with more than 2 levels in the samplesheet phenotype / contrast variable #89

FelixKrueger · 2025-01-30T13:45:35Z

Description of feature

Background: We used a public dataset with matched Ribo-seq and RNA-seq data and tried to get the anota2seq step to work (by "we" I mostly mean my colleague @naiarabediaga).

We used the following contrast file:

id,variable,reference,target,batch,pair
KI_LIF_vs_WT,treatment,WT_LIF,KI_LIF,,pair
KI_LIF2i_vs_WT,treatment,WT_LIF2i,KI_LIF2i,,pair

This fails because anota2seq only accepts one contrast per file, which had already been pointed out in this bug report. Splitting the contrast file into two single-contrast files has the potential to work:

contrast_Amiri1.csv
::::::::::::::
id,variable,reference,target,batch,pair
KI_LIF_vs_WT,treatment,WT_LIF,KI_LIF,,pair

contrast_Amiri2.csv
::::::::::::::
id,variable,reference,target,batch,pair
KI_LIF2i_vs_WT,treatment,WT_LIF2i,KI_LIF2i,,pair

By looking at the anota2seq job that gets run, it looks like the Nextflow logic extracts the contrast information from the contrast file, and submits this as a meta flag to the run ([id:KI_LIF_vs_WT, variable:treatment, reference:WT_LIF, target:KI_LIF, batch:, pair:pair]):

'NFCORE_RIBOSEQ:RIBOSEQ:ANOTA2SEQ_ANOTA2SEQRUN ([id:KI_LIF_vs_WT, variable:treatment, reference:WT_LIF, target:KI_LIF, batch:, pair:pair])'

I wonder if the logic within ANOTA2SEQ_ANOTA2SEQRUN could be changed to extract this information for each line and submit each one as a separate job, similar to re-running the job multiple times with only one line of contrast information? As a test with the test data, couldn't we simply reuse the same line as a second contrast to check that it launches 2 ANOTA2SEQ jobs?

As a side note, this also failed because anota2seq requires the levels in the contrast file to be sorted, but this has already been addressed here: nf-core/modules#7395.

Many thanks!


pinin4fjords · 2025-01-30T13:58:12Z

I think the linked bug report has to do with replicate numbers, so may not be directly relevant.

But the intended functionality is, I think, as you suggest- the process should run multiple times, once for each contrast. We need to understand why that isn't happening.

Could you post the contrast file used here please, and the associated Nextflow logs for an example run? I'd like to discount the possibility of the splitting logic failing due to e.g. bad line endings.
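For anyone wanting to check this locally, here is a minimal sketch of such a line-ending check. It is not part of the pipeline; the function name `inspect_line_endings` is made up for illustration, and it only reports what the raw bytes look like.

```python
from pathlib import Path
import sys


def inspect_line_endings(raw: bytes) -> dict:
    """Summarise byte-level properties that can confuse CSV row splitting.

    Hypothetical helper for illustration only; not part of nf-core/riboseq.
    """
    crlf = raw.count(b"\r\n")           # Windows-style line endings
    lone_cr = raw.count(b"\r") - crlf   # bare carriage returns (old Mac style)
    lone_lf = raw.count(b"\n") - crlf   # Unix-style line endings
    return {
        "crlf": crlf,
        "lone_cr": lone_cr,
        "lone_lf": lone_lf,
        "ends_with_newline": raw.endswith((b"\n", b"\r")),
        "has_bom": raw.startswith(b"\xef\xbb\xbf"),  # UTF-8 byte order mark
    }


if __name__ == "__main__":
    # Usage: python check_line_endings.py contrasts.csv [more files...]
    for name in sys.argv[1:]:
        print(name, inspect_line_endings(Path(name).read_bytes()))
```

A file with only `lone_lf` counts and no BOM is plain Unix text; a missing trailing newline shows up as `ends_with_newline: False`.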

FelixKrueger · 2025-01-30T15:58:11Z

Here is the contrast file. To me it looks like ASCII text; the last line has no newline character.

contrast_Amiri.csv

This is the nextflow.log

nf-hIr0jcq0zluyJ.log

pinin4fjords · 2025-01-30T16:25:14Z

OK, I suspect there's some confusion here from anota2seq's messaging. The process is definitely just receiving information for one contrast:

  opt <- list(
      output_prefix = ifelse('null' == 'null', 'KI_LIF_vs_WT', 'null'),
      count_file = 'salmon.merged.gene_counts_length_scaled.tsv',
      sample_file = 'samplesheet_anota2seq_Amiri.csv',
      sample_treatment_col = 'treatment',
      reference_level = 'WT_LIF',
      target_level = 'KI_LIF',
      sample_id_col = "sample",

So that's not the issue.

naiarabediaga · 2025-01-30T16:54:41Z

Issue #66 reports two problems. The first relates to the pipeline's supposed limitation in handling more than one comparison. As you mentioned, it picks up the first contrast correctly, as shown in the lines you've shared. That said, I haven't been able to test whether it can handle two comparisons, since the program crashes when attempting to run the first one. Based on what I see in the anota2seqrun.r script, it doesn't seem to support multiple comparisons. If that's the case, I believe it would be useful to introduce this functionality, allowing the pipeline to process as many comparisons as specified in the contrast file.

The second issue in #66 pertains to the number of replicates. I suspect this may be a limitation of anota2seq, which is fine. However, if that's the case, it would be helpful to include this in the documentation. Thank you!

pinin4fjords · 2025-01-30T17:00:31Z

I think this is basically the same as the issue solved in #91 in response to #90.

pinin4fjords · 2025-01-30T17:01:12Z

Please do make PRs to documentation, happy to review.

naiarabediaga · 2025-01-30T17:08:44Z

Thank you Jonathan. I am sorry if I have misunderstood something, but I am not sure that setting opt$subset_to_contrast_samples to FALSE will fix the multiple comparison issue. Do you think that the pipeline will perform two comparisons (i.e. run anota2seqrun.r twice) given the contrast matrix below?

id,variable,reference,target,batch,pair
KI_LIF_vs_WT,treatment,WT_LIF,KI_LIF,,pair
KI_LIF2i_vs_WT,treatment,WT_LIF2i,KI_LIF2i,,pair


-- Naiara Garcia Bediaga, PhD

pinin4fjords · 2025-01-30T17:24:07Z

@naiarabediaga the point is that I think you're misinterpreting the error: the messages from anota2seq are confusing, and this has nothing to do with the contrasts as the workflow understands them.

If you look at the code that produces this error it's this:

        if (dim(contrasts)[2] != (nPheno - 1)) {
            if (dim(contrasts)[2] > (nPheno - 1)) {
                
                stop("Too many custom contrasts supplied.\nPlease check your contrast matrix.\n")
            }
            if (dim(contrasts)[2] < (nPheno - 1)) {
                
                stop("Too few custom contrasts supplied.\nPlease check your contrast matrix.\n")
            }
        }

We're always specifying a single contrast (the module has no capacity to do anything else), so dim(contrasts)[2] is 1. What the check is really saying, then, is that there can only be 2 levels in the phenotype vector. That means we need to subset the sample sheet to only the samples for the contrast at hand, which is what we force in #91.
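To make that arithmetic concrete, here is a rough sketch in Python rather than R. It only mirrors the comparison in the snippet above; the sample IDs and helper names are invented for illustration, and the subsetting function is an assumption about what "subset to the contrast at hand" means, not the code from #91.

```python
def levels_in(samples, column):
    """Distinct values of `column` across the sample sheet rows, sorted."""
    return sorted({row[column] for row in samples})


def contrast_count_problem(n_contrasts, n_pheno):
    """Mirror the quoted check: exactly n_pheno - 1 contrasts are expected."""
    expected = n_pheno - 1
    if n_contrasts > expected:
        return "Too many custom contrasts supplied."
    if n_contrasts < expected:
        return "Too few custom contrasts supplied."
    return None  # the check passes


def subset_to_contrast(samples, column, reference, target):
    """Keep only the samples whose level is part of this contrast."""
    keep = {reference, target}
    return [row for row in samples if row[column] in keep]


# Hypothetical sample sheet: one row per treatment level from this thread.
SAMPLES = [
    {"sample": "s1", "treatment": "WT_LIF"},
    {"sample": "s2", "treatment": "KI_LIF"},
    {"sample": "s3", "treatment": "WT_LIF2i"},
    {"sample": "s4", "treatment": "KI_LIF2i"},
]

# Full sheet: 4 levels but a single contrast, so the check complains.
all_levels = levels_in(SAMPLES, "treatment")
print(len(all_levels), contrast_count_problem(1, len(all_levels)))

# Subset sheet: 2 levels and a single contrast, so the check passes.
subset = subset_to_contrast(SAMPLES, "treatment", "WT_LIF", "KI_LIF")
sub_levels = levels_in(subset, "treatment")
print(len(sub_levels), contrast_count_problem(1, len(sub_levels)))
```

With all four levels present, one contrast is fewer than the three the check expects, which is why the message talks about "too few" contrasts even though the real problem is too many levels.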

naiarabediaga · 2025-01-30T17:51:04Z

Indeed, the messages from anota2seq can be quite confusing. I agree that the error message 'Too few custom contrasts supplied. Please check your contrast matrix' (see below) is unrelated to the issue of multiple comparisons and has to do with the "leveling" of the contrast matrix mentioned in #68. I think this has already been addressed, and it was closed today.

[image: Screenshot 2025-01-30 at 17.30.55.png]

The opt$subset_to_contrast_samples setting was another important issue that made anota2seq crash, but I think you have already solved it, right? Both the table of counts and the sample sheet must be subset to include only the samples involved in the contrast.

Regarding the multiple comparisons issue, it's unrelated to the errors we've been encountering. It's more that certain lines in the documentation led me to believe the pipeline could somehow loop through more than one comparison, for example:

"To carry out this analysis, the pipeline must be supplied with one or more 'contrasts' describing the comparison to be made."

But I am happy with one comparison if this functionality (something like a loop for more than one comparison 😬) cannot be introduced. Thank you so much!


pinin4fjords · 2025-01-30T18:03:56Z

@naiarabediaga the documentation is correct: the pipeline will indeed loop over the multiple contrasts you provide, by running multiple iterations of the ANOTA2SEQ process. It does that by splitting the contrasts file you provide. The process itself simply works with a single contrast (not a contrast file) at a time.

It died here only because of the problem with the first iteration. Once that issue is solved you should see those multiple contrasts being analysed, one by one.
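As an illustration of the idea only (the pipeline does this in Nextflow, and this is not its actual code), splitting a contrasts file into one record per row looks roughly like the Python sketch below. The function name `split_contrasts` is hypothetical; the CSV content is the contrast file from this issue.

```python
import csv
import io

# The two-row contrasts file posted at the top of this issue.
CONTRASTS = """id,variable,reference,target,batch,pair
KI_LIF_vs_WT,treatment,WT_LIF,KI_LIF,,pair
KI_LIF2i_vs_WT,treatment,WT_LIF2i,KI_LIF2i,,pair
"""


def split_contrasts(text):
    """Return one dict per contrast row, keyed by the CSV header."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


# Each record corresponds to one independent run of the process.
for meta in split_contrasts(CONTRASTS):
    print(f"one run for {meta['id']}: {meta['target']} vs {meta['reference']}")
```

Each dict has the same keys as the meta map visible in the process name quoted earlier (id, variable, reference, target, batch, pair), so a two-row file should yield two separate runs.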

FelixKrueger · 2025-01-31T15:09:14Z

The multi-contrast issue has now been fixed (#94), and is making its way into the 1.1.0 release as we speak.

I think we can close this issue now.

FelixKrueger added the enhancement New feature or request label Jan 30, 2025

pinin4fjords changed the title ~~Perform translational efficiency analysis with anota2seq for several contrasts~~ anota2seq fails with more than 2 levels in the samplesheet phenotype / contrast variable Jan 30, 2025

FelixKrueger closed this as completed Jan 31, 2025
