anota2seq fails with more than 2 levels in the samplesheet phenotype / contrast variable #89

Closed
FelixKrueger opened this issue Jan 30, 2025 · 11 comments
Labels
enhancement New feature or request


@FelixKrueger
Contributor

FelixKrueger commented Jan 30, 2025

Description of feature

Background: We used a public dataset with matched Ribo-seq and RNA-seq and tried to get the anota2seq step to work (by "we" I mostly mean my colleague @naiarabediaga).

We used the following contrast file:

```
id,variable,reference,target,batch,pair
KI_LIF_vs_WT,treatment,WT_LIF,KI_LIF,,pair
KI_LIF2i_vs_WT,treatment,WT_LIF2i,KI_LIF2i,,pair
```

This fails because anota2seq only accepts one contrast per file, as already pointed out in this bug report. Splitting the contrast file into two single-contrast files might work:

contrast_Amiri1.csv:

```
id,variable,reference,target,batch,pair
KI_LIF_vs_WT,treatment,WT_LIF,KI_LIF,,pair
```

contrast_Amiri2.csv:

```
id,variable,reference,target,batch,pair
KI_LIF2i_vs_WT,treatment,WT_LIF2i,KI_LIF2i,,pair
```
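For reference, a manual split like this can be scripted. This is a minimal Python sketch, assuming a plain comma-separated contrasts file with a header row; `split_contrast_text`, `write_split_contrasts`, and the `contrast_<id>.csv` naming are hypothetical (the files here were named by hand) and are not part of the pipeline.

```python
import csv
import io
from pathlib import Path


def split_contrast_text(text: str) -> dict[str, str]:
    """Return {filename: csv_text}, one single-contrast CSV per data row.

    Hypothetical naming scheme: each file is called contrast_<id>.csv.
    """
    reader = csv.DictReader(io.StringIO(text))
    files = {}
    for row in reader:
        buf = io.StringIO()
        # lineterminator="\n" avoids the csv module's default "\r\n" endings
        writer = csv.DictWriter(buf, fieldnames=reader.fieldnames, lineterminator="\n")
        writer.writeheader()  # every output file keeps the original header
        writer.writerow(row)
        files[f"contrast_{row['id']}.csv"] = buf.getvalue()
    return files


def write_split_contrasts(contrasts_path: str, out_dir: str) -> list[Path]:
    """Read a multi-contrast CSV and write one file per contrast into out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for name, content in split_contrast_text(Path(contrasts_path).read_text()).items():
        target = out / name
        target.write_text(content)
        written.append(target)
    return written
```

Keeping the pure text-splitting function separate from the file writing makes the split easy to check without touching the filesystem.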

Looking at the anota2seq job that gets run, it seems the Nextflow logic extracts the contrast information from the contrast file and passes it to the process as a meta map ([id:KI_LIF_vs_WT, variable:treatment, reference:WT_LIF, target:KI_LIF, batch:, pair:pair]):

'NFCORE_RIBOSEQ:RIBOSEQ:ANOTA2SEQ_ANOTA2SEQRUN ([id:KI_LIF_vs_WT, variable:treatment, reference:WT_LIF, target:KI_LIF, batch:, pair:pair])'

I wonder if the logic within ANOTA2SEQ_ANOTA2SEQRUN could be changed to extract this information for each line and submit each line as a separate job, similar to re-running the job multiple times with only one line of contrast information. As a test with the test data, couldn't we simply re-use the same line as a second contrast to check that it launches two ANOTA2SEQ jobs?
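The per-contrast fan-out described here can be illustrated with a rough Python analogue, assuming each data row of the contrasts file becomes its own meta dictionary (in Nextflow this kind of split is commonly done with `splitCsv(header: true)`, but this is not a copy of the pipeline's code). `run_anota2seq` is a hypothetical stub that only records which contrast it would receive.

```python
import csv
import io


def contrast_metas(text: str) -> list[dict[str, str]]:
    """Turn each data row of a contrasts CSV into its own meta dict."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def run_anota2seq(meta: dict[str, str]) -> str:
    """Hypothetical stand-in for one ANOTA2SEQ_ANOTA2SEQRUN task."""
    return f"ANOTA2SEQ_ANOTA2SEQRUN ({meta['id']}: {meta['reference']} -> {meta['target']})"


def launch_all(text: str) -> list[str]:
    # One task per contrast row: a file with N data rows yields N tasks.
    return [run_anota2seq(meta) for meta in contrast_metas(text)]
```

With the same line repeated twice, as suggested in the comment, `launch_all` returns two entries, i.e. two tasks, each seeing a single contrast.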

As a side note, this also failed because anota2seq requires the levels in the contrast file to be sorted, but this has already been addressed here: nf-core/modules#7395.

Many thanks!

FelixKrueger added the enhancement (New feature or request) label Jan 30, 2025
@pinin4fjords
Member

pinin4fjords commented Jan 30, 2025

I think the linked bug report has to do with replicate numbers, so it may not be directly relevant.

But the intended functionality is, I think, as you suggest: the process should run multiple times, once for each contrast. We need to understand why that isn't happening.

Could you post the contrast file used here, along with the associated Nextflow logs for an example run? I'd like to rule out the possibility of the splitting logic failing due to, e.g., bad line endings.
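A quick way to rule out the line-ending question is to inspect the file's raw bytes. The sketch below is a hypothetical Python helper (`line_ending_report` and `check_file` are illustrative names, not pipeline code); it only reports what it finds and makes no claim about which endings the pipeline's CSV parsing tolerates.

```python
def line_ending_report(data: bytes) -> dict[str, bool]:
    """Summarise the line-ending features of a file's raw bytes."""
    return {
        # Windows-style CRLF endings can leave a stray '\r' in the last column
        "has_crlf": b"\r\n" in data,
        # Classic Mac-style bare CR endings (a '\r' not followed by '\n')
        "has_bare_cr": b"\r" in data.replace(b"\r\n", b""),
        # Whether the final line is terminated with a newline
        "ends_with_newline": data.endswith(b"\n"),
    }


def check_file(path: str) -> dict[str, bool]:
    # Read in binary mode so Python does not translate line endings.
    with open(path, "rb") as handle:
        return line_ending_report(handle.read())
```

Reading in binary mode matters here: text mode would silently normalise `\r\n` to `\n` and hide exactly the problem being checked for.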

@FelixKrueger
Contributor Author

Here is the contrast file. To me it looks like plain ASCII text; the last line has no newline character.

contrast_Amiri.csv

And this is the Nextflow log:

nf-hIr0jcq0zluyJ.log

@pinin4fjords
Member

OK, I suspect anota2seq's error messages are causing some confusion here. The process is definitely receiving information for just one contrast:

```r
  opt <- list(
      output_prefix = ifelse('null' == 'null', 'KI_LIF_vs_WT', 'null'),
      count_file = 'salmon.merged.gene_counts_length_scaled.tsv',
      sample_file = 'samplesheet_anota2seq_Amiri.csv',
      sample_treatment_col = 'treatment',
      reference_level = 'WT_LIF',
      target_level = 'KI_LIF',
      sample_id_col = "sample",
```

So that's not the issue.

@naiarabediaga

naiarabediaga commented Jan 30, 2025 via email

@pinin4fjords
Copy link
Member

I think this is basically the same as the issue solved in #91 in response to #90.

@pinin4fjords
Member

Please do open PRs against the documentation; I'm happy to review.

@naiarabediaga

naiarabediaga commented Jan 30, 2025 via email

@pinin4fjords
Copy link
Member

pinin4fjords commented Jan 30, 2025

@naiarabediaga, the point is that I think you're misinterpreting the error: the messages from anota2seq are confusing, and this has nothing to do with contrasts as the workflow understands them.

This is the code that produces the error:

```r
        # nPheno: number of distinct levels in the phenotype vector
        # dim(contrasts)[2]: number of contrasts (columns) in the contrast matrix
        if (dim(contrasts)[2] != (nPheno - 1)) {
            if (dim(contrasts)[2] > (nPheno - 1)) {
                stop("Too many custom contrasts supplied.\nPlease check your contrast matrix.\n")
            }
            if (dim(contrasts)[2] < (nPheno - 1)) {
                stop("Too few custom contrasts supplied.\nPlease check your contrast matrix.\n")
            }
        }
```
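To make the arithmetic concrete, here is a Python mirror of that check, assuming `nPheno` is the number of distinct levels in the phenotype vector. It is an illustration, not anota2seq's code, and `check_contrast_count` is a hypothetical name. With the single contrast the module always passes, a phenotype column holding, say, all four levels from the contrasts file (WT_LIF, KI_LIF, WT_LIF2i, KI_LIF2i) gives 1 < 4 - 1 and triggers the "Too few custom contrasts" message, which reads like a request for more contrasts even though the real cause is the extra levels.

```python
def check_contrast_count(n_contrasts: int, phenotype: list[str]) -> None:
    """Mirror of the anota2seq check: number of contrasts must equal nPheno - 1."""
    n_pheno = len(set(phenotype))  # number of distinct phenotype levels
    if n_contrasts > n_pheno - 1:
        raise ValueError("Too many custom contrasts supplied.")
    if n_contrasts < n_pheno - 1:
        raise ValueError("Too few custom contrasts supplied.")


# A single contrast only passes when exactly two levels are present.
check_contrast_count(1, ["WT_LIF", "WT_LIF", "KI_LIF", "KI_LIF"])
```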

We're always specifying a single contrast (the module has no capacity to do anything else), so dim(contrasts)[2] is 1. The check therefore requires nPheno - 1 == 1, i.e. there can only be 2 levels in the phenotype vector. That means we need to subset the sample sheet to only the samples for the contrast at hand, which is what we enforce in #91.
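A minimal sketch of that subsetting step in Python, assuming a CSV samplesheet whose phenotype column is the contrast's `variable` (here `treatment`, matching `sample_treatment_col` in the options quoted earlier). `subset_samplesheet` is a hypothetical helper, not the change made in #91, and the sample names in the usage are made up.

```python
import csv
import io


def subset_samplesheet(text: str, variable: str, reference: str, target: str) -> str:
    """Keep only samples whose `variable` column is the reference or target level."""
    reader = csv.DictReader(io.StringIO(text))
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Drop samples belonging to other contrasts so only two levels remain
        if row[variable] in (reference, target):
            writer.writerow(row)
    return buf.getvalue()


sheet = "sample,treatment\ns1,WT_LIF\ns2,KI_LIF\ns3,WT_LIF2i\ns4,KI_LIF2i\n"
subset = subset_samplesheet(sheet, "treatment", "WT_LIF", "KI_LIF")
```

After subsetting, only WT_LIF and KI_LIF remain, so nPheno is 2 and a single contrast satisfies the check.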

pinin4fjords changed the title from "Perform translational efficiency analysis with anota2seq for several contrasts" to "anota2seq fails with more than 2 levels in the samplesheet phenotype / contrast variable" Jan 30, 2025
@naiarabediaga

naiarabediaga commented Jan 30, 2025 via email

@pinin4fjords
Copy link
Member

pinin4fjords commented Jan 30, 2025

@naiarabediaga, the documentation is correct: the pipeline will indeed loop over the multiple contrasts you provide by running multiple iterations of the ANOTA2SEQ process. It does this by splitting the contrasts file you provide. The process itself only handles a single contrast (not a contrast file) at a time, that's all.

It died here simply because of the problem with the first iteration. Once that issue is solved, you should see the multiple contrasts being analysed one by one.

@FelixKrueger
Contributor Author

The multi-contrast issue has now been fixed (#94) and is making its way into the 1.1.0 release as we speak.

I think we can close this issue now.


3 participants