Additional fasta with GENCODE annotation results in biotype error #604

Closed
j-andrews7 opened this issue Apr 23, 2021 · 1 comment
Labels
bug Something isn't working

Comments

@j-andrews7

Description of the bug

Passing an additional FASTA to add to an annotation (e.g. ERCC spike-ins) causes the biotype attribute of the added entries to be written as "gene_biotype" in the resulting GTF. This is a problem when using a GENCODE annotation together with the biotype QC, because GENCODE names this attribute "gene_type" and --featurecounts_group_type therefore has to be set to "gene_type".

Steps to reproduce

Run the pipeline with a GENCODE annotation file, provide an additional FASTA, and set --featurecounts_group_type "gene_type". featureCounts then fails with the following error:

//================================= Running ==================================\\
  ||                                                                            ||
  || Load annotation file HG19_ERCC92.gtf ...                                   ||
  
  ERROR: failed to find the gene identifier attribute in the 9th column of the provided GTF file.
  The specified gene identifier attribute is 'gene_type' 
  An example of attributes included in your GTF annotation is 'exon_id "ERCC-00002.1"; exon_number "1"; gene_biotype "transgene"; gene_id "ERCC-00002_gene"; gene_name "ERCC-00002_gene"; gene_source "custom"; transcript_id "ERCC-00002_gene"; transcript_name "ERCC-00002_gene";' 
  The program has to terminate.

Expected behaviour

The ideal behavior would be to write the biotype attribute of the added entries as gene_type instead of gene_biotype during the GTF building step if the --gencode flag is provided.
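
A minimal sketch of that renaming, written as a standalone Python post-processing step and not as the pipeline's actual implementation. The function name `rename_gtf_attribute` is hypothetical, and the sketch assumes attributes of the form `key "value";` in the 9th tab-separated column, as in the example shown in the error message.

```python
import re


def rename_gtf_attribute(line: str, old: str = "gene_biotype", new: str = "gene_type") -> str:
    """Rename one attribute key in the 9th column of a GTF line (hypothetical helper)."""
    # Comment lines and blank lines pass through untouched.
    if line.startswith("#") or not line.strip():
        return line
    fields = line.rstrip("\n").split("\t")
    # Lines without a 9th column are not valid GTF records; leave them alone.
    if len(fields) < 9:
        return line
    # Match the key only at the start of the column or right after a ';',
    # so that other keys merely ending in `old` are not affected.
    pattern = re.compile(r'(^|;\s*)' + re.escape(old) + r'(\s+")')
    fields[8] = pattern.sub(lambda m: m.group(1) + new + m.group(2), fields[8])
    newline = "\n" if line.endswith("\n") else ""
    return "\t".join(fields) + newline
```

Applied line by line to the merged GTF, this would leave records that already use `gene_type` unchanged and give the spike-in records a `gene_type` attribute.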

System

  • Hardware: HPC
  • Executor: lsf

Container engine

  • Engine: singularity
  • version: dev (v3.1)
  • Image tag: nfcore/rnaseq:dev
j-andrews7 added the bug label Apr 23, 2021
drpatelh mentioned this issue Apr 23, 2021
drpatelh added a commit that referenced this issue Apr 23, 2021
@cutleraging

Sorry to re-open this, but I am having a similar issue. The difference is that I am not using a GENCODE annotation.

Command:

$ nextflow run nf-core/rnaseq \
-r 3.9 \
-profile singularity \
--outdir /gs/gsfs0/users/vijg-lab/2022-Ronnie/DNA-RNA_ERCC_test/sequencing/3_nf-core-rnaseq_ERCC \
--input samplesheet.csv \
--email ronald.cutler@einsteinmed.edu \
--with_umi FALSE \
--genome GRCh38 \
--aligner star_rsem \
--save_unaligned TRUE \
--max_cpus 16 \
--max_memory 128.GB \
--additional_fasta /gs/gsfs0/users/rcutler/References/ERCC92.fa \
--save_reference TRUE

Error:

//================================= Running ==================================\\
 ||                                                                            ||
 || Load annotation file GRCh38_ERCC92.gtf ...                                 ||
 
 ERROR: failed to find the gene identifier attribute in the 9th column of the provided GTF file.
 The specified gene identifier attribute is 'gene_biotype' 
 An example of attributes included in your GTF annotation is 'gene_id "DDX11L1"; gene_name "DDX11L1"; transcript_id "rna0"; tss_id "TSS31672";' 
 The program has to terminate.
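
The example record in this message carries no `gene_biotype` attribute. One way to see which attribute keys a merged GTF actually provides is to tally them per record. The following is a rough sketch and not part of the pipeline: the function name `attribute_key_counts` is made up for illustration, the `<records>` entry is just a convention of this sketch for the total, and only attributes with quoted values are recognized. It defaults to `exon` records because that is the feature type featureCounts counts by default.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the key of each `key "value";` pair in a GTF attribute column.
# Attributes with unquoted values are not matched and so are not counted.
ATTRIBUTE_KEY = re.compile(r'(\w+)\s+"[^"]*"\s*;?')


def attribute_key_counts(lines: Iterable[str], feature: str = "exon") -> Counter:
    """Count how many records of one feature type carry each attribute key."""
    counts = Counter()
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Skip malformed lines and records of other feature types.
        if len(fields) < 9 or fields[2] != feature:
            continue
        counts["<records>"] += 1
        # Use a set so a key repeated within one record is counted once.
        counts.update(set(ATTRIBUTE_KEY.findall(fields[8])))
    return counts
```

Passing an open file handle for the merged GTF returns the tally. A key whose count is lower than `<records>` is missing from some exon records, which appears to be the situation featureCounts rejects above.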
