The salmon_tx2gene.py script throws the following error:
Traceback (most recent call last):
  File "/home/humebc/.nextflow/assets/nf-core/rnaseq/bin/salmon_tx2gene.py", line 88, in <module>
    tx2gene(args.gtf, args.salmon, args.id, args.extra, args.output)
  File "/home/humebc/.nextflow/assets/nf-core/rnaseq/bin/salmon_tx2gene.py", line 47, in tx2gene
    gene_dict[attr_dict[gene_id]].append(attr_dict)
KeyError: 'gene_id'
This is caused by the "gene_id" key being absent from the attribute key-value pairs of some lines in the .gtf file passed to the script. Only a small number of lines are affected; the majority do contain "gene_id". I originally gave the pipeline a GFF file as input, namely GCF_017654675.1_Xenopus_laevis_v10.1_genomic.gff.gz.
The offending line in salmon_tx2gene.py is:
gene_dict[attr_dict[gene_id]].append(attr_dict)
Because the "gene_id" key is present in the vast majority of lines, this line can be wrapped in a try/except block that skips lines lacking the key, and the script then completes without issue.
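A minimal sketch of that workaround, assuming the GTF attribute column uses the usual `key "value";` layout. The helpers `parse_attributes` and `group_by_gene` are hypothetical illustrations, not the pipeline's actual code; they mirror the failing `gene_dict[attr_dict[gene_id]].append(attr_dict)` pattern and catch the `KeyError` so that records without a gene ID are skipped and counted instead of aborting the run.

```python
from collections import defaultdict


def parse_attributes(field):
    """Parse a GTF attribute column, e.g. 'gene_id "g1"; transcript_id "t1";'.

    Hypothetical, simplified parser: assumes `key "value"` pairs separated
    by ';' and no semicolons inside values.
    """
    attrs = {}
    for item in field.split(";"):
        key, _, value = item.strip().partition(" ")
        if key and value:
            attrs[key] = value.strip().strip('"')
    return attrs


def group_by_gene(gtf_lines, gene_id="gene_id"):
    """Group attribute dicts by gene ID, skipping records without `gene_id`.

    Hypothetical helper. Returns (gene_dict, n_skipped).
    """
    gene_dict = defaultdict(list)
    n_skipped = 0
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # header/comment line
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a 9-column GTF record
        attr_dict = parse_attributes(cols[8])
        try:
            gene_dict[attr_dict[gene_id]].append(attr_dict)
        except KeyError:
            # This record has no gene_id attribute: skip it instead of
            # letting the KeyError abort the whole script.
            n_skipped += 1
    return gene_dict, n_skipped
```

Counting the skipped records, rather than silently passing, makes it easy to log how many lines were ignored, which is worth checking given that only a small number should be affected.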
Alternatively, I suppose this could be fixed in the step of the workflow that converts the GFF file to GTF, by ensuring that every output line has a "gene_id" key-value pair.
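As a rough illustration of the conversion-side alternative, the hypothetical `filter_gtf` function below post-processes a converted GTF and keeps only records whose attribute column contains a non-empty `gene_id`. Note that it enforces the requirement by dropping offending records rather than filling in a gene ID, so its effect is equivalent to skipping them in the script; the regular expression is an assumption about the attribute layout.

```python
import re

# Hypothetical pattern: a gene_id attribute with a non-empty value, either at
# the start of column 9 or right after a ';' separator.
GENE_ID_RE = re.compile(r'(?:^|;)\s*gene_id\s+"?[^";]+')


def filter_gtf(lines, out):
    """Write only GTF records that carry a gene_id attribute to `out`.

    Hypothetical helper, not part of nf-core/rnaseq. Comment lines are passed
    through unchanged. Returns the number of records dropped.
    """
    dropped = 0
    for line in lines:
        if line.startswith("#"):
            out.write(line)
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 9 and GENE_ID_RE.search(cols[8]):
            out.write(line)
        else:
            dropped += 1  # missing, empty, or malformed gene_id
    return dropped
```

For example, `filter_gtf(open("genome.gtf"), sys.stdout)` would write the filtered GTF to standard output and return the number of dropped records (`genome.gtf` is a placeholder name).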
The other option, checking during the conversion, will be trickier, I suppose, because we use gffread for that rather than a custom script. If you can find a solution using gffread, we can try to add it, but for now I've patched the script based on your initial suggestion.
I'll close this for now, but feel free to reopen if things change. Cheers!
Steps to reproduce
Steps to reproduce the behaviour:
See above
Expected behaviour
I would expect the script to complete and produce the salmon_tx2gene.tsv file.
nextflow.log
Log files
Have you provided the following extra information/files:
.nextflow.log
System
linux server
linux ubuntu
v3.3
Nextflow Installation
21.04.1.5556
Container engine
Docker
Additional context