
FastQC gives KeyError if report generated from an empty FastQ file #1129

Closed
elderberry-smells opened this issue Mar 18, 2020 · 6 comments
Labels: bug: core (Bug in the main MultiQC code), module: change
Milestone: MultiQC v1.9

Comments

@elderberry-smells

elderberry-smells commented Mar 18, 2020

Description of bug:
I am running MultiQC in a Snakemake pipeline, where it should combine the log data from the FastQC and Trimmomatic reports (the Trimmomatic part is working). Snakemake sees the inputs fine, and FastQC runs and generates an HTML and a zip file for each FASTQ file, but MultiQC doesn't like the data FastQC is generating and throws a KeyError, as though the FastQC data is missing a column.

MultiQC Error log:

[INFO   ]         multiqc : This is MultiQC v1.8
[INFO   ]         multiqc : Template    : default
[INFO   ]         multiqc : Searching   : /home/bioinf/gbs_data/sample3/log
[INFO   ]     trimmomatic : Found 179 logs
[INFO   ]          fastqc : Found 368 reports
[ERROR  ]         multiqc : Oops! The 'fastqc' MultiQC module broke... 
  Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues 
  If possible, please include a log file that triggers the error - the last file found was:
    log/fastqc/SK-GBD-000435.1_fastqc.zip
============================================================
Module fastqc raised an exception: Traceback (most recent call last):
  File "/home/bioinf/anaconda3/envs/gbs/lib/python3.6/site-packages/multiqc/multiqc.py", line 546, in run
    output = mod()
  File "/home/bioinf/anaconda3/envs/gbs/lib/python3.6/site-packages/multiqc/modules/fastqc/fastqc.py", line 94, in __init__
    self.fastqc_general_stats()
  File "/home/bioinf/anaconda3/envs/gbs/lib/python3.6/site-packages/multiqc/modules/fastqc/fastqc.py", line 204, in fastqc_general_stats
    'avg_sequence_length': bs['avg_sequence_length'],
KeyError: 'avg_sequence_length'

File that triggers the error:

rule all:
    input:
        "log/multiqc_report.html"

rule fastqc:
    input:
        snps = snp_file,
        qc = "demultiplex/{sample}.{read}.fastq"
    output:
        "log/fastqc/{sample}.{read}_fastqc.html",
        "log/fastqc/{sample}.{read}_fastqc.zip"
    threads: 4
    shell:
        "fastqc -o log/fastqc/ -t {threads} {input.qc}"

rule multiqc:
    input:
        expand(["log/fastqc/{sample}.{read}_fastqc.html"], sample=samples, read = [1, 2])
    output:
        "log/multiqc_report.html"
    shell:
        "multiqc log -o log/"

MultiQC run details (please complete the following):

  • Command used to run MultiQC: multiqc log -o log/
  • MultiQC Version: 1.8
  • Operating System: Ubuntu
  • Python Version: 3.6.10
  • Method of MultiQC installation: conda


@VGalata

VGalata commented Mar 26, 2020

I had the same error and found out that some of my FASTQ files were empty (for whatever reason). As a result, the FastQC reports did not contain any data, which most likely triggered the KeyError. I fixed the issue with the FASTQ files and now everything runs through.
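A quick way to catch this before running FastQC or MultiQC is to check that each FASTQ file contains at least one record. A minimal sketch, assuming gzip-compressed files end in `.gz` and everything else is plain text; `fastq_has_reads` is a hypothetical helper name:

```python
import gzip
import sys
from pathlib import Path


def fastq_has_reads(path):
    """Return True if the FASTQ file at `path` contains at least one record.

    Assumes gzip-compressed files end in '.gz'; anything else is read as plain text.
    The file counts as non-empty if its first non-blank line is an '@' record header.
    """
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as handle:
        for line in handle:
            if line.strip():
                # The first line of a FASTQ record is its '@' header.
                return line.startswith("@")
    return False


if __name__ == "__main__":
    # Usage: python check_fastq.py demultiplex/*.fastq
    for name in sys.argv[1:]:
        if not fastq_has_reads(name):
            print(f"empty FASTQ: {name}")
```

Reading only up to the first non-blank line keeps the check fast even for large files.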

@maurya-anand

maurya-anand commented Mar 27, 2020

I have the same error. Please suggest some workaround. Thanks.

[INFO   ]         multiqc : This is MultiQC v1.8
[INFO   ]         multiqc : Template    : default
Module fastqc raised an exception: Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/multiqc/multiqc.py", line 546, in run
    output = mod()
  File "/usr/local/lib/python3.6/dist-packages/multiqc/modules/fastqc/fastqc.py", line 94, in __init__
    self.fastqc_general_stats()
  File "/usr/local/lib/python3.6/dist-packages/multiqc/modules/fastqc/fastqc.py", line 204, in fastqc_general_stats
    'avg_sequence_length': bs['avg_sequence_length'],
KeyError: 'avg_sequence_length'

@elderberry-smells
Author


Thanks so much for this reply. That was the issue: I had an empty FASTQ file, the result of the demultiplexing script and the sample not being present in the library.

I removed the empty FASTQ file and the program now runs as expected.
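In a Snakemake workflow like the one in the original report, empty files can also be dropped from the sample list before the `expand()` in the `multiqc` rule, so FastQC never produces reports for them. A minimal sketch, assuming uncompressed FASTQ files at the `demultiplex/{sample}.{read}.fastq` paths from the Snakefile and that an empty file is exactly zero bytes; `nonempty_samples` is a hypothetical helper (Snakefiles accept plain Python like this at the top level):

```python
import os
import sys


def nonempty_samples(samples, reads=(1, 2), pattern="demultiplex/{sample}.{read}.fastq"):
    """Keep only samples whose FASTQ files all exist and are non-empty.

    Assumes uncompressed FASTQ, where an empty file is exactly 0 bytes.
    A gzipped empty file is NOT 0 bytes, so this size check would miss it.
    """
    kept = []
    for sample in samples:
        paths = [pattern.format(sample=sample, read=read) for read in reads]
        if all(os.path.exists(p) and os.path.getsize(p) > 0 for p in paths):
            kept.append(sample)
        else:
            print(f"Skipping {sample}: missing or empty FASTQ", file=sys.stderr)
    return kept
```

Calling `samples = nonempty_samples(samples)` near the top of the Snakefile would then feed only usable samples into `expand()`. This only works if the demultiplexed files already exist when the Snakefile is parsed; if demultiplexing is itself a rule in the same workflow, a Snakemake checkpoint would be needed instead.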

@ewels
Member

ewels commented Mar 27, 2020

Nice, thanks all! If someone has one of these FastQC reports (zipped), could they please attach it here? We could presumably add some code to MultiQC to check for this, give a nice warning message and skip the file. That would be better than an ugly KeyError exception that stops processing of all further FastQC reports.
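The check proposed here could look roughly like the following. This is a hypothetical sketch, not MultiQC's actual code: it assumes each sample's parsed statistics are held in a dict (as the traceback's `bs['avg_sequence_length']` lookup suggests), and the function name `build_general_stats` and the `REQUIRED_KEYS` list are illustrative. Instead of an unguarded lookup, it logs a warning and skips samples that lack the required fields, so the remaining reports are still processed:

```python
import logging

log = logging.getLogger(__name__)

# Hypothetical set of fields needed for the General Statistics table.
REQUIRED_KEYS = ("total_sequences", "avg_sequence_length")


def build_general_stats(basic_stats):
    """Build per-sample general stats, skipping samples with missing fields.

    `basic_stats` maps sample name -> dict of parsed FastQC values.
    """
    general = {}
    for sample, bs in basic_stats.items():
        missing = [key for key in REQUIRED_KEYS if key not in bs]
        if missing:
            # A report from an empty FASTQ has no length data: warn instead of raising KeyError.
            log.warning("Skipping %s: FastQC report is missing %s (empty FASTQ file?)",
                        sample, ", ".join(missing))
            continue
        general[sample] = {key: bs[key] for key in REQUIRED_KEYS}
    return general
```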

Phil

@ewels reopened this Mar 27, 2020
@ewels changed the title from "Ooops! 'fastqc' MultiQC module broke..." to "FastQC gives KeyError if report generated from an empty FastQ file" Mar 27, 2020
@elderberry-smells
Author

empty_fastqc.zip

That is what FastQC produces when it generates a report from an empty FASTQ file.
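To confirm that a given report came from an empty input, the zip can be inspected directly. A minimal sketch, assuming the usual FastQC zip layout, where a `fastqc_data.txt` file inside the archive lists tab-separated `Measure`/`Value` rows under `>>Basic Statistics`, including `Total Sequences`; `total_sequences` is a hypothetical helper name:

```python
import sys
import zipfile


def total_sequences(zip_path):
    """Return the 'Total Sequences' value from a FastQC zip, or None if not found."""
    with zipfile.ZipFile(zip_path) as zf:
        # fastqc_data.txt sits in a '<name>_fastqc/' folder inside the archive.
        data_name = next((n for n in zf.namelist() if n.endswith("fastqc_data.txt")), None)
        if data_name is None:
            return None
        text = zf.read(data_name).decode("utf-8")
    for line in text.splitlines():
        fields = line.split("\t")
        if fields[0] == "Total Sequences" and len(fields) > 1:
            return int(fields[1])
    return None


if __name__ == "__main__":
    # Usage: python find_empty_reports.py log/fastqc/*_fastqc.zip
    for path in sys.argv[1:]:
        if total_sequences(path) == 0:
            print(f"{path}: report generated from an empty FASTQ file")
```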

@ewels added the fix label Mar 29, 2020
@ewels added this to the MultiQC v1.9 milestone May 23, 2020
@ewels added the bug: core (Bug in the main MultiQC code) label and removed the fix label May 28, 2020
@ewels closed this as completed in 04244f1 May 28, 2020
@ewels
Member

ewels commented May 28, 2020

OK, thanks all, this is now fixed. Instead of falling over with a KeyError, MultiQC now prints a warning to the console but troops on and does the best it can with the report.

If you run on only empty FastQC reports then the report looks kind of strange 😅 But when mixed with normal samples it should behave more like you'd expect.

Amusingly, most of the FastQC modules give a passing status for an empty file.
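The per-module statuses can be checked directly from the zip. A minimal sketch, assuming FastQC's `summary.txt` layout of one tab-separated `STATUS`, module name, filename line per module; `module_statuses` is a hypothetical helper name:

```python
import zipfile


def module_statuses(zip_path):
    """Map FastQC module name -> status (PASS/WARN/FAIL) from the zip's summary.txt."""
    with zipfile.ZipFile(zip_path) as zf:
        name = next((n for n in zf.namelist() if n.endswith("summary.txt")), None)
        if name is None:
            return {}
        text = zf.read(name).decode("utf-8")
    statuses = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) >= 2:
            # Column 0 is the status, column 1 the module name.
            statuses[fields[1]] = fields[0]
    return statuses
```

Running this on `empty_fastqc.zip` and printing the result would show which modules report `PASS` despite there being no reads.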

Thanks all for your input on this - let me know if you find any problems 👍

Phil

ewels added a commit to MultiQC/test-data that referenced this issue May 28, 2020
4 participants