BUSCO v4 - filename pattern change #1163

Closed
jmodlis opened this issue Apr 21, 2020 · 7 comments

@jmodlis

jmodlis commented Apr 21, 2020

I am running BUSCO v4 and multiQC v1.8, and the BUSCO results are not included in my multiQC output. The folder I am pointing multiQC at contains only QUAST and BUSCO results. I have run multiQC on QUAST/BUSCO results before (with BUSCO versions prior to v4), and both were included in the multiQC output.

I am wondering if it is related to the fact that I ran BUSCO with the auto lineage detection parameter, so now there are two short_summary files in a single results folder... They are short_summary.generic.[database].[filename].txt and short_summary.specific.[database].[filename].txt

One of the two BUSCO short summary files:

# BUSCO version is: 4.0.6 
# The lineage dataset is: bacteria_odb10 (Creation date: 2019-06-26, number of species: 4085, number of BUSCOs: 124)
# Summarized benchmarking in BUSCO notation for asm.polished.miniasm.fa
# BUSCO was run in mode: genome

        ***** Results: *****

        C:77.4%[S:73.4%,D:4.0%],F:16.9%,M:5.7%,n:124       
        96      Complete BUSCOs (C)                        
        91      Complete and single-copy BUSCOs (S)        
        5       Complete and duplicated BUSCOs (D)         
        21      Fragmented BUSCOs (F)                      
        7       Missing BUSCOs (M)                         
        124     Total BUSCO groups searched                
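The one-line `C:...` string is a compact encoding of the counts listed below it, expressed as percentages of n (for example 96/124 ≈ 77.4% complete, with C = S + D). As an illustration only, here is a minimal Python sketch that pulls those numbers out with a regular expression. The function name `parse_busco_notation` is hypothetical, and this is not how MultiQC's BUSCO module is implemented.

```python
import re

# Hypothetical pattern for the BUSCO one-line notation, e.g.
# "C:77.4%[S:73.4%,D:4.0%],F:16.9%,M:5.7%,n:124".
# Illustrative sketch only; not MultiQC's BUSCO parser.
BUSCO_NOTATION = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco_notation(line: str) -> dict:
    """Return C, S, D, F, M as float percentages and n as an int."""
    match = BUSCO_NOTATION.search(line)
    if match is None:
        raise ValueError(f"not a BUSCO notation line: {line!r}")
    # Percentages become floats; the total group count n stays an int.
    result = {key: float(value) for key, value in match.groupdict().items() if key != "n"}
    result["n"] = int(match.group("n"))
    return result


summary = parse_busco_notation("C:77.4%[S:73.4%,D:4.0%],F:16.9%,M:5.7%,n:124")
print(summary)
# {'C': 77.4, 'S': 73.4, 'D': 4.0, 'F': 16.9, 'M': 5.7, 'n': 124}
```

Using `re.search` rather than `re.match` lets the same function accept the line with its leading indentation and trailing spaces as it appears in the summary file.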

MultiQC log:

[INFO   ]         multiqc : This is MultiQC v1.8 (81d0983)
[INFO   ]         multiqc : Template    : default
[INFO   ]         multiqc : Searching   : ./../processedData/QC
[INFO   ]           quast : Found 47 reports
[INFO   ]         multiqc : Compressing plot data
[INFO   ]         multiqc : Report      : ../processedData/QC/multiQC/multiqc_report.html
[INFO   ]         multiqc : Data        : ../processedData/QC/multiQC/multiqc_data
[INFO   ]         multiqc : MultiQC complete

And yes, I only checked like a million times that there are actually BUSCO files in the QC folder...not going too crazy yet. ... ;)

Thanks
Jen

@ewels
Member

ewels commented Apr 21, 2020

Hi @jmodlis,

Before I look into this further, could you have a quick read over https://multiqc.info/docs/#not-enough-samples-found please? If you run MultiQC with the -v flag (verbose mode) or have a look in the multiqc_data/multiqc.log file, it may tell you that it is skipping or overwriting the logs for some reason (e.g. if they have identical sample names).

If that's not the case, please could you try to find a minimal case of just two log files where one is parsed and one is not? If you could attach those to the issue then I'll be able to recreate the problem and fix it.

Thanks,

Phil

@jmodlis
Author

jmodlis commented Apr 21, 2020

OK, I have uploaded the MultiQC log from a run where it worked and one from a run where it didn't. It doesn't appear to be overwriting anything, but maybe I am not looking in the right place.

The directory contains a quast folder and a busco folder. In the failed log, MultiQC looks at some of the files, but it doesn't appear to look at the short summary files.

This is an example of the files/dirs in one of the sample directories within the busco dir:

auto_lineage
logs
prodigal_output
run_bacteria_odb10
run_burkholderiales_odb10
short_summary.generic.bacteria_odb10.GP_7_asm.polished.flye.txt
short_summary.specific.burkholderiales_odb10.GP_7_asm.polished.flye.txt

success_multiqc.log
fail_multiqc.log

Thanks,
Jen

@ewels
Member

ewels commented Apr 21, 2020

Sorry, I meant BUSCO reports / log files (you can see which files are used under the docs: https://multiqc.info/docs/#busco).

Though from your directory listing there, I can see that the pattern short_summary_* would not match your short summary files: after "short_summary" they have a "." rather than a second underscore.

Have you renamed any files in your analysis? You could try customising the BUSCO search pattern to match these files (docs):

sp:
    busco:
        fn: 'short_summary*'
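To see why the old pattern misses the BUSCO v4 files, here is a small Python check, assuming MultiQC's `fn` patterns behave like shell-style glob matching as implemented by Python's `fnmatch`. The two v4 file names come from the directory listing above; `short_summary_GP_7.txt` is a hypothetical pre-v4-style name added for comparison.

```python
from fnmatch import fnmatch

filenames = [
    "short_summary_GP_7.txt",  # hypothetical pre-v4-style name
    "short_summary.generic.bacteria_odb10.GP_7_asm.polished.flye.txt",
    "short_summary.specific.burkholderiales_odb10.GP_7_asm.polished.flye.txt",
]

# "old" requires an underscore right after "short_summary";
# "new" accepts anything after it, including a ".".
patterns = {"old": "short_summary_*", "new": "short_summary*"}

# For each file name, record which glob patterns match it.
matches = {
    name: {label: fnmatch(name, pattern) for label, pattern in patterns.items()}
    for name in filenames
}

for name, result in matches.items():
    print(name, result)
```

Both v4 names fail the old pattern and pass the new one, while the older-style name matches either pattern, so widening the pattern should not lose any previously found files.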

Phil

@jmodlis
Author

jmodlis commented Apr 21, 2020

You know, I saw that busco file pattern search last night but didn't notice the final underscore. It does look like the previous successful run had short_summary_* files. I did not change any file names in the newest BUSCO results, but I did run the auto-lineage detection option, and it is a newer version of BUSCO.

@ewels
Member

ewels commented Apr 21, 2020

Ok, that sounds pretty likely then if BUSCO has updated slightly. Could you please confirm that everything does work if you change the file search pattern as above? Just paste that snippet into a file called multiqc_config.yaml in the directory where you are launching MultiQC from.
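If you are scripting this, here is a minimal Python sketch that writes the same override (the `sp: busco: fn` snippet from earlier in this thread) to `multiqc_config.yaml` in a given launch directory. The helper name `write_multiqc_config` is hypothetical; the YAML text is written verbatim, so no YAML library is needed.

```python
from pathlib import Path

# The BUSCO search-pattern override from this thread, written verbatim.
CONFIG_TEXT = """\
sp:
    busco:
        fn: 'short_summary*'
"""


def write_multiqc_config(launch_dir: str = ".") -> Path:
    """Write multiqc_config.yaml into the directory MultiQC will be launched from."""
    path = Path(launch_dir) / "multiqc_config.yaml"
    path.write_text(CONFIG_TEXT)
    return path
```

Calling `write_multiqc_config()` with no argument writes the file to the current working directory, which should be the same directory you then run `multiqc` from.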

If it does indeed work then I'll update the default search pattern in MultiQC 👍

@ewels ewels changed the title BUSCO v4 support - not showing up in multiQC report BUSCO v4 - filename pattern change Apr 21, 2020
@jmodlis
Author

jmodlis commented Apr 21, 2020

Yes, multiQC finds the BUSCO files if I change the default search pattern :)

One last thing I could check would be to run BUSCO without the lineage auto-detect option to make sure that the output files aren't different without that option enabled.

@ewels ewels closed this as completed in 974f770 Apr 22, 2020
@ewels
Member

ewels commented Apr 22, 2020

It's a minor change, so I don't mind either way really. I've updated the pattern in the commit above, and it will go out in the next release. In the short term you can use the config override 👍

Thanks!
