Different stats for protein mode #18

Open
arslan9732 opened this issue Jan 3, 2024 · 2 comments
Labels
question Further information is requested

Comments

@arslan9732

Hi,
I ran compleasm and BUSCO in protein mode on my annotated genome, and the results differ: compleasm reports fewer single-copy and more duplicated genes than BUSCO. Did you change something in the hmmsearch step, or is something else different? Here are the results:

Compleasm:

S:74.25%,1727
D:20.42%, 475
F:1.98%, 46
M:3.35%, 78
N: 2326

BUSCO

 C:95.3%[S:84.3%,D:11.0%],F:1.2%,M:3.5%,n:2326
        2216    Complete BUSCOs (C)
        1961    Complete and single-copy BUSCOs (S)
        255     Complete and duplicated BUSCOs (D)
        29      Fragmented BUSCOs (F)
        81      Missing BUSCOs (M)
        2326    Total BUSCO groups searched
@huangnengCSU
Owner

Hi @arslan9732,
That's interesting. Could you share the protein file and tell me which lineage you used, so I can figure out what happened?
Thanks!

@huangnengCSU huangnengCSU added the bug Something isn't working label Jan 8, 2024
@arslan9732
Author

Hi @huangnengCSU,
Sorry, I can't share my file here, but I also tried a public data set and it showed the same behavior. I used the Arabidopsis thaliana protein file: https://ftp.ebi.ac.uk/ensemblgenomes/pub/release-57/plants/fasta/arabidopsis_thaliana/pep/Arabidopsis_thaliana.TAIR10.pep.all.fa.gz
The results are:
Compleasm

./compleasm.py protein -p ATH.faa -l eudicots -t 50 -o ATH-comp -L /mnt/bin/minibusco/mb_downloads

S: 51.42%,1196
D: 47.98%,1116
F: 0.34%,8
M: 0.26%,6
N: 2326

BUSCO

busco -i ATH.faa -l eudicots_odb10 --download_path /mnt/data/arslan/tool/busco_download/ -o ATH-busco -m protein -c 50 -f
        --------------------------------------------------
        |Results from dataset eudicots_odb10              |
        --------------------------------------------------
        |C:99.8%[S:59.5%,D:40.3%],F:0.0%,M:0.2%,n:2326    |
        |2320   Complete BUSCOs (C)                       |
        |1383   Complete and single-copy BUSCOs (S)       |
        |937    Complete and duplicated BUSCOs (D)        |
        |0      Fragmented BUSCOs (F)                     |
        |6      Missing BUSCOs (M)                        |
        |2326   Total BUSCO groups searched               |
        --------------------------------------------------
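
One way to put the two reports side by side is to recompute the percentages from the raw counts. The sketch below is only a cross-check: the counts are copied from the posts in this thread, while `REPORTS`, `TOTAL`, and `summarize` are hypothetical names and not part of compleasm or BUSCO. It assumes compleasm's S and D classes correspond to BUSCO's S and D, so that S + D is comparable to BUSCO's C.

```python
# Counts copied from the reports posted in this thread.
# REPORTS, TOTAL and summarize() are illustrative names only.
TOTAL = 2326  # eudicots lineage, number of BUSCO groups searched

REPORTS = {
    "private genome": {
        "compleasm": {"S": 1727, "D": 475, "F": 46, "M": 78},
        "busco": {"S": 1961, "D": 255, "F": 29, "M": 81},
    },
    "A. thaliana TAIR10": {
        "compleasm": {"S": 1196, "D": 1116, "F": 8, "M": 6},
        "busco": {"S": 1383, "D": 937, "F": 0, "M": 6},
    },
}


def summarize(counts, total=TOTAL):
    """Return percentages per class plus C = S + D (assumed comparable to BUSCO's C)."""
    if sum(counts.values()) != total:
        raise ValueError("class counts do not add up to the total")
    pct = {name: 100.0 * n / total for name, n in counts.items()}
    pct["C"] = pct["S"] + pct["D"]
    return pct


for dataset, tools in REPORTS.items():
    for tool, counts in tools.items():
        pct = summarize(counts)
        print(f"{dataset:20s} {tool:10s} "
              f"C={pct['C']:.2f}% S={pct['S']:.2f}% D={pct['D']:.2f}%")
```

With these counts, S + D comes to about 94.67% (compleasm) versus 95.27% (BUSCO) for the private genome, and about 99.40% versus 99.74% for Arabidopsis. The gap therefore lies mostly in how complete hits are split between single-copy and duplicated, not in overall completeness.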

@huangnengCSU huangnengCSU added question Further information is requested and removed bug Something isn't working labels Jan 12, 2024