
Inconsistency between taxids and BERTax taxonomy labels, and the calculation of the AveP evaluation metric #14

Open
NickShanyt opened this issue Dec 7, 2023 · 8 comments


@NickShanyt

Hi!
I'm interested in your work and am trying to reproduce the results on the data you released, but I have run into some problems.

1. The released sequence data contain taxids. I used NCBI to map these taxids to taxonomic classifications and obtained the corresponding taxonomic level for each sequence. However, many of the resulting taxonomic labels do not match the labels used by the BERTax model (5 superkingdoms, 44 phyla, 156 genera), so I corrected some of them manually.

Although I have made these corrections in the final dataset, correcting the genus level is difficult in the similar and non-similar datasets. Is this an objective problem, and is there any possible solution?

2. I would also like to ask whether the Accuracy and AveP metrics mentioned in the paper are accuracy and precision as commonly understood. Can sklearn.metrics.accuracy_score and sklearn.metrics.precision_score be used to calculate the same metrics as in the paper?

Thank you for your work.

@f-kretschmer
Collaborator

Sorry for the late answer.

  1. Since we did our evaluations, the NCBI taxonomy has likely changed. Here is a taxdump for the version we used: https://upload.uni-jena.de/data/656deff28d9cd2.73093822/taxdump.tar.gz. It can be used with ete3 (https://github.com/f-kretschmer/bertax_training/blob/master/utils/tax_entry.py); see the sketch below.
  2. The average precision was calculated from micro-averaged Precision-Recall curves (sklearn.metrics.average_precision_score). For the accuracy, we used a balanced version due to the unbalanced data, taking the mean over all superkingdom classes, as described in the paper. Additionally, there are confusion matrices for everything here: https://github.com/f-kretschmer/bertax/tree/master/confusion_matrices.
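
As an illustration, here is a minimal sketch of resolving a taxid against the pinned taxdump with ete3; the taxdump path and the example taxid are placeholders, and the actual mapping used for training is in utils/tax_entry.py linked above.

```python
# Minimal sketch: resolve a taxid to named ranks with ete3, using a pinned
# taxdump so the results match the taxonomy version BERTax was trained on.
# The file path and taxid below are illustrative placeholders.
from ete3 import NCBITaxa

# Point ete3 at the archived taxdump instead of the current NCBI taxonomy.
ncbi = NCBITaxa(taxdump_file="taxdump.tar.gz")

taxid = 9606  # example taxid
lineage = ncbi.get_lineage(taxid)           # list of ancestor taxids
ranks = ncbi.get_rank(lineage)              # {taxid: rank}
names = ncbi.get_taxid_translator(lineage)  # {taxid: scientific name}

# Keep only the ranks BERTax predicts.
wanted = {"superkingdom", "phylum", "genus"}
labels = {ranks[t]: names[t] for t in lineage if ranks[t] in wanted}
print(labels)  # e.g. {'superkingdom': 'Eukaryota', 'phylum': 'Chordata', 'genus': 'Homo'}
```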

Hope this helps!

@f-kretschmer
Collaborator

f-kretschmer commented Dec 20, 2023

I'm sorry, I think the taxdump.tar.gz linked above is the incorrect version; this should be the correct one: https://ftp.ncbi.nih.gov/pub/taxonomy/taxdump_archive/new_taxdump_2021-04-01.zip

@yongrenr

Hello!
I'm interested in your work and I'm trying to reproduce the results on the data you released, but I'm having some problems.

"The average precision was calculated based on micro average Precision-Recall-curves (sklearn.metrics.average_precision_score). For the accuracy, we used a balanced version due to unbalanced data: taking the mean over all superkingdom classes, as described in the paper. Additionally, there are also confusion matrices for everything here: https://github.com/f-kretschmer/bertax/tree/master/confusion_matrices."
s
I wonder if this accuracy calculation is only used for superkingdom classes, and is it used in phylum classes and genus classes?

@f-kretschmer
Collaborator

Hi!

Both the balanced accuracy calculation (sklearn.metrics.balanced_accuracy_score) and the average precision calculation (sklearn.metrics.average_precision_score) are used for all ranks.
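
For illustration, a minimal sketch of computing both metrics for a single rank, assuming y_true holds class labels and y_score holds per-class probability scores whose columns follow the order of `classes`; these names and the evaluate_rank helper are placeholders, not the actual evaluation code.

```python
# Minimal sketch of the two metrics described above, for one rank.
# y_true: labels of shape (n_samples,); y_score: (n_samples, n_classes).
import numpy as np
from sklearn.metrics import average_precision_score, balanced_accuracy_score
from sklearn.preprocessing import label_binarize

def evaluate_rank(y_true, y_score, classes):
    # Balanced accuracy: the mean of per-class recalls, so rare classes
    # weigh as much as frequent ones.
    y_pred = np.asarray(classes)[np.argmax(y_score, axis=1)]
    bacc = balanced_accuracy_score(y_true, y_pred)
    # Average precision from micro-averaged one-vs-rest PR curves.
    y_bin = label_binarize(y_true, classes=classes)
    avep = average_precision_score(y_bin, y_score, average="micro")
    return bacc, avep

# The same function would then be applied at every rank, e.g.:
# for rank in ("superkingdom", "phylum", "genus"):
#     print(rank, evaluate_rank(y_true[rank], y_score[rank], classes[rank]))
```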

@yongrenr

Hi!

"Both the balanced accuracy calculation (sklearn.metrics.balanced_accuracy_score) and the average precision calculation (sklearn.metrics.average_precision_score) are used for all ranks."
Thank you very much for your prompt reply!
I'm curious: what kind of metrics are used in your PNAS paper? Thank you!
[screenshot: results table from the PNAS paper]

@f-kretschmer
Collaborator

In this table it is Average Precision (AveP), but we also have Precision-Recall plots, ROC curves, and balanced accuracy.

@yongrenr

"In this table it is Average Precision (AveP), but we also have Precision-Recall plots, ROC curves, and balanced accuracy."

So comprehensive!
I have one more small question. On the Closely and Distantly related datasets, the phylum-level performance is mediocre, but phylum prediction works very well on the Final dataset. Why is that? I'd like to ask whether you did anything else besides changing the number of attention heads.
Thank you very much!

@f-kretschmer
Collaborator

The "final" dataset has a lot more data and also an additional output layer for "genus" prediction. Everything is detailed in the section "Performance of Final BERTax Model" in the PNAS Paper. See especially SFig. 2, which has a visualization trying to show why adding the genus layer leads to better performance.
