Annotation Analysis

Crowdsourcing biocuration: The Community Assessment of Community Annotation with Ontologies (CACAO) Figure Generation Code

The input files and code used to generate the graphical figures in the CACAO manuscript are provided here.

Requirements

  • requirements.txt lists the pinned versions of all Python packages used to generate the figures. Conda was used as the package manager.

Data

  • The cacao_expanded_info.dat file is a modified GPAD file and a precursor to the final quality-checked file sent to GO. Taxon information and various CACAO-specific fields are stored in additional columns. Like a GPAD file, it is tab-delimited.
    • The taxon information was retrieved using ete3.
  • The cacao_dcnt-tinfo.txt and uniprot_dcnt-tinfo.txt files are results from the GOATOOLS analysis. The descendant count (dcnt) values for GO terms used in CACAO were calculated here.
  • The goa_uniprot_all_noiea_20200101.gaf file is provided here, but it can also be found in the GO Data Archive.
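
Because cacao_expanded_info.dat is tab-delimited, it can be read with the standard library alone. The sketch below is a minimal illustration, not the code used for the figures. It assumes that header lines, if present, start with "!" as in standard GPAD and GAF files. The helper name read_tab_delimited and the sample row are hypothetical, and the real column layout of the file is not documented here.

```python
import csv
import io


def read_tab_delimited(handle):
    """Yield each data row as a list of column strings.

    Assumption: header/comment lines start with '!' (GPAD/GAF convention).
    Blank lines are skipped as well.
    """
    data_lines = (
        line for line in handle if line.strip() and not line.startswith("!")
    )
    # QUOTE_NONE keeps any quote characters inside fields untouched.
    yield from csv.reader(data_lines, delimiter="\t", quoting=csv.QUOTE_NONE)


# Made-up sample content standing in for the real file.
sample = io.StringIO(
    "!gpa-version: 1.1\n"
    "UniProtKB\tP12345\tinvolved_in\tGO:0008150\n"
)
rows = list(read_tab_delimited(sample))
print(rows)
```

To read the actual file, open it with `open("cacao_expanded_info.dat", newline="")` and pass the handle to the same function.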

Pie Charts

  • cacao_taxon_pie.py generates the taxonomy pie chart.
  • cacao_go_pie.py generates the GO aspect pie chart.

Descendant Count

  • cacao_dcnt.py generates the descendant count (dcnt) box plot comparison.
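
For readers unfamiliar with the metric, a descendant count is the number of distinct terms below a given term in the ontology graph. The sketch below computes it on a toy graph. It is an illustration only, not the GOATOOLS implementation or the analysis code in this repository. The function name descendant_counts and the term names are hypothetical.

```python
def descendant_counts(children):
    """Map each term to the number of distinct terms below it in a DAG.

    `children` maps a term to the list of its direct child terms.
    """
    cache = {}

    def descendants(term):
        # Memoize so shared subgraphs are only walked once.
        if term not in cache:
            found = set()
            for child in children.get(term, ()):
                found.add(child)
                found |= descendants(child)
            cache[term] = found
        return cache[term]

    all_terms = set(children) | {c for kids in children.values() for c in kids}
    return {term: len(descendants(term)) for term in all_terms}


# Toy diamond-shaped graph: "c" is reachable from "root" by two paths
# but is counted once.
toy_dag = {"root": ["a", "b"], "a": ["c"], "b": ["c"]}
counts = descendant_counts(toy_dag)
print(counts)
```

A term with a high count sits near the top of the ontology and is more general; a term with a count of zero is a leaf.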

Notes

  • Code was formatted using Black.