treeforall hackathon project: given a request for a modestly sized tree for a large taxon T, support several useful ways to sample species from T, including (1) a random sample of species from T, (2) the species in T that have a genome in NCBI Genomes, and (3) the top N species in T ranked by number of occurrence records in iDigBio.
See the Google doc: https://docs.google.com/document/d/1E3QIxEYUu4Q6A3Dc_zJUoxb0O0vEWX88_2ptYW1iMjg
- choose N species randomly from T
- choose those species from T that have property A, e.g., have an NCBI genome
- choose the top N species from T by relevance metric, e.g., counts in iDigBio
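
The three strategies above can be sketched in a few lines of Python. This is a minimal illustration, not the code in this repository: the function names, the placeholder species names, and the record counts are all hypothetical, and the species list for T is assumed to be already in memory.

```python
import random


def random_sample(species, n, seed=None):
    """Choose up to n species uniformly at random, without replacement."""
    species = list(species)
    rng = random.Random(seed)  # a seed makes the sample reproducible
    return rng.sample(species, min(n, len(species)))


def with_property(species, has_property):
    """Keep only the species for which has_property(name) is true."""
    return [s for s in species if has_property(s)]


def top_n(species, n, score):
    """Return the n species with the highest score, best first."""
    return sorted(species, key=score, reverse=True)[:n]


# Placeholder data (invented for illustration only).
taxon = ["sp_a", "sp_b", "sp_c", "sp_d", "sp_e"]
has_genome = {"sp_b", "sp_d"}  # stands in for "has an NCBI genome"
record_counts = {"sp_a": 12, "sp_b": 340, "sp_c": 7, "sp_d": 95, "sp_e": 51}

print(random_sample(taxon, 3, seed=1))
print(with_property(taxon, lambda s: s in has_genome))
print(top_n(taxon, 2, lambda s: record_counts.get(s, 0)))
```

In practice the membership test and the score would come from lookups against NCBI and iDigBio rather than from in-memory dictionaries.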
Data files. There is a README.md in the data directory.
Instructions and documentation.
See the README.md in the perl directory. It contains scripts to obtain the induced subtree for the species in a named taxon that have genomes in NCBI.
See the README.md in the python directory.
Implementation of taxon sampling in Open Refine.
Python code to sample randomly from a taxon.
Utility code (Python) to read a csv file, invoke the OT match_names service, and add the resulting matches as a new column in the csv file.
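
As a rough sketch of what such a utility might look like (not the script shipped here), the following reads a csv, looks up each name, and writes the rows back with one extra column. The function names, the `ott_id` column name, and the choice to keep only the first match are assumptions. The request and response shapes follow the Open Tree of Life v3 `tnrs/match_names` API as best understood here and should be checked against the current API documentation.

```python
import csv
import json
import urllib.request

# Open Tree of Life TNRS endpoint (v3); verify against the current API docs.
MATCH_NAMES_URL = "https://api.opentreeoflife.org/v3/tnrs/match_names"


def match_names(names):
    """POST names to match_names; return {query name: first OTT id or ""}.

    Assumes each entry in "results" carries the query "name" and a list of
    "matches", each with a "taxon" object holding an "ott_id".
    """
    body = json.dumps({"names": list(names)}).encode("utf-8")
    request = urllib.request.Request(
        MATCH_NAMES_URL, data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    found = {}
    for result in payload.get("results", []):
        matches = result.get("matches", [])
        ott_id = matches[0].get("taxon", {}).get("ott_id", "") if matches else ""
        found[result.get("name", "")] = ott_id
    return found


def add_match_column(src, dst, name_column, matcher=match_names,
                     new_column="ott_id"):
    """Copy csv rows from src to dst, appending the match for each name.

    src and dst are open text files; matcher maps a list of names to a
    dict of name -> match, so a stub can replace the network call.
    """
    reader = csv.DictReader(src)
    rows = list(reader)
    matches = matcher([row[name_column] for row in rows])
    fieldnames = list(reader.fieldnames or []) + [new_column]
    writer = csv.DictWriter(dst, fieldnames=fieldnames)
    writer.writeheader()
    for row in rows:
        row[new_column] = matches.get(row[name_column], "")
        writer.writerow(row)
```

A typical call would be `add_match_column(open("in.csv", newline=""), open("out.csv", "w", newline=""), "name")`, where the file names and the `name` column are placeholders. Passing a stub as `matcher` allows the csv handling to be exercised offline.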