This repository has been archived by the owner on Dec 18, 2023. It is now read-only.

Extract angiosperm order and family names from web pages #1

Open
rafelafrance opened this issue Sep 8, 2021 · 0 comments

We need to filter the iDigBio data down to angiosperm records (among other things). Two web pages have a fairly exhaustive list of angiosperm order and family names, along with some alternate names. The alternate names will be very helpful for older data because names change over time.
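To make the role of the alternate names concrete, here is a minimal sketch of the filtering step. Everything about the data layout is an assumption: the `family` and `alternate_names` keys and both function names are hypothetical, and real iDigBio records may use different field names.

```python
def build_family_lookup(name_rows):
    """Map each accepted or alternate family name (lowercased) to the accepted name.

    `name_rows` is assumed to look like:
    {"order": "Asterales", "family": "Asteraceae", "alternate_names": ["Compositae"]}
    """
    lookup = {}
    for row in name_rows:
        accepted = row["family"]
        lookup[accepted.lower()] = accepted
        for alt in row.get("alternate_names", []):
            lookup[alt.lower()] = accepted
    return lookup


def filter_angiosperms(records, lookup):
    """Keep records whose family is a known angiosperm family.

    The "family" key on each record is a hypothetical field name.
    Matched records are returned with the accepted family name filled in.
    """
    kept = []
    for record in records:
        family = (record.get("family") or "").strip().lower()
        accepted = lookup.get(family)
        if accepted is not None:
            kept.append({**record, "family": accepted})
    return kept
```

With this lookup, an older record labeled Compositae would be kept and normalized to Asteraceae, while a conifer family such as Pinaceae would be dropped.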

We need to write a script that downloads the pages, extracts the data from them, and writes it to a text file that can be read programmatically. I'm OK with either CSV or JSON-lines output. Links to the pages:

We need to be able to run this script (or Jupyter notebook) repeatedly. More accurately, all of us need to be able to run it to gather the information ourselves.

Note that I have had success with the BeautifulSoup4 Python library. I just used it to parse a checklist for lice hosts.
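As a rough sketch of the download, extract, and write steps, the following uses only the Python standard library (`html.parser` swapped in for BeautifulSoup4 so it runs without extra installs). The page layout is an assumption: it pretends each name sits in an `<li>` element as `Name (Alt1, Alt2)`, which the real pages may not follow. All function and class names here are hypothetical.

```python
import json
import re
from html.parser import HTMLParser
from pathlib import Path
from urllib.request import urlopen

# Assumed item format: "Name" optionally followed by "(Alt1, Alt2)".
ITEM_RE = re.compile(r"^(?P<name>[^()]+?)\s*(?:\((?P<alts>[^()]*)\))?$")


class ListItemParser(HTMLParser):
    """Collect the text of every <li> element (nested lists are not handled)."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._in_item = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._in_item = True
            self._parts = []

    def handle_data(self, data):
        if self._in_item:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "li" and self._in_item:
            text = " ".join("".join(self._parts).split())
            if text:
                self.items.append(text)
            self._in_item = False


def extract_names(html):
    """Return one dict per list item that matches the assumed format."""
    parser = ListItemParser()
    parser.feed(html)
    rows = []
    for text in parser.items:
        match = ITEM_RE.match(text)
        if match is None:
            continue  # skip items that do not fit the assumed format
        alts = match.group("alts") or ""
        rows.append({
            "name": match.group("name").strip(),
            "alternate_names": [a.strip() for a in alts.split(",") if a.strip()],
        })
    return rows


def to_jsonl(rows):
    """Serialize rows as JSON lines: one JSON object per line."""
    return "".join(json.dumps(row, ensure_ascii=False) + "\n" for row in rows)


def from_jsonl(text):
    """Parse JSON-lines text back into a list of dicts."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def download(url):
    """Fetch a page and decode it as UTF-8 (the encoding is an assumption)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def save(rows, path):
    """Write rows to `path` as a JSON-lines file."""
    Path(path).write_text(to_jsonl(rows), encoding="utf-8")
```

Anyone could then rerun something like `save(extract_names(download(url)), "names.jsonl")` for each page URL to regenerate the file locally. With BeautifulSoup4, `extract_names` would likely shrink to a few lines, and the selectors would need to match the real page markup.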
