Extract angiosperm order and family names from web pages #1

rafelafrance · 2021-09-08T20:15:40Z

We need to filter the iDigBio data so that it includes only angiosperm records (among other things). Two web pages have fairly exhaustive lists of angiosperm order and family names, along with some alternate names. The alternate names will be very helpful for older data because names change over time.

We need a script that downloads the pages, extracts the names, and writes them to a text file that can be read programmatically. I'm OK with either CSV or JSON-lines output. Links to the pages:

We need to be able to run this script (or Jupyter notebook) repeatedly. More accurately, all of us need to be able to run it to gather the information ourselves.
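For the output step, here is a minimal sketch of the JSON-lines option, using only the standard library. The record fields (`rank`, `name`, `alternate_names`) and the function names are assumptions made for illustration; nothing in this issue fixes a schema. Sorting the records before writing means that repeated runs on the same input produce an identical file, which should make reruns by different people easy to compare.

```python
import json


def write_jsonl(records, path):
    """Write one JSON object per line, in a stable order."""
    # Hypothetical schema: each record has "rank", "name", "alternate_names".
    ordered = sorted(records, key=lambda r: (r["rank"], r["name"]))
    with open(path, "w", encoding="utf-8") as handle:
        for record in ordered:
            handle.write(json.dumps(record, sort_keys=True) + "\n")


def read_jsonl(path):
    """Read the file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

A CSV version would look much the same with `csv.DictWriter`, except that the list of alternate names would need to be joined into a single delimited field.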

Note that I have had success with the BeautifulSoup4 Python library. I recently used it to parse a checklist of lice hosts.
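As a rough illustration of the BeautifulSoup4 approach, the sketch below pulls names out of an HTML table. The page layout is an assumption: the sample markup, the two-column structure (name, then comma-separated alternate names), and the `extract_names` and `guess_rank` helpers are all hypothetical and would have to be adapted to the real pages. The rank guess relies on the usual botanical suffixes (`-ales` for orders, `-aceae` for families).

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the real pages will almost certainly differ.
SAMPLE_HTML = """
<table>
  <tr><th>Name</th><th>Alternate names</th></tr>
  <tr><td>Asterales</td><td></td></tr>
  <tr><td>Asteraceae</td><td>Compositae</td></tr>
  <tr><td>Poaceae</td><td>Gramineae</td></tr>
</table>
"""


def guess_rank(name):
    """Guess the rank from the name's suffix (an assumption, not a rule of the pages)."""
    if name.endswith("ales"):
        return "order"
    if name.endswith("aceae"):
        return "family"
    return "unknown"


def extract_names(html):
    """Return one record per table row that has at least two <td> cells."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for row in soup.find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in row.find_all("td")]
        if len(cells) < 2:
            continue  # header rows contain only <th> cells
        name = cells[0]
        alternates = [alt.strip() for alt in cells[1].split(",") if alt.strip()]
        records.append(
            {"rank": guess_rank(name), "name": name, "alternate_names": alternates}
        )
    return records
```

In the real script the HTML string would come from the downloaded pages rather than from `SAMPLE_HTML`.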

rafelafrance assigned robgur, rafelafrance and jidec Sep 14, 2021

rafelafrance unassigned robgur and jidec Nov 16, 2021
