tanzanian-mammal-scraper

A set of scrapers that pull data and media from http://archive.fieldmuseum.org/tanzania and output CSVs that are semi-ready for EMu import. Some further parsing may be required.

How to scrape a dichotomous key:

Run python3 scrape_key.py [starter-URL] [path/where/output/should/go/]

For example, to scrape the Skull Key, start from its 'general' page:

python3 scrape_key.py "http://archive.fieldmuseum.org/tanzania/SkullKey.asp?ID=2" output/
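
The crawl this implies (start at one page, follow each "Option A"/"Option B" link, stop at Match-pages) can be sketched as a breadth-first traversal. This is only an illustration, not the code in scrape_key.py: crawl_key, get_option_links and the shortened toy URLs are hypothetical, and the real script may visit pages in a different order.

```python
from collections import deque
from typing import Callable, List, Tuple


def crawl_key(start_url: str,
              get_option_links: Callable[[str], List[str]]) -> List[Tuple[int, str]]:
    """Visit every page reachable from start_url, breadth-first.

    get_option_links is a hypothetical callable returning the option URLs
    found on a page (an empty list for a Match-page).
    Returns (order, url) pairs, with order starting at 1.
    """
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue:
        url = queue.popleft()
        visited.append((len(visited) + 1, url))
        for link in get_option_links(url):
            # Skip empty links and pages already queued, so shared
            # branches are only visited once.
            if link and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


# Toy stand-in for the website (made-up URLs): page -> child option links.
TOY_SITE = {
    "Key.asp?ID=2": ["Key.asp?ID=3", "Key.asp?ID=4"],
    "Key.asp?ID=3": [],
    "Key.asp?ID=4": ["Key.asp?ID=5", "Key.asp?ID=3"],
    "Key.asp?ID=5": [],
}

pages = crawl_key("Key.asp?ID=2", lambda url: TOY_SITE.get(url, []))
for order, url in pages:
    print(order, url)
```

Passing the link-extraction step in as a function keeps the traversal testable without network access; a real run would replace the lambda with a function that downloads and parses each page.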

Key-scraper output CSV fields:

order = original URL-order, if needed
url = original URL
page_type = "Option-page" for pages with dichotomous options/branches, or "Match-page" for matched (identified) species-detail pages
opt_a_link = "Option A" URL, a child of the current 'url'
opt_a_img = "Option A" image, if any
opt_a_text = "Option A" text, related to type/trait descriptions
opt_b_link = "Option B" URL, a child of the current 'url'
opt_b_img = "Option B" image, if any
opt_b_text = "Option B" text, related to type/trait descriptions
other_images = Images from Match-pages (or extra images, if more than 2 images occur on an Option-page)
taxon = Genus name from Match-page
match_text = body-text of a Match-page
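
As a rough sketch of how these fields fit together, the snippet below reads rows with the documented column names and rebuilds the chain of Option-pages that leads to each Match-page. The sample rows, the shortened URLs, the placeholder taxon names and the helper names (load_rows, parent_map, path_to) are all made up for illustration; only a subset of the columns is shown.

```python
import csv
import io

# Made-up sample rows using a subset of the documented column names.
SAMPLE_CSV = """order,url,page_type,opt_a_link,opt_b_link,taxon
1,Key.asp?ID=2,Option-page,Key.asp?ID=3,Key.asp?ID=4,
2,Key.asp?ID=3,Match-page,,,Genus-A
3,Key.asp?ID=4,Option-page,Key.asp?ID=5,Key.asp?ID=6,
4,Key.asp?ID=5,Match-page,,,Genus-B
5,Key.asp?ID=6,Match-page,,,Genus-C
"""


def load_rows(handle):
    """Read the scraper CSV into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle))


def parent_map(rows):
    """Map each child URL to the Option-page that links to it."""
    parents = {}
    for row in rows:
        if row["page_type"] == "Option-page":
            for column in ("opt_a_link", "opt_b_link"):
                child = row[column]
                if child:
                    # Keep the first parent seen if a page has several.
                    parents.setdefault(child, row["url"])
    return parents


def path_to(url, parents):
    """Return the list of URLs from the key's start page down to url."""
    path = [url]
    # The second condition guards against loops in the link structure.
    while path[-1] in parents and parents[path[-1]] not in path:
        path.append(parents[path[-1]])
    return path[::-1]


rows = load_rows(io.StringIO(SAMPLE_CSV))
parents = parent_map(rows)
matches = {
    row["taxon"]: path_to(row["url"], parents)
    for row in rows
    if row["page_type"] == "Match-page"
}
for taxon, path in matches.items():
    print(taxon, " > ".join(path))
```

To try this on real output, replace io.StringIO(SAMPLE_CSV) with an open file handle for the CSV written to the output folder.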

How to scrape the species-list page:

Run python3 scrape_species.py [species list URL] [path/where/output/should/go/]

For example, to start from the species home page:

python3 scrape_species.py "http://archive.fieldmuseum.org/tanzania/Species_Home.asp" output/
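
One plausible first step for a list page like this is to collect its links and resolve them against the page URL. The sketch below does that with the Python standard library only; it is not taken from scrape_species.py, and the LinkCollector class, the toy HTML and the Species_A/Species_B file names are assumptions, since the real page's markup is not described here.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from every <a href="..."> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the list page's URL.
                self.links.append(urljoin(self.base_url, href))


# Made-up markup standing in for the species-list page.
TOY_HTML = (
    '<ul>'
    '<li><a href="Species_A.asp">A</a></li>'
    '<li><a href="/tanzania/Species_B.asp">B</a></li>'
    '<li><a>no link</a></li>'
    '</ul>'
)

collector = LinkCollector("http://archive.fieldmuseum.org/tanzania/Species_Home.asp")
collector.feed(TOY_HTML)
for link in collector.links:
    print(link)
```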

EMu Records

2024-July: English records are imported to EMu

Multimedia

Group name: TAN website images (496)

Narratives

Group name: TAN Skin Key Options and Matches (535)

EMu / To do:

Prep & Import corresponding Swahili key pages
Prep & Import Skull Key pages

