This project was initially designed to scrape diet data from Birds of North America Online (BNAO), which is now Birds of the World. I use Scrapy spiders in Python to extract the relevant data into text files, plus additional functions in separate files to clean up the data for later use.
BDSfunctions_WebScrape.py (which is poorly named at the moment) is a spider that logs into Birds of the World via an American Ornithologists' Society (AOS) login, goes to the diet page of each species listed by scientific name in a .txt file, and extracts all text from the diet section. It then writes the text for each species to a separate row of a .csv file. Still debugging.
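A rough sketch of that login-then-scrape flow is below. The login URL, form field names, diet-page path, and CSS selector are placeholders assumed for illustration, not the site's actual structure.

```python
import scrapy


class DietSpider(scrapy.Spider):
    name = "diet"
    start_urls = ["https://birdsoftheworld.org/bow/home"]  # assumed page with the AOS login form

    def parse(self, response):
        # Submit the AOS login form; "username"/"password" are assumed field names.
        yield scrapy.FormRequest.from_response(
            response,
            formdata={"username": "USER", "password": "PASS"},
            callback=self.after_login,
        )

    def after_login(self, response):
        # One species per line in the input file; the URL pattern below is a
        # placeholder for however the script maps each species to its diet page.
        with open("species_list.txt") as f:
            species = [line.strip() for line in f if line.strip()]
        for sp in species:
            url = f"https://birdsoftheworld.org/bow/species/{sp}/cur/foodhabits"
            yield scrapy.Request(url, callback=self.parse_diet, cb_kwargs={"species": sp})

    def parse_diet(self, response, species):
        # Pull all paragraph text out of the diet section (selector is a guess)
        # and yield one item per species, which becomes one row of the output.
        diet_text = " ".join(response.css("section p::text").getall())
        yield {"species": species, "diet_text": diet_text}
```

Run with a feed export (e.g. scrapy runspider BDSfunctions_WebScrape.py -o diet.csv), each yielded item becomes one row of the .csv.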
BDSspider_ABACodes.py is a spider that extracts all 969 six-letter species codes that Birds of the World uses in the URLs of its species pages.
BDSspider_AllCodes.py is a spider that logs into Birds of the World via an AOS login, goes to the browse taxonomy page, and scrapes the species page URLs into a text file. Eventually this will become a list of all species codes that the spider in BDSfunctions_WebScrape.py can use.
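A sketch of just the taxonomy-page step (with the login handled the same way as in the earlier sketch) could look like this; the browse-taxonomy URL and the link filter are placeholders.

```python
import scrapy


class TaxonomyUrlSpider(scrapy.Spider):
    name = "taxonomy_urls"
    # Assumed address of the browse taxonomy page; in practice the AOS login
    # step from the earlier sketch runs before this request succeeds.
    start_urls = ["https://birdsoftheworld.org/bow/specieslist"]

    def parse(self, response):
        # Keep every link that looks like a species page and record its full URL.
        for href in response.css("a::attr(href)").getall():
            if "/species/" in href:
                yield {"url": response.urljoin(href)}
```

A feed export (e.g. scrapy runspider BDSspider_AllCodes.py -o species_urls.csv) collects one URL per row for the cleanup step in BDSfunctions_AllCodes.py.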
BDSfunctions_ABACodes.py takes the output file from BDSspider_ABACodes.py, extracts the codes, and outputs a .txt file list of codes. It also has a function for extracting Parulidae codes specifically, and could eventually be expanded to pull codes for any family specified by user input.
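The code-extraction step might look roughly like this; the input and output file names and the regular expression assume the scraped lines contain species-page URLs with a six-letter code segment, which may not match the actual output format.

```python
import re


def extract_codes(infile="ABACodes_output.txt", outfile="aba_codes.txt"):
    """Pull six-letter species codes out of scraped species-page URLs."""
    # The URL shape and code pattern are guesses, e.g. ".../species/amerob/...".
    pattern = re.compile(r"/species/([a-z0-9]{6})")
    codes = []
    with open(infile) as src:
        for line in src:
            match = pattern.search(line)
            if match:
                codes.append(match.group(1))
    with open(outfile, "w") as dst:
        dst.write("\n".join(codes) + "\n")
    return codes
```

A family-specific version (e.g. for Parulidae) would simply filter this list against the set of codes belonging to that family.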
BDSfunctions_AllCodes.py takes the output file from BDSspider_AllCodes.py, extracts the codes, and outputs a .csv file with separate columns for codes, scientific names, and common names of all species in Birds of the World.
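A sketch of that reshaping step is below. It assumes each line of the spider's output pairs a species URL with its link text ("Common Name - Scientific Name") separated by a tab; the real layout of the scraped file may well differ.

```python
import csv
import re


def build_species_table(infile="AllCodes_output.txt", outfile="all_species_codes.csv"):
    """Write a three-column table of code, scientific name, and common name."""
    code_pattern = re.compile(r"/species/([a-z0-9]+)")
    with open(infile) as src, open(outfile, "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(["code", "scientific_name", "common_name"])
        for line in src:
            # Assumed layout: "<species url>\t<Common Name - Scientific Name>"
            url, _, names = line.strip().partition("\t")
            match = code_pattern.search(url)
            if not match or " - " not in names:
                continue  # skip lines that don't fit the assumed layout
            common, _, scientific = names.partition(" - ")
            writer.writerow([match.group(1), scientific.strip(), common.strip()])
```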
Emily Webb, Biology PhD Candidate at Arizona State University