A notebook to try to scrape the National Sea Grant Catalog for CA publications

amandawhitmire/nsgl

Scraping the National Sea Grant Library online catalog

Amanda Whitmire 🧜‍♀️

Background

The National Sea Grant Library (NSGL), which was housed on the Bay Campus of the University of Rhode Island, is no longer funded, and the entire collection is being deaccessioned. The collection consists of two print copies of each title published by every state Sea Grant program going back to the 1970s. Subject areas include research, trade, and popular material on the scientific, economic, political, and social aspects of the coastal zones of the U.S., including the Great Lakes. Many of these titles are held by only a handful of libraries, and for some the NSGL is the only holder.

From the National Sea Grant Library website (Dec. 2022):

The NSGL was the digital library and official archive for NOAA’s Sea Grant documents from the 1960s through 2020. Its catalogue remains a comprehensive and deeply searchable collection of Sea Grant-funded documents from the over 30 programs and projects across the United States and Territories. This collection includes a wide variety of subjects including oceanography, marine education, aquaculture, fisheries, aquatic nuisance species, coastal hazards, seafood safety, limnology, coastal zone management, marine recreation, and law.

The online catalog provides global access to tens of thousands of full-text digital documents. The Pell Library at the University of Rhode Island has become the archive for Sea Grant — preserving and enabling access to information-rich documents and publications from the 1960s through 2020. For those documents that aren’t available electronically, anyone can request a PDF be created as part of the Digitization on Demand program.

Plan for the collection

The NOAA Central Library is taking existing digital copies of titles into the NOAA Institutional Repository, but the scans are not up to current standards. It is not clear how much of the collection has been scanned, but initial sampling indicates that it is less than half (peer-reviewed articles were excluded from the analysis because they are available via other sources).

There are two copies of each print title. As of now, the Internet Archive will be taking the archival copy of the entire collection. The circulating copies have been offered in batches by state to any library willing to take them. Because of space limitations, URI will retain only the Rhode Island Sea Grant material. If no home is found for the second copy of an item, it will go to the trash.

The Harold A. Miller Library, a branch of the Stanford Libraries located at Hopkins Marine Station, has received shipment of the entire California Sea Grant Collection. According to the National Sea Grant Catalog, there are 4,398 CA items between the 'California Sea Grant' and 'Southern California Sea Grant' programs, filling about 40 linear feet of shelf space according to the retiring NSGL Librarian. The librarian was unable to retrieve catalog records from Ye Olde Online Catalog, so this GH Repo is for code & files related to scraping the NSGL online catalog and processing the records.

Outputs

De-duplicated catalog records for all three states can be found in one main file: 'nsgl-all-records.csv'.

Files for individual states can be found in ~/catalogRecords-raw as:

  • 'nsgl-ca-records.csv' for California
  • 'nsgl-or-records.csv' for Oregon
  • 'nsgl-wa-records.csv' for Washington
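The relationship between the per-state files and the combined file can be sketched as follows. This is a minimal illustration only, not the method used in this repo (the actual compilation is done in R in 'buildCatalog.Rmd'). It assumes that all state files share the same header row and that "duplicate" means an exact row match; the function name `combine_state_records` is hypothetical.

```python
import csv


def combine_state_records(csv_paths, out_path):
    """Merge several CSV files into one, dropping exact duplicate rows.

    Assumes every input file has the same header row (an assumption,
    not something this README guarantees).
    Returns the number of data rows written.
    """
    header = None
    seen = set()
    rows = []
    for path in csv_paths:
        with open(path, newline="", encoding="utf-8") as handle:
            reader = csv.reader(handle)
            file_header = next(reader, None)
            if file_header is None:
                continue  # skip completely empty files
            if header is None:
                header = file_header
            elif file_header != header:
                raise ValueError(f"{path} has a different header row")
            for row in reader:
                key = tuple(row)
                if key not in seen:  # keep only the first copy of each row
                    seen.add(key)
                    rows.append(row)
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        if header is not None:
            writer.writerow(header)
        writer.writerows(rows)
    return len(rows)
```

For example, calling `combine_state_records(["catalogRecords-raw/nsgl-ca-records.csv", "catalogRecords-raw/nsgl-or-records.csv", "catalogRecords-raw/nsgl-wa-records.csv"], "combined.csv")` would write one merged file. Matching on a single identifier column instead of the whole row may be more appropriate, depending on how the records differ between states.
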

The NOAA IR offers direct download of metadata from the catalog (✨ happy ✨). You can find information about this here. To download the Sea Grant Collection, use 'noaa%3A11' as the PID. Here are some example URLs:

You can find the data as a CSV file in this repo as 'NOAA-IR-metadata.csv'.

Repo Contents

├── catalogRecords-raw/
	├── a csv file for each state with the compiled catalog records
├── images/
	├── screenshots of various things (NSGL catalog, etc.)
├── itemDownloads/
	├── PDFs from when I was trying to get them. Incomplete. N=779
├── itemRecords-ca-csv/
	├── individual csv files for each catalog record scraped from NSGL for CA
├── itemRecords-or-csv/
	├── individual csv files for each catalog record scraped from NSGL for OR
├── itemRecords-wa-csv/
	├── individual csv files for each catalog record scraped from NSGL for WA
├── pages-searchResults_CA/
	├── txt files with item IDs & html pages for each page of 100 NSGL search results for CA
├── pages-searchResults_OR/
	├── txt files with item IDs & html pages for each page of 100 NSGL search results for OR
├── pages-searchResults_WA/
	├── txt files with item IDs & html pages for each page of 100 NSGL search results for WA
├──  buildCatalog.Rmd	R code for compiling the catalog from individual text files
├──  LICENSE.md
├──  NOAA-IR-metadata.csv	CSV file with metadata records for the Sea Grant Collection 
├──  nsgl-all-records.csv	De-duplicated catalog records for all three states in one CSV file. 11,451 rows
├──  nsgl-ca-catalogscrape_selenium.ipynb	Jupyter notebook for scraping CA records from the NSGL online catalog
├──  nsgl-or-catalogscrape_selenium.ipynb	Jupyter notebook for scraping OR records from the NSGL online catalog
├──  nsgl-wa-catalogscrape_selenium.ipynb	Jupyter notebook for scraping WA records from the NSGL online catalog
├──  seagrant-ca-ids.txt	compiled from txt files in /pages-searchResults_CA
├──  seagrant-or-ids.txt	compiled from txt files in /pages-searchResults_OR
├──  seagrant-wa-ids.txt	compiled from txt files in /pages-searchResults_WA
└── README.md (this file)
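
The 'seagrant-*-ids.txt' files listed above are compiled from the per-page txt files in the 'pages-searchResults_*' folders. A rough sketch of that step might look like the code below. It assumes each txt file holds one item ID per line, which this README does not state, and the function name `compile_item_ids` is hypothetical rather than taken from the notebooks.

```python
from pathlib import Path


def compile_item_ids(results_dir, out_path):
    """Gather item IDs from every .txt file in results_dir into one file.

    Assumes one ID per line (an assumption). Blank lines and repeated
    IDs are skipped. Files are read in filename order, which is
    alphabetical, so 'page10' sorts before 'page2'.
    Returns the list of unique IDs in the order first seen.
    """
    ids = []
    seen = set()
    for txt_file in sorted(Path(results_dir).glob("*.txt")):
        for line in txt_file.read_text(encoding="utf-8").splitlines():
            item_id = line.strip()
            if item_id and item_id not in seen:
                seen.add(item_id)
                ids.append(item_id)
    # Write one ID per line to the compiled output file.
    Path(out_path).write_text("".join(f"{i}\n" for i in ids), encoding="utf-8")
    return ids
```

A call such as `compile_item_ids("pages-searchResults_CA", "seagrant-ca-ids.txt")` would then produce a single list of IDs to feed into the item-record scrape.
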

Acknowledgements

  • Peter Broadwell, 🤩 Research Developer at the Center for Interdisciplinary Digital Research, Stanford University, for support with web scraping techniques
  • Claudia A. Engel, 😎 Academic Technology Specialist and Lecturer, Department of Anthropology, Stanford University, for helping me with merging CSV records in R
  • Deborah Mongeau, 🦉 Professor Emerita, University of Rhode Island Libraries & former National Sea Grant librarian, for taking care of this amazing collection.

MIT License
