Generate streaming JSON from Preston, extracted from DwC-A #148
This enables streaming all of GBIF into all.json.gz, a single very large file containing all records.
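A file like all.json.gz can be consumed without loading it into memory. The sketch below assumes the file is gzip-compressed with one JSON object per line, as this issue proposes; the helper name `iter_records` is hypothetical and not part of Preston.

```python
import gzip
import json
from typing import Iterator


def iter_records(path: str) -> Iterator[dict]:
    """Lazily yield one parsed record per non-empty line of a gzipped JSON-lines file."""
    # Text mode lets gzip decompress and split lines incrementally.
    with gzip.open(path, mode="rt", encoding="utf-8") as lines:
        for line in lines:
            line = line.strip()
            if line:  # tolerate blank lines
                yield json.loads(line)
```

Because the function is a generator, a loop such as `for record in iter_records("all.json.gz")` handles one record at a time rather than reading the whole file first.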
Some more specific examples: extract all scientificNames into names.txt (attached), look at the top 10 of names.txt, then combine with Nomer or other taxonomic name-matching tools to do name alignment.
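The name-extraction step could also be done in Python rather than in a shell pipeline. The sketch below rests on assumptions: each input line is a JSON object, and the scientific name sits under a key ending in `scientificName` (for example a full Darwin Core term URI). `top_names` is a hypothetical helper, and "top 10" is interpreted here as the ten most frequent names.

```python
import json
from collections import Counter
from typing import Iterable, List, Tuple


def top_names(lines: Iterable[str], n: int = 10) -> List[Tuple[str, int]]:
    """Count scientific names across JSON lines and return the n most frequent."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        for key, value in record.items():
            # Assumption: the key is "scientificName" or a URI ending in it.
            if key.endswith("scientificName") and isinstance(value, str):
                counts[value] += 1
    return counts.most_common(n)
```

Passing `sys.stdin` as `lines` would let this sit at the end of a pipe; the resulting names can then be handed to a name-matching tool.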
Generating names together with the exact location of each in its original source, for example the original line containing scientificName "Chalceus guaporensis Zanata & Toledo-Piza".
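Pairing each name with its source location might look like the sketch below. It assumes every record carries a provenance field that points at the original line. The default key `http://www.w3.org/ns/prov#wasDerivedFrom` is an assumption about the output format that this thread does not confirm, and `names_with_sources` is a hypothetical helper.

```python
import json
from typing import Iterable, Iterator, Tuple

# Assumed key names; adjust them to whatever the actual records use.
NAME_SUFFIX = "scientificName"
SOURCE_KEY = "http://www.w3.org/ns/prov#wasDerivedFrom"


def names_with_sources(
    lines: Iterable[str], source_key: str = SOURCE_KEY
) -> Iterator[Tuple[str, str]]:
    """Yield (scientific name, source location) pairs from JSON lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        source = record.get(source_key)
        if not isinstance(source, str):
            continue  # skip records without a usable source reference
        for key, value in record.items():
            if key.endswith(NAME_SUFFIX) and isinstance(value, str):
                yield value, source
```

Keeping the source reference next to each name means a match found later can be traced back to the exact line it came from.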
Example of extracting image URLs for UC Santa Barbara's Invertebrate Zoology Collection (UCSB-IZC, @seltmann).
Tracking all image URLs would then be a further step.
Putting it together, you'd be able to track the UCSB-IZC and its images.
Initial support was introduced in https://github.com/bio-guoda/preston/releases/tag/0.3.5. Suggest reporting future improvements in a separate issue.
To help increase access to biodiversity data, suggest making Preston stream DwC-A records as JSON, line by line.