track images referenced inside of (iDigBio) Darwin Core archives #71
Comments
More examples: the record page shows a thumbnail: https://api.idigbio.org/v2/media/ab0e3f3d-b758-418b-8cf1-ed85d893fd65?size=thumbnail . Clicking on the thumbnail leads to the webview page, which includes the iDigBio webview image; clicking on the webview image links to the original, raw image.
Note that - redirects to: https://s.idigbio.org/idigbio-images-prod-webview/400c92a515dbeefb3eef8526dbcbb5e2.jpg resolve to same content with hash://sha256/5b960b8282c39f835d65b8574ef1c02a0ed387802f57077250ce926678f11f3b . Similarly, resolves to with hash://sha256/37b4504da1b471f35acd0b39a3e7bb5a3f711d60d2eec22220fffc79e0d69b15
Also, https://api.idigbio.org/v2/media/ab0e3f3d-b758-418b-8cf1-ed85d893fd65?size=fullsize resolves to https://s.idigbio.org/idigbio-images-prod-fullsize/400c92a515dbeefb3eef8526dbcbb5e2.jpg with hash://sha256/e35df5a12bc6b4f977b815c9ec5d35dedbab1ba01b6aaef23db10ecc4ca4d7c8 .
The raw original image at http://data.huh.harvard.edu/23097057-6561-4dbe-81ec-1be2133f1b7d/image has the same hash as the fullsize image served by iDigBio: hash://sha256/e35df5a12bc6b4f977b815c9ec5d35dedbab1ba01b6aaef23db10ecc4ca4d7c8 , with a byte count of 37802973.
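The comparisons above rely on content hashes written as hash://sha256/ followed by the hex digest. As a minimal sketch (the function names `hash_uri` and `hash_uri_of_stream` are made up for illustration and are not part of Preston), this is one way to compute such an identifier for downloaded bytes and check whether two downloads have the same content:

```python
import hashlib
from typing import BinaryIO


def hash_uri(content: bytes) -> str:
    """Return a hash://sha256/... identifier for in-memory content."""
    return "hash://sha256/" + hashlib.sha256(content).hexdigest()


def hash_uri_of_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Same identifier, computed chunk by chunk so that large images
    (the raw image above is about 37 MB) need not fit in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return "hash://sha256/" + digest.hexdigest()
```

Two URLs serve the same content exactly when the identifiers computed from their response bodies are equal, which is how the fullsize and raw images above can be recognized as identical.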
Related API call: https://api.idigbio.org/v2/view/records/c98df68c-a32b-4c8e-b9e8-e57d20e67dea
{
"data": {
"dcterms:accessRights": "https://huh.harvard.edu/access-digital-reproductions-works-public-domain",
"dcterms:language": "en",
"dcterms:license": "https://huh.harvard.edu/pages/use",
"dcterms:modified": "2020-02-11 16:50:48.0",
"dcterms:references": "http://data.huh.harvard.edu/23097057-6561-4dbe-81ec-1be2133f1b7d/image",
"dcterms:rightsHolder": "President and Fellows of Harvard College",
"dcterms:type": "http://purl.org/dc/dcmitype/PhysicalObject",
"dwc:Identification": [
{
"coreid": "23097057-6561-4dbe-81ec-1be2133f1b7d",
"dwc:family": "Orchidaceae",
"dwc:genus": "Aa",
"dwc:scientificName": "Aa fiebrigii (Schlechter) Schlechter",
"dwc:scientificNameAuthorship": "(Schlechter) Schlechter",
"dwc:specificEpithet": "fiebrigii",
"dwc:taxonRank": "Species"
}
],
"dwc:ResourceRelationship": [
{
"coreid": "23097057-6561-4dbe-81ec-1be2133f1b7d",
"dwc:relatedResourceID": "http://purl.oclc.org/net/edu.harvard.huh/guid/uuid/23097057-6561-4dbe-81ec-1be2133f1b7d",
"dwc:relationshipOfResource": "sameAs"
}
],
"dwc:associatedMedia": "[see Simple Media extension]",
"dwc:basisOfRecord": "PreservedSpecimen",
"dwc:catalogNumber": "barcode-02162424",
"dwc:collectionCode": "AMES",
"dwc:collectionID": "urn:lsid:biocol.org:col:15408",
"dwc:continent": "South America",
"dwc:country": "Argentina",
"dwc:countryCode": "AR",
"dwc:datasetName": "Harvard University Herbaria: All Records",
"dwc:disposition": "in collection",
"dwc:dynamicProperties": "{\"huh_taxonomic_group\": \"Vascular\", \"huh_project_id\": 9, \"huh_project_name\": \"Plants on Edge/Endless Forms\"}",
"dwc:eventDate": "1927-01",
"dwc:family": "Orchidaceae",
"dwc:fieldNumber": "4782",
"dwc:genus": "Aa",
"dwc:higherGeography": "South America;Argentina;Tucumán;",
"dwc:institutionCode": "AMES",
"dwc:institutionID": "urn:lsid:biocol.org:col:15408",
"dwc:locality": "Dept. Chichigasti, Santa Rosa.",
"dwc:month": "1",
"dwc:occurrenceID": "23097057-6561-4dbe-81ec-1be2133f1b7d",
"dwc:occurrenceStatus": "present",
"dwc:otherCatalogNumbers": "AMES-accession-39219",
"dwc:preparations": "Sheet",
"dwc:recordNumber": "4782",
"dwc:recordedBy": "Sant. Venturi",
"dwc:reproductiveCondition": "NotDetermined",
"dwc:scientificName": "Aa fiebrigii (Schlechter) Schlechter",
"dwc:scientificNameAuthorship": "(Schlechter) Schlechter",
"dwc:sex": "undetermined",
"dwc:specificEpithet": "fiebrigii",
"dwc:stateProvince": "Tucumán",
"dwc:verbatimElevation": "3600 m.",
"dwc:verbatimLocality": "Dept. Chichigasti, Santa Rosa.",
"dwc:year": "1927",
"id": "23097057-6561-4dbe-81ec-1be2133f1b7d"
},
"etag": "e5133811b819f5c82fd3c66c027a97dfc00de4dc",
"links": {
"mediarecords": [
"https://api.idigbio.org/v2/view/mediarecord/ab0e3f3d-b758-418b-8cf1-ed85d893fd65"
],
"recordsets": [
"https://api.idigbio.org/v2/view/recordsets/7450a9e3-ef95-4f9e-8260-09b498d2c5e6"
]
},
"modified": "2020-08-23T07:11:28.837888",
"recordIds": [
"7450a9e3-ef95-4f9e-8260-09b498d2c5e6\\23097057-6561-4dbe-81ec-1be2133f1b7d",
"7450a9e3-ef95-4f9e-8260-09b498d2c5e6\\http://purl.oclc.org/net/edu.harvard.huh/guid/uuid/23097057-6561-4dbe-81ec-1be2133f1b7d"
],
"type": "records",
"uuid": "c98df68c-a32b-4c8e-b9e8-e57d20e67dea",
"version": 0
}
and https://api.idigbio.org/v2/view/mediarecords/ab0e3f3d-b758-418b-8cf1-ed85d893fd65
{
"data": {
"coreid": "23097057-6561-4dbe-81ec-1be2133f1b7d",
"dcterms:created": "2020-01-14 00:00:00.0",
"dcterms:format": "image/jpeg",
"dcterms:identifier": "http://data.huh.harvard.edu/23097057-6561-4dbe-81ec-1be2133f1b7d/image",
"dcterms:license": "https://huh.harvard.edu/access-digital-reproductions-works-public-domain",
"dcterms:references": "http://data.huh.harvard.edu/23097057-6561-4dbe-81ec-1be2133f1b7d",
"dcterms:rightsHolder": "President and Fellows of Harvard College",
"dcterms:type": "StillImage"
},
"etag": "f381092ccb300cf531878cb39ed3be62f19de77f",
"links": {
"records": [
"https://api.idigbio.org/v2/view/record/c98df68c-a32b-4c8e-b9e8-e57d20e67dea"
],
"recordsets": [
"https://api.idigbio.org/v2/view/recordsets/7450a9e3-ef95-4f9e-8260-09b498d2c5e6"
]
},
"modified": "2020-08-23T07:11:28.837888",
"recordIds": [
"7450a9e3-ef95-4f9e-8260-09b498d2c5e6\\media\\http://data.huh.harvard.edu/23097057-6561-4dbe-81ec-1be2133f1b7d/image"
],
"type": "mediarecords",
"uuid": "ab0e3f3d-b758-418b-8cf1-ed85d893fd65",
"version": 0
}
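The record response links to its media records, and the media API URLs earlier in this thread take a `size` parameter. The following is a rough sketch of how those two pieces could be combined. The helper names are hypothetical, and `size=webview` is an assumption: only `size=thumbnail` and `size=fullsize` appear verbatim above.

```python
from typing import Dict, List

# "thumbnail" and "fullsize" appear in the URLs above; "webview" is assumed.
SIZES = ("thumbnail", "webview", "fullsize")


def media_uuids(record: dict) -> List[str]:
    """Extract media record UUIDs from the links.mediarecords list."""
    links = record.get("links", {}).get("mediarecords", [])
    return [link.rstrip("/").rsplit("/", 1)[-1] for link in links]


def derivative_urls(media_uuid: str) -> Dict[str, str]:
    """Build one media API URL per image size for a media record UUID."""
    return {
        size: f"https://api.idigbio.org/v2/media/{media_uuid}?size={size}"
        for size in SIZES
    }


# Trimmed copy of the record response shown above.
record = {
    "links": {
        "mediarecords": [
            "https://api.idigbio.org/v2/view/mediarecord/ab0e3f3d-b758-418b-8cf1-ed85d893fd65"
        ]
    }
}

for uuid in media_uuids(record):
    for size, url in derivative_urls(uuid).items():
        print(size, url)
```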
Note that image retrieval is by md5 of the image:
with url pattern:
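The URL pattern itself is not reproduced in this comment. As a hedged sketch, the pattern below is inferred only from the two s.idigbio.org URLs quoted earlier in this thread (one per image size, both ending in a 32-character hex name that is assumed to be the md5). The function name `idigbio_image_url` is hypothetical, and the `.jpg` suffix may not hold for other media types.

```python
import re

# 32 lowercase hex characters, the length of an md5 digest.
_MD5_HEX = re.compile(r"[0-9a-f]{32}")


def idigbio_image_url(md5_hex: str, size: str = "webview") -> str:
    """Build a storage URL from an md5 hex digest and an image size.

    Assumption: the bucket name follows idigbio-images-prod-<size>,
    as seen in the webview and fullsize examples above.
    """
    md5_hex = md5_hex.lower()
    if not _MD5_HEX.fullmatch(md5_hex):
        raise ValueError(f"not an md5 hex digest: {md5_hex!r}")
    return f"https://s.idigbio.org/idigbio-images-prod-{size}/{md5_hex}.jpg"
```

For the example image, `idigbio_image_url("400c92a515dbeefb3eef8526dbcbb5e2", "fullsize")` reproduces the fullsize URL quoted above.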
Note that, for some reason, the iDigBio API responds with "502 internal server error" when serving beyond 100k items. See known issue iDigBio/idigbio-search-api#32 .
Using newly introduced you can now track images inside Darwin Core archives, as demonstrated in the UCSB-IZC example at #148 (comment). Example for extracting image URLs for UC Santa Barbara's (@seltmann) invertebrate zoology collection -
with
Now, tracking all image urls . . . would be:
So, putting it together, you'd be able to track the UCSB-IZC and its images using:
Originally posted by @jhpoelen in #148 (comment)
Darwin Core archives may contain URLs to images. These URLs reference images that are typically stored outside of the Darwin Core archive.
Currently, Preston does not track images referenced inside Darwin Core archives.
Suggestion: extend Preston to include image tracking.
Additionally, extend support to include tracking of iDigBio thumbnail or web-optimized images in addition to the raw image referenced in the indexed dataset (separate into a different issue if needed).
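To make the proposal concrete, here is a rough sketch of how image URLs could be pulled out of a Darwin Core archive. It is not how Preston does or would do this. It assumes the archive has a meta.xml descriptor in which an extension maps a column to dcterms:identifier or ac:accessURI, and the function name `media_urls` is hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET
import zipfile
from typing import Iterator

DWC_TEXT_NS = "{http://rs.tdwg.org/dwc/text/}"
# Terms assumed to carry media URLs in multimedia extensions.
URL_TERMS = {
    "http://purl.org/dc/terms/identifier",
    "http://rs.tdwg.org/ac/terms/accessURI",
}


def media_urls(archive: zipfile.ZipFile) -> Iterator[str]:
    """Yield http(s) URLs found in URL-bearing columns of each extension."""
    meta = ET.fromstring(archive.read("meta.xml"))
    for ext in meta.iter(DWC_TEXT_NS + "extension"):
        columns = [
            int(field.get("index"))
            for field in ext.iter(DWC_TEXT_NS + "field")
            if field.get("term") in URL_TERMS and field.get("index") is not None
        ]
        location = ext.find(f"{DWC_TEXT_NS}files/{DWC_TEXT_NS}location")
        if not columns or location is None or not location.text:
            continue
        delimiter = ext.get("fieldsTerminatedBy", ",")
        if delimiter == "\\t":  # meta.xml spells a tab as the two characters \t
            delimiter = "\t"
        header_lines = int(ext.get("ignoreHeaderLines", "0"))
        with archive.open(location.text.strip()) as raw:
            text = io.TextIOWrapper(
                raw, encoding=ext.get("encoding", "utf-8"), newline=""
            )
            reader = csv.reader(text, delimiter=delimiter)
            for _ in range(header_lines):
                next(reader, None)
            for row in reader:
                for col in columns:
                    if col < len(row) and row[col].startswith("http"):
                        yield row[col]
```

Each yielded URL could then be retrieved and hashed like any other tracked resource, so that the image content is linked to the archive that referenced it.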