Dataverse sample in croissant format #232

Open: wants to merge 2 commits into base: main
19 changes: 19 additions & 0 deletions datasets/dataverse/crosswalks.md
@@ -0,0 +1,19 @@
### Crosswalk from OAI-ORE to "Croissant" Format

| OAI-ORE Property | "Croissant" Property |
|-----------------------------|----------------------------------|
| OAI-ORE `@context` | "Croissant" `@context` |
| OAI-ORE `@type` | "Croissant" `@type` |
| OAI-ORE `@id` | "Croissant" `@id` |
| OAI-ORE `dc:title` | "Croissant" `name` |
| OAI-ORE `dc:description` | "Croissant" `description` |
| OAI-ORE `dc:creator` | "Croissant" `citation:Depositor` |
| OAI-ORE `dcterms:modified` | "Croissant" `schema:dateModified`|
| OAI-ORE `dcterms:created` | "Croissant" `schema:datePublished`|
| OAI-ORE `dc:license` | "Croissant" `license` |
| OAI-ORE `dcterms:hasPart` | "Croissant" `schema:hasPart` |
| OAI-ORE `dcterms:isPartOf` | "Croissant" `schema:includedInDataCatalog` |
| OAI-ORE `ore:aggregates` | "Croissant" `ore:aggregates` |
| OAI-ORE `ore:describes` | "Croissant" `ore:describes` |
| OAI-ORE `ore:isDescribedBy` | "Croissant" `ore:isDescribedBy` |
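
The table above can be read as a key-renaming map. Below is a minimal Python sketch of that idea. It assumes a flat dictionary keyed by the OAI-ORE property names exactly as written in the table; a real Dataverse export nests most fields under `ore:describes` and uses labels such as `Title`, so this is illustrative only. `CROSSWALK` and `to_croissant` are hypothetical names, not part of any library.

```python
# Hypothetical helper: rename OAI-ORE keys to the "Croissant" keys from the table.
CROSSWALK = {
    "@context": "@context",
    "@type": "@type",
    "@id": "@id",
    "dc:title": "name",
    "dc:description": "description",
    "dc:creator": "citation:Depositor",
    "dcterms:modified": "schema:dateModified",
    "dcterms:created": "schema:datePublished",
    "dc:license": "license",
    "dcterms:hasPart": "schema:hasPart",
    "dcterms:isPartOf": "schema:includedInDataCatalog",
    "ore:aggregates": "ore:aggregates",
    "ore:describes": "ore:describes",
    "ore:isDescribedBy": "ore:isDescribedBy",
}


def to_croissant(record):
    """Rename keys found in the crosswalk; keys not in the table are dropped."""
    return {CROSSWALK[key]: value for key, value in record.items() if key in CROSSWALK}


# Illustrative input, not taken from a real export.
example = {
    "dc:title": "Example title",
    "dcterms:modified": "2023-09-27",
    "unmapped:field": "ignored",
}
converted = to_croissant(example)
print(converted)
```

Dropping unmapped keys is a design choice made for this sketch; a converter could equally pass them through unchanged.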

180 changes: 180 additions & 0 deletions datasets/dataverse/dataverse.json
@@ -0,0 +1,180 @@
{
"dcterms:modified": "2023-09-27",
"dcterms:creator": "DataverseNL",
"@type": "ore:ResourceMap",
"@id": "https://dataverse.nl/api/datasets/export?exporter=OAI_ORE&persistentId=doi:10.34894/VFS3VQ",
"ore:describes": {
"Subject": "Medicine, Health and Life Sciences",
"Title": "Safety and pharmacodynamic efficacy of eculizumab in aneurysmal subarachnoid hemorrhage (CLASH): a phase 2a randomized clinical trial.",
"citation:Depositor": "Vergouwen, Mervyn",
"Deposit Date": "2023-09-21",
"citation:Contact": [
{
"datasetContact:Name": "data management",
"datasetContact:Affiliation": "UMC Utrecht"
},
{
"datasetContact:Name": "Broeders, Willem",
"datasetContact:Affiliation": "UMC Utrecht"
}
],
"citation:Keyword": {
"keyword:Term": "subarachnoid hemorrhage"
},
"Author": {
"author:Name": "Vergouwen, Mervyn",
"author:Affiliation": "UMC Utrecht"
},
"citation:Description": {
"dsDescription:Text": "The dataset includes the raw data collected for the CLASH-trial."
},
"Related Publication": {
"Citation": "Koopman I, Tack RW, Wunderink HF, Bruns AH, van der Schaaf IC, Cianci D, Gelderman KA, van de Ridder IM, Hol EM, Rinkel GJ, Vergouwen MD. Safety and pharmacodynamic efficacy of eculizumab in aneurysmal subarachnoid hemorrhage (CLASH): A phase 2a randomized clinical trial. Eur Stroke J. 2023 Aug 22:23969873231194123. doi: 10.1177/23969873231194123. Online ahead of print.",
"ID Type": "pmid",
"ID Number": "37606053",
"URL": "https://journals.sagepub.com/doi/full/10.1177/23969873231194123?rfr_dat=cr_pub++0pubmed&url_ver=Z39.88-2003&rfr_id=ori%3Arid%3Acrossref.org"
},
"@id": "doi:10.34894/VFS3VQ",
"@type": [
"ore:Aggregation",
"schema:Dataset"
],
"schema:version": "1.0",
"schema:name": "Safety and pharmacodynamic efficacy of eculizumab in aneurysmal subarachnoid hemorrhage (CLASH): a phase 2a randomized clinical trial.",
"schema:dateModified": "2023-09-27 15:15:06.674",
"schema:datePublished": "2023-09-27",
"dvcore:termsOfUse": "The standard Data Sharing Agreement (DSA) of the UMC Utrecht must be signed without adjustments. This DSA is in compliance with Dutch law. No costs are involved.",
"dvcore:confidentialityDeclaration": "no",
"dvcore:specialPermissions": "To obtain access to the data, a <a href=\"https://www.umcutrecht.nl/en/data-request-form-umc-utrecht\">request form</a> has to be completed. In addition to a completed request form, a Data Sharing Agreement (DSA) in line with GDPR regulations and/or a Research Collaboration Agreement (RCA) should be signed before data is shared. Only data requests in line with the Terms of Use will be taken into consideration. ",
"dvcore:restrictions": "See Data Sharing Agreement.",
"dvcore:citationRequirements": "See Data Sharing Agreement.",
"dvcore:conditions": "To access and use the dataset please read the Terms of Use and the Terms of Access.",
"dvcore:disclaimer": "See Data Sharing Agreement.",
"dvcore:fileTermsOfAccess": {
"dvcore:termsOfAccess": "The data is not available for download directly via DataverseNL. Data is available on request by completing the <a href=\"https://www.umcutrecht.nl/en/data-request-form-umc-utrecht\">request form</a>. Only data requests in line with the Terms of Use will be taken into consideration. In addition to a completed request form, the Data Sharing Agreement (DSA) in line with GDPR regulations and/or the Research Collaboration Agreement (RCA) should be signed before data is shared. If a data request is approved, the data will be delivered in a safe and secure manner. By signing the DSA and/or RCA and accessing the Materials, the recipient represents his/her acceptance of the Terms of Use. ",
"dvcore:fileRequestAccess": true,
"dvcore:availabilityStatus": "The data is not available for download directly via DataverseNL but is available on request if the request is compliant with the Terms of Access. ",
"dvcore:contactForAccess": "Please fill out the <a href=\"https://www.umcutrecht.nl/en/data-request-form-umc-utrecht\">request form</a>."
},
"schema:includedInDataCatalog": "DataverseNL",
"ore:aggregates": [
{
"schema:description": "Blood parameters",
"schema:name": "CLASH_bloedafname_longformat_LOD30112021.sav",
"dvcore:restricted": true,
"schema:version": 3,
"dvcore:datasetVersionId": 25355,
"@id": "https://dataverse.nl/file.xhtml?fileId=382085",
"schema:sameAs": "https://dataverse.nl/api/access/datafile/382085",
"@type": "ore:AggregatedResource",
"schema:fileFormat": "application/x-spss-sav",
"dvcore:filesize": 30134,
"dvcore:storageIdentifier": "file://18ad6ac957a-2d81a9fa399f",
"dvcore:rootDataFileId": -1,
"dvcore:checksum": {
"@type": "MD5",
"@value": "a53741b30daa1bc08494b26d39041c62"
}
},
{
"schema:description": "main file",
"schema:name": "CLASH_database_uitgebreid_LOD_03032022.sav",
"dvcore:restricted": true,
"schema:version": 3,
"dvcore:datasetVersionId": 25355,
"@id": "https://dataverse.nl/file.xhtml?fileId=382084",
"schema:sameAs": "https://dataverse.nl/api/access/datafile/382084",
"@type": "ore:AggregatedResource",
"schema:fileFormat": "application/x-spss-sav",
"dvcore:filesize": 156995,
"dvcore:storageIdentifier": "file://18ad6ab85d9-ca8aea1a511c",
"dvcore:rootDataFileId": -1,
"dvcore:checksum": {
"@type": "MD5",
"@value": "1e4c8267c2dd8c3ecf4bc6b2404c614b"
}
},
{
"schema:description": "GCS scores",
"schema:name": "CLASH_GCS_longformat.sav",
"dvcore:restricted": true,
"schema:version": 3,
"dvcore:datasetVersionId": 25355,
"@id": "https://dataverse.nl/file.xhtml?fileId=382086",
"schema:sameAs": "https://dataverse.nl/api/access/datafile/382086",
"@type": "ore:AggregatedResource",
"schema:fileFormat": "application/x-spss-sav",
"dvcore:filesize": 32310,
"dvcore:storageIdentifier": "file://18ad6ad1705-775fab552427",
"dvcore:rootDataFileId": -1,
"dvcore:checksum": {
"@type": "MD5",
"@value": "9664180463d345816ecb9fcb6d9a3568"
}
},
{
"schema:description": "SAE reporting",
"schema:name": "CLASH_SAE_longformat_12012022.sav",
"dvcore:restricted": true,
"schema:version": 3,
"dvcore:datasetVersionId": 25355,
"@id": "https://dataverse.nl/file.xhtml?fileId=382087",
"schema:sameAs": "https://dataverse.nl/api/access/datafile/382087",
"@type": "ore:AggregatedResource",
"schema:fileFormat": "application/x-spss-sav",
"dvcore:filesize": 104711,
"dvcore:storageIdentifier": "file://18ad6ad1746-18c2569f17eb",
"dvcore:rootDataFileId": -1,
"dvcore:checksum": {
"@type": "MD5",
"@value": "5b5c0b1384437fe31cccc1b542aac7b1"
}
},
{
"schema:description": "Publication of CLASH trial",
"schema:name": "Safety and pharmacodynamic efficacy of eculizumab in aneurysmal subarachnoid hemorrhage.pdf",
"dvcore:restricted": false,
"schema:version": 1,
"dvcore:datasetVersionId": 25355,
"@id": "https://dataverse.nl/file.xhtml?fileId=382088",
"schema:sameAs": "https://dataverse.nl/api/access/datafile/382088",
"@type": "ore:AggregatedResource",
"schema:fileFormat": "application/pdf",
"dvcore:filesize": 593876,
"dvcore:storageIdentifier": "file://18ad6b04df6-7a64f6b05506",
"dvcore:rootDataFileId": -1,
"dvcore:checksum": {
"@type": "MD5",
"@value": "96045317b449ec3020374a48c9f638d4"
}
}
],
"schema:hasPart": [
"https://dataverse.nl/file.xhtml?fileId=382085",
"https://dataverse.nl/file.xhtml?fileId=382084",
"https://dataverse.nl/file.xhtml?fileId=382086",
"https://dataverse.nl/file.xhtml?fileId=382087",
"https://dataverse.nl/file.xhtml?fileId=382088"
]
},
"@context": {
"Author": "http://purl.org/dc/terms/creator",
"Citation": "http://purl.org/dc/terms/bibliographicCitation",
"Deposit Date": "http://purl.org/dc/terms/dateSubmitted",
"ID Number": "http://purl.org/spar/datacite/ResourceIdentifier",
"ID Type": "http://purl.org/spar/datacite/ResourceIdentifierScheme",
"Related Publication": "http://purl.org/dc/terms/isReferencedBy",
"Subject": "http://purl.org/dc/terms/subject",
"Title": "http://purl.org/dc/terms/title",
"URL": "https://schema.org/distribution",
"author": "https://dataverse.org/schema/citation/author#",
"citation": "https://dataverse.org/schema/citation/",
"datasetContact": "https://dataverse.org/schema/citation/datasetContact#",
"dcterms": "http://purl.org/dc/terms/",
"dsDescription": "https://dataverse.org/schema/citation/dsDescription#",
"dvcore": "https://dataverse.org/schema/core#",
"keyword": "https://dataverse.org/schema/citation/keyword#",
"ore": "http://www.openarchives.org/ore/terms/",
"schema": "http://schema.org/"
}
}
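
To illustrate how the `ore:aggregates` entries above might feed a Croissant `distribution`, here is a hedged Python sketch. It assumes the export has been parsed into a dictionary shaped like the JSON above. `file_objects` is a hypothetical helper, and using `schema:sameAs` as the `contentUrl` is an assumption rather than something this PR specifies. The export carries only MD5 checksums, so no `sha256` value is emitted.

```python
def file_objects(ore_map):
    """Build Croissant-style FileObject dicts from a parsed OAI-ORE resource map."""
    files = ore_map.get("ore:describes", {}).get("ore:aggregates", [])
    result = []
    for entry in files:
        result.append({
            "@type": "sc:FileObject",
            "name": entry["schema:name"],
            "description": entry.get("schema:description", ""),
            # Assumption: the API access URL is the right download location.
            "contentUrl": entry["schema:sameAs"],
            "encodingFormat": entry["schema:fileFormat"],
        })
    return result


# One entry copied from the export above, trimmed to the keys the sketch reads.
sample = {
    "ore:describes": {
        "ore:aggregates": [
            {
                "schema:description": "GCS scores",
                "schema:name": "CLASH_GCS_longformat.sav",
                "schema:sameAs": "https://dataverse.nl/api/access/datafile/382086",
                "schema:fileFormat": "application/x-spss-sav",
            }
        ]
    }
}
objects = file_objects(sample)
print(objects)
```

Four of the five files in this export are marked `dvcore:restricted: true`, so their access URLs would not be directly downloadable; the sketch does not handle that case.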
95 changes: 95 additions & 0 deletions datasets/dataverse/metadata.json
@@ -0,0 +1,95 @@
{
"@context": {
"@language": "en",
"@vocab": "https://schema.org/",
"column": "ml:column",
"data": {
"@id": "ml:data",
"@type": "@json"
},
"dataType": {
"@id": "ml:dataType",
"@type": "@vocab"
},
"extract": "ml:extract",
"field": "ml:field",
"fileProperty": "ml:fileProperty",
"format": "ml:format",
"includes": "ml:includes",
"isEnumeration": "ml:isEnumeration",
"jsonPath": "ml:jsonPath",
"ml": "http://mlcommons.org/schema/",
"parentField": "ml:parentField",
"path": "ml:path",
"recordSet": "ml:recordSet",
"references": "ml:references",
"regex": "ml:regex",
"repeated": "ml:repeated",
"replace": "ml:replace",
"sc": "https://schema.org/",
"separator": "ml:separator",
"source": "ml:source",
"subField": "ml:subField",
"transform": "ml:transform",
"wd": "https://www.wikidata.org/wiki/"
},
"@type": "sc:Dataset",
"name": "Safety and pharmacodynamic efficacy of eculizumab in aneurysmal subarachnoid hemorrhage (CLASH): a phase 2a randomized clinical trial.",
"description": "PASS is a large-scale image dataset that does not include any humans and which can be used for high-quality pretraining while significantly reducing privacy concerns.",
"citation": "@Article{asano21pass, author = \"Yuki M. Asano and Christian Rupprecht and Andrew Zisserman and Andrea Vedaldi\", title = \"PASS: An ImageNet replacement for self-supervised pretraining without humans\", journal = \"NeurIPS Track on Datasets and Benchmarks\", year = \"2021\" }",
"license": "https://creativecommons.org/licenses/by/4.0/",
"url": "https://www.robots.ox.ac.uk/~vgg/data/pass/",
"distribution": [
{
"@type": "sc:FileObject",
"name": "metadata",
"contentUrl": "https://zenodo.org/record/6615455/files/pass_metadata.csv",
Review comment (Contributor): This is the PASS dataset. Should you adapt it to a dataset from https://dataverse.nl?

"encodingFormat": "text/csv",
"sha256": "0b033707ea49365a5ffdd14615825511"
},
{
"@type": "sc:FileObject",
"name": "pass9",
"contentUrl": "https://zenodo.org/record/6615455/files/PASS.9.tar",
"encodingFormat": "application/x-tar",
"sha256": "f4f87af4327fd1a66dd7944b9f59cbcc"
},
{
"@type": "sc:FileSet",
"name": "image-files",
"containedIn": "pass9",
"encodingFormat": "image/jpeg",
"includes": "*.jpg"
}
],
"recordSet": [
{
"@type": "ml:RecordSet",
"name": "images",
"key": "hash",
"field": [
{
"@type": "ml:Field",
"name": "hash",
"description": "The hash of the image, as computed from YFCC-100M.",
"dataType": "sc:Text",
"references": {
"distribution": "metadata",
"extract": {
"column": "hash"
}
},
"source": {
"distribution": "image-files",
"extract": {
"fileProperty": "filename"
},
"transform": {
"regex": "([^\\/]+)\\.jpg"
}
}
}
]
}
]
}
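
In this file the `description`, `citation`, `url`, and `distribution` values still describe PASS, while `name` already matches the DataverseNL dataset. The following is a hedged Python sketch of pulling the dataset-level values from the OAI-ORE export instead. `dataset_fields` is a hypothetical helper, and the choice of which export key feeds which field is an assumption. It also assumes a single description object and a single related publication, as in the sample export.

```python
def dataset_fields(ore_map):
    """Collect dataset-level Croissant-style fields from a parsed OAI-ORE map."""
    described = ore_map["ore:describes"]
    # Assumption: single objects here, as in the sample export.
    description = described.get("citation:Description", {})
    publication = described.get("Related Publication", {})
    identifier = described.get("@id", "")
    # A "doi:" identifier can be resolved through https://doi.org/.
    if identifier.startswith("doi:"):
        url = "https://doi.org/" + identifier[len("doi:"):]
    else:
        url = identifier
    return {
        "@type": "sc:Dataset",
        "name": described.get("schema:name", ""),
        "description": description.get("dsDescription:Text", ""),
        "citation": publication.get("Citation", ""),
        "url": url,
    }


# Values copied from the export in dataverse.json, trimmed to what the sketch reads.
sample = {
    "ore:describes": {
        "@id": "doi:10.34894/VFS3VQ",
        "schema:name": "Safety and pharmacodynamic efficacy of eculizumab in "
                       "aneurysmal subarachnoid hemorrhage (CLASH): "
                       "a phase 2a randomized clinical trial.",
        "citation:Description": {
            "dsDescription:Text": "The dataset includes the raw data collected for the CLASH-trial."
        },
    }
}
fields = dataset_fields(sample)
print(fields["url"])
```

The export has no `license` entry; it carries `dvcore:termsOfUse` instead, so `license` is left out of this sketch.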