sparql creator query is too narrowly defined #48
Comments
I'm having trouble figuring out where to find this problem. It might actually have something to do with how mnlite serves |
What do you mean, to the For example, here's a link to the subprocessor configuration for extracting a single creator from SO to populate the SOLR |
I didn't understand the simplicity of how In that case, maybe it makes sense to look at the SPARQL query. I'm not that familiar with SPARQL but I understand some SQL. Here is a
"creator": {
  "@type": "Role",
  "creator": {
    "@type": "Person",
    "Affiliation": {
      "@type": "Organization",
      "name": "Centre for Earth Observation Science - University of Manitoba"
    },
    "Email": "yendamuk@myumanitoba.ca",
    "Identifier": {
      "@type": "PropertyValue",
      "propertyID": "https://registry.identifiers.org/registry/orcid",
      "url": "https://orcid.org/0009-0001-2454-4614",
      "value": "0009-0001-2454-4614"
    },
    "Name": "Yendamuri, Kiran\t"
  }
},
And here is the SPARQL query:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX list: <http://jena.hpl.hp.com/ARQ/list#>
PREFIX SO: <http://schema.org/>
SELECT (?name as ?author)
WHERE {
  ?dsId rdf:type SO:Dataset .
  ?dsId SO:creator ?list .
  ?list list:index (?pos ?member) .
  ?member SO:name ?name .
}
order by (?pos)
limit 1
|
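For readers more at home in Python than SPARQL, here is a rough analogue of what the query above asks for. It is only a sketch: the helper name `first_creator_name_flat` and the plain-dict representation are assumptions made for illustration, not code from sonormal, mnlite, or the indexer. It looks for a `name` directly on each item of the Dataset's creator list, which is why the Role-wrapped creator shown above yields nothing: the name sits one level down, and is spelled `Name` rather than `name`.

```python
# Hypothetical helper, for illustration only. It mirrors the pattern
#   ?dsId SO:creator ?list . ?list list:index (?pos ?member) . ?member SO:name ?name
# using plain dicts instead of an RDF graph.
def first_creator_name_flat(dataset):
    creators = dataset.get("creator", [])
    # Assumption: creators have been forced into a list before the query runs.
    if not isinstance(creators, list):
        creators = [creators]
    for member in creators:
        # Only a "name" directly on the list member is considered.
        name = member.get("name") if isinstance(member, dict) else None
        if name is not None:
            return name
    return None


flat = {
    "@type": "Dataset",
    "creator": [{"@type": "Person", "name": "Yendamuri, Kiran"}],
}
role_wrapped = {
    "@type": "Dataset",
    "creator": {
        "@type": "Role",
        "creator": {"@type": "Person", "Name": "Yendamuri, Kiran\t"},
    },
}

print(first_creator_name_flat(flat))          # Yendamuri, Kiran
print(first_creator_name_flat(role_wrapped))  # None
```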
@iannesbitt the actual SPARQL query for the
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX list: <http://jena.hpl.hp.com/ARQ/list#>
PREFIX SO: <http://schema.org/>
SELECT (?name as ?origin)
WHERE {
  ?dsId rdf:type SO:Dataset .
  ?dsId SO:creator ?list .
  ?list list:index (?pos ?member) .
  ?member SO:name ?name .
}
order by (?pos)
Note that it assumes
{
  "@context": "https://schema.org",
  "schema:Dataset": {
    "@type": "schema:Dataset",
    "name": "Test dataset",
    "creator": {
      "@type": "Role",
      "creator": {
        "@type": "Person",
        "affiliation": {
          "@type": "Organization",
          "name": "Centre for Earth Observation Science - University of Manitoba"
        },
        "email": "yendamuk@myumanitoba.ca",
        "identifier": {
          "@type": "PropertyValue",
          "propertyID": "https://registry.identifiers.org/registry/orcid",
          "url": "https://orcid.org/0009-0001-2454-4614",
          "value": "0009-0001-2454-4614"
        },
        "name": "Yendamuri, Kiran"
      }
    }
  }
}
Here's a SPARQL query to retrieve both the name and email from that. Somehow we need to support these multiple encoding approaches:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX list: <http://jena.hpl.hp.com/ARQ/list#>
PREFIX SO: <http://schema.org/>
SELECT ?name ?email
WHERE {
  ?dsId rdf:type SO:Dataset .
  ?dsId SO:creator ?creator .
  ?creator SO:creator ?role .
  ?role SO:name ?name .
  ?role SO:email ?email .
}
This produces the following results:
{
  "head": {
    "vars": [
      "name",
      "email"
    ]
  },
  "results": {
    "bindings": [
      {
        "name": {
          "value": "Yendamuri, Kiran",
          "type": "literal"
        },
        "email": {
          "value": "yendamuk@myumanitoba.ca",
          "type": "literal"
        }
      }
    ]
  },
  "metadata": {
    "httpRequests": 46
  }
}
In an ideal world we would also capture the ORCID and Affiliation too. |
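As a sketch of what capturing the ORCID and affiliation alongside name and email could look like, here is the same lookup written against plain JSON-LD dicts in Python. The function names `describe_creator` and `_first` are hypothetical, and the code assumes the lowercase property spelling used in the test dataset above; it is not how the indexer does it today.

```python
def _first(value):
    """Return the first item of a list, or the value itself if it is not a list."""
    if isinstance(value, list):
        return value[0] if value else None
    return value


# Hypothetical helper: pull the four fields discussed in this thread out of one
# creator entry, unwrapping a schema.org Role if the Person is nested inside it.
def describe_creator(entry):
    if isinstance(entry, dict) and entry.get("@type") == "Role":
        entry = _first(entry.get("creator"))
    if not isinstance(entry, dict):
        return None

    identifier = _first(entry.get("identifier"))
    orcid = None
    # Assumption: an ORCID is a PropertyValue whose propertyID mentions "orcid".
    if isinstance(identifier, dict) and "orcid" in str(identifier.get("propertyID", "")).lower():
        orcid = identifier.get("url") or identifier.get("value")

    affiliation = _first(entry.get("affiliation"))
    if isinstance(affiliation, dict):
        affiliation = affiliation.get("name")

    return {
        "name": entry.get("name"),
        "email": entry.get("email"),
        "orcid": orcid,
        "affiliation": affiliation,
    }


creator = {
    "@type": "Role",
    "creator": {
        "@type": "Person",
        "affiliation": {
            "@type": "Organization",
            "name": "Centre for Earth Observation Science - University of Manitoba",
        },
        "email": "yendamuk@myumanitoba.ca",
        "identifier": {
            "@type": "PropertyValue",
            "propertyID": "https://registry.identifiers.org/registry/orcid",
            "url": "https://orcid.org/0009-0001-2454-4614",
            "value": "0009-0001-2454-4614",
        },
        "name": "Yendamuri, Kiran",
    },
}
info = describe_creator(creator)
print(info["name"], info["orcid"])
```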
@mbjones should I open an issue or PR in the indexer to track this? |
yeah, or we could transfer this issue over to the indexer repo if it has what we need... |
Ok, should be good now. How should we test a change like this? |
@taojing2002, @mbjones, @artntek and I met for a while to discuss this problem today. We came up with a three-point proposal to try and broaden the configurations of
|
Note: the above comment was edited to include @taojing2002 who was mis-tagged in the original |
I noticed that there was system metadata for each test document, but I didn't see any documentation on how to create it. Is there a method for creating it automatically? |
The system metadata needs to accompany an indexer test document primarily to indicate the type of object being sent to the indexer. Other than that, I think the system metadata can contain any valid values. |
wrt the I believe that means the indexer needs to do the ops described in step 2. That said, it would certainly be much simpler to index if the json-ld content could be pre-processed to a common representation prior to passing on to the CNs. Perhaps such pre-processing should be part of the indexer? |
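To make the pre-processing idea concrete, here is a minimal Python sketch of reducing the creator encodings seen in this thread to one common shape: always a list of creator nodes, with any Role wrapper removed. Everything in it is an assumption for illustration (`normalize_creators`, `_as_list`, `_lower_first`, and `_normalize_node` are made-up names, not sonormal functions). Lowercasing the first letter of keys such as `Name` and stripping stray whitespace are heuristics prompted by the CanWIN record; drop them if the indexer should not alter property names.

```python
def _as_list(value):
    """Wrap a single value in a list; treat a missing value as an empty list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def _lower_first(key):
    # Heuristic: "Name" -> "name". JSON-LD keywords such as "@type" are left alone.
    if not key or key.startswith("@"):
        return key
    return key[0].lower() + key[1:]


def _normalize_node(node):
    """Recursively fix key case and strip whitespace from string values."""
    if isinstance(node, dict):
        return {_lower_first(k): _normalize_node(v) for k, v in node.items()}
    if isinstance(node, list):
        return [_normalize_node(v) for v in node]
    if isinstance(node, str):
        return node.strip()
    return node


# Hypothetical pre-processing step: returns a copy of the dataset whose
# "creator" is always a flat list, with Role wrappers replaced by their contents.
def normalize_creators(dataset):
    people = []
    for entry in _as_list(dataset.get("creator")):
        if isinstance(entry, dict) and entry.get("@type") == "Role":
            inner = _as_list(entry.get("creator"))
        else:
            inner = [entry]
        people.extend(_normalize_node(p) for p in inner)
    result = dict(dataset)
    result["creator"] = people
    return result


raw = {
    "@type": "Dataset",
    "creator": {
        "@type": "Role",
        "creator": {
            "@type": "Person",
            "Name": "Yendamuri, Kiran\t",
            "Email": "yendamuk@myumanitoba.ca",
        },
    },
}
print(normalize_creators(raw)["creator"])
```

After a step like this, a query that expects a list of creators each carrying its own name would match both the flat and the Role-wrapped encodings.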
Edit 2023-10-11: updating in light of new info
Both OpenTopography and CanWIN have an issue where no creator is found in the SO doc, making the citations incorrect.
CanWIN example: DataONE representation, dataset landing, schema validation
Citation:
OpenTopography example: DataONE representation, dataset landing, schema validation
Citation:
sonormal (and by extension mnlite) needs to be more adaptable when looking for dataset creators. I think this is set in sonormal.normalize._forceSODatasetLists.