Use a single JSON-LD context across all of GO, and make any necessary changes to noctua-models #617

Open
cmungall opened this issue Apr 18, 2018 · 10 comments

Comments

@cmungall
Member

Parts of our stack (amigo, noctua-js, GAFs, etc) use CURIEs/IDs as currency. Other parts (minerva, go-rdf, ontology) use URIs.

The expansion/contraction rules are not well defined.

We should have a single json-ld context file we use across the GO.

Furthermore, the contents of this file should be as predictable as possible: for example, obolibrary for all ontologies, purl.uniprot for all UniProt entries, and something like id.org for everything else. This will require a one-time change to Noctua models.
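To make "expansion/contraction rules" concrete, here is a minimal sketch of what a shared prefix map would drive. The map below is a hypothetical three-entry example in the shape of a JSON-LD context, not the agreed GO context, and `expand`/`contract` are illustrative names rather than functions from any GO tool.

```python
# Hypothetical prefix map; the base URIs are illustrative assumptions.
CONTEXT = {
    "GO": "http://purl.obolibrary.org/obo/GO_",
    "RO": "http://purl.obolibrary.org/obo/RO_",
    "UniProtKB": "http://identifiers.org/uniprot/",
}


def expand(curie, context=CONTEXT):
    """Turn a CURIE such as 'GO:0008150' into a full URI."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or prefix not in context:
        raise ValueError(f"no expansion rule for {curie!r}")
    return context[prefix] + local_id


def contract(uri, context=CONTEXT):
    """Turn a URI back into a CURIE, preferring the longest matching base."""
    matches = [(base, prefix) for prefix, base in context.items()
               if uri.startswith(base)]
    if not matches:
        raise ValueError(f"no contraction rule for {uri!r}")
    base, prefix = max(matches, key=lambda m: len(m[0]))
    return f"{prefix}:{uri[len(base):]}"
```

If every component used the same map, `contract(expand(x))` would round-trip to `x` regardless of which part of the stack did the conversion.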

Previous tickets:

@balhoff
Member

balhoff commented Apr 19, 2018

Would be nice to reuse the OBO prefixes context: http://obofoundry.org/registry/obo_context.jsonld

As far as I know, while a JSON document can reference multiple contexts, a context can't import another context. Should "single JSON-LD context" mean a single defined set of JSON-LD contexts, or do you want to have the pipeline concatenate a few source contexts into the single JSON-LD context?
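The two options could look roughly like this. It is only a sketch: the second context URL is a made-up placeholder, the in-memory contexts are tiny stand-ins for the real files, and `merge_contexts` is a hypothetical helper, not part of any existing pipeline.

```python
def merge_contexts(*contexts):
    """Concatenate prefix maps; fail loudly if a prefix maps to two bases."""
    merged = {}
    for ctx in contexts:
        for prefix, base in ctx.get("@context", {}).items():
            if merged.get(prefix, base) != base:
                raise ValueError(f"{prefix}: {merged[prefix]} vs {base}")
            merged[prefix] = base
    return {"@context": merged}


# Stand-ins for two source contexts (illustrative entries only).
obo_ctx = {"@context": {"GO": "http://purl.obolibrary.org/obo/GO_"}}
go_ctx = {"@context": {"UniProtKB": "http://identifiers.org/uniprot/"}}

# Option A: the document itself references a defined set of contexts.
# The second URL is a hypothetical placeholder.
doc_with_context_list = {
    "@context": [
        "http://obofoundry.org/registry/obo_context.jsonld",
        "http://example.org/go_context.jsonld",
    ],
    "@id": "GO:0008150",
}

# Option B: the pipeline concatenates the sources into one context.
single_context = merge_contexts(obo_ctx, go_ctx)
```

Option B has the side benefit that a prefix defined differently in two sources is caught at build time rather than silently overridden by context order.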

@cmungall
Member Author

I'm adding rdf_uri_prefix to db-xrefs.yaml. Note that this will often differ from the web page expansion. Currently these are all obolibrary or identifiers.org.

db-xrefs.yaml is the canonical source metadata for GO. We will generate a json-ld context from this as part of the release. Minerva will use this for expansion/contraction when communicating with Noctua/golr. ontobio will use this when converting GAFs to GO-CAMs. The neo build will use this to expand GPIs to make an OWL file of all the gene products.
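As a rough sketch of the generation step: the entries below are a hand-written stand-in for parsed db-xrefs.yaml records. Only `rdf_uri_prefix` is named in this thread; the `database` key and the sample values are assumptions for illustration, and `context_from_dbxrefs` is a hypothetical helper, not the release script.

```python
import json


def context_from_dbxrefs(entries):
    """Build a JSON-LD context from records that carry an RDF URI prefix."""
    prefixes = {}
    for entry in entries:
        base = entry.get("rdf_uri_prefix")
        if base:  # records without an RDF prefix are left out of the context
            prefixes[entry["database"]] = base
    return {"@context": dict(sorted(prefixes.items()))}


# Stand-in for parsed YAML; keys other than rdf_uri_prefix are assumed.
entries = [
    {"database": "UniProtKB", "rdf_uri_prefix": "http://identifiers.org/uniprot/"},
    {"database": "GO", "rdf_uri_prefix": "http://purl.obolibrary.org/obo/GO_"},
    {"database": "PMID"},
]

go_context = context_from_dbxrefs(entries)
print(json.dumps(go_context, indent=2))
```

Sorting the prefixes keeps the generated file stable between releases, which makes diffs against other contexts easier to read.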

Jim: currently there are only a handful of ontologies in here, and these are just manually synced with obo_context. We have tools in the prefixcommons repo to detect inconsistencies between these.

cmungall added a commit to prefixcommons/biocontext that referenced this issue Apr 19, 2018
cmungall added a commit to prefixcommons/biocontext that referenced this issue Apr 19, 2018
This is a derived copy of the central GO
db-xrefs.yaml file
See: geneontology/go-site#617
cmungall added a commit to prefixcommons/biocontext that referenced this issue Apr 19, 2018
cmungall added a commit to biolink/ontobio that referenced this issue Apr 19, 2018
NOTE: this is a temporary measure. We will build the go json ld context as part of the pipeline in future
See geneontology/go-site#617
@cmungall
Member Author

@TomConlin

Dipper's curie_prefix to base_iri mapping file is:

https://github.com/monarch-initiative/dipper/blob/master/dipper/curie_map.yaml

Monarch app should also use it although I am not sure it does everywhere it could.

curie_map.yaml could also stand a shakedown for

  • removing unused key-values, if any
  • moving http: to https: where possible
  • keeping prefixes short
  • removing non-boring characters for consistency, i.e. ditching hyphens and underscores
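A shakedown along these lines could start from a small lint pass. This is a sketch under assumptions: the sample map is invented (it is not the content of curie_map.yaml), `lint_curie_map` is a hypothetical helper, and the set of used prefixes would have to come from wherever the prefixes are actually consumed.

```python
import re


def lint_curie_map(curie_map, used_prefixes=None):
    """Return (prefix, message) pairs for entries the checklist would flag."""
    findings = []
    for prefix, base in curie_map.items():
        if base.startswith("http://"):
            findings.append((prefix, "uses http:, check whether https: is possible"))
        if re.search(r"[-_]", prefix):
            findings.append((prefix, "prefix contains a hyphen or underscore"))
        if used_prefixes is not None and prefix not in used_prefixes:
            findings.append((prefix, "prefix appears to be unused"))
    return findings


# Invented sample entries, for illustration only.
sample_map = {
    "GO": "http://purl.obolibrary.org/obo/GO_",
    "dbSNP-ind": "https://example.org/dbsnp-ind/",
    "Old_Source": "http://example.org/old/",
}

findings = lint_curie_map(sample_map, used_prefixes={"GO", "dbSNP-ind"})
for prefix, message in findings:
    print(f"{prefix}: {message}")
```

The pass only reports; whether an http: base can really move to https: depends on how the resulting IRIs are used downstream, so that decision stays manual.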

@cmungall
Member Author

Thanks Tom!

Summary of where we are in GO

https://github.com/geneontology/go-site/blob/master/metadata/db-xrefs.yaml is the source authority. See https://github.com/geneontology/go-site/pull/620/files

This is used to generate https://github.com/prefixcommons/biocontext/blob/master/registry/go_context.jsonld, but we'll actually publish the jsonld context as part of the GO pipeline.

The prefixcommons repo is a good place to go for getting diffs between any two contexts
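For a sense of what such a diff reports, here is a minimal sketch. It does not reproduce the prefixcommons tooling; `diff_contexts` is a hypothetical helper and the two maps are invented examples.

```python
def diff_contexts(a, b):
    """Compare two prefix maps and report where they disagree."""
    report = {
        "only_in_a": sorted(set(a) - set(b)),
        "only_in_b": sorted(set(b) - set(a)),
        "conflicting": sorted(p for p in set(a) & set(b) if a[p] != b[p]),
    }
    # Prefixes that differ only by letter case are easy to miss in a plain diff.
    lower_b = {p.lower(): p for p in b}
    report["case_variants"] = sorted(
        (p, lower_b[p.lower()]) for p in a
        if p not in b and p.lower() in lower_b
    )
    return report


# Invented example maps, for illustration only.
context_a = {
    "GO": "http://purl.obolibrary.org/obo/GO_",
    "UniProtKB": "http://identifiers.org/uniprot/",
    "MGI": "http://www.informatics.jax.org/accession/MGI:",
}
context_b = {
    "GO": "http://purl.obolibrary.org/obo/GO_",
    "UNIPROTKB": "http://purl.uniprot.org/uniprot/",
    "MGI": "http://identifiers.org/mgi/MGI:",
}

report = diff_contexts(context_a, context_b)
```

Here `report["conflicting"]` flags MGI (same prefix, different base) and `report["case_variants"]` pairs UniProtKB with UNIPROTKB.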

@TomConlin

Oh, what a mess this is: prefix case differences, conflicting cases for URIs, straight-up prefix hijacking ...
I'm sorry, but I cannot take this on right now.

cmungall added a commit to geneontology/minerva that referenced this issue May 18, 2018
cmungall added a commit to geneontology/minerva that referenced this issue May 18, 2018


Makefile added to clarify how the json ld contexts are built.

Now using go and obo context only
cmungall added a commit that referenced this issue May 18, 2018
cmungall added a commit to geneontology/neo that referenced this issue May 18, 2018
cmungall added a commit that referenced this issue May 19, 2018
@lpalbou
Contributor

lpalbou commented Jun 6, 2018

This is a blocking issue for me on the GO-CAM site. For reference:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT * WHERE {

  #BIND(<http://identifiers.org/uniprot/Q9WTW1> as ?GP) .
  BIND(<http://identifiers.org/uniprot/P34913> as ?GP) .

  ?GP ?pred ?obj .
} 
LIMIT 10

Q9WTW1 (rat) has no information beyond a statement that it is an owl:Class.
P34913 (human) has some information (obo:id, rdfs:label).

Other cases:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT * WHERE {

#  BIND(<http://identifiers.org/uniprot/A8IV67> as ?gpuri)    # has nothing
#  BIND(<http://identifiers.org/uniprot/P10499> as ?gpuri)    # just has ?obj = owl:Class
#  BIND(<http://www.informatics.jax.org/accession/MGI:MGI:1316740> as ?gpuri)  # has owl:Class, oboInOwl:id, rdf:type, rdfs:label

  BIND(<http://identifiers.org/uniprot/P34913> as ?gpuri)     # has possibly all information (dbxref, synonym, label, subclassOf, etc)

  ?gpuri ?pred ?obj .
} 
LIMIT 10

Which affects more complex queries (e.g. to get the recommended name of a gene, or its taxon):

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#> 
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX metago: <http://model.geneontology.org/>

PREFIX enabled_by: <http://purl.obolibrary.org/obo/RO_0002333>
PREFIX in_taxon: <http://purl.obolibrary.org/obo/RO_0002162>

SELECT distinct ?identifier ?name ?species

WHERE 
{
#  GRAPH metago:586fc17a00000705 {
  GRAPH metago:581e072c00000295 {
    ?s enabled_by: ?gpnode .    
    ?gpnode rdf:type ?identifier .
    FILTER(?identifier != owl:NamedIndividual) .         
  }

  ?identifier rdfs:subClassOf ?v0 . 
  ?identifier rdfs:label ?name .

  ?v0 owl:onProperty in_taxon: . 
  ?v0 owl:someValuesFrom ?taxon .
  ?taxon rdfs:label ?species .      
}

This query works for the second model but not for the first (xxx705). In the first model, ?identifier refers to a flat class that has no rdfs:subClassOf ?v0 statement, so the taxon part of the query cannot match.

@balhoff
Member

balhoff commented Jun 7, 2018

@lpalbou I don't think your problem relates to identifier prefixes. Q9WTW1 is simply not in NEO at all.

kltm added a commit to geneontology/noctua-models that referenced this issue Jun 26, 2018
@kltm
Member

kltm commented Jun 26, 2018

@cmungall @balhoff I believe that this is clear now?

@lpalbou
Contributor

lpalbou commented Aug 3, 2018

@cmungall thanks !
