Question: Why doesn't DPV reference existing W3C ontologies? #55

Closed
justin2004 opened this issue Oct 6, 2022 · 8 comments

Comments

@justin2004

e.g.
PROV has :PrimarySource and this (DPV) has :DataSource which are clearly closely related when you read about them but the newcomer (DPV) doesn't reference the existing PROV. Why not?

W3C ontologies would be much more useful and something to be excited about if they were all built upon a common base (an upper ontology (top level ontology) like Gist, BFO, etc.).

@coolharsh55
Collaborator

Hi. This is the general idea of #31, i.e. to create such alignments or mappings between DPV concepts and other well known vocabularies (including W3C). I have now added PROV to that list as well. As for why we haven't done so: lack of volunteers and available time. Contributions to get this done are (always) welcome.

Specifically for PROV, the concept PrimarySource is relevant, but not the same as DataSource unless you are creating provenance records - where you first want to express that dpv:PersonalData is a subclass of prov:Entity, then equate prov:Activity with dpv:Processing and so on. See https://w3id.org/GDPRov#PersonalDataEntity for an example. However, this can also result in some unintended issues - for example where personal data is expressed as a common category (e.g. "email data") that has instances or artefacts (e.g. "x@y.com"), the use of prov:Entity may not be accurate, which is why in GDPRov, I used P-Plan to distinguish the category from instance using https://w3id.org/GDPRov#PersonalData. So such (careful) considerations are needed for each external vocabulary to be aligned with DPV concepts - especially because of potential legal implications/interpretations.
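
As a rough illustration of the provenance-oriented alignment described above, the statements can be written as subject-predicate-object triples. This is a minimal sketch in plain Python: the helper names are invented for the example, and the choice of rdfs:subClassOf and owl:equivalentClass to express "subclass" and "equate" is an assumption, not the content of GDPRov or of any published DPV alignment.

```python
# Illustrative only: the PROV alignment described above, written as
# (subject, predicate, object) triples using prefixed names.
# The predicates chosen here are assumptions, not a published DPV mapping.
PROV_ALIGNMENT = {
    ("dpv:PersonalData", "rdfs:subClassOf", "prov:Entity"),
    ("dpv:Processing", "owl:equivalentClass", "prov:Activity"),
}


def superclasses(term, triples):
    """Return the direct rdfs:subClassOf targets of a term."""
    return {o for s, p, o in triples if s == term and p == "rdfs:subClassOf"}


def to_turtle(triples):
    """Serialize triples as Turtle-style lines (prefix declarations omitted)."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in sorted(triples))


if __name__ == "__main__":
    print(to_turtle(PROV_ALIGNMENT))
```

The category-versus-instance caveat still applies: a triple like the first one says nothing about whether "email data" is meant as a category or as a concrete artefact.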

For this reason, #31 specifies the plan to provide alignments in separate files outside the main vocabulary. This also enables you to 'pick and choose' which assertions you want to import into your use-case. For example, if you don't use PROV, you don't import that alignment.
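
The 'pick and choose' idea can be sketched as follows, with each alignment kept as its own module and merged into the working graph only when requested. The module names, the `ex:` terms, and the `build_graph` helper are hypothetical and exist only for this example; they do not reflect how DPV actually packages its files.

```python
# Illustrative only: alignments live outside the core vocabulary and are
# imported per use case. Module names and triples are hypothetical.
CORE = {("dpv:PersonalData", "rdf:type", "rdfs:Class")}

ALIGNMENTS = {
    "prov": {("dpv:PersonalData", "rdfs:subClassOf", "prov:Entity")},
    "example": {("dpv:DataSource", "rdfs:seeAlso", "ex:Source")},
}


def build_graph(core, alignments, selected):
    """Return the core triples plus only the alignment modules that were selected."""
    graph = set(core)
    for name in selected:
        graph |= alignments[name]
    return graph


if __name__ == "__main__":
    # A use case that relies on PROV imports that alignment and nothing else.
    for triple in sorted(build_graph(CORE, ALIGNMENTS, ["prov"])):
        print(triple)
```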

@justin2004
Author

> the concept PrimarySource is relevant, but not the same as DataSource unless you are creating provenance records

I didn't believe they were the same; I just thought they were closely related.

Usually when people talk about ontology alignment they think they are interested in equivalents (exact matches) and in Issue #31 there are discussions about mappings (edges to and from equivalents). I think it is going to be hard to denote relevance when the matches are not exact without resorting to something vague like rdfs:seeAlso.

In #31 you are thinking about approaches:

> mappings can be optionally in RDF (how to express these? Is there a mapping ontology?), otherwise for simplicity just create a spreadsheet and use that to populate the HTML docs

The best way I can think of to allow ontologies to reference one another is to define them in terms of a top level ontology (TLO). Then you wouldn't have to map DPV to PROV, DPV to DCAT, etc. Instead, it would be the case that the definition of prov:PrimarySource might reference something like tlo:comesFromAgent and tlo:Content (and perhaps other primitives in the TLO too) and then the definition of dpv:DataSource might also reference tlo:comesFromAgent and tlo:Content.

That would make prov:PrimarySource and dpv:DataSource connected from the bottom (in their construction) rather than from the top (after they are constructed). To do the latter you often have to use loosey-goosey predicates like rdfs:seeAlso because you don't often have exact matches.

It seems much easier to make integration of domain specific ontologies like DPV a byproduct of expressing them with a TLO than trying to integrate them after they are created.
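
A minimal sketch of the "connected from the bottom" idea, assuming each term is defined by a set of TLO primitives. The `tlo:` names come from the example above and do not belong to any real top-level ontology; `ex:Unrelated` and `tlo:Event` are added purely as a contrast case.

```python
# Hypothetical definitions: each domain term is described by the TLO
# primitives it is built from. None of these sets come from a real ontology.
DEFINITIONS = {
    "prov:PrimarySource": {"tlo:comesFromAgent", "tlo:Content"},
    "dpv:DataSource": {"tlo:comesFromAgent", "tlo:Content"},
    "ex:Unrelated": {"tlo:Event"},
}


def shared_primitives(a, b, definitions):
    """Primitives that two terms have in common."""
    return definitions[a] & definitions[b]


def related_terms(term, definitions):
    """Terms sharing at least one primitive with `term`; no pairwise mapping needed."""
    return sorted(
        other
        for other in definitions
        if other != term and definitions[term] & definitions[other]
    )


if __name__ == "__main__":
    print(related_terms("dpv:DataSource", DEFINITIONS))
```

Here the relationship between the two terms falls out of their construction, so nobody has to assert a DPV-to-PROV link separately.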

What do you think?

@coolharsh55
Collaborator

Sounds like a good idea, and fits the rationale for creating top level ontologies. But someone has to do this work, and it may turn out to be a lot of work. If someone is available to do this, that's great. Otherwise it'll end up being a pending task for a long time.

And even if we do have a common TLO, there is a lot of value in directly expressing the relation between two concepts for use cases and implementation. So my thinking is that a SKOS-based mapping (exact match, narrower, broader) between two ontologies, which helps show how they align, would be the least amount of work to provide what is needed.
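
A SKOS-based mapping of this kind can be kept as a simple list of rows, which also makes it easy to check how many concepts have been covered so far. In the sketch below the SKOS mapping property names are the standard ones, but the rows, the concept sample, and the helper functions are placeholders chosen for illustration, not an agreed alignment.

```python
# The five SKOS mapping properties are standard; the rows below are placeholders.
SKOS_MAPPING_PROPERTIES = {
    "skos:exactMatch",
    "skos:closeMatch",
    "skos:broadMatch",
    "skos:narrowMatch",
    "skos:relatedMatch",
}

# Illustrative rows only: (DPV concept, mapping property, external concept).
MAPPINGS = [
    ("dpv:DataSource", "skos:relatedMatch", "prov:PrimarySource"),
    ("dpv:PersonalData", "skos:broadMatch", "prov:Entity"),
]

# A small sample of concepts to align, not the full vocabulary.
CONCEPTS = ["dpv:DataSource", "dpv:PersonalData", "dpv:LegalBasis", "dpv:TechnicalMeasure"]


def validate(mappings):
    """Raise ValueError if a row uses something other than a SKOS mapping property."""
    for s, p, o in mappings:
        if p not in SKOS_MAPPING_PROPERTIES:
            raise ValueError(f"not a SKOS mapping property: {p}")


def coverage(concepts, mappings):
    """Fraction of concepts that have at least one mapping row."""
    mapped = {s for s, _, _ in mappings}
    return sum(1 for c in concepts if c in mapped) / len(concepts)


if __name__ == "__main__":
    validate(MAPPINGS)
    print(f"coverage: {coverage(CONCEPTS, MAPPINGS):.0%}")
```

The same rows could just as well live in a spreadsheet; the point is only that the format stays simple enough for non-experts to edit.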

@justin2004
Author

Some candidate TLOs already exist (such as Gist). The work would be, when someone starts to develop a domain-specific ontology, to express each thing using, say, Gist. Maybe it is already too late for DPV.

> there is a lot of value in directly expressing the relation between two concepts for use cases and implementation.

That is fair. When it can be done it is valuable. But many ontologies don't consist of (explicit) SKOS concepts so it is less clear how to align those using broader, narrower, etc.

I think building domain-specific ontologies using a common TLO casts a much wider integration net.

@coolharsh55
Collaborator

For DPV, the concepts and structure came from the SPECIAL H2020 project, and the first time I/we came across Gist was long after the core model had been established. So yes, it was too late to pivot to using a TLO. There is also the work by SEMIC on core vocabularies that model some concepts, similar to Gist. However the main novelty of DPV is what is not being covered by these existing TLOs - things such as legal basis and technical measures. So in that sense, the core concepts in DPV are intended to be a TLO - for DPV.

> I think building domain-specific ontologies using a common TLO casts a much wider integration net.

Yes, but again, it needs someone to create those, and then someone to integrate/align those with DPV. It's sort of a circular problem that gets easier to deal with if there are enough people to do the work and they stick around long enough to see it through to completion. The role of DPV is also to demonstrate that we need a TLO for all the data / privacy / legal stuff - so that it gains attention and kickstarts efforts either to improve DPV or create a TLO standard vocabulary for the domain.

> But many ontologies don't consist of (explicit) SKOS concepts so it is less clear how to align those using broader, narrower, etc.

IMHO it's still fine to do the alignment itself using SKOS (because it's simpler), and to then convert it into semantics later. For example, DPV-SKOS uses SKOS, DPV-OWL uses OWL - but Gist uses OWL2. There's no common method to align these without creating new semantics. Instead, for the initial human work of alignment, SKOS mappings that show how much coverage the two have in comparison are easier to do and manage - even for non-sem-web experts - and then this is used for specific semantics, so if we want OWL then rdfs:subClassOf, and so on. At least that's my thinking so far.
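
The "SKOS first, semantics later" step could look roughly like the sketch below. The translation table is a modelling assumption: SKOS mapping properties do not entail OWL axioms by themselves, so each rule here is a decision someone would have to review. The function name and the `ex:` terms are hypothetical.

```python
# Assumed translation rules: (OWL/RDFS predicate, swap subject and object?).
# "A skos:broadMatch B" reads "B is broader than A", hence A rdfs:subClassOf B.
# "A skos:narrowMatch B" reads "B is narrower than A", hence B rdfs:subClassOf A.
SKOS_TO_OWL = {
    "skos:exactMatch": ("owl:equivalentClass", False),
    "skos:broadMatch": ("rdfs:subClassOf", False),
    "skos:narrowMatch": ("rdfs:subClassOf", True),
}


def to_owl(mappings):
    """Split SKOS mapping rows into derived axioms and rows needing manual review."""
    axioms, needs_review = [], []
    for s, p, o in mappings:
        rule = SKOS_TO_OWL.get(p)
        if rule is None:
            # closeMatch and relatedMatch have no obvious class-level equivalent.
            needs_review.append((s, p, o))
            continue
        predicate, swap = rule
        axioms.append((o, predicate, s) if swap else (s, predicate, o))
    return axioms, needs_review


if __name__ == "__main__":
    rows = [
        ("ex:A", "skos:broadMatch", "ex:B"),
        ("ex:A", "skos:relatedMatch", "ex:D"),
    ]
    print(to_owl(rows))
```

Rows that land in the review list are exactly the inexact matches discussed earlier in this thread, where a human still has to decide what, if anything, to assert.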

@justin2004
Author

justin2004 commented Oct 7, 2022

> However the main novelty of DPV is what is not being covered by these existing TLOs - things such as legal basis and technical measures.

Most of the things I see in DPV-OWL could be defined using the primitives in Gist. So in that sense Gist "covers" them (from the generic end) but Gist, of course, does not go to the same depth/specificity that DPV does (because it isn't designed to do that).

> it needs someone to create [TLOs], and then someone to integrate/align those with DPV.

Many TLOs already exist. I think the sequence "create a domain ontology then align it with some existing TLO" will not work well. I think the sequence probably needs to be "pick a TLO then create a domain ontology using those primitives." The use of a TLO guides the slicing of the domain; I don't think you can bolt that on after the fact.

> DPV-OWL uses OWL - but Gist uses OWL2. There's no common method to align these without creating new semantics.

I'm not sure that is true. OWL2 is backwards compatible with OWL1.

> The role of DPV is also to demonstrate that we need a TLO for all the data / privacy / legal stuff

That's a fair goal. Though I think it would have been much more interesting to express DPV using the primitives in an existing TLO.

@coolharsh55
Collaborator

Agree about it being difficult/late to integrate a TLO at this point in time.

> I'm not sure that is true. OWL2 is backwards compatible with OWL1.

I meant DPV has SKOS and OWL variants with different semantics, so comparing that with Gist is not as easy and can result in different interpretations if not done carefully. In this, the SKOS one is easier and quicker to do because it's basically looking at what concepts we can use in Gist for each concept in DPV - which is also helpful as an initial exercise for doing an OWL alignment later.

@coolharsh55
Collaborator

Hi. If there are no further comments or pending queries, I will close this issue after NOV-22. The discussion of reusing vocabularies can continue here, or in #31.
