Use case: Dataset size characteristics #161
Comments
This looks like a clear gap in DCAT capabilities. Could the UCR team extract requirements from this?
@dr-shorthair, @VladimirAlexiev, I think this use case relates very much to the following reqs:
and also to the following UC:
So both this one (ID51) and 5.44 (ID44) talk of subsets, but also about other topics.
The proposal seems to merge two concerns: 1) expressing the structure of datasets, as modeled in DATS (hasPart), and 2) indicating the size property of the composite or the individual parts. My suggestion is to rephrase this UC to cover only 2), and to create and link here a new UC for 1).
hi @jpullmann! I've reread this proposal and I agree with you. As I wrote, "subsets are such a crucial topic, they should be split into their own requirement". Requirement 5.44 (ID44) also talks of subsets, but for another reason. Someone needs to take the subset characterization parts from this one (ID51), 5.44 (ID44), and your suggestion (DATS hasPart) and consolidate them. I could try it, but I'm not sure I can capture other people's suggestions adequately. Are there designated editors in this WG?
Editors' names are listed on each draft: https://www.w3.org/2017/dxwg/wiki/Main_Page#Deliverables
The relationship of a dataset to its subsets is part of #81. I've renamed it to reflect that focus.
As per the last comment, assuming that the related datasets (sub-datasets) are considered in the requirement #81, should this UC be added to the UCR document focusing on the dataset size characteristics where the main requirements would be around (copied from above):
Ping to @fanieli @jpullmann @rob-metalinkage
@VladimirAlexiev are you interested in reviving this?
Unless there are any objections, I propose we close this issue.
Noting no objections, I'm closing this issue.
Submitting a new USE CASE:
Dataset subsets and size characteristics
Status:
Identifier: ID51 (proposed)
Creator: Vladimir Alexiev, Ontotext
Deliverable(s): DCAT1.1
Tags
semantics statistics size
Stakeholders
Data consumers often need to know how many of what sort of entities are included in a dataset.
In an aggregation scenario, different subsets (parts of a dataset) need to be expressed, eg because they come from different data providers.
Eg in the euBusinessGraph project we have a need to describe Company datasets by different providers, what properties are included in each (eg ebg:isStartup, org:orgActivity), and some partition info, eg "the dataset covers jurisdiction Italy" or "the dataset has 1000 Italian startups" (i.e. rov:RegisteredOrganization with ebg:isStartup=true and jurisdiction Italy).
Problem statement
DCAT 1.0 has only a property dcat:byteSize, which is pretty useless to describe any aspect of dataset content or value.
And it has no means of expressing subsets.
Existing approaches
VOID statistics includes these void: counting props: triples, entities, classes, properties, distinctSubjects, distinctObjects, documents.
Very importantly, these can be used on subsets such as classPartition and propertyPartition, which provides very powerful means to describe exactly what kinds of entities are present, and how many are in the dataset.
Thus I believe that subsets are instrumental in expressing the fine-grained content of a dataset.
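To make the VOID approach concrete, here is a minimal Python sketch that prints a VOID description in Turtle for an aggregated dataset with one provider subset. It is only an illustration, not a proposal for DCAT: the VOID terms (void:subset, void:classPartition, void:propertyPartition, void:entities, void:triples) are real, but the ex: dataset names, the ebg: namespace IRI, and all counts are hypothetical placeholders. Note also that a plain propertyPartition counts all triples using ebg:isStartup, so it only approximates "1000 Italian startups" (it cannot filter on the value true).

```python
from dataclasses import dataclass, field

# Namespace prefixes. void: and rov: are the published IRIs;
# ebg: and ex: are placeholders assumed for this sketch.
PREFIXES = {
    "void": "http://rdfs.org/ns/void#",
    "rov": "http://www.w3.org/ns/regorg#",
    "ebg": "http://example.org/ebg#",
    "ex": "http://example.org/dataset/",
}


@dataclass
class Partition:
    """A VOID class or property partition with a single count."""
    kind: str   # "class" or "property"
    term: str   # prefixed name, e.g. "rov:RegisteredOrganization"
    count: int

    def to_turtle(self) -> str:
        if self.kind == "class":
            link, key, stat = "void:classPartition", "void:class", "void:entities"
        elif self.kind == "property":
            link, key, stat = "void:propertyPartition", "void:property", "void:triples"
        else:
            raise ValueError(f"unknown partition kind: {self.kind}")
        return f"    {link} [ {key} {self.term} ; {stat} {self.count} ]"


@dataclass
class Dataset:
    """A void:Dataset with an entity count, subsets, and partitions."""
    name: str
    entities: int
    subsets: list = field(default_factory=list)      # names of sub-datasets
    partitions: list = field(default_factory=list)   # Partition objects

    def to_turtle(self) -> str:
        lines = [f"{self.name} a void:Dataset", f"    void:entities {self.entities}"]
        lines += [f"    void:subset {s}" for s in self.subsets]
        lines += [p.to_turtle() for p in self.partitions]
        return " ;\n".join(lines) + " ."


def describe(datasets) -> str:
    """Serialize prefix declarations followed by each dataset description."""
    header = "\n".join(f"@prefix {p}: <{iri}> ." for p, iri in PREFIXES.items())
    return header + "\n\n" + "\n\n".join(d.to_turtle() for d in datasets)


# Hypothetical provider subset covering jurisdiction Italy.
italy = Dataset(
    name="ex:companies-italy",
    entities=5000,
    partitions=[
        Partition("class", "rov:RegisteredOrganization", 5000),
        Partition("property", "ebg:isStartup", 1000),
    ],
)
# Hypothetical aggregate that contains the provider subset.
aggregate = Dataset(name="ex:companies", entities=5000, subsets=[italy.name])

print(describe([aggregate, italy]))
```

The point of the sketch is that the same counting properties apply at every level: the aggregate, each provider subset, and each class or property partition within a subset.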
Links
Schema issue https://github.com/schemaorg/schemaorg/issues/1855
Requirements
Ability to express the fine-grained content of a dataset:
Notes:
ebg:isStartup=true
Related use cases
ID33, ID7, RDSAT, RSS.
This one could be merged into ID33 to provide further details.