
Expand the data model #1736

Closed
timrobertson100 opened this issue Feb 4, 2022 · 3 comments
@timrobertson100
Member

timrobertson100 commented Feb 4, 2022

As GBIF explores capabilities with a new data model we wish to produce exemplar datasets that demonstrate the output using the IPT.

This is an evolutionary change from the current model that removes the constraints of the star schema inherent in the DwC-A. It is desirable to minimise the impact of these changes on the user community, so we are looking to adapt the IPT in a manner that will remain familiar.

It is envisaged that:

  1. The table schemas for the new model will be available, in a similar manner to those on rs.gbif.org today. The format of these is yet to be decided, but they could be defined using XML (as per today), or by using Frictionless Data or Avro schema formats.

  2. A user of the IPT will be able to upload spreadsheets, or connect to a database, as they do currently.

  3. During data mapping, the user can select the target table to map data to, in a similar manner to the current core and extensions. The difference, however, is that the table arrangement may not be in a star format.

  4. On data publishing, the IPT will prepare a Zip file (initially) containing the converted CSV files with header rows, an EML file as it does today, and a meta file that describes the relationships between the tabular data. In the first implementation, we should prepare this meta file in the Frictionless data package format. This may be revised to e.g. the W3C CSV on the Web format or even Avro formats as explorations develop.

  5. During the archive generation, the IPT will continue to perform key validation checks, including the existence and uniqueness of the necessary IDs, and checking the referential integrity of the relationships.

  6. An installation of this branch (v3) of the IPT will be available for those working on the data model to test.
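The validation checks envisaged in point 5 could be sketched as follows. This is a minimal illustration using only the Python standard library; the table names, column names (eventID, occurrenceID) and the parent/child relationship are hypothetical assumptions for the example, not the actual new-model schemas.

```python
# Hypothetical sketch of the archive-generation checks: verify that core IDs
# exist and are unique, and that every foreign key in a related table
# resolves to a core row (referential integrity).
import csv
import io


def check_tables(core_csv, related_csv, core_id="eventID", foreign_key="eventID"):
    """Return a list of problems found in the two CSV tables."""
    problems = []
    seen = set()
    for row in csv.DictReader(io.StringIO(core_csv)):
        key = row.get(core_id, "")
        if not key:
            problems.append("missing core ID")        # existence check
        elif key in seen:
            problems.append(f"duplicate core ID: {key}")  # uniqueness check
        seen.add(key)
    for row in csv.DictReader(io.StringIO(related_csv)):
        ref = row.get(foreign_key, "")
        if ref not in seen:
            problems.append(f"unresolved reference: {ref}")  # integrity check
    return problems


# Illustrative data: the occurrence "o2" points at a non-existent event "e3".
events = "eventID,locality\ne1,Copenhagen\ne2,Aarhus\n"
occurrences = "occurrenceID,eventID\no1,e1\no2,e3\n"
print(check_tables(events, occurrences))  # → ['unresolved reference: e3']
```

A real implementation would of course stream large files rather than hold them in memory, and would derive the key columns and relationships from the meta file instead of hard-coding them.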

@peterdesmet
Member

Nice! Happy to see this on the roadmap and happy to see so many improvements in the IPT by @mike-podolskiy90.

@mike-podolskiy90 mike-podolskiy90 self-assigned this Feb 4, 2022
@mdoering
Member

mdoering commented Feb 8, 2022

It would be great if the IPT would then also allow generating ColDP, which is for the most part very close to Frictionless Data. There is even a Frictionless tabular-data-package generated by the API that contains all possible fields for all possible entities.

Contrary to DwC-A, ColDP does not use a semantic mapping of the data files but instead uses column headers and filename conventions to identify the terms/entities.

@CecSve
Contributor

CecSve commented Dec 22, 2022

5. During the archive generation, the IPT will continue to perform key validation checks, including the existence and uniqueness of the necessary IDs, and checking the referential integrity of the relationships.

This is relevant for a question we received through the portal feedback, so it is great to see it will be incorporated in the new data model.

5 participants