
Expand the data model #1736

Closed
timrobertson100 opened this issue Feb 4, 2022 · 3 comments
@timrobertson100
Member

timrobertson100 commented Feb 4, 2022

As GBIF explores capabilities with a new data model we wish to produce exemplar datasets that demonstrate the output using the IPT.

This is an evolutionary change from the current model that removes the constraints of the star schema inherent in the DwC-A. It is desirable to minimise the impact of these changes on the user community, so we are looking to adapt the IPT in a manner that will remain familiar.

It is envisaged that:

  1. The table schemas for the new model will be available, in a similar manner to those on rs.gbif.org today. The format of these is yet to be decided, but they could be defined using XML (as per today), or by using Frictionless Data or Avro schema formats.

  2. A user of the IPT will be able to upload spreadsheets, or connect to a database, as they do currently.

  3. During data mapping, the user can select the target table to map data to, in a similar manner to the current core and extensions. The difference, however, is that the table arrangement may not be in a star format.

  4. On data publishing, the IPT will prepare a Zip file (initially) containing the converted CSV files with header rows, an EML file as it does today, and a meta file that describes the relationships between the tabular data. In the first implementation, we should prepare this meta file in the Frictionless data package format. This may be revised to e.g. the W3C CSV on the Web format or even Avro formats as explorations develop.

  5. During the archive generation, the IPT will continue to perform key validation checks, including the existence and uniqueness of the necessary IDs, and checking the referential integrity of the relationships.

  6. An installation of this branch (v3) of the IPT will be available for those working on the data model to test.
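The validation checks envisaged in point 5 could be sketched as follows. This is a minimal illustration using only the Python standard library; the table names, column names (eventID, occurrenceID) and the parent/child relationship are hypothetical assumptions for the example, not the actual new-model schemas.

```python
# Hypothetical sketch of the archive-generation checks: verify that core IDs
# exist and are unique, and that every foreign key in a related table
# resolves to a core row (referential integrity).
import csv
import io


def check_tables(core_csv, related_csv, core_id="eventID", foreign_key="eventID"):
    """Return a list of problems found in the two CSV tables."""
    problems = []
    seen = set()
    for row in csv.DictReader(io.StringIO(core_csv)):
        key = row.get(core_id, "")
        if not key:
            problems.append("missing core ID")        # existence check
        elif key in seen:
            problems.append(f"duplicate core ID: {key}")  # uniqueness check
        seen.add(key)
    for row in csv.DictReader(io.StringIO(related_csv)):
        ref = row.get(foreign_key, "")
        if ref not in seen:
            problems.append(f"unresolved reference: {ref}")  # integrity check
    return problems


# Illustrative data: the occurrence "o2" points at a non-existent event "e3".
events = "eventID,locality\ne1,Copenhagen\ne2,Aarhus\n"
occurrences = "occurrenceID,eventID\no1,e1\no2,e3\n"
print(check_tables(events, occurrences))  # → ['unresolved reference: e3']
```

A real implementation would of course stream large files rather than hold them in memory, and would derive the key columns and relationships from the meta file instead of hard-coding them.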

@peterdesmet
Member

Nice! Happy to see this on the roadmap and happy to see so many improvements in the IPT by @mike-podolskiy90.

@mike-podolskiy90 mike-podolskiy90 self-assigned this Feb 4, 2022
@mdoering
Member

mdoering commented Feb 8, 2022

It would be great if the IPT would then also allow generating ColDP, which is for the most part very close to Frictionless Data. There is even a Frictionless tabular-data-package generated by the API that contains all possible fields for all possible entities.

Contrary to DwC-A, ColDP does not use a semantic mapping of the data files but instead uses column headers and filename conventions to identify the terms/entities.

@CecSve
Contributor

CecSve commented Dec 22, 2022

5. During the archive generation, the IPT will continue to perform key validation checks, including the existence and uniqueness of the necessary IDs, and checking the referential integrity of the relationships.

This is relevant for a question we received through the portal feedback, so it is great to see it will be incorporated in the new data model.

5 participants