Promote DataCite-compatible languages support to the specs #925

Open
roll opened this issue Apr 27, 2024 · 2 comments

Comments

@roll
Member

roll commented Apr 27, 2024

Overview

As we already have a Languages recipe, and DataCite offers a de facto standard way to support languages, we could go ahead and finally promote it to the specs.

cc @augusto-herrmann

@augusto-herrmann
Member

@roll does DataCite even use Data Packages or Table Schema to begin with? I have skimmed their documentation and all of their examples are in XML. Also, their language support seems to describe the language of the resource, just like the current Table Schema pattern you linked to above, not the language of the metadata.

What I miss is a way to describe the metadata (resource title and description, column names and descriptions) in multiple languages, while the data itself remains in a single language.

Example

Metadata is provided in multiple languages.

in animals.datapackage.en.yaml:

resources:
  - name: animals
    path: animals.csv
    title: Animals
    schema:
      fields:
        - name: id
          type: integer
        - name: animal
          title: Animal species name
          type: string

in animals.datapackage.ru.yaml:

resources:
  - name: animals
    path: animals.csv
    title: Животные
    schema:
      fields:
        - name: id
          type: integer
        - name: animal
          title: Название вида животного (на английском языке)
          type: string

The csv file (the data itself) has only one version, in English:

id,animal
1,cat
2,dog
3,giraffe
4,bat
5,leopard
6,lion
7,tiger
8,elephant
9,panda
10,rabbit
11,chicken
12,cow
13,horse
14,sheep

Pattern

This undocumented pattern already works. We already use it.

The problem is that the type information (integer, string) and other non-language-specific metadata (e.g. null values, validation rules) have to be repeated in each data package metadata file. That's bad for maintenance: types and validation rules may evolve, and you have to manually track them across several versions of the metadata file and keep them in sync. It would be great if I could define that technical metadata only once, in one place.
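Until something like that exists, the copies can at least be checked for drift. The following is a minimal sketch, not part of any Data Package tooling: it assumes that only `title` and `description` are language-specific, and the helper names `strip_translatable` and `technical_metadata_in_sync` are hypothetical. The two dictionaries mirror the YAML files above as they would look once loaded.

```python
# Keys treated as language-specific in this sketch (an assumption, not a list from the specs).
TRANSLATABLE_KEYS = {"title", "description"}


def strip_translatable(node):
    """Return a copy of a descriptor with the translatable keys removed at every level."""
    if isinstance(node, dict):
        return {
            key: strip_translatable(value)
            for key, value in node.items()
            if key not in TRANSLATABLE_KEYS
        }
    if isinstance(node, list):
        return [strip_translatable(item) for item in node]
    return node


def technical_metadata_in_sync(descriptors):
    """Return True if all language variants agree once translatable keys are ignored."""
    stripped = [strip_translatable(d) for d in descriptors.values()]
    return all(other == stripped[0] for other in stripped[1:])


# Contents of animals.datapackage.en.yaml, as loaded into Python.
EN = {
    "resources": [{
        "name": "animals",
        "path": "animals.csv",
        "title": "Animals",
        "schema": {"fields": [
            {"name": "id", "type": "integer"},
            {"name": "animal", "title": "Animal species name", "type": "string"},
        ]},
    }]
}

# Contents of animals.datapackage.ru.yaml, as loaded into Python.
RU = {
    "resources": [{
        "name": "animals",
        "path": "animals.csv",
        "title": "Животные",
        "schema": {"fields": [
            {"name": "id", "type": "integer"},
            {"name": "animal",
             "title": "Название вида животного (на английском языке)",
             "type": "string"},
        ]},
    }]
}

print(technical_metadata_in_sync({"en": EN, "ru": RU}))  # True
```

A check like this only detects that the files have diverged; it does not remove the duplication itself, which is what the request above is about.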

@roll
Member Author

roll commented Apr 29, 2024

@augusto-herrmann
Thanks a lot for writing it down! Just trying to gather all the information now.

roll added this to the v2.1 milestone Jun 26, 2024