add resources to datapackage.json #73

Closed
augusto-herrmann opened this issue Jan 19, 2016 · 14 comments

@augusto-herrmann
Collaborator

The datapackage.json file describes the publicbodies dataset according to the Tabular Data Package specification. It currently references only the data/gb.csv resource. It should be updated with references to all resources in the dataset.

Note: since each resource has to list all fields in the schema, this should probably be done after deciding on the pending changes to the schema (e.g. #61 #65 #68), to avoid repetitive work.
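
The repetition mentioned in the note can be scripted rather than typed by hand. Below is a minimal sketch, assuming one schema is repeated for every CSV file. The helper name `build_resources`, the two example fields, and the `data/us.csv` path are hypothetical and only illustrate the shape of the result.

```python
import copy
import json
from pathlib import PurePosixPath


def build_resources(csv_paths, schema):
    """Return one resource entry per CSV path, each with its own copy of schema."""
    resources = []
    for path in csv_paths:
        resources.append({
            "name": PurePosixPath(path).stem,  # e.g. "data/gb.csv" -> "gb"
            "path": path,
            "schema": copy.deepcopy(schema),   # the schema is repeated per resource
        })
    return resources


# Hypothetical two-field schema; the real field list is still under discussion.
example_schema = {
    "fields": [
        {"name": "id", "type": "string"},
        {"name": "name", "type": "string"},
    ]
}

resources = build_resources(["data/gb.csv", "data/us.csv"], example_schema)
print(json.dumps(resources, indent=2))
```

Deep-copying the schema keeps each resource independent, so a later per-file tweak does not leak into the others.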

@rufuspollock
Member

@augusto-herrmann great. I also note that Data Package spec supports shared schema http://dataprotocols.org/data-packages/#resource-schemas

@todrobbins
Contributor

👍

@augusto-herrmann I can update the datapackage.json, unless you are already in process. Thanks for bringing this to the forefront!

@augusto-herrmann
Collaborator Author

A very pertinent observation, @rgrp, thanks!

I only really looked into this matter today. It seems the current datapackage.json is not passing schema validation with goodtables.

$ goodtables schema data/gb.csv --schema datapackage.json 
...
  File "... /local/lib/python2.7/site-packages/goodtables/processors/schema.py", line 91, in schema_model
    raise e
jsontableschema.exceptions.InvalidSchemaError

However, goodtables does not describe in detail what actually prevented validation.

@todrobbins, feel free to assign this issue to yourself if you're already into it.

@danfowler
Contributor

Hi @augusto-herrmann, goodtables expects a JSON Table Schema (example), not the full datapackage.json.
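
One way to act on this is to pull the schema of a single resource out of the descriptor. This is a minimal sketch using only the Python standard library; the helper name `extract_schema` and the inline descriptor are hypothetical stand-ins, and the real datapackage.json may be laid out differently.

```python
import json


def extract_schema(descriptor, resource_path):
    """Return the schema dict of the resource whose path matches resource_path."""
    for resource in descriptor.get("resources", []):
        if resource.get("path") == resource_path:
            return resource["schema"]
    raise KeyError(f"no resource with path {resource_path!r}")


# Hypothetical stand-in for the parsed contents of datapackage.json.
descriptor = {
    "name": "publicbodies",
    "resources": [
        {
            "path": "data/gb.csv",
            "schema": {"fields": [{"name": "id", "type": "string"}]},
        },
    ],
}

schema = extract_schema(descriptor, "data/gb.csv")
schema_json = json.dumps(schema, indent=2)
print(schema_json)
```

Saving that output to its own file (the file name is up to you) and passing that file to `--schema` instead of datapackage.json would give goodtables only the schema.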

That being said, several updates to the datapackage.json do need to be made to match the current specification. For example, license should probably be changed to an object (not an array), a field in the resources array should use name rather than id, and there are probably some other issues.
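
The two changes named above could be applied mechanically. The following is only a sketch under stated assumptions: that a list-valued license should be reduced to its first entry, and that each schema field carries a legacy `id` key. The function name `modernize` and the sample descriptor are hypothetical, and the other issues mentioned are not covered.

```python
import copy


def modernize(descriptor):
    """Return a copy with license as an object and schema fields keyed by name."""
    updated = copy.deepcopy(descriptor)

    # Assumption: when license is a list, its first entry is the one to keep.
    license_value = updated.get("license")
    if isinstance(license_value, list) and license_value:
        updated["license"] = license_value[0]

    # Rename the legacy "id" key to "name" on every schema field.
    for resource in updated.get("resources", []):
        for field in resource.get("schema", {}).get("fields", []):
            if "id" in field and "name" not in field:
                field["name"] = field.pop("id")
    return updated


# Hypothetical old-style descriptor, for illustration only.
old = {
    "license": [{"name": "example-license"}],
    "resources": [
        {
            "path": "data/gb.csv",
            "schema": {"fields": [{"id": "title", "type": "string"}]},
        },
    ],
}

new = modernize(old)
print(new)
```

The function works on a deep copy, so the original descriptor stays available for comparison.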

@rufuspollock
Member

@danfowler great points. As an aside, re goodtables: can we raise an issue there about supporting tabular data packages out of the box?

@augusto-herrmann
Collaborator Author

A very keen observation, @danfowler.
+1 for @rgrp's suggestion. It would be very useful to check for correctness of tabular data packages on the CLI with goodtables.

todrobbins self-assigned this May 6, 2017
todrobbins added the "Data model" label (Changes to schema and how to represent data) May 6, 2017

@todrobbins
Contributor

@augusto-herrmann @danfowler @rufuspollock did Tabular Data Package support ever land in Good Tables? I'm currently working on a review of our datapackage.json and would appreciate any new insights to inform that process.

@todrobbins
Contributor

@pwalsh can you confirm? Looks like frictionless-data/goodtables-py #66 was merged and tested.

@danfowler
Contributor

Hi @todrobbins goodtables-py and goodtables.io can both work with the datapackage.json.

@danfowler
Contributor

@todrobbins @augusto-herrmann @rufuspollock @pwalsh so now that goodtables.io supports Data Packages, we need to update this Data Package to the latest specs: http://specs.frictionlessdata.io/. Goodtables.io doesn't like how it is currently structured:

[Screenshot, 2017-05-10 13:50:36]

@augusto-herrmann
Collaborator Author

I tried updating datapackage.json to the current specification on a local branch. On my local machine it validates:

$ datapackage validate datapackage-with-repeated-schema.json
Data package descriptor is valid

But goodtables.io never finishes the job, and I have no idea why it would take so long, considering that validation runs quite fast on my local machine.

Also, some of the CSV files do present validation errors with the goodtables CLI. gr.csv, np.csv and us.csv are reported to have duplicate rows. se.csv has a blank header. nz.csv is reported to have an encoding problem. It would be nice to have an individual badge for each file. Is this possible with goodtables.io?
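
For reference, the kinds of problems listed above can be approximated with the Python standard library alone. This is a rough sketch, not how goodtables implements its checks; the function names `check_csv` and `is_utf8`, the problem labels, and the assumption that files should be UTF-8 are all hypothetical.

```python
import csv
import io


def check_csv(text):
    """Return a list of (kind, detail) problems found in CSV text."""
    problems = []
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return problems

    # Blank header cells, reported by 1-based column position.
    for index, cell in enumerate(rows[0], start=1):
        if not cell.strip():
            problems.append(("blank-header", f"column {index}"))

    # Exact duplicate data rows; the header counts as row 1.
    seen = {}
    for row_no, row in enumerate(rows[1:], start=2):
        key = tuple(row)
        if key in seen:
            problems.append(("duplicate-row", f"row {row_no} repeats row {seen[key]}"))
        else:
            seen[key] = row_no
    return problems


def is_utf8(raw_bytes):
    """Return True if raw_bytes decodes as UTF-8 (assumed target encoding)."""
    try:
        raw_bytes.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


sample = "id,,name\n1,x,Alpha\n2,y,Beta\n1,x,Alpha\n"
print(check_csv(sample))
print(is_utf8(b"\xff"))
```

Running the same function over each file separately would give a per-file report, which is the granularity asked about above.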

@augusto-herrmann
Collaborator Author

> @augusto-herrmann great. I also note that Data Package spec supports shared schema http://dataprotocols.org/data-packages/#resource-schemas

@rufuspollock that link no longer works. I could not find a "shared schema" anywhere in the current specifications. I have thus created a topic on the Frictionless Data forums to discuss it.

augusto-herrmann added a commit to augusto-herrmann/publicbodies that referenced this issue Aug 6, 2018
todrobbins added a commit that referenced this issue Aug 7, 2018
updating datapackage.json to current specs & adding schema as per #73
@augusto-herrmann
Collaborator Author

With this PR datapackage.json is up to speed.

@todrobbins
Contributor

Again: thank you, @augusto-herrmann!
