This repository has been archived by the owner on Feb 1, 2023. It is now read-only.

Configuring

Vitor Baptista edited this page Nov 30, 2017 · 2 revisions

You can configure goodtables.io via a .goodtables.yml file in the root directory. For example, you can define:

  • Which files goodtables should validate
  • Which spreadsheet page should be validated
  • What delimiter your CSV file uses (e.g. ;)
  • Which validation checks should be executed

Defining the files to validate

By default, goodtables validates all files with the extension CSV, ODS, XLS, or XLSX, and all files named datapackage.json.

Do we have this default? Sounds sensible.

You can override the default file selection in .goodtables.yml:

files:
  - source: data1.csv
    schema: schema1.json
  - source: data2.xls
    schema: schema2.json

Alternatively, you can define a pattern like:

files: '*.csv'
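
To make the pattern form concrete, here is a minimal Python sketch of shell-style matching. It is an illustration only: the helper name select_files is hypothetical, and matching on the base name is an assumption, since this page does not say how goodtables.io expands patterns or whether they apply to subdirectories.

```python
from fnmatch import fnmatchcase


def select_files(paths, pattern):
    """Return the paths whose file name matches a shell-style pattern.

    Assumption: only the base name is compared with the pattern. This is
    not taken from goodtables.io itself.
    """
    return [p for p in paths if fnmatchcase(p.rsplit("/", 1)[-1], pattern)]


paths = ["data1.csv", "data2.xls", "reports/2017.csv", "README.md"]
selected = select_files(paths, "*.csv")
print(selected)
```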

How do they define the schema here?

You can also configure how each file is loaded using the following options:

Option      Description
format      The file format (csv, xls, ...)
encoding    The file encoding (utf-8, ...)
skip_rows   Either the number of rows to skip, or an array of strings (e.g. #, //, ...). Rows that begin with any of the strings will be ignored.
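
The two forms of skip_rows described above can be sketched in Python as follows. This is a rough illustration of the described behavior, not the code goodtables uses. The helper names should_skip and filter_rows are hypothetical, and treating the numeric form as "skip the first N rows" is an assumption.

```python
def should_skip(row_number, line, skip_rows):
    """Decide whether a row is skipped.

    row_number is 1-based. skip_rows is either an int (assumed to mean
    "skip that many leading rows") or a list of string prefixes.
    """
    if isinstance(skip_rows, int):
        return row_number <= skip_rows
    return any(line.startswith(prefix) for prefix in skip_rows)


def filter_rows(lines, skip_rows):
    """Return the lines that survive the skip_rows setting."""
    return [
        line
        for number, line in enumerate(lines, start=1)
        if not should_skip(number, line, skip_rows)
    ]


lines = ["# exported 2017-11-30", "id,name", "// comment", "1,alice"]
by_count = filter_rows(lines, 1)             # numeric form
by_prefix = filter_rows(lines, ["#", "//"])  # array-of-strings form
```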

Validating data packages

By default goodtables validates all files named datapackage.json.

You can override this default in .goodtables.yml:

datapackages:
  - report1/datapackage.json
  - report2/datapackage.json

Validating CSV files with custom dialects

You can configure how a CSV file is loaded by adding any of the following options to its entry in .goodtables.yml:

files:
  - source: data.csv
    delimiter: ';'
    doublequote: True
    escapechar: '\'
    lineterminator: "\r\n"  # double quotes so YAML reads \r\n as CR LF
    quotechar: '"'          # a bare " is not valid YAML

The full list of options can be found at https://docs.python.org/3.6/library/csv.html#csv-fmt-params.
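
Because these options are the formatting parameters of Python's csv module, their effect can be tried directly with csv.reader. The sketch below uses made-up sample data and only shows how the standard library interprets the parameters. It does not show how goodtables.io passes them along.

```python
import csv
import io

# The same option names as in the configuration above.
dialect_options = {
    "delimiter": ";",
    "doublequote": True,
    "escapechar": "\\",
    "lineterminator": "\r\n",  # accepted by csv.reader but only used by writers
    "quotechar": '"',
}

# Sample data: semicolon-delimited, with a doubled quote and a quoted delimiter.
raw = 'id;comment\r\n1;"says ""hi"""\r\n2;"a;b"\r\n'

rows = list(csv.reader(io.StringIO(raw, newline=""), **dialect_options))
for row in rows:
    print(row)
```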

Defining the spreadsheet page to validate

By default goodtables validates the first sheet of a spreadsheet.

You can override the default sheet in .goodtables.yml:

files:
  - source: data.xlsx
    sheet: 3

Inferring schema

By default, goodtables does not infer the data schema. You can enable inference in .goodtables.yml:

settings:
  infer_schema: True
  infer_fields: True

Goodtables will infer the schema of all files and columns that don't have an explicit schema.
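
As a rough idea of what inferring a schema involves, the following Python sketch guesses a type for each column from its values and builds a descriptor with a fields list. It is a deliberately naive illustration and not goodtables' inference algorithm. The helper names infer_type and infer_schema are hypothetical, and only three type names (integer, number, string) are considered.

```python
def infer_type(values):
    """Guess a type name for a column from its string values."""

    def all_parse(cast):
        # True if every value can be converted with the given function.
        try:
            for value in values:
                cast(value)
            return True
        except ValueError:
            return False

    if values and all_parse(int):
        return "integer"
    if values and all_parse(float):
        return "number"
    return "string"


def infer_schema(header, rows):
    """Build a schema-like dict with one field per column."""
    columns = list(zip(*rows)) if rows else [() for _ in header]
    return {
        "fields": [
            {"name": name, "type": infer_type(column)}
            for name, column in zip(header, columns)
        ]
    }


schema = infer_schema(
    ["id", "price", "name"],
    [["1", "9.5", "tea"], ["2", "12", "mug"]],
)
print(schema)
```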