Configuring
You can configure goodtables.io via a .goodtables.yml
file in the root directory. For example, you can define:
- Which files goodtables should validate
- Which spreadsheet page should be validated
- What delimiter your CSV file uses (e.g. ;)
- Which validation checks should be executed
By default, goodtables validates all files with a CSV, ODS, XLS, or XLSX extension, and all files named datapackage.json.
Do we have this default? Sounds sensible.
You can override the default files in .goodtables.yml:
files:
  - source: data1.csv
    schema: schema1.json
  - source: data2.xls
    schema: schema2.json
Alternatively, you can define a pattern like:
files: '*.csv'
How do they define the schema here?
You can also configure how the file is loaded using the options:
Option | Description |
---|---|
format | The file format (csv, xls, ...) |
encoding | The file encoding (utf-8, ...) |
skip_rows | Either the number of rows to skip, or an array of strings (e.g. #, //, ...). Rows that begin with any of the strings will be ignored. |
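As a hedged illustration of the options in the table, the sketch below assumes they are set per file, next to source, in the same way as the CSV options shown elsewhere in this document. The file name data.csv and the chosen values are hypothetical.

```yaml
# Sketch only: assumes loading options sit alongside `source` in a `files` entry.
files:
  - source: data.csv        # hypothetical file name
    format: csv
    encoding: utf-8
    skip_rows: ['#', '//']  # ignore rows that begin with # or //
```

The strings are quoted because an unquoted # starts a comment in YAML. Per the table, skip_rows could instead be given a number of rows to skip.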
By default, goodtables validates all files named datapackage.json.
You can override this default in .goodtables.yml:
datapackages:
- report1/datapackage.json
- report2/datapackage.json
You can configure how a CSV file is loaded by adding one or more of the following options in .goodtables.yml:
files:
  - source: data.csv
    delimiter: ;
    doublequote: True
    escapechar: '\'
    lineterminator: "\r\n"
    quotechar: '"'
The full list of options is available at https://docs.python.org/3.6/library/csv.html#csv-fmt-params.
By default goodtables validates the first sheet of a spreadsheet.
You can override the default sheet in .goodtables.yml:
files:
  - source: data.xlsx
    sheet: 3
By default, goodtables does not infer the data schema. You can enable inference in .goodtables.yml:
settings:
  infer_schema: True
  infer_fields: True
Goodtables will infer the schema of all files and columns that don't have an explicit schema.
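A minimal sketch of how this might look in practice, assuming settings and files can appear together in one .goodtables.yml. The file names are hypothetical.

```yaml
# Sketch only: assumes `settings` and `files` can be combined in one file.
settings:
  infer_schema: True
  infer_fields: True
files:
  - source: data1.csv     # hypothetical; has an explicit schema
    schema: schema1.json
  - source: data2.csv     # hypothetical; no schema given, so one would be inferred
```

Under the behaviour described above, data1.csv would be checked against schema1.json, while the schema for data2.csv would be inferred.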