To run the whole pipeline at once:
bash pipeline.sh
This runs (in order):
- Clean v.1.0 for encoding/parsing errors: scripts/clean-gateway.
- Process new data to add to the database: scripts/mulder.R and scripts/tagus.R.
- Extract species names: scripts/extract-species-names.R.
- Query the species names against GBIF: harmonize-taxonomy.py.
- Combine v.1.0 with new data: scripts/combine.R.
- Harmonize taxonomy of the database: harmonize-taxonomy.R.
- Save the new database as gateway-v.2.0.csv.
- Create summary tables to be displayed on the website: scripts/summarize.R.
- Display some summary statistics on the terminal.
Some of these steps can take a while. To avoid re-running completed steps, a hidden (empty) file is added to the steps folder each time a step completes successfully. Steps that have such a file are not re-run. To re-run the whole pipeline from scratch, pass the --clean option:
bash pipeline.sh --clean
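The marker-file mechanism described above can be sketched roughly as follows. This is a minimal illustration, not the actual contents of pipeline.sh: the function names `run_step` and `clean_steps`, the `STEPS_DIR` variable, and the marker naming scheme (a dot followed by the step name) are all assumptions made for the example.

```bash
#!/usr/bin/env bash
# Sketch only: marker-file checkpointing. All names here are illustrative
# assumptions, not taken from pipeline.sh.

STEPS_DIR="${STEPS_DIR:-steps}"   # folder that holds the hidden marker files

# run_step NAME COMMAND [ARGS...]
# Skip COMMAND if NAME's marker exists; create the marker only on success.
run_step() {
  local name="$1"; shift
  local marker="${STEPS_DIR}/.${name}"
  mkdir -p "$STEPS_DIR"
  if [[ -e "$marker" ]]; then
    echo "skip: ${name}"
    return 0
  fi
  if "$@"; then
    : > "$marker"               # empty hidden file marks the step as done
    echo "done: ${name}"
  else
    echo "failed: ${name}" >&2  # no marker, so the step runs again next time
    return 1
  fi
}

# clean_steps: forget all completed steps (the effect described for --clean).
clean_steps() {
  rm -f "${STEPS_DIR}"/.[!.]*
}
```

With these helpers, a step would be wrapped as, for example, `run_step combine Rscript scripts/combine.R`. Because the marker is written only after the command exits successfully, a step that fails partway is retried on the next run.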
To see available options and usage: bash pipeline.sh --help