# A Krawler based service to download data from the French open data portal Hub'Eau
The k-hubeau-hydro jobs scrape hydrometric data from the following API: http://hubeau.eaufrance.fr/page/api-hydrometrie. The downloaded data are stored in a MongoDB database, in 2 collections:
- the `observations` collection stores the observed data:
  - the water level `H` in meters (m)
  - the water flow `Q` in cubic meters per second (m3/s)
- the `stations` collection stores the data of the stations
The project consists of 3 jobs:
- the `stations` job scrapes the stations data according to a specific cron expression, by default every day at midnight
- the `observations` job scrapes the observations according to a specific cron expression, by default every 15 minutes
- the `prediction` job generates predictions about future water levels
The `stations` job can be configured with the following environment variables:

Variable | Description |
---|---|
`DB_URL` | The database URL. The default value is `mongodb://127.0.0.1:27017/hubeau`. |
`DEBUG` | Enables debug output. Set it to `krawler*` to enable full output. By default it is undefined. |
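The actual jobs are Krawler (Node.js) jobfiles; purely for illustration, here is a minimal Python sketch of how the documented defaults could be resolved from the environment (the variable names come from the table above, the code itself is hypothetical):

```python
import os

# Fall back to the documented defaults when the variables are unset.
DB_URL = os.environ.get("DB_URL", "mongodb://127.0.0.1:27017/hubeau")
DEBUG = os.environ.get("DEBUG")  # e.g. "krawler*" for full output; undefined by default

print(DB_URL)
```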
The `observations` job can be configured with the following environment variables:

Variable | Description |
---|---|
`DB_URL` | The database URL. The default value is `mongodb://127.0.0.1:27017/hubeau`. |
`TTL` | The observations data time to live. It must be expressed in seconds; the default value is 604 800 (7 days). |
`HISTORY` | The duration of the observations data history the job has to download. It must be expressed in milliseconds; the default value is 86 400 000 (1 day). |
`TIMEOUT` | The maximum duration of the job. It must be expressed in milliseconds; the default value is 1 800 000 (30 minutes). |
`DEBUG` | Enables debug output. Set it to `krawler*` to enable full output. By default it is undefined. |
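Note the mixed units: `TTL` is expressed in seconds while `HISTORY` and `TIMEOUT` are in milliseconds. A small hypothetical helper (not part of the project) to derive consistent values from a number of days:

```python
def ttl_seconds(days):
    # TTL is expressed in seconds.
    return days * 24 * 3600

def duration_ms(days):
    # HISTORY and TIMEOUT are expressed in milliseconds.
    return days * 24 * 3600 * 1000

# The documented defaults:
print(ttl_seconds(7))  # 604800 (7 days)
print(duration_ms(1))  # 86400000 (1 day)
```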
The k-hubeau-piezo jobs scrape piezometric data from the following API: http://hubeau.eaufrance.fr/page/api-piezometrie. The downloaded data are stored in a MongoDB database, in 2 collections:
- the `observations` collection stores the observed data:
  - the water table depth `profondeur_nappe` in meters (m)
  - the water table level in the NGF datum `niveau_eau_ngf` in meters (m)
- the `stations` collection stores the data of the stations. The `DATE_FIN_MESURE` field is used to flag older stations as inactive (`is_active: false`); inactive stations are not requested by the observations job.
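The inactivity rule can be sketched as a plain date cutoff. This is an illustrative Python snippet, not the project's code; the default deadline value comes from the configuration table below, and the helper name is hypothetical:

```python
from datetime import date

DEADLINE = date(2022, 1, 1)  # default DATE_FIN_MESURE

def is_active(date_fin_mesure, deadline=DEADLINE):
    # A station whose last measurement date predates the deadline is
    # flagged is_active: false and skipped by the observations job.
    return date_fin_mesure >= deadline

print(is_active(date(2023, 6, 1)))   # True
print(is_active(date(2020, 3, 15)))  # False
```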
The project consists of 2 jobs:
- the `stations` job scrapes the stations data according to a specific cron expression, by default every day at midnight
- the `observations` job scrapes the observations according to a specific cron expression, by default every hour at minute 15
The `stations` job can be configured with the following environment variables:

Variable | Description |
---|---|
`DB_URL` | The database URL. The default value is `mongodb://127.0.0.1:27017/hubeau`. |
`CODE_DEP` | The list of department codes used to filter the stations (e.g. "75", "92"). The default is all 101 French departments. |
`DATE_FIN_MESURE` | The deadline defining all older stations as inactive. The default is 2022-01-01. |
`DEBUG` | Enables debug output. Set it to `krawler*` to enable full output. By default it is undefined. |
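For illustration, the `CODE_DEP` filtering might look like the sketch below. This is hypothetical Python, not the Krawler job; in particular the `code_departement` field name on station records is an assumption, not something stated in this README:

```python
def filter_stations(stations, code_deps=None):
    # code_deps is None -> keep everything (default: all 101 French departments)
    if code_deps is None:
        return list(stations)
    return [s for s in stations if s["code_departement"] in code_deps]

stations = [
    {"code_departement": "75"},
    {"code_departement": "92"},
    {"code_departement": "33"},
]
print(len(filter_stations(stations)))                # 3
print(len(filter_stations(stations, ["75", "92"])))  # 2
```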
The `observations` job can be configured with the following environment variables:

Variable | Description |
---|---|
`DB_URL` | The database URL. The default value is `mongodb://127.0.0.1:27017/hubeau`. |
`TTL` | The observations data time to live. It must be expressed in seconds; the default value is 604 800 (7 days). |
`HISTORY` | The duration of the observations data history the job has to download. It must be expressed in milliseconds (should be whole days); the default value is 86 400 000 (1 day). |
`TIMEOUT` | The maximum duration of the job. It must be expressed in milliseconds; the default value is 1 800 000 (30 minutes). |
`DEBUG` | Enables debug output. Set it to `krawler*` to enable full output. By default it is undefined. |
We personally use Kargo to deploy the service.
Please refer to the contribution section for more details.
This project is sponsored by
This project is licensed under the MIT License - see the license file for details