databird

Periodically retrieve data from different sources.

The databird package only provides a framework to plan and run the tasks needed to keep a local data-file-store up to date with various remote sources. The remote sources can be anything (e.g. FTP Server, ECMWF, HTTP Api, SQL database, ...), as long as there is a databird-driver available for the specific source.

Usage

Databird is configured with configuration files and invoked by

$ databird retrieve -c /etc/databird/databird.conf

# or (as the above is the default)
$ databird retrieve

You can store the configuration files anywhere and, for example, run the above command periodically as a cron job.

In addition, some RQ workers listening on the databird queue are required:

$ rq worker databird

This starts one worker. To run multiple workers, use a process supervisor.

Configuration

The following example configuration defines a repository, which is populated with daily GNSS data from ftp://cddis.nasa.gov/gnss/data/daily/.

The main configuration file (usually databird.conf) could look like this:

general:
  root: /data/repos # root path for data repositories
  num-workers: 16   # max number of async workers
  include: "databird.conf.d/*.conf"  # include config files

Generally you can configure anything in any file, as all configuration files are merged to one configuration tree. The include option is an exception, as it can only be declared in the top config file.
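The merge rule above can be pictured as a recursive dictionary merge. The following is a minimal sketch, not databird's actual code: it assumes each file has already been parsed into a nested dict, that files are merged in order with later files winning on conflicting scalar keys, and that an include key appearing in an included file is simply ignored. The function names deep_merge and merge_config_trees are hypothetical.

```python
from typing import Any, Dict, Iterable

Config = Dict[str, Any]


def deep_merge(base: Config, override: Config) -> Config:
    """Return a new dict with `override` merged into `base` recursively.

    Nested dicts are merged key by key; any other value in `override`
    replaces the one in `base` (an assumed precedence, not databird's).
    """
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def merge_config_trees(top: Config, included: Iterable[Config]) -> Config:
    """Merge the top config file with already-parsed included files.

    Assumption: 'general.include' is only honoured in the top file,
    so it is dropped from every included tree before merging.
    """
    result = top
    for tree in included:
        general = tree.get("general")
        if isinstance(general, dict) and "include" in general:
            tree = dict(tree)
            tree["general"] = {k: v for k, v in general.items() if k != "include"}
        result = deep_merge(result, tree)
    return result


# Usage: the top file plus the contents of databird.conf.d/cddis.conf
top = {"general": {"root": "/data/repos", "num-workers": 16,
                   "include": "databird.conf.d/*.conf"}}
cddis = {"profiles": {"nasa_cddis": {"driver": "standard.FtpDriver"}}}
config = merge_config_trees(top, [cddis])
```

With this rule, config ends up holding both the general section from databird.conf and the profiles section from cddis.conf in one tree.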

Then in databird.conf.d/cddis.conf you can configure a profile and a repository:

profiles:
  nasa_cddis:
    driver: standard.FtpDriver
    configuration:
      host: cddis.nasa.gov
      user: anonymous
      password: ""
      tls: False
       
repositories:
  nasa_gnss:
    description: Data from NASA's Archive of Space Geodesy Data
    profile: nasa_cddis
    period: 1 day
    delay: 2 days
    start: 2019-01-01
    targets:
      status: "{time:%Y}/cddis_gnss_{iso_date}.status"
    configuration:
      user: anonymous  # this could override 'user' from profile
      root: "/gnss/data/daily"
      patterns:
        status: "{time:%Y}/{time:%j}/{time:%y%j}.status"

Running databird with this configuration does the following:

A repository is created in the folder /data/repos/nasa_gnss/
For every day, a file like 2019/cddis_gnss_2019-01-20.status is expected in that folder
If that file is missing, it is retrieved from ftp://cddis.nasa.gov/gnss/data/daily/2019/020/19020.status
If many files are missing, they are retrieved asynchronously
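The steps above can be approximated in a few lines of Python. This is a sketch under stated assumptions, not databird's implementation: it assumes period: 1 day means one target per calendar day, that delay: 2 days means the two most recent days are not expected yet, and that the {time:...} placeholders behave like ordinary str.format fields applied to a datetime (Python's datetime formatting forwards the spec to strftime). The names due_dates, render and missing_targets are hypothetical; repo_dir would correspond to /data/repos/nasa_gnss/.

```python
import os
from datetime import date, datetime, timedelta
from typing import Iterator, List, Tuple

TARGET = "{time:%Y}/cddis_gnss_{iso_date}.status"   # from 'targets: status'
PATTERN = "{time:%Y}/{time:%j}/{time:%y%j}.status"  # from 'patterns: status'
REMOTE_ROOT = "/gnss/data/daily"                    # from 'configuration: root'


def due_dates(start: date, today: date, period: timedelta,
              delay: timedelta) -> Iterator[date]:
    """Yield every period start from `start` up to `today - delay` inclusive."""
    current = start
    while current <= today - delay:
        yield current
        current += period


def render(template: str, day: date) -> str:
    """Fill a template; '{time:...}' specs are passed to datetime.strftime."""
    time = datetime(day.year, day.month, day.day)
    return template.format(time=time, iso_date=day.isoformat())


def missing_targets(repo_dir: str, start: date,
                    today: date) -> List[Tuple[str, str]]:
    """Return (local target, remote path) pairs whose local file is absent."""
    missing = []
    for day in due_dates(start, today, timedelta(days=1), timedelta(days=2)):
        local = os.path.join(repo_dir, render(TARGET, day))
        if not os.path.exists(local):
            remote = REMOTE_ROOT + "/" + render(PATTERN, day)
            missing.append((local, remote))
    return missing
```

For 2019-01-20, render(TARGET, ...) yields 2019/cddis_gnss_2019-01-20.status and render(PATTERN, ...) yields 2019/020/19020.status, matching the example paths above. In databird itself, each missing pair would become a retrieval job for the workers rather than being collected in a list.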

This example uses the standard.FtpDriver.

Monitoring

Use databird webmonitor [PORT] to start the web interface.

Since databird uses RQ for managing jobs, you can also use the monitoring options described in RQ/docs/monitoring.

Drivers

Anyone can write drivers (see below). Currently, the following drivers are available:

Included:

standard.FilesystemDriver: Retrieve data from the local filesystem
standard.CommandDriver: Run an arbitrary shell command
standard.FtpDriver: Retrieve data from an FTP server

Climate:

climate.EcmwfDriver: Retrieve data from the European Centre for Medium-Range Weather Forecasts (ECMWF) via their API
climate.C3SDriver: Retrieve data from the Copernicus Climate Change Service (C3S) via their API
climate.GesDiscDriver: Retrieve data from the NASA EarthData GES DISC service.

Development

Create a Python environment and activate it

$ python3 -m venv . && source bin/activate

Install the development environment:

(databird) $ pip install -r requirements-dev.txt

Writing a new driver

Drivers are published in a namespace package databird-drivers. Everyone can develop drivers and share them.

Install databird and run mr.bob to create a new driver package:

(databird) $ cd $HOME/projects
(databird) $ python -m mrbob.cli databird.blueprints:driver

After you answer some questions, a new directory databird-driver-<chosen_name> is created. Let's assume <chosen_name> = foo; then your driver is usually implemented in databird/drivers/foo/foo.py in a class named FooDriver. Until more documentation is available, you have to look at the code to figure out how to write a driver.

Other people will be able to use it with driver: foo.FooDriver.

Tell me if you wrote a new driver, so I can include it in the list.

jonas-hagen/databird

Folders and files

Latest commit

History

Repository files navigation

databird

Usage

Configuration

Monitoring

Drivers

Development

Writing a new driver

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages