Skip to content

MariDataHarvest is a tool for scrapping and harvesting Automatic Identification System (AIS) data provided by marinecadastre then appending it with the weather and environment conditions provided by CMEMS and RDA at each geographical and UTC timestamp point.

License

Notifications You must be signed in to change notification settings

52North/MariDataHarvest

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

MariDataHarvest

TODOS

  • List only required/directly used dependencies in dev branch without versions, unless they are required in THIS version. Split requirements.txt and list only directly used ones.

    • EnvironmentalData
    • Harvester
    • EnvDataServer
  • Solve context path problem (via configuration or similar)

  • Code-Cleaning:

    • Remove unused imports
    • [] functions, variables
  • Move every config variable to config.py

  • Establish central location for variable definition: keep in weather.py; EnvDataServer/app.py accesses them via python features; page.js accesses theme via light weight endpoint implemented in app.py Convert to dictionary and provide helper functions to retrieve the required representation

  • Establish dev branch

  • Add Package/Component Overview to root README.md Document, who is using what part of the utilities package.

  • Comment wind variable in webui (and API) page.js:95 comment EnvDataServer/app.py:159 remove Wind from list of variables if not configurable

  • WebUI: set time bbox to current day (limit input to hours only, no minutes, if easily possible)

  • Store last request in cookie or Browser-Storage

  • History Back does not work

  • Replace MOTU bei OPeNDAP

  • Dockerfile of harvester:

    • reuqirements.txt -> new name
    • copy package and the dependency packages (utilities, environmentData)
    • adjust CMD to match new python package structure
      • add env pythonpath
      • change CMD to variant with "[" s
  • Identify useful k8s/proxy timeout or switch to asynchronous communication pattern with managed request queue

Description

MariDataHarvest is a tool for scrapping and harvesting Automatic Identification System (AIS) data provided by marinecadastre then appending it with the weather and environment conditions provided by CMEMS and RDA at each geographical and UTC timestamp point. In the following is a description of the datasets used:

  • Datasets Description

    Datasets Description

  • Variables Description

    Variables Description

This tool is developed with in the MariData project.

Requirements

MariDataHarvest requires python 3 and pip to run. You can install all python requirements with the following command:

pip install -r requirements.txt

For a detailed list, see section License below.

Usage

The script requires accounts for the following webservices:

The credentials of these services MUST be entered into a file called .env.secret as outlined here:

UN_CMEMS=
PW_CMEMS=
UN_RDA=
PW_RDA=

Start harvesting with the following command:

python main.py --year=2019 --minutes=30 --dir=C:\..
  • year: the year(s) to download AIS-data. Expected input a year 'YYYY' , a range of years 'YYYY-YYYY' or multiple years 'YYYY,YYYY,YYYY'.

  • minutes: is the sub-sampling interval in minutes.

  • dir: the absolute path where to keep data. If empty, the directory is same as the project directory.

  • optional arguments:

    • step: starts the script at a specific step:

      1. Download,
      2. Subsample,
      3. Appending weather data.

      If step equals 0 (default value), the script runs all steps starting from step 1.

    • clear: clears files of year ONLY after step 2 is done.

    • depth_first: runs all steps for each file, which automatically deactivates step argument.

Development

Start the EnvDataAPI services locally for testing using the following command in the EndDataServer directory:

export PYTHONPATH="$PYTHONPATH:.:EnvironmentalData:utilities"
python ./EnvDataServer/app.py

Docker

You can use the Dockerfile to build a docker image and run the script in its own isolated environment. It is recommended to provide a volume to persist the data between each run. You can specify all arguments including the optional ones as environment variables when creating/starting the container as outlined in the following. The labels used are following the Image And Container Label Specification of 52°North.

  1. Build:

    docker build \
       --build-arg GIT_COMMIT=$(git rev-parse -q --verify HEAD) \
       --build-arg BUILD_DATE=$(date -u +"%Y-%m-%dT%H:%M:%SZ") \
       --file Dockerfile.harvester \
       -t 52north/mari-data_harvester:1.0.0 .

    Ensure, that the version of the image tag is matching the version in the Dockerfile, here: 1.0.0.

  2. Create named volume:

    docker volume create \
       --label org.52north.contact=dev-opser+mari-data_harvester@example.org \
       --label org.52north.context="MariData Project: Data Harvesting Script" \
       --label org.52north.end-of-life=$(date -d "+365 days" -u +"%Y-%m-%dT%H:%M:%SZ") \
       mari-data-harvester_data
  3. Run:

    docker run \
       --label org.52north.contact=dev-opser+mari-data_harvester@example.org \
       --label org.52north.context="MariData Project: Data Harvesting Script" \
       --label org.52north.end-of-life=$(date -d "+365 days" -u +"%Y-%m-%dT%H:%M:%SZ") \
       --label org.52north.created='$(date -u +"%Y-%m-%dT%H:%M:%SZ")' \
       --volume mari-data-harvester_data:/mari-data/data \
       --volume .env.secret:/mari-data/EnvironmentalData/.env.secret:ro \
       --env-file docker.env \
       --name=mari-data_harvester \
       --detach \
       52north/mari-data_harvester:1.0.0 \
       && docker logs --follow mari-data_harvester

    with docker.env containing the following information:

    YEAR=2015-2021
    MINUTES=5
    DATA_DIR=/mari-data/data
    STEP=0
    DEPTH_FIRST=--depth-first
    CLEAR=--clear

Deployment

Use the following command to send the code to any server for building the image (or clone this repository using git clone...) and run it:

rsync --recursive --verbose --times --rsh ssh \
   --exclude='AIS-DATA' --exclude='*.tmp' \
   --exclude='*.swp' --exclude='.vscode' \
   --exclude='__pycache__' --delete . \
   mari-data-harvester.example.org:/home/user/mari-data-harvester

Web Access to the Data

We are using an nginx container to provide web access to the generated data. It requires an external service to maintain the ssl certificates. The according data volume is mounted read-only and externally provided, hence docker-compose does not create it with a project prefix.

Just execute the following command in root folder of the repository to start the service:

docker-compose up -d --build && docker-compose logs --follow

The data is available directly at the server root via https. All requests to http are redirected to https by default.

Contact

License

This application is licensed under the GPLv2 (see LICENSE).

Name Version License
Deprecated 1.2.13 MIT License
Flask 1.1.4 BSD License
Flask-Limiter 1.5 MIT License
Jinja2 2.11.3 BSD License
MarkupSafe 1.1.1 BSD License
Paste 3.5.2 MIT License
PyYAML 5.4.1 MIT License
Werkzeug 1.0.1 BSD License
beautifulsoup4 4.11.1 MIT License
certifi 2022.12.7 Mozilla Public License 2.0 (MPL 2.0)
cftime 1.6.2 MIT License
charset-normalizer 3.0.1 MIT License
click 7.1.2 BSD License
idna 3.4 BSD License
itsdangerous 1.1.0 BSD License
joblib 1.2.0 BSD License
limits 3.2.0 MIT License
motuclient 1.8.8 GNU Lesser General Public License v3 (LGPLv3)
netCDF4 1.6.2 MIT License
numpy 1.24.1 BSD License
packaging 23.0 Apache Software License; BSD License
pandas 1.5.3 BSD License
protobuf 4.21.12 3-Clause BSD License
python-dateutil 2.8.2 Apache Software License; BSD License
python-dotenv 0.17.0 BSD License
pytz 2021.1 MIT License
requests 2.28.2 Apache Software License
scikit-learn 0.24.0 new BSD
scipy 1.10.0 BSD License
siphon 0.9 BSD License
six 1.16.0 MIT License
soupsieve 2.3.2.post1 MIT License
threadpoolctl 3.1.0 BSD License
typing_extensions 4.4.0 Python Software Foundation License
urllib3 1.26.14 MIT License
waitress 2.1.2 Zope Public License
wrapt 1.14.1 BSD License
xarray 0.17.0 Apache Software License
generate license list
docker compose run --interactive --rm api /bin/bash \
   -c "pip install --no-warn-script-location --no-cache-dir pip-licenses > /dev/null && .local/bin/pip-licenses -f markdown"

Funding

Project/Logo Description
MariData MariGeoRoute is funded by the German Federal Ministry of Economic Affairs and Energy (BMWi)BMWi

About

MariDataHarvest is a tool for scrapping and harvesting Automatic Identification System (AIS) data provided by marinecadastre then appending it with the weather and environment conditions provided by CMEMS and RDA at each geographical and UTC timestamp point.

Topics

Resources

License

Stars

Watchers

Forks