We want to download several layers from geoserver every day in order to create snapshots that can be used as geo-temporal data later on.
- support for GML files only
- create snapshots once per day or less frequently
- retry the download if it fails (but only up to some retry limit, e.g. at most 3 times)
- if the download still fails, use the content from the previous day (but keep track of it)
- the data can potentially be several GB of XML
- store the snapshots efficiently (compressed, only diffs ...)
- use cron to trigger the download every N minutes
- when triggered, only a single GML file should be downloaded at a time
- keep a low profile: avoid parallel downloads, limit the transfer rate
- every file is downloaded only once per day
- we need a URL of the geoserver and a directory DIR where the layers will be downloaded
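The retry and fallback requirements above could be sketched roughly as follows. This is only an illustration: the function name `fetch_or_keep`, the `MAX_RETRIES` variable and the layout of the `.meta` line are assumptions, and `curl` is just one possible download tool (it appears only in the usage comment).

```shell
#!/bin/sh
# Hypothetical retry limit; the notes above suggest at most 3 attempts.
MAX_RETRIES=${MAX_RETRIES:-3}

# fetch_or_keep OUT COMMAND [ARGS...]
# Runs COMMAND (which must write the layer to stdout) up to MAX_RETRIES times.
# On success OUT is replaced; on failure the previous OUT is left untouched.
# Either way OUT.meta records what happened (assumed format).
fetch_or_keep() {
    out=$1
    shift
    tmp="$out.tmp"
    attempt=1
    while [ "$attempt" -le "$MAX_RETRIES" ]; do
        if "$@" > "$tmp"; then
            mv -- "$tmp" "$out"
            printf 'status=ok attempts=%s\n' "$attempt" > "$out.meta"
            return 0
        fi
        attempt=$((attempt + 1))
    done
    rm -f -- "$tmp"
    printf 'status=reused-previous attempts=%s\n' "$MAX_RETRIES" > "$out.meta"
    return 1
}

# Possible call, with a rate limit to keep a low profile:
#   fetch_or_keep "$DIR/layer.gml" curl --fail --silent --limit-rate 500k "$URL"
```

Writing to a temporary file first means a failed or partial download never overwrites the copy from the previous day.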
script `download_layer.sh URL DIR`:
- picks the oldest `*.gml` file from DIR
- downloads the corresponding layer from URL using WFS as GML (XML)
- formats the XML files using `xmllint --format`
- removes `fid` XML attributes (because they are always newly generated by the geoserver)
- for every `*.gml` file a corresponding `*.meta` file will be generated which contains some accounting information about the download
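Two of the steps listed for `download_layer.sh` (picking the oldest file and removing `fid` attributes) might look like the sketch below. The helper names `oldest_gml` and `strip_fid` are made up for illustration and are not taken from the actual script; the `sed` expression assumes the attributes are double-quoted, as they are in `xmllint --format` output.

```shell
#!/bin/sh
# Print the path of the least recently modified *.gml file in directory $1.
# Relies on `ls -tr` ordering, so file names must not contain newlines.
oldest_gml() {
    ls -tr -- "$1"/*.gml 2>/dev/null | head -n 1
}

# Filter: drop every fid="..." attribute from the XML read on stdin.
strip_fid() {
    sed -E 's/[[:space:]]+fid="[^"]*"//g'
}

# Possible use after the layer was saved to "$raw" (hypothetical variable):
#   xmllint --format "$raw" | strip_fid > "$target"
```

Removing the `fid` values keeps two downloads of unchanged data byte-identical, which is what makes the later incremental snapshots small.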
`duplicity`:
- a useful tool for incremental backups (see usage examples below)

duplicity -vi --allow-source-mismatch --no-encryption path/to/src/dir file://path/to/snapshot/dir

- `-vi` = verbosity level is "info"
- `--allow-source-mismatch` allows the name of the source directory to change between backups

duplicity collection-status file://path/to/my/snapshot/dir

duplicity restore --no-encryption --time 2016-06-30T11:00:00 file://path/to/snapshot/dir path/to/output/dir
Assuming we want to compare differences between directories `dir1` and `dir2` and that we want to ignore files matching the pattern `*.meta`:

diff -x '*.meta' dir1 dir2 | diffstat
Output should look like this:
include/net/bluetooth/l2cap.h | 6 ++++++
net/bluetooth/l2cap.c | 18 +++++++++---------
2 files changed, 15 insertions(+), 9 deletions(-)
To strip a common prefix from file names, assuming you are in the directory that contains the files and the prefix is "PREFIX" (this is just a quick and dirty method; there is certainly a better way to do it):

find . -maxdepth 1 -type f -name 'PREFIX*' | while IFS= read -r F; do mv -- "$F" "${F#./PREFIX}"; done
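One possible tidier variant uses a plain glob loop instead of `find`, which avoids word-splitting problems with unusual file names. The function name `strip_prefix` is hypothetical, and the sketch deliberately skips any file whose new name already exists rather than overwriting it.

```shell
#!/bin/sh
# Remove the leading $1 from the name of every regular file in the
# current directory whose name starts with it.
strip_prefix() {
    prefix=$1
    for f in "$prefix"*; do
        [ -f "$f" ] || continue     # skip directories and an unmatched glob
        new=${f#"$prefix"}
        [ -n "$new" ] || continue   # name equals the prefix: nothing left
        [ -e "$new" ] && continue   # do not overwrite an existing file
        mv -- "$f" "$new"
    done
}

# Example: strip_prefix PREFIX
```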