
Schema Daily Data Dumps


For ETL convenience, a zipfile dump of each API resource will be available in the form of a zipped JSON file. It will of course need to be supplemented with data from the current day, pulled from the live API. The format will be almost identical to the "index" action for each resource.

Why not just pull it all through the API? To avoid pagination when you need to download everything. Also, this way puts less strain on server resources.

Each resource has 2 actions:

  • download - downloads a JSON zipfile
  • md5 - gives the MD5 sum for the zipfile, so you can confirm that the download arrived intact.

Here is an example of how this would be used.

Let's say we want to populate a Stars table from scratch.

  1. Download the stars JSON zipfile:

http://api.edmaterializer.com/api/v4/stars/download

The download will have the MIME type application/zip.

Note: It may pause a little at first if it has to regenerate the zip file. It regenerates every day or after each web app deployment.
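
A minimal sketch of the download step in Python (the requests library and the stars.zip filename are just for illustration, not part of the API):

```python
import requests

DOWNLOAD_URL = "http://api.edmaterializer.com/api/v4/stars/download"

# Stream the zipfile to disk; it may pause while the server regenerates it.
response = requests.get(DOWNLOAD_URL, stream=True, timeout=300)
response.raise_for_status()

with open("stars.zip", "wb") as f:
    for chunk in response.iter_content(chunk_size=8192):
        f.write(chunk)
```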

  2. (Optional) Verify the zipfile is intact by comparing MD5 sums. The sum is here:

http://api.edmaterializer.com/api/v4/stars/md5

There are lots of ways to run md5sum checks. Some details are here:

https://en.wikipedia.org/wiki/Md5sum
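
A sketch of the check in Python, assuming the md5 action returns the sum as plain text in the response body (file and variable names are illustrative):

```python
import hashlib
import requests

MD5_URL = "http://api.edmaterializer.com/api/v4/stars/md5"

# MD5 of the file we just downloaded.
with open("stars.zip", "rb") as f:
    local_md5 = hashlib.md5(f.read()).hexdigest()

# Sum reported by the API (assumed to be plain text in the response body).
remote_md5 = requests.get(MD5_URL, timeout=30).text.strip()

if local_md5 != remote_md5:
    raise RuntimeError("stars.zip appears to be corrupted; re-download it")
```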

  3. Unzip and extract the JSON data.

  4. Supplement with today's data from the API:

http://api.edmaterializer.com/api/v4/stars?updated_after=2016-03-16

Note: Don't forget to adjust the date in the query param. Also, there may be a few duplicate entries.
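
A sketch of steps 3 and 4 in Python. Treating both payloads as flat lists of records with an id field is an assumption about the index format, not something this page confirms, so adjust to match what the API actually returns:

```python
import json
import zipfile
import requests

# Step 3: unzip and load the JSON dump. The inner filename is whatever the
# archive actually contains; here we just take the first entry.
with zipfile.ZipFile("stars.zip") as archive:
    inner_name = archive.namelist()[0]
    with archive.open(inner_name) as f:
        dumped_stars = json.load(f)

# Step 4: pull anything updated since the dump was generated. The shape of the
# index response (and any pagination) is an assumption; check the live API.
params = {"updated_after": "2016-03-16"}
recent = requests.get("http://api.edmaterializer.com/api/v4/stars",
                      params=params, timeout=60).json()

# De-duplicate on an assumed "id" field, letting the fresher records win.
stars_by_id = {record["id"]: record for record in dumped_stars}
for record in recent:
    stars_by_id[record["id"]] = record

print("Total stars:", len(stars_by_id))
```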

So the full list of download endpoints is (as of v4):

  • http://api.edmaterializer.com/api/v4/stars/download
  • http://api.edmaterializer.com/api/v4/worlds/download
  • http://api.edmaterializer.com/api/v4/surveys/download
  • http://api.edmaterializer.com/api/v4/world_surveys/download
  • http://api.edmaterializer.com/api/v4/basecamps/download
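
A sketch of grabbing all five dumps in one go (resource names come from the list above; the local filenames are illustrative):

```python
import requests

BASE = "http://api.edmaterializer.com/api/v4"
RESOURCES = ["stars", "worlds", "surveys", "world_surveys", "basecamps"]

for resource in RESOURCES:
    # Each resource exposes the same download action.
    response = requests.get(f"{BASE}/{resource}/download", stream=True, timeout=300)
    response.raise_for_status()
    with open(f"{resource}.zip", "wb") as f:
        for chunk in response.iter_content(chunk_size=8192):
            f.write(chunk)
    print(f"Downloaded {resource}.zip")
```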
