This repository has been archived by the owner on Apr 14, 2023. It is now read-only.

sidewalklabs/oldto


OldTO

OldTO was a site that showcased historic photographs of Toronto by placing them on a map.

You can read more about it on the Sidewalk Labs Blog.

Here's a screen recording of what OldTO looked like (YouTube):

Screen recording of OldTO

While OldTO is no longer hosted by Sidewalk Labs, the source code is all available in this repo and you can run it yourself. The instructions below describe how to do this.

How it works

OldTO begins with data from the Toronto Archives, which you can find in data/images.ndjson.

To place the images on a map ("geocode" them), we use a list of Toronto street names and a collection of regular expressions that look for addresses and cross-streets. We send the matches through the Google Maps Geocoding API to get latitudes and longitudes for the images. We also incorporate a set of points of interest for popular locations like the CN Tower or City Hall.
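
As a rough illustration of the regular-expression step described above, here is a minimal Python sketch. The sample street list, the two patterns, and the `find_locations` function are hypothetical and are not taken from the OldTO source, which uses its own street list and its own set of patterns.

```python
import re

# Hypothetical sample of street names, for illustration only.
STREETS = ["Yonge", "Queen", "King", "Bloor", "Spadina"]

# Alternation of known street names, longest first so longer names win.
STREET_RE = "|".join(re.escape(s) for s in sorted(STREETS, key=len, reverse=True))
# Optional street-type suffix such as "Street" or "Ave."
SUFFIX_RE = r"(?:\s+(?:Street|St\.?|Avenue|Ave\.?|Road|Rd\.?))?"

# Addresses such as "123 Yonge Street".
ADDRESS = re.compile(rf"\b(\d+)\s+({STREET_RE})\b{SUFFIX_RE}", re.IGNORECASE)
# Cross-streets such as "Queen St. and Spadina Ave."
CROSS = re.compile(
    rf"\b({STREET_RE})\b{SUFFIX_RE}\s+(?:and|&|at)\s+({STREET_RE})\b",
    re.IGNORECASE,
)


def find_locations(title):
    """Return candidate location strings found in an image title."""
    found = []
    for number, street in ADDRESS.findall(title):
        found.append(f"{number} {street}")
    for first, second in CROSS.findall(title):
        found.append(f"{first} & {second}")
    return found
```

Strings like the ones returned here are the kind of query that would then be sent to a geocoder to obtain a latitude and longitude.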

Development setup

Install dependencies (on a Mac):

brew install coreutils csvkit

OldTO requires Python 3. Once you have this set up, you can install the Python dependencies in a virtual environment via:

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Running the site

Server

The data for the OldTO site is served via a Python API server. Start by running this:

source venv/bin/activate
oldtoronto/devserver.py data/images.geojson

If you've generated geocodes in a different location, change data/images.geojson to that.

Web application

The OldTO site lives in oldto-site. In order to build it, you'll need the yarn package manager. Instructions for setting it up are at https://yarnpkg.com/.

You'll also need to get a Google Maps API key. Once you've done this, set the environment variable GMAPS_API_KEY to your own API key:

export GMAPS_API_KEY=...

Webpack needs this key to build the site when you run yarn webpack. Once the site is built, you can serve it locally using http-server (install with npm install -g http-server):

cd oldto-site
yarn          # install dependencies
yarn webpack  # bundle JavaScript and build site
cd dist
http-server --proxy=http://localhost:8081

Then visit http://localhost:8080/ to browse the site.

To iterate on the site, use yarn watch:

cd oldto-site
yarn watch &
cd dist
http-server --proxy=http://localhost:8081

Generating new geocodes

First, add your Google Maps API key to the file oldtoronto/settings.py.

Next, download the cached geocodes from here. Unzip this file into cache/maps.googleapis.com. This will make the geocoding pipeline run faster and more consistently than geocoding from scratch.
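
The idea behind the cache is that a geocoding response, once fetched, can be reused on later runs instead of calling the API again. The Python sketch below illustrates this with a small disk-backed cache. The file layout (one JSON file per query, named by an MD5 hash), the `cached_fetch` helper, and the `fetch` callback are all assumptions for illustration; the actual format of the files in cache/maps.googleapis.com may differ.

```python
import hashlib
import json
import os


def cached_fetch(query, fetch, cache_dir):
    """Return fetch(query), reusing a JSON file in cache_dir when one exists.

    `fetch` is any callable that takes the query string and returns
    JSON-serializable data (a hypothetical stand-in for an API call).
    """
    os.makedirs(cache_dir, exist_ok=True)
    key = hashlib.md5(query.encode("utf-8")).hexdigest()
    path = os.path.join(cache_dir, key + ".json")
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            return json.load(f)  # cache hit: no call to fetch
    result = fetch(query)  # cache miss: call fetch and store the result
    with open(path, "w", encoding="utf-8") as f:
        json.dump(result, f)
    return result
```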

With this in place, you can update images.geojson by running:

make

Note: to run the Makefile on macOS, you will probably need to install md5sum, which you can do by running:

brew update && brew install md5sha1sum

Analyzing results and changes

Before sending out a PR with geocoding changes, you'll want to run a diff to evaluate the change.

For a quick check, you can operate on a 5% sample and diff that against master:

oldtoronto/geocode.py --sample 0.05 --output /tmp/geocode_results.new.5pct.json
oldtoronto/diff_geocodes.py --sample 0.05 /tmp/geocode_results.new.5pct.json
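
For a sampled diff to be meaningful, both sides need to cover the same subset of images. One way to get a stable subset is to hash each image ID and keep those whose hash falls below the sampling fraction. The Python sketch below shows that idea; it is an assumption about how such a flag could work, not a description of what geocode.py or diff_geocodes.py actually do, and `in_sample` is a hypothetical helper.

```python
import hashlib


def in_sample(image_id, fraction):
    """Deterministically decide whether image_id belongs to the sample.

    Hashing the ID (rather than drawing a random number) means two runs
    with the same fraction pick the same images, so outputs are comparable.
    """
    digest = hashlib.md5(str(image_id).encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 10000  # pseudo-uniform value in [0, 10000)
    return bucket < fraction * 10000


ids = [str(n) for n in range(520000, 521000)]  # made-up IDs
sample = [i for i in ids if in_sample(i, 0.05)]
```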

To calculate metrics using truth data (must have jq installed):

grep -E  "$(jq '.features[] | .id' data/truth.gtjson | sed s/\"//g | paste -s -d '|' )" data/images.ndjson > data/test.images.ndjson
oldtoronto/geocode.py --input data/test.images.ndjson
oldtoronto/generate_geojson.py --geocode_results data/test.images.ndjson --output data/test.images.geojson
oldtoronto/calculate_metrics.py --truth_data data/truth.gtjson --computed_data data/test.images.geojson
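
To give a sense of what comparing computed locations against truth data can involve, here is a minimal Python sketch. It assumes both inputs are GeoJSON FeatureCollections whose features carry an `id` and either a Point geometry or a null geometry. The 250 m threshold, the function names, and the precision and recall definitions are assumptions for illustration and may not match calculate_metrics.py.

```python
import math


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points."""
    r = 6371000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def points_by_id(feature_collection):
    """Map feature id -> (lon, lat), skipping features without a geometry."""
    return {
        f["id"]: tuple(f["geometry"]["coordinates"][:2])
        for f in feature_collection["features"]
        if f.get("geometry")
    }


def score(truth, computed, max_dist_m=250.0):
    """Return (precision, recall) of computed points against truth points."""
    t, c = points_by_id(truth), points_by_id(computed)
    located = [i for i in c if i in t]  # geocoded images that have truth data
    correct = [i for i in located if haversine_m(*c[i], *t[i]) <= max_dist_m]
    precision = len(correct) / len(located) if located else 0.0
    recall = len(correct) / len(t) if t else 0.0
    return precision, recall
```

Here precision is the share of geocoded images that landed within the threshold of their true location, and recall is the share of all truth images that were geocoded correctly.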

To debug a specific image ID, run something like:

oldtoronto/geocode.py --ids 520805 --output /tmp/geocode.json && \
grep -v regex oldtoronto/geocode.py.log

If you want to understand the differences between two images.geojson files, you can use the diff_geojson.py script. This script creates a series of .geojson files showing the differences between an A and a B GeoJSON. It is useful in combination with the data collected through the corrections Google Forms. Use those along with the check_changes_using_* scripts.
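
The kind of comparison involved can be sketched in a few lines of Python. This is not the diff_geojson.py implementation: the `diff_features` function and its four category names are hypothetical, and the sketch assumes each feature has an `id` and either a Point geometry or a null geometry.

```python
def diff_features(a, b):
    """Group feature ids by how their location changed from A to B."""

    def locations(feature_collection):
        # id -> coordinate tuple, or None when the image has no location
        return {
            f["id"]: tuple(f["geometry"]["coordinates"]) if f.get("geometry") else None
            for f in feature_collection["features"]
        }

    loc_a, loc_b = locations(a), locations(b)
    groups = {"added": [], "dropped": [], "changed": [], "unchanged": []}
    for image_id in sorted(set(loc_a) | set(loc_b)):
        before, after = loc_a.get(image_id), loc_b.get(image_id)
        if before == after:
            groups["unchanged"].append(image_id)
        elif before is None:
            groups["added"].append(image_id)  # located only in B
        elif after is None:
            groups["dropped"].append(image_id)  # located only in A
        else:
            groups["changed"].append(image_id)  # located in both, at different points
    return groups
```

Each group could then be written out as its own GeoJSON file for inspection on a map.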

Once you're ready to send the PR, run a diff on the full geocodes.

Update street names

To update the list of street names, run:

oldtoronto/extract_noun_phrases.py streets 1 > /tmp/streets+examples.txt && \
cut -f2 /tmp/streets+examples.txt | sed 1d | sort > data/streets.txt
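
For readers less familiar with the shell tools, the second command takes the second tab-separated column of the file, drops the first line, and sorts the result. A rough Python equivalent is sketched below. The `streets_from_tsv` name and the sample rows are made up, and the sort order may differ from that of `sort`, which depends on the locale.

```python
def streets_from_tsv(lines):
    """Return the sorted second column of tab-separated lines, minus line 1."""
    rows = [line.rstrip("\n").split("\t") for line in lines]
    # rows[1:] mirrors `sed 1d`; row[1] mirrors `cut -f2`.
    return sorted(row[1] for row in rows[1:] if len(row) > 1)


# Made-up sample input, for illustration only.
sample_lines = [
    "count\tstreet\texample\n",
    "3\tYonge\tsome title\n",
    "1\tBloor\tanother title\n",
]
streets = streets_from_tsv(sample_lines)
```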