Most data is obtained through one of three methods:
- A) Scraping data from council websites
- B) Spreadsheets / csv file datasets sent in by local authorities and community groups
- C) Data extracted from the crowd-sourced OpenStreetMap dataset (see Data/query.xml for the query used)
- Some data sources come with lat/lng coordinates, but some don't
- For those that don't, I geocode the addresses
If you add a Google Geocoding API key to `api_key.txt`, the Google Geocoding API will be used for geocoding; otherwise it defaults to Nominatim geocoding.
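The key-file fallback described above can be sketched as follows. This is a minimal illustration: the file name `api_key.txt` comes from this README, but the function name and the idea of returning a backend label (rather than a geocoder object) are my own simplification — the real scripts presumably hand the choice to a geocoding library such as geopy.

```python
import os

def choose_geocoder(key_path="api_key.txt"):
    """Pick a geocoding backend: Google if an API key file with a
    non-empty key exists, otherwise fall back to Nominatim."""
    if os.path.exists(key_path):
        with open(key_path) as f:
            key = f.read().strip()
        if key:
            # In the real scripts this key would be passed to the
            # Google Geocoding API (e.g. via geopy's GoogleV3 geocoder).
            return ("google", key)
    # Default: Nominatim geocoding, which needs no key.
    return ("nominatim", None)
```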
The aim is to extract as much detailed public toilet data from a webpage as possible and process it into a form that is consumable by the Toilets4London REST API. The current workflow: I regularly run the scripts, diff their output against the previous run, and check whether anything has changed. If so, I delete the changed toilets from the API and re-POST the new ones, or, if only one or two have changed, manually update the affected records. To add new toilets, I simply POST the new batch of toilets to the API for integration into the database.
Some of the web scraping scripts are semi-manual, meaning that they require me to use Google Maps or similar to find the lat / lng coordinates for the address. This occurs when councils do not add lat / lng coordinates to their website (for example as they do not display a map) and don't offer full, geocodable addresses (for example if the toilet is in a park).
You can see the outputs of all the scripts in the Data/ directory. These outputs are what I compare against when I next run the scripts to keep the API updated. Git diffs help a lot with this.
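The re-POST step of the workflow above could look roughly like this sketch using `requests`. The endpoint path, base URL, and the idea of a `build_batch_post` helper are all assumptions — I don't know the actual Toilets4London API routes — so the pure payload-building part is separated out and the HTTP call itself is shown commented.

```python
import json

API_BASE = "https://example.org/api"  # placeholder; the real base URL is an assumption

def build_batch_post(toilets, base_url=API_BASE):
    """Prepare the URL and JSON body for POSTing a batch of toilet
    records to the API. Kept pure (no network) so it is easy to test."""
    url = f"{base_url}/toilets/"
    body = json.dumps(toilets)
    return url, body

# In the real workflow, something like this would send the batch:
# import requests
# url, body = build_batch_post(new_toilets)
# requests.post(url, data=body,
#               headers={"Content-Type": "application/json"})
```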
`DataFetcher.py` has a dictionary of functions that can be called to extract data from the various sources.
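The dictionary-of-fetchers pattern might look like the sketch below. The source name and stub fetcher are invented for illustration; the real functions in `DataFetcher.py` scrape or parse the actual sources and return records in the standard format described below.

```python
def fetch_example_council():
    """Stub standing in for a real scraper: each fetcher returns a
    list of toilet dicts in the standard upload format."""
    return [{"name": "Example Library", "borough": "Camden",
             "latitude": 51.5, "longitude": -0.1}]

# Map each source name to the function that fetches its data.
FETCHERS = {
    "example_council": fetch_example_council,
}

def fetch_all():
    """Run every registered fetcher and collect the results."""
    toilets = []
    for source, fetch in FETCHERS.items():
        toilets.extend(fetch())
    return toilets
```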
- Data should be processed to end up in the following format, so that it can be uploaded to the Toilets4London API:
```json
{
  "data_source": "[URL] [DATE] / Data sent in by [ORGANISATION] on [DATE]",
  "borough": "[See Data/boroughs.txt for acceptable borough names]",
  "address": "2-6 Woodgrange Road London E7 0QH",
  "opening_hours": "7am-6pm Monday to Friday",
  "name": "The Gate Community Neighbourhood Centre and Library",
  "baby_change": true,
  "latitude": 51.5485568,
  "longitude": 0.024924,
  "wheelchair": true,
  "fee": "20p",
  "open": true
}
```
- If the toilet is currently closed but may be open in the future, the `open` field can be set to `false`
- If `fee` is left out, it defaults to a free toilet
- `borough`, `latitude` and `longitude` are required fields
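The field rules above (required `borough`, `latitude` and `longitude`; optional `fee` and `open`) could be checked before upload with a small validator like this sketch. The function name and error handling are my own; the API presumably also validates server-side.

```python
REQUIRED = ("borough", "latitude", "longitude")

def validate_record(toilet):
    """Return the record unchanged if all required fields are present,
    otherwise raise ValueError. `fee` and `open` may be omitted: a
    missing fee means a free toilet, and `open` defaults to open."""
    missing = [f for f in REQUIRED if f not in toilet]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return toilet
```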
If a borough is unchecked, it means I could not find a reliable way to extract toilet data from that council's website or another data source using a script, and that council did not send me a dataset. This doesn't mean I have no data for that borough; it just means that instead I use the generic sources listed below, manual uploads, and locations suggested through the app.
- City of London
- Barking and Dagenham
- Barnet
- Bexley
- Brent
- Bromley
- Camden
- Croydon
- Ealing
- Enfield
- Greenwich
- Hackney
- Hammersmith and Fulham
- Haringey
- Harrow
- Havering
- Hillingdon
- Hounslow
- Islington
- Kensington and Chelsea
- Kingston upon Thames
- Lambeth
- Lewisham
- Merton
- Newham
- Redbridge
- Richmond upon Thames
- Southwark
- Sutton
- Tower Hamlets
- Waltham Forest
- Wandsworth
- Westminster
- Barking (haven't found live source)
- Barnet
- Camden
- Hillingdon
- Lambeth
- Lewisham
- Merton
- Newham
- Redbridge
- Richmond
- Southwark
- Sutton
- Wandsworth
Croydon Council's website says

> Due to ongoing concerns in relation to the COVID-19 pandemic Croydon Council are not currently in a position to introduce the planned Community Toilet Scheme across the borough.

and also

> Croydon Council will provide an update on the Community Toilet Scheme in due course.
Couldn't find any good primary sources to scrape data from for these boroughs - let me know if you do :)
- Healthmatic (public toilet contractor) sent in a spreadsheet
- Sainsbury's store locator website
- Transport for London API
- OpenStreetMap
- All OpenStreetMap data is © OpenStreetMap contributors
- OpenStreetMap data is available under the Open Database Licence