An osm2pgsql multi-backend style designed to simplify consumption of OSM data for rendering, export, or analysis.
ClearTables is currently under rapid development, and schema changes will frequently require database reloads.
- osm2pgsql 0.90.1 or later. Versions after 0.86.0 but before 0.90.1 may still work, though with bugs.
- Lua, required for both osm2pgsql and testing the transforms
- PostgreSQL 9.1 or later
- PostGIS 2.0 or later
- Python with PyYAML
- Make. Any version of Make should work, or the commands are simple enough to run by hand.
make
createdb ct
psql -d ct -c 'CREATE EXTENSION postgis; CREATE EXTENSION hstore;'
cat sql/types/*.sql | psql -1Xq -d ct
# Add other osm2pgsql flags for large imports, updates, etc
osm2pgsql -d ct --number-processes 2 --output multi --style cleartables.json extract.osm.pbf
cat sql/post/*.sql | psql -1Xq -d ct
Replace ct with the name of your database if you named it differently.
osm2pgsql will connect to PostgreSQL once per process for each table, for a total of processes * tables connections.
If PostgreSQL max_connections is increased from the default, --number-processes can be increased. If --number-processes is omitted, osm2pgsql will attempt to use as many processes as hardware threads.
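As a rough sketch of the arithmetic above, the largest workable --number-processes can be estimated from max_connections and the number of tables. The table count and headroom below are made-up values for illustration, not figures from ClearTables; only the default max_connections of 100 is a PostgreSQL fact.

```shell
#!/bin/sh
# Assumed values, for illustration only.
max_connections=100  # PostgreSQL's default; check yours with: psql -d ct -Atc 'SHOW max_connections;'
tables=40            # hypothetical table count; the real number depends on cleartables.json
headroom=10          # hypothetical allowance for other clients and reserved connections

# osm2pgsql opens processes * tables connections, so the largest process
# count that fits is the remaining connections divided by the table count.
processes=$(( (max_connections - headroom) / tables ))
connections=$(( processes * tables ))

printf '%s\n' "--number-processes $processes uses $connections connections"
```

With these assumed numbers the result is 2 processes and 80 connections. If the division yields 0, max_connections needs to be raised before the import can run even with a single process.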
These are still a bit vague, and might be split into principles and practices
- Simplify data for the consumer
- Use PostgreSQL types other than text if appropriate
- Use boolean for yes/no values
- Use enum types where there's a limited list of possibilities independent of data to be included, or a well defined ordering
Addresses and buildings have a many-to-many relationship. Multiple addresses inside one building are very common, and multiple buildings sharing one address also occur. For rendering, a separate table is fine. For analysis, these cases need to be considered, which requires joins.
A road may have multiple refs, and it's wrong to ignore this. To pretend that there's only one ref, use SQL like array_to_string(refs, E'\n') or array_to_string(refs, ';'). The latter will reconstruct the ref tag as it was in the original data.
ClearTables uses the hstore type but doesn't support --hstore.
- The goal of ClearTables is to abstract away OSM tagging. Copying all the tags to the output is contrary to this.
- Copying all tags is technically possible, but wouldn't be done with --hstore; instead it would be done similarly to the names column. The --hstore option doesn't work well when using custom column names which may collide with OSM tags.
- With tables for different types of features, fine-grained selection of appropriate columns is possible and hstore isn't necessary.
- Values within a hstore are untyped, which is contrary to the principle of using appropriate types.
Bug reports, suggestions and (especially!) pull requests are very welcome on the GitHub issue tracker. Please check the tracker to see if your issue is already known, and be nice. For questions, please use IRC (irc.oftc.net or http://irc.osm.org, channel #osm-dev) and http://help.osm.org.
- Indent with 2 spaces for YAML, 4 spaces for Lua
- tags are OSM tags, cols are database columns
- Space after function name when defining a function, e.g. function f (args)
- Tests for all Lua functions except ones which are only tail calls
- Use _polygon and _point suffix when there will be two tables holding the same type of object represented differently (e.g. most POIs)
- Use _area when there isn't a corresponding _point table for the same object, but there is another table for points or lines of a similar class but different objects (e.g. wood_areas for forests and wood_line for rows of trees)
- Always set columns to strings, even if they're only true/false. It's unwise to count on anything else making it from Lua to C to C++ to PostgreSQL. This lets PostgreSQL do the only conversion.
- Test particular columns of a transform function instead of the entire output table, e.g. assert(transform({foo="bar"}).baz == "qux") instead of assert(deepcompare(transform({foo="bar"}), {baz="qux"})).
Issues tagged with new column are often good ones to get started with. Issues tagged experimental are focused on researching new best practices and state of the art.
- GeoFabrik Shapefiles in both their free schema and commercially produced schema
- CartoDB OSM POIs