Skip to content

Log of steps to reproduce #291

Vicky Vergara edited this page Mar 25, 2015 · 2 revisions
  1. got the data: http://www.geofabrik.de/data/download.html

  2. Find out how to install the data into the database (fail is the first 5 or 6 possible links don’t look promising)

      createdb -E UTF8 mexico
      psql mexico
          create extension postgis;
      sudo apt-get install osm2pgsql
      osm2pgsql -help
      osm2pgsql --slim -a -d mexico mexico-latest.osm.pbf

I have the osm data now, could view it using qgis, and:

 \dS planet_osm_roads

               Table "public.planet_osm_roads"
       Column       |            Type             | Modifiers 
--------------------+-----------------------------+-----------
 osm_id             | bigint                      | 
 oneway             | text                        | 
 way                | geometry(LineString,900913) | 

lots of columns!!!! I see this very interesting http://wiki.openstreetmap.org/wiki/Way

I am analysing the data I get:

select oneway,count(*) from planet_osm_roads group by oneway;
 oneway | count 
--------+-------
        | 28491
 no     |  3231
 1      |     8
 -1     |   874
 yes    | 45272
 no;yes |     1
(6 rows)
explicitly there are more 1 way roads than 2 way roads

select oneway,count(*) from planet_osm_line group by oneway;
 oneway | count  
--------+--------
        | 737887
 no     |  29602
 1      |     15
 -1     |   3291
 yes    |  89727
 no;yes |      2

http://wiki.openstreetmap.org/wiki/Key:oneway 737887 records don’t have info about direction in lines

  • Also analysing with Qgis:

I can see the street where I live, but has no name.

Select count(*) from planet_osm_roads where name is NULL;
 count 
-------
 42781

select count(*) from planet_osm_roads where name is not NULL;
 count 
-------
 35096

select count(*) from planet_osm_line where name is NULL;
 count  
--------
 599804
  • Thinking on what I need and can do:

    • queries with the name involved its out of the question.

    • If I want to route cars its directed graph, if I want to route pedestrians is undirected.

    • I want to follow the structure of http://docs.pgrouting.org/2.0/en/doc/src/developer/sampledata.html#sampledata because they use that in the majority of the functions, so it is wise thing to follow the structure: osm_id source target cost reverse_cost way the geometry is there but with name way so to avoid doing way as the_geom I need to change the name.

      Better: in a new table, with only the data relevant for pgRouting, so I will copy the structure in the sample data, X,Y I can get them later based on the geometry. except that I will have an additional column osm_id bigint to reference the original data if necessary.

    • I will use only planet_osm_line has 860,524 records

    • to simplify the problem (for issue purposes) Can I ignore the fact that there are railways, waterways, etc. and think of them as roads?. (maybe its on those that the NULL applies, going to check this) decided to get rid of them. 2 options:

      • logically delete them with: using set cost = -1, reverse cost = -1
      • physically deleting them, (opting for this)
create table deleted as select *  from planet_osm_line
 where ("waterway" is not null or "aerialway"  is not null 
     or "aeroway"  is not null or "railway" is not null
     or "route" is not null or boundary is not null);
delete from  planet_osm_line 
 where ("waterway" is not null or "aerialway"  is not null 
     or "aeroway"  is not null or "railway" is not null
     or "route" is not null or boundary is not null);

source and target columns are going o be filled up by pgRouting so I have to decide what cost, and reverse cost means. Based on the length of the geometry which is a distance, very practical, because in case I want to base the costs on time, an operation cost/speed and give that as cost.

  • Have 4 cases:
    • Tagged as one way: "oneway" = 'yes' or "oneway" = '1'

      ```
      cost = length
      reverse_cost = -1
      ```
      
    • Tagged as one way with geometry reversed: "oneway" = '-1'

      ```
      cost = -1
      reverse_cost = length
      ```
      
    • Tagged as two way: "oneway" = 'no'

      ```
      cost = length
      reverse_cost = length
      ```
      
    • Not Tagged and contradictory tagging: "oneway" is null or "oneway" = 'no;yes' , this is a problem,

select count(*) from planet_osm_line where ("oneway" is null or  "oneway" = 'no;yes');
 708488

select count(*) from planet_osm_line where not ("oneway" is null or  "oneway" = 'no;yes');
 122533

The mayority of the lines are not tagged, so the decision on how I am going to handle cost and reverse_cost I will make it by inspecting the data, and some numbers and some reasoning

  • by inspection: I compared the places I know in the Qgis map I made, vs google maps, so by visual inspection the untaged streets are one way.

  • some numbers:

The following table, 14.75% of the data is tagged, 85% is not. So what "guessing" combination for cost, reverse_causes the least error? One way streets are 73%, with reverse geometry is 2%, and two way is 24%, I those percentages hold on the untagged streets, and choose two-way then I 64% of the data is going to be wrong, if I choose reversed one way 82% is going to be wrong, if I choose one-way 22% is going to be wrong. So best is to choose 1 way.

| ------ |--------- |---------| -------- | ----- | ------- | --------- |
| oneway | # of     |   % of  |   tag    | # of  | % of    | % of      |
| status | records  | records |          | recs  | recs    |total recs |
|        |   A      |  B/T    |          |  C    | C/A*100 | C/T*100   |
| ------ |--------- |---------| -------- | ----- | ------- | --------- |
|        |          |         | one way  | 89685 |  73.2%  | 10.8%     |
| tagged |   122533 | 14.75%  | rev. one |  3281 |   2.7%  |  0.3%     | 
|        |          |         | two way  | 29567 |  24.1%  |  3.5%     |
| ------ |--------- |---------| -------- | ----- | ------- | --------- |--------- |
|                                                                      | estimated|
| ------ |--------- |---------| -------- | ----- | ------- | --------- |--------- |
|        |          |         | one way  |       |  73.2%  | 62.4%     | 518613   |                      
| un     |   708488 |  85.25% | rev. one |       |   2.7%  |  2.3%     |  19129   |
| tagged |          |         | two way  |       |  24.1%  | 20.5%     | 145240   |
| ------ |--------- |---------| -------- | ----- | ------- | --------- |--------- |
| Total  | T=831021 | 100%    |                            | 99.8%     |
| ------ |--------- |---------| -------- | ----- | ------- | --------- |
  • By reasoning

    If the data is most of the time collected by people on cars, then osm will receive the sequence of points in the order the car is circulating, so the direction of the geometry indicates the direction of a circulating car. So, going in the direction of the geometry goes with the flow of the cars. And to see if its two way, maybe not enough valid data osm has received to verify its two way.

Decision: I'll treat un-tagged as one way streets as follows

          cost = length
          reverse_cost = -1

With that, 88%+- of the records are expect to be correct.

Clone this wiki locally