-
So the update via --append finished and it is "up to eight times slower": in this concrete case 48 hours instead of 6 hours (same data size to be processed from the diffs). I did another test on a different server with higher single-core performance but fewer cores (also NVMe disks); here is the result of the osm2pgsql --append run (with --number-processes 6):
So that one took only 29 hours instead of 6 hours, but this is still too slow. The problem is the "reading input file" stage, which takes forever; the later stages are blazing fast. During the reading stage I do not see any real resource consumption, neither CPU nor RAM nor disk I/O. In the later stages I/O goes up to 500+ MB/s read and write, so those stages run fast when using multiple cores. By the way: the initial planet import (--create) took about 7 to 8 hours. So, as this is posted in Q&A, my questions to tackle these problems:
-
The fixation on --number-processes was a misleading path; everything seems to be fine with --number-processes. I have now downgraded osm2pgsql from 1.8.1 to 1.6.0 for testing, but to no avail: --append is still about 10 times slower than on an older osm2pgsql setup with PostgreSQL 10, osm2pgsql 1.2.0 and PostGIS 2.4. I tried both the PostgreSQL settings used in that old setup and the ones recommended at osm2pgsql.org.
The initial import (--create mode) is way faster than on the older setup, but the hardware is also better; it's roughly twice as fast, about 7 to 8 hours for a planet import. The --append mode, however, is unbearably slow, with about 0.5k/s node processing instead of 5k/s. Overall the --append operation is about 5 to 10 times slower. I still have the feeling I'm missing something obvious... For example, as this is a test with osm-carto, I haven't applied the indexes.sql from osm-carto yet, but I thought only osm2pgsql's own indexes matter for osm2pgsql processing and the style-specific ones are only for rendering...
-
Just to throw in another anecdote here, in response to the above:
I am NOT seeing osm2pgsql 1.6.0 as being slower than previous versions. I run it in several places, some of which are turned off most of the time and therefore have to "catch up" in append mode when turned back on, essentially following https://switch2osm.org/serving-tiles/updating-as-people-edit-pyosmium/.
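(Not from the original thread, just to make that "catch up" workflow concrete: a minimal sketch in the spirit of the switch2osm page linked above. The file names, database name and flat nodes path are placeholders and would have to match the original import.)
```
# Fetch all diffs published since the sequence number stored in sequence.state
# (pyosmium-get-changes is part of the pyosmium package).
pyosmium-get-changes -f sequence.state -o /tmp/changes.osc.gz

# Apply the accumulated changes to a database created with --create --slim.
osm2pgsql --append --slim -d gis \
          --flat-nodes /path/to/nodes.cache \
          --hstore --style openstreetmap-carto.style \
          /tmp/changes.osc.gz
```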
-
Okay, first to get some stuff out of the way that's probably not the problem:
Now something which might be a problem: You are using quite a large node cache (-C 64000).
And finally: Where are those changes coming from? Your changes look rather large (44 million nodes), probably something like two weeks of changes in OSM. If you work with changes that large you are almost certainly better off updating the data file (with osmium or so) and then doing a full re-import in non-slim mode every time you want to update; see the sketch below. Updates have always been several orders of magnitude slower than imports, so working with updates really only makes sense if you need to keep up to date with minutely or, at most, daily diffs.
I tried an import/update with settings similar to yours (but without the cache) and I am getting 3k/s for the nodes. That's not great, but it is in line with what I'd expect and with what you mention you had before. Of course these numbers are not really comparable between systems, but we are only talking about the order of magnitude here.
Now what is interesting is looking at where this time goes. One thing I measured was the time it took to COPY blocks of data into the nodes table. You can clearly see two phases: in the first phase we are probably updating existing nodes, which takes more and more time for each block. I don't know why that is and need to investigate further, but it supports my argument from above that you want to avoid large changes. If you absolutely have to work with changes, it might be better to feed them in in smaller chunks. In the second phase new nodes come in and the speed is okay again.
There is definitely a lot of room for improving osm2pgsql here. But updating with large change files is, as I said, not recommended anyway and will always be slower than an import. That's why this use case doesn't have a high priority.
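(A minimal sketch of that "update the file, then re-import" approach, added for illustration; file names, the database name and the style options are placeholders, not the exact commands from this thread.)
```
# Merge the accumulated change file into the local planet file
# (osmium comes from the osmium-tool package).
osmium apply-changes planet.osm.pbf changes.osc.gz -o planet-updated.osm.pbf

# Full re-import in non-slim mode (no --slim, no --append); cache/flat-nodes
# and tag-transform options are omitted here and would match the usual import.
osm2pgsql --create -d gis --number-processes 10 --hstore \
          --style openstreetmap-carto.style planet-updated.osm.pbf
```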
-
Hi,
I'm currently testing out the new osm2pgsql version 1.8.1 but for some reason the data updating via --append is way slower than on an older dev setup.
The new setup uses:
osm2pgsql version 1.8.1
postgresql version 15.3
postgis version 3.3
as well as better hardware.
But for some reason the --append node processing speed is about 7 to 8 times slower than on an older setup that used:
osm2pgsql 1.2.0
postgresql 10.23
postgis 2.4
The command used is:
osm2pgsql --append --slim -d <dbname> --number-processes 10 -C 64000 --hstore --flat-nodes=<flatnodefile> --style=<stylefile> --tag-transform-script=<luatransformfile> <changes.osc.gz-file>
One difference:
On the new setup I compiled osm2pgsql 1.8.1 with LuaJIT enabled, which actually results in an older Lua version being used (5.1.4 via LuaJIT 2.1.0-beta3, which seems pretty old...) instead of the Lua 5.2.4 without LuaJIT that osm2pgsql 1.2.0 used.
But the README states that LuaJIT should speed up the import; maybe it speeds up the import but not the updates? Or is that info outdated, since LuaJIT seems to be stale?
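(Side note, not from the thread: an easy way to check which Lua variant a given osm2pgsql binary was built with is its version output, which lists the library versions it was compiled against.)
```
# Prints build information, including the Lua version and, if enabled,
# the LuaJIT version osm2pgsql was compiled with.
osm2pgsql --version
```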
I'm also seeing very few resources being used by osm2pgsql during node processing; CPU, I/O and RAM are practically not used at all.
Currently the update on the new osm2pgsql 1.8.1 runs with ~ 0.7k/s node processing, while the old setup runs with about 5k/s or more...
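(For anyone trying to reproduce this, one way to watch the resource use of a running --append job; top is from procps, iostat from the sysstat package. Not part of the original post.)
```
# CPU and memory use of the running osm2pgsql process(es)
top -p "$(pgrep -d, osm2pgsql)"

# Per-device disk throughput and utilisation, refreshed every 5 seconds
iostat -xm 5
```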
Postgresql tuning settings are about the same.
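(To rule out configuration drift between the two servers, the relevant settings can be compared directly from pg_settings; the database name and the exact parameter list here are just an example.)
```
# Show the PostgreSQL settings most relevant for osm2pgsql on both servers
psql -d gis -c "SELECT name, setting, unit FROM pg_settings
                WHERE name IN ('shared_buffers', 'work_mem',
                               'maintenance_work_mem', 'max_wal_size',
                               'checkpoint_timeout', 'random_page_cost');"
```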