Provide random-access methods to PBF reader #367

Closed
wants to merge 11 commits

Conversation

@BenWiederhake (Contributor) commented Nov 19, 2023

Motivation: Why random-access?

Osmium is highly optimized for sequential access. That's awesome! Even for files with hundreds of megabytes, it is usually good enough to simply execute multiple linear scans to collect all the necessary data. That seems to be pretty much what everyone does, from RelationManager to osmium getid. However, beyond a certain input size and a moderate number of queries, a linear scan becomes impractically slow.

The current solution seems to be to generate (= expensive in time) and keep around huge index files (= expensive in space). This is a great approach when there is a huge number of queries and the PBF file changes comparatively rarely. However, it is not a good option when there are only a few, complex queries, e.g. when walking the OSM object graph in a weird way.

Hence: Random access! It answers a single query much faster than a linear scan (but slower than a fully-indexed file), and is much cheaper in time and space than handling a fully-indexed file (but slightly more up-front effort and less efficient than a linear scan).

Or in short: random access is Pareto-optimal, and in a number of cases a better choice than a file index or a linear scan. Please enjoy this terrible drawing of a Pareto graph:

[Image: pareto-optimal sketch]

How does this work?

Random access exploits two separate properties:

  • Due to the design of protobuf and PBF, it is possible to do a very quick first pass that skips over all blocks without decoding them, building only a tiny index of which PBF blocks exist and where they start. This makes it easy to decode blocks on demand, instead of always decoding everything. This does not require any changes to the file format, not even any optional features!
  • Most public PBF files, especially the geographic exports distributed by Geofabrik, contain/support Sort.Type_then_ID, which is an "optional feature". This way, one can look at a decoded block and immediately know whether a desired OSMObject should be expected in an earlier or a later block.

Together, this enables in-memory binary search, O(log n). I'm sure I don't need to tell you, but for large n this is faster than O(n), and slower than O(1). With 72 GiB of data, this certainly makes a difference!
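For illustration, here is a minimal sketch of such a first pass, assuming protozero is available (libosmium already depends on it). The function and struct names are illustrative only, not the exact code in this PR; the field numbers follow the published fileformat.proto (type = 1, indexdata = 2, datasize = 3).

#include <protozero/pbf_reader.hpp>

#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Sketch: index the block positions by reading only the 4-byte length prefix
// and the small BlobHeader of every fileblock, then seeking past the
// compressed Blob without ever reading it.
struct block_start {
    std::uint64_t blob_offset; // where the compressed Blob begins
    std::uint32_t datasize;    // size of the compressed Blob
};

std::vector<block_start> scan_block_starts(const std::string& filename) {
    std::ifstream file{filename, std::ios::binary};
    std::vector<block_start> starts;
    char len_buf[4];
    while (file.read(len_buf, 4)) {
        // 4-byte big-endian length of the BlobHeader message
        const std::uint32_t header_size =
            (std::uint32_t(std::uint8_t(len_buf[0])) << 24) |
            (std::uint32_t(std::uint8_t(len_buf[1])) << 16) |
            (std::uint32_t(std::uint8_t(len_buf[2])) <<  8) |
             std::uint32_t(std::uint8_t(len_buf[3]));
        std::string header(header_size, '\0');
        file.read(&header[0], header_size);

        std::uint32_t datasize = 0;
        protozero::pbf_reader pbf{header};
        while (pbf.next()) {
            if (pbf.tag() == 3) { // BlobHeader.datasize
                datasize = static_cast<std::uint32_t>(pbf.get_int32());
            } else {
                pbf.skip();
            }
        }
        starts.push_back({static_cast<std::uint64_t>(file.tellg()), datasize});
        file.seekg(datasize, std::ios::cur); // never touch the Blob itself
    }
    return starts;
}

A real implementation would also distinguish OSMHeader from OSMData blocks (BlobHeader.type) and do more careful I/O; this is only meant to show how little of the file needs to be touched.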

In theory, there exist even more sophisticated approaches, like:

  • hardcoding certain assumptions about the distribution (e.g. "there are X times more nodes than relations"),
  • inventing a new "optional feature" for the file format, where the first and last OSMObject type and ID of each block are copied into the HeaderBlock, and ideally also the exact byte offset where each block can be found. This would completely eliminate the initial setup scan. It would also be perfectly backwards-compatible, because of how protobuf works.

… but I don't want to go there, at least not now. Osmium can already be sped up a lot, right now, with the existing and already-popular file features.

Is it really faster?

I haven't implemented everything I want to implement (and need for a different project), but the rough first data is very clear:

  • Reading planet.osm.pbf (72 GiB in the version I have) and building the block index takes 4 seconds from a cold cache, and roughly 68ms on the second run. (Remember, this only needs to read the header of each block, so it reads only about 0.2% of the file!)
  • MaxRSS is only 6 MiB. And I haven't even optimized anything yet! Theoretical memory consumption for planet is 989 KiB, as it contains only 42175 blocks, and the pbf_block_start struct is currently 24 bytes.
  • cachestats reports pages in cache: 43118/18629897 (0.2%), so that makes sense.
  • Running osmium getid planet-231002.osm.pbf n5301526002 -f opl takes roughly 2 minutes on first and second run, and has a MaxRSS of about 800 MiB.
  • Just to rub it in: That means random access is 1700 times faster, and uses 133 times less memory.

EDIT: I got the numbers for the "theoretical memory consumption" wrong: It's 24 bytes per entry, not 16 bytes. So the total is 989 KiB, not 659 KiB, for the entire planet.

This comparison isn't entirely fair, because the binary search isn't hooked up yet, and thus technically doesn't do the same job as getid. However, keep in mind that binary search will only need to read 16 blocks, and not 42175. Even if it ends up "only" 100 times faster, that would be a huge win.

Who?

I do not claim to be the first person to have this idea. See for example this (abandoned?) repository: https://github.com/peermaps/random-access-osm-pbf

I couldn't find any other implementation though; this seems to be rarely done. I believe that many people might benefit if libosmium had this feature, especially users of getid.

What's next?

This is my first contribution to this repository, and it goes against one central philosophy that libosmium has: sequential access. I tried my best to follow the desired style, but I'm sure there is room for improvement. Let's talk about it!

That is why I created this PR in draft mode.

Plus, there are many things I want to change before making this "public" as in "publish to stable":

  • There's a lot of code duplication, because some code is implemented as private methods, and I wanted to get a working prototype first, to test the idea.
  • Some code/comments still talk about "items", when it really should say "objects". My bad! I think now I understand the difference.
  • I want to implement a "quick and dirty" wrapper that doesn't cache anything, and make a simple osmium-getid clone from that. Just to show off the speed difference properly.

And in the longer future:

  • Maybe a "caching" wrapper that keeps around the last X decoded blocks, in case any new query happens to hit exactly the same block? However, that leads to endless discussions about which caching strategy is the best.
  • And finally, the holy grail: Trying to do this in some form of asynchronous, perhaps even multi-threaded form, and writing something that is always faster than osmium-getid for PBFs. But that's for later.

@joto (Member) commented Nov 19, 2023

Interesting work. I am not sure how suitable this is for Osmium, but it is certainly something that we can discuss. The main question for me is: What use cases does this support, and are they worth the extra effort? The getid use case is rather niche. Anybody who does more than a few queries like this will use some kind of database, like OSMExpress. There are several implementations of specialised databases like this.

What I do like about this approach is that it still works with the PBF file itself and doesn't need an extra index on disk or anything like that. That puts it into the realm of what could fit into libosmium. But it also limits its usefulness somewhat, because you have to build that internal index every time, which needs time, but from your experiments it seems that isn't a big issue. (I am wondering about that a bit, because PBF blocks are usually gzipped and need to be decoded to look at the contents, which does take time; that isn't something that I'd want to do twice if I can avoid it.) Osmium already supports reading only the PBF blocks needed based on the object type (node, way, or relation), and that is used often. It doesn't remember where the blocks for the types are in the file though, because doing that didn't fit into the current IO architecture.

Regarding your ideas on possible changes to the PBF file format: In theory this is backwards compatible due to the nature of the protobuf-based file format. In practice though it is not that easy. There is at least one popular tool written in C that can only understand OSM PBF files as they are now and doesn't cope well with changes. Yes, technically it is not doing the right thing, but practically nobody wants to generate PBF files that are incompatible with that tool. I have run into that problem before. (We could still add something and make it optional for use, though.)

@BenWiederhake (Contributor, Author)

The use-cases are simple:

  • Everything that uses osmium and doesn't need a full linear scan
  • Everything that uses osmium and needs more than one linear scan, traditionally done using three full linear scans.
  • That also includes pyosmium, and any other wrappers

The particular use case that led me to write this is: "Scan all the objects, check some weird properties (that cannot be done through Overpass), and for a handful of objects try to determine any location related to this object." So that's a linear scan, plus a few random lookups. Doing a linear scan for the random lookups is very expensive, and doesn't always answer the question (e.g. for relations). I tried doing it with multiple linear passes, or abusing the osmium getid -r flag, but it's just so slow, much slower than reasonable.

Converting the pbf into a different database format seems silly: It adds an hour and consumes a lot of space. A linear scan plus random accesses should be finished before the database format conversion is done. Why do extra work when the data is already right there?

It seems that your biggest worry is "the extra effort": that's why I focused so much on the 68ms it takes to index the entire planet. Building the internal index "every time" only happens once per program startup – everything in osmium is slower than 68ms, and even that number can probably be optimized down. (In fact, the 68ms are a lie, because I ran the entire "io_test_pbf_randomaccess" test suite, so the 68ms also include the other tests in the file.)

because PBF blocks are usually gzipped and need to be decoded to look at the contents

You're right, that operation would take a lot of time, so the code simply doesn't do that when building the index. In fact, even just reading the compressed data takes a significant amount of time; that's why the code doesn't read the data at all when building the index! That's also why I added a seek function to util/file.hpp. The index only needs to know where all the blocks are, and only reads and decompresses blocks on demand.

(If one were to end up reading every single block of the file, this would indeed be slower than a linear scan. But that's not what random access is for.)

that isn't something that I'd want to do twice if I can avoid it

I agree! That's why I mentioned a "caching" wrapper that keeps around the last X decoded blocks.

possible changes to the PBF file format […] not that easy

Ah, that is unfortunate. But that would have been a separate topic anyway, random access doesn't need any new or unusual features. (It does need some form of ordering, which already is the case.)

@joto (Member) commented Nov 20, 2023

that's why the code doesn't even read the data at all when building the index!

I don't understand. How can you build an index without knowing the contents of the data? The index does a lookup from ID to the block the object with that ID is in, does it not? But for that I have to know what ID range is in that index, don't I?

@BenWiederhake (Contributor, Author)

The index contains the block starts, i.e. their byte offset in the file, not necessarily their logical contents. (The first item in a block will be cached if the block is ever loaded, but it is not known initially.)

Let's take a look at each entry (simplified for demonstration):

struct pbf_block_start {
    // Where is the block?
    size_t file_offset;
    uint32_t datasize;
    // If we have ever decompressed and parsed the block previously, what does it contain?
    osmium::item_type first_item_type_or_zero;
    osmium::object_id_type first_item_id_or_zero;
};
class PbfBlockIndexTable {
    std::vector<pbf_block_start> m_block_starts; // the index
    // ...

Maybe an example helps: Let's say we just opened a file, and it contains 10000 blocks. And now someone looks for node 123456. Because no pbf_block_start contains any hints, the middle block (number 4999) is loaded. Let's say we discover that it contains only objects that must be sorted strictly after node 123456. This halves the search space! (I.e. if node 123456 exists in the file, it has to be in any of the 5000 earlier blocks.) Next the algorithm would look at block number 2498, the middle of the remaining space. And so on, until a block is found that contains node 123456, or at least should contain node 123456 (if it doesn't, then we know with certainty that the file does not contain that node).
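A rough sketch of that binary search over m_block_starts could look like the following. load_block() is a hypothetical stand-in for "decompress the block and fill in its first type/ID"; the real method names and signatures in this PR differ.

#include <osmium/osm/item_type.hpp>
#include <osmium/osm/types.hpp>

#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: decompress the block on demand and fill in
// first_item_type_or_zero / first_item_id_or_zero from its first object.
void load_block(pbf_block_start& entry);

// Find the single block that should contain (type, id), relying on the
// Sort.Type_then_ID ordering of the file.
std::size_t find_block(std::vector<pbf_block_start>& block_starts,
                       osmium::item_type type, osmium::object_id_type id) {
    std::size_t lo = 0;                   // first candidate block
    std::size_t hi = block_starts.size(); // search window is [lo, hi)
    while (hi - lo > 1) {
        const std::size_t mid = lo + (hi - lo) / 2;
        pbf_block_start& entry = block_starts[mid];
        if (entry.first_item_type_or_zero == osmium::item_type::undefined) {
            load_block(entry); // only decompressed when the search needs it
        }
        // "Type then ID": compare the object type first, then the object ID
        if (std::make_pair(entry.first_item_type_or_zero, entry.first_item_id_or_zero)
                <= std::make_pair(type, id)) {
            lo = mid; // the wanted object is in this block or a later one
        } else {
            hi = mid; // the wanted object must be in an earlier block
        }
    }
    return lo; // the only block that can contain the object, if it exists at all
}

Note how every probe permanently fills in the hints for its block, which is why later lookups need fewer and fewer decompressions.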

This may seem at first like a lot of effort, but note that:

  • the block start index (e.g. 1 MiB for planet) is in fact smaller than a single block (e.g. 3 MiB for planet), and iterating over it should be less effort than reading, decompressing, and parsing an entire block.
  • It takes only logarithmically many steps until the block is found. For 42175 blocks (= planet), this means around 15-16 decompressed blocks, in contrast to many thousands of decompressed blocks in the linear setting.
  • Because the middle "unloaded" block is always chosen, we can expect that after a few random accesses, the index is populated well enough that far fewer blocks have to be decompressed.

Does this answer your questions?

@joto (Member) commented Nov 21, 2023

Okay, I understand you are creating the index lazily. That makes sense.

A small thing: You mentioned the planet having only 42175 blocks. That might be true for the planet you can download from OSM, which is not generated with libosmium. Libosmium will not store more than 8000 objects per block (another thing that could be improved in libosmium but is not easily done), so a planet generated by libosmium, say when you update a downloaded planet, will have something like 1.2 million blocks. That's nearly 30 times your number. But it doesn't matter that much, because it is still only 30 MB for the index.

I want to come back to the use cases and the "extra effort". I think we misunderstood each other there. I am talking about programming and maintaining effort. Clearly you think the effort is worth it, you wouldn't do this otherwise. But I am thinking about which actual use cases this would help with, what the burden is on me to maintain this, and on the users to understand the different interfaces that would be available and all that. There are some downsides to your approach: it only works with (sorted) PBF files, it isn't multithreaded as far as I understand it, and it has a new interface, which means the developer has to decide which interface to use for which use case. I want to better understand which actual typical use cases this makes better. And querying a single id or a few ids from a file isn't a use case that comes up much. A far more typical use case is, for instance: Find all ways tagged with xyz, find all the member nodes, and output all that to a file. (Basically what osmium tags-filter does.) How much faster (or slower) your approach is clearly depends on the tag(s) you are looking for. The fewer resulting objects there are in the end, the better your approach will probably be. But it would be interesting to see how, say, a filter for all buildings turns out, because that's a sizable part of the whole planet file. Those numbers are important, because if your approach is always faster, great, we can always use it (as long as the input file is a sorted PBF). We still need some extra code in tools to handle the non-PBF vs. PBF case, but that would be worth it. But if it is not always faster, we have to decide which approach to use in which case and all that.

Having said all of that, I do think your approach has merit, even if it turns out that it isn't for everybody and every use case. So let's think about next practical steps here. First is: As you mention, we have to get rid of the code duplication. I think it should be possible to factor out low-level code from the current PBF reader into free functions that can then be used by the (updated) current PBF reader and by your code. Whatever we do, we have to do something like this anyway, so we can bring this in in pieces small enough that I can review them. Then your other changes become much smaller. And you already have something you can work with in your code, albeit rather low-level. Then we have to think about the API for the new code, because once that's in a published libosmium version, it is hard to change, taking into account possible later additions like caching and so on.

@BenWiederhake (Contributor, Author) commented Nov 26, 2023

will have something like 1.2 million blocks

I also expect 1.2 million blocks; after doing osmium cat planet.osm.pbf -o planet-rewritten.osm.pbf, the actual number seems to be 1205191.

  • Indexing planet.osm.pbf takes around 38ms according to hyperfine (with a bit more careful setup than before)
  • Indexing planet-rewritten.osm.pbf takes around 830ms according to hyperfine
  • Indexing planet.osm.pbf and looking up two objects takes around 0.86 seconds according to hyperfine
  • Indexing planet-rewritten.osm.pbf and looking up two objects takes around 1.03 seconds according to hyperfine
  • For comparison, a linear scan (e.g. osmium getid planet-rewritten.osm.pbf n123 -f opl) takes 1m55s, or 115 seconds.

Of course, loading the pages into memory would make this slower; but since we only read the per-block BlobHeaders, they fit in the filesystem cache.

what the burden is on me to maintain this

That's a good point, and I'm aware what a pain "legacy stuff" can be.

I'll try to address some random points that first come to mind:

  • breaking the file format: It doesn't seem like PBF is going away any time soon. I'm less certain about the "Sort.Type_then_ID" feature, but if it vanishes then a lot of other things break, too. So this code doesn't add any new brittleness in this regard.
  • breaking the compilation: The implementation doesn't need any special language features, I don't see that as a problem.
  • breaking on kernel/libc changes: Although "lseek" is "new" to osmium, it is a very stable concept, and lots of other code relies on it, e.g. to write file sizes or hashes/checksums after writing a file. So it is wildly unlikely that this will break.
  • becoming obsolete due to inefficiency: The code is already a lot faster than a linear scan for its use cases, and the linear scan is already optimized and parallelized. So if anything, the random-access implementation will probably become even faster in comparison to linear scan, not slower.
  • becoming obsolete due to linear storage devices: Random-access storage (e.g. NVMe, flash, RAM, etc.) is becoming more popular, not less, so I don't expect this to become an issue.
  • becoming obsolete due to lack of need: Anything that might cause this can be answered with "osmium as a whole would be doomed" or "just delete that class".

So I'm not really worried about it becoming a pain to maintain, even if I suddenly vanish.

Are there any particular pain points that you would like me to address?

it only works with (sorted) PBF files

Correct. It seems to me that expecting files to be sorted is not unusual; it is already assumed by default:

// namespace osmium::relations
template <typename TManager, bool TNodes, bool TWays, bool TRelations, bool TCheckOrder = true>
class RelationsManager : public RelationsManagerBase {

it isn't multithreaded

Correct, the prototype isn't multithreaded at all, simply because I first want to get a single-threaded version working. I'm currently looking at timing profiles more closely, and it seems like indeed a lot of time is spent in zlib and simply decoding buffers. Not a big surprise, but I just wanted to see it with my own eyes.

I think this approach can be multi-threaded, although it won't scale perfectly. Here's some free-form brainstorming on the topic of multi-threading, I haven't thought about this too deeply yet:

  • First, note that the prototype is still doing a lot of unnecessary work: We don't need to decompress and decode the entire block, just enough to determine the first OSMObject in it! Due to the format, the string table comes first in the data, and can't be easily skipped. It sounds worth attempting, but not in the first prototype.
  • For a multi-threading example, if there are 16 cores and no blocks have ever been decoded before, then the algorithm could simply decode 16 blocks, picked uniformly throughout the file, e.g. if there are 170 blocks in total then pick every 10th block (such that we split the file into 17 nearly equal-sized intervals; see the sketch after this list). This would do 4 binary search steps in the time of (roughly) 1 block decoding. So that's a speed-up of factor 4.
  • In general, I believe that it can be sped up by factor log2(number_of_cores) this way.
  • But there's also the effect that as soon as a significant portion of the index is populated (!= cached), the binary search can be done entirely on the index, so no blocks need to be decoded anymore, or only very few. So it's hard to nail down the exact performance benefit of multi-threading.
  • If one were to get very fancy, how about pre-fetching: The PbfBlockIndexTable constructor could kick off a background thread that pre-emptively starts decoding a handful of key blocks, just to get a head-start. That would reduce the initial binary search by one or two steps. Keep in mind, the binary search will usually be around 10 to 15 steps anyway; even for 1.2 million blocks, it will be 20.2 steps on average.
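As a tiny illustration of the evenly-spaced probing idea from the second bullet above (the function name is made up for this sketch):

#include <cstddef>
#include <vector>

// Pick num_threads block indices spread evenly over a not-yet-populated index,
// splitting [0, num_blocks) into num_threads + 1 roughly equal intervals.
// Example: 170 blocks and 16 threads -> blocks 10, 20, ..., 160.
std::vector<std::size_t> initial_probe_blocks(std::size_t num_blocks,
                                              std::size_t num_threads) {
    std::vector<std::size_t> picks;
    picks.reserve(num_threads);
    for (std::size_t i = 1; i <= num_threads; ++i) {
        picks.push_back(i * num_blocks / (num_threads + 1));
    }
    return picks;
}

Decoding those blocks in parallel populates the index roughly as much as log2(num_threads + 1) sequential binary-search steps would.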

I hope this convinces you that this is not a big disadvantage.

Let's talk about the finer points of multi-threading later, I want to do it single-threaded first, as random-access already gives a huge boost in speed for some usecases.

it has a new interface, which means the developer has to decide which interface to use for which use case

Well, yes, but the developer needs to make some decisions anyway. How about this: "Use the normal, linear-scan based Handler classes first. If this is not fast enough, and you know you are working with ordered PBFs, then consider using random-access."

Basically what osmium tags-filter does

You're right, a simple linear-scan is still the best way to do a job of the type "just go through all objects and check some custom predicate". I don't think a filter for all buildings will benefit at all from random access, unless you already have a list of all the IDs at the start.

Maybe it helps if I expand on my example "Scan all the objects, check some weird properties (that cannot be done through overpass), and then for a handful of objects try to determine any location related to this object."

The user currently has the options:

  • Convert to a different database (see previous discussion)
  • Prepare the data with osmium add-locations-to-ways: Same issue, it takes a lot of time to convert the file, and it doesn't really solve the problem: Relations are tricky, and not necessarily all members can eventually resolve to a location (e.g. a way partially outside an extract), so the user's program needs to backtrack sometimes.
  • Do a linear scan to find the "handful of objects", and determine the node IDs / member IDs, and resolve those in a second pass. This requires keeping a potentially-large table in memory, and due to relations it still requires multiple passes. This is what I implemented before writing this PR, and in some cases it took five passes to resolve every object – and each pass is two minutes! Also, it's tedious implementation work to keep track of everything.
  • Or, with random access, the user's program could just do a single pass, and whenever it encounters an "interesting" object, it can directly read from the file just the information it needs, i.e. one of the referenced nodes/ways/relations, and be done. (Unless that object is missing, in which case it just picks the next, and so on.) A rough sketch of this pattern follows below.

I hope I could finally show why the "just do multiple passes" approach seems so tedious and wasteful to me in those cases.

Of course, beyond a certain percentage of objects (>5% I would guess), doing multiple passes is probably faster.
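To make the last option above more concrete, here is a very rough sketch of the shape such a program could take. PbfBlockIndexTable is the class from this PR, but its constructor, the get_node() lookup, and the is_interesting()/use_location() helpers are hypothetical names used only for illustration, not the actual API:

#include <osmium/io/pbf_input.hpp>
#include <osmium/io/reader.hpp>
#include <osmium/memory/buffer.hpp>
#include <osmium/osm/entity_bits.hpp>
#include <osmium/osm/node.hpp>
#include <osmium/osm/relation.hpp>

#include <string>

// Hypothetical user-supplied pieces:
bool is_interesting(const osmium::Relation& relation);
void use_location(const osmium::Location& location);

void scan_with_random_lookups(const std::string& filename) {
    PbfBlockIndexTable table{filename}; // tiny block index, built once (hypothetical ctor)
    osmium::io::Reader reader{filename, osmium::osm_entity_bits::relation};
    while (osmium::memory::Buffer buffer = reader.read()) { // single linear pass
        for (const auto& item : buffer) {
            if (item.type() != osmium::item_type::relation) {
                continue;
            }
            const auto& relation = static_cast<const osmium::Relation&>(item);
            if (!is_interesting(relation)) {
                continue;
            }
            for (const auto& member : relation.members()) {
                if (member.type() != osmium::item_type::node) {
                    continue;
                }
                // Random lookup: decompress only the ~log2(n) blocks needed to
                // locate this one node, instead of scheduling another full pass.
                if (const osmium::Node* node = table.get_node(member.ref())) { // hypothetical
                    use_location(node->location());
                    break;
                }
            }
        }
    }
    reader.close();
}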

if your approach is always faster, great, we can always use it

No, random-access is not "always faster". Is that perhaps the misunderstanding here? I do not claim that random-access is strictly better in all cases, just that it is faster (by large factors) for some reasonable usecases.

we have to get rid of the code duplication

Splitting this into multiple, easy-to-review PRs sounds great! That's one of the reasons I created this PR in draft mode.

I think I'll split it like this:

The above commits also introduce a mechanism to pass auto-grow behavior through to the reader. Turns out, this isn't a good idea (it causes lots of memmoves, which consume several percent runtime), so it won't end up in any PR.

@joto (Member) commented Dec 1, 2023

We have too many open "threads" here and it gets really confusing. I'll try to address some of the issues...

  • The osmium::experimental namespace was a mistake from way back. It should never have ended up in there. So no, that's not an option. Either we add it (whatever it is) properly because I am convinced that it is the right approach, or it will not be added at all. We can do all sorts of experimenting in a branch, or develop the code outside libosmium first.
  • The nested buffers are needed if I don't want to (or can't) resize a buffer but I have more data that needs to go somewhere. This is ugly, but I don't have a better solution. If you can find one, great.
  • Not sure what you mean with "The type and ID comparisons suck..." and how this relates to this here?
  • The threading issue is only interesting in so far as it might affect the API. I am totally okay with doing something not threaded first and add threading later. But if it turns out later that the API is not enough to support some kind of threading, then we have a problem.
  • Before we go down more and more rabbit holes, I think you should try to implement the tags-filter use case or something like it to see how this all performs in practice.

Well, yes, but the developer needs to make some decisions anyway. How about this: "Use the normal, linear-scan based Handler classes first. If this is not fast enough, and you know you are working with ordered PBFs, then consider using random-access."

Problem is that when you write generic code like in osmium-tool, you don't know all these things beforehand. You have to either let the user decide through some command line option and/or write code to handle each case, adding some magic to figure out which approach is probably going to be faster and all that. The need to do all that limits the usefulness of having several approaches in practice.

@BenWiederhake (Contributor, Author)

You're right, just scrolling over my previous comment took way too long. I'll try to be more succinct.

  • The nested buffers are perfectly fine; I'm rather worried about how they are accessed and iterated:
    • Accidentally quadratic: Imagine a buffer that is nested to a depth of 100 (reading planet causes even higher depths, so this is a real issue). The only way to access nested buffers is via get_last_nested(). So the first call does 100 derefs to get to the last (oldest) buffer, the second call does 99 derefs, the third call does 98 derefs, … In total this ends up dereferencing quadratically many pointers, or d(d-1)/2 to be precise, where d is the initial depth. This type of issue is called "accidentally quadratic". I have no data on how much this actually impacts libosmium, but it sounds like a better interface (e.g. returning all nested buffers as a std::vector or something) can speed things up. This is what I intend to address in the next PR. (A tiny generic illustration follows after this list.)
    • It's a list, but many places just pop one element and then seem to assume that was all the nesting there is.
  • "The type and ID comparisons suck": Sorry, I should have phrased that differently. I meant that it's unfortunate that there has to be yet another function that implements "type then ID" sorting. This new code will have to live in a new PR, which is why I included it in the list.
  • threading: Designing a stable API that supports multi-threading is going to be a challenge, but there is nothing that fundamentally prevents it. The CPU-intensive part is decompressing and decoding individual blocks, and that part can be trivially parallelized.
  • Fewer rabbit holes, more practical examples:
    • There seems to be a misunderstanding: tags-filter should remain a simple linear-scan. random-access is the wrong tool for that.
    • getid is a better example. In fact, examples/osmium_read_pbf_randomaccess planet.osm.pbf is roughly equivalent to osmium getid planet-231002.osm.pbf w40106791 r123456 -f opl, but runs 100 times faster (1.03 seconds instead of 115 seconds). That was my main point in the last post, sorry, I should have made that clearer. However, let's postpone the discussion of changing osmium-tool.
    • I have also tested locally whether the "backref resolution" usecase that led me down this rabbit hole really does see a speed-up: Yes, it does! From 3m47s to 2m50s in the "ways only" scenario. For the general scenario, I expect it to be an even bigger difference, since "linear scan" has to do multiple passes (at least 3, sometimes 4) to fully resolve a relation. Once I've finished implementing it, I'll share the code, a better explanation, and exact numbers. For now, suffice to say: Yes, I am certain that I'm not just chasing a hypothetical idea :)
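As a generic toy illustration of the accidentally-quadratic pattern from the first bullet above (this is deliberately not libosmium's actual Buffer code):

#include <memory>
#include <vector>

// Toy chain of nested buffers: each buffer owns the next-older one.
struct toy_buffer {
    std::unique_ptr<toy_buffer> nested;
    // ... payload omitted ...
};

// Mirrors the "give me only the last nested buffer" access pattern: every call
// walks the whole remaining chain. If the caller repeatedly takes and detaches
// the last buffer, draining a chain of depth d costs on the order of d*d/2
// pointer dereferences in total.
toy_buffer* last_nested(toy_buffer& head) {
    toy_buffer* cur = &head;
    while (cur->nested) {
        cur = cur->nested.get();
    }
    return cur;
}

// A linear alternative: collect all nested buffers in one walk, so the caller
// can process them oldest-to-newest without re-walking the chain every time.
std::vector<toy_buffer*> all_nested(toy_buffer& head) {
    std::vector<toy_buffer*> result;
    for (toy_buffer* cur = &head; cur != nullptr; cur = cur->nested.get()) {
        result.push_back(cur);
    }
    return result;
}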

@joto (Member) commented Dec 2, 2023

Fewer rabbit holes, more practical examples: ...

As I said, the getid use case is not a good use case, because it isn't remotely something that happens often enough in the real world from my experience to make it worthwhile to optimize that case. The tags-filter, on the other hand, is something that users typically need. And the use case you started with is somewhat unclear to me. So you have yet to convince me that there is a real-world case here.

@BenWiederhake (Contributor, Author)

  1. Getting specific objects from a pbf is ridiculously slow for sorted pbfs, and can be easily sped up by factor 100-1000, as already demonstrated. This kind of lookup is something that I need. I wouldn't be here if I didn't need it.

  2. osmium-tool, a project that you maintain, has a getid functionality. It wouldn't exist if users didn't need it. If that doesn't count as a "good" or "real" usecase, I don't know what else would, ever.

  3. Glancing at the list of subcommands, it seems like query-locations-index and getid -r probably could also benefit from random-access, but I haven't looked into it, as these use cases are not my personal priority at this time.

  4. Here's my local random collection of programs I'm writing to test ideas and experiment. It's not in good shape, because it wasn't originally meant to be shared, but maybe it shows a bit better what I mean: https://github.com/BenWiederhake/osm-play/blob/master/extract_some_relations_random_access_cached.cpp#L25
    This program resolves a "random" small subset of all relations down to any location referenced by the relation. This kind of backward dereferencing is not supported in current libosmium. The only way to do something like this is doing multiple linear scans, and each of them takes 80+ seconds. Doing 3 (or sometimes 4, because perhaps a relation consists only of other relations) eats up 240+ (320+) seconds. The program in the above link does it in 100 seconds. In case you want to argue against it with RelationManager: No, it only deals with the first layer of resolution (making members available, but not transitive members), does a full pass for that (consuming unnecessary time and memory), and builds a partial copy of the database in memory, instead of just returning the requested objects.

  5. The resolution of related objects might be improved for tags-filter, but that would depend on the expected quantity of referenced objects. If it's very low in comparison to the overall pbf, then I would expect random-access to be significantly faster for the second pass. However, implementing a robust estimation and switching between implementations sounds like a headache, and optimizing that subcommand is not my personal priority, at this time.

Let's just make fast random-accesses available to everyone, and then you can use it to your heart's content. After all, that's the spirit of free software: Using, understanding, improving, sharing. That's why I'm trying to get the improvements into libosmium.

I'm getting the impression that you feel very negatively about it all, can you help me understand why? Surely you see the benefit of speeding up some types of operations, even if it doesn't affect other types of operations?

@joto (Member) commented Dec 3, 2023

I'm getting the impression that you feel very negatively about it all, can you help me understand why? Surely you see the benefit of speeding up some types of operations, even if it doesn't affect other types of operations?

As mentioned, the problem is simply that this all means more work for me and I have to take over the maintenance burden. It costs me a lot of time to understand what you want and to review pull requests. If you could demonstrate that your code will help with things I need or that I think a lot of people will need, that would make me more interested. But I don't see that (yet). That being said, I am happy to accommodate your use case if that can be done with limited effort on my part. So let's concentrate on the one thing that seems to get you the most for the least amount of effort on my part: get the PBF low-level code moved somewhere your alternative I/O method can use it, and get that into libosmium. And don't get sidetracked by a 1 or 2% performance improvement somewhere else.
