[FEATURE] PMTiles output format #98

Closed
wipfli opened this issue Feb 27, 2022 · 31 comments

@wipfli
Contributor

wipfli commented Feb 27, 2022

Is your feature request related to a problem? Please describe.

Currently, Planetiler outputs mbtiles, which requires a tile server; static file hosting such as GitHub Pages is not enough to serve tiles.

Describe the solution you'd like

PMTiles can be used to serve vector tiles with range requests and they don't require a tile server. Planetiler could implement a pmtiles writer.
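As a rough illustration of what "range requests" means here, the sketch below builds an HTTP GET that asks for only a slice of a remote file using the standard `Range` header. The URL, offset, and length are placeholders chosen for the example, not values from the PMTiles spec.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class RangeRequestSketch {
  /** Builds a GET request for bytes [offset, offset + length) of a remote file. */
  static HttpRequest rangeRequest(URI archive, long offset, long length) {
    if (offset < 0 || length <= 0) {
      throw new IllegalArgumentException("offset must be >= 0 and length > 0");
    }
    // HTTP byte ranges are inclusive on both ends
    long last = offset + length - 1;
    return HttpRequest.newBuilder(archive)
        .header("Range", "bytes=" + offset + "-" + last)
        .GET()
        .build();
  }

  public static void main(String[] args) {
    // Hypothetical archive URL; any static host that honors Range would do
    HttpRequest request =
        rangeRequest(URI.create("https://example.com/tiles.pmtiles"), 1024, 512);
    System.out.println(request.headers().firstValue("Range").orElseThrow());
  }
}
```

A client would send such a request with `java.net.http.HttpClient` and expect a `206 Partial Content` response from a host that supports range requests.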

Describe alternatives you've considered

@bdon created an mbtiles-to-pmtiles converter:

pip install pmtiles
pmtiles-convert TILES.mbtiles TILES.pmtiles

Additional context
The powerlines example uses pmtiles: https://github.com/wipfli/powerlines-switzerland

@msbarry
Contributor

msbarry commented Feb 27, 2022

I think this makes sense. We would probably migrate the --mbtiles=output.mbtiles option to --output=output.mbtiles or --output=output.pmtiles and switch the writer implementation based on file type. It sounds like the format is pretty straightforward, but Brandon pointed out that it would be beneficial to change the tile order to colocate nearby tiles. He recommended Hilbert curve order, but tile pyramid order might satisfy a similar goal and be a bit easier to implement?
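Switching on file type could look roughly like the sketch below. The class and enum names are made up for illustration and are not Planetiler's actual types; the only thing taken from the discussion is the idea of picking a writer from the output file's extension.

```java
import java.nio.file.Path;
import java.util.Locale;

final class OutputFormatSketch {
  /** Hypothetical set of archive formats a writer could be chosen from. */
  enum Format { MBTILES, PMTILES }

  /** Picks a format from the file extension of the --output path. */
  static Format fromPath(Path output) {
    String name = output.getFileName().toString().toLowerCase(Locale.ROOT);
    if (name.endsWith(".mbtiles")) {
      return Format.MBTILES;
    }
    if (name.endsWith(".pmtiles")) {
      return Format.PMTILES;
    }
    throw new IllegalArgumentException("Unsupported output extension: " + output);
  }

  public static void main(String[] args) {
    System.out.println(fromPath(Path.of("output.mbtiles")));
    System.out.println(fromPath(Path.of("output.pmtiles")));
  }
}
```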

@bdon
Contributor

bdon commented Feb 28, 2022

Would it make sense to structure this as a separate Java library? If so, can that live alongside the python/js implementations at https://github.com/protomaps/PMTiles or should it live in its own Git repository?

@msbarry
Contributor

msbarry commented Feb 28, 2022

@bdon a separate library would be nice, then the wrapper in planetiler would be pretty minimal. At the simplest it would need an API like:

try (var pmtiles = new PMTiles(pathOrOutputStream)) {
  for (var tile : tiles) {
    pmtiles.writeTile(tile.x, tile.y, tile.z, tile.data);
  }
} // close() flushes the index leaves, or could have an explicit finalize() call like the go library

For performance optimizations it might make sense to expose the hashing function and an API to write a tile with a known hash as well so the writer could avoid hashing the same bytes over and over again, but we could just start with something simple and add that after profiling if necessary.

@msbarry
Contributor

msbarry commented Feb 28, 2022

Also @bdon could you elaborate on the tile ordering optimization? Is the main reason to put nearby tiles into the same index leaves? Planetiler packs tile x/y/z coordinates into a 32 bit integer that defines the order in which tiles are emitted, so I'd have to express a different ordering strategy as a different mapping from x/y/z to int.
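To make "a mapping from x/y/z to int" concrete, here is one possible tile-pyramid ordering: all tiles of a lower zoom sort before tiles of a higher zoom, and within a zoom tiles sort by x, then y. This is only an illustrative encoding under the assumption that zooms 0 through 15 are enough; it is not claimed to be the encoding Planetiler actually uses.

```java
final class TileOrderSketch {
  /** Highest zoom whose indexes still fit in a signed 32-bit int with this scheme. */
  static final int MAX_ZOOM = 15;

  /** Maps a tile coordinate to an int that sorts by zoom, then x, then y. */
  static int encode(int z, int x, int y) {
    if (z < 0 || z > MAX_ZOOM) {
      throw new IllegalArgumentException("zoom out of range: " + z);
    }
    int size = 1 << z; // tiles per axis at this zoom
    if (x < 0 || x >= size || y < 0 || y >= size) {
      throw new IllegalArgumentException("tile out of range: " + z + "/" + x + "/" + y);
    }
    // Number of tiles in all lower zooms: 4^0 + 4^1 + ... + 4^(z-1) = (4^z - 1) / 3
    int tilesBelow = (int) (((1L << (2 * z)) - 1) / 3);
    return tilesBelow + x * size + y;
  }

  public static void main(String[] args) {
    System.out.println(encode(0, 0, 0));
    System.out.println(encode(1, 1, 1));
    System.out.println(encode(2, 3, 1));
  }
}
```

A different ordering strategy, such as Hilbert order, would swap out the `x * size + y` term for another within-zoom index while keeping the per-zoom offset.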

@bdon
Contributor

bdon commented Feb 28, 2022

The tile ordering refers to the order of the tiles in the archive; as of spec v2, the order of their entries in the index is strictly defined (ascending z/x/y). If tiles are in Hilbert order in the archive, they are guaranteed to be nearby in the file if they're nearby in 2D - this locality makes a big latency difference if you're serving from disk and the OS is paging, but usually doesn't make a difference for cloud storage (depends on how it's implemented).
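For readers unfamiliar with Hilbert order, the sketch below uses the textbook iterative conversion from an (x, y) cell to its position along a Hilbert curve within one zoom level. It is a generic version of the algorithm and is not taken from any PMTiles implementation, which may orient the curve differently or add a per-zoom offset.

```java
final class HilbertSketch {
  /**
   * Returns the position of cell (x, y) along a Hilbert curve covering a
   * 2^order by 2^order grid. Cells with consecutive positions always share an edge.
   */
  static long xyToIndex(int order, int x, int y) {
    int n = 1 << order;
    long d = 0;
    for (int s = n / 2; s > 0; s /= 2) {
      int rx = (x & s) > 0 ? 1 : 0;
      int ry = (y & s) > 0 ? 1 : 0;
      // Each quadrant at this scale contributes s*s cells to the running index
      d += (long) s * s * ((3 * rx) ^ ry);
      // Rotate or flip the quadrant so the sub-curve is oriented consistently
      if (ry == 0) {
        if (rx == 1) {
          x = n - 1 - x;
          y = n - 1 - y;
        }
        int t = x;
        x = y;
        y = t;
      }
    }
    return d;
  }

  public static void main(String[] args) {
    // Print the curve position of every cell in a 4x4 grid, one row per y
    for (int y = 0; y < 4; y++) {
      StringBuilder row = new StringBuilder();
      for (int x = 0; x < 4; x++) {
        row.append(String.format("%3d", xyToIndex(2, x, y)));
      }
      System.out.println(row);
    }
  }
}
```

Sorting the tiles of a zoom level by this index before writing them is one way to get the locality described above.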

@msbarry
Contributor

msbarry commented Apr 8, 2022

I also found that writing pmtiles output is substantially faster (3 minutes for the planet), so I'm going to include a first pass of this in my "reducing single-threaded bottlenecks" workstream. I'll write a pmtiles class with the goal of eventually extracting it to https://github.com/protomaps/PMTiles.

@msbarry
Contributor

msbarry commented Jun 10, 2022

Talking with @bdon, the pmtiles format is going through a couple of iterations between now and August so let's wait to add native pmtiles output until after that solidifies more. @bdon feel free to ping this issue when you think the spec is in a stable state to build against.

@wipfli
Contributor Author

wipfli commented Nov 22, 2022

@bdon would it be possible to use the go-pmtiles implementation here in Java in Planetiler?

@wipfli
Contributor Author

wipfli commented Nov 22, 2022

I ran Planetiler with the Shortbread configurable schema on the full planet. It created an output file of roughly 68 GB. Then I converted the .mbtiles to .pmtiles (25 minutes) and now I am uploading the file to R2 (roughly 30 minutes).

What would be amazing is if Planetiler could directly produce PMTiles and stream them to an S3-compatible storage provider...

@bdon
Contributor

bdon commented Nov 22, 2022

> @bdon would it be possible to use the go-pmtiles implementation here in Java in Planetiler?

The plan is to output the PMTiles v3 format directly in the java code. It's on my plate, but need to finish up the Tippecanoe output first :)

bdon added a commit to bdon/planetiler that referenced this issue Jan 13, 2023
* add package com.onthegomap.planetiler.writer
* create new interface TileArchive
* rename MbtilesMetadata to TileArchiveMetdata, make MbtilesWriter into general TileArchiveWriter
bdon added a commit to bdon/planetiler that referenced this issue Jan 16, 2023
* add package com.onthegomap.planetiler.writer
* create new interface TileArchive
* rename MbtilesMetadata to TileArchiveMetdata, make MbtilesWriter into general TileArchiveWriter
bdon added a commit to bdon/planetiler that referenced this issue Jan 17, 2023
* add package com.onthegomap.planetiler.writer
* create new interface TileArchive
* rename MbtilesMetadata to TileArchiveMetdata, make MbtilesWriter into general TileArchiveWriter
bdon added a commit to bdon/planetiler that referenced this issue Jan 30, 2023
bdon added a commit to bdon/planetiler that referenced this issue Feb 1, 2023
bdon added a commit to bdon/planetiler that referenced this issue Feb 5, 2023
bdon added a commit to bdon/planetiler that referenced this issue Feb 5, 2023
@hallahan

This is cool to see. What is the status? Almost done?

@bdon
Contributor

bdon commented Feb 28, 2023

Getting closer. #502 wraps up most of the internal bits but we will need to also expose this output format to configurations/command line for the next point release.

@wipfli
Contributor Author

wipfli commented Mar 8, 2023

I am really looking forward to #502.

Is it possible with rclone or something to upload the pmtiles file to an S3-compatible storage while planetiler is still writing?

Just to share my numbers: my custom planet mbtiles file is 45 GB and Planetiler takes 15 minutes to write it. Converting the file from mbtiles to pmtiles then takes 18 minutes, and uploading the pmtiles file to Cloudflare takes another 13 minutes.

@msbarry
Contributor

msbarry commented Mar 8, 2023

The pmtiles writer is going to write the whole file sequentially, then when it finishes it will go back to the beginning and write the header and root directory. I'm not sure if that pattern would work integrating with a third-party upload tool?

Theoretically I think planetiler could do the upload directly using the S3 multi-part upload API - it would just write the first part last once it knows what the first header/root directory will look like.
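The "write everything sequentially, then go back and fill in the beginning" pattern can be sketched as follows. The 16-byte header holding a tile count and body length is invented for the example and is not the PMTiles header layout; only the reserve, append, seek-back sequence mirrors what is described above.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

final class HeaderLastSketch {
  /** Hypothetical fixed-size header: tile count (8 bytes) + body length (8 bytes). */
  static final int HEADER_BYTES = 16;

  static void write(Path path, List<byte[]> tiles) throws IOException {
    try (FileChannel channel = FileChannel.open(path,
        StandardOpenOption.CREATE,
        StandardOpenOption.WRITE,
        StandardOpenOption.TRUNCATE_EXISTING)) {
      // 1. Skip past the space reserved for the header
      channel.position(HEADER_BYTES);

      // 2. Append tile data sequentially, tracking what the header needs to know
      long bodyBytes = 0;
      for (byte[] tile : tiles) {
        ByteBuffer buffer = ByteBuffer.wrap(tile);
        while (buffer.hasRemaining()) {
          channel.write(buffer);
        }
        bodyBytes += tile.length;
      }

      // 3. Seek back to the start and write the header last
      ByteBuffer header = ByteBuffer.allocate(HEADER_BYTES);
      header.putLong(tiles.size()).putLong(bodyBytes).flip();
      channel.position(0);
      while (header.hasRemaining()) {
        channel.write(header);
      }
    }
  }

  public static void main(String[] args) throws IOException {
    Path path = Files.createTempFile("header-last", ".bin");
    write(path, List.of(new byte[] {1, 2}, new byte[] {3, 4, 5}));
    System.out.println(Files.size(path) + " bytes written to " + path);
  }
}
```

Step 3 is the reason a tool that streams the file while it is being written would see an incomplete beginning until the very end.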

At the very least, #502 should combine your first 2 steps into one step that takes less than 15 minutes.

@bdon
Contributor

bdon commented Mar 9, 2023

It should be simple to run planetiler and rclone in sequence to perform the upload.

My thought for the next v4 spec of pmtiles (backwards compatible, don't worry) is to allow for the header and root directories to be at the end of the archive. This would make the entire format streamable, meaning planetiler could write to storage as it's assembling the tiles, saving time and local disk space.

To make this work however, we need to validate that every storage platform pmtiles v3 runs on supports end-addressing HTTP range requests correctly.

However, what @msbarry said about multipart uploads out-of-order would be even better and not require a spec revision. I'm not sure if that multipart behavior is consistent across storage platforms though.

@msbarry
Contributor

msbarry commented Mar 13, 2023

Resolved by #502

msbarry closed this as completed Mar 13, 2023
@wipfli
Contributor Author

wipfli commented Mar 13, 2023

Amazing, I need to try it out. Thanks @bdon for writing it and thanks @msbarry for the review!

@laurentdiazfr

Thanks a lot for this. Can we use the command line to generate PMTiles?

@msbarry
Contributor

msbarry commented Mar 15, 2023

Almost... I'm working on a change now so you can say --output=result.pmtiles to use the new functionality. Should be ready in a day or two.

@wipfli
Contributor Author

wipfli commented Mar 21, 2023

I did a comparison between the direct pmtiles writer in planetiler and the mbtiles writer + conversion afterwards to pmtiles. I did a planet run with my custom map tileset https://github.com/wipfli/swiss-map. Here is the result:

  • New pmtiles writer total duration: 9146 seconds
  • Previous mbtiles writer followed by conversion to pmtiles: 9171 seconds

This is on a 12 core, 128 GB machine. The logs are available here: https://gist.github.com/wipfli/17bb8ad8d123f7d93313417dc7d4fac5

It is surprising that the new pmtiles writer does not outperform the old way significantly. Did I somehow mess up some settings?

@msbarry
Contributor

msbarry commented Mar 21, 2023

Archive writing is the only part that gets faster with pmtiles.

Here's what I see for pmtiles:

2:32:17 INF - 	archive   1h14m23s cpu:13h38m25s gc:3m53s avg:11
2:32:17 INF - 	  read    1x(8% 5m43s sys:55s wait:1h4m35s done:9s)
2:32:17 INF - 	  encode 11x(94% 1h9m36s sys:5s wait:10s done:9s)
2:32:17 INF - 	  write   1x(4% 2m50s sys:1m25s wait:1h10m42s) <<<<<<<<<<<---------- pmtiles

and for mbtiles:

2:32:41 INF - 	archive   1h15m8s cpu:13h47m35s gc:4m37s avg:11
2:32:41 INF - 	  read    1x(8% 6m7s sys:1m5s wait:1h4m1s)
2:32:41 INF - 	  encode 11x(92% 1h9m25s sys:7s wait:36s)
2:32:41 INF - 	  write   1x(11% 8m sys:1m17s wait:1h3m14s) <<<<<<<<<<<<--------------- mbtiles

the archive time is dominated by encode since you only have 12 cores - if you run on a machine with 64-100+ cores then encode starts to take less time and write dominates.

@wipfli
Contributor Author

wipfli commented Mar 21, 2023

Nice thanks!

@wipfli
Contributor Author

wipfli commented Mar 22, 2023

I had a bug in my script: I created swissmap.mbtiles with planetiler but then used pmtiles convert output.mbtiles output.pmtiles, and it turns out that output.mbtiles was a 155 MB file while swissmap.mbtiles is 35 GB...

I then checked how long pmtiles convert swissmap.mbtiles swissmap.pmtiles takes and it turns out that this is 22 minutes.

So the result in my custom planet tile set is:

  • old method (output mbtiles and then convert): 9171 s + 22 min = 10491 s
  • new method (direct pmtiles output): 9146 s

So the new method is almost 13 percent faster!
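Spelled out, using only the figures quoted above, the arithmetic behind that percentage is:

```latex
\begin{aligned}
t_{\text{old}} &= 9171\,\text{s} + 22 \times 60\,\text{s} = 10491\,\text{s} \\
t_{\text{new}} &= 9146\,\text{s} \\
\frac{t_{\text{old}} - t_{\text{new}}}{t_{\text{old}}} &= \frac{1345}{10491} \approx 0.128
\end{aligned}
```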

@sing78

sing78 commented Mar 30, 2023

Hi, one question:
I am running this command to generate pmtiles:

sudo java -Xmx1g -jar planetiler.jar --download --area=monaco --output=monaco.pmtiles

However, no monaco.pmtiles file is being created inside the data folder, only an output.mbtiles file.

What am I missing?

Thanks

@wipfli
Contributor Author

wipfli commented Mar 30, 2023

It creates output.pmtiles in the current folder, no?

@msbarry
Contributor

msbarry commented Mar 30, 2023

Oh you're probably using the latest release jar and I haven't made a release for a while. I should do a release and get that up to date!

@etiennejourdier

Hi,
It seems to me that the --output option doesn't work when using docker. I always get only the output.mbtiles file. No problem, I finally used java, but I'd rather report it here.

@msbarry
Contributor

msbarry commented May 27, 2024

> Hi,
>
> It seems to me that the --output option doesn't work when using docker. I always get only the output.mbtiles file. No problem, I finally used java, but I'd rather report it here.

@etiennejourdier what docker command are you running exactly?

@etiennejourdier

I used the one in the readme for smaller extracts plus the output option

docker run -e JAVA_TOOL_OPTIONS="-Xmx1g" -v "$(pwd)/data":/data ghcr.io/onthegomap/planetiler:latest --download --area=monaco \
  --water-polygons-url=https://github.com/onthegomap/planetiler/raw/main/planetiler-core/src/test/resources/water-polygons-split-3857.zip \
  --natural-earth-url=https://github.com/onthegomap/planetiler/raw/main/planetiler-core/src/test/resources/natural_earth_vector.sqlite.zip --output=monaco.pmtiles

@msbarry
Contributor

msbarry commented May 27, 2024

I think you need --output=/data/output.pmtiles to get it into the mapped volume

@etiennejourdier

Of course it seems obvious now that I'm reading it! Thanks!
