Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More detailed Specification #107

Closed
DerZade opened this issue Dec 23, 2022 · 11 comments · Fixed by #172
Closed

More detailed Specification #107

DerZade opened this issue Dec 23, 2022 · 11 comments · Fixed by #172

Comments

@DerZade
Copy link
Contributor

DerZade commented Dec 23, 2022

Hey :)

First of all. Thank you for the great format. It is imho the perfect solution to a problem that will become more and more prevalent in the near future.

I tried to implement my own reader / writer in rust and found myself looking at the first-party implementations quite a lot, because I stumbled upon things that are not really clear in the specification.

I think the project would benefit a lot from having a WAY MORE detailed specification (at least more than something I could print on a single page) and I was wondering whether a contribution on my part would be welcomed.

I took the opportunity and wrote a really rough draft, of the section on the header. Just to give you an impression of what I would envision and get your feedback.

PS: It's also totally fine if you do not think the specification needs to be improved.

@bdon
Copy link
Member

bdon commented Dec 27, 2022

@DerZade yes, I'm open to more details In the spec doc. It's currently in a minimum-viable-state and deserves more clarity. I think that https://github.com/mapbox/mbtiles-spec/blob/master/1.3/spec.md is a good target level of verbosity, preferably less.

To answer your questions in the draft:

The current description of clustered is "boolean clustered flag: if true, blobs in the data section are ordered by Hilbert TileId. When writing with deduplication, this means that offsets are either contiguous with the previous offset+length, or refer to a lesser offset."

That's a bit confusing, so another way to describe it would be that clustered is true if and only if this pseudocode is true:

var last_offset, last_length = 0,0
for entry in entries: # ordered by tile_id
   assert (entry.offset = last_offset + last_length OR entry.offset < last_offset)
   last_offset, last_length = entry.offset, entry.length

The programs that actually take advantage of clustered=True don't exist yet and are being worked on in protomaps/go-pmtiles#31

@DerZade
Copy link
Contributor Author

DerZade commented Mar 21, 2023

After a break of several weeks, I finally got around to continuing to work on this.

Only two sections are still missing (Directories > Decoding and Meta Data). Also missing is a proper abstract / introduction.
Apart from these three things, I am mostly happy with the current state of things.

You can find the current version here. I would appreciate feedback 🙃

@bdon
Copy link
Member

bdon commented Mar 21, 2023

@DerZade thanks for your hard work here - just letting you know I want to take a deep look at this but may not have time for a few days! appreciate it.

@DerZade
Copy link
Contributor Author

DerZade commented May 2, 2023

Hey. Just checking in to see if you found some time to look at my draft 😇

@nvkelso
Copy link

nvkelso commented May 2, 2023

@DerZade Speaking for myself, I've found it useful – thanks for putting a more detailed spec together!

@bdon
Copy link
Member

bdon commented May 4, 2023

This is awesome so far! Here's my first pass of very nit-picky details:

PMTiles is a single-file archive of square tiles.

How about a "single-file archive for square pyramids of tiles"? The individual tiles themselves can be arbitrary data, but we are constrained to the pyramid of 1, 4, 16, 64... etc tiles.

Optional JSON meta data

I think we should clarify the spec to make JSON required: the null case is just the empty JSON object. I don't see any downside to doing this and it simplifies the client.

the root directory must be contained in the first 16,384 bytes (16 KB)

This should be 16 KiB and the header (127 bytes) + root combined need to be < 16 KiB.

Meta Data Offset

Can we consolidate on Metadata as a single word with no space? It's easier to search for.

number of bytes reserved

I think this could better be "number of bytes used", since reserved might imply there is empty space in a typical file.

Number of Addressed Tiles

Maybe the "number of tiles addressed by the archive before run-length encoding", to distinguish this from Number of Tile Entries

A directory can only be encoded in its entirety. It is not possible to encode a single directory entry by itself.

What implies this? An archive that addresses a single tile should be valid, AFAIK.

The pseudocode encoding section is useful!

I'll continue looking at this tomorrow, thanks.

@DerZade

This comment was marked as outdated.

@DerZade
Copy link
Contributor Author

DerZade commented May 6, 2023

I have opened the PR. All new changes since the last gist are in extra commits so you can easily see what I changed. I also added a review to the PR with all the items that are still open.

Please leave any comments directly related to my draft in the PR and not here. Otherwise we'll have a bit of a clusterfuck 😇

@youngpm
Copy link

youngpm commented Jul 11, 2023

@bdon et al, Hi! Should the spec mention something about leaf entry position in a directory?

From the implementation, I think it is true that if you're looking for a tileId, you do binary search over the entries in the directory, and if the tileId isn't present, the left closest entry to the tileId (if pointing to a leaf directory), should be searched?

@bdon
Copy link
Member

bdon commented Jul 18, 2023

@youngpm yes, that is correct - if a tile data entry is not found, then the left closest directory entry should be queried. There are some refinements; for example if you ask a z0-12 tileset for a z16 tile, you will always look at the rightmost directory, which isn't efficient; so most clients should be aware of the header-level maxzoom to short-circuit queries beyond the maxzoom of the archive.

@DerZade thanks for your hard work on this and I apologize again for letting this PR go on for so long. I've gotten feedback from other implementers at FOSS4G and we may want to break out some implementation details like pseudocode and what @youngpm described above into a separate "implementation notes" doc, and let the spec document be only the minimum possible to describe the binary format.

Examples of other spec docs:
MBTiles
QOI

@bdon bdon closed this as completed in #172 Aug 17, 2023
bdon pushed a commit that referenced this issue Aug 17, 2023
* docs(spec): detailed rewrite of V3 specification by @DerZade [#107]

* spec bump to v3.2
* includes pseudocode and diagrams for binary layout
* adopt RFC 2119 specification language
@bdon
Copy link
Member

bdon commented Aug 17, 2023

@DerZade sorry again this took so long - with a constant stream of GitHub issues it is hard for me to find time to review larger ones like this - I have some follow up edits we can discuss in smaller issues/PRs, those I will be much better equipped to review and merge in a timely manner

bdon added a commit that referenced this issue Oct 2, 2023
* consistent spelling and capitalization of RunLength, TileID, MVT; wording [#107]

* Spec v.2 changelog; remove sections to be discussed later [#107]

* copy corrections by @DerZade [#235]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants