Media type #115
Comments
I suppose having a registered MIME type for Parquet itself (the issue you linked) would be a required first step (since a geoparquet file is technically just a parquet file). |
Yes, that's true. But on the other hand if it takes too long to wait for it in Apache, the community will just come up with their own definition. Microsoft Planetary Computer is currently using |
It would be good to establish a convention that differentiates between parquet and geoparquet. I'm happy to update the media types in the Planetary Computer as needed. I'm not familiar with the various components of a media type, but COG uses |
Not necessarily a helpful answer, but having followed the COG discussions a bit (and I'm not sure we managed to reach an endorsed conclusion), I find that understanding IANA rules for MIME types tends to require dedicated expertise. For example, https://www.rfc-editor.org/rfc/rfc6838.html#page-13 mentions "Media types MAY elect to use one or more media type parameters[...] the names, values, and meanings of any parameters MUST be fully specified when a media type is registered in the standards tree". So I guess that a provision for allowing application=geoparquet should be made at the time application/apache.parquet is registered (unless other rules just ban it or make it already possible...)
OGC contacted IANA to ask about adding parameters such as profile or application (to GeoTiff in this case) and they said that you can't simply "add" them if the original type is already registered. So you'd need to discuss that with Apache upfront or otherwise register your own, e.g. |
Moving to beta.2. I think we should also try to get in touch with the core Parquet people and see if we can help them register something, even if it's just
There was also general consensus on a recent call that we don't want a geo-specific parquet mimetype; we'd just use the general parquet one, and users would rely on the presence of geo metadata in the parquet file (or they could also guess that it'd be likely if there are shapefiles / geopackages of the same data). Happy for arguments on why we should have our own special geo one, and it sounds like the time to do so is when apache applies for theirs. But I think we don't really want geospatial systems that distribute a non-geo parquet and a geoparquet. And we also hope that eventually all parquet readers would at least know to identify the standard geo data.
The same reasons why we want a COG and GeoTiff specific media type over just using "image/tiff" also apply here. If you need to read the file anyway (partially), then you could also just omit media types completely.
I don't feel that strongly about this either way, except that an optional parameter (like
But I do think the time is 'now' to decide what we want, since as pointed out above the only real chance IANA seems to give for optional parameters is on registration. We can likely help the Apache people with the process, since OGC has experience working with IANA, and we can also just point them to the form to do a vnd registration - you just fill out https://www.iana.org/form/media-types. It does seem like there's precedent for a 'project steering committee' to submit for the official IANA types, with Apache Arrow, Thrift and Node all being submitted by the steering committees or a member on them. But @ogcscotts can likely help navigate the process / talk to the right people. So it seems like we should determine which direction we want to go, and if we want an optional parameter we should determine what we'd like, before engaging with them.
I agree that having a parameter is better than the |
As @jorisvandenbossche mentioned in the meeting today, there's currently progress on Parquet getting a MIME/Media type: https://issues.apache.org/jira/browse/PARQUET-1889 |
Great, so |
If @cholmes is right in his above comment, we'd have to register a |
Yes we would, if we go the official route. But we also never did with COG, and everyone just agreed on a de-facto standard of |
Yeah, if we want something listed in the official IANA registry then we'd need to do it. Like @m-mohr points out we can just add something on. With COG we wanted to get something registered but it basically wasn't an option. So now is the time to try to advocate for it, if we want to. But I think we were leaning away from that, as mentioned in #115 (comment). I can try to bring it up at the next meeting, but if someone feels like we should push for a 'geo' profile then it'd be good to make the case here. I can't think of a use case where it's really essential, and it seems simpler for the file itself to be the place to figure out if it's geo or not. That also avoids the risk of a file being declared geo when it actually isn't. And I don't see a case where it'd be good to have a non-geo parquet and a geoparquet version of the same file.
Same reason COGs have a profile: a client can easily detect what it is and whether it can render it without actually loading (parts of) the file. Think STAC Browser (and ol-stac, stac-layer, ...) for example... @cholmes
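As a rough illustration of what this kind of client-side detection could look like, here is a minimal Python sketch. It assumes a hypothetical `profile=geo` parameter on `application/vnd.apache.parquet`; no such parameter is registered, and the helper names `parse_media_type` and `looks_like_geoparquet` are made up for this example.

```python
PARQUET_MEDIA_TYPE = "application/vnd.apache.parquet"


def parse_media_type(value):
    """Split a media type string into (base type, parameter dict).

    Deliberately simple: it does not handle semicolons inside quoted
    parameter values, which a full RFC-compliant parser would.
    """
    base, *raw_params = [part.strip() for part in value.split(";")]
    params = {}
    for raw in raw_params:
        if not raw:
            continue
        name, _, val = raw.partition("=")
        # Parameter names are case-insensitive; strip optional quotes.
        params[name.strip().lower()] = val.strip().strip('"')
    return base.lower(), params


def looks_like_geoparquet(media_type):
    """True if the media type carries the HYPOTHETICAL profile=geo hint."""
    base, params = parse_media_type(media_type)
    return base == PARQUET_MEDIA_TYPE and params.get("profile") == "geo"
```

A browser-style client could call `looks_like_geoparquet(asset_type)` to decide whether to offer a map view before fetching any bytes. The hint is only a declaration, so the file contents still have the final word.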
If a Parquet file is served with a media type that indicates that it is GeoParquet, a client cannot blindly try to render it, for example (the same is true for COG, despite what others may believe). Before deciding what to do with the contents of a Parquet file, a client would need to read the footer - this is true for geo and non-geo Parquet. After reading the footer, you can see if it has the geo metadata. @m-mohr - can you provide more specific examples of what a client like STAC Browser would do if it knew that a Parquet file was GeoParquet? If the answer is that it would display the geo-specific metadata, then this is going to require reading the footer of the file - which you can safely do for a non-geo Parquet file as well (and you might want to do anyway to show the user something about the data). |
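To make the footer step concrete: a Parquet file ends with the Thrift-encoded file metadata, followed by a 4-byte little-endian length of that metadata and the 4-byte magic `PAR1`. The following is a minimal Python sketch, assuming a file with a plain (non-encrypted) footer and using a hypothetical helper name `footer_byte_range`, that works out which bytes a client would need to fetch after reading only the last 8 bytes.

```python
import struct
from typing import Tuple

PARQUET_MAGIC = b"PAR1"


def footer_byte_range(file_size: int, tail: bytes) -> Tuple[int, int]:
    """Return (offset, length) of the Thrift-encoded footer metadata.

    `tail` must be the last 8 bytes of the file: a 4-byte little-endian
    footer length followed by the 4-byte magic.
    """
    if len(tail) != 8 or tail[4:] != PARQUET_MAGIC:
        raise ValueError("not a Parquet file (missing trailing PAR1 magic)")
    (footer_len,) = struct.unpack("<I", tail[:4])
    offset = file_size - 8 - footer_len
    # The file also starts with the 4-byte magic, so the footer
    # cannot begin before byte 4.
    if offset < len(PARQUET_MAGIC):
        raise ValueError("footer length is larger than the file")
    return offset, footer_len
```

With the range in hand, a client can issue a second ranged read for just the footer. Decoding it, and checking its key-value metadata for the `geo` key that GeoParquet uses, needs a Thrift-aware Parquet reader and is not shown here.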
It's all about giving users the nicest user experience without loading a whole lot of headers upfront. Thinking more about it, it might be more relevant for COGs than it is for GeoParquet files though. Example: |
Looks like there's now a parquet media type, see #115. Search 'parquet' on https://www.iana.org/assignments/media-types/media-types.xhtml:
application/vnd.apache.parquet
I think they could have gotten application/parquet pretty easily, but this does seem consistent with the other apache ones.
I'm going to go ahead and make a PR without a 'geo' parquet media type - we can revisit and add it later if there is a lot of value. I do wonder if there's a 'hint' we could give in STAC, for 'show on map'. I also do think it's not crazy to try to blindly render, as most parquet in STAC will likely be geoparquet.
I'd hope that in STAC people use the https://github.com/stac-extensions/table extension. |
Hi there,
is there already an agreed media type (e.g. for usage in STAC)?
Related issue for parquet: https://issues.apache.org/jira/browse/PARQUET-1889
Maybe something like:
application/vnd.geo+apache.parquet
or application/geo+vnd.apache.parquet
?