Should we recommend EPSG:4326 or something else? #52

Closed
cholmes opened this issue Mar 25, 2022 · 48 comments

Comments

@cholmes
Member

cholmes commented Mar 25, 2022

This was discussed extensively in #25, but it feels worth revisiting. I think everyone feels good that the core recommendation is to use longitude, latitude in the WKB as the interoperability recommendation. The main question here is how we 'describe' that - use 4326 but then rely on the 'override' in our spec to put longitude first, or use something like OGC:84 that is less popular but actually describes things right.

Other points that were originally in #35:

  • Do we want to continue recommending EPSG:4326?
    • The GDAL page on this topic says: "The generic EPSG:4326 WGS 84 CRS is also considered dynamic, although it is not recommended to use it due to being based on a datum ensemble whose positional accuracy is 2 meters, but prefer one of its realizations, such as WGS 84 (G1762)"
    • For example, QGIS now warns about this, see https://twitter.com/nyalldawson/status/1390118738251317254 for some context.
@cayetanobv
Collaborator

cayetanobv commented Apr 3, 2022

This was discussed extensively in #25, but it feels worth revisiting. I think everyone feels good that the core recommendation is to use longitude, latitude in the WKB as the interoperability recommendation. The main question here is how we 'describe' that - use 4326 but then rely on the 'override' in our spec to put longitude first, or use something like OGC:84 that is less popular but actually describes things right.

Since GeoParquet follows the WKB axis order, the axis order defined by any CRS will always be overridden, so I agree with including something like what you are proposing. I wrote this similar text to include at the end of the crs section of the spec:

Note that, as indicated in [Coordinate axis order](link...), the axis order in GeoParquet is always x,y, so the EPSG:4326 (latitude, longitude) definition will be overridden. In this sense, the use of OGC:CRS84, which is a variant of EPSG:4326 that differs only in its coordinate order, could be a reasonable option despite being less popular.
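
For anyone less familiar with WKB, here is a minimal Python sketch (standard library only) of why the stored order is x,y regardless of what the CRS says. The helper names and the sample coordinates are made up for illustration; only the 2D point layout (byte order flag, geometry type, x, y) comes from the WKB format itself.

```python
import struct
from typing import Tuple


def encode_wkb_point(x: float, y: float) -> bytes:
    """Encode a 2D point as little-endian WKB: byte order, geometry type, x, y."""
    return struct.pack("<BIdd", 1, 1, x, y)


def decode_wkb_point(wkb: bytes) -> Tuple[float, float]:
    """Decode a 2D WKB point and return the values in stored order (x, y)."""
    endian = "<" if wkb[0] == 1 else ">"
    geom_type, x, y = struct.unpack(endian + "Idd", wkb[1:21])
    if geom_type != 1:
        raise ValueError("not a 2D WKB point")
    return x, y


# Hypothetical sample location: longitude goes in x, latitude in y.
lon, lat = -3.70, 40.42
wkb = encode_wkb_point(lon, lat)
x, y = decode_wkb_point(wkb)
```

A reader that trusted the EPSG:4326 axis definition would take the first stored value as latitude; the axis-order rule in the spec is what tells it not to.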

Other points that were originally in #35:

  • Do we want to continue recommending EPSG:4326?

    • The GDAL page on this topic says: "The generic EPSG:4326 WGS 84 CRS is also considered dynamic, although it is not recommended to use it due to being based on a datum ensemble whose positional accuracy is 2 meters, but prefer one of its realizations, such as WGS 84 (G1762)"
    • For example, QGIS now warns about this, see https://twitter.com/nyalldawson/status/1390118738251317254 for some context.

We can recommend EPSG:4326 for the sake of simplicity for most users (who have low accuracy requirements), but we definitely need to include a warning that it uses a datum ensemble and that this causes significant accuracy problems. I like the recommendations on how to use this ensemble CRS in one of the latest IOGP technical documents, "EPSG null and copy transformations to WGS 84" (https://www.iogp.org/bookstore/product/epsg-null-and-copy-transformations-to-wgs-84/). Take a look at the section about using ensemble datums for high accuracy applications:

"For high accuracy applications, requiring better than a few metres accuracy over the project time span:
  • A geodetic CRS with an ensemble datum whose members are dynamic, or a projected CRS based on such a geodetic CRS, should not be used.
    • Be aware that EPSG:4326 (“WGS 84”) is a geodetic CRS with an ensemble datum, as are geodetic CRSs EPSG:4979, EPSG:4978, OGC/1.3/CRS84 and OGC/1.3/CRS84h.
  • Preferably use a national geodetic CRS. If this is a dynamic CRS then, to maintain accuracy, ensure that:
    • either all data is referenced to a common coordinate epoch when loading,
    • or record the coordinate epochs with the coordinates and ensure that the application can apply a point motion operation to transform all coordinates to a common project epoch.
  • For web mapping (requiring long-term high accuracy), either a static CRS or a dynamic CRS at a single coordinate epoch should be used."

We could include something like this, but in the shorter style of the QGIS warning (we can also point to the GDAL docs, where it's explained very well).
We have a PR adding the optional "epoch" parameter, which probably needs to be mentioned in the crs section.
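
To make the "common coordinate epoch" idea a bit more concrete, here is a rough Python sketch of the simplest possible point motion operation: a constant linear velocity applied over the time difference between two epochs. The function name, the local east/north frame in metres, and the velocity values are all assumptions for illustration; real workflows would rely on a plate motion or deformation model from a geodetic library.

```python
import math
from typing import Tuple


def to_common_epoch(
    east_m: float,
    north_m: float,
    epoch: float,
    target_epoch: float,
    v_east_m_per_yr: float,
    v_north_m_per_yr: float,
) -> Tuple[float, float]:
    """Shift a position from its observation epoch to a target epoch,
    assuming a constant velocity (a deliberately simplistic motion model)."""
    dt_years = target_epoch - epoch
    return (
        east_m + v_east_m_per_yr * dt_years,
        north_m + v_north_m_per_yr * dt_years,
    )


# Hypothetical values: a point observed at epoch 2010.0 on a plate moving
# 2 cm/yr east and 5 cm/yr north, brought to a project epoch of 2022.0.
east, north = to_common_epoch(1000.0, 2000.0, 2010.0, 2022.0, 0.02, 0.05)
drift_m = math.hypot(east - 1000.0, north - 2000.0)
```

With these made-up numbers the point drifts by more than half a metre in twelve years, which is why the epoch matters for sub-metre work but not for most users.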

@alexgleith

I think that using epsg:4326 as a default is ok, with the caveats as mentioned earlier made clear.

What I think needs to be included is the ability to document the use of any CRS, so not like GeoJSON, which requires you to break the standard to use and document the use of a different CRS.

The broader question of what is a good, global CRS that is accurate and stays accurate through time is really a whole new issue for a far broader set of projects!

@jorisvandenbossche
Collaborator

What I think needs to be included is the ability to document the use of any CRS, so not like GeoJSON, which requires you to break the standard to use and document the use of a different CRS.

That is already the case, see the "crs" field in the column metadata: https://github.com/opengeospatial/geoparquet/blob/3a58590ce21ddefc1aff819534e13625a9fb969e/format-specs/geoparquet.md#column-metadata

@cayetanobv
Collaborator

The broader question of what is a good, global CRS that is accurate and stays accurate through time is really a whole new issue for a far broader set of projects!

I agree @alexgleith , that's the point. People, in general, are not yet concerned with this important issue. The problem is that it's not easy to understand it from a practical point of view (how to use the epoch with my data and what CRS should I choose). The community is starting to include notes and recommendations in libraries but the road is still long.

@edzer

edzer commented Apr 5, 2022

I would advise against recommending a CRS that defines the axis order opposite to how the axes actually are, as EPSG:4326 does: it preserves the bad habit of using a standard differently from how it was intended. OGC:CRS84 on the other hand is not an ensemble, is WGS84, and has the axis order right.

@jorisvandenbossche
Collaborator

OGC:CRS84 on the other hand is not an ensemble

I think it uses the same ensemble datum as EPSG:4326? (at least according to PROJ)

$ projinfo "OGC:CRS84"
PROJ.4 string:
+proj=longlat +datum=WGS84 +no_defs +type=crs

WKT2:2019 string:
GEOGCRS["WGS 84 (CRS84)",
    ENSEMBLE["World Geodetic System 1984 ensemble",
        MEMBER["World Geodetic System 1984 (Transit)"],
        MEMBER["World Geodetic System 1984 (G730)"],
        MEMBER["World Geodetic System 1984 (G873)"],
        MEMBER["World Geodetic System 1984 (G1150)"],
        MEMBER["World Geodetic System 1984 (G1674)"],
        MEMBER["World Geodetic System 1984 (G1762)"],
        MEMBER["World Geodetic System 1984 (G2139)"],
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]],
        ENSEMBLEACCURACY[2.0]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    USAGE[
        SCOPE["Not known."],
        AREA["World."],
        BBOX[-90,-180,90,180]],
    ID["OGC","CRS84"]]
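
As a quick way to check the axis order in output like the above without a full WKT parser, a naive regex can pull out the AXIS directions. This is only a sketch: it assumes the AXIS entries appear in ORDER sequence, the function names are made up, and the two WKT fragments are abbreviated from the full definitions.

```python
import re
from typing import List

# Matches AXIS["name",direction ... with either [ or ( as the delimiter.
AXIS_PATTERN = re.compile(r'AXIS\s*[\[(]\s*"[^"]*"\s*,\s*(\w+)', re.IGNORECASE)


def axis_directions(wkt: str) -> List[str]:
    """Return axis directions in textual order (ORDER[] itself is not checked)."""
    return [direction.lower() for direction in AXIS_PATTERN.findall(wkt)]


def is_lon_lat_order(wkt: str) -> bool:
    """True when the first axis points east and the second points north."""
    return axis_directions(wkt) == ["east", "north"]


# Abbreviated fragments of the coordinate system part of each definition.
CRS84_CS = '''CS[ellipsoidal,2],
    AXIS["geodetic longitude (Lon)",east,
        ORDER[1]],
    AXIS["geodetic latitude (Lat)",north,
        ORDER[2]]'''

EPSG4326_CS = '''CS[ellipsoidal,2],
    AXIS["geodetic latitude (Lat)",north,
        ORDER[1]],
    AXIS["geodetic longitude (Lon)",east,
        ORDER[2]]'''
```

Here `is_lon_lat_order(CRS84_CS)` is true and `is_lon_lat_order(EPSG4326_CS)` is false, which is the only difference between the two definitions that matters for this discussion.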

@edzer

edzer commented Apr 5, 2022

You are right. I was looking at the output of OGRSpatialReference::exportToWkt(..., "FORMAT=WKT2"), which seems to skip the ensemble info given by projinfo, but gives

GEOGCRS["WGS 84",
    DATUM["World Geodetic System 1984",
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]],
        ID["EPSG",6326]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433],
        ID["EPSG",8901]],
    CS[ellipsoidal,2],
        AXIS["longitude",east,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433,
                ID["EPSG",9122]]],
        AXIS["latitude",north,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433,
                ID["EPSG",9122]]]]

for PROJ 8.2.0 and 9.0.0. Strangely enough, it does give the ensemble for EPSG:4326; I updated my comment.

@cayetanobv
Collaborator

I would advise against recommending a CRS that defines the axis order opposite to how the axes actually are, as EPSG:4326 does: it preserves the bad habit of using a standard differently from how it was intended. OGC:CRS84 on the other hand is not an ensemble, is WGS84, and has the axis order right.

Hi @edzer , for this reason, we are proposing a text clarifying it and including both CRS. See my comment here: #52 (comment)

@tschaub
Collaborator

tschaub commented Apr 5, 2022

As long as the CRS is required, it seems like the recommended value might have the following impact

  1. limit confusion about axis order
GEOGCRS["WGS 84 (CRS84)",
    ENSEMBLE["World Geodetic System 1984 ensemble",
        MEMBER["World Geodetic System 1984 (Transit)"],
        MEMBER["World Geodetic System 1984 (G730)"],
        MEMBER["World Geodetic System 1984 (G873)"],
        MEMBER["World Geodetic System 1984 (G1150)"],
        MEMBER["World Geodetic System 1984 (G1674)"],
        MEMBER["World Geodetic System 1984 (G1762)"],
        MEMBER["World Geodetic System 1984 (G2139)"],
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]],
        ENSEMBLEACCURACY[2.0]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    USAGE[
        SCOPE["Not known."],
        AREA["World."],
        BBOX[-90,-180,90,180]],
    ID["OGC","CRS84"]]
  2. be more familiar
GEOGCRS["WGS 84",
    ENSEMBLE["World Geodetic System 1984 ensemble",
        MEMBER["World Geodetic System 1984 (Transit)"],
        MEMBER["World Geodetic System 1984 (G730)"],
        MEMBER["World Geodetic System 1984 (G873)"],
        MEMBER["World Geodetic System 1984 (G1150)"],
        MEMBER["World Geodetic System 1984 (G1674)"],
        MEMBER["World Geodetic System 1984 (G1762)"],
        MEMBER["World Geodetic System 1984 (G2139)"],
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]],
        ENSEMBLEACCURACY[2.0]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    USAGE[
        SCOPE["Horizontal component of 3D system."],
        AREA["World."],
        BBOX[-90,-180,90,180]],
    ID["EPSG",4326]]
  3. avoid ensemble datum
GEOGCRS["WGS 84 (G1762)",
    DYNAMIC[
        FRAMEEPOCH[2005]],
    DATUM["World Geodetic System 1984 (G1762)",
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    USAGE[
        SCOPE["Geodesy. Navigation and positioning using GPS satellite system."],
        AREA["World."],
        BBOX[-90,-180,90,180]],
    ID["EPSG",9057]]

As long as CRS is mandatory, it doesn't strike me as a big interoperability hit to use the WKT for OGC:CRS84 over EPSG:4326 (for example). Requiring CRS essentially requires that clients (etc.) be capable of working with an arbitrary CRS.

GeoJSON chose to go with a single CRS in hopes of increasing interoperability. I think it is fair to say that this worked (at the expense of flexibility). Perhaps a balance of interoperability and flexibility could be achieved by saying that the CRS is not mandatory, and if absent, the data is in OGC:CRS84 (or some other). The risk here I guess is that you pick a default that turns out to be wrong in a couple years (solved by issuing a new version of the spec). But I'm gathering that making CRS mandatory has already been decided.

@cholmes
Member Author

cholmes commented Apr 6, 2022

GeoJSON chose to go with a single CRS in hopes of increasing interoperability. I think it is fair to say that this worked (at the expense of flexibility). Perhaps a balance of interoperability and flexibility could be achieved by saying that the CRS is not mandatory, and if absent, the data is in OGC:CRS84 (or some other). The risk here I guess is that you pick a default that turns out to be wrong in a couple years (solved by issuing a new version of the spec). But I'm gathering that making CRS mandatory has already been decided.

I wouldn't say that making CRS mandatory has already been decided. I was the one who proposed it, but I was really hoping we'd be able to tell people who don't care about CRS that they could just put some 'XXXX' string in there for long/lat and everything would be fine. Turns out I had a lot more to learn about the current state of CRSs, and there's no easy answer for what 'XXXX' should be.

So I've recently been leaning towards what you're suggesting: that we say if there is no CRS then your data should be in long/lat. You can leave out the CRS, and we advise that leaving it out implies CRS=YYYY, and that any implementation that is CRS aware should just use that YYYY.

This seems to be what GeoJSON did - I had looked at it to see what CRS definition they used, but they didn't use one - it just said it's long/lat.

I'm leaning towards YYYY being the WKT2 of OGC:CRS84. We could make it EPSG:4326, with the caveat that we're overriding it to be long/lat, but that still just seems a bit weird for me - seems like the CRS should actually describe what is in there.

It's then probably still an open question whether we include the coordinate axis order section that requires '(x, y) where x is easting or longitude and y is northing or latitude.' From my latest learnings I lean towards 'yes' - instead of forcing libraries to look up the axis order in the CRS, we say that they can expect x,y/easting,northing, and that the crs information is used for projection but not for defining the order of things. That's probably worth its own issue.

I can try to take a crack at a PR that does the above - may make it easier to look at concrete text changes instead of talking in the abstract.

@jorisvandenbossche
Collaborator

So I've recently been leaning towards what you're suggesting: that we say if there is no CRS then your data should be in long/lat. You can leave out the CRS, and we advise that leaving it out implies CRS=YYYY, and that any implementation that is CRS aware should just use that YYYY.

I am not fully sure how it helps to make this less explicit (I am personally fine with making crs an optional field, or a required field with a value that can be null, but I would then not attach any meaning to "not defined", and would strongly recommend always defining the CRS).

If we say that leaving it out implies YYYY, why not simply have libraries specify YYYY in the metadata? Is it so that people don't need to be able to write (and recognize) the WKT of YYYY?
(But then this is more an argument for the discussion of "is WKT a good format", cf. #50, rather than for the actual recommendation of which CRS to use.)

I'm leaning towards YYYY being the WKT2 of OGC:CRS84. We could make it EPSG:4326, with the caveat that we're overriding it to be long/lat, but that still just seems a bit weird for me - seems like the CRS should actually describe what is in there.

Personally, I don't have a strong opinion about OGC:CRS84 vs EPSG:4326, but I also don't think it matters that much: they are both based on the same (ensemble) datum, and given that we currently override the coordinate axis order anyway, both are essentially equivalent for our purpose.
For me the question is more whether we actually want to recommend the WGS84 ensemble datum at all, or rather "one of its realizations" (as GDAL puts it at https://gdal.org/user/coordinate_epoch.html#dynamic-crs-and-coordinate-epoch)

I think it is also an option to not recommend a specific CRS, and only strongly recommend that you specify a CRS. Or have a more elaborate recommendation (that gives some general guidelines instead of recommending one specific CRS) like the example given above at #52 (comment) by @cayetanobv

@edzer

edzer commented Apr 6, 2022

I think that for a user (or software writer) it is most useful if a file at least reveals whether coordinates are geographic or Cartesian, because it matters for computing distances, buffers, finding intersections and so on. If it is not there the software will have to make an assumption, and may make the wrong one.
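
A small Python sketch of why the assumption matters for distances. The same pair of coordinate tuples gives very different answers depending on whether they are treated as planar x,y or as longitude/latitude on a sphere. The function names are made up, and the spherical (haversine) formula with a mean Earth radius is an approximation, not an ellipsoidal computation.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; spherical approximation


def planar_distance(x1: float, y1: float, x2: float, y2: float) -> float:
    """Euclidean distance, in whatever units the coordinates use."""
    return math.hypot(x2 - x1, y2 - y1)


def great_circle_distance_m(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Haversine distance in metres between two lon/lat points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# One degree of longitude apart, at the equator and at 60 degrees north.
planar = planar_distance(0.0, 60.0, 1.0, 60.0)
at_equator_m = great_circle_distance_m(0.0, 0.0, 1.0, 0.0)
at_60n_m = great_circle_distance_m(0.0, 60.0, 1.0, 60.0)
```

Treated as Cartesian, both pairs are "1 unit" apart. Treated as geographic, the pair at 60°N is about half as far apart as the pair at the equator, so software that guesses wrong gets buffers and distances that are off by a latitude-dependent factor.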

It might make sense to look at what GPKG does: when writing a GPKG without specifying the CRS, a default CRS corresponding to

GEOGCRS["Undefined geographic SRS",
    DATUM["unknown",
        ELLIPSOID["unknown",6378137,298.257223563,
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433,
            ID["EPSG",9122]]],
    CS[ellipsoidal,2],
        AXIS["latitude",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433,
                ID["EPSG",9122]]],
        AXIS["longitude",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433,
                ID["EPSG",9122]]]]

is added (having the axis order we don't want). For that reason, when writing an object without CRS in R with sf::st_write, a message is given to the user that the non-default,

LOCAL_CS["Undefined Cartesian SRS"]

is being substituted, as the (historically grown) assumption is that data with missing CRS imply some Cartesian CRS.

Assigning an "Undefined geographic SRS" when writing data that has no CRS specified has the advantages that (i) there is no doubt about coordinates being geographic or not when reading the data, (ii) software writers know they need to do something when writing data with Cartesian coordinates, and (iii) it doesn't assign/assume a potentially wrong datum.

A simple boolean metadata flag, indicating coordinates are geographic or Cartesian would of course reach the same goals, and avoid the need to parse a CRS in WKT2 form.

@hobu

hobu commented Apr 6, 2022

This seems to be what GeoJSON did - I had looked at it to see what CRS definition they used, but they didn't use one - it just said it's long/lat.

Yes. The reason for this was to do as much as possible to eliminate the implementation burden of CRS on what we perceived to be as the dominant audience of consumers – web ones using GeoJSON as the wire format. Responsibility for consuming, interpreting, and reflecting all of the possible CRSs is a big lift for geospatial folks who care about it. For web people who really don't, they are apt to do their best to ignore what they can, and make up stuff when they can't. GeoJSON said, "ok, fine, we'll do the naive approach too", and in exchange for that, interoperability of the standard was very high because it could be implemented in your own software in an afternoon using copypasta from the specification itself.

I don't think GeoParquet's audience is the same as GeoJSON's, and if the intended dominant use of GeoParquet is as a memory layout and serialization format instead of a wire format, CRS interoperability is really important. @edzer's point about the specification providing a responsibility gradation by denoting cartesian or geographic coordinates is a good one, and it probably meets the needs of many software implementations for their interpretation of the data when they are using it, but a full CRS definition is still needed to specifically define where/when the data are if GeoParquet is to sit as a blob for a decade.

I kind of like GPKG's solution, but I don't have a recommendation for a default CRS. One thing I think the specification should provide is clear consensus on whether or not software implementations are responsible for interpreting and consuming any possible CRS, and the definition of those CRSs should be in only a single format. ASPRS LAS, for example, provides backward compatibility using both WKT and GeoTIFF keys, and because the expressibility of each is not equivalent, the software implementation gets to figure out what to do. If you pick WKTv2, make a statement about supporting all possible future WKT specifications, and how to do it.

@jorisvandenbossche
Collaborator

I kind of like GPKG's solution, but I don't have a recommendation for a default CRS

Thanks a lot for the input, Howard. One follow-up question: can you go into a bit more detail about what the "GPKG solution" is?

@cholmes
Member Author

cholmes commented Apr 6, 2022

I think that for a user (or software writer) it is most useful if a file at least reveals whether coordinates are geographic or Cartesian, because it matters for computing distances, buffers, finding intersections and so on. If it is not there the software will have to make an assumption, and may make the wrong one.

Does https://github.com/opengeospatial/geoparquet/blob/main/format-specs/geoparquet.md#edges handle this? Or is this something slightly different?

@jorisvandenbossche
Collaborator

Does https://github.com/opengeospatial/geoparquet/blob/main/format-specs/geoparquet.md#edges handle this? Or is this something slightly different?

That is slightly different: we explicitly included this field because there is not necessarily a 1:1 mapping between geographic vs. Cartesian coordinates and spherical vs. planar edges. There is a lot of data that uses geographic coordinates but is not valid (or fixed) when interpreted with spherical edges (one good example is GeoJSON data, whose spec explicitly says it uses straight edges).

@edzer

edzer commented Apr 6, 2022

Does https://github.com/opengeospatial/geoparquet/blob/main/format-specs/geoparquet.md#edges handle this? Or is this something slightly different?

It's related, but slightly different: if you'd convert a GeoJSON to a geoparquet file, it would have geographical coordinates but (according to the GeoJSON specs) assume edges to be planar (straight in a flat, 2D, Cartesian space).
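
To show how much the edge interpretation can matter, here is a Python sketch comparing the midpoint of one edge under both readings. The function names and the sample edge are made up, and the sphere is an approximation of the ellipsoid; the great-circle midpoint is computed by averaging unit vectors, which breaks down for antipodal points.

```python
import math
from typing import Tuple

LonLat = Tuple[float, float]


def planar_midpoint(p: LonLat, q: LonLat) -> LonLat:
    """Midpoint of a straight edge in flat lon/lat space (the GeoJSON reading)."""
    return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)


def _to_unit_vector(lon: float, lat: float) -> Tuple[float, float, float]:
    lam, phi = math.radians(lon), math.radians(lat)
    return (math.cos(phi) * math.cos(lam), math.cos(phi) * math.sin(lam), math.sin(phi))


def spherical_midpoint(p: LonLat, q: LonLat) -> LonLat:
    """Midpoint of the great-circle edge between two lon/lat points."""
    ax, ay, az = _to_unit_vector(*p)
    bx, by, bz = _to_unit_vector(*q)
    x, y, z = ax + bx, ay + by, az + bz
    lon = math.degrees(math.atan2(y, x))
    lat = math.degrees(math.atan2(z, math.hypot(x, y)))
    return (lon, lat)


# A long edge along the 50th parallel, from 0 to 90 degrees east.
start, end = (0.0, 50.0), (90.0, 50.0)
flat_mid = planar_midpoint(start, end)
sphere_mid = spherical_midpoint(start, end)
```

The planar midpoint stays at 50°N, while the great-circle midpoint bulges to about 59.3°N. Same coordinates, same CRS, different geometry, which is why the edges field is separate from the geographic/Cartesian question.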

@tschaub
Collaborator

tschaub commented Apr 6, 2022

I think there is a significant difference between saying CRS is optional (if not present, the CRS is YYYY) and CRS is mandatory (and you are strongly encouraged to use YYYY).

If the CRS field is optional (and the spec describes the default), CRS-naive applications can happily work with data that doesn't have a CRS field. They can parse it, plot it, transform it, etc.

If the CRS field is required, all applications must at least be able to answer the question equal(data.crs, YYYY) (where YYYY represents the default or any other CRS). This means every application that consumes geoparquet must be able to parse WKT2_2019 and compare two CRS for equality.

This may not be as trivial as it sounds. For example, here are two equivalent CRS represented with WKT2_2019:

GEOGCRS["WGS 84 (CRS84)",
    ENSEMBLE["World Geodetic System 1984 ensemble",
        MEMBER["World Geodetic System 1984 (Transit)"],
        MEMBER["World Geodetic System 1984 (G730)"],
        MEMBER["World Geodetic System 1984 (G873)"],
        MEMBER["World Geodetic System 1984 (G1150)"],
        MEMBER["World Geodetic System 1984 (G1674)"],
        MEMBER["World Geodetic System 1984 (G1762)"],
        MEMBER["World Geodetic System 1984 (G2139)"],
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]],
        ENSEMBLEACCURACY[2.0]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    USAGE[
        SCOPE["Not known."],
        AREA["World."],
        BBOX[-90,-180,90,180]],
    ID["OGC","CRS84"]]

and

GeogCRS("WGS 84 (CRS84)", Ensemble["World Geodetic System 1984 ensemble", Member("World Geodetic System 1984 (Transit)"), Member("World Geodetic System 1984 (G730)"), Member("World Geodetic System 1984 (G873)"), Member("World Geodetic System 1984 (G1150)"), Member("World Geodetic System 1984 (G1674)"), Member("World Geodetic System 1984 (G1762)"), Member("World Geodetic System 1984 (G2139)"), Ellipsoid("WGS 84", 6378137.0, 298.257223563, LengthUnit("metre", 1)), EnsembleAccuracy(2.0)], PrimeM("Greenwich", 0, AngleUnit("degree", 0.0174532925199433)), CS(ellipsoidal, 2), Axis("geodetic longitude (Lon)", east, Order(1), AngleUnit("degree", 0.0174532925199433)), Axis("geodetic latitude (Lat)", north, Order(2), AngleUnit("degree", 0.0174532925199433)))

Keywords are case-insensitive. Space outside of quoted strings is insignificant. Delimiters can be [ or ( and ] or ). Certain keywords (like USAGE and ID) are optional. Anyone reading the spec will likely reach for proj or something else to parse and compare equality. If CRS is mandatory, this essentially means working with geoparquet adds a dependency on proj, even if you just want to know if data is in the recommended CRS (and even proj trips up if the delimiters following the ensemble keyword are parens instead of brackets - proper wkt parsing is hard).
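
To give a feel for the minimum work involved, here is a Python sketch of a normalizer that handles just the rules listed above: keyword case, whitespace outside quotes, both delimiter styles, number formatting, and dropping optional nodes. All names are hypothetical. It is a textual comparison only: it does not check that delimiters are paired correctly, has no handling for unbalanced input, and knows nothing about semantic equivalence between differently written definitions.

```python
import re

# One token: a quoted string, a delimiter, or a bare word (keyword, number, enum).
TOKEN = re.compile(r'\s*(?:"((?:[^"]|"")*)"|([\[\]\(\),])|([^\s\[\]\(\),"]+))')
# Nodes treated as optional metadata and dropped before comparing.
OPTIONAL_KEYWORDS = {"USAGE", "ID", "REMARK"}


def tokenize(wkt):
    """Split WKT text into (kind, text) tokens, ignoring space outside quotes."""
    wkt = wkt.strip()
    tokens, pos = [], 0
    while pos < len(wkt):
        match = TOKEN.match(wkt, pos)
        if match is None:
            raise ValueError(f"cannot tokenize at offset {pos}")
        quoted, punct, word = match.groups()
        if quoted is not None:
            tokens.append(("str", quoted))
        elif punct is not None:
            tokens.append(("punct", punct))
        else:
            tokens.append(("word", word))
        pos = match.end()
    return tokens


def _is_punct(token, chars):
    return token[0] == "punct" and token[1] in chars


def parse_value(tokens, i=0):
    """Parse one value at tokens[i]; return (value, index just after it)."""
    kind, text = tokens[i]
    if kind == "str":
        return ("str", text), i + 1
    if kind != "word":
        raise ValueError(f"unexpected delimiter {text!r}")
    if i + 1 < len(tokens) and _is_punct(tokens[i + 1], "[("):
        children, i = [], i + 2
        while not _is_punct(tokens[i], "])"):
            if _is_punct(tokens[i], ","):
                i += 1
                continue
            child, i = parse_value(tokens, i)
            if not (isinstance(child, tuple) and child[0] in OPTIONAL_KEYWORDS):
                children.append(child)
        return (text.upper(), tuple(children)), i + 1
    try:
        return float(text), i + 1  # 6378137 and 6378137.0 compare equal
    except ValueError:
        return text.lower(), i + 1  # enumerations such as east or ellipsoidal


def normalize(wkt):
    """Return a comparable tree for a WKT string."""
    tree, _ = parse_value(tokenize(wkt))
    return tree


def same_wkt_text(a, b):
    """True when two WKT strings differ only in the cosmetic ways handled above."""
    return normalize(a) == normalize(b)
```

Even this toy version is fifty-odd lines, and it still says nothing about whether two differently written definitions describe the same CRS.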

If, on the other hand, CRS is not mandatory, consumers who encounter data without a CRS field can work with it (knowing what CRS it is in because the spec describes it).
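
A sketch of what that could look like for a CRS-naive consumer, in Python. The metadata layout (a JSON object with a "columns" mapping and a per-column "crs" key) follows the column metadata in the spec, but the function name, the sample columns, and the placeholder WKT string are hypothetical.

```python
import json

# Label only; a CRS-aware reader would substitute the full WKT2 definition.
DEFAULT_CRS = "OGC:CRS84"


def crs_naive_can_use(column_meta: dict) -> bool:
    """A reader with no CRS library can only rely on columns that omit "crs",
    because only then does the spec tell it what the coordinates mean."""
    return "crs" not in column_meta


# Hypothetical file metadata with one default-CRS column and one explicit CRS.
geo_metadata = json.loads("""
{
  "columns": {
    "geometry": {"encoding": "WKB"},
    "geom_projected": {"encoding": "WKB", "crs": "PROJCRS[placeholder]"}
  }
}
""")

usable = sorted(
    name for name, meta in geo_metadata["columns"].items() if crs_naive_can_use(meta)
)
```

The naive reader never touches WKT: it takes the columns with no crs field as long/lat and skips or rejects the rest.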

@alasarr
Collaborator

alasarr commented Apr 11, 2022

Thanks a lot everybody for all the comments. One thing that is clear is that this is not obvious, given that we're having multiple discussions about it.

Reading all the comments, I think the approach that better satisfies all of our concerns will be making CRS optional and setting the default to OGC:84 when it's not specified.

I'm going to open a draft PR with this today, so we can see if it's clearer that way.

@cayetanobv
Collaborator

cayetanobv commented Apr 11, 2022

Reading all the comments, I think the approach that better satisfies all of our concerns will be making CRS optional and setting the default to OGC:84 when it's not specified.

The complete name (including authority) of the crs is OGC:CRS84. I've included comments in PR #60

@cholmes
Member Author

cholmes commented Apr 11, 2022

I'm +1 on CRS optional, assuming long/lat as the default, and pointing CRS-aware readers at the OGC:CRS84 definition for the default.

@Jesus89
Collaborator

Jesus89 commented Apr 19, 2022

PR merged. I think we can close this one

@jorisvandenbossche
Collaborator

As I mentioned above, I don't have a strong opinion on the change from EPSG:4326 to OGC:CRS84 as the default / recommended CRS. But, for background, I wanted to share how this will probably work in practice in GeoPandas.
Part of the reason I don't care that much is that it probably won't change much in practice. The GeoPandas function to write GeoParquet will honor the CRS of the GeoDataFrame (which the user has set themselves, or which came from reading another file). So unless the user has data using OGC:CRS84, we will not write a GeoParquet file with that CRS. And in practice, most people will use EPSG:4326 for lat/lon (for better or worse), because that is what our docs say to use, that is what the internet says to use, ...

There is one other case: when the GeoDataFrame has no CRS information. But in that case too, I don't think GeoPandas should write OGC:CRS84 as the crs, because we have no basis to assume that this is correct for that data (it can be anything, because GeoPandas supports coordinates with no crs specified).
And this is where the crs field becoming optional is a bit annoying. Because it is now optional, we have to write some crs to prevent other readers from incorrectly interpreting the data as OGC:CRS84 (which we don't know to be correct). One solution would be to use some "Undefined Cartesian CRS" as mentioned by @edzer above (sf does that when writing to GPKG). Another solution would be to raise an error and disallow writing to GeoParquet without a CRS.

In summary, from GeoPandas point of view, it could be useful to be able to specify that the CRS is "unknown" (in the original geopandas version of this spec, we allowed a null value for the crs field for this case).
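
To illustrate the options, a hypothetical writer-side helper in Python could look like the following. This is not GeoPandas code: the function name, the on_missing parameter, and the use of an explicit null to mean "unknown" are assumptions for the sake of discussion.

```python
from typing import Optional


def column_metadata_for_write(crs_wkt: Optional[str], on_missing: str = "null") -> dict:
    """Build column metadata without ever falling back to an implied default.

    on_missing="null"  writes an explicit null so readers see "unknown".
    on_missing="error" refuses to write data whose CRS is not known.
    """
    meta = {"encoding": "WKB"}
    if crs_wkt is not None:
        meta["crs"] = crs_wkt  # honor whatever the user's data says
    elif on_missing == "null":
        meta["crs"] = None  # explicit "unknown", never omitted
    elif on_missing == "error":
        raise ValueError("refusing to write geometry without a CRS")
    else:
        raise ValueError("on_missing must be 'null' or 'error'")
    return meta


unknown_meta = column_metadata_for_write(None)
```

Either way the writer never omits the key for data with no CRS, so a reader cannot mistake "we don't know" for "this is OGC:CRS84".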

@brendan-ward
Contributor

Personally, I'm in favor of null (aka CRS is unknown) as a valid option and also the default when not provided by the writer, as we had in the original specification. We should absolutely make a strong recommendation to set it and how to set it when it is known, but it should be neither mandatory nor implied via a non-null default.

Given arbitrary input data that is missing a CRS, we cannot claim to know what CRS it should be (without other info or guessing) nor should we imply what it is via defaulting to OGC:CRS84. null is a very easy thing for readers to check, and decide what they want to do about it (reject it outright, make their own assumptions - stated or otherwise, etc).

It seems unnecessarily burdensome on writers - and also not particularly helpful to readers, that would then need to parse WKT2 - to instead translate an unknown CRS to WKT (e.g., GEOGCRS["Undefined geographic SRS",, etc). Instead, null provides a very clear signal that it is simply unknown; do with it as you will.

If you are a toolkit that knows you are working entirely with OGC:CRS84 data (from whatever inputs), then it is very easy to hard-code the correct WKT and always write that; no WKT parsing required. Likewise, if you are a toolkit that simply consumes, transforms (except CRS), and writes GeoParquet data, simply write back out the same CRS as the input; also no WKT parsing required.

But if you are a toolkit where the CRS of the inputs matters, there simply is no avoiding the need to parse WKT - unless you control all your inputs, and thus choose not to care. Given that GeoParquet doesn't have a single mandatory CRS (and shouldn't!), I don't see another way. It's a known issue that WKT is not friendly to parse, but that is not for the spec to solve via implied defaults; that feels dangerous for the reasons stated in comments above.

If adding null as a valid option and default is not acceptable, then indeed all writers will need to come up with an approach for representing unknown CRS, and they may do so differently - so now we have N variations of WKT that have to be checked to indicate simply that the CRS is not known. Doesn't seem like that helps readers either.

@tschaub
Collaborator

tschaub commented Apr 27, 2022

The purpose of having a default if crs is not provided is to relieve the burden on consumers (who might be able to work with the default CRS but not necessarily parse all possible CRS WKT to distinguish the default from other).

This is not in conflict with making it possible for a provider to say that the CRS is unknown. Though I'm not sure if it will be difficult for consumers to distinguish between null and an absent crs. If that is the case, it could make sense to come up with a different way to indicate that the CRS is not known.
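
For what it's worth, telling the two cases apart looks cheap once the metadata is parsed as JSON. A minimal sketch, assuming the column metadata arrives as a plain dict; `crs_state` is a hypothetical helper name, and the `"GEOGCRS[...]"` value is just a placeholder string, not real WKT:

```python
import json


def crs_state(column_meta: dict) -> str:
    """Classify the crs member of a parsed column-metadata dict.

    Hypothetical helper for illustration; not part of any GeoParquet library.
    """
    if "crs" not in column_meta:
        return "absent"  # key missing entirely
    if column_meta["crs"] is None:
        return "null"    # key present, JSON null
    return "set"         # key present with some CRS value


absent = json.loads('{"encoding": "WKB"}')
explicit_null = json.loads('{"encoding": "WKB", "crs": null}')
with_crs = json.loads('{"encoding": "WKB", "crs": "GEOGCRS[...]"}')
```

Whether a given toolkit preserves that distinction all the way through its own metadata layer is a separate question that this sketch does not answer.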

@brendan-ward
Contributor

who might be able to work with the default CRS but not necessarily parse all possible CRS WKT to distinguish the default from other

I understand the goal, but I think this only works where there is also only one supported CRS for GeoParquet; you can ignore CRS because there can be no variation in CRS.

But the common case in practice is likely to be a wide variation in CRS provided via WKT, including from writers that choose to be explicit (based on what they know about the data) by writing some variant of OGC:CRS84 / EPSG:4326 / similar, which the reader would then have to parse to know if it is equivalent enough to OGC:CRS84. This only gets easier for readers if all writers deliberately omit CRS whenever the data are in OGC:CRS84, which is to say, please write CRS except if it is OGC:CRS84.

If we are encouraging writers to always include CRS as WKT - which we should - then the cases where the implied default of OGC:CRS84 is used should be small, right?

How is null different from absent crs? I was suggesting that those become equivalent; they shouldn't diverge (i.e., it would be bad to have a null option but a default-if-absent of OGC:CRS84).

I think the convenience case for readers is only where:

  • your use is not impacted by CRS; it can be whatever and does not pose a problem for your analysis
  • you control your inputs a priori so that CRS is known and fixed, no matter how you got there. Maybe this is a pre-processing step, maybe this is by only ever allowing inputs that are pre-standardized to a single CRS, etc.

Otherwise, in an ecosystem of mixed CRS, where you consume what you do not produce, you have to deal with complexities of CRS, right?

@tschaub
Collaborator

tschaub commented Apr 27, 2022

which is to say, please write CRS except if it is OGC:CRS84

Yes, this would make the world a nicer place.

@edzer

edzer commented Apr 27, 2022

which is to say, please write CRS except if it is OGC:CRS84
Yes, this would make the world a nicer place.

It takes away the ambiguity for missing CRS about whether coordinates are geodetic or Cartesian. GPKG left the datum unspecified, but removing that ambiguity is good IMO.

@cholmes
Member Author

cholmes commented Apr 27, 2022

The GeoPandas function to write GeoParquet will honor the CRS of the GeoDataFrame (that the user has set themselves, or from reading from another file). So unless the user has data using OGC:CRS84, we will not write a GeoParquet file with that CRS. And in practice, most people will use EPSG:4326 for lat/lon (for better or worse), because that is what our docs say to use, that is what the internet says to use.

If it's EPSG:4326, do you write that out as long/lat? Or lat/long? Like in the WKB? It seems to me that if you are writing it out as long/lat then the ideal would be to just leave off the CRS and use the default in geoparquet.

I do think our goal should be to have as much data as possible in the default long/lat. And I agree it feels like a step back if most writers just take EPSG:4326 data and explicitly write out that CRS.

Like could we make the recommendation to 'please write CRS except if it is OGC:CRS84', and make sure the early implementations do that?

@brendan-ward
Contributor

I'll preface this by saying I think some of the challenge (and tension) we're having here is between trying to standardize the metadata that goes along with the data and transport of the data (i.e., structure w/in parquet file) versus standardizing the geospatial data being transported. The spec emerged from the former, and while the latter is noble, it's also a tough sell given the wide variability of geo data that can be transported via WKB (for now; arrow spec later) and variability of data we're trying to shove into this format. Thus the concerns about gatekeeping what is allowed into the format (i.e., data representations that are not valid according to spec) and the implementation or performance impact of standardizing the data and not just the transport.

could we make the recommendation to 'please write CRS except if it is OGC:CRS84', and make sure the early implementations do that?

This seems counterproductive to good metadata, right? It seems like the best practice is for writers to document what they know, but document it in a well-defined manner. Thus it seems reasonable that the spec would prescribe how you document CRS information. Further, if future versions of the spec relax that default, or want to switch to a different default, now readers and writers have to do more strict version checking in order to do the right thing.

Re: ambiguity of geodetic vs cartesian. In the case outlined above where a dataset simply has no CRS defined, a toolkit still cannot automatically with 100% confidence determine which of those two it is, which means that emitting a WKT with unknown cartesian or unknown geodetic is still potentially not correct. Which then means a 3rd unknown variant, or that the writer simply should prevent writing a dataset with an unknown CRS. Such gatekeeping maybe is OK if the strict goal is for interoperability, but it seems possibly counterproductive for internal use where CRS is irrelevant.

From the reader perspective, readers are free to reject outright a dataset where crs is missing (if default is null) or crs=null, if it would be problematic to support those. They can also read the data, and then it is the responsibility of the analyst to overcome that missing information; this is what I'd have to do as an analyst within a GeoPandas workflow. But it is the user not the tool that determines what to do with data that have no defined CRS.

Another consideration is round-tripping the data through GeoParquet. If we make the assumption that the writer can produce WKT that accurately matches the data, then it is reasonable to assume that we can write to GeoParquet, read from GeoParquet, and assert that the CRS is the same. If a dataset has no CRS set, and we backfill that with WKT that states the CRS is unknown (geopackage example), when we read that we now have a non-empty CRS and can no longer assert (without additional processing and / or assumptions) that the CRS that was read matches the CRS that was written. Instead, if we allow unset (=null) or null, we can roundtrip the CRS accurately; it is unset on both write and read.
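
To illustrate the round-trip point, here is a rough sketch that stands in for the real file I/O with a JSON string. It assumes null and missing are treated the same on read, as suggested above; `write_column_meta`, `read_crs`, and the `PLACEHOLDER` string are all hypothetical and only for illustration:

```python
import json


def write_column_meta(crs):
    """Serialize column metadata; crs may be a CRS string or None (unknown)."""
    return json.dumps({"encoding": "WKB", "crs": crs})


def read_crs(raw: str):
    """Return the crs member, or None when it is null or missing."""
    return json.loads(raw).get("crs")


# An unset CRS survives the round trip unchanged.
unknown_roundtrip = read_crs(write_column_meta(None))

# Backfilling a placeholder does not: the reader gets a non-empty value
# that no longer matches what the dataset started with (None).
PLACEHOLDER = "<some 'unknown CRS' WKT>"  # stand-in string, not real WKT
backfilled_roundtrip = read_crs(write_column_meta(PLACEHOLDER))
```

With the placeholder approach, a reader would need extra logic to map the placeholder back to "unset" before the two ends compare equal.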

Likewise, if the CRS can be taken to be close enough to OGC:CRS84 and thus omitted on write and repopulated from OGC:CRS84 on read, it seems possible in theory (untested!) that it may not be identical to the WKT of the CRS that would have been written - so we can't easily assert that the CRS matches between what was written and what was read. I haven't tested this, so take it with a grain of salt.

I do think our goal should be to have as much data as possible in the default long/lat

This seems like it would be a good recommendation rather than an element of the spec. E.g., for better interoperability, we recommend that your coordinates are in OGC:CRS84 and use orientation="counterclockwise", etc. Then users and toolkits can opt-in to the requisite coordinate transformations, or at minimum simply document their existing CRS. Just like someone now could reproject their data from an obscure / isolated CRS to OGC:CRS84 for posting to a public data store, in order to increase portability. The spec doesn't require that they do so, but we encourage it, and we give them a mechanism to document that they did so (crs field).

it feels like a step back if most writers just take EPSG:4326 data and explicitly write out that CRS

But in this case, if the writer knows with 100% accuracy the CRS, it is very easy to simply hardcode the CRS into the writer, right? And likewise if they read then write data in exactly that CRS, they can simply copy the CRS from input to output. Then it is written directly within the metadata alongside the data following an established format (WKT). I'm not seeing a lot of burden here on the writer to write with an effectively hard-coded or copied value, but I could be wrong.

I think if we're trying to make the argument of convenience to readers that they can ignore the crs field and that data are in one and only one well-defined CRS, then it seems like the spec has to take a strong stand on prohibiting WKT in the case of OGC:CRS84 because presence of crs in that case would signal to the reader that more work is required to parse and determine if that is equal to OGC:CRS84 or something else entirely. This then seems counter to an optional field.

I think this would be a different situation if we were inventing a format where data are represented in only one coordinate representation, and we had to decide how to document what that CRS is (i.e., the GeoJSON problem). Instead, we have a format that supports and documents various CRS's, so you have to be able to work with CRS if you interoperate with data outside your control.

All that said, my suggestion is that the default for crs is null and that null or unset means that CRS is unset and we can make no assertions about the CRS of the underlying data; this is consistent with orientation, and likely a good precedent for other optional fields. Implied defaults feel awkward unless there is no variability in that thing in practice: they force writers to then deal with writing out data where that bit of information is unknown and / or expensive to calculate, but cannot be left out because the default implies something else entirely (hence the need to solve this for writing to GeoParquet in GeoPandas).

@cholmes
Member Author

cholmes commented Apr 28, 2022

We probably should try to have this discussion in a synchronous meeting at some point, to really get into it. But getting all these points of view written down is great, appreciate the time we're taking to try to sort this out to hopefully get to a solution that balances all the various concerns. I don't have the time to respond in full, but one point:

This seems counterproductive to good metadata, right? It seems like the best practice is for writers to document what they know, but document it in a well-defined manner. Thus it seems reasonable that the spec would prescribe how you document CRS information

So I think I'd agree with you that this is counterproductive to good metadata, except that in most cases adding the CRS for EPSG:4326 is actually bad metadata, since most systems override the lat, long order it declares and use it for data that is long, lat. So the metadata (EPSG:4326) indicates that it's latitude, longitude, but is actually inaccurate in many/most cases, relying on some often used common knowledge - so it's not actually 'good' metadata.

I'm definitely not sure that the right answer for the crs field is to encourage any writer that is writing out long/lat to leave the crs blank. But so far the other answers seem less compelling.

The important thing to me is that long, lat is the recommended 'default' for people who don't know anything about CRS's, and that we encourage data to be written as that for maximum interoperability. But that we also do enable those who prefer other projections to use geoparquet. The tricky part does seem to be striking the right balance with defaults / recommendations / etc. relative to the mess that is CRS's and axis order for longitude & latitude. And I'd prefer we attempt to take one step to 'help' that situation and make it work both for readers and writers that are CRS aware and also make it easy for those that are not CRS aware. Like not forcing non-CRS aware readers to try to look for every possible WKT2 string that might mean long, lat to know that they can read it. Isn't that what we'd need to do if we don't have a default CRS? Provide a list of all the WKT2 strings that could potentially correspond to long/lat? I'm definitely open to other answers, but I'd say that's my main concern: how we help non-crs-aware readers to be able to read all the geoparquet data that is in long/lat.

@cholmes
Member Author

cholmes commented Apr 28, 2022

I think if we're trying to make the argument of convenience to readers that they can ignore the crs field and that data are in one and only one well-defined CRS,

I don't think we are saying to them that the data is in one and only one well-defined CRS. I think we're saying to them 'the data is stored in longitude, latitude. Readers who are CRS aware can use OGC:CRS84 if they'd like to reproject it.'

But I agree that to make that work well we need writers on board: when they know they are writing out long/lat, they should leave out the CRS.

@brendan-ward
Contributor

I agree that a synchronous meeting would be helpful, and I hope that my critical comments here are helpful to advancing this effort and not just being a wet blanket.

I think the specification extension (or whatever it should be called) idea I outlined in #89 would address your concerns around giving non CRS-aware readers the ability to safely opt-out of parsing CRS, without getting into awkward territory around how the crs field is populated. Thus the default expectation is that you have to be CRS-aware, but that there is a mechanism separate from the crs field that signals you can opt-out of being CRS-aware.

I agree that EPSG:4326 causes ambiguity when used between toolkits, but there are likely some fairly entrenched existing practices within toolkits. The unpalatable options are either passing through whatever is used in the writer (e.g., EPSG:4326 but long, lat order), which round-trips accurately within that toolkit (i.e., internal use) but risks portability, or coercing to the more correct OGC:CRS84, which helps portability but breaks round-trip use of the data. This seems like something where the spec can both make a recommendation (don't use EPSG:4326 for portability, but don't prohibit it) and, via the extension idea, provide a well-defined manner to opt in to it.

@jorisvandenbossche
Collaborator

jorisvandenbossche commented Apr 29, 2022

Something that contributes to making this a difficult discussion (apart from the inherent difficulty of coordinate reference systems ;)), is that there are different, but intertwined questions:

  • Do we want to have CRS required or optional?
  • If it is optional, what is the "default" CRS, or if required, what is the "recommended" CRS?
  • Do we think WKT is a good format to store the CRS value? (xref Consider more information about the CRS? #50)

Partly, I would have preferred to try to keep those as separate discussions to keep it structured. But they are also clearly affecting each other, because as far as I understand from the discussions above, one of the reasons for making CRS optional (with a default value) is that WKT is inconvenient and requires a parser to answer even the basic question of "are these geographic lon/lat coords".

So that makes me wonder, while we are discussing again the optional-ness of the crs field, if we shouldn't reconsider how the CRS information is stored. So that we can solve the problem of "WKT is hard to parse" by improving how the CRS is stored, rather than removing the CRS information altogether. (cfr #50)

Depending on how we improve this CRS information, that might actually go in the direction of the extension idea @brendan-ward is proposing in the comment above (#89)

Some possible ideas:

  • Use projjson instead of WKT2 as the representation. Implementations have to be able to parse JSON anyway, and then you don't need to understand the full projjson schema just to know whether it is WGS84 or not (eg projjson["type"] == "GeographicCRS" just to know if it uses geographic coordinates, or a more specific check like projjson["name"] == "WGS 84")
  • Add an additional (optional) crs_name field next to crs, which is recommended to set to the name of OGC:CRS84. If that field is present, implementations can just check that for OGC:CRS84 to know whether it is using that CRS or not without parsing the full WKT2 string.
  • Add an additional (optional) flag that just indicates whether this is "WGS84 based lon/lat coordinates", for example a boolean is_wgs84_geographic_coords=True/False field (or some other shorter name). In that case readers that want to know this can just check that field, and writers can use both OGC:CRS84 or EPSG:4326 for the actual crs WKT as they prefer.
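
To make the first idea a bit more concrete, here is a rough sketch of what such a reader-side check could look like. The two dicts are hand-trimmed PROJJSON-like fragments (an assumption for illustration: real PROJJSON documents carry more members, such as the datum and coordinate system), and `is_geographic` / `looks_like_wgs84` are hypothetical helper names:

```python
# Hand-trimmed PROJJSON-like fragments; real documents carry more members.
wgs84 = {
    "type": "GeographicCRS",
    "name": "WGS 84",
    "id": {"authority": "EPSG", "code": 4326},
}
projected = {
    "type": "ProjectedCRS",
    "name": "WGS 84 / Pseudo-Mercator",
    "id": {"authority": "EPSG", "code": 3857},
}


def is_geographic(crs: dict) -> bool:
    """Coarse check: does the CRS use geographic coordinates at all?"""
    return crs.get("type") == "GeographicCRS"


def looks_like_wgs84(crs: dict) -> bool:
    """More specific check by name; brittle if writers vary the name."""
    return is_geographic(crs) and crs.get("name") == "WGS 84"
```

The name comparison is only as reliable as writers are consistent about names, so it is probably best seen as a convenience check rather than a CRS equality test.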

EDIT: while I was writing this, it seems @kylebarron opened a very related discussion about using PROJJSON ;) -> Thoughts on PROJJSON for CRS encoding? #90

@jorisvandenbossche
Collaborator

Sidenote, I would like to disagree with the following:

in most cases adding the CRS for EPSG:4326 is actually bad metadata

Yes, it is unfortunate that there is a need for the whole "authority compliant axis order" vs "traditional GIS order" concept. But I think you either say "we follow the axis order as specified by the CRS" (and then actually follow that) or "we explicitly overrule this and always use the same axis order, regardless of the CRS" (in which case the axis order of the CRS doesn't matter anymore).
And at the moment we explicitly do the second:

The axis order of the coordinates in WKB stored in a geoparquet follows the de facto standard for axis order in WKB and is therefore always
(x, y) where x is easting or longitude and y is northing or latitude. This ordering explicitly overrides the axis order as specified in the CRS.
This follows the precedent of [GeoPackage](https://geopackage.org), see the [note in their spec](https://www.geopackage.org/spec130/#gpb_spec).

At that point, I don't think it is "bad" to use a CRS that has a different axis order, since we explicitly specified that this is OK and we ignore it. For EPSG:4326 there is an alternative, i.e. OGC:CRS84, which is exactly equivalent except for the axis order. But there are many other geographic CRS options that define lat/lon axis order (eg NAD83, one of the specific realizations of the WGS84 ensemble, etc), for which there is no such equivalent alternative CRS.
(and IMO having an alternative CRS for each of those with just a different axis order would only make things more confusing)

@rouault
Contributor

rouault commented May 2, 2022

Giving my GDAL perspective,

  • the GDAL GeoParquet writer writes a CRS when it is known to it (even if EPSG:4326), and otherwise leaves it undefined
  • on the reading side, if it doesn't find the crs member, it assumes it is EPSG:4326, following the spec. Personally I'm a bit uncomfortable making such an assumption, but this is similar to GeoJSON in this respect, so I can live with that, although I would prefer that a missing crs member means "unknown crs".
  • I don't have a particular problem using EPSG:4326 instead of OGC:CRS84 w.r.t axis order, given that we are explicit we don't follow authority axis order.

@kylebarron
Collaborator

In my mind, the argument for having CRS default to OGC:CRS84 was succinctly stated by @tschaub above:

If the CRS field is required, all applications must at least be able to answer the question equal(data.crs, YYYY) (where YYYY represents the default or any other CRS). This means every application that consumes geoparquet must be able to parse WKT2_2019 and compare two CRS for equality.

On an issue in geo-arrow-spec, @rouault pointed out

A better string based approach to identify EPSG:4326 would be to check that the string ends with (not just contains) ID["EPSG",4326]] (however one cannot exclude that the formatting will be slightly different with extra spaces and indentation, so one would have first to strip all space, tabulation, newline). ID["EPSG",4326,URI["urn:ogc:def:crs:EPSG::4326"]]] would also be valid in WKT2 (cf example at end of §7.3.3 "Identifier" of http://docs.opengeospatial.org/is/18-010r7/18-010r7.html#37)

Given the issues with WKT, a function to assert equal(data.crs, 'EPSG:4326') would not be trivial.
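
As a rough illustration of why this is not trivial, here is a sketch of the string-based heuristic described in the quote. The function name is hypothetical, and the sample strings are schematic stand-ins rather than complete WKT2:

```python
import re


def ends_with_epsg_4326_id(wkt: str) -> bool:
    """Heuristic: does the WKT2 string end with a top-level EPSG:4326 ID?

    Strips all whitespace first, since formatting may vary. Deliberately
    does not handle the longer ID[..., URI[...]] form.
    """
    compact = re.sub(r"\s+", "", wkt)
    return compact.endswith('ID["EPSG",4326]]')


# Schematic samples (not complete WKT2), only to exercise the heuristic.
plain = 'GEOGCRS["WGS 84",\n    ID["EPSG", 4326] ]'
with_uri = 'GEOGCRS["WGS 84",ID["EPSG",4326,URI["urn:ogc:def:crs:EPSG::4326"]]]'
nested = 'PROJCRS["x",BASEGEOGCRS["WGS 84",ID["EPSG",4326]],ID["EPSG",32631]]'
```

The heuristic accepts `plain`, rejects `nested` (the 4326 identifier is not the top-level one), and also rejects `with_uri` even though that form is valid, which is exactly the kind of gap that makes a string check fragile.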

Would switching to PROJJSON nullify these concerns? Testing with Pyproj, the id field of PROJJSON seems easy to parse:

In [1]: from pyproj import CRS

In [2]: CRS('epsg:4326').to_json_dict()['id']
Out[2]: {'authority': 'EPSG', 'code': 4326}

In [3]: CRS('crs84').to_json_dict()['id']
Out[3]: {'authority': 'OGC', 'code': 'CRS84'}

For my own needs on the web, if I can check the CRS of the data as easily as suggested above only via JSON, I would not be opposed to having the spec always require CRS metadata (or having null or missing indicate unknown CRS instead of WGS84).

@rouault
Contributor

rouault commented May 2, 2022

Would switching to PROJJSON nullify these concerns?

that would be simpler indeed. Note that potentially, you can have several id for an object, and then the "ids" member is used to have an array of id. But that's mostly a theoretical concern as this isn't much used in practice.
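
A reader that wants to be safe about this could handle both members in one place. A minimal sketch, assuming a PROJJSON-like dict and assuming each identifier object has "authority" and "code" members as in the pyproj output above; `crs_identifiers` is a hypothetical helper name:

```python
def crs_identifiers(crs: dict) -> list:
    """Collect (authority, code) pairs from a PROJJSON-like dict.

    Handles a single "id", an "ids" array, or neither (returns an empty list).
    """
    raw = []
    if "id" in crs:
        raw.append(crs["id"])
    raw.extend(crs.get("ids", []))
    return [(i["authority"], i["code"]) for i in raw]


single = {"id": {"authority": "EPSG", "code": 4326}}
several = {"ids": [{"authority": "EPSG", "code": 4326},
                   {"authority": "OGC", "code": "CRS84"}]}
none_given = {}
```

An empty result means the identifier alone cannot answer the question, and the reader would need to fall back to something else.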

@paleolimbot
Collaborator

Just getting around to rewriting some example files to catch up with the latest version of this spec, and I'm still not sure what the best way is to handle the case where the CRS is unspecified by the user (in R this happens frequently for reasons that are sometimes good and sometimes bad). I've read "crs": null and "crs": "unknown" as options above...perhaps null or {} would be best since PROJJSON would introduce something that isn't a string into the crs field. An empty object {} might be nice so that readers can always make the assumption that the crs field is an object?

(I'm also +10000 to PROJJSON being in the CRS field!)

@jorisvandenbossche
Collaborator

In my mind, the argument for having CRS default to OGC:CRS84 was succinctly stated by @tschaub above:

If the CRS field is required, all applications must at least be able to answer the question equal(data.crs, YYYY) (where YYYY represents the default or any other CRS). This means every application that consumes geoparquet must be able to parse WKT2_2019 and compare two CRS for equality.

I fully understand that concern. But I think that there are other ways to address this issue than omitting the CRS. For example, we could add another (optional) field that indicates whether data are WGS84 lon/lat (see some ideas in my comment above at #52 (comment)). Or by switching to PROJJSON as you proposed!


Based on the latest discussions, I think there are several ideas floating around:

  • Use PROJJSON instead of WKT as the crs representation
  • Make the crs field required again (and potentially provide other means to check for WGS84-based lon/lat coords)
  • Allow the crs to be undefined / unknown, eg using the explicit "crs": null or "crs": "unknown"
  • (reconsider EPSG:4326 vs OGC:CRS84 as the default recommended crs again?)

Others?

I think the 3rd item to allow the CRS to be explicitly "unknown" might be relatively uncontroversial? (it's backwards compatible). In that case I could already open a PR for that item. I would go for "crs": null (instead of "crs": "unknown"), as that seems more distinct from an actual WKT string (and is compatible with moving to PROJJSON)

@rouault
Contributor

rouault commented May 4, 2022

+1 for "crs": null to indicate undefined/unknown CRS

@tschaub
Collaborator

tschaub commented May 4, 2022

Is the top-level id member required for PROJJSON? The JSON Schema for PROJJSON doesn't require id, but I'm not sure if there is some other detail in the specification that would make it required. Perhaps it is just always included by convention.

The identifier keyword (ID) in WKT 2019 looks optional (https://docs.opengeospatial.org/is/18-010r7/18-010r7.html#37). I'm not sure if the top-level identifier is optional or if this just means the id for other objects is optional.

I think JSON or more structured data representing the CRS sounds great. Just wanting to get clarification on when a consumer can rely on id.

@rouault
Contributor

rouault commented May 4, 2022

Is the top-level id member required for PROJJSON?

no, it is optional, as in WKT. Typically a custom CRS will likely lack a top-level id.

@jorisvandenbossche
Collaborator

We probably should try to have this discussion in a synchronous meeting at some point, to really get into it.

General notice: we are having a synchronous meeting tomorrow (Tuesday) at 15:00 UTC (8am Pacific, 5pm central Europe). Everybody is welcome, so if you would like to join, let me know and I will send you an invite with the meeting details.

@cholmes
Member Author

cholmes commented May 9, 2022

Sorry for not responding for a while - got too busy with other stuff. Will attempt to weigh in on a few things in this comment.

Something that contributes to making this a difficult discussion (apart from the inherent difficulty of coordinate reference systems ;)), is that there are different, but intertwined questions:

  • Do we want to have CRS required or optional?
  • If it is optional, what is the "default" CRS, or if required, what is the "recommended" CRS?
  • Do we think WKT is a good format to store the CRS value? (xref Consider more information about the CRS? #50)

Partly, I would have preferred to try to keep those as separate discussions to keep it structured.

A big +1, and I do think we should try to structure the synchronous conversation around this tomorrow. And I agree they all are intertwined, so the synchronous conversation should help.

Use projjson instead of WKT2 as the representation. Implementations have to be able to parse JSON anyway, and then you don't need to understand the full projjson schema just to know whether it is WGS84 or not (eg projjson["type"] == "GeographicCRS" just to know if it uses geographic coordinates, or a more specific check like projjson["name"] == "WGS 84")

I think projjson does solve some of the original concerns that I had, so am definitely psyched to explore this. Big question to me is how much support there is in non-GDAL geo-tools chains, and if support isn't great then how hard is it to implement? One very concrete example would be ESRI - it'd be a big win for them to support geoparquet, but they likely don't support projjson yet, and I don't think they use proj under the hood.

Add an additional (optional) crs_name field next to crs, which is recommended to set to the name of OGC:CRS84. If that field is present, implementations can just check that for OGC:CRS84 to know whether it is using that CRS or not without parsing the full WKT2 string.

This one doesn't seem like it'd help the case where someone has data as long,lat but calls it 4326, and the writer just writes that out? Like in this case the writer wouldn't write that the crs_name is OGC:CRS84, right?

But I suppose this helps if we don't adopt projjson, as it means that wkt parsing isn't required for everything.

Add an additional (optional) flag that just indicates whether this is "WGS84 based lon/lat coordinates", for example a boolean is_wgs84_geographic_coords=True/False field (or some other shorter name). In that case readers that want to know this can just check that field, and writers can use both OGC:CRS84 or EPSG:4326 for the actual crs WKT as they prefer.

I like this one. Those who do CRS can do their thing, but naive readers who don't want to worry about that can count on it being lon/lat. I think the one small downside is it introduces corner cases where things don't agree with one another - like if someone used web mercator CRS / coordinates and didn't know what they were doing and just set this to 'true'. That's clearly a violation, but what do we do if they set crs to 'null' (which I'm +1 on)? I think this can all be solved by good validation tools. But an original reason to just do one CRS field was so that things couldn't get out of sync - there's just one source of truth.

@rouault
Contributor

rouault commented May 9, 2022

Like in this case the writer wouldn't write that the crs_name is OGC:CRS84, right?

If we allow a non-WKT string value for crs, an alternative would be to use URLs like http://www.opengis.net/def/crs/OGC/1.3/CRS84 or http://www.opengis.net/def/crs/EPSG/0/4326 .
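
For illustration, such URLs are easy to take apart without any CRS machinery. A minimal sketch, assuming the path layout /def/crs/<authority>/<version>/<code> seen in the two URLs above; `parse_crs_url` is a hypothetical helper name:

```python
from urllib.parse import urlparse


def parse_crs_url(url: str):
    """Split an OGC-style CRS URL into (authority, version, code).

    Assumes the path layout /def/crs/<authority>/<version>/<code>;
    returns None for anything else.
    """
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) == 5 and parts[:2] == ["def", "crs"]:
        return tuple(parts[2:])
    return None


crs84 = parse_crs_url("http://www.opengis.net/def/crs/OGC/1.3/CRS84")
epsg4326 = parse_crs_url("http://www.opengis.net/def/crs/EPSG/0/4326")
other = parse_crs_url("http://example.com/not/a/crs")
```

A reader could then compare the authority and code directly, at the cost of the crs field no longer being a single, uniform representation.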

@cayetanobv
Collaborator

Regarding PROJJSON +1. This is not a new thing at all; it's been here since 2019 (https://github.com/OSGeo/PROJ/releases/tag/6.2.0) so compatibility is high because most open-source software (and an important part of non-open) is using PROJ and can handle it. And indeed it makes developers' lives easier. But @cholmes's concern about other important software not supporting this is something to take into account.

Also +1 for "crs": null to indicate undefined/unknown CRS.

@cholmes
Member Author

cholmes commented May 6, 2024

Closing this, as we shipped 1.0 and made this decision a while ago.


14 participants