Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Short codes with padding are unintentionally considered valid #273

Closed
JonMcPherson opened this issue Mar 24, 2019 · 16 comments
Closed

Short codes with padding are unintentionally considered valid #273

JonMcPherson opened this issue Mar 24, 2019 · 16 comments

Comments

@JonMcPherson
Copy link
Contributor

JonMcPherson commented Mar 24, 2019

The Open Location Code specification mentions that full codes with padding cannot be shortened, which would imply that a short code with padding is invalid. However, the API reference for code validation does not specify anything about allowing or disallowing short codes with padding. And as such, most (if not all) of the implementations will determine short codes with padding as being valid (and short), and even allow them to be recovered. Because of this gap in the specification there are inconsistencies and unexpected results between the different implementations and how short codes with padding are recovered.

Theoretically, recovering a short code with padding can sensibly result in a full code with the same padding. Some implementations (such as JavaScript) do maintain the padding when recovering shortened codes as one might expect. Whereas other implementations (such as Java) do not and always result in a full code with 8 digits (non-padded) where the digits that replace the padding are determined by the recovered coordinates which may be unexpected.

Example:

// Throws exception since specification disallows shortening padded codes
OpenLocationCode.shorten("8FWC2300+", 48.025, 8.075);

// JavaScript returns: "8FWC2300+"  (Padded 6 digit code)
// Java returns:       "8FWC23GG+"  (Non-padded 8 digit code as center of 6 digit code area)
OpenLocationCode.recoverNearest("2300+", 48.025, 8.075);

To resolve this issue, I think the following needs to be done:

  1. The Open Location Code specification should be updated to define short codes with padding as either being explicitly disallowed (consistent with shorten() spec), or explicitly allow it if it makes sense and can reasonably be recovered/shortened.
  2. Update the implementations to handle the change to short code validation and/or short code recovery (dependent on the decision to allow/disallow)
  3. Update the test_data/validationTests.csv to include a padded short code to test that all implementations now conform to the updated spec and handle padded short codes appropriately.

I think the easiest and probably best option is to just explicitly disallow padded short codes to be consistent with disallowing shortening of padded full codes. This would also indirectly resolve these inconsistencies assuming that short code validation is fixed in each implementation.

@JonMcPherson
Copy link
Contributor Author

ping @drinckes

@drinckes
Copy link
Contributor

Thanks for this, I'm going to take a look at it in the next couple of days.

@JonMcPherson
Copy link
Contributor Author

This PR ^ resolves this issue

@JonMcPherson
Copy link
Contributor Author

@drinckes @zongweil Is this project still being maintained?

@zongweil
Copy link
Contributor

Hey, yes, sorry for the delay Jon. I'm going to let @drinckes review this, as it involves changing the spec.

@drinckes
Copy link
Contributor

Yes, my fault. I'm going to send this today to the public mailing list as well as some other interested parties to give them a chance to provide feedback.

Anything we do will require a modification to the spec, even if it's just clarification, so what I propose is that I summarise the questions here, we get some feedback, and then proceed from there to a consensual decision.

Question: What should the spec say about:

  1. Should shortening of codes with only eight digits be allowed? E.g. 8FVC9G8F+
  2. Should shortening of codes with six digits be allowed? E.g., 8FVC9G00+
  3. Should shortening of codes with two or four digits be allowed? E.g., 8FVC0000+

Recommendation

I recommend that the answer to all of these is yes, with the requirement that the resulting code always includes a minimum of two significant digits.

So for example, shortening 8FVC0000+ with relation to Zurich would return VC0000+. Shortening 8F000000+ with relation to Zurich will return 8F000000+ (because it must have two digits).

This basically says that you can shorten (or attempt to shorten) any code with relation to a location, and you the recover method will return the original code. The number of digits doesn't make this either work/not work.

Comments please! If I receive comments from interested parties without github accounts I'll post them here under my name.

@bocops
Copy link
Contributor

bocops commented Apr 10, 2019

I think a suggestion could be added to the spec that codes SHOULD NOT be padded and shortened at the same time: what's the use case for "VC0000+ Zurich", if that code describes an area much bigger than Zurich itself (~10% the size of whole Switzerland) and if the code isn't even shorter than the full code "8FVC0000+"?

That said, I don't see any reason for artificially removing the ability to recover a short code with padding while that is strictly possible.

It should just be ensured that padding characters are kept intact while recovering a full code. @JonMcPherson mentioned that the Java implementation recovers short code "2300+" to "????23GG+". Inflating a code's precision like that seems like a bug to me.

This bug is currently shared by maps.google.com, by the way. Searching for "8FVC0000+" on Maps leads to a map pin at "GG22+22, Niederglatt, Switzerland".

@JonMcPherson
Copy link
Contributor Author

Yea after giving it more thought, I think that the better option is to not restrict and to explicitly allow padded codes (with 2, 4, or 6 digits) to be shortened and recovered. I'm all for avoiding unnecessary exceptional cases and having one less precondition or thrown exception.
Basically since the resolution of the code can be maintained when shortening and recovering, then a code of any resolution should be allowed to be shortened and recovered.

However, I think if the API spec for shorten() is going to modified, we should consider addressing this other issue which I think is valid and relevant to this issue.

The API spec states:

If the code cannot be shortened, the original full code should be returned.

This makes the return type ambiguous as it can be either a short or full code. Because of this ambiguity, the caller has to know to check if the returned code isShort() or isFull() (or is equal to the original full code).
I think that any ambiguity in whether a code is short or full should be avoided because a short code cannot be decoded and can only be recovered. In other words, short codes cannot be used unless they are recovered, and so they should be dealt with and even represented separately from full codes.

This is why in the v2.0 release of my C# implementation, I refactored the class structure so that OpenLocationCode accepts only full codes, and a separate OpenLocationCode.ShortCode class was added which accepts only short codes. This approach avoids any ambiguity and the need for throwing InvalidOperationException such as when calling decode() on an OpenLocationCode that is a short code.

Additionally, there are inconsistencies in how the different implementations handle this. The Java implementation does not follow the spec and actually throws an IllegalArgumentException stating that the reference point is too far from the center of the code area when the code couldn't be shortened. Whereas most other implementations follow the spec and just return the original full code when it couldn't be shortened.

I think that if shorten() couldn't shorten the code, then it should communicate that to the caller in some way. Whether that be by throwing an exception or just simply returning null, I think that is better than returning a full code from shorten() making it ambiguous and even confusing. Maybe language features such as Optional types can be used for some of the implementations to better represent that the shorten() method will attempt to shorten the full code and thus a returned short code is optional.

@WilliamDenniss
Copy link
Contributor

Should shortening of codes with only eight digits be allowed? E.g. 8FVC9G8F+
Should shortening of codes with six digits be allowed? E.g., 8FVC9G00+
Should shortening of codes with two or four digits be allowed? E.g., 8FVC0000+

In my opinion, the answer is yes to all three, and without restriction (no minimum significant digits requirement).

For the reason that they are still meaningful, even if they may not always be considered particularly useful. E.g. 0000+, San Francisco can be recovered to a valid plus code, so why restrict it unnecessarily? I think in general that the spec should focus on the algorithm (objectively: can this code be meaningfully resolved) and avoid making usability judgments (subjectively: do we think people should actually use this form). There may be future use-cases we have not thought of, and restricting them unnecessarily doesn't add much value to me.

Case in point: I really like that the spec today doesn't set short codes to be say "exactly 6 digits" (e.g. RG9C+VH). While this is the common size of shortened code we see (as it maps nicely to greater city areas), other use-cases for even shorter codes, say at a neighborhood level (e.g. 9C+VH), are still viable. This flexibility allows people to innovate on top of plus codes, taking advantage of their useful inherent properties.

I also prefer this approach as it keeps the spec simpler and with less edge cases.

@drinckes
Copy link
Contributor

@WilliamDenniss

If the proposal is that it is acceptable to remove all significant digits from these codes, then is it also acceptable to remove all significant digits from 8FVC9G8F+6W, or the first 10 digits from 8FVC9G8F+6WG?

If yes, then we will need to decide how to represent these currently invalid strings (00000000+00G or just +00G).

If no, then shortening padded codes is introducing a new edge case anyway.

Shortening the codes is done to make them easier for people to use in the context of addressing, by reducing the number of digits someone has to remember.

I think we risk defining a mathematically elegant but not practically useful solution here.

@WilliamDenniss
Copy link
Contributor

WilliamDenniss commented Apr 30, 2019

My approach would be similar to your comment on the uneven length codes: "be conservative in what you do, be liberal in what you accept from others", which I would apply in a way that "do" are the codes we generate in the API, and "accept" are codes that are considered valid according to the spec (which is possibly not the original intent of the phrase).

I don't honestly see the harm in accepting as valid a code that can be meaningfully resolved. As a straw-person, codes like +00G could even have some application for finding locations at a very small scale like rooms in a house, or smaller (though I'm not advocating for this use).

I totally agree that the API should not by default generate codes that are hard to use, I think this kind of specialty use would be left for experts to determine.

The API design for short codes already exhibit this principle. shorten will return short codes with an included safety margin, with usability in mind. But recoverNearest will just resolve the code to the nearest. This guides users to make the right decision but doesn't stop an expert user say further truncating short codes themselves when they are confident in their own application (i.e. if I'm generating a POI map of SF, I can by-pass shorten completely and just truncate by a fixed amount).

@Evilc06
Copy link

Evilc06 commented Jul 23, 2020

@JonMcPherson Hi, this question has nothing to do with this issue. I was trying to find a way to contact you to ask you regarding the parse live query for .NET and Unity. I downloaded your repo and placed them within my Assets folder but I tried to create a parse live query client and connecting it to the URI of the parse server (using the API Back4App) but when testing with a test callback I tried to log that I have entered the callback but it did not work. So I wanted to ask you, is there maybe something missing within the code and it is not a finished project yet or could I possibly be doing something wrong?

@fulldecent
Copy link
Contributor

Recommended tag: specification

@fulldecent
Copy link
Contributor

This issue, and #462, is fixed by #463

It is specified:

  • Plus Codes with code length less than 8 SHALL NOT be represented as short codes.
  • If both distances are less than 10 degrees, digit places 1–2 MAY be omitted.
  • If both distances are less than 0.5 degrees, digit places 1–2 or 1–4 MAY be omitted.
  • If both distances are less than 1/40 degrees, digit places 1–2, 1–4 or 1–6 MAY be omitted.

This incorporates existing specifications at https://github.com/google/open-location-code/blob/main/docs/specification.md#short-codes.

@fulldecent fulldecent mentioned this issue Jun 4, 2021
9 tasks
@JonMcPherson
Copy link
Contributor Author

Awesome. I'll update the C# implementation after the Java implementation is updated to use as a reference for consistency.

@drinckes
Copy link
Contributor

From the responses above and the response in the mailing list:

Should shortening of codes with only eight digits be allowed? E.g. 8FVC9G8F+
Should shortening of codes with six digits be allowed? E.g., 8FVC9G00+
Should shortening of codes with two or four digits be allowed? E.g., 8FVC0000+

Are all a yes.

I personally don't think we want to entertain codes with padding after the plus though e.g. +00G, so will need to make sure the spec doesn't define that or add a "please don't" type of note.

@JonMcPherson comment re shorten() method return values is a good one. I think the spec should say that it should return either a short code or a language appropriate response such as an exception (C++, Java), error (Golang) or null.

We also need to address the bug in the recovery where some implementations are inflating the precision.

That's a lot for a single issue, and it's drifted a bit from the earlier comments, so I've created separate issues for the spec and shorten() changes:

Thanks everyone for your patience and contributions! Let me know if I missed anything.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants