Short codes with padding are unintentionally considered valid #273

JonMcPherson · 2019-03-24T05:00:26Z

The Open Location Code specification mentions that full codes with padding cannot be shortened, which would imply that a short code with padding is invalid. However, the API reference for code validation does not specify anything about allowing or disallowing short codes with padding. And as such, most (if not all) of the implementations will determine short codes with padding as being valid (and short), and even allow them to be recovered. Because of this gap in the specification there are inconsistencies and unexpected results between the different implementations and how short codes with padding are recovered.

Theoretically, recovering a short code with padding can sensibly result in a full code with the same padding. Some implementations (such as JavaScript) do maintain the padding when recovering shortened codes as one might expect. Whereas other implementations (such as Java) do not and always result in a full code with 8 digits (non-padded) where the digits that replace the padding are determined by the recovered coordinates which may be unexpected.

Example:

// Throws exception since specification disallows shortening padded codes
OpenLocationCode.shorten("8FWC2300+", 48.025, 8.075);

// JavaScript returns: "8FWC2300+"  (Padded 6 digit code)
// Java returns:       "8FWC23GG+"  (Non-padded 8 digit code as center of 6 digit code area)
OpenLocationCode.recoverNearest("2300+", 48.025, 8.075);

To resolve this issue, I think the following needs to be done:

The Open Location Code specification should be updated to define short codes with padding as either being explicitly disallowed (consistent with shorten() spec), or explicitly allow it if it makes sense and can reasonably be recovered/shortened.
Update the implementations to handle the change to short code validation and/or short code recovery (dependent on the decision to allow/disallow)
Update the test_data/validationTests.csv to include a padded short code to test that all implementations now conform to the updated spec and handle padded short codes appropriately.

I think the easiest and probably best option is to just explicitly disallow padded short codes to be consistent with disallowing shortening of padded full codes. This would also indirectly resolve these inconsistencies assuming that short code validation is fixed in each implementation.

The text was updated successfully, but these errors were encountered:

JonMcPherson · 2019-03-27T01:22:40Z

ping @drinckes

drinckes · 2019-03-27T13:29:38Z

Thanks for this, I'm going to take a look at it in the next couple of days.

JonMcPherson · 2019-03-30T03:30:25Z

This PR ^ resolves this issue

JonMcPherson · 2019-04-10T00:47:49Z

@drinckes @zongweil Is this project still being maintained?

zongweil · 2019-04-10T04:11:56Z

Hey, yes, sorry for the delay Jon. I'm going to let @drinckes review this, as it involves changing the spec.

drinckes · 2019-04-10T07:51:29Z

Yes, my fault. I'm going to send this today to the public mailing list as well as some other interested parties to give them a chance to provide feedback.

Anything we do will require a modification to the spec, even if it's just clarification, so what I propose is that I summarise the questions here, we get some feedback, and then proceed from there to a consensual decision.

Question: What should the spec say about:

Should shortening of codes with only eight digits be allowed? E.g. 8FVC9G8F+
Should shortening of codes with six digits be allowed? E.g., 8FVC9G00+
Should shortening of codes with two or four digits be allowed? E.g., 8FVC0000+

Recommendation

I recommend that the answer to all of these is yes, with the requirement that the resulting code always includes a minimum of two significant digits.

So for example, shortening 8FVC0000+ with relation to Zurich would return VC0000+. Shortening 8F000000+ with relation to Zurich will return 8F000000+ (because it must have two digits).

This basically says that you can shorten (or attempt to shorten) any code with relation to a location, and you the recover method will return the original code. The number of digits doesn't make this either work/not work.

Comments please! If I receive comments from interested parties without github accounts I'll post them here under my name.

bocops · 2019-04-10T08:49:51Z

I think a suggestion could be added to the spec that codes SHOULD NOT be padded and shortened at the same time: what's the use case for "VC0000+ Zurich", if that code describes an area much bigger than Zurich itself (~10% the size of whole Switzerland) and if the code isn't even shorter than the full code "8FVC0000+"?

That said, I don't see any reason for artificially removing the ability to recover a short code with padding while that is strictly possible.

It should just be ensured that padding characters are kept intact while recovering a full code. @JonMcPherson mentioned that the Java implementation recovers short code "2300+" to "????23GG+". Inflating a code's precision like that seems like a bug to me.

This bug is currently shared by maps.google.com, by the way. Searching for "8FVC0000+" on Maps leads to a map pin at "GG22+22, Niederglatt, Switzerland".

JonMcPherson · 2019-04-11T05:47:18Z

Yea after giving it more thought, I think that the better option is to not restrict and to explicitly allow padded codes (with 2, 4, or 6 digits) to be shortened and recovered. I'm all for avoiding unnecessary exceptional cases and having one less precondition or thrown exception.
Basically since the resolution of the code can be maintained when shortening and recovering, then a code of any resolution should be allowed to be shortened and recovered.

However, I think if the API spec for shorten() is going to modified, we should consider addressing this other issue which I think is valid and relevant to this issue.

The API spec states:

If the code cannot be shortened, the original full code should be returned.

This makes the return type ambiguous as it can be either a short or full code. Because of this ambiguity, the caller has to know to check if the returned code isShort() or isFull() (or is equal to the original full code).
I think that any ambiguity in whether a code is short or full should be avoided because a short code cannot be decoded and can only be recovered. In other words, short codes cannot be used unless they are recovered, and so they should be dealt with and even represented separately from full codes.

This is why in the v2.0 release of my C# implementation, I refactored the class structure so that OpenLocationCode accepts only full codes, and a separate OpenLocationCode.ShortCode class was added which accepts only short codes. This approach avoids any ambiguity and the need for throwing InvalidOperationException such as when calling decode() on an OpenLocationCode that is a short code.

Additionally, there are inconsistencies in how the different implementations handle this. The Java implementation does not follow the spec and actually throws an IllegalArgumentException stating that the reference point is too far from the center of the code area when the code couldn't be shortened. Whereas most other implementations follow the spec and just return the original full code when it couldn't be shortened.

I think that if shorten() couldn't shorten the code, then it should communicate that to the caller in some way. Whether that be by throwing an exception or just simply returning null, I think that is better than returning a full code from shorten() making it ambiguous and even confusing. Maybe language features such as Optional types can be used for some of the implementations to better represent that the shorten() method will attempt to shorten the full code and thus a returned short code is optional.

WilliamDenniss · 2019-04-19T16:24:15Z

Should shortening of codes with only eight digits be allowed? E.g. 8FVC9G8F+
Should shortening of codes with six digits be allowed? E.g., 8FVC9G00+
Should shortening of codes with two or four digits be allowed? E.g., 8FVC0000+

In my opinion, the answer is yes to all three, and without restriction (no minimum significant digits requirement).

For the reason that they are still meaningful, even if they may not always be considered particularly useful. E.g. 0000+, San Francisco can be recovered to a valid plus code, so why restrict it unnecessarily? I think in general that the spec should focus on the algorithm (objectively: can this code be meaningfully resolved) and avoid making usability judgments (subjectively: do we think people should actually use this form). There may be future use-cases we have not thought of, and restricting them unnecessarily doesn't add much value to me.

Case in point: I really like that the spec today doesn't set short codes to be say "exactly 6 digits" (e.g. RG9C+VH). While this is the common size of shortened code we see (as it maps nicely to greater city areas), other use-cases for even shorter codes, say at a neighborhood level (e.g. 9C+VH), are still viable. This flexibility allows people to innovate on top of plus codes, taking advantage of their useful inherent properties.

I also prefer this approach as it keeps the spec simpler and with less edge cases.

drinckes · 2019-04-21T09:50:46Z

@WilliamDenniss

If the proposal is that it is acceptable to remove all significant digits from these codes, then is it also acceptable to remove all significant digits from 8FVC9G8F+6W, or the first 10 digits from 8FVC9G8F+6WG?

If yes, then we will need to decide how to represent these currently invalid strings (00000000+00G or just +00G).

If no, then shortening padded codes is introducing a new edge case anyway.

Shortening the codes is done to make them easier for people to use in the context of addressing, by reducing the number of digits someone has to remember.

I think we risk defining a mathematically elegant but not practically useful solution here.

WilliamDenniss · 2019-04-30T16:34:25Z

My approach would be similar to your comment on the uneven length codes: "be conservative in what you do, be liberal in what you accept from others", which I would apply in a way that "do" are the codes we generate in the API, and "accept" are codes that are considered valid according to the spec (which is possibly not the original intent of the phrase).

I don't honestly see the harm in accepting as valid a code that can be meaningfully resolved. As a straw-person, codes like +00G could even have some application for finding locations at a very small scale like rooms in a house, or smaller (though I'm not advocating for this use).

I totally agree that the API should not by default generate codes that are hard to use, I think this kind of specialty use would be left for experts to determine.

The API design for short codes already exhibit this principle. shorten will return short codes with an included safety margin, with usability in mind. But recoverNearest will just resolve the code to the nearest. This guides users to make the right decision but doesn't stop an expert user say further truncating short codes themselves when they are confident in their own application (i.e. if I'm generating a POI map of SF, I can by-pass shorten completely and just truncate by a fixed amount).

Evilc06 · 2020-07-23T10:12:51Z

@JonMcPherson Hi, this question has nothing to do with this issue. I was trying to find a way to contact you to ask you regarding the parse live query for .NET and Unity. I downloaded your repo and placed them within my Assets folder but I tried to create a parse live query client and connecting it to the URI of the parse server (using the API Back4App) but when testing with a test callback I tried to log that I have entered the callback but it did not work. So I wanted to ask you, is there maybe something missing within the code and it is not a finished project yet or could I possibly be doing something wrong?

fulldecent · 2021-04-25T04:50:36Z

Recommended tag: specification

fulldecent · 2021-06-04T18:54:51Z

This issue, and #462, is fixed by #463

It is specified:

Plus Codes with code length less than 8 SHALL NOT be represented as short codes.
If both distances are less than 10 degrees, digit places 1–2 MAY be omitted.
If both distances are less than 0.5 degrees, digit places 1–2 or 1–4 MAY be omitted.
If both distances are less than 1/40 degrees, digit places 1–2, 1–4 or 1–6 MAY be omitted.

This incorporates existing specifications at https://github.com/google/open-location-code/blob/main/docs/specification.md#short-codes.

JonMcPherson · 2021-06-04T19:28:18Z

Awesome. I'll update the C# implementation after the Java implementation is updated to use as a reference for consistency.

drinckes · 2024-06-14T09:23:17Z

From the responses above and the response in the mailing list:

Should shortening of codes with only eight digits be allowed? E.g. 8FVC9G8F+
Should shortening of codes with six digits be allowed? E.g., 8FVC9G00+
Should shortening of codes with two or four digits be allowed? E.g., 8FVC0000+

Are all a yes.

I personally don't think we want to entertain codes with padding after the plus though e.g. +00G, so will need to make sure the spec doesn't define that or add a "please don't" type of note.

@JonMcPherson comment re shorten() method return values is a good one. I think the spec should say that it should return either a short code or a language appropriate response such as an exception (C++, Java), error (Golang) or null.

We also need to address the bug in the recovery where some implementations are inflating the precision.

That's a lot for a single issue, and it's drifted a bit from the earlier comments, so I've created separate issues for the spec and shorten() changes:

Thanks everyone for your patience and contributions! Let me know if I missed anything.

JonMcPherson mentioned this issue Mar 30, 2019

Fix isValid() to invalidate short codes with padding (issues/273) #274

Merged

JonMcPherson mentioned this issue Apr 10, 2019

Release Version 2.0 JonMcPherson/open-location-code#7

Merged

drinckes mentioned this issue Apr 21, 2019

Migrate specification from description document #285

Merged

fulldecent mentioned this issue Jun 4, 2021

Update specification #463

Closed

9 tasks

drinckes added js documentation go java python dart ruby visualbasic c++ rust c labels Jun 7, 2024

drinckes added the js/closure label Jun 7, 2024

This was referenced Jun 14, 2024

Clarify what codes can be shortened in the spec #586

Open

Short code recovery is inflating precision #587

Open

drinckes closed this as completed Jun 14, 2024

fulldecent added this to Community service Jul 19, 2024

fulldecent moved this to Done in Community service Dec 14, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Short codes with padding are unintentionally considered valid #273

Short codes with padding are unintentionally considered valid #273

JonMcPherson commented Mar 24, 2019 •

edited

Loading

JonMcPherson commented Mar 27, 2019

drinckes commented Mar 27, 2019

JonMcPherson commented Mar 30, 2019

JonMcPherson commented Apr 10, 2019

zongweil commented Apr 10, 2019

drinckes commented Apr 10, 2019

bocops commented Apr 10, 2019

JonMcPherson commented Apr 11, 2019

WilliamDenniss commented Apr 19, 2019

drinckes commented Apr 21, 2019

WilliamDenniss commented Apr 30, 2019 •

edited

Loading

Evilc06 commented Jul 23, 2020

fulldecent commented Apr 25, 2021

fulldecent commented Jun 4, 2021

JonMcPherson commented Jun 4, 2021

drinckes commented Jun 14, 2024

Short codes with padding are unintentionally considered valid #273

Short codes with padding are unintentionally considered valid #273

Comments

JonMcPherson commented Mar 24, 2019 • edited Loading

JonMcPherson commented Mar 27, 2019

drinckes commented Mar 27, 2019

JonMcPherson commented Mar 30, 2019

JonMcPherson commented Apr 10, 2019

zongweil commented Apr 10, 2019

drinckes commented Apr 10, 2019

Question: What should the spec say about:

Recommendation

bocops commented Apr 10, 2019

JonMcPherson commented Apr 11, 2019

WilliamDenniss commented Apr 19, 2019

drinckes commented Apr 21, 2019

WilliamDenniss commented Apr 30, 2019 • edited Loading

Evilc06 commented Jul 23, 2020

fulldecent commented Apr 25, 2021

fulldecent commented Jun 4, 2021

JonMcPherson commented Jun 4, 2021

drinckes commented Jun 14, 2024

JonMcPherson commented Mar 24, 2019 •

edited

Loading

WilliamDenniss commented Apr 30, 2019 •

edited

Loading