Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[KeepRight] Use regex group capture to extract error details #5275

Merged
merged 12 commits into from
Sep 5, 2018
Merged

[KeepRight] Use regex group capture to extract error details #5275

merged 12 commits into from
Sep 5, 2018

Conversation

kymckay
Copy link
Collaborator

@kymckay kymckay commented Aug 31, 2018

As discussed here, this PR improves the extraction of details from error descriptions by directly using regex group capture instead of splitting the strings and using a lot of logic to figure out what's what (effectively re-implementing the same group capture behaviour).

Advantages of this approach:

  • Can precisely extract what we want, no relying on space characters to split the string nicely.
    • No need to handle edge cases in the code like this:
      // handle special cases
      // error _170
      if (errorType === '_170') { return { var1: entity.description }; }
      // error _220
      if (errorType === '_220') { _220 = true; }
      if (errorType === '_401') { _401 = true; }
  • Less looping and calls to regex .test()/.match() methods.
  • We're being more explicit about what we expect in the error strings (aka \d+ shows that we expect a number whereas {$1} didn't). This is a minor maintainability improvement.
  • Can easily handle errors that have multiple possible string templates (see types 312 and 282).

Side effect of this approach:

  • I've changed the way that IDs are identified. The code to identify IDs and their type (node/way/relation) was a bit clumsy (and wouldn't correctly handle some cases) especially with some inconsistencies in the error messages it had to account for. It became a bit more so once the code was no longer looping over every word. So instead I've added an extra optional "IDs" key to the errorSchema.json which provides an array of strings where the indices correspond to the regex groups (see here). This let's us explicitly tell the code which groups capture an ID and what type of ID they are ("n"/"w"/"r").
    • It seems there are a couple of cases where strings can have an arbitrary list of node/way/relation IDs depending on the geometry involved in the error. For getting the IDs in these cases I'm capturing the whole list as a group and parsing those with unique code (see error types 211, 231 and 294).
  • To make life easier I've added a "regex" key to the errorSchema.json so that errors where the message is fixed aren't parsed for details at all (saves us having to escape any special regex characters and prevents needlessly matching a fixed string for many errors). However, it might actually make more sense to just remove these messages all together from the file since we don't need that data for anything.

This focuses on converting error types I found on the map for testing
(can confirm the code is working!) as well as the more problematic cases
from the old code to show that this approach can handle them easily.
This adds a flag to the error schema to explicitly say whether to parse
the description with regex or not. Prevents us from having to escape
special characters in fixed strings and is a minor optimisation.
These aren't shown in the layer currently, but for future use this
should work
@kymckay
Copy link
Collaborator Author

kymckay commented Sep 3, 2018

@thomas-hervey

I consider this PR now feature complete 😃

All error types now have details extracted via regex and as a bonus I've handled cases where arbitrary lists of IDs could be present so that they all become links.

Locations with some of the more complicated errors:

Some of the output messages on our end (in core.yaml) could use improvement for clarity and consistency (quoting keys and values for example), but I'll leave that for another PR.

@thomas-hervey thomas-hervey self-assigned this Sep 5, 2018
@thomas-hervey
Copy link
Collaborator

Thank you very much @SilentSpike for the great refactor. I've reviewed these changes and I am fine merging them so long as there aren't parsing conflicts between the templates and the KeepRight.at server responses.

@thomas-hervey thomas-hervey merged commit ee35037 into openstreetmap:keep-right_QA Sep 5, 2018
@kymckay kymckay deleted the keep-right-regex branch September 5, 2018 17:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants