Use route APIs instead of web scraping directions links #1

Open
sgoodm opened this issue Aug 2, 2021 · 0 comments
Labels: code (related to code for processing data), enhancement (New feature or request)

sgoodm commented Aug 2, 2021

Current approach:

We currently use a headless browser (Selenium with Firefox) to load the JavaScript web map for each OSM directions link. We then extract the route's SVG path from the web map and use the route's start and end coordinates to georeference the path and create a GeoJSON.
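For reference, the georeferencing step can be sketched roughly as below. This is a minimal illustration, not the code in the repository: the function names are hypothetical, and it assumes the SVG path uses only absolute `M`/`L` commands, that its first and last points coincide with the route's start and end coordinates, and that a per-axis linear mapping from pixels to degrees is accurate enough over the extent of the route.

```python
import re


def parse_svg_path(d):
    """Return (x, y) pixel pairs from a path made of absolute M/L commands."""
    nums = [float(n) for n in re.findall(r"-?\d+(?:\.\d+)?", d)]
    return list(zip(nums[0::2], nums[1::2]))


def georeference(points, start_lonlat, end_lonlat):
    """Map pixel points to lon/lat and wrap them in a GeoJSON Feature.

    Assumes the first/last pixel points are the route's start/end and
    that pixels map linearly to degrees on each axis (hypothetical
    simplification; ignores map projection distortion).
    """
    (x0, y0), (x1, y1) = points[0], points[-1]
    if x0 == x1 or y0 == y1:
        # Two reference points cannot fix the scale of an axis on which
        # they share the same pixel value.
        raise ValueError("start and end must differ in both x and y")
    (lon0, lat0), (lon1, lat1) = start_lonlat, end_lonlat
    sx = (lon1 - lon0) / (x1 - x0)  # degrees of longitude per pixel
    sy = (lat1 - lat0) / (y1 - y0)  # degrees of latitude per pixel
    coords = [[lon0 + (x - x0) * sx, lat0 + (y - y0) * sy] for x, y in points]
    return {
        "type": "Feature",
        "properties": {},
        "geometry": {"type": "LineString", "coordinates": coords},
    }
```

The sketch also shows a limitation of this approach: with only two reference points, a route whose start and end share a pixel row or column cannot be scaled on that axis.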

Issue:

The basic implementation of Selenium does not appear to be thread-safe and produces a range of errors when parallelized. Extracting all SVG path data in serial is by far the longest-running portion of the build: it takes several hours, whereas the remaining parallelized portion can take under an hour.

Possible solution:

The SVG data for routes produced by the directions links are generated using either the OSRM or GraphHopper routing services, through their APIs. See an example OSM link for directions and the corresponding API call for each service below.

OSRM

GraphHopper
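As a rough sketch of what an OSRM-based replacement could look like, the snippet below builds a route request using OSRM's documented `/route/v1/{profile}/{coordinates}` endpoint with `geometries=geojson`, and wraps the returned geometry in a GeoJSON Feature. The function names are hypothetical, and the base URL is an assumption (the public OSRM demo server); the server actually used by the OSM directions links may differ. The HTTP request itself is left out so the sketch stays independent of any particular client library.

```python
from urllib.parse import urlencode

# Assumption: public OSRM demo server, not necessarily the one OSM uses.
OSRM_BASE = "https://router.project-osrm.org"


def osrm_route_url(start, end, profile="driving", base=OSRM_BASE):
    """Build a route URL; start and end are (lon, lat) tuples."""
    coords = f"{start[0]},{start[1]};{end[0]},{end[1]}"
    # overview=full asks for the full-resolution geometry,
    # geometries=geojson asks for a GeoJSON LineString instead of a polyline.
    query = urlencode({"overview": "full", "geometries": "geojson"})
    return f"{base}/route/v1/{profile}/{coords}?{query}"


def route_feature(response):
    """Turn a parsed OSRM JSON response into a GeoJSON Feature."""
    if response.get("code") != "Ok" or not response.get("routes"):
        raise ValueError(f"OSRM returned no route: {response.get('code')}")
    route = response["routes"][0]
    return {
        "type": "Feature",
        "properties": {
            "distance_m": route["distance"],
            "duration_s": route["duration"],
        },
        "geometry": route["geometry"],
    }
```

The response from the URL could be fetched with any HTTP client, parsed as JSON, and passed to `route_feature`.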

Implementation discussion:

  1. OSRM has the option to return a GeoJSON of the route directly, but GraphHopper will only return an encoded polyline that must be decoded.

  2. We would need to explore further to determine whether paid API keys are required or whether the ones used by OSM are considered acceptable for public use. (Note: if they are not and I need to remove the keys from the above links, please let me know.) Since OSM does not hide them from publicly viewable queries, I am assuming they are for public use.

  3. Would the number or rate of API calls be an issue (currently on the order of a few hundred per build)?

  4. How fast are API results? Given the parallelization issues with the SVG-based implementation, I am guessing this will be a net improvement even if the API is not very fast.
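On item 1, decoding the polyline is not much work. Below is a minimal sketch, assuming the response uses the standard encoded polyline format with two dimensions (latitude, longitude) and a precision of 5 decimal places; a response that also carries elevation would need a third delta per point. The function name is hypothetical, and the output is in GeoJSON `[lon, lat]` order.

```python
def decode_polyline(encoded, precision=5):
    """Decode an encoded polyline string into [lon, lat] pairs."""
    factor = 10 ** precision
    coords = []
    index = lat = lon = 0
    while index < len(encoded):
        deltas = []
        for _ in range(2):  # one delta for latitude, one for longitude
            shift = result = 0
            while True:
                byte = ord(encoded[index]) - 63
                index += 1
                result |= (byte & 0x1F) << shift  # low 5 bits carry data
                shift += 5
                if byte < 0x20:  # continuation bit not set: last chunk
                    break
            # The lowest bit is the sign flag; undo the zig-zag encoding.
            deltas.append(~(result >> 1) if result & 1 else result >> 1)
        lat += deltas[0]
        lon += deltas[1]
        coords.append([lon / factor, lat / factor])
    return coords


def polyline_to_geometry(encoded, precision=5):
    """Wrap decoded coordinates in a GeoJSON LineString geometry."""
    return {
        "type": "LineString",
        "coordinates": decode_polyline(encoded, precision),
    }
```

Since the decoder is pure Python with no shared state, it can run in the parallelized portion of the build without the thread-safety problems seen with Selenium.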

sgoodm added the enhancement and code labels on Aug 2, 2021