Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify major roads feature #230

Merged
merged 11 commits into from
Nov 16, 2023
Merged

Conversation

wagnerlmichael
Copy link
Member

@wagnerlmichael wagnerlmichael commented Nov 14, 2023

The main task of this PR was originally to use st_simplify to reduce the complexity of the linestrings representing major roads.

It has since expanded in scope to accomplish the following:

  • Add historical data back to 2014
  • Implement an additive approach. Starting from 2014, each year's data builds upon the previous year, with new major road additions being added to the existing dataset
  • Previously the data was written with one script directly to our warehouse bucket. We have added an intermediary script that writes the raw data to our unprocessed bucket, and then we utilize the existing script to perform the simplification, which writes the data to our warehouse bucket

Closes #219

@wagnerlmichael wagnerlmichael linked an issue Nov 14, 2023 that may be closed by this pull request
@wagnerlmichael wagnerlmichael changed the title 219 simplify major roads feature Simplify major roads feature Nov 14, 2023
@wagnerlmichael wagnerlmichael marked this pull request as ready for review November 14, 2023 17:36
@wagnerlmichael wagnerlmichael requested a review from a team as a code owner November 14, 2023 17:36
Copy link
Member

@dfsnow dfsnow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wagnerlmichael This looks good, but let's wait for https://github.com/ccao-data/data-architecture/actions/runs/6867260306 to finish before merging.

Also, can you quickly check the difference between the roads in each year? I see there's a large difference in the raw Parquet file size between 2014 and 2023. I recommend making a small multiples map showing the roads and posting it in the PR comments.

@wagnerlmichael
Copy link
Member Author

wagnerlmichael commented Nov 14, 2023

This is interesting. In the past it seems like there were a lot more roads classified as major roads in the downtown area, and it seems to cut off the roads earlier in the newer 2023 data. Will need to click on the figure and make full screen to get a good view.

2014_2017_2023_compare_separate

@dfsnow
Copy link
Member

dfsnow commented Nov 14, 2023

This is interesting. In the past it seems like there were a lot more roads classified as major roads in the downtown area, and it seems to cut off the roads earlier in the newer 2023 data. Will need to click on the figure and make full screen to get a good view.

@wagnerlmichael Ugh, this is what I was worried about. Clearly someone retagged a bunch of roads at some point, so the road definitions aren't going to be consistent historically. Let's do this:

  • Find the earliest year where the tags are mostly consistent with the current (2023) tags
  • Change the script such that we're only gathering data from that year on
  • Re-run the CTAS again

Forward filling here will be handled by the CTAS, and backward filling should be handled by the model views, where appropriate (so you don't have to do anything to handle filling).

@wagnerlmichael
Copy link
Member Author

wagnerlmichael commented Nov 15, 2023

We ended up taking the 2023 data from the January 1st historical data. This is consistent with our major roads ("motorway", "trunk", "primary") all the way back to 2014. Next year in 2024 we will need to handle the OSM highway change.

This is the most recent build time. It is significantly faster than the prior run time without updated simplified linestrings.

Is there anything I need to do once this PR is merged to update the CTE for prod? I'm assuming the buld-and-test-dbt doesn't make prod changes.

@wagnerlmichael wagnerlmichael marked this pull request as draft November 15, 2023 22:13
@wagnerlmichael
Copy link
Member Author

Due to the major roads changing from year to year, we have decided to go with an additive approach. Starting from 2014, each year's data builds upon the previous year, with new major road additions being added to the existing dataset.

@wagnerlmichael wagnerlmichael marked this pull request as ready for review November 16, 2023 17:28
Copy link
Member

@dfsnow dfsnow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good @wagnerlmichael. Thanks for the iterations on this.

@wagnerlmichael wagnerlmichael merged commit 2027c68 into master Nov 16, 2023
7 checks passed
@wagnerlmichael wagnerlmichael deleted the 219-simplify-major-roads-feature branch November 16, 2023 22:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Simplify major roads feature
2 participants