Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor monthly ntd ridership portfolio site #1190

Merged
merged 40 commits into from
Aug 1, 2024
Merged

refactor monthly ntd ridership portfolio site #1190

merged 40 commits into from
Aug 1, 2024

Conversation

csuyat-dot
Copy link
Contributor

@csuyat-dot csuyat-dot commented Jul 29, 2024

With the dim_monthly_ntd_ridership_with_adjustment table in the warehouse, there is no longer a need to ping the NTD website and extract the wide data from their excel document. Functions were refactored to accept the new, long version of the data in the warehouse.

With the data now in a long format, a lot of initial functions are no longer needed and thus removed. Refactored functions produce identical charts with identical values.

One major difference in change in UPT from prior year by... charts. We now have change in UPT data going back to 2018, when previously we were only able to calculate/display change in UPT data from 2024 due to the wide format of the data.

before:
image

after:
image

It is possible to add a "2024" filter to mimic the initial charts.

Other changes, per #990:

  • data outputs were changed to .xlsx files.
  • Each file includes a new READ ME tab that defines the column names, Modes and TOS types.
  • columns were renamed to be spelling out in full and un-snakecased

Successfully ran all these changes locally via make ntd_monthly_export, saw notebooks parameterized, see data outputs in both public/private GCS, and see updated charts published online.

@csuyat-dot csuyat-dot marked this pull request as ready for review July 29, 2024 23:06
@tiffanychu90
Copy link
Member

@csuyat-dot: for the merge conflicts above...is it caused by not running this line?

  • While you're testing the portfolio, if there are parameterized notebooks already checked in, you will need to run that line. If there aren't any files checked in, running that line will get you an error.
  • If you're testing a lot and checking in the notebooks with a git add, you will need to run the git rm. If you aren't adding them and simply testing locally, you just need the --clean and not the git rm

@tiffanychu90 tiffanychu90 marked this pull request as draft August 1, 2024 22:55
@tiffanychu90 tiffanychu90 marked this pull request as ready for review August 1, 2024 22:55
…to see what needs to change to accomdate this new table in the existing scripts
…ode, and tos. then subtract the upt of the previous year/month but same ntd id, mode, and tos
…g to temp GCS folder. about to move v2 functions to TEST scripts and reports
…om new functions to minimize function name changes in the rest of the portfolio. also was able to filter incoming data to charts to match current portfolio
…m_by_group to get % change col and then testing charts with % change data instead of change in upt.
…2024-05 data. still start to move refactored fucntions over to prod scripts
@csuyat-dot csuyat-dot merged commit 2d9439d into main Aug 1, 2024
2 checks passed
@csuyat-dot csuyat-dot deleted the cs_explore branch August 1, 2024 23:41
@csuyat-dot
Copy link
Contributor Author

Initially, this PR had merge conflicts with the checked in parameterized notebooks. Conflicts possibly stem from: parameterizing notebooks multiple times, blindly checking in files not normally checked in with the build_portfolio_site commands, not following the commands in the build_portfolio_site verbatim, or checking in the parameterized notebooks multiple times. Any of which can cause the branch be out of sync with main. Concluded the best strategy to resolve the conflicts were to: remove the conflicting notebooks from git and local, redeploy the notebooks, add new notebooks to git.

Steps taken to resolve merge conflicts include:

  1. git rebase against main, insisting that portfolio/ntd_monthly_ridership be deleted.
  2. python portfolio/portfolio.py clean ntd_monthly_ridership, to remove local portfolio/ntd_monthly_ridership sub-directory.
  3. python portfolio/portfolio.py build ntd_monthly_ridership --deploy, to remake portfolio/ntd_monthly_ridership sub-directory and redeploy notebooks.
  4. git add portfolio/ntd_monthly_ridership/*.yml portfolio/ntd_monthly_ridership/*.md, checking inspecific .yml and .md files
  5. commit/push, ensured PR passed all check.
  6. git add portfolio/ntd_monthly_ridership/*.ipynb, to check in new notebooks.
  7. commit/push, ensured PR still passed all checks.
  8. added the remaining files.

Note, an alternate PR #1191 was created in case this PR was not able to be resolved. Alternate PR was able to check in specific files and deploy notebooks with no conflicts. This confirms that the scripts and notebooks work, but that something between git, local, and main may be the problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants