Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resolve N+1 queries in pub status report and index #864

Merged
merged 1 commit into from
Jan 14, 2022

Conversation

JPrevost
Copy link
Member

@JPrevost JPrevost commented Jan 14, 2022

Why are these changes being introduced:

  • There were multiple N+1 queries that when combined with production
    data led to Heroku timeouts

Relevant ticket(s):

How does this address that need:

  • eager load the relevant data to eliminate the N+1 problems

Developer

  • All new ENV is documented in README
  • All new ENV has been added to Heroku Pipeline, Staging and Prod
  • ANDI or Wave has been run in accordance to
    our guide and
    all issues introduced by these changes have been resolved or opened as new
    issues (link to those issues in the Pull Request details above)
  • Stakeholder approval has been confirmed (or is not needed)

Code Reviewer

  • The commit message is clear and follows our guidelines
    (not just this pull request message)
  • There are appropriate tests covering any new functionality
  • The documentation has been updated or is unnecessary
  • The changes have been verified
  • New dependencies are appropriate or there were no changes

Requires database migrations?

NO

Includes new or updated dependencies?

YES

@JPrevost JPrevost requested a review from jazairi January 14, 2022 16:23
@mitlib mitlib temporarily deployed to thesis-submit-pr-864 January 14, 2022 16:23 Inactive
@coveralls
Copy link

coveralls commented Jan 14, 2022

Coverage Status

Coverage remained the same at 96.611% when pulling a3cb95d on publication_status_report into b97f5c6 on main.

@@ -43,7 +43,7 @@ def publication_statuses
@terms = defined_terms Thesis.all
@publication_statuses = Thesis.all.pluck(:publication_status).uniq.sort
# Filter relevant theses by selected term from querystring
term_filtered = filter_theses_by_term Thesis.all
term_filtered = filter_theses_by_term Thesis.all.includes(:degrees).includes(:departments).includes(:users)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a difference between this and .includes(:degrees, departments, users)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No idea! I can try it that way if it reads cleaner to you

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That definitely results in the same thing, so I'll switch to your suggested syntax change as it's likely easier to read the intent when we come back to this in the future. 👍🏻

@@ -223,7 +223,7 @@ def data_student_contributions
row_data = {}
terms = Thesis.all.pluck(:grad_date).uniq.sort
terms.each do |term|
row_data[term] = Thesis.where('grad_date = ?', term).map(&:student_contributed?).count(true)
row_data[term] = Thesis.where('grad_date = ?', term).includes(:versions).map(&:student_contributed?).count(true)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the goal here to reduce the number of queries by loading only theses with papertrail versions?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope. This eager loads the versions so the views don't get into an N+1 situation. Theses without versions should still load I believe (although I'm not sure if that's even a thing)

Copy link
Contributor

@jazairi jazairi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat, TIL .includes. This appears to solve the problem, based on some limited local testing. I left a couple of non-blocking questions so I can learn more about this method (happy to discuss in Slack if that's easier).

Why are these changes being introduced:

* There were multiple N+1 queries that when combined with production
  data led to Heroku timeouts

Relevant ticket(s):

* https://mitlibraries.atlassian.net/browse/ETD-550
* https://mitlibraries.atlassian.net/browse/ETD-551

How does this address that need:

* eager load the relevant data to eliminate the N+1 problems
@JPrevost JPrevost force-pushed the publication_status_report branch from 2483f3b to a3cb95d Compare January 14, 2022 16:52
@JPrevost JPrevost temporarily deployed to thesis-submit-pr-864 January 14, 2022 16:52 Inactive
@JPrevost JPrevost merged commit d8b507e into main Jan 14, 2022
@JPrevost JPrevost deleted the publication_status_report branch January 14, 2022 16:54
@jazairi jazairi self-assigned this Jan 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants