Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Collection Statistics Page/Reviewer Dashboard Performance (Outage) #3786

Closed
benwbrum opened this issue Sep 14, 2023 · 1 comment · Fixed by #3787
Closed

Collection Statistics Page/Reviewer Dashboard Performance (Outage) #3786

benwbrum opened this issue Sep 14, 2023 · 1 comment · Fixed by #3787

Comments

@benwbrum
Copy link
Owner

We had another outage this morning from the collection statistics page or the reviewer dashboard. Here are the slow-running queries:

Reviewer dashboard

Pages of certain status (1.28 minutes)

SELECT COUNT(*) FROM `pages` INNER JOIN `works` ON `pages`.`work_id` = `works`.`id` WHERE `works`.`collection_id` = ? AND `pages`.`status` IN (?, ?, ?, ?, ?)

Suggestion

Query the work_statistics table instead of pages for this information

Collection statistics

Pages of status (69 s)

SELECT COUNT(*) FROM `pages` INNER JOIN `works` ON `pages`.`work_id` = `works`.`id` WHERE `works`.`collection_id` = ? AND (`pages`.`status` = ? OR `pages`.`status` IS ?)

from

/home/fromthepage/deployment/releases/20230912180620/app/models/collection_statistic.rb:67:in `get_stats_hash'
/home/fromthepage/deployment/releases/20230912180620/app/controllers/statistics_controller.rb:7:in `collection'

Suggestion

Use work statistics instead

Pages within date

SELECT COUNT(*) FROM `pages` INNER JOIN `works` ON `pages`.`work_id` = `works`.`id` WHERE `works`.`collection_id` = ? AND (`pages`.`status` = ? OR `pages`.`status` IS ?) AND (pages.edit_started_at >= ?)

Suggestion

Add edit_started_at to the index index_pages_on_status_and_work_id

Pages of status x

SELECT COUNT(*) FROM `pages` INNER JOIN `works` ON `pages`.`work_id` = `works`.`id` WHERE `works`.`collection_id` = ? AND `pages`.`status` = ?

from

/home/fromthepage/deployment/releases/20230912180620/app/models/collection_statistic.rb:68:in `get_stats_hash'

Suggestion

Use work_statistics instead.

Transcribe help screen (58s)

SELECT COUNT(*) FROM `pages` WHERE `pages`.`work_id` = ?

from

/home/fromthepage/deployment/releases/20230912180620/app/views/shared/_page_tabs.html.slim:81:in `_app_views_shared__page_tabs_html_slim__2802543535057409202_119960'
/home/fromthepage/deployment/releases/20230912180620/app/views/transcribe/help.html.slim:1:in `_app_views_transcribe_help_html_slim___4384188942254563659_158520'

at
Thu Sep 14 2023 07:37:48:727 GMT-0500 (Central Daylight Time)

Suggestion

Review logs to find actual parameter; attempt to reproduce.

Suggestion

There is no reason this should take a long time. Find the work parameter in the logs and duplicate.

@saracarl
Copy link
Collaborator

saracarl commented Sep 14, 2023

Ok, I found 3 transcribe help page calls in the same second:

I, [2023-09-14T12:37:48.809958 #30663]  INFO -- : Started GET "/aucklandmuseum/transcription-projects/memoirs-on-the-extinct-wingless-birds/help/32509374" for 110.249.201.197 at 2023-09-14 12:37:48 +0000
I, [2023-09-14T12:37:48.811708 #30663]  INFO -- : Processing by TranscribeController#help as HTML
I, [2023-09-14T12:37:48.811918 #30663]  INFO -- :   Parameters: {"user_slug"=>"aucklandmuseum", "collection_id"=>"transcription-projects", "work_id"=>"memoirs-on-the-extinct-wingless-birds", "page_id"=>"32509374"}
I, [2023-09-14T12:37:48.815789 #30777]  INFO -- : Started GET "/indianaarchives/indiana-wwi-service-record-cards/in-wwi-service-record-cards--army-and-marine-loh-lot/help/323366" for 194.247.173.99 at 2023-09-14 12:37:48 +0000
I, [2023-09-14T12:37:48.836952 #30777]  INFO -- : Oink Action: transcribe#help
I, [2023-09-14T12:37:48.837355 #30777]  INFO -- : Memory usage: 4734104 | PID: 30777
I, [2023-09-14T12:37:48.837497 #30777]  INFO -- : Instantiation Breakdown: Total: 7 | Collection: 3 | Work: 2 | Page: 1 | User: 1
I, [2023-09-14T12:37:48.857256 #30754]  INFO -- : Started GET "/fdp/correspondence-volume-3-1866-1880/1813---julia-g-crofts-to-fd-/help/1744356" for 111.225.148.105 at 2023-09-14 12:37:48 +0000
I, [2023-09-14T12:37:48.858894 #30754]  INFO -- : Processing by TranscribeController#help as HTML

The IPs for these seem to be crawlers:

110.249.201.197 bytespider-110-249-201-197.crawl.bytedance.com
194.247.173.99 no DNS found
111.225.148.105 bytespider-111-225-148-105.crawl.bytedance.com`

I tried the URLs (both logged in and not logged in) and they loaded very quickly. I think these requests were just backed up and don't need any further action.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants