4117 - Clean up ahoy records after roll up #4168

Merged 2 commits on Jul 16, 2024

WillNigel23
Collaborator

No description provided.

@WillNigel23 WillNigel23 linked an issue Jun 19, 2024 that may be closed by this pull request
'static#landing_page',
'transcribe#display_page',
'transcribe#save_transcription'
]
Collaborator Author

I moved some relevant constants here.

Owner

I like it!

exclude_actions = AhoyActivitySummary::WEEKLY_TRIAL_COHORT_TARGET_ACTIONS +
AhoyActivitySummary::WEEKLY_TRANSCRIBER_COHORT_TARGET_ACTIONS

Ahoy::Event.where('date < ?', keep_after_date).where.not(name: exclude_actions).destroy_all
Collaborator Author

Can you let me know if this looks right to you, @benwbrum?

The ticket mentions deleting both Ahoy::Events and Visits, but Visits do not seem to have an 'action' field. Should I delete Visits based only on started_at, then?

Owner

This looks correct to me.

On events versus visits, one Visit may have many Ahoy::Event child records. Generally speaking, a Visit corresponds to a user session, while an Event corresponds to a click or HTTP request.

You can see this in action if you go to the Admin dashboard and click on the Users tab, then click Visits for a user, and drill into a few Visits.

I'm trying to think about the ways we might clean up Visit records. Since one is created for each session--including anonymous sessions or bots/spiders--we do want to clean these as well. We certainly want to delete Visit records that have no child Ahoy::Event records after the Event clean-up.

Any ideas on how to do this in a performant manner?
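One way to picture the "Visits with no child events" selection is as an anti-join: collect the visit ids that still have events, then keep only the visits whose id is absent from that set. The sketch below is a plain-Ruby illustration, not the application's code; `VisitRow`, `EventRow`, and `visits_without_events` are hypothetical in-memory stand-ins for the Visit and Ahoy::Event tables.

```ruby
require 'set'

# Hypothetical in-memory stand-ins for Visit and Ahoy::Event rows (assumed
# shapes for illustration only, not the application's models).
VisitRow = Struct.new(:id, :started_at)
EventRow = Struct.new(:id, :visit_id, :name)

# Returns the visits that have no child events.
# Building the Set once keeps the work proportional to visits + events,
# instead of scanning the whole event list for every visit.
def visits_without_events(visits, events)
  visit_ids_with_events = events.map(&:visit_id).to_set
  visits.reject { |visit| visit_ids_with_events.include?(visit.id) }
end

visits = [
  VisitRow.new(1, Time.utc(2024, 6, 1)),
  VisitRow.new(2, Time.utc(2024, 6, 2)),
  VisitRow.new(3, Time.utc(2024, 6, 3))
]
events = [
  EventRow.new(10, 1, 'transcribe#display_page'),
  EventRow.new(11, 1, 'transcribe#save_transcription'),
  EventRow.new(12, 3, 'static#landing_page')
]

# Only visit 2 has no events left.
puts visits_without_events(visits, events).map(&:id).inspect
```

In the database the same idea is usually expressed as a LEFT JOIN filtered on a NULL child id, so the selection can stay inside a single query.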

@@ -2,7 +2,7 @@ namespace :fromthepage do
desc "weekly transcriber cohort"
task :weekly_transcriber_cohort => :environment do
# generate a csv file of users who signed up in the last week and write it out to a temporary file
- TRANSCRIBER_TARGET_ACTIONS = ['static#landing_page', 'registrations#new', 'registrations#create', 'transcribe#display_page', 'transcribe#save_transcription']
+ target_actions = AhoyActivitySummary::WEEKLY_TRANSCRIBER_COHORT_TARGET_ACTIONS
Collaborator Author

Moved these to constants

@WillNigel23 WillNigel23 requested review from benwbrum and saracarl June 19, 2024 21:59
AhoyActivitySummary::WEEKLY_TRANSCRIBER_COHORT_TARGET_ACTIONS

Ahoy::Event.where('date < ?', keep_after_date).where.not(name: exclude_actions).destroy_all
Visit.left_joins(:ahoy_events).where(ahoy_events: { id: nil }).destroy_all
Collaborator Author

@benwbrum this should delete all Visits that have no ahoy_events. Let me know if this works for you.

Owner

Let's add a constraint on visit deletion to only delete visits older than 2 weeks (or whatever the constant is). We have used the Visit data to track down problems with scrapers and bots.
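The requested constraint amounts to two conditions on each Visit: it has no events left, and it started before a retention cutoff. A minimal plain-Ruby sketch of that rule follows. `AgedVisit`, `deletable_visits`, and `RETENTION_DAYS = 14` are assumptions made for illustration; the thread only says "2 weeks (or whatever the constant is)".

```ruby
# Hypothetical stand-in for a Visit row. event_count represents the number of
# child Ahoy::Event records remaining after the event clean-up.
AgedVisit = Struct.new(:id, :started_at, :event_count)

RETENTION_DAYS = 14 # assumed value, taken from "2 weeks" in the comment above
SECONDS_PER_DAY = 24 * 60 * 60

# Selects visits that are both empty and older than the retention window.
def deletable_visits(visits, now: Time.now.utc, retention_days: RETENTION_DAYS)
  cutoff = now - retention_days * SECONDS_PER_DAY
  visits.select { |v| v.event_count.zero? && v.started_at < cutoff }
end

now = Time.utc(2024, 6, 24)
visits = [
  AgedVisit.new(1, Time.utc(2024, 6, 1), 0),  # old and empty: deletable
  AgedVisit.new(2, Time.utc(2024, 6, 20), 0), # empty but recent: kept for bot/scraper analysis
  AgedVisit.new(3, Time.utc(2024, 5, 1), 4)   # old but still has events: kept
]

puts deletable_visits(visits, now: now).map(&:id).inspect
```

Keeping the cutoff as a named parameter makes it easy to reuse whatever retention constant the roll-up already defines.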

@WillNigel23 WillNigel23 force-pushed the 4117-clean-up-ahoy-records-after-roll-up branch 4 times, most recently from 5f2a0be to 6f3e6c2 Compare June 24, 2024 17:28
@benwbrum
Owner

I tested this and ran into an exception:

benwbrum@sparckjones:~/dev/products/fromthepage/fromthepage$ rake fromthepage:summarize_ahoy_activity_for_last_n_days[30]
/home/benwbrum/.rvm/rubies/ruby-2.7.3/lib/ruby/2.7.0/net/protocol.rb:66: warning: already initialized constant Net::ProtocRetryError
/home/benwbrum/.rvm/gems/ruby-2.7.3/gems/net-protocol-0.2.2/lib/net/protocol.rb:68: warning: previous definition of ProtocRetryError was here
/home/benwbrum/.rvm/rubies/ruby-2.7.3/lib/ruby/2.7.0/net/protocol.rb:206: warning: already initialized constant Net::BufferedIO::BUFSIZE
/home/benwbrum/.rvm/gems/ruby-2.7.3/gems/net-protocol-0.2.2/lib/net/protocol.rb:214: warning: previous definition of BUFSIZE was here
/home/benwbrum/.rvm/rubies/ruby-2.7.3/lib/ruby/2.7.0/net/protocol.rb:503: warning: already initialized constant Net::NetPrivate::Socket
/home/benwbrum/.rvm/gems/ruby-2.7.3/gems/net-protocol-0.2.2/lib/net/protocol.rb:541: warning: previous definition of Socket was here
warning: parser/current is loading parser/ruby27, which recognizes 2.7.8-compliant syntax, but you are running 2.7.3.
Please see https://github.com/whitequark/parser#compatibility-with-ruby-mri.

-- Ahoy Rollup 2024-06-26 --
rake aborted!
ActiveRecord::StatementInvalid: Mysql2::Error: Unknown column 'date' in 'where clause'
/home/benwbrum/.rvm/gems/ruby-2.7.3/gems/mysql2-0.5.6/lib/mysql2/client.rb:151:in `_query'
/home/benwbrum/.rvm/gems/ruby-2.7.3/gems/mysql2-0.5.6/lib/mysql2/client.rb:151:in `block in query'
/home/benwbrum/.rvm/gems/ruby-2.7.3/gems/mysql2-0.5.6/lib/mysql2/client.rb:150:in `handle_interrupt'
/home/benwbrum/.rvm/gems/ruby-2.7.3/gems/mysql2-0.5.6/lib/mysql2/client.rb:150:in `query'
/home/benwbrum/.rvm/gems/ruby-2.7.3/gems/activerecord-6.1.7.6/lib/active_record/connection_adapters/abstract_mysql_adapter.rb:206:in `block (2 levels) in execute'
/home/benwbrum/.rvm/gems/ruby-2.7.3/gems/activesupport-6.1.7.6/lib/active_support/dependencies/interlock.rb:48:in `block in permit_concurrent_loads'
/home/benwbrum/.rvm/gems/ruby-2.7.3/gems/activesupport-6.1.7.6/lib/active_support/concurrency/share_lock.rb:187:in `yield_shares'
/home/benwbrum/.rvm/gems/ruby-2.7.3/gems/activesupport-6.1.7.6/lib/active_support/dependencies/interlock.rb:47:in `permit_concurrent_loads'
/home/benwbrum/.rvm/gems/ruby-2.7.3/gems/activerecord-6.1.7.6/lib/active_record/connection_adapters/abstract_mysql_adapter.rb:205:in `block in execute'
/home/benwbrum/.rvm/gems/ruby-2.7.3/gems/activerecord-6.1.7.6/lib/active_record/connection_adapters/abstract_adapter.rb:696:in `block (2 levels) in log'
/home/benwbrum/.rvm/gems/ruby-2.7.3/gems/activesupport-6.1.7.6/lib/active_support/concurrency/load_interlock_aware_monitor.rb:26:in `block (2 levels) in synchronize'
/home/benwbrum/.rvm/gems/ruby-2.7.3/gems/activesupport-6.1.7.6/lib/active_support/concurrency/load_interlock_aware_monitor.rb:25:in `handle_interrupt'
/home/benwbrum/.rvm/gems/ruby-2.7.3/gems/activesupport-6.1.7.6/lib/active_support/concurrency/load_interlock_aware_monitor.rb:25:in `block in synchronize'
/home/benwbrum/.rvm/gems/ruby-2.7.3/gems/activesupport-6.1.7.6/lib/active_support/concurrency/load_interlock_aware_monitor.rb:21:in `handle_interrupt'
/home/benwbrum/.rvm/gems/ruby-2.7.3/gems/activesupport-6.1.7.6/lib/active_support/concurrency/load_interlock_aware_monitor.rb:21:in `synchronize'
/home/benwbrum/.rvm/gems/ruby-2.7.3/gems/activerecord-6.1.7.6/lib/active_record/connection_adapters/abstract_adapter.rb:695:in `block in log'
/home/benwbrum/.rvm/gems/ruby-2.7.3/gems/activesupport-6.1.7.6/lib/active_support/notifications/instrumenter.rb:24:in `instrument'
/home/benwbrum/.rvm/gems/ruby-2.7.3/gems/activerecord-6.1.7.6/lib/active_record/connection_adapters/abstract_adapter.rb:687:in `log'
/home/benwbrum/.rvm/gems/ruby-2.7.3/gems/activerecord-6.1.7.6/lib/active_record/connection_adapters/abstract_mysql_adapter.rb:204:in `execute'
/home/benwbrum/.rvm/gems/ruby-2.7.3/gems/activerecord-6.1.7.6/lib/active_record/connection_adapters/mysql/database_statements.rb:52:in `execute'
/home/benwbrum/.rvm/gems/ruby-2.7.3/gems/activerecord-6.1.7.6/lib/active_record/connection_adapters/abstract_mysql_adapter.rb:215:in `execute_and_free'
/home/benwbrum/.rvm/gems/ruby-2.7.3/gems/activerecord-6.1.7.6/lib/active_record/connection_adapters/mysql/database_statements.rb:57:in `exec_query'
/home/benwbrum/.rvm/gems/ruby-2.7.3/gems/activerecord-6.1.7.6/lib/active_record/connection_adapters/abstract/database_statements.rb:532:in `select'
/home/benwbrum/.rvm/gems/ruby-2.7.3/gems/activerecord-6.1.7.6/lib/active_record/connection_adapters/abstract/database_statements.rb:69:in `select_all'
/home/benwbrum/.rvm/gems/ruby-2.7.3/gems/activerecord-6.1.7.6/lib/active_record/connection_adapters/abstract/query_cache.rb:103:in `select_all'
/home/benwbrum/.rvm/gems/ruby-2.7.3/gems/activerecord-6.1.7.6/lib/active_record/connection_adapters/mysql/database_statements.rb:12:in `select_all'
/home/benwbrum/.rvm/gems/ruby-2.7.3/gems/activerecord-6.1.7.6/lib/active_record/querying.rb:47:in `find_by_sql'
/home/benwbrum/.rvm/gems/ruby-2.7.3/gems/activerecord-6.1.7.6/lib/active_record/relation.rb:843:in `block in exec_queries'
/home/benwbrum/.rvm/gems/ruby-2.7.3/gems/activerecord-6.1.7.6/lib/active_record/relation.rb:861:in `skip_query_cache_if_necessary'
/home/benwbrum/.rvm/gems/ruby-2.7.3/gems/activerecord-6.1.7.6/lib/active_record/relation.rb:828:in `exec_queries'
/home/benwbrum/.rvm/gems/ruby-2.7.3/gems/activerecord-6.1.7.6/lib/active_record/relation.rb:631:in `load'
/home/benwbrum/.rvm/gems/ruby-2.7.3/gems/activerecord-6.1.7.6/lib/active_record/relation.rb:249:in `records'
/home/benwbrum/.rvm/gems/ruby-2.7.3/gems/activerecord-6.1.7.6/lib/active_record/relation.rb:553:in `destroy_all'
/home/benwbrum/.rvm/gems/ruby-2.7.3/gems/newrelic_rpm-9.9.0/lib/new_relic/agent/instrumentation/active_record_prepend.rb:79:in `block in destroy_all'
/home/benwbrum/.rvm/gems/ruby-2.7.3/gems/newrelic_rpm-9.9.0/lib/new_relic/agent.rb:883:in `with_database_metric_name'
/home/benwbrum/.rvm/gems/ruby-2.7.3/gems/newrelic_rpm-9.9.0/lib/new_relic/agent/instrumentation/active_record_prepend.rb:78:in `destroy_all'
/home/benwbrum/dev/products/fromthepage/fromthepage/lib/ahoy_activity_utils.rb:60:in `rollup_transcribe_for_date'
/home/benwbrum/dev/products/fromthepage/fromthepage/lib/tasks/ahoy_activity_rollup.rake:22:in `block (3 levels) in <main>'
/home/benwbrum/.rvm/gems/ruby-2.7.3/gems/activesupport-6.1.7.6/lib/active_support/core_ext/range/each.rb:9:in `each'
/home/benwbrum/.rvm/gems/ruby-2.7.3/gems/activesupport-6.1.7.6/lib/active_support/core_ext/range/each.rb:9:in `each'
/home/benwbrum/dev/products/fromthepage/fromthepage/lib/tasks/ahoy_activity_rollup.rake:15:in `block (2 levels) in <main>'
/home/benwbrum/.rvm/gems/ruby-2.7.3/gems/rake-13.2.1/exe/rake:27:in `<top (required)>'
/home/benwbrum/.rvm/gems/ruby-2.7.3/bin/ruby_executable_hooks:22:in `eval'
/home/benwbrum/.rvm/gems/ruby-2.7.3/bin/ruby_executable_hooks:22:in `<main>'

Caused by:
Mysql2::Error: Unknown column 'date' in 'where clause'
Tasks: TOP => fromthepage:summarize_ahoy_activity_for_last_n_days
(See full trace by running task with --trace)

@WillNigel23 WillNigel23 force-pushed the 4117-clean-up-ahoy-records-after-roll-up branch from 2863313 to 23ab536 Compare June 27, 2024 18:16
exclude_actions = AhoyActivitySummary::WEEKLY_TRIAL_COHORT_TARGET_ACTIONS +
AhoyActivitySummary::WEEKLY_TRANSCRIBER_COHORT_TARGET_ACTIONS

Ahoy::Event.where('time < ?', keep_after_date).where.not(name: exclude_actions).destroy_all
Owner

I tried running this locally in my development environment, and the process ran for an hour and got up to 15GB of memory before I killed it.

Can we switch these two lines to call delete_all instead of destroy_all?

Collaborator Author

Done. However, because delete_all skips callbacks, this can leave orphaned records behind (I believe deeds will be affected). Are we sure that is okay?
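For readers following the trade-off: in ActiveRecord, destroy_all instantiates each matching record and runs its callbacks, while delete_all issues the deletion directly without loading records or running callbacks. The toy class below is not ActiveRecord; `ToyTable` and its counters are hypothetical and only mimic those two behaviours so the difference is visible.

```ruby
# Toy in-memory table, NOT ActiveRecord. It imitates the two deletion styles
# discussed above so their side effects can be compared.
class ToyTable
  attr_reader :rows, :callbacks_run, :rows_loaded

  def initialize(rows)
    @rows = rows.dup
    @callbacks_run = 0
    @rows_loaded = 0
  end

  # Like destroy_all: materialise every row, run a callback for each, then remove.
  def destroy_all
    loaded = @rows.map(&:dup)            # every row becomes an object in memory
    @rows_loaded += loaded.size
    loaded.each { |_row| @callbacks_run += 1 }
    @rows.clear
  end

  # Like delete_all: remove the rows in one step, with no objects and no callbacks.
  def delete_all
    @rows.clear
  end
end

slow = ToyTable.new([{ id: 1 }, { id: 2 }, { id: 3 }])
slow.destroy_all
puts "destroy_all: loaded=#{slow.rows_loaded} callbacks=#{slow.callbacks_run}"

fast = ToyTable.new([{ id: 1 }, { id: 2 }, { id: 3 }])
fast.delete_all
puts "delete_all: loaded=#{fast.rows_loaded} callbacks=#{fast.callbacks_run}"
```

The memory cost in the first path grows with the number of rows, which is consistent with the runaway process described above; the second path is cheap but leaves any clean-up that callbacks would have done to the caller.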

Owner

That's a good point. Ahoy events should be leaf nodes in the data model, and Ahoy Visits without any events should also be leaf nodes. Let me take a look at the deed data model and think harder about this.

Owner

Okay, Deed has a foreign key to Ahoy::Visit, which we use when showing which deeds were created during a particular session on the Admin->Users->Visits listing. We never travel in the other direction in the application, so we shouldn't run into a nil visit when displaying a deed.

That said, it would be nice to clean these up. Perhaps we could get the list of visit_ids which we are planning to delete, call delete_all on them, then call Deed.where(visit_id: visit_ids_to_delete).update_all(visit_id: nil)?
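The sequence proposed here (collect the ids, delete the visits, then null out the matching Deed foreign keys) can be sketched in plain Ruby as follows. `StaleVisit`, `DeedRow`, and `delete_visits_and_detach_deeds` are hypothetical in-memory stand-ins; the array operations only mark where delete_all and update_all(visit_id: nil) would run against the database.

```ruby
require 'set'

# Hypothetical in-memory stand-ins for Visit and Deed rows.
StaleVisit = Struct.new(:id)
DeedRow = Struct.new(:id, :visit_id)

# Removes the selected visits, then clears visit_id on every deed that
# pointed at one of them so no deed references a missing visit.
def delete_visits_and_detach_deeds(visits, deeds, visit_ids_to_delete)
  ids = visit_ids_to_delete.to_set

  # Stands in for delete_all on the selected visits.
  visits.reject! { |visit| ids.include?(visit.id) }

  # Stands in for Deed.where(visit_id: ids).update_all(visit_id: nil).
  deeds.each { |deed| deed.visit_id = nil if ids.include?(deed.visit_id) }

  ids.size
end

visits = [StaleVisit.new(1), StaleVisit.new(2), StaleVisit.new(3)]
deeds = [DeedRow.new(100, 1), DeedRow.new(101, 2), DeedRow.new(102, nil)]

delete_visits_and_detach_deeds(visits, deeds, [1])
puts visits.map(&:id).inspect       # remaining visit ids
puts deeds.map(&:visit_id).inspect  # deed 100 no longer points at visit 1
```

Capturing the id list before deleting matters: once the visits are gone, there is no longer a cheap way to tell which deeds pointed at them.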

Collaborator Author

Can you please try again and see how this version performs?

It now calls destroy_all in batches, which may help.
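The reason batching can help is that only one batch of records needs to be in memory at a time, while callbacks still run for every record. The plain-Ruby sketch below illustrates that bound; `destroy_in_batches` and the `batch_size` of 1000 are hypothetical, and the exact batching call used in the branch is not shown in this thread (ActiveRecord provides `find_each` and `in_batches` for this purpose).

```ruby
# Walks the ids in fixed-size batches and reports how many records were
# processed and the largest number held at once. Illustrative only.
def destroy_in_batches(ids, batch_size: 1000)
  destroyed = 0
  peak_loaded = 0

  ids.each_slice(batch_size) do |batch|
    # Only this batch is in memory during the iteration.
    peak_loaded = [peak_loaded, batch.size].max
    batch.each { |_id| destroyed += 1 } # stands in for record.destroy and its callbacks
  end

  { destroyed: destroyed, peak_loaded: peak_loaded }
end

result = destroy_in_batches((1..2500).to_a, batch_size: 1000)
puts "destroyed=#{result[:destroyed]} peak_loaded=#{result[:peak_loaded]}"
```

Batching bounds memory but not total work: every record is still instantiated once, so the run can remain slow on very large tables.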

Collaborator Author

Please run git pull --rebase, or reset to HEAD if there are any conflicts.

@WillNigel23 WillNigel23 force-pushed the 4117-clean-up-ahoy-records-after-roll-up branch from ebbea3b to b4fdb17 Compare July 3, 2024 22:34
@benwbrum benwbrum merged commit 17257df into development Jul 16, 2024
@benwbrum benwbrum deleted the 4117-clean-up-ahoy-records-after-roll-up branch July 16, 2024 20:11
Successfully merging this pull request may close these issues.

Clean up Ahoy records after roll-up