-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
4117 - Clean up ahoy records after roll up #4168
4117 - Clean up ahoy records after roll up #4168
Conversation
'static#landing_page', | ||
'transcribe#display_page', | ||
'transcribe#save_transcription' | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved some relevant constants here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like it!
lib/ahoy_activity_utils.rb
Outdated
exclude_actions = AhoyActivitySummary::WEEKLY_TRIAL_COHORT_TARGET_ACTIONS + | ||
AhoyActivitySummary::WEEKLY_TRANSCRIBER_COHORT_TARGET_ACTIONS | ||
|
||
Ahoy::Event.where('date < ?', keep_after_date).where.not(name: exclude_actions).destroy_all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you let me know if this looks right to you @benwbrum ?
In ticket it was mentioned we delete Ahoy::Events and Visits, but it seems there is no 'action' field for Visits. Do I delete only based on started_at
for Visits then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks correct to me.
On events versus visits, one Visit
may have many Ahoy::Event
child records. Generally speaking, a Visit corresponds to a user session, while an Event corresponds to a click or HTTP request.
You can see this in action if you go to the Admin dashboard and click on the Users tab, then click Visits for a user, and drill into a few Visits.
I'm trying to think about the ways we might clean up Visit records. Since one is created for each session--including anonymous sessions or bots/spiders--we do want to clean these as well. We certainly want to delete Visit records that have no child Ahoy::Event records after the Event clean-up.
Any ideas on how to do this in a performant manner?
@@ -2,7 +2,7 @@ namespace :fromthepage do | |||
desc "weekly transcriber cohort" | |||
task :weekly_transcriber_cohort => :environment do | |||
# generate a csv file of users who signed up in the last week and write it out to a temporary file | |||
TRANSCRIBER_TARGET_ACTIONS = ['static#landing_page', 'registrations#new', 'registrations#create', 'transcribe#display_page', 'transcribe#save_transcription'] | |||
target_actions = AhoyActivitySummary::WEEKLY_TRANSCRIBER_COHORT_TARGET_ACTIONS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved these to constants
lib/ahoy_activity_utils.rb
Outdated
AhoyActivitySummary::WEEKLY_TRANSCRIBER_COHORT_TARGET_ACTIONS | ||
|
||
Ahoy::Event.where('date < ?', keep_after_date).where.not(name: exclude_actions).destroy_all | ||
Visit.left_joins(:ahoy_events).where(ahoy_events: { id: nil }).destroy_all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@benwbrum this should delete all Visits without ahoy_events. Let me know if this is good for you
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add a constraint on visit deletion to only delete visits older than 2 weeks (or whatever the constant is). We have used the Visit data to track down problems with scrapers and bots.
5f2a0be
to
6f3e6c2
Compare
I've tested this and run into an exception:
|
2863313
to
23ab536
Compare
lib/ahoy_activity_utils.rb
Outdated
exclude_actions = AhoyActivitySummary::WEEKLY_TRIAL_COHORT_TARGET_ACTIONS + | ||
AhoyActivitySummary::WEEKLY_TRANSCRIBER_COHORT_TARGET_ACTIONS | ||
|
||
Ahoy::Event.where('time < ?', keep_after_date).where.not(name: exclude_actions).destroy_all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried running this locally in my development environment, and the process ran for an hour and got up to 15GB of memory before I killed it.
Can we switch these two lines to call delete_all
instead of destroy_all
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Although, without proper callbacks we are producing orphaned records this way (I believe deeds will be affected). Are we sure that is okay?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point. Ahoy events should be leaf nodes in the data model, and Ahoy Visits without any events should also be leaf nodes. Let me take a look at the deed data model and think harder about this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, Deed
has a foreign key to Ahoy::Visit
, which we use when showing which deeds were created during a particular session on the Admin->Users->Visits listing. We never travel in the other direction in the application, so we shouldn't run into a nil visit
when displaying a deed.
That said, it would be nice to clean these up. Perhaps we could get the list of visit_ids
which we are planning to delete, call delete_all
on them, then call Deed.where(visit_id: visit_ids_to_delete).update_all(visit_id: nil)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please try again and see how is the performance with this one.
We do destroy_all in batches. Maybe that would help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please git pull --rebase or reset to head if there are any conflicts
ebbea3b
to
b4fdb17
Compare
No description provided.