-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clean orphaned page article link records #4175
Clean orphaned page article link records #4175
Conversation
has_and_belongs_to_many :categories, -> { distinct } | ||
belongs_to :collection, optional: true | ||
has_many(:target_article_links, :foreign_key => "target_article_id", :class_name => 'ArticleArticleLink') | ||
scope :target_article_links, -> { include 'source_article' } | ||
scope :target_article_links, -> { order "articles.title ASC" } | ||
|
||
has_many(:source_article_links, :foreign_key => "source_article_id", :class_name => 'ArticleArticleLink') | ||
has_many(:page_article_links) | ||
has_many :page_article_links, dependent: :destroy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This callback is in page.rb, but seems missing on article side.
def up | ||
page_article_links = PageArticleLink.where(article_id: nil).or(PageArticleLink.where(page_id: nil)) | ||
|
||
page_article_links.in_batches(of: 1000).destroy_all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
migration to clean existing bad records
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same caveat here.
2f1a2c0
to
e5716da
Compare
@@ -530,7 +529,7 @@ def clear_links(text_type) | |||
if self.page_article_links.present? | |||
self.clear_article_graphs | |||
# clear out the existing links to this page | |||
PageArticleLink.where("page_id = #{self.id} and text_type = '#{text_type}'").delete_all | |||
PageArticleLink.where("page_id = #{self.id} and text_type = '#{text_type}'").destroy_all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doing delete_all does not trigger active_record callbacks , therefore no validations.
Moving forward, unless we specifically want to avoid callbacks, we should use destroy instead of delete in order to avoid unvalidated changes.
e5716da
to
743e80a
Compare
app/models/article.rb
Outdated
@@ -81,7 +72,7 @@ def source_text | |||
|
|||
def self.delete_orphan_articles | |||
# don't delete orphan articles with contents | |||
Article.delete_all("source_text IS NULL AND id NOT IN (select article_id from page_article_links)") | |||
Article.where('source_text IS NULL AND id NOT IN (SELECT article_id FROM page_article_links)').destroy_all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this has the same issue caveat as #4217 :
Project owners can upload canonical subjects via a spreadsheet, and we don't want to delete those if they are not used.
If the subject record was created via a spreadsheet upload, then the Article record will have a provenance of the filename they uploaded. Every subject that was created via the transcription process will have a nil provenance. So when we "clean up orphaned subjects" at page save, we need to only delete articles with a nil provenance and no links.
(In production 6000/200,000 have a provenance.)
@@ -146,7 +137,7 @@ def possible_duplicates | |||
def clear_links(type='does_not_apply') | |||
# clear out the existing links to this page | |||
if self.id | |||
ArticleArticleLink.where("source_article_id = #{self.id}").delete_all | |||
ArticleArticleLink.where("source_article_id = #{self.id}").destroy_all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here.
def up | ||
page_article_links = PageArticleLink.where(article_id: nil).or(PageArticleLink.where(page_id: nil)) | ||
|
||
page_article_links.in_batches(of: 1000).destroy_all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same caveat here.
Oops. Just realized I'm wrong here -- we're not deleting Articles, but links. Please ignore my review.... |
6ad9ddd
to
0244a2c
Compare
0244a2c
to
e61892a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re-review to see if this might be causing #4269
|
||
has_many :pages, through: :page_article_links, counter_cache: true | ||
has_many :pages, through: :page_article_links |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noting we're missing the counter_cache
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bug in current rails version. This ticket provides more discussions on that
def self.delete_orphan_articles | ||
# don't delete orphan articles with contents | ||
Article.delete_all("source_text IS NULL AND id NOT IN (select article_id from page_article_links)") | ||
Article.where(provenance: nil). | ||
where('source_text IS NULL AND id NOT IN (SELECT article_id FROM page_article_links)').destroy_all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this was not working the way we expected it to, it could account for some of what we're seeing in #4269
No description provided.