Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean orphaned page article link records #4175

Merged

Conversation

WillNigel23
Copy link
Collaborator

No description provided.

@WillNigel23 WillNigel23 linked an issue Jun 25, 2024 that may be closed by this pull request
has_and_belongs_to_many :categories, -> { distinct }
belongs_to :collection, optional: true
has_many(:target_article_links, :foreign_key => "target_article_id", :class_name => 'ArticleArticleLink')
scope :target_article_links, -> { include 'source_article' }
scope :target_article_links, -> { order "articles.title ASC" }

has_many(:source_article_links, :foreign_key => "source_article_id", :class_name => 'ArticleArticleLink')
has_many(:page_article_links)
has_many :page_article_links, dependent: :destroy
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This callback is in page.rb, but seems missing on article side.

def up
page_article_links = PageArticleLink.where(article_id: nil).or(PageArticleLink.where(page_id: nil))

page_article_links.in_batches(of: 1000).destroy_all
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

migration to clean existing bad records

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same caveat here.

@WillNigel23 WillNigel23 force-pushed the 3993-3576-clean-orphan-page-article-link-records branch from 2f1a2c0 to e5716da Compare June 25, 2024 19:10
@@ -530,7 +529,7 @@ def clear_links(text_type)
if self.page_article_links.present?
self.clear_article_graphs
# clear out the existing links to this page
PageArticleLink.where("page_id = #{self.id} and text_type = '#{text_type}'").delete_all
PageArticleLink.where("page_id = #{self.id} and text_type = '#{text_type}'").destroy_all
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing delete_all does not trigger active_record callbacks , therefore no validations.

Moving forward, unless we specifically want to avoid callbacks, we should use destroy instead of delete in order to avoid unvalidated changes.

@WillNigel23 WillNigel23 requested a review from benwbrum June 25, 2024 19:12
@WillNigel23
Copy link
Collaborator Author

#3993 and #3576

@WillNigel23 WillNigel23 linked an issue Jun 25, 2024 that may be closed by this pull request
@WillNigel23 WillNigel23 force-pushed the 3993-3576-clean-orphan-page-article-link-records branch from e5716da to 743e80a Compare June 26, 2024 19:16
@@ -81,7 +72,7 @@ def source_text

def self.delete_orphan_articles
# don't delete orphan articles with contents
Article.delete_all("source_text IS NULL AND id NOT IN (select article_id from page_article_links)")
Article.where('source_text IS NULL AND id NOT IN (SELECT article_id FROM page_article_links)').destroy_all
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this has the same issue caveat as #4217 :
Project owners can upload canonical subjects via a spreadsheet, and we don't want to delete those if they are not used.

If the subject record was created via a spreadsheet upload, then the Article record will have a provenance of the filename they uploaded. Every subject that was created via the transcription process will have a nil provenance. So when we "clean up orphaned subjects" at page save, we need to only delete articles with a nil provenance and no links.
(In production 6000/200,000 have a provenance.)

@@ -146,7 +137,7 @@ def possible_duplicates
def clear_links(type='does_not_apply')
# clear out the existing links to this page
if self.id
ArticleArticleLink.where("source_article_id = #{self.id}").delete_all
ArticleArticleLink.where("source_article_id = #{self.id}").destroy_all
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

def up
page_article_links = PageArticleLink.where(article_id: nil).or(PageArticleLink.where(page_id: nil))

page_article_links.in_batches(of: 1000).destroy_all
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same caveat here.

@saracarl
Copy link
Collaborator

Oops. Just realized I'm wrong here -- we're not deleting Articles, but links. Please ignore my review....

@WillNigel23 WillNigel23 force-pushed the 3993-3576-clean-orphan-page-article-link-records branch from 6ad9ddd to 0244a2c Compare July 16, 2024 17:20
@WillNigel23 WillNigel23 requested a review from saracarl July 16, 2024 17:20
@WillNigel23 WillNigel23 force-pushed the 3993-3576-clean-orphan-page-article-link-records branch from 0244a2c to e61892a Compare July 16, 2024 22:43
@benwbrum benwbrum merged commit d8cbf2f into development Jul 18, 2024
1 check passed
@benwbrum benwbrum deleted the 3993-3576-clean-orphan-page-article-link-records branch July 18, 2024 16:48
Copy link
Collaborator

@saracarl saracarl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-review to see if this might be causing #4269


has_many :pages, through: :page_article_links, counter_cache: true
has_many :pages, through: :page_article_links
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noting we're missing the counter_cache here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bug in current rails version. This ticket provides more discussions on that

#4005

def self.delete_orphan_articles
# don't delete orphan articles with contents
Article.delete_all("source_text IS NULL AND id NOT IN (select article_id from page_article_links)")
Article.where(provenance: nil).
where('source_text IS NULL AND id NOT IN (SELECT article_id FROM page_article_links)').destroy_all
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this was not working the way we expected it to, it could account for some of what we're seeing in #4269

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Orphan article-article links cause 500 error Orphan article-article or page-article links?
3 participants