Partner Imports: Detect low quality publishers #6611
I really like this approach!
0d2126c to ad86382
Can you remove the last commit? It looks like an autoformatter ran with it, and I can't see the diff anymore.
Oh, fudge! I got sick of hand-wrapping long lines, so I ran black. Big mistake. I will do the deed in my morning.
ef4f477 to 5df18dc
5df18dc to f03bfea
- Check publish_date field, not created
- Publish block list blocks import regardless of name match
- Add more title block words for independently published books
f03bfea to c5a15c4
Ok! I believe this should do the trick. Notable changes I made:
- These publishers are actually (confusingly) imported as authors, so check the authors field
- created isn't on the record; publish_date is the field we seek
- The publisher-author exclude list is a hard exclude; it doesn't require the title to contain anything. Those publisher-authors should always be excluded
Let me know if anything looks off! If all is good, please feel free to merge :)
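For readers following along, the hard exclude described in the comment above might look roughly like the sketch below. This is not the code from the PR: the constant name, the example entry, and the assumption that each record carries an authors list of dicts with a name key are all hypothetical. The publish_date check is left out because the conversation does not say what it tests.

```python
# Hypothetical exclude list; the real entries live in the PR, not here.
EXCLUDED_PUBLISHER_AUTHORS = {"example spam press"}


def has_excluded_author(record: dict) -> bool:
    """Return True if any author name is on the exclude list.

    Assumes (hypothetically) that record["authors"] is a list of
    dicts shaped like {"name": "..."}. The title is never consulted,
    which is what makes this a hard exclude.
    """
    author_names = {
        author.get("name", "").casefold()
        for author in record.get("authors", [])
    }
    return bool(author_names & EXCLUDED_PUBLISHER_AUTHORS)
```

A record whose authors include a listed publisher would be skipped regardless of its title.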
Closes #6573
Closes #6604
Create a set of LOW_QUALITY_PUBLISHERS (is there a better name? SPAM_PUBLISHERS?) and then for each book create a set of publishers. IF "notebook" is in the book's title AND there is an intersection between the two sets THEN it is a low-quality book that we should not import.
Technical
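The rule proposed in the description can be sketched as follows. This is a minimal illustration, not the code from the PR: the function name, the example publisher entries, and the assumed record shape (a title string and a publishers list of strings) are hypothetical, and lowercasing before comparison is an added assumption.

```python
# Hypothetical entries for illustration only.
LOW_QUALITY_PUBLISHERS = {"example spam press", "placeholder notebooks llc"}


def is_low_quality_book(book: dict) -> bool:
    """Apply the two-part rule from the description.

    A book is flagged only when BOTH conditions hold:
    1. "notebook" appears in the title, and
    2. the book's publishers intersect LOW_QUALITY_PUBLISHERS.
    """
    publishers = {name.casefold() for name in book.get("publishers", [])}
    title = book.get("title", "").casefold()
    return "notebook" in title and bool(publishers & LOW_QUALITY_PUBLISHERS)
```

Under this version a listed publisher alone is not enough to block an import; the title must also match.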
Testing
See: scripts/tests/test_partner_batch_imports.py
Screenshot
Stakeholders