Fix MARC records incorrectly matching ISBN promise item records #9839
Unfortunately, this test demonstrates the correct result of not matching the light record, so it's still unclear why a dated record matched an undated record with an ISBN. More investigation is required.
Following through how and where the threshold match code is used:
in a three-stage check: find_quick_match(rec) (openlibrary/openlibrary/catalog/add_book/__init__.py, lines 838 to 847 in fa22a2e)
which is effectively:
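A minimal sketch of a staged lookup of this kind, to show why stage order matters. Everything here is a hypothetical stand-in (the `EXISTING` store, the stage functions, and `find_first_match()` are illustrative names, not the code in add_book/__init__.py): the first stage that returns a key wins, so an over-eager title-only stage can stop a scored stage from ever running.

```python
from __future__ import annotations

from typing import Callable, Optional

# Hypothetical in-memory store: one light record with only a title and an ISBN.
EXISTING = {"/books/OL1M": {"title": "example title", "isbn": "9780000000002"}}

calls: list[str] = []  # records which stages actually ran


def quick_stage(rec: dict) -> Optional[str]:
    """Stage 1 (stand-in): look up by a strong identifier, here the ISBN."""
    calls.append("quick")
    for key, existing in EXISTING.items():
        if rec.get("isbn") and rec["isbn"] == existing.get("isbn"):
            return key
    return None


def title_only_stage(rec: dict) -> Optional[str]:
    """Stage 2 (stand-in): treat an identical title as a definite match."""
    calls.append("title_only")
    for key, existing in EXISTING.items():
        if rec.get("title") == existing.get("title"):
            return key
    return None


def scored_stage(rec: dict) -> Optional[str]:
    """Stage 3 (stand-in): pretend the score fell below the threshold."""
    calls.append("scored")
    return None


def find_first_match(rec: dict, stages: list[Callable[[dict], Optional[str]]]) -> Optional[str]:
    """Return the key from the first stage that finds a match, else None."""
    for stage in stages:
        key = stage(rec)
        if key is not None:
            return key
    return None


# A dated MARC-like record with the same title but no ISBN.
marc_rec = {"title": "example title", "publish_date": "1985"}

eager = find_first_match(marc_rec, [quick_stage, title_only_stage, scored_stage])
eager_calls = list(calls)

calls.clear()
strict = find_first_match(marc_rec, [quick_stage, scored_stage])
strict_calls = list(calls)
```

With the title-only stage in place, the dated record is matched to the light record and the scored stage never runs; without it, the scored stage gets to reject the pair.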
changing the return type to str | None rather than str | bool because previously find_match() -> str | None | bool
to confirm the refactor hasn't broken anything (tests should pass again)
it's failing my tests on whitespace, tests which pass locally, and makes less than ideal 'suggestions' as commits
for more information, see https://pre-commit.ci
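To illustrate the return-type change mentioned in the commits above, here is a small sketch under stated assumptions: `find_match_key()` and the `pool` mapping are hypothetical and only show the shape of a `str | None` return, where `None` is the single "no match" value and callers never have to distinguish it from `False`.

```python
from __future__ import annotations

from typing import Optional


def find_match_key(rec: dict, pool: dict[str, dict]) -> Optional[str]:
    """Return the key of the first pooled edition sharing an ISBN with rec, or None."""
    for key, edition in pool.items():
        if rec.get("isbn") and rec["isbn"] == edition.get("isbn"):
            return key
    return None  # one "no match" value; never False


# Hypothetical data for illustration only.
pool = {"/books/OL1M": {"isbn": "9780000000002"}}

hit = find_match_key({"isbn": "9780000000002"}, pool)
miss = find_match_key({"isbn": "9780000000019"}, pool)

# Callers can test for a match with a single identity check.
outcomes = ["matched" if m is not None else "new record" for m in (hit, miss)]
```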
This looks good to me. I also double-checked by running the original import from #9794 locally, and the correct edition is matched with the marc_columbia import.
Closes #9808
Closes #9794
After some investigation, this PR:
- exact match check: this was matching existing light records with only title + ISBN against MARC records only on the title field, and skipped the more intelligent threshold match method that at least tries to score matches.
- find_threshold_match(): makes sure it is used for performing edition matches.

Investigation notes
- The threshold_match() test currently passes by not matching the two records.
- The find_match() test currently fails, and I think this is what caused "Why did a specific MARC match OL49206154M?" (#9794).

The THRESHOLD checking code seems to do the right thing, but from following through the add_book load code based on your initial investigation in #9794, that threshold checking isn't used for this import flow. If there is a title match, it looks like it counts as an absolute match, and there is also specific code to trigger promise item records getting overwritten by MARC records.
At this stage I'm not even sure that that special case is being triggered here; the record might get added to anyway?
I'm not sure where the 'don't count the match if existing record has an ISBN' code should go. I'm still trying to make sense out of why the threshold code, which appears to be the right thing, isn't being used here.
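For illustration, a rough sketch of what scoring a pair of records against a threshold can look like, as opposed to treating a title match as absolute. The weights, the `THRESHOLD` value, the field names, and both function names are assumptions made up for this example; this is not Open Library's actual scoring.

```python
from __future__ import annotations

THRESHOLD = 75  # hypothetical cut-off, not the project's real value


def score_pair(existing: dict, incoming: dict) -> int:
    """Score two records field by field (illustrative weights only)."""
    total = 0
    if existing.get("title") and existing.get("title") == incoming.get("title"):
        total += 50  # a title match alone stays below the threshold
    # Only compare a field when both records carry it.
    for field, points in (("isbn", 40), ("publish_date", 30)):
        a, b = existing.get(field), incoming.get(field)
        if a and b:
            total += points if a == b else -points
    return total


def is_threshold_match(existing: dict, incoming: dict) -> bool:
    """True when the combined evidence reaches the threshold."""
    return score_pair(existing, incoming) >= THRESHOLD


# A light record (title + ISBN only) and two MARC-like records.
light = {"title": "example title", "isbn": "9780000000002"}
marc_dated = {"title": "example title", "publish_date": "1985"}
marc_same_isbn = {
    "title": "example title",
    "isbn": "9780000000002",
    "publish_date": "1985",
}

title_only_score = score_pair(light, marc_dated)
title_only_ok = is_threshold_match(light, marc_dated)
corroborated_ok = is_threshold_match(light, marc_same_isbn)
```

Under these made-up weights, the dated record that shares only a title with the light record does not match, while a record that also shares the ISBN does.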
Feel free to review and let me know if you have any ideas on how to improve this.
Technical
Testing
Screenshot
Stakeholders
@scottbarnes