-
-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Load translation_of into solr work title + allow work.* field in search #8770
Load translation_of into solr work title + allow work.* field in search #8770
Conversation
…itions of a work in alternative titles.
…prefaced with 'work.'
… fields prefaced with 'work.'", as it does not follow the intended solution. This reverts commit 1bc82d9.
…ly be located and deleted for the final version.
for more information, see https://pre-commit.ci
…ns, and removed prefix from work.title in solr params.
…ross the entire list of parameters, rather than just the first index.
…deitch/openlibraryfork into 8700/feature/search-for-translation
for more information, see https://pre-commit.ci
…deitch/openlibraryfork into 8700/feature/search-for-translation
for more information, see https://pre-commit.ci
…ration over a non-iterable object.
…deitch/openlibraryfork into 8700/feature/search-for-translation
…interchangeable in Python
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good thank you @benbdeitch ! A few fixes to put in place :)
I fixed most of the issues; I'll get to the last one tomorrow. Thanks for catching them! |
…ct traversal of the luqum tree when removing the 'work.' prefix.
…deitch/openlibraryfork into 8700/feature/search-for-translation
for more information, see https://pre-commit.ci
…deitch/openlibraryfork into 8700/feature/search-for-translation
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok this all looks correct! One suggestion. Putting on testing.openlibrary.org now to test it out!
assert fn('work.title:Bob') == 'title:Bob' | ||
assert fn('title:Joe') == 'title:Joe' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use some complex queries here? These tests would also pass this function ;)
def luqum_replace_field(query, replacer):
return replacer(query)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests:
- Search for
title:
finds editions - Search for
work.title
does not match editions - Search for
work.title
matches works - Test field other than title https://testing.openlibrary.org/search?q=ddc%3A8*+first_publish_year%3A%5B*+TO+1950%5D+work.publish_year%3A%5B2000+TO+*%5D+ebook_access%3Apublic&mode=everything
All good! I won't be able to test the solr updater on testing.openlibrary.org , so will wait for that to be on prod. Either way it'll require a full reindex to affect the full catalog, although it will work on edits as they happen!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lgtm! Will merge after that last final code review suggestion is fixed up 😊
…ch (internetarchive#8770) * Solr Updater now keeps track of all 'translation_of' fields of the editions of a work in alternative titles. * Some logging for development, and registering 'work.' fields as valid. * Fixed typo in 'startswith' method * Prevented convert work_field to edition_field from processing fields prefaced with 'work.' * Revert "Prevented convert work_field to edition_field from processing fields prefaced with 'work.'", as it does not follow the intended solution. This reverts commit 1bc82d9. * Marked additional logging messages with [TODO]. so that they can easily be located and deleted for the final version. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixed variable name, prevented work.title from being passed to editions, and removed prefix from work.title in solr params. * Fixed the remove_work_prefix_from_query function to properly crawl across the entire list of parameters, rather than just the first index. * Removed the logging added for testing purposes. * Removed unnecessary set comprehension * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Adding 'translation of' property to Solr edition objects, to eliminate bug. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Removed 'book in' for 'book.translation_of', to prevent attempted iteration over a non-iterable object. * Fixed set() declared as {}. The empty set and the empty dict are not interchangeable in Python * misc fixes, as suggested in PR review * Other misc fixes, as suggested in PR review. * removed unnecessary try/except clause. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Added support for new 'luqum_replace_fields' futil, and fixed incorrect traversal of the luqum tree when removing the 'work.' prefix. * Fixed whitespace errors * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Added unit test for luqum_replace_fields function. * Added newline to end of file * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixed testing * Removed unneeded argument. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Added additional testing --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Closes: #8700
This PR enables Solr support for searching by 'work.title', or other related fields, in addition to searching for edition titles.
Technical
This PR the Solr indexes of works to include all 'translated_from' fields of their editions under 'alternative titles', as well as including the 'translation_of' property in the SolrEditionBuilder class. In addition, it currently alters the 'is_search_field' function to remove 'work.' prefixes from strings, before checking them against the all_fields list.
Once the query passes as a valid field, it is ignored from conversion to an edition query, and added back on to the query afterwards, with 'title' removed, in order to achieve proper behavior.
Testing
Simply search for queries involving 'work.:' on the search bar. This should show all editions that are translations of the work.
Screenshot
It currently does not behave correctly.
Stakeholders
@cdrini