find API: add experimental spell checking #1459
Very nice!
- Agree that _sortKeyByLang.sv is a reasonable starting point.
- Perhaps _spell=true should be the default. 🤔 Has performance implications though. _spell=only might be a good option for clients to do spell checking "on the side" with a separate request without slowing down the main search query.
- We probably want to promote _spell (or whatever we decide on) to an official vocab (non-underscore) term. Let's do this later.
- TODO: Add support in "new style" search API after Feature/rework new search #1455 is merged.
Changes requested:
- I think we should provide a link to the corrected query within the suggestion. So that the client doesn't have to do any URL manipulation. See e.g. facet links.
- I think we should remap the terms in the result to keep the API free from elasticsearch details.
What about:
curl -s "http://localhost:8180/find?q=flyttning%20och%20peldning&%40type=Instance&_spell=true" | jq '._spell'
[
  {
    "label": "flyttning och <em>pendling</em>",
    "view": { "@id": "/find?q=flyttning%20och%20pendling&%40type=Instance&_spell=true" }
  }
]
?
We might want to use something more specific than label to indicate that it contains some markup.
(I don't think the score is useful?)
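To make the two requests above concrete, here is a rough Python sketch of the remapping (the actual change would live in the service code, which is not shown here). It assumes the raw Elasticsearch response has the usual phrase suggester shape (`suggest` → suggester name → `options` with `text`, `highlighted` and `score`). The function name `remap_suggestions` and the suggester name `simple_phrase` are made up for illustration.

```python
from urllib.parse import urlencode, quote


def remap_suggestions(es_response, params, suggester_name="simple_phrase"):
    """Turn raw ES phrase-suggester options into API-level suggestions.

    es_response    -- parsed ES response (assumed to contain a "suggest" section)
    params         -- the original /find query parameters, in request order
    suggester_name -- hypothetical name the suggester was registered under
    """
    result = []
    for entry in es_response.get("suggest", {}).get(suggester_name, []):
        for option in entry.get("options", []):
            # Same parameters as the original request, but with the corrected q,
            # so the client can follow the link without any URL manipulation.
            corrected = dict(params, q=option["text"])
            result.append({
                # "highlighted" is only present when highlighting was requested
                "label": option.get("highlighted", option["text"]),
                "view": {"@id": "/find?" + urlencode(corrected, quote_via=quote)},
            })
            # The ES score is deliberately dropped from the API result.
    return result
```

Given the example query above and one ES option "flyttning och pendling", this should yield the same label and view link as in the proposed JSON.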
@olovy Sounds good! See latest commit. (We probably wouldn't want to actually show suggestions unless there are no search hits, or perhaps very few; otherwise there might often be annoying suggestions even when the query is spelled correctly. Though this is something for the client to decide.) Score might be useful internally if we want a threshold for whether a suggestion should be returned, but we'll see. I agree it's probably not useful in the API result.
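The two ideas in this comment could look roughly like the sketch below. Nothing here is decided: both function names, the `min_score` threshold and the `max_hits` cutoff are hypothetical, and how the client obtains the hit count is left open.

```python
def filter_by_score(options, min_score):
    """Server side: drop suggester options whose ES score is below min_score."""
    return [o for o in options if o.get("score", 0.0) >= min_score]


def suggestions_to_show(total_items, suggestions, max_hits=0):
    """Client side: show suggestions only when the search had at most max_hits hits."""
    return suggestions if total_items <= max_hits else []
```

With the default `max_hits=0`, suggestions are shown only for searches with no hits at all; a client that wants "very few hits" would pass a small positive number instead.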
LGTM!
See suggestion about using the same naming convention as descriptionHTML
A first stab at adding spell checking, to be used by Libris sök. Uses Elasticsearch's phrase suggester with two generators as described in the ES docs.
Uses _sortKeyByLang.sv as the field to get suggestions from, as it seems like a reasonable choice; with e.g. _all we'd get lots of "bad" suggestions since it contains lots of oft-repeated vocab stuff. (sv vs en shouldn't matter for these purposes.)
Quick testing on the dev data is promising. We might want to tinker with both the suggester configuration/query and the field(s) to target. This is just something to start with.
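For readers unfamiliar with the phrase suggester, the request body has roughly the shape sketched below. This follows the two-generator example in the ES docs and is not the exact query from this PR. The `.trigram` and `.reverse` subfields, the `reverse` analyzer and the suggester name `simple_phrase` are assumptions borrowed from that example and would need matching index mappings.

```python
def phrase_suggest_body(text, field="_sortKeyByLang.sv"):
    """Build an ES search body containing a phrase suggester with two generators.

    The subfield suffixes and the "reverse" analyzer are assumed to exist in
    the index mappings/settings; they are not taken from this PR.
    """
    return {
        "suggest": {
            "text": text,
            "simple_phrase": {  # hypothetical suggester name
                "phrase": {
                    "field": field + ".trigram",
                    "size": 1,
                    "gram_size": 3,
                    "direct_generator": [
                        # Generator 1: candidate terms from the field itself
                        {"field": field + ".trigram", "suggest_mode": "always"},
                        # Generator 2: reversed terms, which also catches
                        # errors near the start of a word
                        {
                            "field": field + ".reverse",
                            "suggest_mode": "always",
                            "pre_filter": "reverse",
                            "post_filter": "reverse",
                        },
                    ],
                    # Wrap corrected terms in <em>…</em> in "highlighted"
                    "highlight": {"pre_tag": "<em>", "post_tag": "</em>"},
                }
            },
        }
    }
```

The `highlight` tags are what would produce markup such as `flyttning och <em>pendling</em>` in each option's `highlighted` value.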
With /find?q=foobar&_spell=true the spell checking is done in addition to the regular query and returned along with the usual results.
With /find?q=foobar&_spell=only only the spell checking query is performed.
(Possibly we want to keep it separate from /find though? 🤔 ...or have it both in /find, for internal use, and as a bibspell-like thing; returning "bibspell-compatible" results would of course be trivial.)
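From a client's point of view, the two modes differ only in the value of _spell. A small sketch of building the request URLs, assuming the local dev base URL used elsewhere in this thread; the helper name `find_url` is made up:

```python
from urllib.parse import urlencode, quote


def find_url(q, spell=None, base="http://localhost:8180/find", **extra):
    """Build a /find URL; spell is None, "true" (search + spell) or "only"."""
    params = {"q": q, **extra}
    if spell is not None:
        if spell not in ("true", "only"):
            raise ValueError("spell must be 'true' or 'only'")
        params["_spell"] = spell
    # quote (rather than quote_plus) encodes spaces as %20
    return base + "?" + urlencode(params, quote_via=quote)


# Regular search with suggestions returned alongside the results
both = find_url("foobar", spell="true")
# Separate "on the side" request that only runs the spell checking query
only = find_url("foobar", spell="only")
```

A client that wants to keep the main search fast could send the plain query first and issue the `_spell=only` request separately.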
curl directly against ES, local dev:
https://kbse.atlassian.net/browse/LWS-87