Solr search result ordering broken #4938
Comments
After looking at the existing search order stories and other docs (#1928, #2472), we need a new definition of our goals for search ordering. Having those in a "spec" will make it easier to configure Solr for our needs. There is some good info to start with in #1928, but it doesn't cover the breadth of the term space. For reference, we have these fields boosted in the
In the end there is going to be some "magic" in how the results are returned, but we can steer that magic with some configuration. We should be able to pin some fields to show up extremely high in the results, but we'll need to be fairly choosy about the design we want to apply. P.S. I think part of our current problem is that we added boost values without measuring the raw score values the search algorithm returns per field. Also, I think one quick win we can add is a |
I realized I made a fundamental mistake in how I was testing Solr results. It looks like the place where we put the boosting in solr-4.6 stopped working when we switched to solr-7.x. So our boosting IS turned off in 4.9.2. That being said, changes in solr-7.3 mean that the search ordering is wacky with our old boost values. So the discussions were still of value and will inform the new Solr configuration. |
Hey @matthew-a-dunlap, I know we discussed this last week, but as this heads to QA is there a short summary of the expected changes before/after with this PR? Thank you! |
Pull request #5080 looks fine to me so I moved it to QA. The reason the fix works (in theory, since I haven't tested it) is tied in with the long comment I left in d3b721e when I changed the Solr request handler from "/spell" to "/select" in pull request #4520, when we upgraded from Solr 4 to Solr 7. Here's that comment: Back in 60e640b when I was playing with spelling suggestions from Solr I In the new pull request, we are putting the boosting under "/select", the out-of-the-box request handler we use in Solr 7, rather than "/spell", the non-default request handler we used in the Solr 4 days. This is probably hard to follow but I'm happy to explain more in person or write more here. |
@djbrooke Summary: I updated the solrconfig.xml to move the boosting to the more normal location where it'll be picked up by Solr 7. Also, due to changes in the (non-boosted) weights in Solr 7 and the need to prioritize Dataverses even more, I increased the spread of the boosting so Dataverses show at the top for most matches. I also added a tie variable, which means objects that match multiple fields will get more weight. The upgrade steps for the next release have not changed, since upgrading for this is the same as upgrading for highlighting: we'll need to add the new solrconfig.xml, restart, and reindex. |
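To make the effect of the tie variable concrete, here is a minimal sketch of how a DisMax-style score combines per-field scores: the best-matching field counts in full, and every other matching field contributes its score multiplied by `tie`. The function name and the sample scores are made up for illustration; they are not values from our solrconfig.xml or from real Solr output.

```python
def dismax_score(field_scores, tie):
    """Combine per-field scores the way a DisMax query does.

    field_scores: scores of the fields that matched (hypothetical values).
    tie: 0.0 means only the best field counts; 1.0 means all fields are summed.
    """
    if not field_scores:
        return 0.0
    best = max(field_scores)
    others = sum(field_scores) - best
    return best + tie * others


# Hypothetical document that matches in three boosted fields.
scores = [4.0, 2.0, 1.0]

only_best = dismax_score(scores, tie=0.0)   # 4.0: extra matches are ignored
with_tie = dismax_score(scores, tie=0.5)    # 5.5: extra matches add half their score
summed = dismax_score(scores, tie=1.0)      # 7.0: plain sum of all matches
```

With `tie` at 0, a document matching one field scores the same as one matching three fields with the same best score; raising `tie` is what lets multi-field matches pull ahead.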
This was moved back into develop for two reasons:
|
I've taken another stab at the boosting. It is more in line with the original boosting values, but I added an extra XML option for phrase-based boosting. This means that when a user searches for two words, results containing both of those words will get an extra boost, especially if the words are right next to each other. |
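For anyone who wants to experiment with this kind of boosting on a test server before touching solrconfig.xml, the same ideas can be sketched as request parameters. `defType`, `qf`, `pf`, `ps`, and `tie` are standard (e)DisMax parameters; the field names and boost values below are hypothetical placeholders, not the ones in the pull request, and the choice of `edismax` is an assumption.

```python
from urllib.parse import urlencode


def boost_string(boosts):
    """Turn {"field": weight} into the "field^weight ..." syntax used by qf and pf."""
    return " ".join(f"{field}^{weight}" for field, weight in boosts.items())


def build_search_params(query, field_boosts, phrase_boosts, tie=0.1, phrase_slop=0):
    """Build query parameters for a boosted search (a sketch, not our real config)."""
    return {
        "q": query,
        "defType": "edismax",               # assumption: the extended DisMax parser
        "qf": boost_string(field_boosts),   # per-field boosts for individual terms
        "pf": boost_string(phrase_boosts),  # extra boost when the terms match as a phrase
        "ps": phrase_slop,                  # how far apart phrase terms may be
        "tie": tie,                         # weight given to non-best field matches
    }


# Hypothetical field names and weights, for illustration only.
params = build_search_params(
    "climate survey",
    field_boosts={"name": 10, "title": 5},
    phrase_boosts={"name": 20, "title": 10},
)
query_string = urlencode(params)
```

The resulting `query_string` can be appended to a "/select" URL on a test core to compare orderings with different values before settling on defaults in the config file.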
Thanks @matthew-a-dunlap, let's discuss today if possible. |
I moved this over to QA because I'm happy with the results shown to me on a test server. @mheppler is happy with it as well. @matthew-a-dunlap - if there's anything helpful that you can add in here for QA, please do so. Thanks for your work on this issue. |
After upgrading Solr, the configuration we had for boosting search matches on certain Dataverse fields is broken.
Based upon our configuration, the boosting should be in this order:
Yet the result is off (note that this screenshot does not contain all the fields boosted)
There is more information (production examples, etc.) captured in #4836, along with information on Solr highlighting, which has been fixed.