Litesearch POC #58
Regarding similarity search: instead of similarity matching, and since Litesearch sorts by rank by default, did you think of extracting the most significant words from the current video's title and description and then doing an OR search with them? The resulting set would be sorted with the closest matches to the search query first. The trick here would be to manually get rid of what could be considered stop words (Litesearch currently has no facility for doing so).
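A minimal sketch of the idea above, in plain Ruby with no Litesearch calls. The stop-word list, the method names, and the limit of terms are all hypothetical choices made for illustration, and the resulting string assumes an FTS5-style query syntax where quoted terms are joined with OR.

```ruby
# Hypothetical stop-word list; a real one would be longer and language-specific.
def default_stop_words
  %w[a an and are as at be by for from how in is it of on or the this to with your]
end

# Returns up to `limit` of the most frequent terms in the text,
# after dropping stop words and very short words.
def significant_terms(text, limit: 5, stop_words: default_stop_words)
  words = text.downcase.scan(/[a-z0-9]+/)
  words = words.reject { |w| w.length < 3 || stop_words.include?(w) }
  # Sort by descending frequency, then alphabetically for a stable order.
  words.tally.sort_by { |word, count| [-count, word] }.first(limit).map(&:first)
end

# Builds an OR query from a title and description, quoting each term.
def or_query(title, description, limit: 5)
  significant_terms("#{title} #{description}", limit: limit)
    .map { |term| %("#{term}") }
    .join(" OR ")
end

puts or_query(
  "Building a search engine with SQLite",
  "How SQLite full-text search powers a small search engine",
  limit: 3
)
```

The string produced by or_query could then be passed to whatever search call the index exposes; ranking would do the rest, as described in the comment.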
Yeah, I thought about that, but going that route I feel I'll be reinventing a search engine. This is where the SQLite and Meilisearch combo was interesting, as Meilisearch already brings all of this. The pain point I have with Meilisearch is that upgrades are not really easy. I'll see if a simple Litesearch is good enough, especially once I have some tag filters available.
I can try to hide much of the complexity and offer a model#similar method on AR objects; that could be a nice abstraction.
Litesearch now has a similar method on the index, and on any AR or Sequel model object.
(Branch updated: 56d2c1b to 846db36)
@oldmoe I tried to run it from the master branch, but I am getting this error
I tried to run it in the console
but it returns the same error. If I roll back to the latest official release, Litesearch works OK (but without similarity search).
Thanks for trying it out. It turns out this is due to the tokenizer being a trigram one. I am looking into how to avoid tokens that would cause syntax errors. Could you please send me the data for the particular object you are testing?
I have just pushed a change that should fix the issue, but I am not sure of the quality of the similarity search using the terms stored in the trigram-tokenized index; a porter or unicode tokenizer will yield much better similarity results. I think I will need to reconsider how similarity is implemented for trigram indexes specifically.
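To illustrate why trigram tokens can break a query, here is a small sketch in plain Ruby. It is not the fix that was pushed to Litesearch; the method names are hypothetical. It assumes the query is sent to SQLite FTS5, where a term wrapped in double quotes (with embedded double quotes doubled) is read as a string rather than as query syntax.

```ruby
# Splits text into overlapping three-character tokens,
# roughly what a trigram tokenizer stores in the index.
def trigrams(text)
  text.downcase.chars.each_cons(3).map(&:join)
end

# Wraps a token in double quotes and doubles any embedded double quote,
# so characters such as + - * ( ) are not parsed as FTS5 operators.
def fts5_quote(token)
  %("#{token.gsub('"', '""')}")
end

# Builds an OR query from the unique trigrams of the text.
def safe_trigram_query(text)
  trigrams(text).uniq.map { |t| fts5_quote(t) }.join(" OR ")
end

puts safe_trigram_query("C++ on Rails")
```

Even with quoting, the terms are fragments such as "on " or "n r", which hints at why similarity built from a trigram index may be of lower quality than one built from whole words.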
The data can be found in /data: https://github.com/adrienpoly/rubyvideo/tree/main/data. All the videos.yml files there are indexed by the Talk model.
If you run this branch, a simple rails db:create db:seed and bin/dev should get you up and running; then you can update the related_talks method to use the Litesearch similar method.
Closing for now.
This is mostly a POC at this time to test Litesearch's capabilities.
It is very straightforward to integrate with Active Record, with the ability to set a weight. Being able to replace Meilisearch here would be nice, to have something easier to install and deploy. Maybe we can keep the vector-based recommendations from #19.
The current limitation is that I cannot search on speaker names, as through associations are not yet supported (oldmoe/litestack#45).