-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Store vector embeddings for talks, and display related talks #19
Conversation
Thanks @crohr for this prototype it looks very promising. EmbeddingsTo move forward if we want to have this in production I guess we could as a first step put in place the logic to compute the embeddings and store them in the talk model. With some kind of logic to recompute them every time title / description changes. Then Meillisearch would just index that new column from our model. Can this work or I am missing something? UII supposed you put the list below for testing. Ultimately this should feed the cards that are on the right and replace this random suggestion. But while developing let's keep it like that. At some point, we might want to deploy this feature to prod behind a feature flag so that will make it easier to test DevopsTo release it to prod I will need to update the Meillisearch engine. We probably need to wait a bit as it seems pretty new and they highly recommend waiting. This being said whatever result we get out of it will always be better than a random suggestion. |
@Kerollmops I've tried using
Any reason why this line calls |
I tried to selectively tell meilisearch to ignore _vectors when title or description hasn't changed, but it doesn't seem to work. So yes storing there embedding in sqlite and recomputing with classic AR callbacks if title or description changes would work.
Yes, kept it simple for now, you're the UI guy :) Also, I didn't really notice that videos on the right were supposed to be "more like the current one". I think it could make sense to keep both exploratory videos on the right and related videos below the one you've just visioned, but maybe I'm wrong.
Not sure how you deploy the meilisearch container / process on the server, but yes since there is nothing in meilisearch that can't be reindexed from sqlite, I think it's ok to deploy alpha/beta software in that case. |
Hey @crohr 👋
Sorry for the delay. I will summon @brunoocasali on this one. I would expect this to work as for the Ruby integration, Note that, currently, Meilisearch isn't particularly smart when only the title or the description is updated it will reindex the document entirely, the vectors too! I am currently working on something that could help in this regard... |
Hey @crohr 👋 Do you plan to release this recommendation system? We improved the Vector Store solution since then 🧇 |
@Kerollmops thanks for the update I will look into updating the Meilisearch version into the hosting platform. There is an official docker image available now? If I understand correctly I need a 1.3+ version to enable it right? |
@adrienpoly Indeed, you need a v1.3.x and we provide a Docker image and all sort of binaries |
Sure, I will have another look this week, thanks for the update! |
attribute :title | ||
attribute :description | ||
attribute :slug | ||
attribute :video_id | ||
# ⚠️ This `video_id` attribute makes indexing (silently) fail with v1.3.2 of meilisearch. Error message from meilisearch (GET /tasks): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Kerollmops I wonder if there is a regression here with v1.3.2, since it fails to index even if I specifically set the primary_key: :id
as above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recently specified the primary key to index one of my datasets and haven't had issues. Are you sure it was introduced in v1.3.2? Is it related to the ruby SDK? I used the primaryKey
query parameter with the POST /indexes/{indexname}/documents
route.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hi @crohr why do you need to specify the primary key?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was just an attempt to fix the issue I'm explaining in the comment, but it doesn't make a difference (and shouldn't since this is the default)
attribute :video_provider | ||
attribute :thumbnail_sm | ||
attribute :thumbnail_md | ||
attribute :thumbnail_lg | ||
attribute :speakers do | ||
speakers.pluck(:name) | ||
end | ||
# ⚠️ This must return nil and not an empty array if no vector is available. | ||
# Otherwise all other indexing tasks with non-zero vector arrays will silently fail, since the engine will expect all vectors to have the same length. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Kerollmops this was by far the most time-consuming issue I had with meilisearch vector indexing: seems like meilisearch rejects any document that doesn't have a vector of the same size as the first one it has seen. If you mistakenly send a zero-sized vector (or incorrectly sized vector) for your first document, then all other documents with a different vector size will be rejected. This should be put in a big warning somewhere, or hopefully have some better error message in the server logs (had to inspect the tasks list to see this error).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the report @crohr. I shared your feedback with the team and will see what we can do. We recently fixed a bug with the _vectors
field that was producing issues when specifying multiple vectors for a single document. This one will be released in v1.3.3.
# Recomputes embedding for all talks that don't have one yet. | ||
def self.reembed!(sleep_interval: 2.seconds, limit: nil) | ||
# required for querying vectors (not indexing) | ||
MeiliSearch::Rails.client.http_patch "/experimental-features", {vectorStore: true} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Kerollmops could this be auto-enabled in some way if we start to send vectors?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately not. You must enable it by hand 😞
@adrienpoly ready to be reviewed, I've updated the description with the new task to run |
thanks @crohr I made a bit of preparation work in #64 to integrate on the front end part the results of the suggestions and to isolate then into a frame so that initial page load for the Talk#show route is not coupled to this suggestion method. My next step is to upgrade Meilisearch in the prod environment and it is not as plug and play as I hope it would be... Anyway it is not neither a very high traffic site so If search is down for a little time that should be ok 😄 Will try to look back into this soon. Thanks for your work |
ca98c66
to
aa6f659
Compare
Thanks for exploring this lots of changed have happened since so I will be closing it. I am now looking at brining this feature back with a full sqlite solution. JoyOfRails has implemented a solution that I think is looking promising joyofrails/joyofrails.com#280 |
This PR adds a "Talks you might be interested in" below each talk. This uses the newly released vector support in meilisearch to find talks that share some similarity with the current one.
Support for vector search is currently non-existant in meilisearch-rails and meilisearch-ruby so we cannot easily (?) fetch the similarity scores to keep only the most interesting ones, but the results are quite good as it is.
To make it work, you need to specify an OpenAI API key in
.env
, and launch a reindex as follows:Note that vector support is still a bit fiddly, so you may have to start from a fresh meilisearch database if it doesn't work for you on the first try (tip: inspect
GET localhost:7700/tasks
to see if anything is going wrong when indexing).Demo below:
rubyvideo-meilisearch-vectors.mp4
Closes #18.