
Jupyter notebook code for movie recommendations, created using OpenAI's text-embedding-3-small model.


szilvia-csernus/movie-recommendations-from-embeddings


Movie Recommendations with Embeddings

Recommends the k most similar movies for a selected movie, based on the similarity of their plot texts.

5000 American movies were selected from a wiki dataset (see Credits). For each movie plot, I created a text embedding with OpenAI's "text-embedding-3-small" model.

Text embeddings measure the relatedness of text strings by turning the texts into high-dimensional vectors of floating point numbers. The distance between two vectors measures their relatedness: small distances suggest high relatedness and large distances suggest low relatedness.

To list recommendations for a selected movie, I pick the records whose embeddings have the smallest vector distances to that movie's embedding.
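The selection step can be sketched in a few lines of plain Python. This is only an illustration, not the notebook's actual code: the README does not say which distance metric is used, so cosine distance is an assumption here, and the function names, movie titles and 3-dimensional vectors are made up (text-embedding-3-small returns 1536-dimensional vectors by default).

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def recommend(title, embeddings, k=3):
    """Return the k titles whose embeddings are closest to `title`'s embedding."""
    query = embeddings[title]
    distances = [
        (cosine_distance(query, vec), other)
        for other, vec in embeddings.items()
        if other != title  # never recommend the selected movie itself
    ]
    distances.sort()  # smallest distance first
    return [other for _, other in distances[:k]]

# Hypothetical toy data standing in for real plot embeddings.
toy_embeddings = {
    "Space Saga": [0.9, 0.1, 0.0],
    "Star Quest": [0.8, 0.2, 0.1],
    "Courtroom Drama": [0.0, 0.9, 0.3],
    "Legal Thriller": [0.1, 0.8, 0.4],
}

print(recommend("Space Saga", toy_embeddings, k=2))
```

For 5000 movies a linear scan like this is still fast; a vectorised or indexed search would only matter for much larger datasets.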

[Image: recommendations]

NOMIC Atlas Map Visuals

By visualising the high-dimensional text embeddings in a 2D map with the help of NOMIC Atlas, we can see distinguishable clusters.

[Image: clusters]

https://atlas.nomic.ai/data/csernusszilvi/experimental-arora/map

How to run this project?

  1. Prerequisites:

    • Make sure Python3 is installed.
    • If you don't have an account with OpenAI, create one here: https://openai.com/
    • Create a project API key under Dashboard / API keys
    • Create a NOMIC Atlas account here: https://atlas.nomic.ai/
  2. Clone the project. Be aware that the project includes the original dataset I used (wiki_movie_plots_deduped.csv) as well as the cached embeddings file (movie_embeddings.pkl), which are 81MB and 86MB in size, respectively. If you run the embedding function with the same parameters as in the project, the cache file helps you avoid charges from OpenAI. If you plan to use the embedding function with a different dataset or model, you won't need these files.

  3. Create a virtual environment inside the project folder:

    python -m venv venv

  4. Activate the virtual environment:

    Mac: source venv/bin/activate

    Windows: venv\Scripts\activate

  5. Select interpreter in VSCode:

    (on Mac) Cmd + Shift + P ---> Select Interpreter ---> Select the created venv environment

  6. Create an .env file in the root folder and add your project's API key:

    OPENAI_API_KEY=your-unique-openai-project-key
    
  7. Install the Python dependencies:

    pip install -r requirements.txt

  8. Log in to NOMIC Atlas:

    • In the terminal, run nomic login.
    • Click the link to retrieve your API key, then return to the terminal and run nomic login <your-api-key> to authenticate.
  9. Run the Jupyter Notebook:

    • The jupyter notebook command opens the Notebook in the browser.
    • Run the cells of the movies-embedding.ipynb file in the given order, adjusting the models and cost calculations as necessary.
    • I used caching when I ran the embedding function myself. The cached pickle file, movie_embeddings.pkl, is part of this project folder. If you don't change the dataset or the text-embedding model, you won't be charged, as the embedding function uses the cached data whenever it's available.
    • Be aware that OpenAI will charge you for running the embedding function if you use a different dataset and / or embedding model.
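The caching behaviour described in step 9 can be sketched roughly as follows. This is a simplified illustration, not the notebook's implementation: the internal layout of movie_embeddings.pkl is not documented here, so the (text, model) cache key, the function names and the demo file name are all assumptions, and fake_embed is a stand-in for the paid OpenAI call.

```python
import pickle
import tempfile
from pathlib import Path

def load_cache(path):
    """Load the cache dict from a pickle file, or start empty if the file is missing."""
    path = Path(path)
    if path.exists():
        with path.open("rb") as f:
            return pickle.load(f)
    return {}

def cached_embedding(text, model, cache, cache_path, embed_fn):
    """Return the embedding for (text, model), calling embed_fn only on a cache miss."""
    key = (text, model)  # assumed key layout: changing text or model causes a miss
    if key not in cache:
        cache[key] = embed_fn(text, model)  # the only step that would incur a charge
        with Path(cache_path).open("wb") as f:
            pickle.dump(cache, f)  # persist so later runs can reuse the result
    return cache[key]

# Demo with a fake embedding function that records how often it is called.
calls = []

def fake_embed(text, model):
    calls.append(text)
    return [float(len(text)), 0.0]

with tempfile.TemporaryDirectory() as tmp:
    cache_path = Path(tmp) / "demo_cache.pkl"  # hypothetical file name
    cache = load_cache(cache_path)
    plot = "A heist goes wrong."
    first = cached_embedding(plot, "text-embedding-3-small", cache, cache_path, fake_embed)
    second = cached_embedding(plot, "text-embedding-3-small", cache, cache_path, fake_embed)
    reloaded = load_cache(cache_path)

print(len(calls))
```

Because the model name is part of the key in this sketch, switching to another embedding model or another dataset bypasses the cache, which is when charges would apply. Only unpickle cache files from sources you trust, since loading a pickle can execute arbitrary code.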

Credits
