multimodal-embeddings

This repo is provided as-is for reference, since you need a Firebase project with at least one collection that contains embeddings, and the proper APIs enabled (with billing) to generate embeddings.

NEW! While not fully functional, the repo now includes exported Firebase emulator data so you can get up and running more quickly. Follow along below!

Multimodal Embeddings Demo

Check out this video on the Labs.google X account to see a quick overview of the project:

Overview Video on X

Unsure what embeddings are? Here's an old video we made about visualizing embeddings that does a good job explaining the basics. Learn more about Multimodal Embeddings in the Cloud docs here.

This repo represents most of the code used in Khayti's "Personal Search" demo in the video above, with two endpoints that can be used to explore your own embeddings:

  1. /search - where we use Firebase Vector Search to find the closest embeddings to both text and image input (image search!), and
  2. /viz - bonus! where we use UMAP to reduce the dimensions of our embeddings to visualize their relationships in 3D.

The app is built with Firebase and SvelteKit, and (for /viz) uses Threlte, a declarative 3D rendering library built on top of THREE.js.

Get Started

We'll be able to test quickly with some exported Firebase Emulator data so let's dive right in:

  1. Create a new project in Firebase that has Functions, Firestore and Storage enabled.
  2. Run firebase init within this folder, enabling Firestore, Storage and Emulators to quickly be able to test with emulator data.
  3. Update /src/lib/consts.ts with your firebase project info.
  4. npm i && npm run dev:emulate should now work, building the site and starting the emulators. You can test this by visiting http://localhost:5173/viz, which should load the provided 'Weather' dataset.
  5. Optional - Get a Gemini API key for any Gemini-related extra tasks.

firebase init creates some files, like .firebaserc and rules files for Firestore and Firebase Storage. If you run into errors, like the 'weather' images not loading in /viz, it may be that the storage rules are set to 'false' rather than a rule that allows reads. Learn more in the Visualizing section below.

Firebase Cloud Function for Embedding Generation

We've included a little bonus here in /fb/functions that can automatically generate embeddings for files uploaded to your Cloud Bucket. It also generates collections based on the folder structure of the uploads.
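The folder-to-collection mapping can be sketched as a small pure helper. The naming logic below is an illustrative assumption, not the exact code in /fb/functions:

```typescript
// Illustrative sketch: derive a Firestore collection name from an
// uploaded object's path, mirroring "collections based on the folder
// structure of the uploads". The real function in /fb/functions may differ.
export function collectionForUpload(objectPath: string): string | null {
  const parts = objectPath.split("/").filter(Boolean);
  // Files at the bucket root map to no collection;
  // "weather/img.jpg" maps to "weather".
  return parts.length > 1 ? parts[0] : null;
}
```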

This was great for our team when we were prototyping with the API since anyone could just create a new folder, upload images, and have it available in the UI for exploration.

Check out /fb/README.md for more info.

Note: you can get this to run in the emulator as well, but it's out of scope for this already-too-long doc.

There are a bunch of utility methods and components in /src/lib as well. Most importantly, both /search and /viz use /src/lib/components/CollectionList.svelte, which attempts to pull in any Firestore collections created by the function in /fb/functions (if you choose to use it).

Create embeddings yourself

Read through the Multimodal Embeddings documentation, as our code in /src/lib/embedder.ts implements it almost exactly. You'll also need to add your project name to this file for it to work. You send text, images, or video, and receive back a vector that needs to be stored in Firestore.
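The request shape is the part worth internalizing. Here's a minimal sketch of the payload sent to the multimodal embedding model; the field names follow the public Multimodal Embeddings docs, but treat this as an assumption rather than the exact code in embedder.ts:

```typescript
// Sketch of a predict-request body for the multimodal embedding model.
// Field names (text, image.bytesBase64Encoded, video.bytesBase64Encoded)
// follow the public Multimodal Embeddings docs.
type EmbedInput = { text?: string; imageBase64?: string; videoBase64?: string };

export function buildEmbedRequest(input: EmbedInput) {
  const instance: Record<string, unknown> = {};
  if (input.text) instance.text = input.text;
  if (input.imageBase64) instance.image = { bytesBase64Encoded: input.imageBase64 };
  if (input.videoBase64) instance.video = { bytesBase64Encoded: input.videoBase64 };
  return { instances: [instance] };
}
// The response carries predictions[0].textEmbedding and/or
// imageEmbedding, each a 1408-dimension number[] that you then store.
```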

embedder.ts is also called directly from the Function in /fb/functions/index.ts so we wouldn't have to keep two copies of the code in each submodule.

I have embeddings, now what?

Next, read through the Firebase Vector Search docs.

You'll store each image embedding in a Firestore document via FieldValue.vector(), and once you've done this for all your embeddings, you'll need to create an index on that collection so Vector Search can work.
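The write itself can be sketched as follows. The collection name ("weather"), field name ("embedding"), and the validation helper are all assumptions for illustration; the FieldValue.vector() call follows the Firebase Vector Search docs:

```typescript
// Hedged sketch of writing one embedding to Firestore with
// FieldValue.vector(). Collection and field names are assumptions.
const EMBED_DIM = 1408; // output size of the multimodal embedding model

// Pure guard so we never try to index a malformed vector.
export function isValidEmbedding(v: number[]): boolean {
  return v.length === EMBED_DIM && v.every(Number.isFinite);
}

export async function storeEmbedding(file: string, embedding: number[]) {
  if (!isValidEmbedding(embedding)) throw new Error("bad embedding");
  // @ts-ignore -- assumes @google-cloud/firestore is installed
  const { Firestore, FieldValue } = await import("@google-cloud/firestore");
  const db = new Firestore();
  await db.collection("weather").add({
    file, // path to the original image in Storage
    embedding: FieldValue.vector(embedding),
  });
}
```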

If you haven't created an index but go to /search and try to search your collection, you'll get a handy error with the exact command to run to create it, something like this:

gcloud firestore indexes composite create \
--collection-group=collection-group \
--query-scope=COLLECTION \
--field-config field-path=vector-field,vector-config='vector-configuration' \
--database=your-database-id

Once indexed, your collection can be searched!

Searching

Again, everything in this repo follows the Firebase Vector Search docs closely, and for searching, we're making a nearest-neighbor query.

Conceptually though, you're doing two things:

  1. Embedding the query in order to place it within the same space as your collection's embeddings, then
  2. Doing a nearest-neighbor lookup to find any results that are near your query.

And since we're using the Multimodal Embeddings API, your query can be text, an image, or a video.

Important Note - you'll notice that we also have a file /src/lib/cloud-firebase.ts. At the time this demo was created, the actual Search APIs resided only in @google-cloud/firestore on NPM, which is separate from the normal Firebase web APIs in npm's firebase package used elsewhere in the app.
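Put together, the two steps above look roughly like the sketch below. The findNearest() call follows the shape documented for recent @google-cloud/firestore releases, and the collection/field names are assumptions; the pure helper underneath shows what COSINE distance actually computes:

```typescript
// Hedged sketch of the /search flow: embed the query first (not shown),
// then run a nearest-neighbor query against the indexed collection.
export async function searchByVector(queryEmbedding: number[], limit = 5) {
  // @ts-ignore -- assumes @google-cloud/firestore is installed
  const { Firestore, FieldValue } = await import("@google-cloud/firestore");
  const db = new Firestore();
  const snap = await db
    .collection("weather") // assumption: your indexed collection
    .findNearest({
      vectorField: "embedding",
      queryVector: FieldValue.vector(queryEmbedding),
      limit,
      distanceMeasure: "COSINE", // must match how you think about similarity
    })
    .get();
  return snap.docs.map((d) => d.data());
}

// What COSINE distance means, as a pure function:
// 0 = same direction, 1 = orthogonal, 2 = opposite.
export function cosineDistance(a: number[], b: number[]): number {
  const dot = a.reduce((s, v, i) => s + v * b[i], 0);
  const norm = (v: number[]) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return 1 - dot / (norm(a) * norm(b));
}
```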

Visualizing

/viz takes your Firestore collections and attempts to plot them in 3D using UMAP, an algorithm similar to t-SNE but much faster (and just as non-deterministic).

Learn more about UMAP here.

/viz using the public weather dataset mentioned above

It was a WIP that was never fully completed, but it should get you 90% of the way there. What's important to note is that reducing the dimensionality of embeddings inherently loses information, so while it's a really nice way to visualize things, it shouldn't be considered an exact representation of the embeddings (which have 1408 dimensions).
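The reduction step can be sketched with the umap-js package; this is an assumption about the wiring, not necessarily how /viz does it. The pure helper afterwards centers the reduced cloud on the origin, which is handy when orbiting a THREE.js camera around it:

```typescript
// Hedged sketch of the /viz reduction step using the umap-js package
// (an assumption -- the repo's actual wiring may differ).
export async function reduceTo3D(embeddings: number[][]): Promise<number[][]> {
  // @ts-ignore -- assumes umap-js is installed
  const { UMAP } = await import("umap-js");
  const umap = new UMAP({ nComponents: 3 }); // nondeterministic, like t-SNE
  return umap.fit(embeddings); // one [x, y, z] per input embedding
}

// Pure helper: subtract the mean so the point cloud sits at the
// scene origin before handing it to the 3D renderer.
export function centerPoints(points: number[][]): number[][] {
  const dims = points[0].length;
  const mean = Array.from({ length: dims }, (_, i) =>
    points.reduce((s, p) => s + p[i], 0) / points.length
  );
  return points.map((p) => p.map((x, i) => x - mean[i]));
}
```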

Experiments for all

This is an experiment, not an official Google product. We’ll do our best to support and maintain this experiment but your mileage may vary.

We encourage open sourcing projects as a way of learning from each other. Please respect our and other creators’ rights, including copyright and trademark rights when present, when sharing these works and creating derivative work. If you want more info on Google's policy, you can find that here.