A developer sample written in Angular demonstrating Gemini multimodal (image and audio) input and understanding. The user enters a prompt and the app generates images via Vertex AI image generation, which the user can then preview in a three-dimensional gallery. The user can also ask a question about the images in a text input, and the app reads Gemini's answer aloud using the speech synthesis interface of the Web Speech API.
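As a rough illustration of the read-aloud step, here is a minimal TypeScript sketch using the browser's built-in `SpeechSynthesis` interface. The function name and wiring are illustrative only and are not taken from the sample's actual code.

```typescript
// Minimal sketch: reading Gemini's answer aloud with the browser's built-in
// SpeechSynthesis interface (part of the Web Speech API).
// The function name and settings below are illustrative, not the sample's code.
function speakAnswer(answer: string): void {
  // Guard against browsers that do not expose speech synthesis.
  if (!('speechSynthesis' in window)) {
    console.warn('Speech synthesis is not supported in this browser.');
    return;
  }
  const utterance = new SpeechSynthesisUtterance(answer);
  utterance.lang = 'en-US'; // language of the spoken answer
  utterance.rate = 1.0;     // normal speaking rate
  window.speechSynthesis.cancel();         // stop any ongoing speech
  window.speechSynthesis.speak(utterance); // queue the new utterance
}
```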
- Node.js and npm
- Download and install Node.js and npm: https://docs.npmjs.com/downloading-and-installing-node-js-and-npm
- Gemini API key
- Launch Google AI Studio: https://aistudio.google.com/
- Click “Get API Key”
Install the dependencies and run the app:
npm i
npm start
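If the project uses the standard Angular CLI setup, `npm start` serves the app at http://localhost:4200 by default.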
In the text box with the placeholder "API key", enter your Gemini API key. Instructions on how to use the app are available under "Instructions" in the user interface.
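For context, the sketch below shows one way the API key entered in the UI could be passed to the Gemini JavaScript SDK (`@google/generative-ai`). This is an assumption for illustration; the sample may use a different SDK, model name, or call pattern.

```typescript
// Illustrative sketch (not the sample's actual code): using a Gemini API key
// entered in the UI with the @google/generative-ai SDK.
// The model name and prompt below are placeholders.
import { GoogleGenerativeAI } from '@google/generative-ai';

async function askGemini(apiKey: string, question: string): Promise<string> {
  const genAI = new GoogleGenerativeAI(apiKey);
  const model = genAI.getGenerativeModel({ model: 'gemini-1.5-flash' });
  // Send the user's question and return the text of Gemini's answer.
  const result = await model.generateContent(question);
  return result.response.text();
}
```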