Explore the remarkable capabilities of Gemini, an open-source application powered by the Google Gemini Vision API (gemini-1.5-flash / gemini-1.5-pro models). It reasons seamlessly across text, images, and voice. Gemini is your gateway to the future of AI.
You can use your camera and screen capture (in the Chrome browser)!
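Camera and screen frames typically reach a vision model as base64-encoded images. As a hedged sketch (the repo's actual capture code may differ), a frame drawn onto a canvas and exported with `canvas.toDataURL("image/jpeg")` produces a data URL. The hypothetical helper `dataUrlToInlineImage` below splits that URL into the MIME type and the bare base64 payload that an inline image part of a Gemini request expects. Camera access uses `navigator.mediaDevices.getUserMedia` and screen capture uses `navigator.mediaDevices.getDisplayMedia`.

```typescript
// Shape of an inline image part (hypothetical type for this sketch).
interface InlineImage {
  mimeType: string; // e.g. "image/jpeg"
  data: string; // bare base64, without the "data:...;base64," prefix
}

// Split a base64 data URL (as produced by HTMLCanvasElement.toDataURL) into MIME type and payload.
function dataUrlToInlineImage(dataUrl: string): InlineImage {
  const match = /^data:([^;,]+);base64,([A-Za-z0-9+\/=]+)$/.exec(dataUrl);
  if (match === null) {
    throw new Error("Expected a base64-encoded data URL");
  }
  return { mimeType: match[1], data: match[2] };
}

// Browser usage (only works inside a page, shown for context):
// const stream = await navigator.mediaDevices.getUserMedia({ video: true }); // or getDisplayMedia
// video.srcObject = stream; await video.play();
// canvas.getContext("2d")!.drawImage(video, 0, 0, canvas.width, canvas.height);
// const frame = dataUrlToInlineImage(canvas.toDataURL("image/jpeg", 0.8));
```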
If you like this repo, give me a star ⭐ ~
Demo: Gemini Assistant Demo (requires Chrome or Edge)
Step 1: Clone the repository:
git clone https://github.com/youkpan/gemini-assistant.git
cd gemini-assistant
Step 2: Install dependencies:
npm install
Step 3: 🔑 Set up the Gemini API key: rename .env.example to .env and paste your Gemini API key into VITE_GEMINI_KEY.
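Vite exposes variables whose names start with `VITE_` from `.env` to client code on `import.meta.env`; because they are embedded in the client bundle, the key is visible to anyone who can load the page. The hypothetical helper `requireGeminiKey` below is a minimal sketch, not code from this repo, that fails fast with a readable message when the key was never set.

```typescript
// Fail fast if VITE_GEMINI_KEY is missing or blank (hypothetical helper).
// Accepts any env-like object, e.g. import.meta.env inside a Vite app.
function requireGeminiKey(env: Record<string, unknown>): string {
  const raw = env["VITE_GEMINI_KEY"];
  const key = typeof raw === "string" ? raw.trim() : "";
  if (key === "") {
    throw new Error("VITE_GEMINI_KEY is not set: rename .env.example to .env and add your key");
  }
  return key;
}

// In the app: const apiKey = requireGeminiKey(import.meta.env);
```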
Get a Gemini API key | Get an Azure TTS subscription key
Additional settings (optional):
VITE_GEMINI_MODEL="gemini-1.5-flash-latest" #"gemini-1.5-pro" or "gemini-1.5-flash"
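`VITE_GEMINI_MODEL` chooses which model answers. This README does not say whether the app uses Google's SDK or raw HTTP, so the sketch below assumes the public REST `generateContent` endpoint and shows how the key, the model name, a text prompt, and an optional camera frame could be combined into one request. `buildGenerateContentRequest` is a hypothetical helper, not part of this repo.

```typescript
interface InlineImage {
  mimeType: string; // e.g. "image/jpeg"
  data: string; // bare base64 payload
}

interface GeminiRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Build a generateContent request: one user turn with a text part and an optional image part.
function buildGenerateContentRequest(
  apiKey: string,
  model: string,
  prompt: string,
  image?: InlineImage,
): GeminiRequest {
  const parts: Array<Record<string, unknown>> = [{ text: prompt }];
  if (image) {
    parts.push({ inline_data: { mime_type: image.mimeType, data: image.data } });
  }
  const url =
    "https://generativelanguage.googleapis.com/v1beta/models/" +
    `${encodeURIComponent(model)}:generateContent?key=${encodeURIComponent(apiKey)}`;
  return {
    url,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ contents: [{ role: "user", parts }] }),
    },
  };
}

// Usage in the browser (network call, not executed here):
// const { url, init } = buildGenerateContentRequest(apiKey, "gemini-1.5-flash-latest", "What is this?", frame);
// const json = await (await fetch(url, init)).json();
// console.log(json.candidates[0].content.parts[0].text);
```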
Change your Azure TTS key in src/components/synthesis.tsx (line 13):
var subscriptionKey = "your Azure subscription key";
var serviceRegion = "your service region, e.g. eastasia";
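The subscription key and service region identify your Azure Speech resource. The repo appears to pass them to Azure's Speech SDK; as an illustration only, the sketch below builds the equivalent request for Azure's text-to-speech REST endpoint, which is addressed by region and authenticated with the `Ocp-Apim-Subscription-Key` header. `buildAzureTtsRequest`, `escapeXml`, and the default voice `en-US-JennyNeural` are assumptions for this example, and the header set shown may not be exhaustive.

```typescript
interface TtsRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Escape the five XML special characters so spoken text cannot break the SSML document.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Build a POST request that asks Azure TTS to speak `text` and return MP3 audio.
function buildAzureTtsRequest(
  subscriptionKey: string,
  serviceRegion: string, // e.g. "eastasia"
  text: string,
  voice = "en-US-JennyNeural", // assumption: xml:lang below is fixed to en-US to match this voice
): TtsRequest {
  const ssml =
    `<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">` +
    `<voice name="${voice}">${escapeXml(text)}</voice></speak>`;
  return {
    url: `https://${serviceRegion}.tts.speech.microsoft.com/cognitiveservices/v1`,
    init: {
      method: "POST",
      headers: {
        "Ocp-Apim-Subscription-Key": subscriptionKey,
        "Content-Type": "application/ssml+xml",
        "X-Microsoft-OutputFormat": "audio-16khz-128kbitrate-mono-mp3",
      },
      body: ssml,
    },
  };
}
```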
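Step 4: Start the development server:
npm run dev
# or, to make the server reachable from other devices on your network:
npm run dev -- --host 0.0.0.0
# or
./run.sh   # set your key inside run.sh first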
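Instead of passing `--host` on every run, the same setting can live in Vite's config file. The fragment below is a sketch: merge its `server` block into the repo's existing `vite.config.ts` rather than replacing the file, so existing plugins are kept; the port value is an assumption taken from the localhost:3000 address in this README (Vite's own default is 5173). Browsers only expose camera and screen capture in secure contexts (HTTPS or localhost), so opening the app from another device via a plain-HTTP LAN address may disable those features.

```typescript
// vite.config.ts fragment (sketch): only the `server` block is relevant here.
import { defineConfig } from "vite";

export default defineConfig({
  server: {
    host: "0.0.0.0", // same effect as `npm run dev -- --host 0.0.0.0`
    port: 3000, // assumption: matches the localhost:3000 address used in this README
  },
});
```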
Visit localhost:3000 to experience Gemini on your machine.
Enjoying Gemini? Show your support by giving it a star on GitHub! ⭐
Simply say "Hey Gemini," show an object to the camera, and witness the magic of multimodal AI.
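In Chrome and Edge, spoken input is usually obtained through the Web Speech API (`webkitSpeechRecognition`), which delivers text transcripts. The repo's actual wake-word logic may differ; the hypothetical helper `extractCommand` below only sketches the idea of checking whether a transcript starts with "Hey Gemini" and, if so, returning the rest of the utterance as the command.

```typescript
// Return the command after the wake phrase "hey gemini", or null if the transcript
// does not start with it. Punctuation between and after the words is tolerated.
function extractCommand(transcript: string): string | null {
  const wake = /^\s*hey[\s,.!?]+gemini\b[\s,.!?]*/i;
  const match = wake.exec(transcript);
  if (match === null) {
    return null; // no wake phrase: ignore this utterance
  }
  return transcript.slice(match[0].length).trim(); // may be "" if nothing followed
}

// extractCommand("Hey Gemini, what is this?") returns "what is this?"
```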
Visit the Gemini API documentation for in-depth information about Gemini's capabilities.
Thanks to iamsrikanthnani for the initial version.
Your contributions make Gemini even more powerful.
Unlock the potential of AI with Gemini—your gateway to the future.