Skip to content

Google's Gemini implemented with GPT-4 Vision, Whisper and Resemble AI

License

Notifications You must be signed in to change notification settings

ZohaibAhmed/real-gemini

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Real Gemini

Google's Gemini implemented with GPT-4 Vision, Whisper and Resemble AI

This project leverages the power of AI to answer questions based on visual inputs -- like Google's Gemini demo. It integrates GPT-4 Vision for image understanding, Whisper for voice recognition, and Resemble AI for voice synthesis, creating a comprehensive system capable of interpreting visual data and responding verbally.

realgemini.mov

Features

  • Visual Question Answering: Uses GPT-4 Vision to interpret images from a camera feed and answer questions related to the visual content.
  • Voice Recognition: Employs Whisper for accurate speech-to-text conversion, allowing users to ask questions verbally.
  • Voice Synthesis: Utilizes Resemble AI for generating realistic voice responses, enhancing the interactive experience.

Prerequisites

  • Python 3.x
  • Camera hardware compatible with your system
  • Microphone and speaker setup for voice input and output

Installation

  1. Clone the Repository

    git clone git@github.com:ZohaibAhmed/real-gemini.git
    cd real-gemini
  2. Install Dependencies Install the required Python packages:

    pip install -r requirements.txt
  3. Environment Setup

    • Create a .env file in the project root.
    • Add your Resemble AI and OpenAI credentials to the .env file:

Usage

Run the application using the following command:

python run.py

Place the camera in view of the subject and use a microphone to ask questions. The system will process the visual and audio inputs to provide a spoken answer.

Contributions

Contributions to this project are welcome. Please create a pull request with your proposed changes.

Acknowledgements

Special thanks to OpenAI for GPT-4 and Whisper APIs, and to Resemble AI for their voice synthesis technology.

About

Google's Gemini implemented with GPT-4 Vision, Whisper and Resemble AI

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages