This project integrates Google's Gemini-Pro AI models into a Streamlit web application to provide a suite of AI-powered tools. Each tool leverages a specific aspect of the Gemini-Pro model to interact with different types of data, including chat, vision, PDFs, and YouTube videos. The application is designed to showcase the capabilities of AI in processing and generating content based on user inputs.
-
Chat Model (chat.py):
- Implements a chatbot using the Gemini-Pro model.
- Users can interact with the chatbot, which processes and responds to queries in real-time.
-
Vision Model (vision.py):
- Utilizes the Gemini-Pro vision model to analyze images.
- Users can upload images and receive descriptions or insights based on the visual content.
-
PDF Model (pdf.py):
- Processes uploaded PDF documents to extract text.
- Performs text chunking and vectorization for efficient search and retrieval.
- Supports a question-answering feature where users can ask questions related to the content of the uploaded PDFs.
-
YouTube Model (yt.py):
- Extracts transcripts from YouTube videos using the YouTube Transcript API.
- Summarizes the video content using the Gemini-Pro text generation model, providing concise notes on the video's content.
- Python: Primary programming language used.
- Streamlit: Framework for building the web interface.
- Google Generative AI: API used for accessing Gemini-Pro models.
- YouTube Transcript API: For retrieving video transcripts.
- PyPDF2 and LangChain: For PDF processing and text analysis.
- Clone the repository.
- Install dependencies using
pip install -r requirements.txt
. - Set up environment variables for the Google API key in a
.env
file. - Run the application using
streamlit run main.py
.
Select the desired model from the dropdown menu in the application:
- Chat: Interact with the AI chatbot.
- Vision: Upload images to analyze.
- PDF: Upload PDFs for text extraction and querying.
- YouTube: Enter a YouTube video link to get detailed notes.