This repository hosts the source code for the 'Reading Companion,' an application using computer vision-based Optical Character Recognition (OCR) integrated with the Gemini API.

Sajitha-Madugalle/Reading_Companion_OpenCV

Reading Companion: CV-Based OCR Application with Gemini Integration

This application uses Computer Vision and Optical Character Recognition (OCR) to detect open windows, capture the selected window, extract and filter its text, and suggest important phrases to search on Google through Gemini API integration. It is useful when reading PDFs or attending lectures, because it highlights key points and lets you search them on Google with one click.

How to Use It

  1. Download and extract the files:

    • Clone the repository, or download the ZIP file from GitHub and extract it.
    • Navigate to the directory where the files are located.
  2. Run the application:

    • Execute the main.py script to start the application.
    python main.py
  3. Using the application:

    • A window will open displaying a numbered list of the currently open windows.
    • Enter the number corresponding to the window you want to capture and click "Go".
    • The application will display important phrases detected from the window content. Click any phrase to automatically search it on Google.

    Simple as that!
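
The one-click search in the last step can be sketched with the Python standard library alone. This is a minimal illustration, not the code in main.py: the function names `google_search_url` and `search_on_google` are hypothetical, and the sketch assumes a plain Google search URL with the phrase passed in the `q` parameter.

```python
import webbrowser
from urllib.parse import quote_plus


def google_search_url(phrase: str) -> str:
    """Build a Google search URL for a phrase, escaping spaces and symbols."""
    return "https://www.google.com/search?q=" + quote_plus(phrase.strip())


def search_on_google(phrase: str) -> bool:
    """Open the search in the default browser.

    Returns False if no browser could be launched.
    """
    return webbrowser.open(google_search_url(phrase))
```

A GUI button for a suggested phrase would then only need to call `search_on_google(phrase)` from its click handler.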

Used Technologies

  • Computer Vision: For capturing and processing window screenshots.
  • OCR (Optical Character Recognition): To extract text from images using Tesseract.
  • AI Technology - Gemini API Integration: To analyze extracted text and suggest important phrases for Google search.
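
As a rough illustration of the Gemini step, the sketch below shows only the parts around the API call: composing a prompt from the OCR text and turning a reply into a list of phrases. The prompt wording, the one-phrase-per-line reply format, and the names `build_phrase_prompt` and `parse_phrases` are all assumptions made for this example. The actual request to Gemini is left out, and the project may do this differently.

```python
import re


def build_phrase_prompt(ocr_text: str, max_phrases: int = 5) -> str:
    """Compose a prompt asking the model for short, searchable key phrases."""
    return (
        f"List up to {max_phrases} important phrases from the text below that "
        "are worth searching on the web. Put one phrase per line, with no "
        "extra commentary.\n\n"
        f"Text:\n{ocr_text.strip()}"
    )


def parse_phrases(reply: str, max_phrases: int = 5) -> list[str]:
    """Turn a line-per-phrase reply into a clean, de-duplicated list."""
    phrases = []
    seen = set()
    for line in reply.splitlines():
        # Drop list markers such as "1.", "2)", "-", "*" or "•" if present.
        phrase = re.sub(r"^\s*(?:\d+[.)]|[-*•])\s*", "", line).strip()
        key = phrase.lower()  # compare case-insensitively
        if phrase and key not in seen:
            seen.add(key)
            phrases.append(phrase)
    return phrases[:max_phrases]
```

Keeping the parsing separate from the network call makes it easy to check with canned replies before wiring it to the real API.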

Features

  • Capture Open Windows: Lists and captures screenshots of currently open windows.
  • OCR Extraction: Extracts text from captured window screenshots.
  • AI-Powered Text Analysis: Uses Gemini API to analyze text and suggest important search phrases.
  • One-Click Google Search: Provides a simple GUI to search suggested phrases on Google with one click.
  • Always on Top: Keeps the application window always on top for easy access.
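
Text extracted from screenshots usually contains noise such as stray symbols from window borders and icons. The sketch below shows one possible way to filter OCR output before analysis. The heuristic (a minimum letter count and a minimum share of letters per line), its thresholds, and the name `filter_ocr_text` are assumptions for illustration, not the filter used in this repository.

```python
def filter_ocr_text(raw_text: str, min_letters: int = 3,
                    min_alpha_ratio: float = 0.6) -> str:
    """Keep only lines that look like readable text."""
    kept = []
    for line in raw_text.splitlines():
        line = " ".join(line.split())  # collapse runs of whitespace
        if not line:
            continue
        letters = sum(ch.isalpha() for ch in line)
        visible = sum(not ch.isspace() for ch in line)
        # Skip lines that are too short or mostly symbols (typical OCR noise).
        if letters < min_letters or letters / visible < min_alpha_ratio:
            continue
        kept.append(line)
    return "\n".join(kept)
```

The thresholds trade recall for cleanliness: lowering `min_alpha_ratio` keeps more lines with numbers or formulas, at the cost of letting more noise through.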

Installation

  1. Clone the repository:

    git clone https://github.com/Sajitha-Madugalle/Reading_Companion_OpenCV.git
    cd Reading_Companion_OpenCV
  2. Install dependencies: Ensure you have Python installed, then install the required packages using:

    pip install -r requirements.txt
  3. Configure Tesseract:

    • Download and install Tesseract OCR from here.
    • Ensure the Tesseract executable path is set correctly. On Windows, the default location is usually:
    pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
  4. Run the application:

    python main.py
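
For step 3, a hard-coded path only works when Tesseract is installed in that exact location. The sketch below is one hedged way to locate the executable first: it checks the system PATH and then falls back to a list of candidate paths. The helper name `find_tesseract` and the candidate list are assumptions. Only the Windows path comes from the step above.

```python
import os
import shutil

# Default Windows install location quoted in step 3; extend this list as needed.
DEFAULT_CANDIDATES = [r"C:\Program Files\Tesseract-OCR\tesseract.exe"]


def find_tesseract(candidates=DEFAULT_CANDIDATES, command="tesseract"):
    """Return a path to the Tesseract executable, or None if it is not found."""
    on_path = shutil.which(command)  # look on the system PATH first
    if on_path:
        return on_path
    for path in candidates:
        if os.path.isfile(path):  # fall back to known install locations
            return path
    return None
```

If the function returns a path, that value can be assigned to `pytesseract.pytesseract.tesseract_cmd`. If it returns None, Tesseract is probably not installed or lives in a location that needs to be added to the candidates.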

Screenshots

First window showing the currently open windows

Extracted text and captured screenshot: selected window and important phrases to search suggested by Gemini

Google search results

Contributing

Contributions are welcome!

Problems

  • Trade-off between refresh rate and FPS
  • The GUI needs improvement
  • A floating icon to open the window would help, so the reader does not get distracted
  • Applying deep learning algorithms to identify mathematical expressions and solve them

Please feel free to submit a Pull Request or open an Issue to improve this project.

  1. Fork the repository.
  2. Create your feature branch:
    git checkout -b feature/YourFeature
  3. Commit your changes:
    git commit -m 'Add your feature'
  4. Push to the branch:
    git push origin feature/YourFeature
  5. Open a Pull Request.

License

This project is licensed under the MIT License. See the LICENSE file for details.

References

  1. "Fast Window Capture - OpenCV Object Detection in Games #4" by Learn Code By Gaming
  2. "Realtime Text Detection in Images using Tesseract | OpenCV | Python | Tutorial for beginners" by DeepLearning_by_PhDScholar
