ESP32-Work/Text-Recognition-ESP32-CAM
Text Recognition using ESP32-CAM and OCR

Table of Contents

  1. Overview
  2. Requirements
  3. Installation
  4. Usage
  5. Code Explanation
  6. Demo
  7. Libraries Used
  8. Extension for VS Code
  9. Clone and Implementation
  10. Contributing
  11. License

Overview

This project utilizes an ESP32-CAM module to capture images, perform Optical Character Recognition (OCR) using Tesseract, and display the live stream with extracted text. The ESP32-CAM serves the images through a local web server, and a Python script on the client side processes the stream for text extraction.

Requirements

  • ESP32-CAM module
  • Arduino IDE or PlatformIO extension for VS Code
  • Python
  • Tesseract OCR
  • OpenCV library for Python

Installation

  1. Clone this repository.
  2. Upload the code to the ESP32-CAM using Arduino IDE or PlatformIO in VS Code.
  3. Ensure Python is installed on your system.
  4. Install the required Python libraries:
    pip install numpy opencv-python pytesseract
  5. Install the Tesseract OCR engine (refer to the Tesseract Installation Guide for other platforms). On Debian or Ubuntu:
    sudo apt install tesseract-ocr -y
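
As a quick sanity check after installation, a small script along these lines can report which pieces are still missing. This is only a sketch: the function name check_environment is made up for illustration, and it assumes the Tesseract executable is expected on PATH under the name tesseract (pytesseract can also be pointed at a different location).

```python
import importlib.util
import shutil

# Import names (not pip package names) the client script relies on.
REQUIRED_MODULES = ("cv2", "numpy", "pytesseract")


def check_environment(modules=REQUIRED_MODULES, binary="tesseract"):
    """Return a dict mapping each requirement to True if it was found."""
    status = {name: importlib.util.find_spec(name) is not None for name in modules}
    # pytesseract calls the tesseract executable, so look for it on PATH
    # (assumption: the default command name is used).
    status[binary] = shutil.which(binary) is not None
    return status


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name:12s} {'ok' if found else 'MISSING'}")
```

Anything reported as MISSING points back to the corresponding installation step above.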

Usage

  1. Power up the ESP32-CAM and connect to its access point (AP) with the provided SSID and password.
  2. Run the Python script on your local machine.
  3. The live stream with extracted text will be displayed on your screen.
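
Before starting the full script, it can help to confirm that the camera answers once you have joined its access point. The sketch below fetches a single frame with a timeout and checks that the payload looks like a JPEG. The address in CAMERA_URL is a placeholder, not taken from this project; replace it with the IP and path your firmware actually serves. The helper names are hypothetical.

```python
import urllib.request

# Hypothetical address: use the IP and path your ESP32-CAM firmware serves.
CAMERA_URL = "http://192.168.4.1/cam.jpg"


def looks_like_jpeg(data):
    """True if data starts with the JPEG SOI marker and ends with EOI."""
    return len(data) >= 4 and data[:2] == b"\xff\xd8" and data[-2:] == b"\xff\xd9"


def probe_camera(url=CAMERA_URL, timeout=5.0):
    """Fetch one frame; return its size in bytes, or None if unusable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            data = resp.read()
    except OSError as exc:  # URLError and socket timeouts are OSError subclasses
        print(f"camera not reachable: {exc}")
        return None
    return len(data) if looks_like_jpeg(data) else None


# Example: print(probe_camera()) shows a byte count when the camera responds.
```

The marker check is deliberately strict, so a response with trailing bytes after the end-of-image marker would be rejected; loosen it if your firmware behaves that way.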

Code Explanation

ESP32 Code

The code configures the ESP32-CAM, sets up a web server, and handles different image resolutions. It uses the esp32cam library to capture images and serve them through HTTP.

Snippet:

// Arduino Code Snippet
// (Refer to the full Arduino code in main.cpp)
#include <WebServer.h>
#include <WiFi.h>
#include <esp32cam.h>
#include <SPI.h>
#include <Wire.h>

// ...

void serveJpg() {
  // Capture image and serve as JPEG
}

// ...
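
On the client side, choosing between the resolutions the firmware serves usually comes down to requesting a different URL. The following Python sketch shows one way to build those URLs. The route names in RESOLUTION_PATHS are assumptions made for illustration; check the handlers registered in main.cpp for the real paths.

```python
from urllib.parse import urlunsplit

# Hypothetical route names; the firmware in main.cpp defines the real ones.
RESOLUTION_PATHS = {
    "low": "/cam-lo.jpg",
    "mid": "/cam-mid.jpg",
    "high": "/cam-hi.jpg",
}


def frame_url(host, resolution="high"):
    """Build the URL of a single-JPEG endpoint for the requested resolution."""
    try:
        path = RESOLUTION_PATHS[resolution]
    except KeyError:
        raise ValueError(f"unknown resolution {resolution!r}") from None
    return urlunsplit(("http", host, path, "", ""))


# Example: frame_url("192.168.4.1", "low") builds the low-resolution URL.
```

A higher resolution generally gives the OCR step more detail to work with, at the cost of larger frames and slower transfers.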

Python Code

The Python script reads the live stream from the ESP32-CAM, performs OCR using Tesseract, and displays the stream with extracted text using OpenCV.

Snippet:

# Python Code Snippet
# (Refer to the full Python code in main.py)
import cv2
import urllib.request
import numpy as np
import pytesseract

# ...

while True:
    # Read live stream from ESP32-CAM
    img_resp = urllib.request.urlopen(url)
    imgnp = np.array(bytearray(img_resp.read()), dtype=np.uint8)
    frame = cv2.imdecode(imgnp, -1)

    # Extract text using Tesseract
    text = pytesseract.image_to_string(frame, config='--psm 6')

    # ...

    key = cv2.waitKey(1) & 0xFF
    if key == ord('q'):
        break

cv2.destroyAllWindows()
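
The display step is elided in the snippet above. One practical detail when drawing the recognized text is that cv2.putText draws a single line and does not interpret newline characters, while Tesseract output is typically multi-line and padded with whitespace. The sketch below, which is not taken from the project's script, splits the OCR text into short lines and computes a position for each; the helper names and default sizes are illustrative assumptions.

```python
def overlay_lines(text, max_lines=5, max_chars=60):
    """Split raw OCR output into short, non-empty lines suitable for drawing."""
    lines = []
    for raw in text.splitlines():
        # Collapse runs of whitespace and drop lines that end up empty.
        line = " ".join(raw.split())
        if line:
            lines.append(line[:max_chars])
    return lines[:max_lines]


def line_origins(count, x=10, first_y=30, line_height=28):
    """Bottom-left pixel position (x, y) for each text line, top to bottom."""
    return [(x, first_y + i * line_height) for i in range(count)]


# Inside the capture loop, the result could be drawn roughly like this:
#   lines = overlay_lines(text)
#   for line, origin in zip(lines, line_origins(len(lines))):
#       cv2.putText(frame, line, origin, cv2.FONT_HERSHEY_SIMPLEX,
#                   0.7, (0, 255, 0), 2)
#   cv2.imshow("ESP32-CAM OCR", frame)
```

Capping the number of lines and characters keeps the overlay from covering the whole frame when Tesseract returns a lot of text.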

Demo

Screencast.from.12-14-2023.01.27.30.AM.webm

Libraries Used

  • WebServer
  • WiFi
  • esp32cam
  • OpenCV
  • Pytesseract
  • NumPy

Extension for VS Code

To develop and upload code to the ESP32 using VS Code, install the PlatformIO extension.

Clone and Implementation

git clone https://github.com/ESP32-Work/Text-Recognition-ESP32-CAM

Open the project in VS Code with the PlatformIO extension installed. Upload the Arduino code to the ESP32-CAM, then run the Python script.

Move to the directory containing the Python script. To run it directly as ./webcam.py, first make it executable:

chmod +x webcam.py

Run the script with either command:

python webcam.py
./webcam.py

Contributing

Contributions are welcome! Open an issue or create a pull request to contribute.

License

This project is licensed under the MIT License.