As the sun sets on my 16-week journey with Google Summer of Code (GSoC) 2024, I'm thrilled to share the fruits of my labor: VisionGuard, an innovative desktop application designed to combat eye strain and promote healthier computing habits. This project, developed under the mentorship of the OpenVINO Toolkit team, represents a significant step forward in leveraging advanced computer vision technology for personal well-being.
VisionGuard is a privacy-focused screen time management tool that uses your computer's webcam to monitor your gaze and encourage healthy viewing habits. By operating entirely locally and supporting inference on AI PCs' Neural Processing Units (NPUs), VisionGuard offers a unique blend of functionality, performance, and data security.
During the GSoC period, I successfully implemented the following features:
- Real-time Eye Gaze Tracking: Integrated a gaze detection engine built on OpenVINO models to accurately track user gaze without compromising privacy.
- Customizable Break Notifications: Developed a smart alert system that reminds users to take breaks based on the 20-20-20 rule.
- Comprehensive Statistics: Built a statistics calculator to provide daily and weekly screen time insights.
- Flexible Device Support: Enabled seamless switching between CPU, GPU, and NPU for inference, optimizing performance across hardware configurations.
- Multi-Camera Compatibility: Supported up to five camera devices for enhanced flexibility.
- Aesthetic Customization: Designed both dark and light themes for user preference.
- Resource Optimization: Integrated a system resource monitor and frame processing limits to ensure efficient performance.
- System Tray Integration: Developed a system tray application for quick access to key features without desktop clutter.
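The break notifications above follow the 20-20-20 rule: every 20 minutes of screen time, look at something 20 feet away for 20 seconds. A minimal sketch of that timer logic, with illustrative names and thresholds rather than VisionGuard's actual implementation:

```cpp
#include <chrono>

// Hypothetical sketch of a 20-20-20 rule check: once 20 minutes of screen
// time have accumulated, a break notification is due and the counter resets.
struct BreakTimer {
    std::chrono::seconds screenTime{0};
    static constexpr std::chrono::seconds interval{20 * 60}; // 20 minutes

    // Accumulate gaze-on-screen time; returns true when a break is due.
    bool addScreenTime(std::chrono::seconds delta) {
        screenTime += delta;
        if (screenTime >= interval) {
            screenTime = std::chrono::seconds{0}; // reset after notifying
            return true;
        }
        return false;
    }
};
```

In the real application the accumulated time would come from the gaze tracker rather than a fixed delta, and the notification itself would be raised through the system tray.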
```mermaid
%%{init: {'theme': 'base', 'themeVariables': { 'background': '#ffffff' }}}%%
graph TD
    subgraph Client
        UI[User Interface]
        GVD[Gaze Vector Display]
        GCW[Calibration Window]
        STW[Screen Time Widget]
        STS[Statistics Window]
        CPR[Camera Permission Request]
        RCK[Run-Time Control Keys]
    end
    subgraph Backend
        CL[Core Logic]
        GDM[Gaze Detection Engine]
        GVC[Gaze Vector Calibration]
        EGT[Eye Gaze Time Tracker]
        BNS[Break Notification System]
        SC[Statistics Calculator]
        MC[Metric Calculator]
        PC[Performance Calculator]
    end
    subgraph Data
        UM[Usage Metrics]
    end
    UI <-->|Input/Output| CL
    CPR -->|Permission Status| CL
    RCK -->|Control Commands| CL
    CL <--> UM
    CL <--> GDM
    CL <--> GVC
    CL <--> EGT
    CL --> BNS
    CL <--> SC
    CL <--> MC
    CL <--> PC
    BNS --> UI
    CL --> GVD
    CL --> GCW
    CL --> STW
    SC --> STS
    PC --> UI
    style UI fill:#f0f9ff,stroke:#0275d8,stroke-width:2px
    style GVD fill:#f0f9ff,stroke:#0275d8,stroke-width:1px
    style GCW fill:#f0f9ff,stroke:#0275d8,stroke-width:1px
    style STW fill:#f0f9ff,stroke:#0275d8,stroke-width:1px
    style STS fill:#f0f9ff,stroke:#0275d8,stroke-width:1px
    style CPR fill:#f0f9ff,stroke:#0275d8,stroke-width:1px
    style RCK fill:#f0f9ff,stroke:#0275d8,stroke-width:1px
    style CL fill:#fff3cd,stroke:#ffb22b,stroke-width:2px
    style GDM fill:#fff3cd,stroke:#ffb22b,stroke-width:1px
    style GVC fill:#fff3cd,stroke:#ffb22b,stroke-width:1px
    style EGT fill:#fff3cd,stroke:#ffb22b,stroke-width:1px
    style BNS fill:#fff3cd,stroke:#ffb22b,stroke-width:1px
    style SC fill:#fff3cd,stroke:#ffb22b,stroke-width:1px
    style MC fill:#fff3cd,stroke:#ffb22b,stroke-width:1px
    style PC fill:#fff3cd,stroke:#ffb22b,stroke-width:1px
    style UM fill:#f2dede,stroke:#d9534f,stroke-width:1px
```
For a detailed architectural overview of each component, please refer to the Detailed Component Architecture document.
The client consists of two main components:
- Main Window Application: Provides the primary user interface.
- System Tray Application: Runs in the background within the OS system tray.
The heart of VisionGuard is its gaze detection engine, leveraging several models from the OpenVINO model zoo:
- Face Detection: `face-detection-retail-0005`
- Head Pose Estimation: `head-pose-estimation-adas-0001`
- Facial Landmark Detection: `facial-landmarks-35-adas-0002`
- Eye State Estimation: `open-closed-eye-0001`
- Gaze Estimation: `gaze-estimation-adas-0002`
These models work together to create a robust gaze detection pipeline.
```mermaid
graph TD
    A[Image Input] --> B[Face Detection]
    B --> |Face Image| C[Facial Landmark Detection]
    B --> |Face Image| D[Head Pose Estimation]
    C --> E[Eye State Estimation]
    D --> |Head Pose Angles| F[Gaze Estimation]
    C --> |Eye Image| F
    E --> |Eye State| F
    F --> |Gaze Vector| G[Gaze Time Estimation]
    G --> H[Accumulate Screen Gaze Time]
    style B fill:#FFDDC1,stroke:#333,stroke-width:2px
    style C fill:#FFDDC1,stroke:#333,stroke-width:2px
    style D fill:#FFDDC1,stroke:#333,stroke-width:2px
    style E fill:#FFDDC1,stroke:#333,stroke-width:2px
    style F fill:#FFDDC1,stroke:#333,stroke-width:2px
    style G fill:#FFDDC1,stroke:#333,stroke-width:2px
    style A fill:#C1E1FF,stroke:#333,stroke-width:2px
    style H fill:#C1E1FF,stroke:#333,stroke-width:2px
```
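The stage ordering in this pipeline can be sketched as plain function composition. All types and functions below are illustrative stand-ins, not OpenVINO's actual API; the bodies are stubs whose only purpose is to show how each stage's output feeds the next:

```cpp
// Illustrative stand-ins for each stage's output; the real engine runs the
// OpenVINO model-zoo networks named above on the webcam frame.
struct FaceBox { int x, y, w, h; };
struct HeadPose { double yaw, pitch, roll; };
struct EyeLandmarks { bool valid; };
struct GazeVector { double x, y, z; };

// Stub stages: signatures mirror the data flow in the diagram.
FaceBox detectFace(/* frame */) { return {100, 80, 200, 200}; }
HeadPose estimateHeadPose(const FaceBox&) { return {0.0, -5.0, 0.0}; }
EyeLandmarks detectLandmarks(const FaceBox&) { return {true}; }
bool eyesOpen(const EyeLandmarks&) { return true; }
GazeVector estimateGaze(const HeadPose&, const EyeLandmarks&) {
    return {0.1, -0.2, -1.0}; // dummy vector pointing toward the screen
}
```

A single frame then flows through the stages in diagram order: face detection first, landmarks and head pose from the face crop, then eye state and the final gaze vector.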
One of the most critical aspects of ensuring VisionGuard’s accuracy is the calibration process. Accurate calibration is essential for precise gaze tracking, as it directly influences how well the application can detect and respond to where the user is looking on the screen. The calibration process I developed is both user-friendly and technically robust, designed to adapt to various screen sizes and user positions.
The calibration process begins with a Four-Point Gaze Capture. Users are prompted to focus on four green dots that appear sequentially in the corners of the screen. This step is crucial for gathering data on the user's gaze behavior from different angles. The process ensures that multiple gaze points are captured for each corner, improving the overall accuracy of the calibration.
Figure: A screen with four green dots representing the four-point calibration process.
Once the gaze data is captured, the next step is the Convex Hull Calculation. The system takes all the captured gaze points and computes the smallest polygon that can enclose these points, known as the convex hull. This polygon represents the boundary within which the user's gaze is expected to fall.
Figure: Visualization of the convex hull enclosing the captured gaze points.
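One standard way to compute such a hull is Andrew's monotone chain algorithm. The source does not specify which hull algorithm VisionGuard uses, so the following is a generic sketch:

```cpp
#include <algorithm>
#include <vector>

struct Point { double x, y; };

// Cross product of (b - a) x (c - a); positive for a counter-clockwise turn.
double cross(const Point& a, const Point& b, const Point& c) {
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

// Convex hull via Andrew's monotone chain: build the lower and upper
// chains over the x-sorted points, dropping any clockwise turns.
std::vector<Point> convexHull(std::vector<Point> pts) {
    std::sort(pts.begin(), pts.end(), [](const Point& a, const Point& b) {
        return a.x < b.x || (a.x == b.x && a.y < b.y);
    });
    if (pts.size() < 3) return pts;
    std::vector<Point> lower, upper;
    for (const Point& p : pts) {
        while (lower.size() >= 2 &&
               cross(lower[lower.size() - 2], lower.back(), p) <= 0)
            lower.pop_back();
        lower.push_back(p);
    }
    for (auto it = pts.rbegin(); it != pts.rend(); ++it) {
        while (upper.size() >= 2 &&
               cross(upper[upper.size() - 2], upper.back(), *it) <= 0)
            upper.pop_back();
        upper.push_back(*it);
    }
    lower.pop_back();  // each chain's endpoint starts the other chain
    upper.pop_back();
    lower.insert(lower.end(), upper.begin(), upper.end());
    return lower;  // hull vertices in counter-clockwise order
}
```

Gaze points captured inside the hull are discarded by construction, so only the outermost samples define the screen boundary.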
To account for potential inaccuracies in gaze detection, an Error Margin Application is performed. The convex hull is expanded by a predetermined margin (typically 150 pixels) to create a buffer zone. This extension ensures that slight deviations in gaze tracking won’t lead to incorrect detections.
Figure: The error margin applied to the convex hull to account for tracking inaccuracies.
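A simple way to approximate this buffer zone is to push each hull vertex away from the polygon's centroid by the margin; VisionGuard's exact offsetting method may differ:

```cpp
#include <cmath>
#include <vector>

struct Point { double x, y; };

// Expand a convex polygon by moving each vertex `margin` pixels outward
// from the centroid. This is an illustrative approximation of the buffer
// zone described above, not necessarily the application's exact method.
std::vector<Point> expandPolygon(const std::vector<Point>& poly, double margin) {
    Point c{0, 0};
    for (const Point& p : poly) { c.x += p.x; c.y += p.y; }
    c.x /= poly.size();
    c.y /= poly.size();

    std::vector<Point> out;
    out.reserve(poly.size());
    for (const Point& p : poly) {
        double dx = p.x - c.x, dy = p.y - c.y;
        double len = std::hypot(dx, dy);
        if (len == 0) { out.push_back(p); continue; }  // vertex at centroid
        out.push_back({p.x + margin * dx / len, p.y + margin * dy / len});
    }
    return out;
}
```

With the 150-pixel margin mentioned above, small gaze-estimation errors near the screen edges still land inside the expanded polygon.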
The final step in the calibration process is determining the Final Calibration Points. The extended convex hull is intersected with the screen boundaries, and the resulting points form the final calibration set. These points are crucial for accurate gaze tracking, ensuring that the system can reliably detect whether the user is looking at the screen.
Figure: The final calibration points determined after applying the error margin.
This comprehensive calibration process not only improves accuracy but also enhances the user experience by making the setup process straightforward and reliable.
At the core of VisionGuard’s functionality is its ability to process video frames in real time and update gaze-related metrics. This system works by analyzing each frame captured by the webcam to determine the user’s gaze direction and then updating the screen time metrics accordingly.
The process starts with Face and Gaze Detection. Using models from the OpenVINO toolkit, the application first detects the user’s face and then estimates their gaze direction. This step is critical as it forms the basis for all subsequent calculations.
Figure: The face and gaze detection process, which identifies the user’s gaze direction.
Once the gaze direction is estimated, the next step is to calculate the Gaze Screen Intersection. Here, the 3D gaze vector is projected onto the 2D screen plane. This conversion is essential to determine whether the user is looking at the screen and, if so, where on the screen their gaze is focused.
Figure: Illustration of the gaze vector intersecting with the 2D screen plane.
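A minimal sketch of this projection, assuming a coordinate system in which the screen lies in the z = 0 plane and the eye sits at positive z (the actual axis conventions in VisionGuard may differ):

```cpp
#include <optional>

struct Vec3 { double x, y, z; };
struct Point2D { double x, y; };

// Intersect the gaze ray (origin `eye`, direction `gaze`) with the screen
// plane z = 0. Returns nothing when the ray points away from the screen.
std::optional<Point2D> gazeOnScreen(const Vec3& eye, const Vec3& gaze) {
    if (gaze.z >= 0) return std::nullopt;   // looking away from the screen
    double t = -eye.z / gaze.z;             // ray parameter where z reaches 0
    return Point2D{eye.x + t * gaze.x, eye.y + t * gaze.y};
}
```

The resulting 2D point is then tested against the calibrated screen polygon to decide whether the gaze is on screen.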
Based on the intersection point, the system then performs a Gaze Time Update. If the user’s gaze is on the screen and their eyes are open, the application accumulates screen time. Conversely, if the gaze is off the screen or the eyes are closed, the system updates the gaze lost duration. If this duration exceeds a specified threshold, the accumulated screen time is reset.
Figure: Flowchart showing how gaze time is updated based on user behavior.
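The update rule can be sketched as follows (member names and the threshold value are illustrative, not taken from the VisionGuard source):

```cpp
// Per-frame gaze-time bookkeeping: accumulate screen time while the user
// looks at the screen with eyes open; otherwise grow the gaze-lost
// duration, and reset the accumulated time once the user has looked away
// long enough to count as a completed break.
struct GazeTimeTracker {
    double screenTime = 0;      // seconds of accumulated on-screen gaze
    double gazeLost = 0;        // seconds since the gaze left the screen
    double lostThreshold = 20;  // reset screen time after this long away

    void update(bool onScreen, bool eyesOpen, double dt) {
        if (onScreen && eyesOpen) {
            screenTime += dt;
            gazeLost = 0;
        } else {
            gazeLost += dt;
            if (gazeLost > lostThreshold)
                screenTime = 0;  // user took a sufficient break
        }
    }
};
```

Here `dt` is the elapsed time since the previous processed frame, so the tracker stays accurate even when the frame rate varies.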
Finally, the system provides Visual Feedback by marking detected facial features and displaying the current gaze time and lost duration on the frame. Alongside this, Performance Metrics such as CPU utilization, memory usage, and frame processing speed are tracked to ensure that VisionGuard runs efficiently.
Figure: Example of visual feedback provided by VisionGuard, along with performance metrics.
Determining whether the user's gaze is within the screen boundaries is a critical task that VisionGuard accomplishes using a Point-in-Polygon algorithm. Specifically, VisionGuard employs ray casting, a widely used technique for solving this problem.
The algorithm works by casting a ray from the gaze point and counting the number of intersections this ray has with the edges of the polygon representing the screen area. If the number of intersections is odd, the point lies inside the polygon; if even, it lies outside. This method is effective for both convex and concave polygons, making it highly adaptable.
Figure: Diagram illustrating how the ray-casting algorithm determines if a point is inside a polygon.
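A compact implementation of the ray-casting test described above (a generic sketch of the technique, not VisionGuard's exact code):

```cpp
#include <cstddef>
#include <vector>

struct Point { double x, y; };

// Ray-casting point-in-polygon test: cast a horizontal ray from `p` to the
// right and count how many polygon edges it crosses. An odd count means
// the point is inside. Works for convex and concave polygons alike.
bool insidePolygon(const Point& p, const std::vector<Point>& poly) {
    bool inside = false;
    std::size_t n = poly.size();
    for (std::size_t i = 0, j = n - 1; i < n; j = i++) {
        const Point& a = poly[i];
        const Point& b = poly[j];
        // Edge (a, b) is crossed when it straddles the horizontal line
        // through p and the intersection lies to the right of p.
        bool straddles = (a.y > p.y) != (b.y > p.y);
        if (straddles &&
            p.x < (b.x - a.x) * (p.y - a.y) / (b.y - a.y) + a.x)
            inside = !inside;
    }
    return inside;
}
```

In VisionGuard's case the polygon is the calibrated (and margin-expanded) screen boundary, and `p` is the projected gaze point for the current frame.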
The Point-in-Polygon algorithm is particularly well suited to VisionGuard's needs: once the 3D gaze vector has been projected onto the 2D screen plane, it reliably determines whether the resulting point falls within the calibrated screen area. This check underpins screen time tracking and ensures that users receive timely notifications to take breaks.
Throughout the GSoC experience, I encountered numerous challenges that significantly contributed to my learning and growth as a developer:
- Cross-platform C++ Development: Developing a cross-platform C++ application presented unique challenges, particularly in ensuring compatibility across different operating systems like macOS, Windows, and Linux. I faced difficulties with different compilers, such as issues compiling OpenVINO's Model Zoo demo with MSVC 2022, which required troubleshooting and problem-solving to ensure smooth builds across platforms.
- Understanding and Implementing CMake: CMake, a powerful build system, required me to deepen my understanding of build configurations and dependencies. This knowledge was essential in managing the complexity of a cross-platform project like VisionGuard.
- Low-Level Design Issues: Navigating C++'s low-level design complexities, particularly with object-oriented principles (OOP), was challenging. Implementing robust design patterns while maintaining performance required careful consideration of memory management and efficiency.
- Screen Calibration for Accurate Gaze Detection: One of the more technically demanding tasks was calibrating the screen to accurately detect if the user was gazing at the screen. This required developing a reliable and user-friendly calibration process that could adapt to different screen sizes and user positions.
- Adhering to C++ Development Standards: Ensuring that VisionGuard adhered to modern C++ development standards was vital for the project's long-term maintainability. I had to revise my approach to permissions and data storage, moving from storing stats in the current working directory to using appropriate libraries for handling resources securely and efficiently.
These challenges not only helped me improve VisionGuard but also significantly enhanced my problem-solving skills and understanding of cross-platform development.
While I'm proud of what I've accomplished during the GSoC period, there's always room for improvement. Some areas for future development include:
- Implementing comprehensive unit tests to ensure reliability and maintainability
- Developing GitHub workflows for automated building, testing, and linting
- Adding support for multi-monitor setups and multi-user environments
- Enhancing the statistics and reporting features for more detailed insights
My GSoC journey with OpenVINO and VisionGuard has been an incredible learning experience. I've had the opportunity to work with cutting-edge technology, collaborate with talented mentors, and create a tool that I believe can make a real difference in people's lives.
I want to express my heartfelt gratitude to my mentors, Dmitriy Pastushenkov and Ria Cheruvu, for their guidance and support throughout this journey. I also want to thank the entire OpenVINO Toolkit community for their invaluable resources and assistance.
If you're interested in trying out VisionGuard or contributing to its development, please check out our GitHub repository. Your feedback and contributions are always welcome!