Merge pull request #105 from taorye/dev

feat: app gesture classifier
sipeed · Jan 16, 2025 · 2b6050e
2 parents 9d6dfe9 + b495106
commit 2b6050e

Showing 16 changed files with 489 additions and 2 deletions.
diff --git a/docs/doc/assets/handposex_14class.jpg b/docs/doc/assets/handposex_14class.jpg
diff --git a/docs/doc/en/sidebar.yaml b/docs/doc/en/sidebar.yaml
@@ -85,6 +85,8 @@ items:
         label: Hand landmarks
     -   file: vision/body_pose_classification.md
         label: Human pose classifier
+    -   file: vision/hand_gesture_classification.md
+        label: Hand gesture classifier
     -   file: vision/maixhub_train.md
         label: MaixHub online AI training
     -   file: vision/customize_model_yolov5.md

diff --git a/docs/doc/en/video/uvc_streaming.md b/docs/doc/en/video/uvc_streaming.md
@@ -68,7 +68,7 @@ uvcs.show(img)
 This approach offers high performance with a single-process implementation, but USB functionality will only be available when the process is running. Therefore, when stopping this process, it's important to note that the enabled `Rndis` and `NCM` functionalities will temporarily become inactive, causing a network disconnection.
 
 **Reference example source code path:**  
-`MaixPy/examples/vision/streaming/uvc_stream.py`
+`MaixPy/examples/vision/streaming/uvc_server.py`
 
 **Also packaged as an app source code path:**  
 `MaixCDK/projects/app_uvc_camera/main/src/main.cpp`

diff --git a/docs/doc/en/vision/hand_gesture_classification.md b/docs/doc/en/vision/hand_gesture_classification.md
@@ -0,0 +1,46 @@
+---
+title: MaixCAM MaixPy Hand Gesture Classification Based on Hand Keypoint Detection
+---
+
+## Introduction
+
+This app classifies hand gestures on MaixCAM with MaixPy, using the results of hand keypoint detection as the input to a classifier.
+
+The current dataset used is the `14-class static hand gesture dataset` with a total of 2850 samples divided into 14 categories.  
+[Dataset Download Link (Baidu Netdisk, Password: 6urr)](https://pan.baidu.com/s/1Sd-Ad88Wzp0qjGH6Ngah0g)
+
+![](../../assets/handposex_14class.jpg)
+
+This app is implemented in `MaixPy/projects/app_hand_gesture_classifier/main.py`, and the main logic is as follows:
+
+1. Load the `14-class static hand gesture dataset`, preprocessed by the **Hand Keypoint Detection** model into `20` keypoint coordinate offsets relative to the wrist.  
+2. Initially train on the first `4` classes to support basic gesture recognition.  
+3. Use the **Hand Keypoint Detection** model to process the camera input and visualize classification results on the screen.  
+4. Tap the top-right `class14` button to add more samples and retrain the model for full `14-class` gesture recognition.  
+5. Tap the bottom-right `class4` button to remove the added samples and retrain the model back to the `4-class` mode.  
+6. Tap the small area between the buttons to display the last training duration at the top of the screen.  
+7. Tap the remaining large area to show the currently supported gesture classes on the left side—**green** for supported, **yellow** for unsupported.  
+
+## Demo Video
+
+<video playsinline controls autoplay loop muted preload src="/static/video/hand_gesture_demo.mp4" type="video/mp4">
+Classifier Result Video
+</video>
+
+1. The video demonstrates the `14-class` mode after executing step `4`. It recognizes gestures `1-10` (which by default are mapped to other English labels), **OK**, **thumbs up**, **finger heart** (requires showing the back of the hand, which is hard to film but can be verified yourself), and **pinky stretch**, for a total of `14` gestures.
+
+2. Then, step `5` is executed, reverting to the `4-class` mode, where only gestures **1**, **5**, **10** (fist), and **OK** are recognizable. Other gestures fail to produce correct results. During this process, step `7` was also executed, showing the current `4-class` mode—only the first 4 gestures are green, and the remaining 10 are yellow.
+
+3. Step `4` is executed again, restoring the `14-class` mode, and previously unrecognized gestures in the `4-class` mode are now correctly identified.
+
+4. Finally, dual-hand recognition is demonstrated, and both hands' gestures are accurately recognized simultaneously.
+
+## Others
+
+The demo video was captured from the screen preview window in the top-right corner of **MaixVision**, and it matches the actual on-screen display.
+
+For detailed implementation, please refer to the source code and the above analysis.
+
+Further development or modification can be directly done based on the source code, which includes comments for guidance.
+
+If you need additional assistance, feel free to leave a message on **MaixHub** or send an email to the official company address.
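Step 1 of the main logic in this document describes the classifier input as `20` coordinate offsets relative to the wrist. The sketch below shows one way such a feature vector could be built. It assumes 21 two-dimensional keypoints with the wrist at index 0; neither the keypoint order nor the dimensionality is stated in this PR, and `wrist_relative_offsets` is a hypothetical helper, not a function from `main.py`.

```python
import numpy as np

def wrist_relative_offsets(keypoints):
    """Turn 21 (x, y) hand keypoints into 20 offsets relative to the wrist.

    Assumption: keypoints[0] is the wrist (not confirmed by the source).
    """
    pts = np.asarray(keypoints, dtype=float).reshape(21, 2)
    offsets = pts[1:] - pts[0]      # (20, 2): every other keypoint minus the wrist
    return offsets.reshape(-1)      # flatten to one feature vector of length 40

# Synthetic keypoints, only to show that the feature ignores hand position.
pts = np.arange(42, dtype=float).reshape(21, 2)
feat = wrist_relative_offsets(pts)
shifted = wrist_relative_offsets(pts + 100.0)  # same hand, moved on screen
```

Subtracting the wrist makes the feature independent of where the hand appears in the frame, which is presumably why offsets rather than absolute coordinates are used.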
diff --git a/docs/doc/zh/sidebar.yaml b/docs/doc/zh/sidebar.yaml
@@ -85,6 +85,8 @@ items:
         label: 手部关键点检测
     -   file: vision/body_pose_classification.md
         label: 人体姿态分类器
+    -   file: vision/hand_gesture_classification.md
+        label: 手势分类器
     -   file: vision/maixhub_train.md
         label: MaixHub 在线训练 AI 模型
     -   file: vision/customize_model_yolov5.md

diff --git a/docs/doc/zh/video/uvc_streaming.md b/docs/doc/zh/video/uvc_streaming.md
@@ -61,7 +61,7 @@ uvcs.show(img)
 
 高性能单进程实现，但仅在运行时 USB 全部功能才可用，故停止该进程时需要注意仍启用的 `Rndis` 和 `NCM` 会暂时失效，断开网络链接。
 
-参考示例源码路径：`MaixPy/examples/vision/streaming/uvc_stream.py`
+参考示例源码路径：`MaixPy/examples/vision/streaming/uvc_server.py`
 
 另有封装成 APP 的源码路径：`MaixCDK/projects/app_uvc_camera/main/src/main.cpp`
 

diff --git a/docs/doc/zh/vision/hand_gesture_classification.md b/docs/doc/zh/vision/hand_gesture_classification.md
@@ -0,0 +1,50 @@
+---
+title: MaixCAM MaixPy 基于手部关键点检测结果进行手势分类
+---
+
+
+## 简介
+
+由`MaixCAM MaixPy 基于手部关键点检测结果进行手势分类`可分类手势。
+
+目前使用的数据集为`14 类静态手势数据集`，[数据集下载地址(百度网盘 Password: 6urr )](https://pan.baidu.com/s/1Sd-Ad88Wzp0qjGH6Ngah0g)，数据集共 2850 个样本，分为 14 类。  
+
+
+![](../../assets/handposex_14class.jpg)
+
+
+该 app 实现位于 `MaixPy/projects/app_hand_gesture_classifier/main.py`，主要逻辑是
+
+1. 加载 `14 类静态手势数据集` 经 `手部关键点检测` 处理后的 `20` 个相对手腕的坐标偏移
+2. 初始训练前 `4` 个分类，以支持手势识别
+3. 加载 `手部关键点检测` 模型处理摄像头并通过该分类器将结果可视化在屏幕上
+4. 点击右上角 `class14` 可增添剩余分类样本再训练以达到 `14` 分类手势
+5. 点击右下角 `class4` 可移除上一步添加的分类样本再训练以达到 `4` 分类手势
+6. 点击按钮之间的小块区域，可在顶部显示分类器上一次训练的时长
+7. 点击其余大块区域，可在左侧显示当前支持的分类类别，绿色表示支持，黄色表示不支持
+
+
+
+## 效果视频
+<video playsinline controls autoplay loop muted preload src="/static/video/hand_gesture_demo.mp4" type="video/mp4">
+Classifier Result video
+</video>
+
+1. 视频内容为执行了上述第 `4` 步后的 `14` 分类模式，可识别手势 `1-10` （默认对应其他英文释义），ok，大拇指点赞，比心（需要手背，拍摄时不好演示，可自行验证），小拇指伸展 一共 `14` 种手势。
+
+2. 紧接着执行第 `5` 步，回退到 `4` 分类模式，仅可识别 1，5，10（握拳）和 ok，其余的手势都无法识别到正常结果。期间也有执行 第 `7` 步展示了当前是 `4` 分类模式，因为除了前 4 种手势为绿，后 10 种全部为黄色显示。
+
+3. 再就是执行第 `4` 步，恢复到 `14` 分类模式，`4` 分类模式无法识别的手势现在也恢复正确识别了。
+
+4. 末尾展示了双手的识别，实测可同时正确识别两只手的手势。
+
+
+## 其它
+
+效果视频为捕获的 maixvision 右上的屏幕预览窗口而来，和屏幕实际显示内容一致。
+
+详细实现可见源码和上述分析了。
+
+二次开发或修改也可直接基于源码完成，内附有注释。
+
+如确实仍有需要协助的，可与 maixhub 上发帖留言或发 email 到公司邮箱。
diff --git a/docs/static/video/hand_gesture_demo.mp4 b/docs/static/video/hand_gesture_demo.mp4
diff --git a/projects/app_hand_gesture_classifier/.gitignore b/projects/app_hand_gesture_classifier/.gitignore
@@ -0,0 +1,5 @@
+
+build
+dist
+/CMakeLists.txt
+
diff --git a/projects/app_hand_gesture_classifier/LinearSVC.py b/projects/app_hand_gesture_classifier/LinearSVC.py
@@ -0,0 +1,184 @@
+import numpy as np
+
+class LinearSVC:
+    class StandardScaler:
+        mean:np.ndarray
+        std:np.ndarray
+        def transform(self, X):
+            return (X - self.mean) / self.std
+
+        def fit_transform(self, X):
+            self.mean = np.mean(X, axis=0)
+            self.std = np.std(X, axis=0)
+            return self.transform(X)
+
+    def __init__(self, C=1.0, learning_rate=0.01, max_iter=1000):
+        self.C = C
+        self.learning_rate = learning_rate
+        self.max_iter = max_iter
+        self.scaler = self.StandardScaler()
+
+    def save(self, filename: str):
+        np.savez(filename,
+            C = self.C,
+            learning_rate = self.learning_rate,
+            max_iter = self.max_iter,
+            scaler_mean = self.scaler.mean,
+            scaler_std = self.scaler.std,
+            classes = self.classes,
+            _W =  self._W,
+            _B =  self._B,
+        )
+
+    @classmethod
+    def load(cls, filename: str):
+        npzfile = np.load(filename)
+        self = cls(
+            C=float(npzfile["C"]), 
+            learning_rate=float(npzfile["learning_rate"]), 
+            max_iter=int(npzfile["max_iter"])  # range() in training needs an int
+        )
+        self.scaler.mean = npzfile["scaler_mean"]
+        self.scaler.std = npzfile["scaler_std"]
+        self.classes = npzfile["classes"]
+        self._W = npzfile["_W"]
+        self._B = npzfile["_B"]
+        return self
+
+    def _train_binary_svm(self, X, y):
+        """
+        Train a binary SVM (labels in {-1, +1}) by batch sub-gradient descent on the hinge loss.
+        """
+        n_samples, n_features = X.shape
+        w = np.zeros(n_features)
+        b = 0
+        for _ in range(self.max_iter):
+            scores = np.dot(X, w) + b # prediction scores for all samples
+            margin = y * scores  # (n_samples,) margin of each sample
+            mask = margin < 1 # samples that violate the margin, treated here as support vectors
+            X_support = X[mask]  # support vectors
+            y_support = y[mask]  # labels of the support vectors
+            if len(X_support) > 0: # vectorized update
+                w -= self.learning_rate * (2 * w / n_samples - self.C * np.dot(X_support.T, y_support))  # batch update of w
+                b -= self.learning_rate * (-self.C * np.sum(y_support))  # batch update of b
+        return w, b
+
+    def fit(self, X, y):
+        """
+        Train a multi-class SVM (one-vs-rest).
+        Args:
+        - X: feature matrix of shape (n_samples, n_features)
+        - y: label array of shape (n_samples,) holding the class of each sample
+        """
+        self.classes = np.unique(y)  # collect all class labels
+        self._W = np.zeros((len(self.classes), X.shape[1]))
+        self._B = np.zeros(len(self.classes))
+        for i, cls in enumerate(self.classes):
+            binary_y = np.where(y == cls, 1, -1) # build one-vs-rest labels
+            w, b = self._train_binary_svm(X, binary_y)
+            self._W[i] = w
+            self._B[i] = b
+
+    def forward(self, X):
+        return np.dot(X, self._W.T) + self._B
+
+    def predict(self, X):
+        return self.classes[np.argmax(self.forward(X), axis=1)]  # class with the highest score
+
+    def predict_with_confidence(self, X):
+        def softmax(x):
+            x_max = np.max(x, axis=-1, keepdims=True) # subtract the max for numerical stability
+            exp_x = np.exp(x - x_max)
+            return exp_x / np.sum(exp_x, axis=-1, keepdims=True)
+        res = self.forward(X) # (n_samples, n_classes)
+        confidences = softmax(res) # (n_samples, n_classes)
+        return self.classes[np.argmax(res, axis=1)], np.max(confidences, axis=1)  # top class and its softmax confidence
+
+
+class LinearSVCManager:
+    def __init__(self, clf: LinearSVC=None, X=None, Y=None, pretrained=False):
+        if X is None:
+            X = np.empty((0, 0))
+        if Y is None:
+            Y = np.empty((0,))
+
+        # Convert lists to NumPy arrays
+        if isinstance(X, list):
+            X = np.array(X)
+        if isinstance(Y, list):
+            Y = np.array(Y)
+
+        # Type checks
+        if not isinstance(X, np.ndarray):
+            raise TypeError("X must be a list or numpy array.")
+        if not isinstance(Y, np.ndarray):
+            raise TypeError("Y must be a list or numpy array.")
+
+        if len(X) != len(Y):
+            raise ValueError("Length of X and Y must be equal.")
+        if len(Y) == 0:
+            raise ValueError("A classifier (clf) must be provided with training samples X and Y.")
+
+        if clf is None:
+            if pretrained:
+                raise ValueError("A pretrained classifier (clf) can't be `None`.")
+            clf = LinearSVC()
+
+        self.clf = clf
+        self.samples = (X, Y)
+
+        if not pretrained:
+            self.train()
+
+    def train(self):
+        X_scaled = self.clf.scaler.fit_transform(self.samples[0])
+        self.clf.fit(X_scaled, self.samples[1])
+        print(f"{len(self.samples[1])} samples have been trained.")
+
+    def test(self, X):
+        X = np.array(X)
+        if X.shape[-1] != self.samples[0].shape[1]:
+            raise ValueError("Tested data dimension mismatch.")
+        X_scaled = self.clf.scaler.transform(X)
+        return self.clf.predict_with_confidence(X_scaled)
+
+    def add(self, X, Y):
+        X = np.array(X)
+        Y = np.array(Y)
+
+        if X.shape[-1] != self.samples[0].shape[1]:
+            raise ValueError("Added data dimension mismatch.")
+
+        if len(self.samples[0])>0:
+            self.samples = (
+                np.vstack([self.samples[0], X]),
+                np.concatenate([self.samples[1], Y])
+            )
+        else:
+            self.samples = (X, Y)
+
+        self.train()
+
+    def rm(self, indices):
+        X, Y = self.samples
+
+        if any(idx < 0 or idx >= len(Y) for idx in indices):
+            raise IndexError("Index out of bounds.")
+
+        mask = np.ones(len(Y), dtype=bool)
+        mask[indices] = False
+
+        self.samples = (X[mask], Y[mask])
+
+        if len(self.samples[1]) > 0:
+            self.train()
+        else:
+            print("Warning: All data has been removed. Model is untrained now.")
+
+    def clear_samples(self):
+        self.samples = (np.empty((0, self.samples[0].shape[1])), np.empty((0,)))
+        print("All training samples have been cleared.")
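Review note: the training path in `LinearSVC.py` (standardize, train one binary hinge-loss classifier per class, then take the arg-max score with a softmax confidence) can be exercised in isolation. The sketch below is a standalone restatement of that flow, not the file itself. The synthetic three-cluster data, the seed, and `max_iter=200` are illustrative assumptions; `train_binary` and `softmax` are local helper names chosen for this example.

```python
import numpy as np

def train_binary(X, y, C=1.0, lr=0.01, max_iter=200):
    """Batch sub-gradient descent on the hinge loss; y holds -1 / +1 labels."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(max_iter):
        mask = y * (X @ w + b) < 1            # samples violating the margin
        if mask.any():
            w -= lr * (2 * w / n_samples - C * X[mask].T @ y[mask])
            b -= lr * (-C * y[mask].sum())
    return w, b

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)     # subtract the max for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Synthetic data: three well-separated 2-D clusters (illustrative only).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 5.0], [5.0, -5.0], [-5.0, -5.0]])
X = np.vstack([c + 0.3 * rng.standard_normal((30, 2)) for c in centers])
y = np.repeat(np.arange(3), 30)

# Standardize features, as the nested StandardScaler does.
mean, std = X.mean(axis=0), X.std(axis=0)
Xs = (X - mean) / std

# One-vs-rest: one binary classifier per class.
classes = np.unique(y)
W = np.zeros((len(classes), Xs.shape[1]))
B = np.zeros(len(classes))
for i, c in enumerate(classes):
    W[i], B[i] = train_binary(Xs, np.where(y == c, 1, -1))

scores = Xs @ W.T + B                         # (n_samples, n_classes)
pred = classes[np.argmax(scores, axis=1)]
conf = softmax(scores).max(axis=1)            # confidence of the top class
accuracy = float((pred == y).mean())
```

Note that the softmax value is only a relative ranking of the raw SVM scores, not a calibrated probability, so any confidence threshold built on it has to be tuned empirically.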
diff --git a/projects/app_hand_gesture_classifier/README.md b/projects/app_hand_gesture_classifier/README.md
@@ -0,0 +1,15 @@
+The touchscreen is segmented into four sections:
+
+1. The first two are circles located in the upper-right and lower-right corners.
+
+2. The third section is the area between these two circles.
+
+3. The fourth section is the largest, covering the entire left area.
+
+Touching a section has the following effect:
+
+1. The two circles are buttons: each is activated when the touch is released without having moved away from it.
+
+2. The area between the circles shows the duration of the last training run.
+
+3. The left area shows which classes are currently active.
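Review note: the four touch sections described in this README amount to a simple hit test. The sketch below illustrates the idea only. The screen size, circle radius, and circle positions are hypothetical values, and `touch_region` is not a function from `main.py`; the real layout is defined in the app source.

```python
from math import hypot

def touch_region(x, y, width=320, height=240, radius=40):
    """Map a touch point to one of four sections (hypothetical geometry)."""
    cx = width - radius                      # both circles sit at the right edge
    if hypot(x - cx, y - radius) <= radius:
        return "class14"                     # upper-right circle
    if hypot(x - cx, y - (height - radius)) <= radius:
        return "class4"                      # lower-right circle
    if x >= width - 2 * radius and radius < y < height - radius:
        return "between"                     # strip between the two circles
    return "left"                            # everything else: the large area

regions = [
    touch_region(280, 40),    # centre of the upper-right circle
    touch_region(280, 200),   # centre of the lower-right circle
    touch_region(280, 120),   # midway between the circles
    touch_region(50, 120),    # far left of the screen
]
```

Checking the circles first keeps the button areas exact, and the fall-through return means any point not claimed by a button or the strip is handled as the large area.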
diff --git a/projects/app_hand_gesture_classifier/app.yaml b/projects/app_hand_gesture_classifier/app.yaml
@@ -0,0 +1,14 @@
+id: gesture_classifier
+name: Gesture Classifier
+name[zh]: 手势分类
+version: 1.0.0
+author: Taorye@Sipeed
+icon: icon.png
+desc: Classify the hand gesture.
+files:
+  - app.yaml
+  - icon.png
+  - main.py
+  - LinearSVC.py
+  - clf_dump.npz
+  - trainSets.npz
diff --git a/projects/app_hand_gesture_classifier/clf_dump.npz b/projects/app_hand_gesture_classifier/clf_dump.npz
diff --git a/projects/app_hand_gesture_classifier/icon.png b/projects/app_hand_gesture_classifier/icon.png