Added a 13th camera to Vision Pro
I'm trying to use image-recognition AI to analyze what I see on Vision Pro in real time. On iPhone and iPad it is very easy for an app to get video and photos from the camera, but Vision Pro does not let apps do this programmatically. Vision Pro has 12 cameras, but apps are not allowed to use them, because letting apps see what users are looking at in real time would violate their privacy. So I decided to add a 13th camera (Cam13) myself.
The idea is to attach an external camera, receive its video over Wi-Fi in a WKWebView, and overlay it on the Vision Pro screen. By running AI processing on the images captured by Cam13, I hope to get an effect similar to analyzing images from the Vision Pro's own cameras.
*If the ban on camera access is lifted at WWDC24 this year, I will abandon Cam13 and use the official API (lol)
I could have used a commercially available webcam, but I wanted more freedom in how it is mounted on Vision Pro. So I looked for an inexpensive, lightweight unit that can be controlled by my own programs and has an easy-to-handle power supply. I chose the M5Stack UnitCamS3, which is built around an ESP32-S3 and costs less than $15.
The UnitCamS3 ships with a demo sketch installed, but with it I have to start video transmission manually every time. I want streaming to start automatically, so I wrote my own sketch. I installed VS Code and PlatformIO on my Mac and built the sketch below.
```cpp
// Access (access point mode; in station mode use 192.168.1.123 instead)
// http://192.168.4.1/stream
// http://192.168.4.1/still
#include <Arduino.h>
#include <WiFi.h>
#include <WiFiClient.h>
#include <AsyncTCP.h>
#include <ESPAsyncWebServer.h>
#include <esp32cam.h>
#include <esp32cam-asyncweb.h>
#include "apis/camera/api_cam.h"  // streamJpg (the /stream handler) is expected to be declared here

// WiFi
#define WIFI_SSID "--YourSSID--"
#define WIFI_PASSWORD "--password--"
bool isAccessPointMode = false;  // true: run as its own access point, false: join WIFI_SSID

// Camera
esp32cam::Resolution initialResolution;
// Camera pin assignment of the UnitCamS3
constexpr esp32cam::Pins UnitCamS3{
    D0: 6, D1: 15, D2: 16, D3: 7,
    D4: 5, D5: 10, D6: 4, D7: 13,
    XCLK: 11, PCLK: 12, VSYNC: 42,
    HREF: 18, SDA: 17, SCL: 41,
    RESET: 21, PWDN: -1,
};

// Web server
static void serveStill(AsyncWebServerRequest *request);
AsyncWebServer server(80);

void setup() {
    Serial.begin(115200);

    IPAddress ipAp(192, 168, 4, 1);     // address in access point mode
    IPAddress ip(192, 168, 1, 123);     // static address in station mode
    IPAddress gateway(192, 168, 1, 1);  // your router
    IPAddress subnet(255, 255, 255, 0);

    if (isAccessPointMode) {
        WiFi.mode(WIFI_AP);
        WiFi.softAP("VisionProCam13-WiFi");
        delay(100);
        // The access point acts as its own gateway
        WiFi.softAPConfig(ipAp, ipAp, subnet);
    } else {
        if (!WiFi.config(ip, gateway, subnet)) {
            Serial.println("Failed to configure!");
        }
        WiFi.begin(WIFI_SSID, WIFI_PASSWORD);
        while (WiFi.status() != WL_CONNECTED) {
            delay(500);
            Serial.print(".");
        }
        Serial.println("");
        Serial.println("WiFi connected");
        // Print the whole URL on one line
        Serial.print("Access 'http://");
        Serial.print(WiFi.localIP());
        Serial.println("/stream' to connect webcam");
    }

    {
        using namespace esp32cam;
        initialResolution = Resolution::find(800, 600);
        Config cfg;
        cfg.setPins(UnitCamS3);
        cfg.setResolution(initialResolution);
        cfg.setJpeg(80);  // JPEG quality
        bool ok = Camera.begin(cfg);
        if (!ok) {
            Serial.println("camera initialize failure");
            delay(5000);
            ESP.restart();
        }
        Serial.println("camera initialize success");
    }

    server.on("/still", HTTP_GET, serveStill);
    server.on("/stream", HTTP_GET, streamJpg);
    server.begin();
}

void loop() {
    delay(1);  // all requests are handled by the async web server
}

// Photo: capture one JPEG and return it
static void serveStill(AsyncWebServerRequest *request) {
    auto frame = esp32cam::capture();
    if (frame == nullptr) {
        Serial.println("capture() failure");
        request->send(500, "text/plain", "still capture error\n");
        return;
    }
    // Copy the JPEG into the response buffer: `frame` is freed when this function
    // returns, but the async server may still be sending the body after that.
    AsyncResponseStream *response = request->beginResponseStream("image/jpeg");
    response->write(frame->data(), frame->size());
    request->send(response);
}
```
After you build and upload the sketch with PlatformIO, it starts a web server and waits for connections. If isAccessPointMode is set to true, the board runs as its own access point and serves at 192.168.4.1. If it is set to false, the board joins your Wi-Fi network in station mode and serves at 192.168.1.123. I get about 30 fps in access point mode and about 10 fps in station mode. Vision Pro can only reach the internet over Wi-Fi, so you need station mode to access the internet and the camera at the same time.
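In station mode, the static address hard-coded in the sketch (192.168.1.123 with gateway 192.168.1.1) only works if it lies in your router's subnet, and ideally outside its DHCP range; on a different home network you need to change `ip` and `gateway`. The subnet test is a bitwise AND with the mask. A minimal sketch in plain C++, where `packIPv4` and `sameSubnet` are hypothetical helpers (not part of the Arduino WiFi API):

```cpp
#include <cstdint>

// Hypothetical helper: pack an IPv4 address given as four octets into 32 bits.
constexpr std::uint32_t packIPv4(std::uint8_t a, std::uint8_t b, std::uint8_t c, std::uint8_t d) {
    return (std::uint32_t{a} << 24) | (std::uint32_t{b} << 16) |
           (std::uint32_t{c} << 8) | std::uint32_t{d};
}

// Two addresses are on the same subnet when their network parts
// (address AND mask) are identical.
constexpr bool sameSubnet(std::uint32_t ip, std::uint32_t gateway, std::uint32_t mask) {
    return (ip & mask) == (gateway & mask);
}
```

For example, `sameSubnet(packIPv4(192, 168, 1, 123), packIPv4(192, 168, 1, 1), packIPv4(255, 255, 255, 0))` is true, while the same address with a router at 192.168.0.1 is not, and the board would be unreachable.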
The visionOS app is built with SwiftUI. A WKWebView displays the stream from "http://192.168.1.123/stream".
```swift
import SwiftUI
import RealityKit
import RealityKitContent
import WebKit

#if os(macOS)
struct WebView: NSViewRepresentable {
    let loadUrl: URL

    func makeNSView(context: Context) -> WKWebView {
        return WKWebView()
    }

    func updateNSView(_ nsView: WKWebView, context: Context) {
        // Load only when the URL changes, so SwiftUI updates don't restart the stream
        if nsView.url != loadUrl {
            nsView.load(URLRequest(url: loadUrl))
        }
    }
}
#else
struct WebView: UIViewRepresentable {
    let loadUrl: URL

    func makeUIView(context: Context) -> WKWebView {
        return WKWebView()
    }

    func updateUIView(_ uiView: WKWebView, context: Context) {
        // Load only when the URL changes, so SwiftUI updates don't restart the stream
        if uiView.url != loadUrl {
            uiView.load(URLRequest(url: loadUrl))
        }
    }
}
#endif

struct ContentView: View {
    var body: some View {
        // Cam13 stream at the station-mode address set in the ESP32 sketch
        WebView(loadUrl: URL(string: "http://192.168.1.123/stream")!)
    }
}
```
If you do not add "Privacy - Local Network Usage Description" (NSLocalNetworkUsageDescription) to Info.plist, the app cannot communicate over the LAN and the WebView shows nothing. To get a still image, access "http://192.168.1.123/still".
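For AI processing, polling /still costs one HTTP request per frame. An alternative is to read /stream continuously and cut out each JPEG by its start-of-image (FF D8) and end-of-image (FF D9) markers. This assumes that /stream is an MJPEG multipart/x-mixed-replace response, the usual format for ESP32 camera streams and one WKWebView can display directly. It also assumes the frames contain no embedded thumbnail, which would have its own markers. A minimal sketch of only the splitting step, with the network reading left out; `extractJpegFrames` is a hypothetical helper:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Hypothetical helper: moves every complete JPEG (FF D8 ... FF D9) found in
// `buffer` into the returned list. Bytes before the last complete frame are
// dropped; a trailing partial frame stays in `buffer` so the next network
// read can complete it.
std::vector<Bytes> extractJpegFrames(Bytes& buffer) {
    std::vector<Bytes> frames;
    std::size_t keepFrom = 0;  // everything before this index can be discarded
    std::size_t i = 0;
    while (i + 1 < buffer.size()) {
        if (buffer[i] != 0xFF || buffer[i + 1] != 0xD8) {  // not a start-of-image marker
            ++i;
            continue;
        }
        const std::size_t start = i;
        std::size_t end = start + 2;
        // Scan forward for the end-of-image marker
        while (end + 1 < buffer.size() &&
               !(buffer[end] == 0xFF && buffer[end + 1] == 0xD9)) {
            ++end;
        }
        if (end + 1 >= buffer.size()) {  // no end marker yet: frame is incomplete
            keepFrom = start;
            break;
        }
        frames.emplace_back(buffer.begin() + start, buffer.begin() + end + 2);
        i = end + 2;
        keepFrom = i;
    }
    buffer.erase(buffer.begin(), buffer.begin() + keepFrom);
    return frames;
}
```

The receiving side would append each chunk read from http://192.168.1.123/stream to `buffer` and call the helper. Each returned frame is a standalone JPEG that could be handed to an image-recognition model.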
Now let's superimpose the Cam13 video onto the Vision Pro's own view.
[Video: Cam13onVP.mp4]
For now, I scale and position the image by hand (^^; Automating this calibration is the next challenge, but I am optimistic that once the camera is mounted and the settings are dialed in, there will be no major misalignment.
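Automating the calibration could be as simple as the following, assuming the Cam13 image only needs a uniform scale and an offset to line up with the view (no rotation and no lens distortion). Pick two reference points that are visible both in the Cam13 image and on the Vision Pro screen, then solve for the two parameters. A minimal sketch; `Point`, `Calibration`, and `calibrate` are hypothetical names:

```cpp
#include <cmath>

struct Point {
    double x;
    double y;
};

// Hypothetical calibration model: screen = scale * camera + offset
// (uniform scale, no rotation or lens distortion).
struct Calibration {
    double scale;
    Point offset;

    Point apply(Point cam) const {
        return {scale * cam.x + offset.x, scale * cam.y + offset.y};
    }
};

// Solve scale and offset from two reference points seen in the Cam13 image
// (cam1, cam2) and at their matching positions on screen (screen1, screen2).
// cam1 and cam2 must be distinct, otherwise the scale is undefined.
Calibration calibrate(Point cam1, Point cam2, Point screen1, Point screen2) {
    const double camDist = std::hypot(cam2.x - cam1.x, cam2.y - cam1.y);
    const double screenDist = std::hypot(screen2.x - screen1.x, screen2.y - screen1.y);
    const double scale = screenDist / camDist;
    return {scale, {screen1.x - scale * cam1.x, screen1.y - scale * cam1.y}};
}
```

For example, if camera points (0, 0) and (100, 0) appear on screen at (10, 20) and (210, 20), the scale is 2 and the offset is (10, 20), so camera point (50, 50) maps to (110, 120). If the mount turns out to introduce rotation, a full similarity or homography fit would be needed instead.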
Next steps:
- Make a camera mount for Cam13 with a 3D printer and fix Cam13 in place
- Automate the calibration that aligns the Cam13 video with the Vision Pro screen
- Perform AI image analysis in real time
- Display the analysis results on a HUD floating in the Vision Pro view
- Do everything on-device on Vision Pro (for privacy)