Michael Siebenmann, Hanqiu Li Cai, Omar Majzoub, Alexander Zank
OpenMaskXR is our semester project for ETH's Mixed Reality course. We bring OpenMask3D to Extended Reality. With OpenMaskXR, we demonstrate an end-to-end workflow for advanced scene understanding in XR. We implement various software components whose tasks range from scanning the environment using commodity hardware to processing and displaying it for open-vocabulary object querying.
Watch our video and read our paper to learn more. We extend our deepest gratitude to Elisabetta Fedele and Alexandros Delitzas for their advice and the Mixed Reality Teaching Staff for creating such an amazing course.
If you want to run OpenMaskXR yourself, you need to run both the XR client and the server at the same time.
We include a Meta Quest 3 build under the Releases tab of this repository. Note that our application requires internet access.
OpenMaskXR was designed to target OpenXR-compliant runtimes. That means you may also create a build for other MR headsets, such as the Magic Leap 2, HTC Vive Focus 3, or Pico 4. Additionally, we support Apple Vision Pro in a separate (experimental) build target. To build the OpenMaskXR client yourself, follow these steps:
- Install Unity 6000.0.23f1 (higher versions of Unity 6 are untested, but likely to work as well).
- Clone this repository and open the folder `<Root>/OpenMaskXR` with Unity.
- Follow the (Unity) guide for setting up your XR headset in an OpenXR project.
- Adapt line 403 in `ModelManager.cs` to use your API's base URL (see the next section for tips on hosting) with the route `/text-to-CLIP`:
  `StartCoroutine(TextQuery("<your-api-base-url>/text-to-CLIP", $"{{\"text\":\"{query}\"}}"));`
- Select the build target (Android via OpenXR or visionOS), then create and run a build through `File > Build And Run` or use `Ctrl + B`. (If deploying to visionOS, remember to replace our development team with yours in `project.pbxproj`.)
To capture reconstruction meshes, posed RGB frames, intrinsics, and marker transforms, install our sensing emulator on a modern iOS / iPadOS / visionOS device:

- Open the project contained in `<Root>/Experimental/OpenMaskXRSensingEmulator` with Xcode.
- In `project.pbxproj`, replace our development team with yours, keeping automatic signing on. (If deploying on visionOS, obtain the Enterprise Entitlement for Main Camera Access from Apple's Developer Page and include it too.)
- Select your target device, then build and run with `⌘R`.
While you can explore our pre-processed ScanNet200 scenes in XR without having to run the server, it is required for querying. The server exposes an API that can be used to embed text into CLIP vectors, as CLIP unfortunately cannot run on the headset. We first describe how to achieve this minimal setup, then discuss how to run our other software components offline.
Note that the following steps set up a server for the sole purpose of embedding text into CLIP vectors. To run this minimal server (and easily expose it to the internet), follow these steps:
- Install ngrok and Python 3.11.
- Create an ngrok account and connect the agent to your account (see their Quickstart linked in step 1).
- Create a static domain in your ngrok dashboard.
- Clone this repository and check out the `laptop-workaround` branch. (This branch skips imports not required for CLIP-only mode, saving time on low-spec hardware.)
- Navigate to `<Root>/Server/main`.
- Create a virtual environment with `python -m venv .venv` and activate it.
- Install the necessary packages through `pip install -r requirements.txt`.
- Run the API script: `python api.py`.
- In a parallel terminal, run `ngrok http 1234 --url=<your-static-url>`.
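With `api.py` and ngrok running, you can exercise the text-embedding route from any machine. The sketch below posts a query to `/text-to-CLIP` using only the Python standard library; the base URL is a placeholder for your ngrok static domain, and the raw-bytes return type is an assumption — consult `api.py` for the authoritative response schema.

```python
import json
import urllib.request

# Placeholder: replace with your ngrok static domain.
API_BASE = "https://<your-static-url>"


def build_payload(text: str) -> bytes:
    """Encode a text query as the JSON body the /text-to-CLIP route expects."""
    return json.dumps({"text": text}).encode("utf-8")


def embed_text(text: str, base_url: str = API_BASE) -> bytes:
    """POST the query to /text-to-CLIP and return the raw response body."""
    req = urllib.request.Request(
        f"{base_url}/text-to-CLIP",
        data=build_payload(text),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()


if __name__ == "__main__":
    # Requires the server and ngrok tunnel from the steps above to be running.
    print(embed_text("a red armchair"))
```

The JSON body matches the one the XR client sends from `ModelManager.cs`, so this is a convenient way to smoke-test your deployment before putting on the headset.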
If you wish to use our pre- and post-processing scripts for point cloud and RGB-D data:

- Check out the `main` branch of this repository.
- Inspect the contents of `<Root>/Server/main` and invoke scripts manually, for instance in a Python REPL session.
If you followed the minimal setup, all dependencies should already be installed.
If you wish to run our dockerized OpenMask3D service:

- Navigate to `<Root>/Server/openmask3d`.
- Build the Docker image using `docker build --tag openmask3d .`
To run the image in server mode and start processing the contents of `<your-indir>`:

- Run `docker run --gpus all -v <your-indir>:/root/input -v <your-outdir>:/root/output -p 2345:80 openmask3d`.
- Perform an HTTP GET request to `localhost:2345`.
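The final GET request can be issued from a browser, curl, or a few lines of Python. A minimal sketch, assuming the container started above is listening on host port 2345; scene processing can take a while, so the timeout here is deliberately generous:

```python
import urllib.request

# Assumes the openmask3d container from the step above is running
# with its port 80 mapped to host port 2345.
SERVICE_URL = "http://localhost:2345"


def trigger_processing(url: str = SERVICE_URL, timeout: float = 3600.0) -> int:
    """Send the GET request that starts processing <your-indir>; return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status


if __name__ == "__main__":
    print(trigger_processing())
```

Results are written to the `<your-outdir>` directory mounted into the container as `/root/output`.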