This challenge is held at the ICML 2024 Workshop: Multi-modal Foundation Model meets Embodied AI.
This repository contains the benchmark toolkit and the RH20T-P data API.
```bash
sudo apt-get install libvulkan1
```

To test your installation of Vulkan:

```bash
sudo apt-get install vulkan-utils
vulkaninfo
```

If `vulkaninfo` fails to show the information about Vulkan, you can check this page for troubleshooting.
```bash
conda create -n maniskill python=3.10
conda activate maniskill
pip install --upgrade mani_skill
pip install torch torchvision torchaudio  # make sure torch >= 2.3
```

To test your installation of ManiSkill:

```bash
python -m mani_skill.examples.demo_random_action
```
```bash
git clone https://github.com/Zx55/cga-challenge.git
pip install -r requirements.txt
```
We provide a simple test case (`Rh20t-PickObject-v0`) in this repository to help participants debug their CGAs. You can run the following command to show this test case:

```bash
python main.py --env Rh20t-PickObject-v0 --gt --seed 0
```

The argument `--gt` will sample a predefined ground-truth trajectory to complete the task (if possible).
- Input
  - Open-world instruction (in language)
  - RGB image rendered by the simulator (you can also store these images as historical inputs)
  - Camera information (extrinsics, intrinsics)
  - Position of the end-effector
- Output
  - Actions for each step
    - End-effector coordinate system
    - 8-d vector: position (xyz) + rotation (quat) + gripper (width); see the sketch after this list
  - (Optional) Execution plans in order
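To make the output format concrete, here is a minimal sketch of packing an end-effector pose into the 8-d action described above. It only assumes NumPy; the quaternion convention, gripper units, and the example numbers are placeholders, not the toolkit's definitions.

```python
import numpy as np


def pack_action(position, quaternion, gripper_width):
    """Pack an end-effector pose into the 8-d action vector:
    position (xyz) + rotation (quat) + gripper (width)."""
    action = np.concatenate([
        np.asarray(position, dtype=np.float32),       # 3: xyz position of the end-effector
        np.asarray(quaternion, dtype=np.float32),     # 4: rotation as a quaternion
        np.array([gripper_width], dtype=np.float32),  # 1: gripper opening width
    ])
    assert action.shape == (8,)
    return action


# e.g. a hypothetical target pose with the gripper open
action = pack_action([0.3, 0.0, 0.2], [1.0, 0.0, 0.0, 0.0], 0.08)
```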
- Metric: we will have two leaderboards, one for execution scores and the other for planning accuracy
  - Execution scores (see here)
    - We have set multiple conditions for each test case that track how far it is executed successfully.
    - E.g., the conditions for the test case `Rh20t-PickObject-v0` are:
      - Move near the target object
      - Grasp the target object
      - Lift the grasped object
    - You achieve the corresponding score for each condition completed (a scoring sketch follows this list).
  - Planning accuracy
    - If you do not choose to output plans, you will not be included in the rankings of the planning accuracy leaderboard
    - No restriction on the definition of primitive skills
    - A manual check will be conducted (at least 3 reviewers)
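As an illustration of condition-based scoring, assuming (for this sketch only) that every condition is weighted equally; the actual weights and condition checks are defined by the benchmark toolkit:

```python
def execution_score(completed, conditions=("move_near", "grasp", "lift")):
    """Fraction of a test case's conditions that were completed (equal weights assumed)."""
    return sum(c in completed for c in conditions) / len(conditions)


# Grasping the object but failing to lift it would complete 2 of the 3
# Rh20t-PickObject-v0 conditions listed above.
print(execution_score({"move_near", "grasp"}))  # ~0.67
```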
We provide basic components in `model/` to help you integrate your CGAs into the benchmark toolkit. Specifically, you should first wrap your CGA with `ModelWrapper` and implement the corresponding methods defined in `model_wrapper.py` to communicate with the simulator. See the protocol above for the input/output definition.

- `load_model()`: initialize your model, load the checkpoint, and prepare for inference.
- `initialize()`: initialize inference settings, preparing for the next new manipulation task.
- `pred_action()`: the inference process.

`sample_model.py` shows a CGA instance.
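As a starting point, here is a rough sketch of such a wrapper. The import path, method signatures, observation keys, and checkpoint path are assumptions for illustration; follow `model/model_wrapper.py` and `sample_model.py` for the authoritative definitions.

```python
import numpy as np

from model_wrapper import ModelWrapper  # assumed import path; see model/model_wrapper.py


class MyCGAWrapper(ModelWrapper):
    def load_model(self):
        # Initialize your model and load its checkpoint once, before any task runs.
        self.policy = None  # e.g. torch.load("ckpt/my_cga.pt"); path is a placeholder

    def initialize(self):
        # Reset per-task inference state (e.g. stored RGB history) before the next
        # new manipulation task.
        self.rgb_history = []

    def pred_action(self, obs):
        # Map the protocol inputs (instruction, rendered RGB image, camera
        # extrinsics/intrinsics, end-effector position) to an 8-d action:
        # position (xyz) + rotation (quat) + gripper (width).
        self.rgb_history.append(obs.get("rgb"))  # observation key is an assumption
        return np.zeros(8, dtype=np.float32)     # placeholder action
```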
- Run the simulation:

  ```bash
  python main.py --env Rh20t-PickObject-v0 --eval --seed 0
  ```

- Start your CGA. To deploy your CGA, define the [Dockerfile](model/Dockerfile), then build the Docker image and start it:

  ```bash
  cd model
  docker build --no-cache -t {docker_name} .
  cd ..
  docker run --gpus all -v comm:/comm {docker_name}
  ```

  You can see data transfer in `comm/` if everything works.

Save the Docker image as a tar file:

```bash
docker save -o {docker_name}.tar {docker_name}:latest
```

And then push it to DockerHub.
After the test server opens on Aug 30, 2024, you can submit the name and tag of your uploaded model image on DockerHub.
We will pull the image on the test server and conduct the benchmark.
* The page for submission will be released later.
You can also collect data from the simulators to fine-tune your CGAs.
You can refer to here or the MotionPlanning API provided by ManiSkill; a rough collection loop is sketched below.
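The sketch below uses a built-in ManiSkill environment (PickCube-v1) and random actions purely as stand-ins; swap in the challenge environment and the MotionPlanning API (or your own scripted policy) to collect useful demonstrations, and adjust `obs_mode`/`control_mode` to your setup.

```python
import gymnasium as gym
import mani_skill.envs  # noqa: F401  (registers the ManiSkill environments)

# Built-in task and random actions used only as placeholders for this sketch.
env = gym.make("PickCube-v1", obs_mode="rgbd", control_mode="pd_ee_delta_pose")
trajectory = []

obs, _ = env.reset(seed=0)
for _ in range(100):
    action = env.action_space.sample()  # replace with a motion planner / scripted policy
    next_obs, reward, terminated, truncated, info = env.step(action)
    trajectory.append({"obs": obs, "action": action, "reward": reward})
    obs = next_obs
    if terminated or truncated:
        obs, _ = env.reset()
env.close()
```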
- For participation in the challenge, it is a strict requirement to register your team by filling out the Google Form.
- Any kind of Large Language Model (LLM) or Multimodal Large Language Model (MLLM) can be used in this challenge. Both open-source models, e.g., LLaMA or LLaVA, and closed-source models that can be accessed via network requests, e.g., GPT-4, are allowed.
- Any kind of existing real-world or simulated robotic manipulation dataset can be used in this challenge.
- There is no restriction on the definition of primitive skills. You can design them in any format, e.g., natural language or code. We will check the output plans manually.
- To check for compliance, we will ask participants to provide technical reports to the challenge committee, and award winners will be asked to give a public talk about their work.
| Sources | URL |
| --- | --- |
| RH20T dataset | download |
| RH20T-P annotation | download |
We provide the API for RH20T-P in `data/rh20tp/dataset.py`. See here for details.
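A purely hypothetical usage sketch is shown below; the real class and argument names live in `data/rh20tp/dataset.py` and the linked documentation, so treat everything here as a placeholder.

```python
# All names below are placeholders, not the actual interface --
# check data/rh20tp/dataset.py and the linked docs for the real definitions.
from data.rh20tp.dataset import RH20TPDataset  # hypothetical class name

dataset = RH20TPDataset(root="path/to/rh20t-p")  # hypothetical arguments
sample = dataset[0]  # e.g. one annotated manipulation clip with its primitive-skill plan
```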