Composable Generalization Agents Challenge

This challenge is held at ICML 2024 WORKSHOP: Multi-modal Foundation Model meets Embodied AI.

This repository contains the benchmark toolkit and the RH20T-P data API.

1. Benchmark Toolkit

1.1 Installation

Install Vulkan (e.g., on Ubuntu):

```bash
sudo apt-get install libvulkan1
```

To test your installation of Vulkan:

```bash
sudo apt-get install vulkan-utils
vulkaninfo
```

If vulkaninfo fails to display information about Vulkan, check this page for troubleshooting.

Install ManiSkill 3:

```bash
conda create -n maniskill python=3.10
conda activate maniskill

pip install --upgrade mani_skill
pip install torch torchvision torchaudio  # make sure torch >= 2.3
```

To test your installation of ManiSkill:

```bash
python -m mani_skill.examples.demo_random_action
```

Clone the repo and install its dependencies:

```bash
git clone https://github.com/Zx55/cga-challenge.git
cd cga-challenge
pip install -r requirements.txt
```

1.2 Benchmark

We provide a simple test case (Rh20t-PickObject-v0) in this repository to help participants debug their CGAs.

You can run the following command to show this test case:

```bash
python main.py --env Rh20t-PickObject-v0 --gt --seed 0
```

The --gt argument samples a predefined ground-truth trajectory to complete the task (when possible).

Evaluation Protocol
  • Input

    • Open-world instruction (in language)

    • RGB image rendered by the simulator (you can also store these images as historical inputs)

    • Camera information (extrinsic, intrinsic)

    • Position of end-effector

  • Output

    • Actions for each step

      • Actions are expressed in the end-effector coordinate system

      • 8-d vector: position (xyz, 3) + rotation (quaternion, 4) + gripper width (1); see the sketch after this list

    • (optional) Execution plans in order

  • Metric: We will maintain two leaderboards, one for execution scores and the other for planning accuracy

    • Execution scores (see here)

      • We set multiple conditions for each test case that mark its successful execution.

      • E.g., the conditions for the test case Rh20t-PickObject-v0 are:

        • Move near the target object

        • Grasp the target object

        • Lift the grasped object

      • You earn the corresponding score for each condition completed.

    • Planning Accuracy

      • If you choose not to output plans, you will not be ranked on the planning accuracy leaderboard

      • No restriction on the definition of primitive skills

      • Manual checks will be conducted (by at least 3 reviewers)
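
To make the action format concrete, here is a minimal sketch of assembling the 8-d action vector referenced above. The quaternion element ordering and the unit-normalization step are assumptions of this sketch, not guaranteed by the toolkit; check model_wrapper.py for the exact layout.

```python
import numpy as np

def make_action(position, quaternion, gripper_width):
    """Assemble the 8-d action: position (xyz) + rotation (quat) + gripper width.

    The (w, x, y, z) quaternion convention used here is an assumption;
    confirm the exact element order against the benchmark toolkit.
    """
    position = np.asarray(position, dtype=np.float32)      # (3,) end-effector xyz
    quaternion = np.asarray(quaternion, dtype=np.float32)  # (4,) rotation
    quaternion = quaternion / np.linalg.norm(quaternion)   # keep it a unit quaternion
    gripper = np.asarray([gripper_width], dtype=np.float32)
    return np.concatenate([position, quaternion, gripper])  # shape (8,)

# e.g., move to (0.1, 0.0, 0.3) with identity rotation and a 4 cm gripper opening
action = make_action([0.1, 0.0, 0.3], [1.0, 0.0, 0.0, 0.0], 0.04)
assert action.shape == (8,)
```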

Integrate Your CGAs by Wrapper

We provide basic components in model/ to help you integrate your CGAs into the benchmark toolkit. Specifically, you should first wrap your CGA with ModelWrapper and implement the corresponding methods defined in model_wrapper.py to communicate with the simulator. See the protocol above for the input/output definitions.

  1. load_model(): initialize your model, load the checkpoint, and prepare for inference.

  2. initialize(): initialize inference settings in preparation for the next manipulation task.

  3. pred_action(): run one step of the inference process.

sample_model.py shows an example CGA implementation.
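
As a rough illustration, a wrapper might look like the sketch below. The ModelWrapper import path and the pred_action observation format are assumptions based on the description above, and load_policy is a hypothetical stand-in; see sample_model.py for the authoritative version.

```python
import numpy as np
from model_wrapper import ModelWrapper  # base class provided in model/

class MyCGA(ModelWrapper):
    def load_model(self):
        # Initialize the model and load the checkpoint once, before inference.
        # `load_policy` is a hypothetical placeholder for your own loading code.
        self.policy = load_policy("ckpt/my_cga.pth")

    def initialize(self):
        # Reset per-task state (e.g., stored image history) so the agent is
        # ready for the next manipulation task.
        self.history = []

    def pred_action(self, observation):
        # One inference step: map the observation (instruction, RGB image,
        # camera extrinsics/intrinsics, end-effector position) to an action.
        self.history.append(observation)
        action = self.policy(observation)
        return np.asarray(action, dtype=np.float32)  # expected shape: (8,)
```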

Test Your CGAs Locally
  1. Run the simulation:

    ```bash
    python main.py --env Rh20t-PickObject-v0 --eval --seed 0
    ```
  2. Start your CGA

    To deploy your CGA, define the [Dockerfile](model/Dockerfile). Then build the Docker image and start it:

    ```bash
    cd model
    docker build --no-cache -t {docker_name} .
    cd ..
    docker run --gpus all -v comm:/comm {docker_name}
    ```

    If everything works, you will see data being transferred in comm/.
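
For debugging, the hypothetical polling loop below illustrates the idea of file-based exchange through the shared comm/ volume. The file name "obs.npz" and the timeout behavior are placeholders of this sketch; the actual file names and serialization are defined by the toolkit's communication code.

```python
import time
from pathlib import Path

COMM_DIR = Path("/comm")  # shared volume mounted via `docker run -v comm:/comm`

def wait_for_file(name, timeout=30.0, poll_interval=0.1):
    """Poll until the other side writes `name` into the shared volume."""
    path = COMM_DIR / name
    deadline = time.monotonic() + timeout
    while not path.exists():
        if time.monotonic() > deadline:
            raise TimeoutError(f"no {name} in {COMM_DIR} after {timeout}s")
        time.sleep(poll_interval)
    return path

# "obs.npz" is a placeholder; check the toolkit for the actual protocol.
obs_path = wait_for_file("obs.npz")
print(f"received observation at {obs_path}")
```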

1.3 Submission

Upload to DockerHub

Save the Docker image as a tar archive:

```bash
docker save -o {docker_name}.tar {docker_name}:latest
```

Then push it to Docker Hub.

Submission

After the test server opens on Aug 30, 2024, you can submit the name and tag of your uploaded model image on Docker Hub.

We will pull the image on the test server and run the benchmark.

* The page for submission will be released later.

1.4 Advanced Usage

You can also collect data from simulators to fine-tune your CGAs.

You can refer to here or to the motion planning API provided by ManiSkill.
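
As a starting point, here is a minimal sketch for rolling out episodes and recording (observation, action) pairs with ManiSkill. It uses the public PickCube-v1 environment and random actions purely for illustration; for useful fine-tuning data you would substitute a motion planner or scripted policy.

```python
import gymnasium as gym
import mani_skill.envs  # noqa: F401 -- registers the ManiSkill environments

# Roll out a few episodes and collect (observation, action) pairs.
env = gym.make("PickCube-v1", obs_mode="rgbd", control_mode="pd_ee_delta_pose")
trajectories = []
for episode in range(3):
    obs, info = env.reset(seed=episode)
    steps = []
    for _ in range(50):
        action = env.action_space.sample()  # replace with a motion planner
        obs, reward, terminated, truncated, info = env.step(action)
        steps.append((obs, action))
        if terminated or truncated:
            break
    trajectories.append(steps)
env.close()
```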

1.5 Rules

  • To participate in the challenge, you must register your team by filling out the Google Form.

  • Any kind of Large Language Model (LLM) or Multimodal Large Language Model (MLLM) can be used in this challenge. Both open-source models, e.g., LLaMA or LLaVA, and closed-source models accessible via network requests, e.g., GPT-4, are allowed.

  • Any kind of existing real-world/simulated robotic manipulation dataset can be used in this challenge.

  • There are no restrictions on the definition of primitive skills. You can use any format, e.g., natural language or code. We will check the output plans manually.

  • To check for compliance, participants will be asked to provide technical reports to the challenge committee, and award winners will be asked to give a public talk about their work.

2. RH20T-P Resources

2.1 Download the Dataset

| Source | URL |
| ------ | --- |
| RH20T dataset | download |
| RH20T-P annotation | download |

2.2 RH20T-P API

We provide the API for RH20T-P in data/rh20tp/dataset.py. See here for details.
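
As a hedged illustration only: the class name RH20TPDataset and its constructor arguments below are guesses, since the real interface lives in data/rh20tp/dataset.py; usage might look roughly like this.

```python
# Hypothetical usage sketch -- the actual class name and constructor
# arguments are defined in data/rh20tp/dataset.py and may differ.
from data.rh20tp.dataset import RH20TPDataset  # assumed class name

dataset = RH20TPDataset(root="path/to/rh20t", anno="path/to/rh20tp_annotations")
sample = dataset[0]
# A sample would typically pair RGB observations with primitive-skill
# annotations and end-effector poses.
```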
