We will update the instructions for this codebase as soon as possible.
See INSTALLATION.md
- Download the EgoCOT dataset.
- Download the COCO-2017 dataset.
- Download the testing model Embodied_family_7btiny.
- Unzip `datasets_share.zip`, which contains the text part of the multi-modal dataset, into the `./datasets/` directory.
This repo can be used in conjunction with PyTorch's `Dataset` and `DataLoader` for training models on heterogeneous data. Here's a brief overview of the classes and their functionalities:
The `BaseDataset` class extends PyTorch's `Dataset` and is designed to handle different media types (images, videos, and text). It includes a transformation process to standardize the input data and a processor to handle task-specific data.
```python
from robohusky.base_dataset_uni import BaseDataset

# Initialize the dataset with the required parameters
dataset = BaseDataset(
    dataset,     # Your dataset here
    processor,   # Your processor here
    image_path="path/to/images",
    input_size=224,
    num_segments=8,
    norm_type="openai",
    media_type="image",
)

# Use the dataset with a PyTorch DataLoader
from torch.utils.data import DataLoader

data_loader = DataLoader(dataset, batch_size=32, shuffle=True)
```
The `WeightedConcatDataset` class extends PyTorch's `ConcatDataset` and allows you to build a unified dataset by concatenating multiple datasets with specified weights.
```python
from robohusky.base_dataset_uni import WeightedConcatDataset
from torch.utils.data import DataLoader

# Assume we have multiple datasets for different tasks
dataset1 = BaseDataset(...)
dataset2 = BaseDataset(...)
dataset3 = BaseDataset(...)

# Define the weights for each dataset
weights = [0.5, 0.3, 0.2]

# Create a weighted concatenated dataset
weighted_dataset = WeightedConcatDataset([dataset1, dataset2, dataset3], weights=weights)

# Use the weighted dataset with a PyTorch DataLoader
data_loader = DataLoader(weighted_dataset, batch_size=32, shuffle=True)
```
The package is designed to be flexible and customizable. You can implement your own transformation and processing logic by subclassing `BaseDataset` and overriding the necessary methods.
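For example, here is a minimal sketch of such a subclass. It relies only on the standard `Dataset` protocol (`__getitem__`); the class name `MyCaptionDataset` and the `custom_transform` hook are illustrative assumptions, not part of the library's API.

```python
from robohusky.base_dataset_uni import BaseDataset

class MyCaptionDataset(BaseDataset):
    """Illustrative subclass that post-processes each item from BaseDataset."""

    def __init__(self, dataset, processor, **kwargs):
        super().__init__(dataset, processor, **kwargs)

    def __getitem__(self, index):
        # Fetch the item prepared by the base class, then apply
        # task-specific post-processing before returning it.
        item = super().__getitem__(index)
        return self.custom_transform(item)

    def custom_transform(self, item):
        # Hypothetical hook: adjust prompts, labels, or tensors as needed.
        return item
```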
This project is released under the Apache 2.0 license.
If you find this project useful in your research, please consider citing:
```bibtex
@article{mu2024embodiedgpt,
  title={Embodiedgpt: Vision-language pre-training via embodied chain of thought},
  author={Mu, Yao and Zhang, Qinglong and Hu, Mengkang and Wang, Wenhai and Ding, Mingyu and Jin, Jun and Wang, Bin and Dai, Jifeng and Qiao, Yu and Luo, Ping},
  journal={Advances in Neural Information Processing Systems},
  volume={36},
  year={2024}
}
```