This version was migrated from an internal implementation at Alibaba Group. Feel free to open an issue if you run into any problems!
conda create -n arldm python=3.8
conda activate arldm
conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch-lts
git clone https://github.com/Flash-321/ARLDM.git
cd ARLDM
pip install -r requirements.txt
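After installation, a quick sanity check of the environment can save debugging time later. The snippet below is a minimal sketch, not part of the repository: the helper name `missing_packages` is made up for illustration, and the check only assumes the three packages named in the conda command above.

```python
import importlib.util
import sys


def missing_packages(names):
    """Return the subset of `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # The conda environment above is created with Python 3.8.
    print("Python version:", sys.version.split()[0])

    # Packages installed by the conda command above.
    missing = missing_packages(["torch", "torchvision", "torchaudio"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        import torch

        print("PyTorch version:", torch.__version__)
        print("CUDA available:", torch.cuda.is_available())
```

If `CUDA available` prints `False` on a GPU machine, the installed `cudatoolkit` build and the GPU driver are the first things to check.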
- Download the PororoSV dataset here.
- Download the FlintstonesSV dataset here.
- Download the VIST-SIS URL links here.
- Download the VIST-DII URL links here.
- Download the VIST images by running:
python data_script/vist_img_download.py \
  --json_dir /path/to/dii_json_files \
  --img_dir /path/to/save_images \
  --num_process 32
- To accelerate I/O, use the following scripts to convert your downloaded data to HDF5:
python data_script/pororo_hdf5.py \
  --data_dir /path/to/pororo_data \
  --save_path /path/to/save_hdf5_file
python data_script/flintstones_hdf5.py \
  --data_dir /path/to/flintstones_data \
  --save_path /path/to/save_hdf5_file
python data_script/vist_hdf5.py \
  --sis_json_dir /path/to/sis_json_files \
  --dii_json_dir /path/to/dii_json_files \
  --img_dir /path/to/vist_images \
  --save_path /path/to/save_hdf5_file
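Once a conversion finishes, it can be worth checking what actually landed in the HDF5 file before training. The following is a rough sketch, assuming `h5py` is installed in the environment. The function names `describe` and `summarize_hdf5` are illustrative and not part of this repository, and no particular layout of the file is assumed: the script simply lists every dataset it finds.

```python
import sys


def describe(name, shape, dtype):
    """Format one dataset entry as a single readable line."""
    return f"{name}: shape={tuple(shape)}, dtype={dtype}"


def summarize_hdf5(path):
    """Return one description line per dataset stored in the HDF5 file."""
    import h5py  # imported lazily so describe() works without h5py

    lines = []

    def visit(name, obj):
        # visititems walks both groups and datasets; only datasets hold data.
        if isinstance(obj, h5py.Dataset):
            lines.append(describe(name, obj.shape, obj.dtype))

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return lines


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python inspect_hdf5.py /path/to/save_hdf5_file
    for line in summarize_hdf5(sys.argv[1]):
        print(line)
```

An empty listing, or datasets with a leading dimension of 0, would suggest that `--data_dir` or `--img_dir` pointed at the wrong location.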
Specify your directory and device configuration in config.yaml, then run:
python main.py
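A common slip is launching `main.py` with a placeholder path still in the configuration. The sketch below scans a parsed config for string values that still contain `/path/to/`, assuming config.yaml uses the same placeholder convention as the commands in this README. The function name `find_placeholders` and the example keys are hypothetical; the real config.yaml may use different keys.

```python
def find_placeholders(config, marker="/path/to/", prefix=""):
    """Return dotted keys whose string values still contain `marker`."""
    hits = []
    for key, value in config.items():
        dotted = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested sections of the config.
            hits.extend(find_placeholders(value, marker, dotted + "."))
        elif isinstance(value, str) and marker in value:
            hits.append(dotted)
    return hits


if __name__ == "__main__":
    # Hypothetical keys for illustration only; the real config.yaml may differ.
    example = {
        "ckpt_dir": "/path/to/save_ckpt",
        "dataset": {"hdf5_file": "/data/pororo.h5"},
    }
    print(find_placeholders(example))
```

With PyYAML installed, the dictionary could be obtained from the real file via `yaml.safe_load` and passed to the same function.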
Many thanks to @adymaharana for kindly sharing the FlintstonesSV and PororoSV datasets (and the code), as well as the PororoSV pretrained checkpoint and the Flintstones sampled results of StoryDALL·E.
If you find this code useful for your research, please cite our paper:
@article{pan2022synthesizing,
title={Synthesizing Coherent Story with Auto-Regressive Latent Diffusion Models},
author={Pan, Xichen and Qin, Pengda and Li, Yuhong and Xue, Hui and Chen, Wenhu},
journal={arXiv preprint arXiv:2211.10950},
year={2022}
}