
Human-Object Interaction Prediction

Introduction

The official implementation of our paper "Human-Object Interaction Prediction in Videos through Gaze Following", accepted by Computer Vision and Image Understanding (CVIU), DOI: https://doi.org/10.1016/j.cviu.2023.103741. An arXiv preprint is available at https://arxiv.org/abs/2306.03597.

Our results

On VidHOI (Oracle mode):

Future Time      mAP (QPIC)   Recall   Precision   Accuracy   F1-score
0s (Detection)   38.61        70.91    59.84       51.29      62.24
1s               37.59        72.17    59.98       51.65      62.78
3s               33.14        71.88    60.44       52.08      62.87
5s               32.75        71.25    59.09       51.14      61.92
7s               31.70        70.48    58.80       50.56      61.36

Recall, Precision, Accuracy, and F1-score are person-wise top-5 metrics.

On Action Genome (PredCls mode):

Rec@10   Rec@20   Rec@50
75.4     83.7     84.3
  • TODO: show qualitative results

Install

  1. Clone the repository recursively:
    git clone --recurse-submodules https://github.com/nizhf/hoi-prediction-gaze-transformer.git
  2. Create the conda environment. We use mamba to speed up the installation. In addition, as reported in issue #7, opencv from conda-forge seems to be incompatible with torchvision 0.11.0, so we install opencv via pip. A quick sanity check of the installation is sketched after this list.
conda install mamba -c conda-forge  # install mamba in base environment
mamba create -n hoi_torch110 python=3.9 -c conda-forge 
conda activate hoi_torch110  
mamba install pytorch==1.10.0 torchvision==0.11.0 torchaudio==0.10.0 torchtext==0.11.0 cudatoolkit=11.3 black Cython easydict gdown imageio ipywidgets matplotlib notebook numpy pandas Pillow PyYAML requests scikit-learn scipy seaborn tqdm tensorboard wandb -c pytorch -c conda-forge
pip install opencv-python
  3. Our training uses wandb to record training and validation metrics. You may create an account at https://wandb.ai and follow their instructions to log in on your machine.
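After installation, you can optionally verify the environment with a short, self-contained check. This is only a sketch (not part of the repository); it imports the main dependencies and reports the CUDA status:

# sanity_check.py -- minimal environment check (sketch, not part of this repository)
import torch
import torchvision
import cv2

print("PyTorch:", torch.__version__)            # expected 1.10.0
print("torchvision:", torchvision.__version__)  # expected 0.11.0
print("OpenCV:", cv2.__version__)
print("CUDA available:", torch.cuda.is_available())
# Optional, for training runs: log in to wandb (prompts for your API key)
# import wandb
# wandb.login()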

Inference on an arbitrary video

  1. Some weights can be downloaded automatically; others need to be downloaded manually from here: all weights in weights/sttrangaze and weights/yolov5/vidor_yolov5l.pt. Put them into the corresponding weights/... folders in this repository (a short check for these files is sketched below, after the run command). If the automatic download does not work, you can also download those weights from the same link.
  2. Run the run.py script
# Detection
python run.py --source path/to/video --out path/to/output_folder --future 0 --hoi-thres 0.3 --print
# For anticipation, set future to 1, 3, 5, or 7

This script will create a video with object tracking and gaze following visualized. The detected or predicted HOIs are saved in the output folder and printed to the console at 1 FPS.
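As a minimal sketch (not part of the repository), the following snippet checks that the manually downloaded weights from step 1 are in place before starting inference; the paths follow the folder layout described above and may need adjusting to your local setup:

# check_weights.py -- verify the manually downloaded weights exist (sketch only)
from pathlib import Path

expected = [
    Path("weights/yolov5/vidor_yolov5l.pt"),  # YOLOv5 detector weights
    Path("weights/sttrangaze"),               # folder with the sttrangaze model weights
]
for path in expected:
    print(("found   " if path.exists() else "MISSING ") + str(path))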

Train and evaluate the model

Please follow these instructions to prepare the datasets and train our model.

Citation

If our work is helpful for your research, please consider citing our publication:

@article{NI2023103741,
  title = {Human–Object Interaction Prediction in Videos through Gaze Following},
  journal = {Computer Vision and Image Understanding},
  volume = {233},
  pages = {103741},
  year = {2023},
  issn = {1077-3142},
  doi = {https://doi.org/10.1016/j.cviu.2023.103741},
  url = {https://www.sciencedirect.com/science/article/pii/S1077314223001212},
  author = {Zhifan Ni and Esteve {Valls Mascar\'o} and Hyemin Ahn and Dongheui Lee},
}
