diff --git a/404.html b/404.html index 229d521..7f5ff72 100644 --- a/404.html +++ b/404.html @@ -1,4 +1,4 @@ -404: This page could not be found

404

This page could not be found.

\ No newline at end of file + }

404

This page could not be found.

\ No newline at end of file diff --git a/AT-field.html b/AT-field.html index efd2f33..2b11b4e 100644 --- a/AT-field.html +++ b/AT-field.html @@ -1 +1 @@ -Shadow-Teleop

Shadow Hand Teleop System

MVIG-RHOS, SJTU

About

  Recently, with the emergence of GPT, robot planning and reasoning systems built on large language models have developed rapidly, but current SOTA systems (Robot Transformer-2, RoboAgent) still suffer from a number of shortcomings:
  (a) Insufficient robustness to external disturbances: current mainstream models perform poorly under external disturbances, struggle to replan strategies in real time in dynamic environments, and often require manual intervention or cannot automatically adapt to new situations; they are therefore inefficient when the environment changes.
  (b) Data efficiency: many SOTA systems require large amounts of training data, i.e., they are data-inefficient. Improving data efficiency is an important challenge.
  (c) Insufficient scalability and transferability: current SOTA systems usually perform well only on specific tasks or domains and struggle to handle a wider range of tasks; even across very similar environments, policy transfer is difficult.
  Specifically, for dexterous-hand reasoning and planning: current mainstream models execute strategies well in static environments but still lack robust and effective re-planning capabilities under external disturbances. Our research goal is to address this problem by proposing a new planning approach: a dual-layer planner whose strategies and goals can be replaced at both coarse and fine granularity to cope with changing environments. At the same time, we introduce a planner and an executor that can generate or replace strategies when planning errors are detected on either the vision side or the executor side, ensuring that the target task is accomplished.

Demo

Resources

Our code is available on Github

© Copyright 2022 MVIG-RHOS • Based on tbakerx
\ No newline at end of file +Shadow-Teleop

Shadow Hand Teleop System

MVIG-RHOS, SJTU

About

  Recently, with the emergence of GPT, robot planning and reasoning systems built on large language models have developed rapidly, but current SOTA systems (Robot Transformer-2, RoboAgent) still suffer from a number of shortcomings:
  (a) Insufficient robustness to external disturbances: current mainstream models perform poorly under external disturbances, struggle to replan strategies in real time in dynamic environments, and often require manual intervention or cannot automatically adapt to new situations; they are therefore inefficient when the environment changes.
  (b) Data efficiency: many SOTA systems require large amounts of training data, i.e., they are data-inefficient. Improving data efficiency is an important challenge.
  (c) Insufficient scalability and transferability: current SOTA systems usually perform well only on specific tasks or domains and struggle to handle a wider range of tasks; even across very similar environments, policy transfer is difficult.
  Specifically, for dexterous-hand reasoning and planning: current mainstream models execute strategies well in static environments but still lack robust and effective re-planning capabilities under external disturbances. Our research goal is to address this problem by proposing a new planning approach: a dual-layer planner whose strategies and goals can be replaced at both coarse and fine granularity to cope with changing environments. At the same time, we introduce a planner and an executor that can generate or replace strategies when planning errors are detected on either the vision side or the executor side, ensuring that the target task is accomplished.
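To make the dual-layer re-planning idea above concrete, here is a minimal, hypothetical Python sketch (not the actual system): a coarse planner proposes subgoals, a fine planner turns the current subgoal into actions, and either layer is re-invoked when the vision or execution side reports an error. All function names and the placeholder bodies are assumptions for illustration only.

# Hypothetical sketch of a dual-layer (coarse/fine) planner with re-planning.
# Names such as coarse_plan, fine_plan, detect_error are illustrative only.

def coarse_plan(task, observation):
    """Return an ordered list of subgoals for the task (placeholder)."""
    return [f"{task}-subgoal-{i}" for i in range(3)]

def fine_plan(subgoal, observation):
    """Return a short action sequence intended to achieve the subgoal (placeholder)."""
    return [f"{subgoal}-action-{i}" for i in range(2)]

def execute(action):
    """Execute one action and report success plus the new observation (placeholder)."""
    return True, {"last_action": action}

def detect_error(observation):
    """Vision-/executor-side check; True if the plan no longer matches reality (placeholder)."""
    return False

def run(task, observation):
    subgoals = coarse_plan(task, observation)           # coarse layer
    while subgoals:
        subgoal = subgoals[0]
        for action in fine_plan(subgoal, observation):  # fine layer
            ok, observation = execute(action)
            if not ok or detect_error(observation):
                # Re-plan: fall back to the coarse layer and replace the
                # remaining strategy when an error is detected.
                subgoals = coarse_plan(task, observation)
                break
        else:
            subgoals.pop(0)                             # subgoal achieved
    return observation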

Demo

Resources

Our code is available on Github

© Copyright 2022 MVIG-RHOS • Based on tbakerx
\ No newline at end of file diff --git a/_next/static/SSc8pvUTAtr1PpK8nQLJx/_buildManifest.js b/_next/static/beBhwobWP8uI3Ksomq3J_/_buildManifest.js similarity index 100% rename from _next/static/SSc8pvUTAtr1PpK8nQLJx/_buildManifest.js rename to _next/static/beBhwobWP8uI3Ksomq3J_/_buildManifest.js diff --git a/_next/static/SSc8pvUTAtr1PpK8nQLJx/_ssgManifest.js b/_next/static/beBhwobWP8uI3Ksomq3J_/_ssgManifest.js similarity index 100% rename from _next/static/SSc8pvUTAtr1PpK8nQLJx/_ssgManifest.js rename to _next/static/beBhwobWP8uI3Ksomq3J_/_ssgManifest.js diff --git a/ego_pca.html b/ego_pca.html index 6264b9a..2f246cf 100644 --- a/ego_pca.html +++ b/ego_pca.html @@ -1,7 +1,7 @@ -EgoPCA

EgoPCA: A New Framework for Egocentric Hand-Object Interaction Understanding

MVIG-RHOS, SJTU

demo
With the surge in attention to Egocentric Hand-Object Interaction (Ego-HOI), large-scale datasets such as Ego4D and EPIC-KITCHENS have been proposed. However, most current research is built on resources derived from third-person video action recognition. This inherent domain gap between first- and third-person action videos, which has not been adequately addressed before, makes current Ego-HOI research suboptimal. This paper rethinks and proposes a new framework as an infrastructure to advance Ego-HOI recognition, contributing a new baseline, comprehensive pretrain sets, and balanced test sets, complete with a training-finetuning strategy. With our new framework, we not only achieve state-of-the-art performance on Ego-HOI benchmarks but also build several new and effective mechanisms and settings to advance further research. We believe our data and findings will pave a new way for Ego-HOI understanding.

News and Olds

[2023.09] Our paper is available on arXiv.
[2023.07] EgoPCA will appear at ICCV 2023.

Download

Our data subset and code will come soon!

Publications

If you find our paper, data, or code useful, please cite:
@article{egopca,
+EgoPCA

EgoPCA: A New Framework for Egocentric Hand-Object Interaction Understanding

MVIG-RHOS, SJTU

demo
With the surge in attention to Egocentric Hand-Object Interaction (Ego-HOI), large-scale datasets such as Ego4D and EPIC-KITCHENS have been proposed. However, most current research is built on resources derived from third-person video action recognition. This inherent domain gap between first- and third-person action videos, which has not been adequately addressed before, makes current Ego-HOI research suboptimal. This paper rethinks and proposes a new framework as an infrastructure to advance Ego-HOI recognition, contributing a new baseline, comprehensive pretrain sets, and balanced test sets, complete with a training-finetuning strategy. With our new framework, we not only achieve state-of-the-art performance on Ego-HOI benchmarks but also build several new and effective mechanisms and settings to advance further research. We believe our data and findings will pave a new way for Ego-HOI understanding.

News and Olds

[2023.09] Our paper is available on arXiv.
[2023.07] EgoPCA will appear at ICCV 2023.

Download

Our data subset and code will come soon!

Publications

If you find our paper, data, or code useful, please cite:
@article{egopca,
   title={EgoPCA: A New Framework for Egocentric Hand-Object Interaction Understanding},
   author={Xu, Yue and Li, Yong-Lu and Huang, Zhemin and Liu, Michael Xu
           and Lu, Cewu and Tai, Yu-Wing and Tang, Chi-Keung},
   journal={arXiv preprint arXiv:2309.02423},
   year={2023}
-}
© Copyright 2022 MVIG-RHOS • Based on tbakerx
\ No newline at end of file +}
© Copyright 2022 MVIG-RHOS • Based on tbakerx
\ No newline at end of file diff --git a/index.html b/index.html index 3c28ca8..d89e372 100644 --- a/index.html +++ b/index.html @@ -1 +1 @@ -RHOS
RHOS-image

RHOS

Robot • Human • Object • Scene

RHOS

About

Hi, this is the website of the RHOS team at MVIG. We study Human Activity Understanding, Visual Reasoning, and Embodied AI. We are building a knowledge-driven system that enables intelligent agents to perceive human activities, reason about human behavior logic, learn skills from human activities, and interact with the environment.

Research Interests:

(S) Embodied AI: how to make agents learn skills from humans and interact with human & scene & object.
(S-1) Human Activity Understanding: how to learn and ground complex/ambiguous human activity concepts (body motion, human-object/human/scene interaction) and object concepts from multi-modal information (2D-3D-4D).
(S-2) Visual Reasoning: how to mine, capture, and embed the logics and causal relations from human activities.
(S-3) General Multi-Modal Foundation Models: especially for human-centric perception tasks.
(S-4) Activity Understanding from A Cognitive Perspective: work with multidisciplinary researchers to study how the brain perceives activities.
(E) Human-Robot Interaction (e.g. for Smart Hospital): work with the healthcare team (doctors and engineers) in SJTU to develop intelligent robots to help people.

Contact

Yong-Lu Li
Email: yonglu_li[at]sjtu[dot]edu[dot]cn
Office: SEIEE-3-301
Shanghai Jiao Tong University

Recruitment

We are actively looking for self-motivated students (Master/PhD, 2025 spring & fall) and interns / engineers / visitors (CV/ML/ROB/NLP/Math/Phys background, always welcome) to join us in the Machine Vision and Intelligence Group (MVIG). If you share the same or similar interests, feel free to drop me an email with your resume.

Click: Eng for more details.

News and Olds

[2024.7] Five works on visual reasoning, 4D human motions, embodied AI, and dataset distillation will appear at ECCV 2024!
[2024.2] Our work Pangea and Video Distillation will appear at CVPR 2024.
[2023.12] Our work on primitive-based HOI reconstruction (P3HAOI) will appear at AAAI 2024!
[2023.9] The advanced HAKE reasoning engine based on LLM (Symbol-LLM) will appear at NeurIPS'23!
[2023.7] Our works on ego-centric video understanding and object concept learning will appear at ICCV'23!
[2023.7] The upgraded version of DCR will appear at IJCV!
[2022.12] HAKE 2.0 will appear at TPAMI!
[2022.12] OCL (Object Concept Learning) is released on arXiv. Please visit the project page for details.
[2022.11] We release the human body part states and interactive object bounding box annotations upon AVA (2.1 & 2.2): [HAKE-AVA], and a CLIP-based human part state & verb recognizer: [CLIP-Activity2Vec].
[2022.11] AlphaPose will appear at TPAMI!
[2022.07] Two papers on long-tailed learning and HOI detection are accepted by ECCV'22; arXiv versions and code are coming soon.
[2022.03] Five papers on HOI detection/prediction, trajectory prediction, and 3D detection/keypoints are accepted by CVPR'22; papers and code are coming soon.
[2022.02] We release the human body part state labels based on AVA: HAKE-AVA and HAKE 2.0.
[2021.12] Our work on HOI generalization will appear at AAAI'22.
[2021.10] Learning Single/Multi-Attribute of Object with Symmetry and Group is accepted by TPAMI.
[2021.09] Our work Localization with Sampling-Argmax will appear at NeurIPS'21.
[2021.02] Upgraded HAKE-Activity2Vec is released! Images/Videos --> human box + ID + skeleton + part states + action + representation. [Demo] [Description]
[2021.01] TIN (Transferable Interactiveness Network) is accepted by TPAMI.
[2020.12] DecAug is accepted by AAAI'21.
[2020.09] Our work HOI Analysis will appear at NeurIPS 2020.
[2020.06] The larger HAKE-Large (>120K images with activity and part state labels) is released.
[2020.02] Three papers Image-based HAKE: PaSta-Net, 2D-3D Joint HOI Learning, Symmetry-based Attribute-Object Learning are accepted in CVPR'20! Papers and corresponding resources (code, data) will be released soon.
[2019.07] Our paper InstaBoost is accepted in ICCV'19.
[2019.06] The Part I of our HAKE: HAKE-HICO which contains the image-level part-state annotations is released.
[2019.04] Our project HAKE (Human Activity Knowledge Engine) begins trial operation.
[2019.02] Our paper on Interactiveness is accepted in CVPR'19.
[2018.07] Our paper on GAN & Annotation Generation is accepted in ECCV'18.
[2018.05] Presentation (Kaibot Team) in TIDY UP MY ROOM CHALLENGE | ICRA'18.
[2018.02] Our paper on Object Part States is accepted in CVPR'18.

Projects

HAKE

Human Activity Knowledge Engine2018Project Page

Human Activity Knowledge Engine (HAKE) is a knowledge-driven system that aims to enable intelligent agents to perceive human activities, reason about human behavior logic, learn skills from human activities, and interact with objects and environments.

OCL

Object Concept Learning2022Project Page

We propose a challenging Object Concept Learning (OCL) task to push the envelope of object understanding. It requires machines to reason out object affordances and simultaneously give the reason: what attributes make an object possess these affordances.

Pangea

Unified Action Semantic Space2023Project Page

We design an action semantic space given the verb taxonomy hierarchy, covering massive actions. Thus, we can gather multi-modal datasets into a unified database under a unified label system, i.e., bridging “isolated islands” into a “Pangea”. We then propose a bidirectional mapping model between the physical and semantic spaces to make full use of Pangea.

EgoPCA

EgoPCA: A New Framework for EgoHOI2023Project Page

We rethink and propose a new framework as an infrastructure to advance Ego-HOI recognition by Probing, Curation and Adaption (EgoPCA). We contribute comprehensive pre-train sets, balanced test sets and a new baseline, which are complete with a training-finetuning strategy and several new and effective mechanisms and settings to advance further research.

Human-Agent Joint Learning

Robotic Teleoperation, Robot Manipulation, Imitation Learning2023Project Page

A human-agent joint learning teleoperation system for faster data collection, less human effort, and efficient robot manipulation skill acquisition.

Video-Distillation

Video Distillation via Static-Dynamic Disentanglement2023Project Page

We provide the first systematic study of video distillation and introduce a taxonomy to categorize temporal compression, which motivates our unified framework of disentangling the dynamic and static information in the videos. It first distills the videos into still images as static memory and then compensates the dynamic and motion information with a learnable dynamic memory block.

Publications


*=equal contribution
#=corresponding author

Human-Agent Joint Learning for Efficient Robot Manipulation Skill Acquisition

Shengcheng Luo*, Quanquan Peng*, Jun Lv, Kaiwen Hong, Katherine Rose Driggs-Campbell, Cewu Lu, Yong-Lu Li#
arXiv 2024arXivPDFProjectCode

Take A Step Back: Rethinking the Two Stages in Visual Reasoning

Mingyu Zhang*, Jiting Cai*, Mingyu Liu, Yue Xu, Cewu Lu, Yong-Lu Li#

Revisit Human-Scene Interaction via Space Occupancy

Xinpeng Liu*, Haowen Hou*, Yanchao Yang, Yong-Lu Li#, Cewu Lu

Bridging the Gap between Human Motion and Action Semantics via Kinematic Phrases

Xinpeng Liu, Yong-Lu Li#, Ailing Zeng, Zizheng Zhou, Yang You, Cewu Lu#

Distill Gold from Massive Ores: Efficient Dataset Distillation via Critical Samples Selection

Yue Xu, Yong-Lu Li#, Kaitong Cui, Ziyu Wang, Cewu Lu, Yu-Wing Tai, Chi-Keung Tang
ECCV 2024arXivPDFCode

DISCO: Embodied Navigation and Interaction via Differentiable Scene Semantics and Dual-level Control

Xinyu Xu, Shengcheng Luo, Yanchao Yang, Yong-Lu Li, Cewu Lu
ECCV 2024arXivPDFCode

HumanVLA: Towards Vision-Language Directed Object Rearrangement by Physical Humanoid

Xinyu Xu, Yizheng Zhang, Yong-Lu Li, Lei Han, Cewu Lu
arXiv 2024arXivPDFCode

Low-Rank Similarity Mining for Multimodal Dataset Distillation

Yue Xu, Zhilin Lin, Yusong Qiu, Cewu Lu, Yong-Lu Li#

Dancing with Still Images: Video Distillation via Static-Dynamic Disentanglement

Ziyu Wang*, Yue Xu*, Cewu Lu, Yong-Lu Li#

From Isolated Islands to Pangea: Unifying Semantic Space for Human Action Understanding

Yong-Lu Li*, Xiaoqian Wu*, Xinpeng Liu, Yiming Dou, Yikun Ji, Junyi Zhang, Yixing Li, Xudong Lu, Jingru Tan, Cewu Lu
CVPR 2024, HighlightarXivPDFProjectCode

Primitive-based 3D Human-Object Interaction Modelling and Programming

Siqi Liu, Yong-Lu Li#, Zhou Fang, Xinpeng Liu, Yang You, Cewu Lu#

Symbol-LLM: Leverage Language Models for Symbolic System in Visual Human Activity Reasoning

Xiaoqian Wu, Yong-Lu Li#, Jianhua Sun, Cewu Lu#
NeurIPS 2023arXivPDFProjectCode

Beyond Object Recognition: A New Benchmark towards Object Concept Learning

Yong-Lu Li, Yue Xu, Xinyu Xu, Xiaohan Mao, Yuan Yao, Siqi Liu, Cewu Lu

EgoPCA: A New Framework for Egocentric Hand-Object Interaction Understanding

Yue Xu, Yong-Lu Li#, Zhemin Huang, Michael Xu Liu, Cewu Lu, Yu-Wing Tai, Chi-Keung Tang

Dynamic Context Removal: A General Training Strategy for Robust Models on Video Action Predictive Tasks

Xinyu Xu, Yong-Lu Li#, Cewu Lu#.
IJCV 2023arXivPDFCode

Discovering A Variety of Objects in Spatio-Temporal Human-Object Interactions

Yong-Lu Li*, Hongwei Fan*, Zuoyu Qiu, Yiming Dou, Liang Xu, Hao-Shu Fang, Peiyang Guo, Haisheng Su, Dongliang Wang, Wei Wu, Cewu Lu
A part of the HAKE Project

HAKE: Human Activity Knowledge Engine

Yong-Lu Li, Liang Xu, Xinpeng Liu, Xijie Huang, Yue Xu, Mingyang Chen, Ze Ma, Shiyi Wang, Hao-Shu Fang, Cewu Lu
Tech ReportHAKE1.0arXivPDFProjectCode
Main Repo: HAKE Star
Sub-repos: Torch StarTF StarHAKE-AVA Star
Halpe StarHOI List Star

HAKE: A Knowledge Engine Foundation for Human Activity Understanding

Yong-Lu Li, Xinpeng Liu, Xiaoqian Wu, Yizhuo Li, Zuoyu Qiu, Liang Xu, Yue Xu, Hao-Shu Fang, Cewu Lu
TPAMI 2023HAKE2.0arXivPDFProjectCodePress

AlphaPose: Whole-Body Regional Multi-Person Pose Estimation and Tracking in Real-Time

Hao-Shu Fang*, Jiefeng Li*, Hongyang Tang, Chao Xu, Haoyi Zhu, Yuliang Xiu, Yong-Lu Li, Cewu Lu
TPAMI 2022arXivPDFCodeStar

Constructing Balance from Imbalance for Long-tailed Image Recognition

Yue Xu*, Yong-Lu Li*, Jiefeng Li, Cewu Lu
ECCV 2022DLSAarXivPDFCodeStar

Mining Cross-Person Cues for Body-Part Interactiveness Learning in HOI Detection

Xiaoqian Wu*, Yong-Lu Li*, Xinpeng Liu, Junyi Zhang, Yuzhe Wu, Cewu Lu
ECCV 2022PartMaparXivPDFCodeStar

Interactiveness Field of Human-Object Interactions

Xinpeng Liu*, Yong-Lu Li*, Xiaoqian Wu, Yu-Wing Tai, Cewu Lu, Chi Keung Tang
CVPR 2022arXivPDFCodeStar

Human Trajectory Prediction with Momentary Observation

Jianhua Sun, Yuxuan Li, Liang Chai, Hao-Shu Fang, Yong-Lu Li, Cewu Lu
CVPR 2022PDF

Learn to Anticipate Future with Dynamic Context Removal

Xinyu Xu, Yong-Lu Li, Cewu Lu
CVPR 2022DCRarXivPDFCodeStar

Canonical Voting: Towards Robust Oriented Bounding Box Detection in 3D Scenes

Yang You, Zelin Ye, Yujing Lou, Chengkun Li, Yong-Lu Li, Lizhuang Ma, Weiming Wang, Cewu Lu
CVPR 2022arXivPDFCodeStar

UKPGAN: Unsupervised KeyPoint GANeration

Yang You, Wenhai Liu, Yong-Lu Li, Weiming Wang, Cewu Lu
CVPR 2022arXivPDFCodeStar

Highlighting Object Category Immunity for the Generalization of Human-Object Interaction Detection

Xinpeng Liu*, Yong-Lu Li*, Cewu Lu
AAAI 2022arXivPDFCodeStar

Learning Single/Multi-Attribute of Object with Symmetry and Group

Yong-Lu Li, Yue Xu, Xinyu Xu, Xiaohan Mao, Cewu Lu
TPAMI 2021SymNetarXivPDFCodeStar
An extension of our CVPR 2020 work (Symmetry and Group in Attribute-Object Compositions, SymNet).

Localization with Sampling-Argmax

Jiefeng Li, Tong Chen, Ruiqi Shi, Yujing Lou, Yong-Lu Li, Cewu Lu
NeurIPS 2021arXivPDFCodeStar

Transferable Interactiveness Knowledge for Human-Object Interaction Detection

Yong-Lu Li, Xinpeng Liu, Xiaoqian Wu, Xijie Huang, Liang Xu, Cewu Lu
TPAMI 2021TIN++arXivPDFCodeStar
An extension of our CVPR 2019 work (Transferable Interactiveness Network, TIN).

DecAug: Augmenting HOI Detection via Decomposition

Yichen Xie, Hao-Shu Fang, Dian Shao, Yong-Lu Li, Cewu Lu
AAAI 2021arXivPDF

HOI Analysis: Integrating and Decomposing Human-Object Interaction

Yong-Lu Li*, Xinpeng Liu*, Xiaoqian Wu, Yizhuo Li, Cewu Lu

PaStaNet: Toward Human Activity Knowledge Engine

Yong-Lu Li, Liang Xu, Xinpeng Liu, Xijie Huang, Yue Xu, Shiyi Wang, Hao-Shu Fang, Ze Ma, Mingyang Chen, Cewu Lu.

Oral Talk: Compositionality in Computer Vision in CVPR 2020

Detailed 2D-3D Joint Representation for Human-Object Interaction

Yong-Lu Li, Xinpeng Liu, Han Lu, Shiyi Wang, Junqi Liu, Jiefeng Li, Cewu Lu

Symmetry and Group in Attribute-Object Compositions

Yong-Lu Li, Yue Xu, Xiaohan Mao, Cewu Lu
CVPR 2020SymNetarXivPDFVideoSlidesCodeStar

InstaBoost: Boosting Instance Segmentation Via Probability Map Guided Copy-Pasting

Hao-Shu Fang*, Jianhua Sun*, Runzhong Wang*, Minghao Gou, Yong-Lu Li, Cewu Lu
ICCV 2019arXivPDFCodeStar

Transferable Interactiveness Knowledge for Human-Object Interaction Detection

Yong-Lu Li, Siyuan Zhou, Xijie Huang, Liang Xu, Ze Ma, Hao-Shu Fang, Yan-Feng Wang, Cewu Lu
CVPR 2019TINarXivPDFCodeStar

SRDA: Generating Instance Segmentation Annotation via Scanning, Reasoning and Domain Adaptation

Wenqiang Xu*, Yong-Lu Li*, Cewu Lu

Beyond Holistic Object Recognition: Enriching Image Understanding with Part States

Cewu Lu, Hao Su, Yong-Lu Li, Yongyi Lu, Li Yi, Chi-Keung Tang, Leonidas J. Guibas
CVPR 2018PDF

Optimization of Radial Distortion Self-Calibration for Structure from Motion from Uncalibrated UAV Images

Yong-Lu Li, Yinghao Cai, Dayong Wen, Yiping Yang
ICPR 2016PDF

People

Cewu Lu
Professor
Yong-Lu Li
Assistant Professor
Xinpeng Liu
PhD. Student
Yue Xu
PhD. Student
Xiaoqian Wu
PhD. Student
Siqi Liu
PhD. Student
Hong Li
PhD. Student
Zehao Wang
Ph.D. Student
Zixuan Chen
Ph.D. Student
Yusong Qiu
Master Student
Yushun Xiang
Master Student

Alumni:

Yuyang Zhang: EIAS & SJTU, Ph.D.
Zhilin Lin: SJTU, Ph.D.
Zili Lin: EIAS & SJTU, Ph.D.
Yifan Shi: EIAS & SJTU, Ph.D.
Quanquan Peng: UW, Intern
Haowen Hou: UCSD, Intern
Zizheng Zhou: UC Merced, Intern
Jiting Cai: UMass, Intern
Yixing Li: CUHK, Ph.D.
Mingyu Liu: ZJU, Ph.D.
Kaitong Cui: HKU, Intern
Yiming Dou: UMich, Ph.D.
Xiaohan Mao: Shanghai AI Lab & SJTU, Ph.D.
Zhemin Huang: Stanford University, MS
Shaopeng Guo: UCSD, Ph.D.
Xudong Lu: CUHK, Ph.D.
Hongwei Fan: Sensetime, Research Engineer
Yuan Yao: U of Rochester, Ph.D.
Zuoyu Qiu: SJTU, MS
Han Lu: SJTU, Ph.D.
Zhanke Zhou: HKBU, Ph.D.
Mingyang Chen: UCSD, MS
Liang Xu: EIAS & SJTU, Ph.D.
Ze Ma: Columbia University, MS
Xijie Huang: HKUST, Ph.D.
© Copyright 2022 MVIG-RHOS • Based on tbakerx
\ No newline at end of file +RHOS
RHOS-image

RHOS

Robot • Human • Object • Scene

RHOS

About

Hi, this is the website of the RHOS team at MVIG. We study Human Activity Understanding, Visual Reasoning, and Embodied AI. We are building a knowledge-driven system that enables intelligent agents to perceive human activities, reason about human behavior logic, learn skills from human activities, and interact with the environment.

Research Interests:

(S) Embodied AI: how to make agents learn skills from humans and interact with human & scene & object.
(S-1) Human Activity Understanding: how to learn and ground complex/ambiguous human activity concepts (body motion, human-object/human/scene interaction) and object concepts from multi-modal information (2D-3D-4D).
(S-2) Visual Reasoning: how to mine, capture, and embed the logics and causal relations from human activities.
(S-3) General Multi-Modal Foundation Models: especially for human-centric perception tasks.
(S-4) Activity Understanding from A Cognitive Perspective: work with multidisciplinary researchers to study how the brain perceives activities.
(E) Human-Robot Interaction (e.g. for Smart Hospital): work with the healthcare team (doctors and engineers) in SJTU to develop intelligent robots to help people.

Contact

Yong-Lu Li
Email: yonglu_li[at]sjtu[dot]edu[dot]cn
Office: SEIEE-3-301
Shanghai Jiao Tong University

Recruitment

We are actively looking for self-motivated students (Master/PhD, 2025 spring & fall) and interns / engineers / visitors (CV/ML/ROB/NLP/Math/Phys background, always welcome) to join us in the Machine Vision and Intelligence Group (MVIG). If you share the same or similar interests, feel free to drop me an email with your resume.

Click: Eng for more details.

News and Olds

[2024.7] Five works on visual reasoning, 4D human motions, embodied AI, and dataset distillation will appear at ECCV 2024!
[2024.2] Our work Pangea and Video Distillation will appear at CVPR 2024.
[2023.12] Our work on primitive-based HOI reconstruction (P3HAOI) will appear at AAAI 2024!
[2023.9] The advanced HAKE reasoning engine based on LLM (Symbol-LLM) will appear at NeurIPS'23!
[2023.7] Our works on ego-centric video understanding and object concept learning will appear at ICCV'23!
[2023.7] The upgraded version of DCR will appear at IJCV!
[2022.12] HAKE 2.0 will appear at TPAMI!
[2022.12] OCL (Object Concept Learning) is released on arXiv. Please visit the project page for details.
[2022.11] We release the human body part states and interactive object bounding box annotations upon AVA (2.1 & 2.2): [HAKE-AVA], and a CLIP-based human part state & verb recognizer: [CLIP-Activity2Vec].
[2022.11] AlphaPose will appear at TPAMI!
[2022.07] Two papers on long-tailed learning and HOI detection are accepted by ECCV'22; arXiv versions and code are coming soon.
[2022.03] Five papers on HOI detection/prediction, trajectory prediction, and 3D detection/keypoints are accepted by CVPR'22; papers and code are coming soon.
[2022.02] We release the human body part state labels based on AVA: HAKE-AVA and HAKE 2.0.
[2021.12] Our work on HOI generalization will appear at AAAI'22.
[2021.10] Learning Single/Multi-Attribute of Object with Symmetry and Group is accepted by TPAMI.
[2021.09] Our work Localization with Sampling-Argmax will appear at NeurIPS'21.
[2021.02] Upgraded HAKE-Activity2Vec is released! Images/Videos --> human box + ID + skeleton + part states + action + representation. [Demo] [Description]
[2021.01] TIN (Transferable Interactiveness Network) is accepted by TPAMI.
[2020.12] DecAug is accepted by AAAI'21.
[2020.09] Our work HOI Analysis will appear at NeurIPS 2020.
[2020.06] The larger HAKE-Large (>120K images with activity and part state labels) is released.
[2020.02] Three papers Image-based HAKE: PaSta-Net, 2D-3D Joint HOI Learning, Symmetry-based Attribute-Object Learning are accepted in CVPR'20! Papers and corresponding resources (code, data) will be released soon.
[2019.07] Our paper InstaBoost is accepted in ICCV'19.
[2019.06] The Part I of our HAKE: HAKE-HICO which contains the image-level part-state annotations is released.
[2019.04] Our project HAKE (Human Activity Knowledge Engine) begins trial operation.
[2019.02] Our paper on Interactiveness is accepted in CVPR'19.
[2018.07] Our paper on GAN & Annotation Generation is accepted in ECCV'18.
[2018.05] Presentation (Kaibot Team) in TIDY UP MY ROOM CHALLENGE | ICRA'18.
[2018.02] Our paper on Object Part States is accepted in CVPR'18.

Projects

HAKE

Human Activity Knowledge Engine2018Project Page

Human Activity Knowledge Engine (HAKE) is a knowledge-driven system that aims to enable intelligent agents to perceive human activities, reason about human behavior logic, learn skills from human activities, and interact with objects and environments.

OCL

Object Concept Learning2022Project Page

We propose a challenging Object Concept Learning (OCL) task to push the envelope of object understanding. It requires machines to reason out object affordances and simultaneously give the reason: what attributes make an object possess these affordances.

Pangea

Unified Action Semantic Space2023Project Page

We design an action semantic space given the verb taxonomy hierarchy, covering massive actions. Thus, we can gather multi-modal datasets into a unified database under a unified label system, i.e., bridging “isolated islands” into a “Pangea”. We then propose a bidirectional mapping model between the physical and semantic spaces to make full use of Pangea.

EgoPCA

EgoPCA: A New Framework for EgoHOI2023Project Page

We rethink and propose a new framework as an infrastructure to advance Ego-HOI recognition by Probing, Curation and Adaption (EgoPCA). We contribute comprehensive pre-train sets, balanced test sets and a new baseline, which are complete with a training-finetuning strategy and several new and effective mechanisms and settings to advance further research.

Human-Agent Joint Learning

Robotic Teleoperation, Robot Manipulation, Imitation Learning2023Project Page

A human-agent joint learning teleoperation system for faster data collection, less human effort, and efficient robot manipulation skill acquisition.

Video-Distillation

Video Distillation via Static-Dynamic Disentanglement2023Project Page

We provide the first systematic study of video distillation and introduce a taxonomy to categorize temporal compression, which motivates our unified framework of disentangling the dynamic and static information in the videos. It first distills the videos into still images as static memory and then compensates the dynamic and motion information with a learnable dynamic memory block.

Publications


*=equal contribution
#=corresponding author

Human-Agent Joint Learning for Efficient Robot Manipulation Skill Acquisition

Shengcheng Luo*, Quanquan Peng*, Jun Lv, Kaiwen Hong, Katherine Rose Driggs-Campbell, Cewu Lu, Yong-Lu Li#
arXiv 2024arXivPDFProjectCode

Take A Step Back: Rethinking the Two Stages in Visual Reasoning

Mingyu Zhang*, Jiting Cai*, Mingyu Liu, Yue Xu, Cewu Lu, Yong-Lu Li#

Revisit Human-Scene Interaction via Space Occupancy

Xinpeng Liu*, Haowen Hou*, Yanchao Yang, Yong-Lu Li#, Cewu Lu

Bridging the Gap between Human Motion and Action Semantics via Kinematic Phrases

Xinpeng Liu, Yong-Lu Li#, Ailing Zeng, Zizheng Zhou, Yang You, Cewu Lu#

Distill Gold from Massive Ores: Efficient Dataset Distillation via Critical Samples Selection

Yue Xu, Yong-Lu Li#, Kaitong Cui, Ziyu Wang, Cewu Lu, Yu-Wing Tai, Chi-Keung Tang
ECCV 2024arXivPDFCode

DISCO: Embodied Navigation and Interaction via Differentiable Scene Semantics and Dual-level Control

Xinyu Xu, Shengcheng Luo, Yanchao Yang, Yong-Lu Li, Cewu Lu
ECCV 2024arXivPDFCode

HumanVLA: Towards Vision-Language Directed Object Rearrangement by Physical Humanoid

Xinyu Xu, Yizheng Zhang, Yong-Lu Li, Lei Han, Cewu Lu
arXiv 2024arXivPDFCode

Low-Rank Similarity Mining for Multimodal Dataset Distillation

Yue Xu, Zhilin Lin, Yusong Qiu, Cewu Lu, Yong-Lu Li#

Dancing with Still Images: Video Distillation via Static-Dynamic Disentanglement

Ziyu Wang*, Yue Xu*, Cewu Lu, Yong-Lu Li#

From Isolated Islands to Pangea: Unifying Semantic Space for Human Action Understanding

Yong-Lu Li*, Xiaoqian Wu*, Xinpeng Liu, Yiming Dou, Yikun Ji, Junyi Zhang, Yixing Li, Xudong Lu, Jingru Tan, Cewu Lu
CVPR 2024, HighlightarXivPDFProjectCode

Primitive-based 3D Human-Object Interaction Modelling and Programming

Siqi Liu, Yong-Lu Li#, Zhou Fang, Xinpeng Liu, Yang You, Cewu Lu#

Symbol-LLM: Leverage Language Models for Symbolic System in Visual Human Activity Reasoning

Xiaoqian Wu, Yong-Lu Li#, Jianhua Sun, Cewu Lu#
NeurIPS 2023arXivPDFProjectCode

Beyond Object Recognition: A New Benchmark towards Object Concept Learning

Yong-Lu Li, Yue Xu, Xinyu Xu, Xiaohan Mao, Yuan Yao, Siqi Liu, Cewu Lu

EgoPCA: A New Framework for Egocentric Hand-Object Interaction Understanding

Yue Xu, Yong-Lu Li#, Zhemin Huang, Michael Xu Liu, Cewu Lu, Yu-Wing Tai, Chi-Keung Tang

Dynamic Context Removal: A General Training Strategy for Robust Models on Video Action Predictive Tasks

Xinyu Xu, Yong-Lu Li#, Cewu Lu#.
IJCV 2023arXivPDFCode

Discovering A Variety of Objects in Spatio-Temporal Human-Object Interactions

Yong-Lu Li*, Hongwei Fan*, Zuoyu Qiu, Yiming Dou, Liang Xu, Hao-Shu Fang, Peiyang Guo, Haisheng Su, Dongliang Wang, Wei Wu, Cewu Lu
A part of the HAKE Project

HAKE: Human Activity Knowledge Engine

Yong-Lu Li, Liang Xu, Xinpeng Liu, Xijie Huang, Yue Xu, Mingyang Chen, Ze Ma, Shiyi Wang, Hao-Shu Fang, Cewu Lu
Tech ReportHAKE1.0arXivPDFProjectCode
Main Repo: HAKE Star
Sub-repos: Torch StarTF StarHAKE-AVA Star
Halpe StarHOI List Star

HAKE: A Knowledge Engine Foundation for Human Activity Understanding

Yong-Lu Li, Xinpeng Liu, Xiaoqian Wu, Yizhuo Li, Zuoyu Qiu, Liang Xu, Yue Xu, Hao-Shu Fang, Cewu Lu
TPAMI 2023HAKE2.0arXivPDFProjectCodePress

AlphaPose: Whole-Body Regional Multi-Person Pose Estimation and Tracking in Real-Time

Hao-Shu Fang*, Jiefeng Li*, Hongyang Tang, Chao Xu, Haoyi Zhu, Yuliang Xiu, Yong-Lu Li, Cewu Lu
TPAMI 2022arXivPDFCodeStar

Constructing Balance from Imbalance for Long-tailed Image Recognition

Yue Xu*, Yong-Lu Li*, Jiefeng Li, Cewu Lu
ECCV 2022DLSAarXivPDFCodeStar

Mining Cross-Person Cues for Body-Part Interactiveness Learning in HOI Detection

Xiaoqian Wu*, Yong-Lu Li*, Xinpeng Liu, Junyi Zhang, Yuzhe Wu, Cewu Lu
ECCV 2022PartMaparXivPDFCodeStar

Interactiveness Field of Human-Object Interactions

Xinpeng Liu*, Yong-Lu Li*, Xiaoqian Wu, Yu-Wing Tai, Cewu Lu, Chi Keung Tang
CVPR 2022arXivPDFCodeStar

Human Trajectory Prediction with Momentary Observation

Jianhua Sun, Yuxuan Li, Liang Chai, Hao-Shu Fang, Yong-Lu Li, Cewu Lu
CVPR 2022PDF

Learn to Anticipate Future with Dynamic Context Removal

Xinyu Xu, Yong-Lu Li, Cewu Lu
CVPR 2022DCRarXivPDFCodeStar

Canonical Voting: Towards Robust Oriented Bounding Box Detection in 3D Scenes

Yang You, Zelin Ye, Yujing Lou, Chengkun Li, Yong-Lu Li, Lizhuang Ma, Weiming Wang, Cewu Lu
CVPR 2022arXivPDFCodeStar

UKPGAN: Unsupervised KeyPoint GANeration

Yang You, Wenhai Liu, Yong-Lu Li, Weiming Wang, Cewu Lu
CVPR 2022arXivPDFCodeStar

Highlighting Object Category Immunity for the Generalization of Human-Object Interaction Detection

Xinpeng Liu*, Yong-Lu Li*, Cewu Lu
AAAI 2022arXivPDFCodeStar

Learning Single/Multi-Attribute of Object with Symmetry and Group

Yong-Lu Li, Yue Xu, Xinyu Xu, Xiaohan Mao, Cewu Lu
TPAMI 2021SymNetarXivPDFCodeStar
An extension of our CVPR 2020 work (Symmetry and Group in Attribute-Object Compositions, SymNet).

Localization with Sampling-Argmax

Jiefeng Li, Tong Chen, Ruiqi Shi, Yujing Lou, Yong-Lu Li, Cewu Lu
NeurIPS 2021arXivPDFCodeStar

Transferable Interactiveness Knowledge for Human-Object Interaction Detection

Yong-Lu Li, Xinpeng Liu, Xiaoqian Wu, Xijie Huang, Liang Xu, Cewu Lu
TPAMI 2021TIN++arXivPDFCodeStar
An extension of our CVPR 2019 work (Transferable Interactiveness Network, TIN).

DecAug: Augmenting HOI Detection via Decomposition

Yichen Xie, Hao-Shu Fang, Dian Shao, Yong-Lu Li, Cewu Lu
AAAI 2021arXivPDF

HOI Analysis: Integrating and Decomposing Human-Object Interaction

Yong-Lu Li*, Xinpeng Liu*, Xiaoqian Wu, Yizhuo Li, Cewu Lu

PaStaNet: Toward Human Activity Knowledge Engine

Yong-Lu Li, Liang Xu, Xinpeng Liu, Xijie Huang, Yue Xu, Shiyi Wang, Hao-Shu Fang, Ze Ma, Mingyang Chen, Cewu Lu.

Oral Talk: Compositionality in Computer Vision in CVPR 2020

Detailed 2D-3D Joint Representation for Human-Object Interaction

Yong-Lu Li, Xinpeng Liu, Han Lu, Shiyi Wang, Junqi Liu, Jiefeng Li, Cewu Lu

Symmetry and Group in Attribute-Object Compositions

Yong-Lu Li, Yue Xu, Xiaohan Mao, Cewu Lu
CVPR 2020SymNetarXivPDFVideoSlidesCodeStar

InstaBoost: Boosting Instance Segmentation Via Probability Map Guided Copy-Pasting

Hao-Shu Fang*, Jianhua Sun*, Runzhong Wang*, Minghao Gou, Yong-Lu Li, Cewu Lu
ICCV 2019arXivPDFCodeStar

Transferable Interactiveness Knowledge for Human-Object Interaction Detection

Yong-Lu Li, Siyuan Zhou, Xijie Huang, Liang Xu, Ze Ma, Hao-Shu Fang, Yan-Feng Wang, Cewu Lu
CVPR 2019TINarXivPDFCodeStar

SRDA: Generating Instance Segmentation Annotation via Scanning, Reasoning and Domain Adaptation

Wenqiang Xu*, Yong-Lu Li*, Cewu Lu

Beyond Holistic Object Recognition: Enriching Image Understanding with Part States

Cewu Lu, Hao Su, Yong-Lu Li, Yongyi Lu, Li Yi, Chi-Keung Tang, Leonidas J. Guibas
CVPR 2018PDF

Optimization of Radial Distortion Self-Calibration for Structure from Motion from Uncalibrated UAV Images

Yong-Lu Li, Yinghao Cai, Dayong Wen, Yiping Yang
ICPR 2016PDF

People

Cewu Lu
Professor
Yong-Lu Li
Assistant Professor
Xinpeng Liu
PhD. Student
Yue Xu
PhD. Student
Xiaoqian Wu
PhD. Student
Siqi Liu
PhD. Student
Hong Li
PhD. Student
Zehao Wang
Ph.D. Student
Zixuan Chen
Ph.D. Student
Yusong Qiu
Master Student
Yushun Xiang
Master Student

Alumni:

Yuyang Zhang: EIAS & SJTU, Ph.D.
Zhilin Lin: SJTU, Ph.D.
Zili Lin: EIAS & SJTU, Ph.D.
Yifan Shi: EIAS & SJTU, Ph.D.
Quanquan Peng: UW, Intern
Haowen Hou: UCSD, Intern
Zizheng Zhou: UC Merced, Intern
Jiting Cai: UMass, Intern
Yixing Li: CUHK, Ph.D.
Mingyu Liu: ZJU, Ph.D.
Kaitong Cui: HKU, Intern
Yiming Dou: UMich, Ph.D.
Xiaohan Mao: Shanghai AI Lab & SJTU, Ph.D.
Zhemin Huang: Stanford University, MS
Shaopeng Guo: UCSD, Ph.D.
Xudong Lu: CUHK, Ph.D.
Hongwei Fan: Sensetime, Research Engineer
Yuan Yao: U of Rochester, Ph.D.
Zuoyu Qiu: SJTU, MS
Han Lu: SJTU, Ph.D.
Zhanke Zhou: HKBU, Ph.D.
Mingyang Chen: UCSD, MS
Liang Xu: EIAS & SJTU, Ph.D.
Ze Ma: Columbia University, MS
Xijie Huang: HKUST, Ph.D.
© Copyright 2022 MVIG-RHOS • Based on tbakerx
\ No newline at end of file diff --git a/joint_learning.html b/joint_learning.html index 48f1618..73d43b1 100644 --- a/joint_learning.html +++ b/joint_learning.html @@ -1,8 +1,8 @@ -Human Robot Joint Learning

Human-Agent Joint Learning for Efficient Robot Manipulation Skill Acquisition

MVIG-RHOS, SJTU

demodemodemo
Employing a teleoperation system for gathering demonstrations offers the potential for more efficient learning of robot manipulation. However, teleoperating a robot arm equipped with a dexterous hand or gripper via such a system poses significant challenges due to its high dimensionality, complex motions, and differences in physiological structure. In this study, we introduce a novel system for joint learning between human operators and robots that enables human operators to share control of a robot end-effector with a learned assistive agent, facilitating simultaneous human demonstration collection and robot manipulation teaching. In this setup, the assistive agent gradually learns as data accumulates, so less human effort and attention are required, enhancing the efficiency of the data collection process. The system also allows the human operator to adjust the control ratio to achieve a trade-off between manual and automated control. We conducted experiments in both simulated environments and physical real-world settings. Through user studies and quantitative evaluations, it is evident that the proposed system can enhance data collection efficiency and reduce the need for human adaptation while ensuring the collected data is of sufficient quality for downstream tasks.

Demo

News and Olds

[2023.4] Our paper is available on arXiv.
[2023.3] Trial run

Results

pentagon

Download

Our data and code will come very soon!

Publications

Before using our data and code in your project, please cite:
@misc{luo2024humanagentjointlearningefficient,
+Human Robot Joint Learning

Human-Agent Joint Learning for Efficient Robot Manipulation Skill Acquisition

MVIG-RHOS, SJTU

demodemodemo
Employing a teleoperation system for gathering demonstrations offers the potential for more efficient learning of robot manipulation. However, teleoperating a robot arm equipped with a dexterous hand or gripper via such a system poses significant challenges due to its high dimensionality, complex motions, and differences in physiological structure. In this study, we introduce a novel system for joint learning between human operators and robots that enables human operators to share control of a robot end-effector with a learned assistive agent, facilitating simultaneous human demonstration collection and robot manipulation teaching. In this setup, the assistive agent gradually learns as data accumulates, so less human effort and attention are required, enhancing the efficiency of the data collection process. The system also allows the human operator to adjust the control ratio to achieve a trade-off between manual and automated control. We conducted experiments in both simulated environments and physical real-world settings. Through user studies and quantitative evaluations, it is evident that the proposed system can enhance data collection efficiency and reduce the need for human adaptation while ensuring the collected data is of sufficient quality for downstream tasks.
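As an illustration of the shared-control idea described above (not the paper's actual implementation), the sketch below blends a human teleoperation command with a learned assistive policy using an adjustable control ratio; the function name and the linear blending rule are assumptions.

import numpy as np

def blend_commands(human_cmd, agent_cmd, control_ratio):
    """Blend human and assistive-agent end-effector commands.

    control_ratio = 1.0 -> fully manual, 0.0 -> fully automated.
    The linear blend is an assumption for illustration; the actual system
    may combine the two commands differently.
    """
    human_cmd = np.asarray(human_cmd, dtype=float)
    agent_cmd = np.asarray(agent_cmd, dtype=float)
    return control_ratio * human_cmd + (1.0 - control_ratio) * agent_cmd

# Example: a 6-DoF end-effector velocity command, 70% human / 30% agent.
human = [0.10, 0.00, -0.05, 0.0, 0.0, 0.1]
agent = [0.08, 0.02, -0.04, 0.0, 0.0, 0.0]
print(blend_commands(human, agent, control_ratio=0.7))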

Demo

News and Olds

[2023.4] Our paper is available on arXiv.
[2023.3] Trial run

Results

pentagon

Download

Our data and code will come very soon!

Publications

Before using our data and code in your project, please cite:
@misc{luo2024humanagentjointlearningefficient,
       title={Human-Agent Joint Learning for Efficient Robot Manipulation Skill Acquisition}, 
       author={Shengcheng Luo and Quanquan Peng and Jun Lv and Kaiwen Hong and Katherine Rose Driggs-Campbell and Cewu Lu and Yong-Lu Li},
       year={2024},
       eprint={2407.00299},
       archivePrefix={arXiv},
       primaryClass={cs.RO},
-}
© Copyright 2022 MVIG-RHOS • Based on tbakerx
\ No newline at end of file +}
© Copyright 2022 MVIG-RHOS • Based on tbakerx
\ No newline at end of file diff --git a/ocl.html b/ocl.html index 39a6b8f..45c040b 100644 --- a/ocl.html +++ b/ocl.html @@ -1,7 +1,7 @@ -OCL

OCL: Object Concept Learning

MVIG-RHOS, SJTU

demo
Understanding objects is a central building block of artificial intelligence, especially for embodied AI. Even though object recognition excels with deep learning, current machines still struggle to learn higher-level knowledge, e.g., what attributes an object has and what we can do with it. In this work, we propose a challenging Object Concept Learning (OCL) task to push the envelope of object understanding. It requires machines to reason out object affordances and simultaneously give the reason: what attributes make an object possess these affordances. To support OCL, we build a densely annotated knowledge base including extensive labels for three levels of object concepts: categories, attributes, and affordances, together with their causal relations. By analyzing the causal structure of OCL, we present a strong baseline, the Object Concept Reasoning Network (OCRN). It leverages causal intervention and concept instantiation to infer the three levels following their causal relations.

Demo

Left-top: object (in yellow box)

Right-top: key causal graph

Left-bottom: affordance prediction score

Right-bottom: key causal relations

Full demo on Youtube

Full demo on BiliBili

News and Olds

[2023.11] We release the code and data of OCL on Github.
[2023.07] OCL will appear at ICCV 2023.
[2022.12] Our preprint paper is available on arXiv.

Download

Our code and full data are available on Github.

Publications

To use our data and code in your project, please cite:
@inproceedings{li2023beyond,
+OCL

OCL: Object Concept Learning

MVIG-RHOS, SJTU

demo
Understanding objects is a central building block of artificial intelligence, especially for embodied AI. Even though object recognition excels with deep learning, current machines still struggle to learn higher-level knowledge, e.g., what attributes an object has and what we can do with it. In this work, we propose a challenging Object Concept Learning (OCL) task to push the envelope of object understanding. It requires machines to reason out object affordances and simultaneously give the reason: what attributes make an object possess these affordances. To support OCL, we build a densely annotated knowledge base including extensive labels for three levels of object concepts: categories, attributes, and affordances, together with their causal relations. By analyzing the causal structure of OCL, we present a strong baseline, the Object Concept Reasoning Network (OCRN). It leverages causal intervention and concept instantiation to infer the three levels following their causal relations.
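For intuition only, here is a toy sketch of level-by-level inference following the category → attribute → affordance ordering mentioned above; the tables and rules are invented examples, not the OCL knowledge base or OCRN.

# Toy illustration of reasoning affordances from attributes, following the
# category -> attribute -> affordance ordering. The rules below are invented.

CATEGORY_TO_ATTRIBUTES = {
    "mug": {"hollow", "rigid", "graspable"},
    "pillow": {"soft", "graspable"},
}

# Each affordance lists the attributes that (in this toy example) cause it.
AFFORDANCE_RULES = {
    "contain liquid": {"hollow", "rigid"},
    "pick up": {"graspable"},
    "lie on": {"soft"},
}

def infer(category):
    attributes = CATEGORY_TO_ATTRIBUTES.get(category, set())
    affordances = {
        aff: sorted(required)            # the "reason": supporting attributes
        for aff, required in AFFORDANCE_RULES.items()
        if required <= attributes        # all required attributes are present
    }
    return attributes, affordances

print(infer("mug"))   # mug -> {contain liquid, pick up}, each with its supporting attributes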

Demo

Left-top: object (in yellow box)

Right-top: key causal graph

Left-bottom: affordance prediction score

Right-bottom: key causal relations

Full demo on Youtube

Full demo on BiliBili

News and Olds

[2023.11] We release the code and data of OCL on Github.
[2023.07] OCL will appear at ICCV 2023.
[2022.12] Our preprint paper is available on arXiv.

Download

Our code and full data are available on Github.

Publications

To use our data and code in your project, please cite:
@inproceedings{li2023beyond,
   title={Beyond Object Recognition: A New Benchmark towards Object Concept Learning},
   author={Li, Yong-Lu and Xu, Yue and Xu, Xinyu and Mao, Xiaohan and Yao, Yuan and Liu, Siqi and Lu, Cewu},
   booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
   pages={20029--20040},
   year={2023}
-}

Disclaimer

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
In our database, 75,578 images and their annotations are extracted from existing datasets (COCOa, ImageNet-150K, aPY, SUN). 4,885 images are from the internet. We only provide image links for research purposes.

© Copyright 2022 MVIG-RHOS • Based on tbakerx
\ No newline at end of file +}

Disclaimer

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
In our database, 75,578 images and their annotations are extracted from existing datasets (COCOa, ImageNet-150K, aPY, SUN). 4,885 images are from the internet. We only provide image links for research purposes.

© Copyright 2022 MVIG-RHOS • Based on tbakerx
\ No newline at end of file diff --git a/pangea.html b/pangea.html index 10a89b8..b88c6b0 100644 --- a/pangea.html +++ b/pangea.html @@ -1,7 +1,7 @@ -Pangea

Pangea: Unified Semantic Space for Human Action Understanding

MVIG-RHOS, SJTU

CVPR 2024, Highlight

demo
Action understanding matters and attracts attention. It can be formed as the mapping from the action physical space to the semantic space. Typically, researchers built action datasets according to idiosyncratic choices to define classes and push the envelope of benchmarks respectively. Thus, datasets are incompatible with each other like “Isolated Islands” due to semantic gaps and various class granularities, e.g., do housework in dataset A and wash plate in dataset B. We argue that a more principled semantic space is urgently needed to concentrate the community's efforts and enable us to use all datasets together to pursue generalizable action learning. To this end, we design a Poincaré action semantic space in view of the verb taxonomy hierarchy, covering massive actions. By aligning the classes of previous datasets to our semantic space, we gather (image/video/skeleton/MoCap) datasets into a unified database under a unified label system, i.e., bridging “isolated islands” into a “Pangea”. Accordingly, we propose a bidirectional mapping model between the physical and semantic spaces to fully use Pangea. In extensive experiments, our method shows significant superiority, especially in transfer learning.

Demo

Top: video frames

Left-bottom: semantic prediction visualization

Right-bottom: semantic prediction details

News and Olds

[2024.1] Our code and data will be available at Code.
[2023.4] Our paper is available at arXiv.
[2023.3] Trial run

Results

pentagon

Download

[arXiv], [Code]

Publications

Before using our data and code in your project, please cite:
@ARTICLE{li2023isolated,
+Pangea

Pangea: Unified Semantic Space for Human Action Understanding

MVIG-RHOS, SJTU

CVPR 2024, Highlight

demo
Action understanding matters and attracts attention. It can be formed as the mapping from the action physical space to the semantic space. Typically, researchers built action datasets according to idiosyncratic choices to define classes and push the envelope of benchmarks respectively. Thus, datasets are incompatible with each other like “Isolated Islands” due to semantic gaps and various class granularities, e.g., do housework in dataset A and wash plate in dataset B. We argue that a more principled semantic space is urgently needed to concentrate the community's efforts and enable us to use all datasets together to pursue generalizable action learning. To this end, we design a Poincaré action semantic space in view of the verb taxonomy hierarchy, covering massive actions. By aligning the classes of previous datasets to our semantic space, we gather (image/video/skeleton/MoCap) datasets into a unified database under a unified label system, i.e., bridging “isolated islands” into a “Pangea”. Accordingly, we propose a bidirectional mapping model between the physical and semantic spaces to fully use Pangea. In extensive experiments, our method shows significant superiority, especially in transfer learning.
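The abstract above mentions a Poincaré (hyperbolic) semantic space built over the verb taxonomy. As background only, the snippet below computes the standard Poincaré-ball distance that hierarchy-aware embeddings of this kind rely on; it is not the paper's model, and the example points are arbitrary.

import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance between two points inside the unit Poincare ball.

    d(u, v) = arccosh(1 + 2 * ||u - v||^2 / ((1 - ||u||^2) * (1 - ||v||^2)))
    Hyperbolic space grows exponentially with radius, which is why it suits
    tree-like structures such as a verb taxonomy.
    """
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    sq_diff = np.sum((u - v) ** 2)
    denom = (1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2))
    return np.arccosh(1.0 + 2.0 * sq_diff / max(denom, eps))

# A point near the origin (a generic verb) vs. one near the boundary (a specific verb).
print(poincare_distance([0.05, 0.00], [0.70, 0.55]))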

Demo

Top: video frames

Left-bottom: semantic prediction visualization

Right-bottom: semantic prediction details

News and Olds

[2024.1] Our code and data will be available at Code.
[2023.4] Our paper is available at arXiv.
[2023.3] Trial run

Results

pentagon

Download

[arXiv], [Code]

Publications

Before using our data and code in your project, please cite:
@ARTICLE{li2023isolated,
   title={From Isolated Islands to Pangea: Unifying Semantic Space for Human Action Understanding},
   author={Li, Yong-Lu and Wu, Xiaoqian and Liu, Xinpeng and Dou, Yiming and Ji, Yikun 
     and Zhang, Junyi and Li, Yixing and Tan, Jingru and Lu, Xudong and Lu, Cewu},
   journal={arXiv preprint arXiv:2304.00553},
   year={2023},
-}
© Copyright 2022 MVIG-RHOS • Based on tbakerx
\ No newline at end of file +}
© Copyright 2022 MVIG-RHOS • Based on tbakerx
\ No newline at end of file diff --git a/recruit.html b/recruit.html index 9b5457c..7d6aeff 100644 --- a/recruit.html +++ b/recruit.html @@ -1 +1 @@ -Recruit

MVIG-RHOS Recruitment

Hello, everyone!

  I am Yong-Lu Li (tenure-track Assistant Professor and Ph.D. supervisor). Thank you for your interest in the RHOS lab at Shanghai Jiao Tong University. RHOS is affiliated with the School of Electronic Information and Electrical Engineering, the Qing Yuan Research Institute, and the MVIG lab (co-supervised with Prof. Cewu Lu). A brief introduction to myself:

  • Published 30+ papers in artificial intelligence (TPAMI, NeurIPS, CVPR, ICCV, ECCV, etc.);
  • Developed the open-source system HAKE, which explores knowledge-driven reasoning for understanding human activities; its website has received hundreds of thousands of visits worldwide;
  • NeurIPS Area Chair; recipient of the NeurIPS Outstanding Reviewer award (2020, 2021), the Baidu Scholarship, the WAIC Yunfan Award (Rising Star, Shining Star), the CAAI Wu Wenjun Outstanding Doctoral Dissertation Award, Shanghai Outstanding Graduate, the 2020 Top 100 Chinese Students in AI (top-10 in machine learning), and the SJTU Class-of-1985 Yang Yuanqing Fund Outstanding Doctoral Award.

About the Lab

  Our goal is to build intelligent (real/simulated) robots in the spirit of C-3PO and R2-D2. The lab's research directions include:

  1. Embodied AI: how to enable robots to learn skills from humans and interact with them?

      a. Human activity understanding: how to learn and understand complex, ambiguous human activities (body motion, human-object/human/scene interaction) and object concepts from multi-modal information (2D-3D-4D);

      b. Visual reasoning: how to mine, capture, and encode logic and causal relations from human activities;

      c. General multi-modal foundation models: especially for human-centric perception and understanding tasks;

      d. Activity understanding from a cognitive perspective: collaborate with multidisciplinary research teams to explore how the brain perceives human activities;

  2. Human-robot interaction (e.g., smart hospitals): collaborate with medical teams (doctors and engineers) at SJTU to develop robots that assist people.

We have a very strong team of Ph.D. and Master's students, as well as interns with a keen interest in research (as of 2024.4):

  ● 10+ members ranked in the top five of their undergraduate departments at SJTU;

  ● 6 undergraduate interns published first-author papers at top conferences (CVPR, ECCV);

  ● 27 undergraduate interns published as co-authors in top conferences/journals (TPAMI, CVPR, NeurIPS, ECCV, ICCV);

  ● 2 undergraduate interns went on to graduate study (MS/Ph.D.) at top North American CS schools such as Stanford and UC Berkeley;

  ● 2 undergraduate interns received the SenseTime Scholarship (25 recipients per year nationwide).


We value freedom and rationality, and strive to build a united, intense, serious, and lively research atmosphere:

  ● Weekly meetings take two forms, one-on-one and group discussions, arranged according to actual needs.

  ● For publications, we target top machine learning, computer vision, and robotics conferences and journals (e.g., ICLR, NeurIPS, ICML, CVPR, ICCV, ECCV, ICRA, CoRL, RSS, TPAMI, IJRR, T-RO).

  ● We collaborate closely with industry, actively push our research toward real-world applications, and can recommend interns to AI and robotics teams in industry (e.g., Tencent, Huawei, Kuaishou, SenseTime, Flexiv).

  ● Sufficient research funding and hardware (50+ GPUs, robots, VR, and human-motion equipment).


Admissions

  The lab admits several Ph.D. and Master's students every year and recruits interns (undergraduate, Master's, and Ph.D.) all year round. English proficiency, research experience, coding ability, and a deep learning background are pluses, but not requirements. As long as you are strongly self-motivated and interested in research, and can demonstrate promising research potential or excellent engineering skills, you are welcome to join us!


Undergraduate Training

We are committed to "world-class research, world-class students" and give students thorough guidance. Our training consists of four stages:

  ● Stage-0: learn the basic algorithms and techniques, including fundamental deep learning/robotics courses and basic tools. See Research_in_RHOS

  ● Stage-1: participate in a research project and, guided by advisors and senior students, publish a co-author paper while experiencing the full research process: curiosity -- question -- thinking -- validation experiments -- motivation -- insight -- trial and error -- paper writing -- submission -- revise and recycle if unfortunately rejected -- camera-ready -- publication -- attending the conference.

  ● Stage-2: propose a good research question, make weekly progress under guidance, and independently publish a first-author paper at a top conference.

  ● Stage-3: propose a high-level idea, conduct research independently, and learn together with advisors and labmates.


Future Development

  ● Recommended graduate admission: interns who perform excellently are given priority for recommended admission, and most of the lab's current graduate students were once undergraduate interns. If you perform outstandingly during your internship and obtain the recommendation qualification, the lab will do its best to offer you a place in the group.

  ● Applying abroad: every year the lab helps many interns prepare applications, including research training, application materials, choosing summer research programs and North American advisors, and recommendation letters; many lab members have gone on to summer research or Ph.D. study in AI and robotics labs at Stanford, MIT, CMU, and other top North American schools.

Our Work

Robot Brain

Redefining the paradigm of embodied-AI learning from the data and task side: with limited, multi-modal, noisy, and heterogeneous data, we study generalizable, interpretable robot "brains" capable of reasoning, and combine them with existing techniques to realize intelligent robot applications.

egopca
egopca

HAKE

A reasoning-driven human activity knowledge system: it enables intelligent agents to perceive human activities, reason about human behavior logic, learn skills from human activities, and interact with objects and environments.

hake

OCL

Object concept learning: we propose object concepts closely tied to human activities, such as object attributes and affordances, to advance machine understanding of objects, and introduce a causal-graph-based reasoning benchmark and baseline model.

Full demo on BiliBili

Pangea

We design an action semantic space based on the verb taxonomy hierarchy, covering massive human actions, so that multi-modal datasets can be aggregated into a unified data pool under a unified label system. Accordingly, we propose a bidirectional mapping model between the physical and semantic spaces to promote a "unified standard" for action understanding.


EgoPCA

A brand-new framework for hand-object interaction understanding: by probing, analyzing, and resampling hand-object interaction data, it provides more balanced and comprehensive pre-training sets, test sets, and benchmarks, and uses pre-training strategies tailored to hand-object interaction together with effective downstream fine-tuning mechanisms, advancing Ego-HOI understanding to support robot manipulation skill learning.

egopca
© Copyright 2022 MVIG-RHOS • Based on tbakerx
\ No newline at end of file +Recruit

MVIG-RHOS Recruitment

Hello, everyone!

  I am Yong-Lu Li (tenure-track Assistant Professor and Ph.D. supervisor). Thank you for your interest in the RHOS lab at Shanghai Jiao Tong University. RHOS is affiliated with the School of Electronic Information and Electrical Engineering, the Qing Yuan Research Institute, and the MVIG lab (co-supervised with Prof. Cewu Lu). A brief introduction to myself:

  • Published 30+ papers in artificial intelligence (TPAMI, NeurIPS, CVPR, ICCV, ECCV, etc.);
  • Developed the open-source system HAKE, which explores knowledge-driven reasoning for understanding human activities; its website has received hundreds of thousands of visits worldwide;
  • NeurIPS Area Chair; recipient of the NeurIPS Outstanding Reviewer award (2020, 2021), the Baidu Scholarship, the WAIC Yunfan Award (Rising Star, Shining Star), the CAAI Wu Wenjun Outstanding Doctoral Dissertation Award, Shanghai Outstanding Graduate, the 2020 Top 100 Chinese Students in AI (top-10 in machine learning), and the SJTU Class-of-1985 Yang Yuanqing Fund Outstanding Doctoral Award.

About the Lab

  Our goal is to build intelligent (real/simulated) robots in the spirit of C-3PO and R2-D2. The lab's research directions include:

  1. Embodied AI: how to enable robots to learn skills from humans and interact with them?

      a. Human activity understanding: how to learn and understand complex, ambiguous human activities (body motion, human-object/human/scene interaction) and object concepts from multi-modal information (2D-3D-4D);

      b. Visual reasoning: how to mine, capture, and encode logic and causal relations from human activities;

      c. General multi-modal foundation models: especially for human-centric perception and understanding tasks;

      d. Activity understanding from a cognitive perspective: collaborate with multidisciplinary research teams to explore how the brain perceives human activities;

  2. Human-robot interaction (e.g., smart hospitals): collaborate with medical teams (doctors and engineers) at SJTU to develop robots that assist people.

We have a very strong team of Ph.D. and Master's students, as well as interns with a keen interest in research (as of 2024.4):

  ● 10+ members ranked in the top five of their undergraduate departments at SJTU;

  ● 6 undergraduate interns published first-author papers at top conferences (CVPR, ECCV);

  ● 27 undergraduate interns published as co-authors in top conferences/journals (TPAMI, CVPR, NeurIPS, ECCV, ICCV);

  ● 2 undergraduate interns went on to graduate study (MS/Ph.D.) at top North American CS schools such as Stanford and UC Berkeley;

  ● 2 undergraduate interns received the SenseTime Scholarship (25 recipients per year nationwide).


We value freedom and rationality, and strive to build a united, intense, serious, and lively research atmosphere:

  ● Weekly meetings take two forms, one-on-one and group discussions, arranged according to actual needs.

  ● For publications, we target top machine learning, computer vision, and robotics conferences and journals (e.g., ICLR, NeurIPS, ICML, CVPR, ICCV, ECCV, ICRA, CoRL, RSS, TPAMI, IJRR, T-RO).

  ● We collaborate closely with industry, actively push our research toward real-world applications, and can recommend interns to AI and robotics teams in industry (e.g., Tencent, Huawei, Kuaishou, SenseTime, Flexiv).

  ● Sufficient research funding and hardware (50+ GPUs, robots, VR, and human-motion equipment).


Admissions

  The lab admits several Ph.D. and Master's students every year and recruits interns (undergraduate, Master's, and Ph.D.) all year round. English proficiency, research experience, coding ability, and a deep learning background are pluses, but not requirements. As long as you are strongly self-motivated and interested in research, and can demonstrate promising research potential or excellent engineering skills, you are welcome to join us!


Undergraduate Training

We are committed to "world-class research, world-class students" and give students thorough guidance. Our training consists of four stages:

  ● Stage-0: learn the basic algorithms and techniques, including fundamental deep learning/robotics courses and basic tools. See Research_in_RHOS

  ● Stage-1: participate in a research project and, guided by advisors and senior students, publish a co-author paper while experiencing the full research process: curiosity -- question -- thinking -- validation experiments -- motivation -- insight -- trial and error -- paper writing -- submission -- revise and recycle if unfortunately rejected -- camera-ready -- publication -- attending the conference.

  ● Stage-2: propose a good research question, make weekly progress under guidance, and independently publish a first-author paper at a top conference.

  ● Stage-3: propose a high-level idea, conduct research independently, and learn together with advisors and labmates.


Future Development

  ● Recommended graduate admission: interns who perform excellently are given priority for recommended admission, and most of the lab's current graduate students were once undergraduate interns. If you perform outstandingly during your internship and obtain the recommendation qualification, the lab will do its best to offer you a place in the group.

  ● Applying abroad: every year the lab helps many interns prepare applications, including research training, application materials, choosing summer research programs and North American advisors, and recommendation letters; many lab members have gone on to summer research or Ph.D. study in AI and robotics labs at Stanford, MIT, CMU, and other top North American schools.

Our Work

Robot Brain

Redefining the paradigm of embodied-AI learning from the data and task side: with limited, multi-modal, noisy, and heterogeneous data, we study generalizable, interpretable robot "brains" capable of reasoning, and combine them with existing techniques to realize intelligent robot applications.

egopca
egopca

HAKE

A reasoning-driven human activity knowledge system: it enables intelligent agents to perceive human activities, reason about human behavior logic, learn skills from human activities, and interact with objects and environments.

hake

OCL

Object concept learning: we propose object concepts closely tied to human activities, such as object attributes and affordances, to advance machine understanding of objects, and introduce a causal-graph-based reasoning benchmark and baseline model.

Full demo on BiliBili

Pangea

We design an action semantic space based on the verb taxonomy hierarchy, covering massive human actions, so that multi-modal datasets can be aggregated into a unified data pool under a unified label system. Accordingly, we propose a bidirectional mapping model between the physical and semantic spaces to promote a "unified standard" for action understanding.


EgoPCA

A brand-new framework for hand-object interaction understanding: by probing, analyzing, and resampling hand-object interaction data, it provides more balanced and comprehensive pre-training sets, test sets, and benchmarks, and uses pre-training strategies tailored to hand-object interaction together with effective downstream fine-tuning mechanisms, advancing Ego-HOI understanding to support robot manipulation skill learning.

egopca
© Copyright 2022 MVIG-RHOS • Based on tbakerx
\ No newline at end of file diff --git a/symbol_llm.html b/symbol_llm.html index 14dd3e9..90bbfba 100644 --- a/symbol_llm.html +++ b/symbol_llm.html @@ -1,7 +1,7 @@ -SymbolLLM

Symbol-LLM: Leverage Language Models for Symbolic System in Visual Human Activity Reasoning

MVIG-RHOS, SJTU

demo
Human reasoning can be understood as a cooperation between the intuitive, associative “System-1” and the deliberative, logical “System-2”. For existing System-1-like methods in visual activity understanding, it is crucial to integrate System-2 processing to improve explainability, generalization, and data efficiency. One possible path for activity reasoning is building a symbolic system composed of symbols and rules, where one rule connects multiple symbols, implying human knowledge and reasoning abilities. Previous methods have made progress but remain limited: their symbols are handcrafted and their rules come from vision-based annotations, failing to cover the complex patterns of activities and lacking compositional generalization. To overcome these defects, we propose a new symbolic system with two important ideal properties: broad-coverage symbols and rational rules. Collecting massive human knowledge via manual annotations would be expensive for instantiating this symbolic system. Instead, we leverage the recent advancement of LLMs (Large Language Models) as an approximation of the two ideal properties, i.e., Symbols from Large Language Models (Symbol-LLM). Then, given an image, visual contents are extracted and checked as symbols, and activity semantics are reasoned out based on rules via fuzzy logic calculation. Our method shows superiority in extensive activity understanding tasks.

News and Olds

[2023.12] arXiv Released.
[2023.11] Code Released.
[2023.10] Symbol-LLM will appear at NeurIPS 2023.

Publications

If you find our paper, data, or code useful, please cite:
@inproceedings{wu2023symbol,
+SymbolLLM

Symbol-LLM: Leverage Language Models for Symbolic System in Visual Human Activity Reasoning

MVIG-RHOS, SJTU

demo
Human reasoning can be understood as a cooperation between the intuitive, associative “System-1” and the deliberative, logical “System-2”. For existing System-1-like methods in visual activity understanding, it is crucial to integrate System-2 processing to improve explainability, generalization, and data efficiency. One possible path for activity reasoning is building a symbolic system composed of symbols and rules, where one rule connects multiple symbols, implying human knowledge and reasoning abilities. Previous methods have made progress but remain limited: their symbols are handcrafted and their rules come from vision-based annotations, failing to cover the complex patterns of activities and lacking compositional generalization. To overcome these defects, we propose a new symbolic system with two important ideal properties: broad-coverage symbols and rational rules. Collecting massive human knowledge via manual annotations would be expensive for instantiating this symbolic system. Instead, we leverage the recent advancement of LLMs (Large Language Models) as an approximation of the two ideal properties, i.e., Symbols from Large Language Models (Symbol-LLM). Then, given an image, visual contents are extracted and checked as symbols, and activity semantics are reasoned out based on rules via fuzzy logic calculation. Our method shows superiority in extensive activity understanding tasks.
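To illustrate the last step described above (rule-based reasoning over checked symbols via fuzzy logic), here is a minimal, assumed sketch using standard product/probabilistic-sum fuzzy operators; the symbols, rules, and scores are made up, and this is not the released Symbol-LLM code.

# Minimal fuzzy-logic sketch: symbols carry confidence scores in [0, 1],
# a rule is a conjunction of symbols implying an activity, and the activity
# score is computed with standard fuzzy operators. All values are made up.

def fuzzy_and(*scores):          # product t-norm
    out = 1.0
    for s in scores:
        out *= s
    return out

def fuzzy_or(a, b):              # probabilistic sum
    return a + b - a * b

# Detected symbols with confidences (e.g., from a visual checker).
symbols = {"person_holding_cup": 0.9, "cup_near_mouth": 0.8, "liquid_visible": 0.4}

# Two assumed rules for the activity "drinking".
rule1 = fuzzy_and(symbols["person_holding_cup"], symbols["cup_near_mouth"])
rule2 = fuzzy_and(symbols["cup_near_mouth"], symbols["liquid_visible"])

drinking_score = fuzzy_or(rule1, rule2)
print(f"drinking: {drinking_score:.3f}")   # 0.72 OR 0.32 -> ~0.810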

News and Olds

[2023.12] arXiv Released.
[2023.11] Code Released.
[2023.10] Symbol-LLM will appear at NeurIPS 2023.

Publications

If you find our paper, data, or code useful, please cite:
@inproceedings{wu2023symbol,
   title={Symbol-LLM: Leverage Language Models for Symbolic System in Visual 
   Human Activity Reasoning},
   author={Wu, Xiaoqian and Li, Yong-Lu and Sun, Jianhua and Lu, Cewu},
   booktitle={Thirty-seventh Conference on Neural Information Processing Systems},
   year={2023}
-}
© Copyright 2022 MVIG-RHOS • Based on tbakerx
\ No newline at end of file +}
© Copyright 2022 MVIG-RHOS • Based on tbakerx
\ No newline at end of file diff --git a/video-distill.html b/video-distill.html index 64af030..9579bf5 100644 --- a/video-distill.html +++ b/video-distill.html @@ -1,6 +1,6 @@ -SDD

Dancing with Still Images: Video Distillation via Static-Dynamic Disentanglement

MVIG-RHOS, SJTU

teaset
Recently, dataset distillation has paved the way towards efficient machine learning, especially for image datasets. However, distillation for videos, characterized by an exclusive temporal dimension, remains an underexplored domain. In this work, we provide the first systematic study of video distillation and introduce a taxonomy to categorize temporal compression. Our investigation reveals that the temporal information is usually not well learned during distillation, and the temporal dimension of synthetic data contributes little. These observations motivate our unified framework of disentangling the dynamic and static information in videos. It first distills the videos into still images as static memory and then compensates for the dynamic and motion information with a learnable dynamic memory block. Our method achieves state-of-the-art performance on video datasets at different scales, with notably smaller storage expenditure. Our code will be publicly available.

Method and Results

Overview of our method

demo

The synthetic videos are learned in two stages:
Stage 1: static memory learning with image distillation on one frame per video.
Stage 2: the static (frozen) and dynamic memory are combined.

Results

demo
demo

Publications

If you find our paper, data, or code useful, please cite:
@article{wang2023dancing,
+SDD

Dancing with Still Images: Video Distillation via Static-Dynamic Disentanglement

MVIG-RHOS, SJTU

teaset
Recently, dataset distillation has paved the way towards efficient machine learning, especially for image datasets. However, distillation for videos, characterized by an exclusive temporal dimension, remains an underexplored domain. In this work, we provide the first systematic study of video distillation and introduce a taxonomy to categorize temporal compression. Our investigation reveals that the temporal information is usually not well learned during distillation, and the temporal dimension of synthetic data contributes little. These observations motivate our unified framework of disentangling the dynamic and static information in videos. It first distills the videos into still images as static memory and then compensates for the dynamic and motion information with a learnable dynamic memory block. Our method achieves state-of-the-art performance on video datasets at different scales, with notably smaller storage expenditure. Our code will be publicly available.

Method and Results

Overview of our method

demo

The synthetic videos are learned in two stages:
Stage 1: static memory learning with image distillation on one frame per video.
Stage 2: the static (frozen) and dynamic memory are combined.
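A schematic, assumption-laden sketch of the two-stage procedure listed above: Stage 1 distills one frame per video into a static image memory, and Stage 2 freezes it and optimizes only a dynamic memory that is added back to form synthetic videos. The tensor shapes, the additive combination, and the stand-in losses are placeholders, not the released implementation.

import torch

# Placeholder shapes: N synthetic videos, T frames, C x H x W resolution.
N, T, C, H, W = 10, 8, 3, 32, 32

# Stage 1: learn a static memory (one still image per synthetic video)
# with an image-distillation objective (stand-in loss here).
static_memory = torch.randn(N, C, H, W, requires_grad=True)
opt1 = torch.optim.SGD([static_memory], lr=0.1)
for _ in range(100):
    loss = static_memory.pow(2).mean()        # stand-in for the distillation loss
    opt1.zero_grad(); loss.backward(); opt1.step()

# Stage 2: freeze the static memory and learn a dynamic memory block that
# adds per-frame motion information on top of the still images.
static_memory = static_memory.detach()        # frozen
dynamic_memory = torch.zeros(N, T, C, H, W, requires_grad=True)
opt2 = torch.optim.SGD([dynamic_memory], lr=0.1)
for _ in range(100):
    synthetic_videos = static_memory.unsqueeze(1) + dynamic_memory  # (N, T, C, H, W)
    loss = synthetic_videos.pow(2).mean()     # stand-in for the video-matching loss
    opt2.zero_grad(); loss.backward(); opt2.step()

print(synthetic_videos.shape)                 # torch.Size([10, 8, 3, 32, 32])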

Results

demo
demo

Publications

If you find our paper, data, or code useful, please cite:
@article{wang2023dancing,
   title={Dancing with Still Images: Video Distillation via Static-Dynamic Disentanglement},
   author={Wang, Ziyu and Xu, Yue and Lu, Cewu and Li, Yong-Lu},
   journal={arXiv preprint arXiv:2312.00362},
   year={2023}
-}
© Copyright 2022 MVIG-RHOS • Based on tbakerx
\ No newline at end of file +}
© Copyright 2022 MVIG-RHOS • Based on tbakerx
\ No newline at end of file