Still being edited (not finished yet)
This project focuses on audio-driven gesture generation, where the output is gesture motion represented as 3D keypoints.
Input: audio, text, gesture, etc. -> Output: gesture motion
Gesture Generation is the process of generating gestures from speech or text. The goal of Gesture Generation is to generate gestures that are natural, realistic, and appropriate for the given context. The generated gestures can be used to animate virtual characters, robots, or embodied conversational agents.
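A minimal sketch of the input/output contract, assuming 16 kHz mono audio, 30 fps motion and 15 joints (all illustrative choices). The per-frame RMS energy feature and the random linear layer are hypothetical placeholders standing in for a real audio encoder and gesture model. The point is only how audio is windowed to the motion frame rate and how the output is shaped as (frames, joints, xyz).

```python
import numpy as np

def frame_audio_features(waveform: np.ndarray, sr: int, fps: int) -> np.ndarray:
    """Split audio into one window per motion frame and return per-frame RMS energy, shape (T, 1)."""
    hop = sr // fps  # audio samples per motion frame (integer division; leftover samples are dropped)
    n_frames = len(waveform) // hop
    frames = waveform[: n_frames * hop].reshape(n_frames, hop)
    return np.sqrt(np.mean(frames ** 2, axis=1, keepdims=True))

def toy_gesture_model(features: np.ndarray, n_joints: int, seed: int = 0) -> np.ndarray:
    """Hypothetical placeholder: map (T, F) features to (T, J, 3) keypoints with a random linear layer."""
    rng = np.random.default_rng(seed)
    weights = rng.normal(size=(features.shape[1], n_joints * 3))
    keypoints = features @ weights
    return keypoints.reshape(features.shape[0], n_joints, 3)

sr, fps, n_joints = 16000, 30, 15
t = np.arange(2 * sr) / sr                     # 2 seconds of audio
waveform = 0.5 * np.sin(2 * np.pi * 220 * t)  # synthetic tone standing in for speech
feats = frame_audio_features(waveform, sr, fps)
motion = toy_gesture_model(feats, n_joints)
print(feats.shape, motion.shape)  # (60, 1) (60, 15, 3)
```

Because `sr // fps` truncates, the audio and motion clocks drift slightly over long clips; real pipelines usually resample or interpolate features to the exact motion frame rate instead.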
ACM CCS: • Human-centered computing → Human computer interaction (HCI).
Papers by folder: 📁/survey || 📁/approach || 📁/papers || 📁/dataset || 📁/books
- 1. Survey
- 2. Papers
- 3. Selected Approach
- 4. Pipeline
- 5. Learning objective
- 6. Metric Evaluation
- 7. Datasets
- 8. Toolkit
- 9. Playlist & Talks
- 10. Code
- 11. Books
Main resources
- paperswithcode.com/Gesture Generation
- GENEA Workshop
- github/topic/gesture-generation
- twitter/WorkshopGENEA
- Scholar/gesture generation
- 【EUROGRAPHICS 2023】 A Comprehensive Review of Data-Driven Co-Speech Gesture Generation ; [paper]
- 2014 - Gesture and speech in interaction: An overview ; [paper]
- 【HAI 2021】 Speech-based Gesture Generation for Robots and Embodied Agents: A Scoping Review [paper]
- 【IEEE 2022】 A Review of Evaluation Practices of Gesture Generation in Embodied Conversational Agents [paper]
- The GENEA Challenge 2020: A large, crowdsourced evaluation of gesture generation systems on common data [paper] ; [homepage] ; [youtube] ; [Svito-zar/genea_numerical_evaluations]
- The GENEA Challenge 2022: A large evaluation of data-driven co-speech gesture generation [paper] ; [homepage] ; [youtube] ; [web]
- The GENEA Challenge 2023: A large-scale evaluation of gesture [paper]
- GENEA Workshop 2021: The 2nd Workshop on Generation and Evaluation of Non-verbal Behaviour for Embodied Agents [paper] ; [homepage]
- Evaluating gesture-generation in a large-scale open challenge - The GENEA Challenge 2022 [paper]
- The Relation of Speech and Gestures: Temporal Synchrony Follows Semantic Synchrony [paper]
- Complexity Matters E05: Complexity Matching and Synchronization between Gestures and Speech [paper]
- Easier Said Than Done? Task Difficulty's Influence on Temporal Alignment, Semantic Similarity, and Complexity Matching Between Gestures and Speech [paper]
- Advances in Visual Semiotics - The Semiotic Web [book]
- Gestural beats: The rhythm hypothesis [paper]
GENEA Challenge 2024 [Homepage]
Method (Team*) | Paper | Video | 🏆 |
---|---|---|---|
GENEA Challenge 2023 [Homepage]
Method (Team*) | Paper | Video | 🏆 |
---|---|---|---|
FineMotion | 【ICMI 2023】 The FineMotion entry to the GENEA Challenge 2023: DeepPhase for conversational gestures generation [paper] | [youtube] | |
Gesture Motion Graphs | 【ICMI 2023】 Gesture Motion Graphs for Few-Shot Speech-Driven Gesture Reenactment [paper] | [youtube] | |
Diffusion-based | 【ICMI 2023】 Diffusion-based co-speech gesture generation using joint text and audio representation [paper] | [youtube] | |
UEA Digital Humans | 【ICMI 2023】 The UEA Digital Humans entry to the GENEA Challenge 2023 [paper] ; [JonathanPWindle/UEA-DH-GENEA23] | [youtube] | |
FEIN-Z | 【ICMI 2023】 FEIN-Z: Autoregressive Behavior Cloning for Speech-Driven Gesture Generation [paper] | [youtube] | |
DiffuseStyleGesture+ | 【ICMI 2023】 The DiffuseStyleGesture+ entry to the GENEA Challenge 2023 [paper] | [youtube] | 🏆 |
Discrete Diffusion | 【ICMI 2023】 Discrete Diffusion for Co-Speech Gesture Synthesis [paper] | [youtube] | |
KCL-SAIR | 【ICMI 2023】 The KCL-SAIR team's entry to the GENEA Challenge 2023 Exploring Role-based Gesture Generation in Dyadic Interactions: Listener vs. Speaker [paper] | [youtube] | |
Gesture Generation | 【ICMI 2023】 Gesture Generation with Diffusion Models Aided by Speech Activity Information [paper] | [youtube] | |
Co-Speech Gesture Generation | 【ICMI 2023】 Co-Speech Gesture Generation via Audio and Text Feature Engineering [paper] | [youtube] | |
DiffuGesture | 【ICMI 2023】 DiffuGesture: Generating Human Gesture From Two-person Dialogue With Diffusion Models [paper] | [youtube] | |
KU-ISPL | 【ICMI 2023】 The KU-ISPL entry to the GENEA Challenge 2023-A Diffusion Model for Co-speech Gesture generation [paper] | [youtube] | |
GENEA Workshop 2023 - ICMI 2023 Accepted papers [Homepage]
Papers | Video | 🏆 |
---|---|---|
【ICMI 2023】 MultiFacet A Multi-Tasking Framework for Speech-to-Sign Language Generation [paper] | | |
【ICMI 2023】 Look What I Made It Do - The ModelIT Method for Manually Modeling Nonverbal Behavior of Socially Interactive Agents [paper] | | |
【ICMI 2023】 A Methodology for Evaluating Multimodal Referring Expression Generation for Embodied Virtual Agents [paper] | | |
【ICMI 2023】 Towards the generation of synchronized and believable non-verbal facial behaviors of a talking virtual agent [paper] ; [aldelb/non_verbal_facial_animation] | | 🏆 |
GENEA Challenge 2022 - Accepted papers [Homepage]
Team (Method) | Paper | Video | 🏆 |
---|---|---|---|
DeepMotion | 【ICMI 2022】 The DeepMotion entry to the GENEA Challenge 2022 [paper] | [youtube] | |
DSI | 【ICMI 2022】 Hybrid Seq2Seq Architecture for 3D Co-Speech Gesture Generation [paper] | [youtube] | |
FineMotion | 【ICMI 2022】 ReCell: replicating recurrent cell for auto-regressive pose generation [paper] ; [FineMotion/GENEA_2022] | [youtube] | |
Forgerons | 【ICMI 2022】 Ubisoft Exemplar-based Stylized Gesture Generation from Speech: An Entry to the GENEA Challenge 2022 [paper] | [youtube] | |
GestureMaster | 【ICMI 2022】 GestureMaster: Graph-based Speech-driven Gesture Generation [paper] | [youtube] | |
IVI Lab | 【ICMI 2022】 The IVI Lab entry to the GENEA Challenge 2022 - A Tacotron2 Based Method for Co-Speech Gesture Generation With Locality-Constraint Attention Mechanism [paper] ; [Tacotron2-SpeechGesture] | [youtube] | 🏆 |
ReprGesture | 【ICMI 2022】 The ReprGesture entry to the GENEA Challenge 2022 [paper] ; [YoungSeng/ReprGesture] | [youtube] | |
TransGesture | 【ICMI 2022】 TransGesture: Autoregressive Gesture Generation with RNN-Transducer [paper] | [youtube] | |
UEA Digital Humans | 【ICMI 2022】 UEA Digital Humans entry to the GENEA Challenge 2022 [paper] ; [UEA/GENEA22] | [youtube] | |
GENEA Workshop 2022 - ICMI 2022 Accepted papers [Homepage]
Papers | Video | 🏆 |
---|---|---|
【ICMI 2022】 Understanding Interviewees' Perceptions and Behaviour towards Verbally and Non-verbally Expressive Virtual Interviewing Agents [paper] | [youtube] | |
【ICMI 2022】 Emotional Respiration Speech Dataset [paper] | [youtube] | |
【ICMI 2022】 Automatic facial expressions, gaze direction and head movements generation of a virtual agent [paper] | [youtube] | 🏆 |
【ICMI 2022】 Can you tell that I'm confused? An overhearer study for German backchannels by an embodied agent [paper] | [youtube] | |
GENEA Workshop 2021 - ICMI 2021 Accepted papers [Homepage]
Papers | Video | 🏆 |
---|---|---|
【ICMI 2021】 Probabilistic Human-like Gesture Synthesis from Speech using GRU-based WGAN [paper] ; [wubowen416/gesture-generation-using-WGAN] | [youtube] | 🏆 |
【ICMI 2021】 Influence of Movement Energy and Affect Priming on the Perception of Virtual Characters Extroversion and Mood [paper] | - | |
【ICMI 2021】 Crossmodal clustered contrastive learning: Grounding of spoken language to gesture [paper] ; [dondongwon/CC_NCE_GENEA] | [youtube] | |
GENEA Challenge 2020 - Accepted papers [Homepage]
Papers | Video |
---|---|
【IVA 2020】 The StyleGestures entry to the GENEA Challenge 2020 [paper] ; [simonalexanderson/StyleGestures] | [youtube] |
【IVA 2020】 The FineMotion entry to the GENEA Challenge 2020 [paper] ; [FineMotion/GENEA_2020] | [youtube] |
【IVA 2020】 Double-DCCCAE: Estimation of Sequential Body Motion Using Wave-Form - AlltheSmooth [paper] | [youtube] |
【IVA 2020】 CGVU: Semantics-guided 3D Body Gesture Synthesis [paper] | [youtube] |
【IVA 2020】 Interpreting and Generating Gestures with Embodied Human Computer Interactions [paper] | [youtube] |
【IVA 2020】 The Nectec Gesture Generation System entry to the GENEA Challenge 2020 [paper] | [youtube] |
- 【CVPR 2024】 DiffTED: One-shot Audio-driven TED Talk Video Generation with Diffusion-based Co-speech Gestures [paper]
- 【CVPR 2024】 EMAGE: Towards Unified Holistic Co-Speech Gesture Generation via Expressive Masked Audio Gesture Modeling [paper]
- 【CVPR 2024】 Emotional Speech-driven 3D Body Animation via Disentangled Latent Diffusion [paper]
- 【CVPR 2024】 Using Language-Aligned Gesture Embeddings for Understanding Gestures Accompanying Math Terms [paper]
- 【SIGGRAPH 2024】 Semantic Gesticulator: Semantics-Aware Co-Speech Gesture Synthesis [paper] ; [video] ; [LuMen-ze/Semantic-Gesticulator-Official]
- SynTalker - Enabling Synergistic Full-Body Control in Prompt-Based Co-Speech Motion Generation [paper] ; [homepage] ; [video] ; [RobinWitch/SynTalker]
- MDT-A2G - Exploring Masked Diffusion Transformers for Co-Speech Gesture Generation [paper] ; [homepage]
- 【ACM MM 2024】 MambaGesture: Enhancing Co-Speech Gesture Generation with Mamba and Disentangled Multi-Modality Fusion [paper] ; [homepage]
- 【CVPR 2023】 Co-speech Gesture Synthesis by Reinforcement Learning with Contrastive Pre-trained Rewards [paper] ; [RLracer/RACER]
- 【PAKDD 2023】 RLMixer: A Reinforcement Learning Approach For Integrated Ranking With Contrastive User Preference Modeling [paper]
- 【IJCAI 2023】 DiffuseStyleGesture - Stylized Audio-Driven Co-Speech Gesture Generation with Diffusion Models [paper] ; [YoungSeng/DiffuseStyleGesture] ; [youtube]
- 【CVPR 2023】 Taming Diffusion Models for Audio-Driven Co-Speech Gesture Generation [paper] ; [Advocate99/DiffGesture]
- 【CVPR 2023】 Diverse 3D Hand Gesture Prediction from Body Dynamics by Bilateral Hand Disentanglement [paper] ; [XingqunQi-lab/Diverse-3D-Hand-Gesture-Prediction]
- 【CVPR 2023】 QPGesture: Quantization-Based and Phase-Guided Motion Matching for Natural Speech-Driven Gesture Generation [paper] ; [YoungSeng/QPGesture] ; [video]
- 【CVPR 2023】 GestureDiffuCLIP: Gesture Diffusion Model with CLIP Latents [paper] ; [homepage] ; [video]
- 【CVPR 2023】 Continual Learning for Personalized Co-Speech Gesture Generation [paper] ; [homepage]
- 【CVPR 2023】 Guided Motion Diffusion for Controllable Human Motion Synthesis [paper] ; [homepage]
- 【CVPR 2023】 Sequential Texts Driven Cohesive Motions Synthesis with Natural Transitions [paper]
- 【CVPR 2023】 EMMN: Emotional Motion Memory Network for Audio-driven Emotional Talking Face Generation [paper]
- 【CVPR 2023】 FineDance: A Fine-grained Choreography Dataset for 3D Full Body Dance Generation [paper]
- 【CVPR 2023】 Hierarchical Generation of Human-Object Interactions with Diffusion Probabilistic Models [paper]
- 【CVPR 2023】 Speech4Mesh: Speech-Assisted Monocular 3D Facial Reconstruction for Speech-Driven 3D Facial Animation [paper]
- 【CVPR 2023】 Semi-supervised Speech-driven 3D Facial Animation via Cross-modal Encoding [paper]
- 【ACM MM 2023】 UnifiedGesture - A Unified Gesture Synthesis Model for Multiple Skeletons [paper] ; [YoungSeng/UnifiedGesture]
- 【ICMI 2023】 AQ-GT: a Temporally Aligned and Quantized GRU-Transformer for Co-Speech Gesture Synthesis [paper] ; [hvoss-techfak/AQGT]
- DiffMotion: Speech-Driven Gesture Synthesis Using Denoising Diffusion Model [paper]
- BodyFormer: Semantics-guided 3D Body Gesture Synthesis with Transformer [paper]
- EmotionGesture: Audio-Driven Diverse Emotional Co-Speech 3D Gesture Generation [paper]
- Diff-TTSG: Denoising probabilistic integrated speech and gesture synthesis [paper] ; [homepage] ; [video]
- EMoG: Synthesizing Emotive Co-speech 3D Gesture with Diffusion Model [paper]
- MRecGen: Multimodal Appropriate Reaction Generator [paper] ; [SSYSteve/MRecGen]
- Audio is all in one: speech-driven gesture synthetics using WavLM pre-trained model [paper]
- The KCL-SAIR team's entry to the GENEA Challenge 2023 Exploring Role-based Gesture Generation in Dyadic Interactions: Listener vs. Speaker [paper]
- The KU-ISPL entry to the GENEA Challenge 2023-A Diffusion Model for Co-speech Gesture generation [paper]
- Diffusion-Based Co-Speech Gesture Generation Using Joint Text and Audio Representation [paper]
- Co-Speech Gesture Generation via Audio and Text Feature Engineering [paper]
- Gesture Motion Graphs for Few-Shot Speech-Driven Gesture Reenactment [paper]
- Gesture Generation with Diffusion Models Aided by Speech Activity Information [paper]
- FEIN-Z: Autoregressive Behavior Cloning for Speech-Driven Gesture Generation [paper]
- Discrete Diffusion for Co-Speech Gesture Synthesis [paper]
- DiffuGesture: Generating Human Gesture From Two-person Dialogue With Diffusion Models [paper]
- The FineMotion entry to the GENEA Challenge 2023: DeepPhase for conversational gestures generation [paper]
- Am I listening? Evaluating the Quality of Generated Data-driven Listening Motion [paper]
- Unified speech and gesture synthesis using flow matching [paper] ; [homepage]
- 【SIGGRAPH 2022】 A Motion Matching-based Framework for Controllable Gesture Synthesis from Speech [paper] ; [homepage]
- 【ICMI 2022】 The DeepMotion entry to the GENEA Challenge 2022 [paper]
- 【ICMI 2022】 The ReprGesture entry to the GENEA Challenge 2022 [paper] ; [YoungSeng/ReprGesture] ; [youtube]
- 【CVPR 2022】 HA2G - Learning Hierarchical Cross-Modal Association for Co-Speech Gesture Generation [paper] ; [alvinliu0/HA2G]
- 【CVPR 2022】 SEEG - SEEG: Semantic Energized Co-Speech Gesture Generation [paper] ; [akira-l/seeg]
- 【CVPR 2022】 DiffGAN - Low-Resource Adaptation for Personalized Co-Speech Gesture Generation [paper]
- 【CVPR 2022】 Audio-Driven Neural Gesture Reenactment With Video Motion Graphs [paper]
- 【ICMI 2022】 ZeroEGGS Exemplar-based stylized gesture generation from speech: An entry to the GENEA Challenge 2022 [paper]
- 【AAMAS 2022】 Multimodal analysis of the predictability of hand-gesture properties [paper]
- 【ICMI 2022】 GestureMaster GestureMaster: Graph-based Speech-driven Gesture Generation [paper]
- 【ICRA 2022】 Context-Aware Body Gesture Generation for Social Robots [paper]
- 【IROS 2022】 Gesture2Vec: Clustering Gestures using Representation Learning Methods for Co-speech Gesture Generation [paper] ; [pjyazdian/Gesture2Vec] ; [youtube] ; [youtube]
- ZeroEGGS: Zero-shot Example-based Gesture Generation from Speech [paper] ; [ubisoft/ubisoft-laforge-ZeroEGGS] ; [youtube]
- Voice2Face: Audio-Driven Facial and Tongue Rig Animations [paper] ; [youtube] ; [web]
- Deep Gesture Generation for Social Robots Using Type-Specific Libraries [paper] ; [youtube] ; [web]
- Automatic text-to-gesture rule generation for embodied conversational agents [paper] ; [youtube]
- Evaluating Data-Driven Co-Speech Gestures of Embodied Conversational Agents through Real-Time Interaction [paper] ; [web]
- Towards Context-Aware Human-like Pointing Gestures with RL Motion Imitation [paper]
- Text/Speech-Driven Full-Body Animation [paper]
- Zero-Shot Style Transfer for Gesture Animation driven by Text and Speech using Adversarial Disentanglement of Multimodal Style Encoding [paper]
- 【ICCV 2021】 Speech Drives Templates: Co-Speech Gesture Synthesis With Learned Templates [paper] ; shenhanqian/speechdrivestemplates ; [youtube] ; poster
- 【ICCV 2021】 Audio2Gestures Audio2Gestures: Generating Diverse Gestures From Speech Audio With Conditional Variational Autoencoders [paper]
- 【ICMI 2021】 Probabilistic Human-like Gesture Synthesis from Speech using GRU-based WGAN [paper]
- 【ICMI 2021】 Crossmodal Clustered Contrastive Learning: Grounding of Spoken Language to Gesture [paper] ; [dondongwon/CC_NCE_GENEA]
- 【IVA 2021】 Speech2Properties2Gestures: Gesture-Property Prediction as a Tool for Generating Representational Gestures from Speech [paper] ; [homepage]
- 【IVA 2021】 Learning Speech-driven 3D Conversational Gestures from Video [paper]
- 【CASA 2021】 ExpressGesture ExpressGesture: Expressive gesture generation from speech through database matching [paper] ; [youtube]
- 【AAMAS 2021】 CMCF CCFM: An Architecture for Realtime Gesture Generation by Clustering Gestures by Communicative Function and Motion [paper]
- 【IEEEVR 2021】 Text2gestures: A transformer-based network for generating emotive body gestures for virtual agents [paper] ; [UttaranB127/Text2Gestures] ; homepage
- Evaluating Data-Driven Co-Speech Gestures of Embodied Conversational Agents [paper]
- A Framework for Integrating Gesture Generation Models into Interactive Conversational Agents [paper] ; [youtube] ; [homepage] ; [nagyrajmund/gesturebot]
- Moving Fast and Slow: Analysis of Representations and Post-Processing in Speech-Driven Automatic Gesture Generation [paper]
- Passing a Non-verbal Turing Test: Evaluating Gesture Animations Generated from Speech [paper]
- 【SIGGRAPH Asia 2020】 Trimodal Speech gesture generation from the trimodal context of text, audio, and speaker identity [paper] ; [ai4r/Gesture-Generation-from-Trimodal-Context]
- 【ICMI 2020】 Gesticulator Gesticulator: A framework for semantically-aware speech-driven gesture generation [paper] ; [Svito-zar/gesticulator]
- 【ECCV 2020】 Mix-StAGE Style Transfer for Co-Speech Gesture Animation: A Multi-Speaker Conditional-Mixture Approach [paper]
- 【EUROGRAPHICS 2020】 StyleGestures Style-Controllable Speech-Driven Gesture Synthesis Using Normalising Flows [paper] ; [simonalexanderson/StyleGestures] ; [youtube]
- 【EMNLP 2020】 AiSLE No Gestures Left Behind: Learning Relationships between Spoken Language and Freeform Gestures [paper] ; [pchahuja/aisle]
- The GENEA Challenge 2020: A large, crowdsourced evaluation of gesture generation systems on common data [paper] ; [homepage] ; [youtube] ; [youtube] ; [Svito-zar/genea_numerical_evaluations]
- Gesticulator: A framework for semantically-aware speech-driven gesture generation [paper] ; [youtube] ; [Svito-zar/gesticulator] ; [homepage] ; [dataset]
- Probabilistic Multi-modal Interlocutor-aware Generation of Facial Gestures in Dyadic Settings [paper] ; [youtube] ; [homepage]
- Can we trust online crowdworkers? Comparing online and offline participants in a preference test of virtual agents [paper]
- Affective synthesis and animation of arm gestures from speech prosody [paper]
- FineMotion - Audio and Text-Driven approach for Conversational Gestures Generation [paper] ; [FineMotion/GENEA_2020]
- Modeling the Conditional Distribution of Co-Speech Upper Body Gesture Jointly Using Conditional-GAN and Unrolled-GAN [paper] ; [wubowen416/co-speech-gesture-generation-using-CGAN]
- 【IVA 2020】 Generating coherent spontaneous speech and gesture from text [paper] ; [video] ; [homepage]
- 【SIGGRAPH MIG 2019】 Multi-objective adversarial gesture generation [paper]
- 【ICMI 2019】 DRAM To React or not to React: End-to-End Visual Pose Forecasting for Personalized Avatar during Dyadic Conversations [paper]
- 【CVPR 2019】 Speech2Gesture Learning Individual Styles of Conversational Gesture [paper]
- Analyzing Input and Output Representations for Speech-Driven Gesture Generation [paper] ; [GestureGeneration/Speech_driven_gesture_generation_with_autoencoder] ; [youtube] ; [youtube] ; [homepage]
- On the Importance of Representations for Speech-Driven Gesture Generation [paper]
- A Neural Network Approach to Missing Marker Reconstruction in Human Motion Capture [paper] ; [youtube] ; [youtube] ; [Svito-zar/NN-for-Missing-Marker-Reconstruction]
- Data Driven Non-Verbal Behavior Generation for Humanoid Robots [paper]
- Evaluation of Speech-to-Gesture Generation Using Bi-Directional LSTM Network [paper]
- A Speech-Driven Hand Gesture Generation Method and Evaluation in Android Robots [paper] ; [youtube]
- Robots Learn Social Skills: End-to-End Learning of Co-Speech Gesture Generation for Humanoid Robots [paper]
- DCNF Predicting Co-verbal Gestures - A Deep and Temporal Modeling Approach [paper]
- 2017 - CDBN Speech-driven animation with meaningful behaviors [paper]
- 【SIGGRAPH 2022】 GANimator (for data generation) GANimator: Neural Motion Synthesis from a Single Sequence [paper] ; [PeizhuoLi/ganimator] ; [youtube]
- 【CVPR 2021】 Body2Hands: Learning To Infer 3D Hands From Conversational Gesture Body Dynamics [paper]
- Rig Inversion by Training a Differentiable Rig Function [paper] ; [youtube]
- [1994] Rule-based generation of facial expression, gesture & spoken intonation for multiple conversational agents [paper]
Speech to sequence gesture
Probabilistic model of speech to gesture
Probabilistic model of personal style
- 【ACM Transactions on Graphics 2008】 Gesture modeling and animation based on a probabilistic re-creation of speaker style [paper]
Neural classification model of personal style
- 【IVA 2015】 Predicting Co-verbal Gestures: A Deep and Temporal Modeling Approach [paper]
This section is not accurate yet; editing continues.
MLP (Multilayer perceptron)
- 【ICMI 2020】 Gesticulator Gesticulator: A framework for semantically-aware speech-driven gesture generation [paper] ; [Svito-zar/gesticulator]
RNN (Recurrent Neural Networks)
- 【MM 2021】 Speech2AffectiveGestures: Synthesizing Co-Speech Gestures with Generative Adversarial Affective Expression Learning [paper] ; [UttaranB127/speech2affective_gestures] ; [homepage]
- 【IVA 2018】 Evaluation of Speech-to-Gesture Generation Using Bi-Directional LSTM Network [paper]
- 【CVPR 2022】 HA2G - Learning Hierarchical Cross-Modal Association for Co-Speech Gesture Generation [paper] ; [alvinliu0/HA2G]
- 【SIGGRAPH Asia 2020】 Trimodal Speech gesture generation from the trimodal context of text, audio, and speaker identity [paper] ; [ai4r/Gesture-Generation-from-Trimodal-Context]
- 【ICMI 2022】 TransGesture: Autoregressive Gesture Generation with RNN-Transducer [paper]
CNN (Convolutional Networks)
- 【IVA 2021】 Learning Speech-driven 3D Conversational Gestures from Video [paper]
Transformers
- 【IEEEVR 2021】 Text2gestures: A transformer-based network for generating emotive body gestures for virtual agents [paper] ; [UttaranB127/Text2Gestures] ; homepage
Generative models (not accurate yet; editing continues)
Normalising Flows
- 【EUROGRAPHICS 2020】 StyleGestures Style-Controllable Speech-Driven Gesture Synthesis Using Normalising Flows [paper] ; [simonalexanderson/StyleGestures] ; [youtube]
WGAN
- 【ICMI 2021】 Probabilistic Human-like Gesture Synthesis from Speech using GRU-based WGAN [paper]
VAEs
- 【ICCV 2021】 Audio2Gestures: Generating Diverse Gestures from Speech Audio with Conditional Variational Autoencoders ; [paper] ; [JingLi513/Audio2Gestures] ; [homepage]
- Freeform Body Motion Generation from Speech [paper] ; [TheTempAccount/Co-Speech-Motion-Generation] ; [youtube]
- 【CVMP 2021】 Flow-VAE Speech-Driven Conversational Agents using Conditional Flow-VAEs [paper]
Learnable noise codes
- 【ICCV 2021】 Speech Drives Templates: Co-Speech Gesture Synthesis With Learned Templates ; [paper] ; [ShenhanQian/SpeechDrivesTemplates]
- CaMN BEAT: A Large-Scale Semantic and Emotional Multi-Modal Dataset for Conversational Gestures Synthesis [paper] ; [PantoMatrix/BEAT]
Diffusion
- 【SIGGRAPH 2023】 Listen, denoise, action! Audio-driven motion synthesis with diffusion models [paper] ; (Code repository (coming soon)) ; [youtube] ; [homepage] ; [video]
- 【IJCAI 2023】 DiffuseStyleGesture: Stylized Audio-Driven Co-Speech Gesture Generation with Diffusion Models [paper] ; [youngseng/diffusestylegesture] ; [youtube]
- 【CVPR 2023】 Taming Diffusion Models for Audio-Driven Co-Speech Gesture Generation [paper] ; [advocate99/diffgesture]
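Many diffusion-based gesture methods build on a DDPM-style forward process, in which a clean motion clip x_0 is progressively noised and a network conditioned on speech learns to reverse it. The sketch below shows only the closed-form forward step q(x_t | x_0) with a linear beta schedule; the schedule values and the clip shape (60 frames, 15 joints, xyz) are assumptions, not settings taken from any of the papers above.

```python
import numpy as np

def linear_beta_schedule(n_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> np.ndarray:
    """Per-step noise variances beta_t, increasing linearly (illustrative values)."""
    return np.linspace(beta_start, beta_end, n_steps)

def q_sample(x0: np.ndarray, t: int, alphas_cumprod: np.ndarray, noise: np.ndarray) -> np.ndarray:
    """Closed-form forward process: x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * noise."""
    a_bar = alphas_cumprod[t]
    return np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * noise

betas = linear_beta_schedule(1000)
alphas_cumprod = np.cumprod(1.0 - betas)  # a_bar_t = prod_{s<=t} (1 - beta_s)

rng = np.random.default_rng(0)
x0 = rng.normal(size=(60, 15, 3))         # hypothetical gesture clip: frames x joints x xyz
noise = rng.normal(size=x0.shape)
x_early = q_sample(x0, 10, alphas_cumprod, noise)   # still close to the clean clip
x_late = q_sample(x0, 999, alphas_cumprod, noise)   # almost pure noise
```

During training, a denoiser would receive x_t, t and the audio features and be asked to predict the noise (or x_0); that part is model-specific and omitted here.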
Periodic autoencoders (DeepPhase)
- Rhythmic Gesticulator - Rhythmic Gesticulator: Rhythm-Aware Co-Speech Gesture Synthesis with Hierarchical Neural Embeddings [paper] ; [Aubrey-ao/HumanBehaviorAnimation] ; [youtube] ; [youtube]
- 【CVPR 2023】 QPGesture: Quantization-Based and Phase-Guided Motion Matching for Natural Speech-Driven Gesture Generation [paper] ; [YoungSeng/QPGesture] ; [video]
Text to Gesture
- 【CVPR 2022】 Generating Diverse and Natural 3D Human Motions from Text [paper] ; [homepage] ; [poster] ; [EricGuo5513/text-to-motion]
Others (Uncategorized)
- 【SIGGRAPH 2022】 A Motion Matching-based Framework for Controllable Gesture Synthesis from Speech [paper] ; [homepage]
- 【CVPR 2022】 SEEG - SEEG: Semantic Energized Co-Speech Gesture Generation [paper] ; [akira-l/seeg]
- 【CVPR 2022】 DiffGAN - Low-Resource Adaptation for Personalized Co-Speech Gesture Generation [paper]
- 【ICMI 2022】 ZeroEGGS Exemplar-based stylized gesture generation from speech: An entry to the GENEA Challenge 2022 [paper]
- 【CVPR 2022】 Audio-Driven Neural Gesture Reenactment With Video Motion Graphs [paper]
- 【AAMAS 2022】 Multimodal analysis of the predictability of hand-gesture properties [paper]
- 【ICMI 2022】 GestureMaster GestureMaster: Graph-based Speech-driven Gesture Generation [paper]
- 【ICCV 2021】 Audio2Gestures Audio2Gestures: Generating Diverse Gestures From Speech Audio With Conditional Variational Autoencoders [paper]
- 【IVA 2021】 Speech2Properties2Gestures: Gesture-Property Prediction as a Tool for Generating Representational Gestures from Speech [paper] ; [homepage]
- 【ECCV 2020】 Mix-StAGE Style Transfer for Co-Speech Gesture Animation: A Multi-Speaker Conditional-Mixture Approach [paper]
- 【CVPR 2019】 Speech2Gesture Learning Individual Styles of Conversational Gesture [paper]
- Fréchet Inception Distance (FID)
- Fréchet Gesture Distance (FGD)
- Fréchet Template Distance (FTD)
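FID, FGD and FTD all compare a set of real features with a set of generated features by fitting a Gaussian to each and computing the Fréchet distance d² = ||μ_a - μ_b||² + Tr(Σ_a + Σ_b - 2(Σ_a Σ_b)^{1/2}). What differs between them is the feature extractor (an Inception network for FID; for FGD, a motion encoder trained on human gestures). The sketch below assumes the features are already extracted and computes only the distance; it is not the official GENEA or FGD evaluation code.

```python
import numpy as np

def _sqrtm_psd(mat: np.ndarray) -> np.ndarray:
    """Symmetric square root of a positive semi-definite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # clip tiny negative eigenvalues caused by round-off
    return (vecs * np.sqrt(vals)) @ vecs.T

def frechet_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two feature sets of shape (N, D)."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr((S_a S_b)^{1/2}) equals Tr((S_a^{1/2} S_b S_a^{1/2})^{1/2}), and the latter is symmetric PSD.
    sqrt_a = _sqrtm_psd(cov_a)
    cross = _sqrtm_psd(sqrt_a @ cov_b @ sqrt_a)
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * np.trace(cross))
```

Using the symmetric form avoids taking a general matrix square root of the non-symmetric product Σ_a Σ_b, so no complex round-off has to be discarded.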
Full name | Description |
---|---|
Adversarial Loss (Adv) | Used in Generative Adversarial Networks (GANs), this loss function pits a generator network against a discriminator network, with the goal of the generator producing samples that can fool the discriminator into thinking they are real. |
Categorical Cross Entropy (CCE) | A common loss function used in multi-class classification tasks, where the goal is to minimize the difference between the predicted class distribution and the true class labels. |
Cross-modal Cluster Noise Contrastive Estimation (CC-NCE) | Used in multimodal learning to learn joint representations across different modalities, this loss function maximizes the similarity between matching modalities while minimizing the similarity between non-matching modalities. |
Edge Transition Cost (ETC) | Used in motion-graph methods, this cost measures how smoothly one motion segment can transition into another (e.g., pose and velocity differences at the junction); lower-cost edges are preferred when searching for a path through the graph. |
Expectation Maximization (EM) | Used for maximum likelihood estimation with latent or missing data, this algorithm alternates between computing the expected complete-data log-likelihood under the current parameters (E-step) and updating the parameters to maximize that expectation (M-step). |
Geodesic Distance (GeoD) | Measures the angle of the relative rotation between predicted and ground-truth joint rotations (the geodesic distance on SO(3)), a rotation-aware alternative to comparing rotation parameters with an L2 loss. |
Wasserstein-GAN Gradient Penalty (WGAN-GP) | An extension of the Wasserstein GAN that adds a penalty on the norm of the critic's gradient, softly enforcing the 1-Lipschitz constraint and stabilizing training. |
Hamming Distance (Hamm) | Used in information theory, this metric measures the number of positions at which two strings differ. |
Huber Loss (Huber) | A robust loss function used in regression tasks that is less sensitive to outliers than the Mean Squared Error (MSE) loss. |
Imitation Reward (IR) | Used in imitation learning to train a model to mimic the behavior of an expert agent, by providing a reward signal based on how closely the model's behavior matches that of the expert. |
Kullback-Leibler Divergence (KL) | Used to measure the difference between two probability distributions, this loss function is commonly used in probabilistic models and deep learning for regularization and training. |
L2 Distance (L2) | Measures the Euclidean distance between two points in space, commonly used in regression tasks. |
Mean Absolute Error (MAE) | A loss function used in regression tasks that measures the average absolute difference between the predicted and true values. |
Maximum Likelihood Estimation (MLE) | A statistical method used to estimate the parameters of a probability distribution that maximize the likelihood of observing the data. |
Mean Squared Error (MSE) | A common loss function used in regression tasks that measures the average squared difference between the predicted and true values. |
Negative Log-likelihood (NLL) | Used in probabilistic models to maximize the likelihood of the observed data by minimizing the negative log-likelihood. |
Structural Similarity Index Measure (SSIM) | Used in image processing to measure the similarity between two images based on their luminance, contrast, and structural content. |
Task Reward (TR) | Used in reinforcement learning to provide a reward signal to an agent based on its performance in completing a given task. |
Variance (Var) | A statistical metric used to measure the variability of a set of data points around their mean. |
Within-cluster Sum of Squares (WCSS) | Used in cluster analysis to measure the variability of data points within a cluster by computing the sum of squared distances between each data point and the mean of the cluster. |
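Several of the objectives in the table are short enough to write down directly. The NumPy sketch below covers MSE, MAE, Huber, the closed-form KL term used in VAEs with a diagonal Gaussian posterior and a standard normal prior, and the geodesic distance between rotation matrices. Function names are illustrative; this is not code from any specific paper listed above.

```python
import numpy as np

def mse(pred: np.ndarray, target: np.ndarray) -> float:
    return float(np.mean((pred - target) ** 2))

def mae(pred: np.ndarray, target: np.ndarray) -> float:
    return float(np.mean(np.abs(pred - target)))

def huber(pred: np.ndarray, target: np.ndarray, delta: float = 1.0) -> float:
    """Quadratic for |error| <= delta, linear beyond it (less sensitive to outliers than MSE)."""
    err = np.abs(pred - target)
    quadratic = 0.5 * err ** 2
    linear = delta * (err - 0.5 * delta)
    return float(np.mean(np.where(err <= delta, quadratic, linear)))

def kl_diag_gaussian_to_standard(mu: np.ndarray, logvar: np.ndarray) -> float:
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over latent dims and averaged over the batch."""
    return float(np.mean(-0.5 * np.sum(1.0 + logvar - mu ** 2 - np.exp(logvar), axis=-1)))

def geodesic_distance(r_pred: np.ndarray, r_true: np.ndarray) -> np.ndarray:
    """Angle in radians of the relative rotation between 3x3 rotation matrices, shape (..., 3, 3)."""
    r_rel = np.swapaxes(r_pred, -1, -2) @ r_true
    cos = (np.trace(r_rel, axis1=-2, axis2=-1) - 1.0) / 2.0
    return np.arccos(np.clip(cos, -1.0, 1.0))  # clip guards against round-off outside [-1, 1]
```

In practice these would be written in an autodiff framework so gradients flow through them; NumPy is used here only to show the arithmetic.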
- Human-likeness: looks like the motion of a real human
- Appropriateness (specificity): appropriate for the given speech, controlling for the human-likeness of the motion
- 🧑‍🦲 : Upper-body tier || 🧍 : Full-body tier
- 🧍‍♂️ : motion || 📝 : text || 🔊 : audio || ⚙️ : custom by teams
System (Description) | Body tier | Type | 2020 | 2021 | 2022 | 2023 |
---|---|---|---|---|---|---|
FNA (Full-body Natural Motion) | 🧍 | 🧍‍♂️ | ||||
FBT (Full-body Text-based) | 🧍 | 📝 | ||||
FSA (Full-body Custom by Teams) | 🧍 | ⚙️ | ||||
FSB (Full-body Custom by Teams) | 🧍 | ⚙️ | ||||
FSC (Full-body Custom by Teams) | 🧍 | ⚙️ | ||||
FSD (Full-body Custom by Teams) | 🧍 | ⚙️ | ||||
FSF (Full-body Custom by Teams) | 🧍 | ⚙️ | ||||
FSG (Full-body Custom by Teams) | 🧍 | ⚙️ | ||||
FSH (Full-body Custom by Teams) | 🧍 | ⚙️ | ||||
FSI (Full-body Custom by Teams) | 🧍 | ⚙️ | ||||
UNA (Upper-body Natural Motion) | 🧑‍🦲 | 🧍‍♂️ | ||||
UBA (Upper-body Audio-based) | 🧑‍🦲 | 🔊 | ||||
UBT (Upper-body Text-based) | 🧑‍🦲 | 📝 | ||||
USJ (Upper-body Custom by Teams) | 🧑‍🦲 | ⚙️ | ||||
USK (Upper-body Custom by Teams) | 🧑‍🦲 | ⚙️ | ||||
USL (Upper-body Custom by Teams) | 🧑‍🦲 | ⚙️ | ||||
USM (Upper-body Custom by Teams) | 🧑‍🦲 | ⚙️ | ||||
USN (Upper-body Custom by Teams) | 🧑‍🦲 | ⚙️ | ||||
USO (Upper-body Custom by Teams) | 🧑‍🦲 | ⚙️ | ||||
USP (Upper-body Custom by Teams) | 🧑‍🦲 | ⚙️ | ||||
USQ (Upper-body Custom by Teams) | 🧑‍🦲 | ⚙️ | ||||
- Canonical correlation analysis
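Canonical correlation analysis finds paired linear projections of two feature sets (e.g., speech features and motion features for the same frames) that are maximally correlated, which makes it usable as a speech-motion correlation measure. A minimal NumPy sketch, assuming both inputs are already frame-aligned arrays; the small ridge term `eps` is an added assumption for numerical stability.

```python
import numpy as np

def _inv_sqrt_psd(mat: np.ndarray, eps: float) -> np.ndarray:
    """Inverse symmetric square root of a covariance matrix, with a small ridge for stability."""
    vals, vecs = np.linalg.eigh(mat + eps * np.eye(mat.shape[0]))
    return (vecs / np.sqrt(vals)) @ vecs.T

def canonical_correlations(x: np.ndarray, y: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Canonical correlations between x (N, Dx) and y (N, Dy), sorted from largest to smallest."""
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    n = x.shape[0]
    cxx = x.T @ x / (n - 1)
    cyy = y.T @ y / (n - 1)
    cxy = x.T @ y / (n - 1)
    # Whitening both sides turns CCA into an SVD; the singular values are the canonical correlations.
    whitened = _inv_sqrt_psd(cxx, eps) @ cxy @ _inv_sqrt_psd(cyy, eps)
    return np.linalg.svd(whitened, compute_uv=False)
```

The number of returned correlations is min(Dx, Dy); reporting the first one, or the mean of the top few, is a common summary.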
Modalities type:
- 🔊 : audio || 📝 : text || 🤯 : emotion || 🚶 : gesture motion || ℹ️ : gesture properties || 🎞️ : gesture segment
Type
- 👥 : Dialog (conversation between two people 🤼) || 👤 : Monolog (a single speaker 🧍)
Dataset | Modalities | Type | Download | Paper |
---|---|---|---|---|
IEMOCAP | 🚶, 🔊, 📝, 🤯 | 👥 | sail.usc.edu/iemocap | [paper] |
Creative-IT | 🚶, 🔊, 📝, 🤯 | 👥 | sail.usc.edu/CreativeIT | |
Gesture-Speech Dataset | 🚶, 🔊 | 👤 | dropbox | |
CMU Panoptic | 🚶, 🔊, 📝 | 👥 | domedb.perception.cmu | [paper] |
Speech-Gesture | 🚶, 🔊 | 👤 | amirbar/speech2gesture | [paper] |
TED Dataset [homepage] | 🚶, 📝 | 👤 | youtube-gesture-dataset | |
Talking With Hands ([github]) | 🚶, 🔊 | 👥 | facebookresearch/TalkingWithHands32M | [paper] |
PATS ([homepage], [github]) | 🚶, 🔊, 📝 | 👤 | chahuja.com/pats | [paper] |
Trinity Speech-Gesture I | 🚶, 🔊, 📝 | 👤 | Trinity Speech-Gesture I | |
Trinity Speech-Gesture II | 🚶, 🔊, 🎞️ | 👤 | Trinity Speech GestureII | |
Speech-Gesture 3D extension | 🚶, 🔊 | 👤 | nextcloud.mpi-klsb | |
Talking With Hands GENEA Extension | 🚶, 🔊, 📝 | 👥 | zenodo/6998231 | [paper] |
SaGA | 🚶, 🔊, ℹ️ | 👥 | phonetik.uni-muenchen | [paper] |
SaGA++ | 🚶, 🔊, ℹ️ | 👥 | zenodo/6546229 | |
ZEGGS Dataset [youtube] | 🚶, 🔊 | 👤 | ubisoft-laforge-ZeroEGGS | [paper] |
BEAT Dataset ([homepage] [homepage], [github]) | 🚶, 🔊, 📝, ℹ️, 🤯 | 👥, 👤 | github.io/BEAT | [paper] |
InterAct homepage | 🚶, 🔊, 📝 | 👥 | hku-cg.github.io | [paper] |
- Challenge dataset: GENEA Challenge 2022 Dataset Files
- 3D coordinates of submitted motion: GENEA Challenge 2022 3D coordinates of submitted motion
- Submitted BVH files: GENEA Challenge 2022 submitted BVH files
- User-study video stimuli: GENEA Challenge 2022 user-study video stimuli
Algorithms:
- SGToolkit: An Interactive Gesture Authoring Toolkit for Embodied Conversational Agents [paper] ; [homepage] ; [youtube]
Recognition:
- OpenPose - CMU-Perceptual-Computing-Lab/openpose
- MMPose - open-mmlab/mmpose
- AlphaPose - MVIG-SJTU/AlphaPose
Audio pre-processing:
Mesh processing:
- Utility to trim BVH files: github.com/ghenter/trim_bvh
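Trimming a BVH file amounts to keeping the HIERARCHY section unchanged and cutting the frame lines of the MOTION section, then updating the `Frames:` count. The sketch below illustrates that idea; it is not the ghenter/trim_bvh implementation. The helper name `trim_bvh` is hypothetical, and it assumes the standard layout in which `MOTION` is followed directly by `Frames: N` and `Frame Time: T` lines, with one line of channel values per frame.

```python
def trim_bvh(text: str, start_s: float, end_s: float) -> str:
    """Return a copy of a BVH file keeping only frames in [start_s, end_s).

    Hypothetical helper (not the ghenter/trim_bvh API). Assumes the standard
    layout where "MOTION" is followed by "Frames: N" and "Frame Time: T".
    """
    lines = text.splitlines()
    # Everything up to and including the MOTION keyword (the skeleton) is kept as-is.
    motion_idx = next((i for i, line in enumerate(lines) if line.strip() == "MOTION"), None)
    if motion_idx is None:
        raise ValueError("no MOTION section found")
    if not lines[motion_idx + 1].strip().startswith("Frames:"):
        raise ValueError("expected a 'Frames:' line after MOTION")
    frame_time_line = lines[motion_idx + 2]
    if not frame_time_line.strip().startswith("Frame Time:"):
        raise ValueError("expected a 'Frame Time:' line after 'Frames:'")
    frame_time = float(frame_time_line.split(":", 1)[1])
    # One non-empty line of channel values per frame.
    frames = [line for line in lines[motion_idx + 3:] if line.strip()]
    # Convert the time window to frame indices, clamped to the available frames.
    first = max(0, round(start_s / frame_time))
    last = min(len(frames), round(end_s / frame_time))
    kept = frames[first:last]
    header = lines[: motion_idx + 1] + [f"Frames: {len(kept)}", frame_time_line]
    return "\n".join(header + kept) + "\n"
```

For example, `trim_bvh(open("take.bvh").read(), 10.0, 70.0)` would keep one minute of motion starting at the 10-second mark (the file name is illustrative). Rounding to the nearest frame index is a design choice; a real tool may prefer floor/ceil semantics.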
Visualization:
- Objective evaluation code: github.com/genea-workshop/genea_numerical_evaluations
- Text-based baseline: github.com/youngwoo-yoon/Co-Speech_Gesture_Generation
- Audio-based baseline: github.com/genea-workshop/Speech_driven_gesture_generation_with_autoencoder
- Interface for subjective evaluations: jonepatr/hemvip
- Code for creating attention-check videos: youngwoo-yoon/create_attention_check
- Modified PyMO for the challenge dataset: youngwoo-yoon/PyMO
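Objective evaluation of generated gestures often reports simple kinematic statistics such as average jerk (the magnitude of the third time derivative of joint positions; higher values indicate jerkier motion). The sketch below estimates it with a third-order forward finite difference. It is a hypothetical helper, not the API of genea_numerical_evaluations, and it assumes the motion is given as 3D keypoints (one `(x, y, z)` tuple per joint per frame) sampled at a constant frame rate `fps`.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def average_jerk(frames: Sequence[Sequence[Point]], fps: float) -> float:
    """Mean Euclidean norm of per-joint jerk over all frames and joints.

    Hypothetical helper: frames[t][j] is the 3D position of joint j at frame t.
    """
    if len(frames) < 4:
        raise ValueError("need at least 4 frames for a third difference")
    n_joints = len(frames[0])
    total = 0.0
    count = 0
    for t in range(len(frames) - 3):
        for j in range(n_joints):
            # Third-order forward difference, scaled by fps**3 to get units/s^3.
            jerk = [
                (frames[t + 3][j][k] - 3 * frames[t + 2][j][k]
                 + 3 * frames[t + 1][j][k] - frames[t][j][k]) * fps ** 3
                for k in range(3)
            ]
            total += math.sqrt(sum(c * c for c in jerk))
            count += 1
    return total / count
```

Finite differences amplify noise, so keypoints extracted from video (for example with OpenPose) are often smoothed before computing such statistics; whether to do so is a protocol choice.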
Books:
- 2000 - Paired Speech and Gesture Generation in Embodied Conversational Agents
- 2005 - Gesture Generation by Imitation: From Human Behavior to Computer Character Animation
- 2009 - Gesture in Embodied Communication and Human Computer Interaction
- 2013 - Nonverbal Communication in Human Interaction
- 2013 - Nonverbal Communication: Science and Applications
TED Talk (dataset with skeletons extracted from video)
- AQ-GT: a Temporally Aligned and Quantized GRU-Transformer for Co-Speech Gesture Synthesis [paper] ; [hvoss-techfak/AQGT]
- Rhythmic Gesticulator: Rhythm-Aware Co-Speech Gesture Synthesis with Hierarchical Neural Embeddings [paper] ; [aubrey-ao/humanbehavioranimation] ; [youtube]
- Learning Hierarchical Cross-Modal Association for Co-Speech Gesture Generation [paper] ; [alvinliu0/HA2G] ; [youtube] ; [homepage]
- Speech2AffectiveGestures: Synthesizing Co-Speech Gestures with Generative Adversarial Affective Expression Learning [paper] ; [UttaranB127/speech2affective_gestures] ; [homepage] ; [youtube]
- Speech Gesture Generation from the Trimodal Context of Text, Audio, and Speaker Identity [paper] ; [ai4r/Gesture-Generation-from-Trimodal-Context]
BEAT (Motion Capture Dataset)
- BEAT: A Large-Scale Semantic and Emotional Multi-Modal Dataset for Conversational Gestures Synthesis [paper] ; [PantoMatrix/BEAT]
- Speech Gesture Generation from the Trimodal Context of Text, Audio, and Speaker Identity [paper] ; [ai4r/Gesture-Generation-from-Trimodal-Context]
- Audio2Gestures: Generating Diverse Gestures from Speech Audio with Conditional Variational Autoencoders [paper]
- Learning Individual Styles of Conversational Gesture [paper]
- Robots Learning to Say 'No': Prohibition and Rejective Mechanisms in Acquisition of Linguistic Negation [paper]
Your contributions are always welcome! Please take a look at the contribution guidelines first.
This project is licensed under the MIT License - see the LICENSE.md file for details.
Created by OpenHuman
OpenHuman.ai - Open Store for Realistic Digital Human