
Awesome Gesture Generation

Work in progress (not finished yet)

The goal of this project is audio-driven gesture generation whose output is 3D keypoint gesture motion.
Input: audio, text, gesture, etc. -> Output: gesture motion

Gesture Generation is the process of generating gestures from speech or text. The goal of Gesture Generation is to generate gestures that are natural, realistic, and appropriate for the given context. The generated gestures can be used to animate virtual characters, robots, or embodied conversational agents.
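
To make the task concrete, below is a minimal sketch of the input/output contract described above. The function name, feature shapes, and the random placeholder "model" are illustrative assumptions only; they are not taken from any specific paper listed in this repository.

```python
# A minimal sketch of the task's input/output contract, in Python/NumPy.
# Names and shapes are illustrative assumptions, not a reference implementation.
import numpy as np

def generate_gestures(audio_features: np.ndarray,   # (T, D_audio) frame-level audio features, e.g. mel-spectrogram
                      text_embeddings: np.ndarray,  # (T, D_text) word embeddings aligned to the audio frames
                      num_joints: int = 25) -> np.ndarray:
    """Map time-aligned speech/text features to a 3D keypoint gesture sequence.

    Returns an array of shape (T, num_joints, 3): one 3D joint position per frame.
    A real system replaces this stub with a learned model (RNN, Transformer, VAE, diffusion, ...).
    """
    T = audio_features.shape[0]
    rng = np.random.default_rng(0)
    return rng.standard_normal((T, num_joints, 3))  # placeholder motion, not meaningful gestures
```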

ACM CCS: • Human-centered computing → Human computer interaction (HCI).

Papers by folder: 📁/survey || 📁/approach || 📁/papers || 📁/dataset || 📁/books

Table of Contents


Main resource


Comprehensive review

  • 【EUROGRAPHICS 2023】A Comprehensive Review of Data-Driven Co-Speech Gesture Generation; [paper]

  • 2014 - Gesture and speech in interaction: An overview ; [paper]

Survey review

  • 【HAI 2021】Speech-based Gesture Generation for Robots and Embodied Agents: A Scoping Review [paper]

Evaluation survey

  • 【IEEE 2022】 A Review of Evaluation Practices of Gesture Generation in Embodied Conversational Agents [paper] ;

GENEA Challenge


  • The Relation of Speech and Gestures: Temporal Synchrony Follows Semantic Synchrony [paper]
  • Complexity Matters E05: Complexity Matching and Synchronization between Gestures and Speech [paper]
  • Easier Said Than Done? Task Difficulty's Influence on Temporal Alignment, Semantic Similarity, and Complexity Matching Between Gestures and Speech [paper]
  • Advances in Visual Semiotics - The Semiotic Web [book]
  • Gestural beats: The rhythm hypothesis [paper]

GENEA 2024

GENEA Challenge 2024 [Homepage]
Method (Team*) Paper Video 🏆

GENEA 2023

GENEA Challenge 2023 [Homepage]
Method (Team*) Paper Video 🏆
FineMotion 【ICMI 2023】The FineMotion entry to the GENEA Challenge 2023: DeepPhase for conversational gestures generation [paper] [youtube]
Gesture Motion Graphs 【ICMI 2023】Gesture Motion Graphs for Few-Shot Speech-Driven Gesture Reenactment [paper] [youtube]
Diffusion-based 【ICMI 2023】Diffusion-based co-speech gesture generation using joint text and audio representation [paper] [youtube]
UEA Digital Humans 【ICMI 2023】The UEA Digital Humans entry to the GENEA Challenge 2023 [paper] ; [JonathanPWindle/UEA-DH-GENEA23] [youtube]
FEIN-Z 【ICMI 2023】FEIN-Z: Autoregressive Behavior Cloning for Speech-Driven Gesture Generation [paper] [youtube]
DiffuseStyleGesture+ 【ICMI 2023】The DiffuseStyleGesture+ entry to the GENEA Challenge 2023 [paper] [youtube] 🏆
Discrete Diffusion 【ICMI 2023】Discrete Diffusion for Co-Speech Gesture Synthesis [paper] [youtube]
KCL-SAIR 【ICMI 2023】The KCL-SAIR team's entry to the GENEA Challenge 2023 Exploring Role-based Gesture Generation in Dyadic Interactions: Listener vs. Speaker [paper] [youtube]
Gesture Generation 【ICMI 2023】Gesture Generation with Diffusion Models Aided by Speech Activity Information [paper] [youtube]
Co-Speech Gesture Generation 【ICMI 2023】Co-Speech Gesture Generation via Audio and Text Feature Engineering [paper] [youtube]
DiffuGesture 【ICMI 2023】DiffuGesture: Generating Human Gesture From Two-person Dialogue With Diffusion Models [paper] [youtube]
KU-ISPL 【ICMI 2023】The KU-ISPL entry to the GENEA Challenge 2023-A Diffusion Model for Co-speech Gesture generation [paper] [youtube]
GENEA Workshop 2023 - ICMI 2023 Accepted papers [Homepage]
Papers Video 🏆
【ICMI 2023】 MultiFacet A Multi-Tasking Framework for Speech-to-Sign Language Generation [paper]
【ICMI 2023】 Look What I Made It Do - The ModelIT Method for Manually Modeling Nonverbal Behavior of Socially Interactive Agents [paper]
【ICMI 2023】 A Methodology for Evaluating Multimodal Referring Expression Generation for Embodied Virtual Agents [paper]
【ICMI 2023】 Towards the generation of synchronized and believable non-verbal facial behaviors of a talking virtual agent [paper]; [aldelb/non_verbal_facial_animation] 🏆

GENEA 2022

GENEA Challenge 2022 - Accepted papers [Homepage]
Team (Method) Paper Video 🏆
DeepMotion 【ICMI 2022】The DeepMotion entry to the GENEA Challenge 2022 [paper] [youtube]
DSI 【ICMI 2022】Hybrid Seq2Seq Architecture for 3D Co-Speech Gesture Generation [paper] [youtube]
FineMotion 【ICMI 2022】ReCell: replicating recurrent cell for auto-regressive pose generation [paper] [FineMotion/GENEA_2022] [youtube]
Forgerons 【ICMI 2022】Ubisoft Exemplar-based Stylized Gesture Generation from Speech: An Entry to the GENEA Challenge 2022 [paper] [youtube]
GestureMaster 【ICMI 2022】GestureMaster: Graph-based Speech-driven Gesture Generation [paper] [youtube]
IVI Lab 【ICMI 2022】The IVI Lab entry to the GENEA Challenge 2022 - A Tacotron2 Based Method for Co-Speech Gesture Generation With Locality-Constraint Attention Mechanism [paper] [Tacotron2-SpeechGesture] [youtube] 🏆
ReprGesture 【ICMI 2022】The ReprGesture entry to the GENEA Challenge 2022 [paper] [YoungSeng/ReprGesture] [youtube]
TransGesture 【ICMI 2022】TransGesture: Autoregressive Gesture Generation with RNN-Transducer [paper] [youtube]
UEA Digital Humans 【ICMI 2022】UEA Digital Humans entry to the GENEA Challenge 2022 [paper] [UEA/GENEA22] [youtube]
GENEA Workshop 2022 - ICMI 2022 Accepted papers [Homepage]
Papers Video 🏆
【ICMI 2022】 Understanding Interviewees' Perceptions and Behaviour towards Verbally and Non-verbally Expressive Virtual Interviewing Agents [paper] [youtube]
【ICMI 2022】 Emotional Respiration Speech Dataset [paper] [youtube]
【ICMI 2022】 Automatic facial expressions, gaze direction and head movements generation of a virtual agent [paper] [youtube] 🏆
【ICMI 2022】 Can you tell that I'm confused? An overhearer study for German backchannels by an embodied agent [paper] [youtube]

GENEA 2021

GENEA Challenge 2021 - ICMI 2021 Accepted papers [Homepage]
Papers Video 🏆
【ICMI 2021】 Probabilistic Human-like Gesture Synthesis from Speech using GRU-based WGAN [paper] [wubowen416/gesture-generation-using-WGAN] [youtube] 🏆
【ICMI 2021】 Influence of Movement Energy and Affect Priming on the Perception of Virtual Characters Extroversion and Mood [paper] ❌
【ICMI 2021】 Crossmodal clustered contrastive learning: Grounding of spoken language to gesture [paper] [dondongwon/CC_NCE_GENEA] [youtube]

GENEA 2020

GENEA Challenge 2020 - Accepted papers [Homepage]
Papers Video
【IVA 2020】 The StyleGestures entry to the GENEA Challenge 2020 [paper] ; [simonalexanderson/StyleGestures] [youtube]
【IVA 2020】 The FineMotion entry to the GENEA Challenge 2020 [paper] ; [FineMotion/GENEA_2020] [youtube]
【IVA 2020】 Double-DCCCAE: Estimation of Sequential Body Motion Using Wave-Form - AlltheSmooth [paper] [youtube]
【IVA 2020】 CGVU: Semantics-guided 3D Body Gesture Synthesis [paper] [youtube]
【IVA 2020】 Interpreting and Generating Gestures with Embodied Human Computer Interactions [paper] [youtube]
【IVA 2020】 The Nectec Gesture Generation System entry to the GENEA Challenge 2020 [paper] [youtube]

  • 【CVPR 2024】 DiffTED: One-shot Audio-driven TED Talk Video Generation with Diffusion-based Co-speech Gestures [paper]
  • 【CVPR 2024】 EMAGE: Towards Unified Holistic Co-Speech Gesture Generation via Expressive Masked Audio Gesture Modeling [paper]
  • 【CVPR 2024】 Emotional Speech-driven 3D Body Animation via Disentangled Latent Diffusion [paper]
  • 【CVPR 2024】 Using Language-Aligned Gesture Embeddings for Understanding Gestures Accompanying Math Terms [paper]
  • 【SIGGRAPH 2024】Semantic Gesticulator: Semantics-Aware Co-Speech Gesture Synthesis [paper] ; [video] ; [LuMen-ze/Semantic-Gesticulator-Official]
  • SynTalker - Enabling Synergistic Full-Body Control in Prompt-Based Co-Speech Motion Generation [paper] ; [homepage] ; [video] ; [RobinWitch/SynTalker]
  • MDT-A2G - Exploring Masked Diffusion Transformers for Co-Speech Gesture Generation [paper] ; [homepage]
  • 【ACM MM 2024】 MambaGesture: Enhancing Co-Speech Gesture Generation with Mamba and Disentangled Multi-Modality Fusion [paper] ; [homepage]
  • 【CVPR 2023】 Co-speech Gesture Synthesis by Reinforcement Learning with Contrastive Pre-trained Rewards [paper] ; [RLracer/RACER]
  • 【PAKDD 2023】 RLMixer: A Reinforcement Learning Approach For Integrated Ranking With Contrastive User Preference Modeling [paper]
  • 【IJCAI 2023】 DiffuseStyleGesture - Stylized Audio-Driven Co-Speech Gesture Generation with Diffusion Models [paper] ; [YoungSeng/DiffuseStyleGesture] ; [youtube]
  • 【CVPR 2023】 Taming Diffusion Models for Audio-Driven Co-Speech Gesture Generation [paper] ; [Advocate99/DiffGesture]
  • 【CVPR 2023】 Diverse 3D Hand Gesture Prediction from Body Dynamics by Bilateral Hand Disentanglement [paper] ; [XingqunQi-lab/Diverse-3D-Hand-Gesture-Prediction]
  • 【CVPR 2023】 QPGesture: Quantization-Based and Phase-Guided Motion Matching for Natural Speech-Driven Gesture Generation [paper] ; [YoungSeng/QPGesture] ; [video]
  • 【CVPR 2023】 GestureDiffuCLIP: Gesture Diffusion Model with CLIP Latents [paper] ; [homepage] ; [video]
  • 【CVPR 2023】 Continual Learning for Personalized Co-Speech Gesture Generation [paper]; [homepage]
  • 【CVPR 2023】 Guided Motion Diffusion for Controllable Human Motion Synthesis [paper] ; [homepage]
  • 【CVPR 2023】 Sequential Texts Driven Cohesive Motions Synthesis with Natural Transitions [paper]
  • 【CVPR 2023】 EMMN: Emotional Motion Memory Network for Audio-driven Emotional Talking Face Generation [paper]
  • 【CVPR 2023】 FineDance: A Fine-grained Choreography Dataset for 3D Full Body Dance Generation [paper]
  • 【CVPR 2023】 Hierarchical Generation of Human-Object Interactions with Diffusion Probabilistic Models [paper]
  • 【CVPR 2023】 Speech4Mesh: Speech-Assisted Monocular 3D Facial Reconstruction for Speech-Driven 3D Facial Animation [paper]
  • 【CVPR 2023】 Semi-supervised Speech-driven 3D Facial Animation via Cross-modal Encoding [paper]
  • 【ACM MM 2023】UnifiedGesture - A Unified Gesture Synthesis Model for Multiple Skeletons [paper] ; [YoungSeng/UnifiedGesture]
  • 【ICMI 2023】 AQ-GT: a Temporally Aligned and Quantized GRU-Transformer for Co-Speech Gesture Synthesis [paper] ; [hvoss-techfak/AQGT]
  • DiffMotion: Speech-Driven Gesture Synthesis Using Denoising Diffusion Model [paper]
  • BodyFormer: Semantics-guided 3D Body Gesture Synthesis with Transformer [paper]
  • EmotionGesture: Audio-Driven Diverse Emotional Co-Speech 3D Gesture Generation [paper] ;
  • Diff-TTSG: Denoising probabilistic integrated speech and gesture synthesis [paper] ; [homepage] ; [video]
  • EMoG: Synthesizing Emotive Co-speech 3D Gesture with Diffusion Model [paper]
  • MRecGen: Multimodal Appropriate Reaction Generator [paper] ; [SSYSteve/MRecGen]
  • Audio is all in one: speech-driven gesture synthetics using WavLM pre-trained model [paper]
  • The KCL-SAIR team's entry to the GENEA Challenge 2023 Exploring Role-based Gesture Generation in Dyadic Interactions: Listener vs. Speaker [paper]
  • The KU-ISPL entry to the GENEA Challenge 2023-A Diffusion Model for Co-speech Gesture generation [paper]
  • Diffusion-Based Co-Speech Gesture Generation Using Joint Text and Audio Representation [paper]
  • Co-Speech Gesture Generation via Audio and Text Feature Engineering [paper]
  • Gesture Motion Graphs for Few-Shot Speech-Driven Gesture Reenactment [paper]
  • Gesture Generation with Diffusion Models Aided by Speech Activity Information [paper]
  • FEIN-Z: Autoregressive Behavior Cloning for Speech-Driven Gesture Generation [paper]
  • Discrete Diffusion for Co-Speech Gesture Synthesis [paper]
  • DiffuGesture: Generating Human Gesture From Two-person Dialogue With Diffusion Models [paper]
  • The FineMotion entry to the GENEA Challenge 2023: DeepPhase for conversational gestures generation [paper]
  • Am I listening? - Evaluating the Quality of Generated Data-driven Listening Motion [paper]
  • Unified speech and gesture synthesis using flow matching [paper] ; [homepage] ;

  • 【SIGGRAPH 2022】 A Motion Matching-based Framework for Controllable Gesture Synthesis from Speech [paper] ; [homepage]
  • 【ICMI 2022】The DeepMotion entry to the GENEA Challenge 2022 [paper]
  • 【ICMI 2022】The ReprGesture entry to the GENEA Challenge 2022 [paper] ; [YoungSeng/ReprGesture] ; [youtube]
  • 【CVPR 2022】 HA2G - Learning Hierarchical Cross-Modal Association for Co-Speech Gesture Generation [paper] ; [alvinliu0/HA2G]
  • 【CVPR 2022】 SEEG - SEEG: Semantic Energized Co-Speech Gesture Generation [paper] ; [akira-l/seeg]
  • 【CVPR 2022】 DiffGAN - Low-Resource Adaptation for Personalized Co-Speech Gesture Generation [paper]
  • 【CVPR 2022】 Audio-Driven Neural Gesture Reenactment With Video Motion Graphs [paper]
  • 【ICMI 2022】 ZeroEGGS Exemplar-based stylized gesture generation from speech: An entry to the GENEA Challenge 2022 [paper]
  • 【AAMAS 2022】 Multimodal analysis of the predictability of hand-gesture properties [paper]
  • 【ICMI 2022】 GestureMaster GestureMaster: Graph-based Speech-driven Gesture Generation [paper]
  • 【ICRA 2022】Context-Aware Body Gesture Generation for Social Robots [paper]
  • 【IROS 2022】Gesture2Vec: Clustering Gestures using Representation Learning Methods for Co-speech Gesture Generation [paper] [pjyazdian/Gesture2Vec] ; [youtube] ; [youtube]
  • Evaluating Data-Driven Co-Speech Gestures of Embodied Conversational Agents through Real-Time Interaction [paper]
  • ZeroEGGS: Zero-shot Example-based Gesture Generation from Speech [paper] ; [ubisoft/ubisoft-laforge-ZeroEGGS] ; [youtube]
  • Voice2Face: Audio-Driven Facial and Tongue Rig Animations [paper] ; [youtube] ; [web]
  • Deep Gesture Generation for Social Robots Using Type-Specific Libraries [paper] ; [youtube] ; [web]
  • Automatic text-to-gesture rule generation for embodied conversational agents [paper] [youtube]
  • Evaluating Data-Driven Co-Speech Gestures of Embodied Conversational Agents through Real-Time Interaction [paper] ; [web]
  • Towards Context-Aware Human-like Pointing Gestures with RL Motion Imitation [paper]
  • Text/Speech-Driven Full-Body Animation [paper]
  • Zero-Shot Style Transfer for Gesture Animation driven by Text and Speech using Adversarial Disentanglement of Multimodal Style Encoding [paper]

  • 【ICCV 2021】 Speech Drives Templates: Co-Speech Gesture Synthesis With Learned Templates [paper] ; [shenhanqian/speechdrivestemplates] ; [youtube] ; [poster]
  • 【ICCV 2021】 Audio2Gestures Audio2Gestures: Generating Diverse Gestures From Speech Audio With Conditional Variational Autoencoders [paper]
  • 【ICMI 2021】 Probabilistic Human-like Gesture Synthesis from Speech using GRU-based WGAN [paper]
  • 【ICMI 2021】 Crossmodal Clustered Contrastive Learning: Grounding of Spoken Language to Gesture [paper] ; [dondongwon/CC_NCE_GENEA]
  • 【IVA 2021】 Speech2Properties2Gestures: Gesture-Property Prediction as a Tool for Generating Representational Gestures from Speech [paper] ; [homepage]
  • 【IVA 2021】 Learning Speech-driven 3D Conversational Gestures from Video [paper]
  • 【CASA 2021】 ExpressGesture ExpressGesture: Expressive gesture generation from speech through database matching [paper] ; [youtube]
  • 【AAMAS 2021】 CMCF CCFM: An Architecture for Realtime Gesture Generation by Clustering Gestures by Communicative Function and Motion [paper]
  • 【IEEEVR 2021】 Text2gestures: A transformer-based network for generating emotive body gestures for virtual agents [paper] ; [UttaranB127/Text2Gestures] ; [homepage]
  • Evaluating Data-Driven Co-Speech Gestures of Embodied Conversational Agents [paper]
  • Multimodal analysis of the predictability of hand-gesture properties [paper]
  • Deep Gesture Generation for Social Robots Using Type-Specific Libraries [paper]
  • A Framework for Integrating Gesture Generation Models into Interactive Conversational Agents [paper] ; [youtube] ; [homepage] ; [nagyrajmund/gesturebot]
  • Moving Fast and Slow: Analysis of Representations and Post-Processing in Speech-Driven Automatic Gesture Generation [paper]
  • ExpressGesture: Expressive gesture generation from speech through database matching [paper]
  • Passing a Non-verbal Turing Test: Evaluating Gesture Animations Generated from Speech [paper]




  • DCNF Predicting Co-verbal Gestures - A Deep and Temporal Modeling Approach [paper]
  • 2017 - CDBN Speech-driven animation with meaningful behaviors [paper]

Others

  • 【SIGGRAPH 2022】 GANimator (for data generation) - GANimator: Neural Motion Synthesis from a Single Sequence [paper] ; [PeizhuoLi/ganimator] ; [youtube]
  • 【CVPR 2021】 Body2Hands: Learning To Infer 3D Hands From Conversational Gesture Body Dynamics [paper]
  • Rig Inversion by Training a Differentiable Rig Function [paper] ; [youtube]

  • [1994] Rule-based generation of facial expression, gesture & spoken intonation for multiple conversational agents [paper]

  • Speech to gesture sequence

    • 【SIGGRAPH 2001】 BEAT: the Behavior Expression Animation Toolkit [paper]
    • 【HRI 2012】 Robot Behavior Toolkit: Generating Effective Social Behaviors for Robots [paper]
    • Gesture Generation by Imitation: From Human Behavior to Computer Character Animation [books]

  • Probabilistic model of speech to gesture

    • 【IVA 2006】 Towards a Common Framework for Multimodal Generation: The Behavior Markup Language [paper]
    • 【SIGGRAPH 2010】 Gesture Controllers [paper]
  • Probabilistic model of personal style

    • 【ACM Transactions on Graphics 2008】Gesture modeling and animation based on a probabilistic re-creation of speaker style [paper]
  • Neural classification model of personal style

    • 【IVA 2015】Predicting Co-verbal Gestures: A Deep and Temporal Modeling Approach [paper]

This section is not yet accurate and is still being edited.

  • Others (Uncategorized)

    • 【SIGGRAPH 2022】 A Motion Matching-based Framework for Controllable Gesture Synthesis from Speech [paper] ; [homepage]

    • 【CVPR 2022】 SEEG - SEEG: Semantic Energized Co-Speech Gesture Generation [paper] ; [akira-l/seeg]

    • 【CVPR 2022】 DiffGAN - Low-Resource Adaptation for Personalized Co-Speech Gesture Generation [paper]

    • 【ICMI 2022】 ZeroEGGS Exemplar-based stylized gesture generation from speech: An entry to the GENEA Challenge 2022 [paper]

    • 【CVPR 2022】 Audio-Driven Neural Gesture Reenactment With Video Motion Graphs [paper]

    • 【AAMAS 2022】 Multimodal analysis of the predictability of hand-gesture properties [paper]

    • 【ICMI 2022】 GestureMaster GestureMaster: Graph-based Speech-driven Gesture Generation [paper]

    • 【ICCV 2021】 Audio2Gestures Audio2Gestures: Generating Diverse Gestures From Speech Audio With Conditional Variational Autoencoders [paper]

    • 【IVA 2021】 Speech2Properties2Gestures: Gesture-Property Prediction as a Tool for Generating Representational Gestures from Speech [paper] ; [homepage]

    • 【ECCV 2020】 Mix-StAGE Style Transfer for Co-Speech Gesture Animation: A Multi-Speaker Conditional-Mixture Approach [paper]

    • 【CVPR 2019】 Speech2Gesture Learning Individual Styles of Conversational Gesture [paper]



Loss functions

Full name Description
Adversarial Loss (Adv) Used in Generative Adversarial Networks (GANs), this loss function pits a generator network against a discriminator network, with the goal of the generator producing samples that can fool the discriminator into thinking they are real.
Categorical Cross Entropy (CCE) A common loss function used in multi-class classification tasks, where the goal is to minimize the difference between the predicted and true class labels.
Cross-modal Cluster Noise Contrastive Estimation (CC-NCE) Used in multimodal learning to learn joint representations across different modalities, this loss function maximizes the similarity between matching modalities while minimizing the similarity between non-matching modalities.
Edge Transition Cost (ETC) Used in graph-based image segmentation, this loss function measures the similarity between adjacent pixels in an image to preserve the coherence and smoothness of segmented regions.
Expectation Maximization (EM) Used for maximum likelihood estimation when dealing with incomplete or missing data, this algorithm involves computing the expected likelihood of the missing data and updating model parameters to maximize the likelihood of the observed data given the expected values.
Geodesic Distance (GeoD) Used in deep learning for image segmentation, this loss function penalizes the discrepancy between the predicted segmentation map and the ground truth, while also considering the spatial relationships between different image regions.
Wasserstein-GAN Gradient Penalty (WGAN-GP) An extension of the Wasserstein GAN algorithm that adds a gradient penalty term to the loss function, used to enforce the Lipschitz continuity constraint and ensure stability during training.
Hamming Distance (Hamm) Used in information theory, this metric measures the number of positions at which two strings differ.
Huber Loss (Huber) A robust loss function used in regression tasks that is less sensitive to outliers than the Mean Squared Error (MSE) loss.
Imitation Reward (IR) Used in imitation learning to train a model to mimic the behavior of an expert agent, by providing a reward signal based on how closely the model's behavior matches that of the expert.
Kullback–Leibler Divergence (KL) Used to measure the difference between two probability distributions, this loss function is commonly used in probabilistic models and deep learning for regularization and training.
L2 Distance (L2) Measures the Euclidean distance between two points in space, commonly used in regression tasks.
Mean Absolute Error (MAE) A loss function used in regression tasks that measures the average difference between the predicted and true values.
Maximum Likelihood Estimation (MLE) A statistical method used to estimate the parameters of a probability distribution that maximize the likelihood of observing the data.
Mean Squared Error (MSE) A common loss function used in regression tasks that measures the average squared difference between the predicted and true values.
Negative Log-likelihood (NLL) Used in probabilistic models to maximize the likelihood of the observed data by minimizing the negative log-likelihood.
Structural Similarity Index Measure (SSIM) Used in image processing to measure the similarity between two images based on their luminance, contrast, and structural content.
Task Reward (TR) Used in reinforcement learning to provide a reward signal to an agent based on its performance in completing a given task.
Variance (Var) A statistical metric used to measure the variability of a set of data points around their mean.
Within-cluster Sum of Squares (WCSS) Used in cluster analysis to measure the variability of data points within a cluster by computing the sum of squared distances between each data point and the mean of the cluster.
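
For reference, here are minimal NumPy sketches of a few of the losses listed above (MSE, MAE, Huber, and KL divergence), using their standard textbook definitions. Variable names and the discrete-distribution form of the KL divergence are illustrative assumptions; papers in this list may use framework-specific implementations instead.

```python
# Minimal sketches of a few losses from the table above (textbook definitions).
import numpy as np

def mse(pred, target):                      # Mean Squared Error (MSE)
    return np.mean((pred - target) ** 2)

def mae(pred, target):                      # Mean Absolute Error (MAE), a.k.a. L1 loss
    return np.mean(np.abs(pred - target))

def huber(pred, target, delta=1.0):         # Huber loss: quadratic near zero, linear for large errors
    err = np.abs(pred - target)
    quadratic = 0.5 * err ** 2
    linear = delta * (err - 0.5 * delta)
    return np.mean(np.where(err <= delta, quadratic, linear))

def kl_divergence(p, q, eps=1e-12):         # KL divergence between two discrete distributions
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return np.sum(p * np.log(p / q))
```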

Evaluation aspects

  • Human-likeness : looks like the motion of a real human

  • Appropriateness (specificity) : appropriate for the given speech, controlling for the human-likeness of the motion

    • 🧑‍🦲 : Upper-body tier || 🧍 : Full-body tier

    • 🧍‍♂️ : motion || 📃 : text || 🔊 : audio || ⚙️ : custom by teams

Metric (Description) Body tier Type 2020 2021 2022 2023
FNA (Full-body Natural Motion) 🧍 🧍‍♂️
FBT (Full-body Text-based) 🧍 📃
FSA (Full-body Custom by Teams) 🧍 ⚙️
FSB (Full-body Custom by Teams) 🧍 ⚙️
FSC (Full-body Custom by Teams) 🧍 ⚙️
FSD (Full-body Custom by Teams) 🧍 ⚙️
FSF (Full-body Custom by Teams) 🧍 ⚙️
FSG (Full-body Custom by Teams) 🧍 ⚙️
FSH (Full-body Custom by Teams) 🧍 ⚙️
FSI (Full-body Custom by Teams) 🧍 ⚙️
UNA (Upper-body Natural Motion) 🧑‍🦲 🧍‍♂️
UBA (Upper-body Audio-based) 🧑‍🦲 🔊
UBT (Upper-body Text-based) 🧑‍🦲 📃
USJ (Upper-body Custom by Teams) 🧑‍🦲 ⚙️
USK (Upper-body Custom by Teams) 🧑‍🦲 ⚙️
USL (Upper-body Custom by Teams) 🧑‍🦲 ⚙️
USM (Upper-body Custom by Teams) 🧑‍🦲 ⚙️
USN (Upper-body Custom by Teams) 🧑‍🦲 ⚙️
USO (Upper-body Custom by Teams) 🧑‍🦲 ⚙️
USP (Upper-body Custom by Teams) 🧑‍🦲 ⚙️
USQ (Upper-body Custom by Teams) 🧑‍🦲 ⚙️

Objective metrics

3.1 Average acceleration and jerk

3.2 Comparing speed histograms

3.3 Canonical correlation analysis

3.4 Fréchet gesture distance

3.5 System ranking comparison

  • Canonical correlation analysis
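
A hedged sketch of how metrics 3.1, 3.2, and 3.4 above are typically computed from a gesture clip stored as joint positions of shape (T, J, 3). Exact conventions (frame rate, units, smoothing, and the feature extractor used for the Fréchet gesture distance) differ between papers, so treat this as an illustration rather than a reference implementation.

```python
# Sketches of average acceleration/jerk, speed histograms, and the Fréchet gesture
# distance (FGD) for motion given as joint positions of shape (T, J, 3) at `fps`.
import numpy as np
from scipy.linalg import sqrtm

def average_acceleration_and_jerk(positions, fps=30):
    vel = np.diff(positions, axis=0) * fps    # (T-1, J, 3) per-frame velocity
    acc = np.diff(vel, axis=0) * fps          # (T-2, J, 3) acceleration
    jerk = np.diff(acc, axis=0) * fps         # (T-3, J, 3) jerk
    return (np.mean(np.linalg.norm(acc, axis=-1)),
            np.mean(np.linalg.norm(jerk, axis=-1)))

def speed_histogram(positions, fps=30, bins=50, max_speed=5.0):
    # Histogram of per-joint speeds; compare generated vs. ground-truth histograms
    # with any distance between distributions (e.g. Hellinger).
    speed = np.linalg.norm(np.diff(positions, axis=0) * fps, axis=-1).ravel()
    hist, _ = np.histogram(speed, bins=bins, range=(0.0, max_speed), density=True)
    return hist

def frechet_gesture_distance(real_feats, gen_feats):
    # real_feats / gen_feats: (N, D) feature vectors, typically taken from a
    # pretrained gesture autoencoder; Fréchet distance between Gaussian fits.
    mu_r, mu_g = real_feats.mean(axis=0), gen_feats.mean(axis=0)
    cov_r = np.cov(real_feats, rowvar=False)
    cov_g = np.cov(gen_feats, rowvar=False)
    covmean = sqrtm(cov_r @ cov_g)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    return float(np.sum((mu_r - mu_g) ** 2) + np.trace(cov_r + cov_g - 2.0 * covmean))
```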

  • Modalities type:

    • 🔊 : audio || 📃 : text || 🤯 : emotion || 🚢 : gesture motion || ℹ️ : gesture properties || 🎞️ : gesture segment
  • Type

    • 👥 : Dialog (Conversation between two people 🤼) || 👤 : Monolog (Self conversation 🧍)
Dataset Modalities Type Download Paper
IEMOCAP 🚢, 🔊, 📃, 🤯 👥 sail.usc.edu/iemocap [paper]
Creative-IT 🚢, 🔊, 📃, 🤯 👥 sail.usc.edu/CreativeIT
Gesture-Speech Dataset 🚢, 🔊 👤 dropbox
CMU Panoptic 🚢, 🔊, 📃 👥 domedb.perception.cmu [paper]
Speech-Gesture 🚢, 🔊 👤 amirbar/speech2gesture [paper]
TED Dataset [homepage] 🚢, 🔊 👤 youtube-gesture-dataset
Talking With Hands ([github]) 🚢, 🔊 👥 facebookresearch/TalkingWithHands32M [paper]
PATS ([homepage], [github]) 🚢, 🔊, 📃 👤 chahuja.com/pats [paper]
Trinity Speech-Gesture I 🚢, 🔊, 📃 👤 Trinity Speech-Gesture I
Trinity Speech-Gesture II 🚢, 🔊, 🎞️ 👤 Trinity Speech GestureII
Speech-Gesture 3D extension 🚢, 🔊 👤 nextcloud.mpi-klsb
Talking With Hands GENEA Extension 🚢, 🔊, 📃 👥 zenodo/6998231 [paper]
SaGA 🚢, 🔊, ℹ️ 👥 phonetik.uni-muenchen [paper]
SaGA++ 🚢, 🔊, ℹ️ 👥 zenodo/6546229
ZEGGS Dataset [youtube] 🚢, 🔊 👤 ubisoft-laforge-ZeroEGGS [paper]
BEAT Dataset ([homepage] [homepage], [github]) 🚢, 🔊, 📃, ℹ️, 🤯 👥, 👤 github.io/BEAT [paper]
InterAct [homepage] 🚢, 🔊, 📃 👥 hku-cg.github.io [paper]

2022 GENEA Challenge


5. Toolkit

GENEA

GENEA 2023 Playlist

GENEA 2022 Playlist

GENEA 2021 Playlist

GENEA 2020 Playlist

SIGGRAPH

ACM SIGGRAPH MIG 2019 Playlist


PapersWithCode Ranking

Contributing

Your contributions are always welcome! Please take a look at the contribution guidelines first.

License

This project is licensed under the MIT License - see the LICENSE.md file for details.

Created by OpenHuman

OpenHuman.ai - Open Store for Realistic Digital Human