video-captioning

Here are 89 public repositories matching this topic...

YehLi / xmodaler

X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense reasoning, and cross-modal retrieval).

image-captioning video-captioning visual-question-answering vision-and-language cross-modal-retrieval pretraining tden

Updated Feb 27, 2023
Python

xiadingZ / video-caption.pytorch

PyTorch implementation of video captioning

deep-learning pytorch video-captioning

Updated Aug 19, 2019
Python

scopeInfinity / Video2Description

Video to Text: Natural language description generator for some given video. [Video Captioning]

deep-neural-networks video-processing image-captioning cnn-keras audio-processing lstm-neural-networks video-captioning video-to-text

Updated May 3, 2022
Python

tomchang25 / whisper-auto-transcribe

Automatic transcription tool based on Whisper

text-to-speech deep-learning pytorch speech-recognition speech-to-text language-model gradio speech-processing asr video-captioning voice-activity-detection gradio-interface

Updated Apr 27, 2023
Python

antoyang / VidChapters

[NeurIPS 2023 D&B] VidChapters-7M: Video Chapters at Scale

video-understanding weakly-supervised-learning video-captioning multimodal-learning vision-and-language dense-video-captioning pre-training temporal-language-grounding video-chapter-generation vid2seq

Updated Nov 13, 2023
Jupyter Notebook

jayleicn / recurrent-transformer

[ACL 2020] PyTorch code for MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning

pytorch video-captioning youcook2 activitynet-captions

Updated Dec 4, 2020
Jupyter Notebook

vijayvee / video-captioning

This repository contains the code for a video captioning system inspired by Sequence to Sequence -- Video to Text. This system takes as input a video and generates a caption in English describing the video.

tensorflow seq2seq sequence-to-sequence video-captioning s2vt multimodal-deep-learning

Updated Oct 12, 2019
Python

JasonYao81000 / MLDS2018SPRING

Machine Learning and having it Deep and Structured (MLDS) in 2018 spring

Updated Apr 19, 2019
Python

jpthu17 / EMCL

[NeurIPS 2022 Spotlight] Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations

video-captioning neurips video-retrieval video-question-answering cross-modal-retrieval

Updated Apr 9, 2024
Python

jssprz / video_captioning_datasets

Summary of Video-to-Text datasets. This repository is part of the review paper *Bridging Vision and Language from the Video-to-Text Perspective: A Comprehensive Review*.

review video-captioning state-of-the-art vision-and-language charades video-to-text msvd video-dataset video-description activitynet-captions trecvid tgif-dataset msr-vtt vatex

Updated Oct 27, 2023
Jupyter Notebook

bytedance / Shot2Story

A new multi-shot video understanding benchmark Shot2Story with comprehensive video summaries and detailed shot-level captions.

benchmark research video-summarization dataset video-captioning video-story vision-language video-question-answering video-language large-language-models video-language-pretraining video-story-generation

Updated Jan 30, 2025
Python

terry-r123 / Awesome-Captioning

A curated list of multimodal captioning research (including image captioning, video captioning, and text captioning).

image-captioning video-captioning text-captioning

Updated Jun 6, 2022

jayleicn / TVCaption

[ECCV 2020] PyTorch code of MMT (a multimodal transformer captioning model) on TVCaption dataset

pytorch dataset video-captioning

Updated Sep 6, 2023
Python

Kamino666 / Video-Captioning-Transformer

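This is a deep learning model for video captioning, implemented in PyTorch with a Transformer architecture. The video captioning task takes a video as input and outputs one sentence describing the content of the whole video (provided the video is short and can be described in a single sentence). The main goal of this repo is to help visually impaired people enjoy online videos and perceive their surroundings, and to promote the development of "accessible video".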

pytorch transformer video-captioning

Updated Mar 12, 2022
Python

nasib-ullah / video-captioning-models-in-Pytorch

A PyTorch implementation of state-of-the-art video captioning models from 2015-2019 on the MSVD and MSRVTT datasets.

video deep-learning pytorch sequence-to-sequence video-captioning s2vt msvd pytorch-implementation msrvtt marn video-captioning-models recnet

Updated Jul 30, 2023
Python

UARK-AICV / VLTinT

[AAAI 2023 Oral] VLTinT: Visual-Linguistic Transformer-in-Transformer for Coherent Video Paragraph Captioning

pytorch video-captioning transformer-architecture vision-language video-paragraph-captioning aaai2023

Updated Feb 16, 2024
Jupyter Notebook

ParitoshParmar / MTL-AQA

What and How Well You Performed? A Multitask Learning Approach to Action Quality Assessment [CVPR 2019]

pytorch video-processing lstm representation-learning action-recognition video-understanding c3d video-captioning captioning fine-grained-classification multitask-learning dilated-convolution action-quality-assessment mtl-aqa fine-grained-action-recognition dilated-c3d

Updated Nov 10, 2024
Python

amazon-science / crossmodal-contrastive-learning

CrossCLR: Cross-modal Contrastive Learning For Multi-modal Video Representations, ICCV 2021

natural-language-processing video computer-vision transformers video-captioning multi-modality contrastive-learning video-text-retrieval

Updated Feb 7, 2022
Python

jacobswan1 / Video2Commonsense

Video captioning baseline models on the Video2Commonsense dataset.

video-captioning commonsense-story commonsense-question-answering video2commonsense

Updated Apr 15, 2021
Python

xid32 / NAACL_2025_TWM

We introduce temporal working memory (TWM), which aims to enhance the temporal modeling capabilities of multimodal foundation models (MFMs). This plug-and-play module can be easily integrated into existing MFMs. With TWM, nine state-of-the-art models exhibit significant performance improvements across QA, captioning, and retrieval tasks.

question-answering video-captioning working-memory audio-visual-learning video-text-retrieval multimodal-large-language-models multimodal-foundation-model

Updated Jan 26, 2025
Python
