video-captioning

Here are 52 public repositories matching this topic...

YehLi / xmodaler

X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense reasoning, and cross-modal retrieval).

image-captioning video-captioning visual-question-answering vision-and-language cross-modal-retrieval pretraining tden

Updated Feb 27, 2023
Python

xiadingZ / video-caption.pytorch

PyTorch implementation of video captioning

deep-learning pytorch video-captioning

Updated Aug 19, 2019
Python

scopeInfinity / Video2Description

Video to Text: Natural language description generator for a given video. [Video Captioning]

deep-neural-networks video-processing image-captioning cnn-keras audio-processing lstm-neural-networks video-captioning video-to-text

Updated May 3, 2022
Python

tomchang25 / whisper-auto-transcribe

Automatic transcription tool based on Whisper

text-to-speech deep-learning pytorch speech-recognition speech-to-text language-model gradio speech-processing asr video-captioning voice-activity-detection gradio-interface

Updated Apr 27, 2023
Python

vijayvee / video-captioning

This repository contains the code for a video captioning system inspired by Sequence to Sequence -- Video to Text. This system takes as input a video and generates a caption in English describing the video.

tensorflow seq2seq sequence-to-sequence video-captioning s2vt multimodal-deep-learning

Updated Oct 12, 2019
Python

JasonYao81000 / MLDS2018SPRING

Machine Learning and having it Deep and Structured (MLDS) in 2018 spring

Updated Apr 19, 2019
Python

jpthu17 / EMCL

[NeurIPS 2022 Spotlight] Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations

video-captioning neurips video-retrieval video-question-answering cross-modal-retrieval

Updated Apr 9, 2024
Python

bytedance / Shot2Story

A new multi-shot video understanding benchmark Shot2Story with comprehensive video summaries and detailed shot-level captions.

benchmark research video-summarization dataset video-captioning video-story vision-language video-question-answering video-language large-language-models video-language-pretraining video-story-generation

Updated Sep 25, 2024
Python

jayleicn / TVCaption

[ECCV 2020] PyTorch code of MMT (a multimodal transformer captioning model) on the TVCaption dataset

pytorch dataset video-captioning

Updated Sep 6, 2023
Python

Kamino666 / Video-Captioning-Transformer

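This is a deep learning model for video captioning, implemented in PyTorch with the Transformer architecture. The video captioning task takes a video as input and outputs one sentence describing the content of the whole video (assuming the video is short and can be described in a single sentence). The main goal of this repo is to help visually impaired people enjoy online videos and perceive their surroundings, and to promote the development of "accessible video".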

pytorch transformer video-captioning