Sequence to Sequence - Video to Text

This service uses S2VT to describe video content with natural language text.

It is part of our third party DNN Model Services.

Welcome

The service receives a video and feeds its frames to a VGG16 model, which generates a feature vector for each frame.

These per-frame features are then passed to a second model (S2VT_VGG16), which outputs a single caption for the whole feature sequence.
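
The two-stage flow can be illustrated with a minimal sketch. It only shows how data moves between the stages: the function names, the type aliases, and the toy stand-ins are hypothetical and are not the real VGG16 or S2VT_VGG16 models.

```python
from typing import Callable, List, Sequence

Frame = Sequence[float]   # hypothetical: a frame as a flat list of pixel values
Features = List[float]    # hypothetical: one feature vector per frame


def caption_video(
    frames: Sequence[Frame],
    extract_features: Callable[[Frame], Features],
    caption_from_features: Callable[[List[Features]], str],
) -> str:
    """Stage 1: per-frame features. Stage 2: one caption for the whole sequence."""
    features = [extract_features(frame) for frame in frames]
    return caption_from_features(features)


# Toy stand-ins so the sketch runs; they are NOT the real models.
def toy_extractor(frame: Frame) -> Features:
    return [sum(frame) / len(frame)]


def toy_captioner(features: List[Features]) -> str:
    return f"caption over {len(features)} frames"


print(caption_video([[0.0, 1.0], [0.5, 0.5], [1.0, 1.0]], toy_extractor, toy_captioner))
```

In the real service the first callable would correspond to VGG16 and the second to S2VT_VGG16.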

What’s the point?

The service makes predictions using computer vision and machine learning techniques.

For the specified time interval of a video, the service outputs its best guess, according to the model's confidence, to describe the action shown.

The output is delivered in SRT (SubRip subtitle) format.

How does it work?

The user must provide the following inputs in order to start the service and get a response:

Inputs:

url: A YouTube video URL.
start_time_sec: Start time position, in seconds.
stop_time_sec: Stop time position, in seconds.
The time interval (stop_time_sec - start_time_sec) must be <= 20 seconds.
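
A client can check the 20-second limit before calling the service. The sketch below is an illustration, not part of the service: `build_params` is a hypothetical helper, the `stop >= start` check is an assumption, and the time values are serialized as strings only because the CLI example in this document passes them that way.

```python
import json

MAX_INTERVAL_SEC = 20  # limit stated in the inputs list


def build_params(url: str, start_time_sec: int, stop_time_sec: int) -> str:
    """Validate the time interval and return the JSON parameter string."""
    if stop_time_sec < start_time_sec:
        # Assumption: a stop position before the start position is invalid.
        raise ValueError("stop_time_sec must not be before start_time_sec")
    if stop_time_sec - start_time_sec > MAX_INTERVAL_SEC:
        raise ValueError(f"interval must be <= {MAX_INTERVAL_SEC} seconds")
    return json.dumps(
        {
            "url": url,
            "start_time_sec": str(start_time_sec),
            "stop_time_sec": str(stop_time_sec),
        }
    )


# Placeholder URL for illustration only.
print(build_params("https://example.com/video", 0, 20))
```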

You can use this service from the SingularityNET DApp by clicking on SNET/VideoCaptioning.

You can also call the service from the SingularityNET CLI (snet).

Assuming that you have an open channel to this service:

$ snet client call snet s2vt-video-captioning default_group video_cap '{"url": "http://crcv.ucf.edu/THUMOS14/UCF101/UCF101/v_PlayingGuitar_g05_c01.avi", "start_time_sec": "0", "stop_time_sec": "0"}'
...
Read call params from cmdline...

Calling service...

    response:
        value: '{''Caption'': ''1\n00:00:00,00 --> 00:00:10,00\nA man is playing guitar.''}'
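
The `value` field in the sample output looks like a Python dict literal with a single `Caption` key holding the SRT text. Under that assumption, the caption can be extracted as sketched below. `extract_caption_srt` is a hypothetical helper, and `raw_value` is the sample string above with its doubled single quotes undone.

```python
import ast

# Sample value from the CLI output, with the doubled single quotes undone.
raw_value = r"{'Caption': '1\n00:00:00,00 --> 00:00:10,00\nA man is playing guitar.'}"


def extract_caption_srt(value: str) -> str:
    """Return the SRT text stored under the 'Caption' key.

    Assumption: the value is a Python dict literal, as in the sample output.
    """
    payload = ast.literal_eval(value)  # safe parsing of literals only
    if not isinstance(payload, dict) or "Caption" not in payload:
        raise ValueError("unexpected response value")
    return payload["Caption"]


srt_text = extract_caption_srt(raw_value)
print(srt_text.splitlines()[-1])
```

`ast.literal_eval` is used rather than `eval` because it only accepts literals and will not execute arbitrary code from a response.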

What to expect from this service?

Video Input:

with:

start_time_sec: 0
stop_time_sec: 0

Response:

1
00:00:00,00 --> 00:00:12,00
A man is playing.
