Converts VobSub subtitles (.idx/.sub format) into .srt subtitles.

dguglielmi/VobSub2SRT

VobSub2SRT is a simple command-line program that converts .idx / .sub subtitles into .srt text subtitles using OCR. It is based on code from the MPlayer project, a really great movie player. Some minor parts are copied from ffmpeg/avutil headers. Tesseract is used as the OCR software.

vobsub2srt is released under the GPL3+ license. The MPlayer code included is GPL2+ licensed.

The quality of the OCR depends on the text in the subtitles. The code currently does no preprocessing, but I’m looking into adding filters and scaling options to improve the OCR. You can correct mistakes in the .srt files with a text editor or a dedicated subtitle editor.

Building

You need tesseract >= 4.1. You also need meson, ninja and gcc to build it.

You should also install the tesseract data for the languages you want to use!

meson build
ninja -C build
ninja -C build install

Usage

vobsub2srt converts subtitles in VobSub (.idx / .sub) format into subtitles in .srt format. VobSub subtitles consist of two or three files called Filename.idx, Filename.sub and an optional Filename.ifo. To convert subtitles, simply call

vobsub2srt Filename

with Filename being the file name of the subtitle files WITHOUT the extension (.idx / .sub). vobsub2srt writes the subtitles to a file called Filename.srt.
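Because vobsub2srt expects the name without the extension, converting a whole directory means stripping .idx from each file first. The following is a minimal sketch of that, not part of vobsub2srt itself: the function names and the VOBSUB2SRT override variable are hypothetical and exist only for this example.

```shell
# Strip a trailing .idx or .sub so the result can be passed to vobsub2srt.
vobsub_base() {
  case "$1" in
    *.idx) printf '%s\n' "${1%.idx}" ;;
    *.sub) printf '%s\n' "${1%.sub}" ;;
    *)     printf '%s\n' "$1" ;;
  esac
}

# Convert every .idx/.sub pair in a directory (default: current directory).
# VOBSUB2SRT is a hypothetical override for the program to run.
convert_all() {
  dir="${1:-.}"
  for idx in "$dir"/*.idx; do
    [ -e "$idx" ] || continue        # the glob matched nothing
    base="$(vobsub_base "$idx")"
    [ -e "$base.sub" ] || continue   # skip an .idx without its .sub
    "${VOBSUB2SRT:-vobsub2srt}" "$base"
  done
}
```

Calling convert_all with a directory path should then write one .srt next to each subtitle pair found there.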

If a subtitle file contains more than one language, you can use the --lang parameter to select the one you want (use --langlist to list the languages in the file). For some languages you might need to set the tesseract language yourself (e.g., chi_tra/chi_sim for traditional or simplified Chinese characters). You can use --tesseract-lang to do this. In most cases, however, it should be autodetected.
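The language options can be combined in a small wrapper. This is only a sketch: the function name and the VOBSUB2SRT override variable are hypothetical, and it assumes that --lang and --tesseract-lang may be given together on one command line.

```shell
# Convert BASE using subtitle language LANG and, optionally, an explicit
# tesseract language TESS (for example chi_sim or chi_tra).
# VOBSUB2SRT is a hypothetical override for the program to run.
convert_lang() {
  base="$1"
  lang="$2"
  tess="${3:-}"
  if [ -n "$tess" ]; then
    "${VOBSUB2SRT:-vobsub2srt}" --lang "$lang" --tesseract-lang "$tess" "$base"
  else
    "${VOBSUB2SRT:-vobsub2srt}" --lang "$lang" "$base"
  fi
}
```

Run vobsub2srt --langlist Filename first to see which language values the file offers, then pass one of them as the second argument; the third argument is only needed when autodetection picks the wrong tesseract language.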

If you want to dump the subtitles as images (e.g., to check the OCR results), you can use the --dump-images flag.

Use --help or read the manpage to get more information about the options of vobsub2srt.

Bug reports

Please submit bug reports or feature requests to the issue tracker on GitHub. If you do not have a GitHub account and feel uncomfortable creating one then feel free to send an e-mail to <ruediger@c-plusplus.de> instead.

If you have problems with a specific subtitle file, please check whether it works in MPlayer first. If it does not, please report the bug to MPlayer as well and link to the MPlayer bug report.

For bug reports, please run vobsub2srt with the --verbose option and copy and paste the full output into the bug report.
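One way to capture the full verbose output for a report is to redirect both output streams to a file. This is a sketch only: the function name, the default log file name and the VOBSUB2SRT override variable are hypothetical.

```shell
# Run vobsub2srt with --verbose and save stdout and stderr to a log file.
# The default log name is an arbitrary choice for this sketch.
# VOBSUB2SRT is a hypothetical override for the program to run.
verbose_log() {
  base="$1"
  log="${2:-vobsub2srt-verbose.log}"
  "${VOBSUB2SRT:-vobsub2srt}" --verbose "$base" > "$log" 2>&1
}
```

After verbose_log Filename, the contents of the log file can be pasted into the issue.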

Contributors

Most code is from the MPlayer project.

  • Armin Häberling <armin.aha@gmail.com> wrote a patch to fix an issue with multiple instances of the same subtitle in the result file (21af426)
  • James Harris <jimmy@jamesharris.org> wrote the formula for Homebrew (54f311d6)
  • Leo Koppelkamm reported and fixed issue #5 and problems with long filenames (b903074c, 36ec8da, d3602d6)
  • Till Korten <webmaster@korten-privat.de> wrote the ebuild script (#13)
  • Andreasf fixed missing libavutil include path (3a175eb, #15)
  • Michal Gawlik fixed the overlapping issue (5b2ccabc55f, #29, #32)
  • “bit” made sure no trailing whitespace is written to the SRT (3a59dc278abc2, #38)
  • Baudouin Raoult contributed various fixes (028f742, #44, b722a03, #42, 7293ac2, #40)
  • Justyn Butler added the y-threshold support (f873761, #43)
  • James Laird-Wah added min-width/height support and fixed other issues (41c6844, #48, #46)
  • Filirom1 fixed a minor issue (4ed58c2, #49)

To Do

  • implement preprocessing (first step: scaling; code is available in spudec.c)

Languages

  • C++ 85.1%
  • Perl 6.2%
  • Shell 5.7%
  • Meson 3.0%