jbickers

January 29, 2008

Extracting subtitles from a DVD

Filed under: Linux — jbickers @ 9:11 pm

This page has a guide for extracting subtitles from a DVD. The subrip tool that it mentions seems to have fallen out of sync with MPlayer 1.0rc2: the compile command given at the top of the source file no longer works. It produces numerous linker errors similar to this one:

yuv2rgb.c:(.text+0x4d59): undefined reference to `av_malloc'

I have made a minimal change to subrip.c to update the compilation line and to call tesseract instead of gocr. It depends on ImageMagick and Tesseract being installed.
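For anyone curious what the OCR step boils down to, here is a rough Python sketch. It is not the code in subrip.c, just an illustration of the idea: convert each subtitle image to TIFF with ImageMagick's convert, then run tesseract on it. The file names and the helper functions `ocr_commands` and `ocr_image` are made up for this example, and I am assuming the Tesseract build wants TIFF input.

```python
import subprocess
from pathlib import Path


def ocr_commands(image_path):
    """Build the two command lines for one subtitle image (hypothetical helper)."""
    image = Path(image_path)
    tiff = image.with_suffix(".tif")   # assumption: tesseract wants a TIFF
    out_base = image.with_suffix("")   # tesseract appends ".txt" itself
    return [
        ["convert", str(image), str(tiff)],       # ImageMagick
        ["tesseract", str(tiff), str(out_base)],  # writes <out_base>.txt
    ]


def ocr_image(image_path):
    """Run both commands and return the recognised text."""
    for cmd in ocr_commands(image_path):
        subprocess.run(cmd, check=True)
    return Path(image_path).with_suffix(".txt").read_text()
```

Calling `ocr_image("sub0001.pgm")` would need both tools on the PATH and would leave sub0001.tif and sub0001.txt next to the image.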

In a test with Once Upon A Time In China, Tesseract produced much better results than the default GOCR. The main remaining problems in its output were the digit zero in place of lower-case o, “tothe” instead of “to the”, and “’II” (two upper-case i's) instead of “’ll”.
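Those three mistakes are regular enough that a small clean-up pass over the resulting text could probably catch most of them. The following is only a sketch of that idea, not something I ran over the full subtitle file; the function name `clean_ocr_line` and the exact patterns are my own guesses and will miss cases such as a capitalised “Tothe”.

```python
import re


def clean_ocr_line(line):
    """Fix three common OCR slips in one subtitle line (heuristic only)."""
    # A zero touching a lower-case letter is probably an "o",
    # unless it is part of a number such as "20th".
    line = re.sub(r"(?<=[a-z])0(?!\d)|(?<!\d)0(?=[a-z])", "o", line)
    # Two capital i's right after an apostrophe are probably "ll".
    line = re.sub(r"(?<=['’‘])II\b", "ll", line)
    # The missing space in "tothe".
    line = re.sub(r"\btothe\b", "to the", line)
    return line


print(clean_ocr_line("I'II g0 tothe d0ck"))
```

Real digits are left alone, so a line like “10 men” or “the 20th” passes through unchanged.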
