-
Notifications
You must be signed in to change notification settings - Fork 6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Minimal SSA/ASS Subtitle Support #2281
Conversation
* Created by cablej01 on 27/12/2016. | ||
*/ | ||
|
||
public class Style { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way we typically support styling of subtitles is by attaching Span objects directly onto the Cue text, which needs to be a CharSequence (you can use SpannableStringBuilder directly) rather than just a String. Take a look at what we do for webvtt to get the idea (see in particular WebvttCueParser).
I think you should use the same approach for SSA. In particular, there are built in Span objects for font, background color, foreground color, size adjustment, bold, italic, underline, strike-out etc, which the existing subtitle rendering components will render automatically without you having to do anything.
For anything that's not supported by existing Span objects I guess it is sensible to extend Cue to add them (assuming they apply to the whole Cue). If they might apply to only part of the Cue, a custom Span would be appropriate. This stuff is only useful if you think someone's going to build a UI component that actually uses this information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. I just wanted a branch point where everything was captured and titles would render without style. Happy to change the approach to fit the architecture.
This will be much easier to review and merge as a sequence of smaller pull requests. Could we start with just the extractor change (and corresponding tests / test assets)? That looks pretty easy to get merged as a first step. I posted a comment about the decoder part that probably requires some work. It's fine to send the decoder pieces a single pull request, although it might be simpler if you send two; one adding basic support for plain text SSA, and the second adding in all the styling stuff. Up to you! Thanks. |
private static String defaultStyleFormat = "Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding"; | ||
private String[] dialogueFormat; | ||
private String[] styleFormat; | ||
private Map<String,Style> styles = new HashMap<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd expect the decoder to be completely stateless (i.e. styles would be a local variable in decode). This approach will definitely be fine for a side-loaded caption file. I'm not sure what happens in the MKV case. Does the header from which this data is parsed get prefixed onto every sample as it's output, and if not already is that possible to do? If so, this approach will be fine for that case too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought it was better to move the header from the mkv decoder to the SSADecoder just once
If you prefer the header state stored in the mkv decoder and passed every time that is possible but much less efficient as there are typically many styles defined and re-parsing them for every cue seems very wasteful.
If it were possible to pass something other than a byte[] to the decoder it would be fine, but the subtitle decoder isn't called directly by the container decoder so I have to recreate an octet stream to pass through the demuxer. Passing the styles every time probably increases the effective bit rate between the container decoder and the subtitle decoder by a factor of 10 or 20.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See other comment; you should probably put the header into the Format init data instead, if that works. This will require some plumbing of the format data from TextRenderer to the decoder.
@@ -1453,6 +1502,10 @@ public void initializeOutput(ExtractorOutput output, int trackId) throws ParserE | |||
case CODEC_ID_SUBRIP: | |||
mimeType = MimeTypes.APPLICATION_SUBRIP; | |||
break; | |||
case CODEC_ID_ASS: | |||
mimeType = MimeTypes.TEXT_SSA; | |||
ssaHeader = codecPrivate; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the header that you don't want to prefix onto each sample? If so, put it into the format object (by doing: initializationData = codecPrivate
) and don't prefix it onto any sample at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I can get the header into the format object and I can see it is there in the decoder factory createDecoder call in onStreamChanged in the TextRenderer.
But the decode method of the decoder is called from the decoder thread, not from the text.
If I plumb the format data into the decoder from the TextRenderer doesn't that make the decoder stateful again? The subtitle objects are created in the decoder thread and i can't see any way of passing Format data into each call to the decoder that would make it stateless.
The styles need to be accessible in the decoder at decode time if I'm going to create the spans at the time the cues are created.
private static String defaultStyleFormat = "Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding"; | ||
private String[] dialogueFormat; | ||
private String[] styleFormat; | ||
private Map<String,Style> styles = new HashMap<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See other comment; you should probably put the header into the Format init data instead, if that works. This will require some plumbing of the format data from TextRenderer to the decoder.
…coder create method. This doesn't make a stateless decoder but it is a step in the right direction. Note that subtitles are not rendering. Is there something wrong with the timecodes?
Had a go. I think this is heading in the direction you suggest. I need a bit more guidance. |
Having slept on it I realise that if I put a Should separate lines of text in a subtitle go into separate cues? |
I've noticed a difference in the timestamps passed to Subtitle.getNextEventTimeIndex between release-v2 and dev-v2. In release-v2 it works the way you explained and it fine when Cues are timed from 0 within a Subtitle (within a block?). In dev-v2 the timestamps passed look more absolute from the beginning of the stream. Here are some log outputs from getNextEventTimeIndex: release-v2: I/SSASubtitle: time 27254 dev-v2; I/SSASubtitle: time 4236599 |
I'm not sure what this means; I think there might be some words missing from your sentence :)?
Not unless they're positioned differently. If it's just a line break then it should be a single cue that contains a line break. |
Finally worked out what I had done to make my comment meaningless - unfamiliarity with markdown. It should now make sense. |
…er but this doesn't seem to be how webvtt does it so not being used.
I had a look at how webvtt does things and as far as I can tell the style definitions are embedded in the text before the first cue. This seems very similar to the SSA/ASS approach so I don't understand what should be changed next. |
import java.io.UnsupportedEncodingException; | ||
import java.util.List; | ||
|
||
import static android.R.attr.initialKeyguardLayout; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are these imports for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something Android Studio did when i wasn't looking ...
changes removed
if (format.sampleMimeType.equals(MimeTypes.APPLICATION_CEA608) | ||
if (format.sampleMimeType.equals(MimeTypes.TEXT_SSA)) { | ||
byte[] header = format.initializationData.get(1); | ||
String dlgfmt = new String(format.initializationData.get(0), "UTF-8"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please avoid abbreviations. I don't understand what "dlgfmt" means, and I doubt most other people will either ;).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That name wasn't adding any value. I've removed it.
This change is more likely to be merged if it's split into two pieces. One that adds basic support without styling in an obviously correct way, and a second that adds styling. I still don't think the styling stuff in this change is in line with what we do elsewhere, and I think it'll make things much easier if we can merge the non-controversial bit and discuss that separately. |
That's fine. Can you suggest the rough shape of the changes you would like to get to the style-less version and I'll make those and then we can take stock. |
Sorry for the long silence on this. I sync'd this on top of latest dev-v2, cleaned up some pieces, removed the styling and merged. So what's in |
Hi, please find for consideration this pull request which adds SSA/AAS subtitle decoding.
The Script Info section is not decoded. The Styles and Events sections are fully decoded.
\N tags in the text are converted to newlines and all other text tags are removed. The full rich text and the full style information is kept in the Cue object and is therefore available for enhanced rendering but this is not implemented.
Unit tests are included.
Thanks to ojw28 for essential advice in getting timestamp handling correct.