Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Persist filters in between segments. #179

Merged
merged 9 commits into from
Apr 6, 2020
Merged

Persist filters in between segments. #179

merged 9 commits into from
Apr 6, 2020

Conversation

j0sh
Copy link
Collaborator

@j0sh j0sh commented Mar 23, 2020

This PR fixes GPU performance and VRAM growth issues by reusing the filtergraph in between segments. With this, LPMS is about equal to ffmpeg. Compared to previous LPMS versions, this gives us roughly a 30% performance gain for the workload of 1080p -> 720p,576p,460p,360p .

Note: Github for some reason refuses to show commits in the order they appear in the tree, and insists on incorrectly showing them chronologically based on commit time. The below is the correct ordering in tree, as will be reflected in a pull. The ordering is somewhat important, as some tests are added prior to the bulk of the functional modifications as a guard against behavioral changes. Ideally, nothing about the behavior as a result of this PR should change; it's just an extensive plumbing fix for performance.

Mostly ready for review, but not entirely comfortable with this yet ; in particular, SW encoding seems to show some differences at higher frame rates between 97c48b3 and a03f838 .

eab2b37 ffmpeg: Add tests for first and last PTS values. …
This helps ensure that changes to sensitive areas that munge pts
such as the filtergraph continue to produce the correct results

97c48b3 ffmpeg: Check both software and Nvidia frame counts. …
More sanity checking.

a03f838 ffmpeg: Persist filtergraph in between transcodes. …
Re-initializing the filtergraph turns out to be expensive,
especially if GPUs are in the picture. We get around 30% more
performance by persisting the filtergraph.

In order to flush the filter, we cache the most recent audio/video
frame and feed those into the filter repeatedly with a sentinel
value set (AVFrame.opque). Once we receive a frame from the
filtergraph with the sentinel set, we know the filter has been
completely flushed.

4ab522f ffmpeg: Disallow out-of-order segments. …
This is a consequence of persisting the fps filter which expects
pts timestamps to be monotonic. Non-monotonic frames are dropped.

Accommodating out-of-order segments would involve totally
rewriting output timestamps based on the input value from the
source frame, while forging the pts that the filter receives.

00fe946 ffmpeg: Log improvements and cosmetics.

00163c5 ffmpeg: Documentation updates.

@j0sh j0sh requested review from darkdarkdragon and yondonfu March 23, 2020 06:45
@j0sh j0sh changed the base branch from ja/misc to master March 25, 2020 20:54
Copy link
Contributor

@darkdarkdragon darkdarkdragon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One cosmetic change request, otherwise looks good!

// Filter: The challenge here is around fps filter adding and dropping frames.
// The fps filter expects a strictly monotonic input pts: frames with
// earlier timestamps get dropped, and frames with too-late timestamps
// will see a bunch of duplicated frames be generated to catch up with
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if segment missed? Like, orchestrator receives segment 1 and then segment 10?

Copy link
Collaborator Author

@j0sh j0sh Mar 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great question. This actually works correctly. The fps filter duplicates the last frame to make up for the "missing frames" if it detects insertion(s) are necessary between the current frame and the last frame in order to maintain a smooth rate of output. If the current and last frames are "far apart", this does result in outputting a zillion duplicates to fill the gap.

But since the last frame is a flush frame, the duplicates are dropped before hitting the encoder. This leads to the final output still being correct. We actually have a test that implicitly checks this behavior via alternating timestamps in 4ab522f which encodes segments 1 and 3 after dropping segments 0 and 2. But I'll see about adding an explicit check for skipped segments.

This brings up another issue I had overlooked until now. Since the fps filter comes before the scale filter, all duplicated frames get rescaled, then dropped. This is additional computation that really should be avoided since it is a clear attack vector: send artificial timestamps that are years apart and pin down the CPU / GPU as the filtergraph churns through all the duplicated frames.

lpms/ffmpeg/ffmpeg.go

Lines 213 to 217 in d03c749

var filters string
if param.Framerate > 0 {
filters = fmt.Sprintf("fps=%d/1,", param.Framerate)
}
filters += fmt.Sprintf("%s='w=if(gte(iw,ih),%d,-2):h=if(lt(iw,ih),%d,-2)'", scale_filter, w, h)

An easy fix for this would be to rearrange the filter string such that the fps adjustment comes after the rescaling, but it also means doing unnecessary work if the fps is being adjusted downwards (eg, 60fps -> 30fps) which is a rather common case (and that's the reason for the filters being ordered the way they are right now). But given the choice between avoiding redundant work, and protecting against an easy DoS, it'd be better to rearrange the filters to avoid DoS.

That being said, I think it might be good to spend a little more time figuring out how to support alternating timestamps without making the code even more terrible than it already is, because I suspect we'll be running into ordering issues often with retries etc. Supporting alternating timestamps should allow us to preserve the current filtergraph ordering while mitigating the DoS.

Note that we've always had an issue with the user requesting an extremely high fps output, but theoretically that could be addressed by short circuiting the transcode as described in #129

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it'd be better to rearrange the filters to avoid DoS.

I agree we should do this until we implement support alternating timestamps.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, this one livepeer/go-livepeer#1422 will be an issue with persistent filtergraph too?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, this one livepeer/go-livepeer#1422 will be an issue with persistent filtergraph too?

It shouldn't, since timestamps are carried as int64 everywhere, and most sane formats can accommodate rollovers (including mpegts and RTMP). Dunno what the issue is with that ticket.

Filters rearranged and skip test added in c5d1824

ffmpeg/lpms_ffmpeg.c Outdated Show resolved Hide resolved
j0sh added 4 commits March 31, 2020 18:20
This helps ensure that changes to sensitive areas that munge pts
such as the filtergraph continue to produce the correct results.
This is a useful test and we should always run it for software.
@j0sh j0sh force-pushed the ja/filterflush branch from 00163c5 to 33c7a90 Compare April 2, 2020 04:10
@j0sh
Copy link
Collaborator Author

j0sh commented Apr 2, 2020

Rebased on top of master and squashed the fixups from earlier (and also probably some other small modifications which I've lost track of). But here are some relevant changes from the past day or so:

c5d1824 Filters rearranged due to the DoS issue mentioned in #179 (comment) and tests also added for skipped segments. The compute savings are real with the rearranged filters: with two skipped segments (240 frames), the test runs roughly 20% faster on my machine. Or seen in another light, we've avoided a 20% slowdown. But we may still end up a bit slower here due to redundant rescaling in downsampling the framerate.

6d66e5b Added a pts check for framerate pass through. Good thing this was added, because there was a bug in the timestamp rewriting logic.

6c4daa9 Aforementioned bug fixed. See the commit for details on what it does.

33c7a90 Cosmetic changes as requested in #179 (comment) .

SW encoding seems to show some differences at higher frame rates between 97c48b3 and a03f838 .

This was a bit of a silly bug. The memory leak PR ( https://github.com/livepeer/lpms/pull/177/files ) unintentionally introduced a small change meant for this patch series (the if(vf->active) check), and things were working OK locally until I split the branches up (then they started failing in CI and locally). Merging #177 and rebasing this PR on top fixed the issue.

Copy link
Contributor

@darkdarkdragon darkdarkdragon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

j0sh added 5 commits April 3, 2020 20:41
Re-initializing the filtergraph turns out to be expensive,
especially if GPUs are in the picture. We get around 30% more
performance by persisting the filtergraph.

In order to flush the filter, we cache the most recent audio/video
frame and feed those into the filter repeatedly with a sentinel
value set (`AVFrame.opque`). Once we receive a frame from the
filtergraph with the sentinel set, we know the filter has been
completely flushed.
This is a consequence of persisting the fps filter which expects
pts timestamps to be monotonic. Non-monotonic frames are dropped.

Accommodating out-of-order segments would involve totally
rewriting output timestamps based on the input value from the
source frame, while forging the pts that the filter receives.
Also add a test for skipping segments.
@j0sh j0sh force-pushed the ja/filterflush branch from 33c7a90 to a15864f Compare April 6, 2020 19:40
@j0sh j0sh merged commit a15864f into master Apr 6, 2020
@j0sh j0sh deleted the ja/filterflush branch April 6, 2020 19:41
@j0sh
Copy link
Collaborator Author

j0sh commented Apr 6, 2020

Rebased and merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants