
ChangeFPS/SelectEvery plummets the speed when outputting at a lower framerate #7

Closed
couleurm opened this issue Nov 11, 2022 · 4 comments

Comments

@couleurm

couleurm commented Nov 11, 2022

I am using AverageFrames on a video whose frame rate is in the hundreds; here's a 280 FPS sample.

When I use SelectEvery to lower the output frame rate, rendering speed tanks (from 500 FPS to 2 FPS). This doesn't happen on R54.

You can reproduce it with the sample video by commenting/uncommenting the lines in this script:

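```python
from vapoursynth import core
from havsfunc import ChangeFPS

clip = core.lsmas.LWLibavSource(source=r"D:\Video Vault\cian.mp4", format="YUV420P8", cache=1, prefer_hw=1)

# Blend each frame with its neighbours: 5 equal weights = a radius of 2 frames
clip = core.std.AverageFrames(clip, weights=([1]*5))

# Slow variants: uncomment the ChangeFPS line, or use clip[::4].set_output()
# in place of clip.set_output() (both target output index 0).
# clip = ChangeFPS(clip=clip, fpsnum=60, fpsden=1)
# clip[::4].set_output()
clip.set_output()
```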
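Both commented-out lines in the script above decimate the averaged clip, so consecutive output frames land several source frames apart. The sketch below is an illustrative model only: it assumes a simple floor-based mapping from output frame to source frame (havsfunc's ChangeFPS may round differently) and shows that going from 280 FPS to 60 FPS strides over 4 or 5 source frames per output frame, much like `clip[::4]`.

```python
from fractions import Fraction
import math


def source_frame(out_n: int, src_fps: Fraction, dst_fps: Fraction) -> int:
    """Source frame shown at output frame `out_n` after frame-rate decimation.

    Assumption: the output picks the latest source frame that starts at or
    before the output frame's timestamp (floor). Illustrative model only,
    not havsfunc's actual ChangeFPS code.
    """
    return math.floor(out_n * src_fps / dst_fps)


def strides(n_out: int, src_fps: Fraction, dst_fps: Fraction) -> list[int]:
    """Gaps, in source frames, between consecutive output frames."""
    picks = [source_frame(k, src_fps, dst_fps) for k in range(n_out + 1)]
    return [b - a for a, b in zip(picks, picks[1:])]


if __name__ == "__main__":
    src, dst = Fraction(280), Fraction(60)
    print([source_frame(k, src, dst) for k in range(7)])  # ChangeFPS-like picks
    print(strides(6, src, dst))       # gaps of 4 or 5 source frames
    print(strides(6, src, src / 4))   # clip[::4]: always 4
```

With these assumptions the 60 FPS picks are frames 0, 4, 9, 14, 18, 23, 28, so both variants ask the source for widely spaced frames instead of a contiguous run.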

Using a BlankClip does not tank the speed. Using a clip with only I-frames (e.g. one encoded with a lossless codec) makes it tank less (~20 FPS in my single test).

@AkarinVS
Member

Thanks for the report. This is indeed an interesting issue.

Confirmed the 100x slowdown between these two scripts:
fast:

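```python
from vapoursynth import core
a = r"path\to\sample.mp4"  # placeholder: path to the 280 FPS sample
clip = core.lsmas.LWLibavSource(source=a, format="YUV420P8", cache=1)
clip = core.std.AverageFrames(clip, weights=([1]*5))
clip.set_output()  # every averaged frame is output
```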

slow:

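```python
from vapoursynth import core
a = r"path\to\sample.mp4"  # placeholder: path to the 280 FPS sample
clip = core.lsmas.LWLibavSource(source=a, format="YUV420P8", cache=1)
clip = core.std.AverageFrames(clip, weights=([1]*5))
clip[::4].set_output()  # only every 4th averaged frame is output
```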

Profiling shows that in the second script most of the time is spent inside lsmas.

I think it's because VapourSynth API4 changed the way it caches frames from source filters. The combination of AverageFrames and SelectEvery changes the request pattern in a way that makes the cache miss almost every single time.

For example, if I change the hardcoded 20 to 100 in this line:

```cpp
cache.setMaxFrames(core->threadPool->threadCount() * 2 + 20);
```

the slowdown is reduced to 4x, similar to R54.

Will need to think about the root cause more.
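A rough plausibility check for why the cache size matters. This is not the maintainer's analysis and not VapourSynth's actual scheduler; it rests on assumptions that are not from the thread: AverageFrames with 5 weights reads source frames n-2 to n+2 (clamped at the clip edges), `clip[::step]` maps output frame k to source frame k * step, and each worker thread renders one output frame at a time. `threads = 16` and the clip length are hypothetical values.

```python
# Back-of-envelope model (assumptions, not VapourSynth internals):
#  - AverageFrames with 5 weights reads source frames n-2 .. n+2,
#    clamped to the clip edges;
#  - clip[::step] maps output frame k to source frame k * step;
#  - each worker thread renders one output frame at a time.
RADIUS = 2


def source_window(out_frame: int, step: int, num_src: int) -> list[int]:
    """Source frames AverageFrames needs to render one output frame."""
    center = out_frame * step
    return [min(max(center + d, 0), num_src - 1)
            for d in range(-RADIUS, RADIUS + 1)]


def working_set(first_out: int, in_flight: int, step: int, num_src: int) -> int:
    """Distinct source frames touched by `in_flight` consecutive output frames."""
    needed = set()
    for k in range(first_out, first_out + in_flight):
        needed.update(source_window(k, step, num_src))
    return len(needed)


def cache_size(threads: int, extra: int) -> int:
    """Mirrors setMaxFrames(threadCount() * 2 + extra) from the line quoted above."""
    return threads * 2 + extra


if __name__ == "__main__":
    threads = 16        # hypothetical thread count
    num_src = 10_000    # hypothetical clip length
    for step, label in ((1, "clip"), (4, "clip[::4]")):
        need = working_set(100, threads, step, num_src)
        print(f"{label}: {need} source frames in flight; "
              f"cache {cache_size(threads, 20)} (+20) vs "
              f"{cache_size(threads, 100)} (+100)")
```

Under these assumptions the plain script keeps about 20 source frames in flight, while `clip[::4]` needs about 65, more than the 52-frame cache from `threadCount() * 2 + 20` but well within the 132 frames allowed with `+ 100`. Evicted frames from a long-GOP source would have to be decoded again, which would also fit the report that an I-frame-only clip suffers less.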

@AkarinVS
Member

I've created a workaround for this issue. Please try this build:
https://github.com/AmusementClub/vapoursynth-classic/actions/runs/3451033013

Download the release zip file and replace your vapoursynth.dll with the one in the zip.

It's a safe change, but its performance implications are not well understood at this time, and more benchmarks are needed.
You're welcome to benchmark your other scripts as well; please report back the results.

Thanks.

@couleurm
Author

> You're welcome to benchmark your other scripts as well; please report back the results.

Thank you so much! It is indeed working (faster than R54!)

If it ends up being unstable for other specific use cases, please make it optional (if that can be toggled after VS loads) so I can keep it for mine, e.g. with something like core.std.needsSort(False).

@AkarinVS
Member

Testing didn't show any noticeable performance regressions, so I will keep the workaround. It is released in https://github.com/AmusementClub/vapoursynth-classic/releases/tag/R57.A6.

Thanks for the testing.
