Improve frametiming for linux capture #2333

gschintgen · 2024-03-30T21:09:11Z

Description

This commit computes the time of the next screen capture based on the current frame's theoretical time point instead of the actual capture time. This should be slightly more precise and lead to better frame timing.

I made the same change in three parts of the codebase:

twice in kmsgrab.cpp (display_ram_t and display_vram_t). IIUC these correspond to capture to system RAM for software encoding and capture to VRAM for hardware encoding respectively. Please correct me if I'm wrong; I did not try to understand the whole pipeline.
once in x11grab.cpp which presented the same issue.

At this point I did only preliminary testing of the kmsgrab & VA-API case on an AMD 6650. The results were very encouraging with a precise 60.0 Hz instead of the previous average of 59.94 Hz. (See #2286 for details.) I'll vary my testing over the next few days and report back. Any feedback and review is welcome, of course.

If I'm not mistaken this fix should benefit all Linux users with the exception of NvFBC configurations.

Screenshot

Issues Fixed or Closed

Fixes #2286

Type of Change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Dependency update (updates to dependencies)
Documentation update (changes to documentation)
Repository update (changes to repository files, e.g. .github/...)

Checklist

My code follows the style guidelines of this project
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have added or updated the in code docstring/documentation-blocks for new or existing methods/components

Branch Updates

LizardByte requires that branches be up-to-date before merging. This means that after any PR is merged, this branch
must be updated before it can be merged. You must also
Allow edits from maintainers.

I want maintainers to keep my branch updated

CLAassistant · 2024-03-30T21:09:18Z

All committers have signed the CLA.

codecov · 2024-03-30T22:21:49Z

Codecov Report

Attention: Patch coverage is 0% with 46 lines in your changes are missing coverage. Please review.

Project coverage is 7.41%. Comparing base (5c1bad7) to head (5737f0d).

Additional details and impacted files

@@            Coverage Diff             @@
##           nightly   #2333      +/-   ##
==========================================
+ Coverage     7.35%   7.41%   +0.06%     
==========================================
  Files           95      95              
  Lines        18949   18965      +16     
  Branches      8130    8070      -60     
==========================================
+ Hits          1393    1406      +13     
- Misses       15857   16519     +662     
+ Partials      1699    1040     -659

Flag	Coverage Δ
Linux	`6.96% <0.00%> (+0.03%)`	⬆️
Windows	`2.05% <0.00%> (+0.03%)`	⬆️
macOS-12	`8.47% <ø> (+0.08%)`	⬆️
macOS-13	`7.78% <ø> (+0.06%)`	⬆️
macOS-14	`8.12% <ø> (+0.07%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files	Coverage Δ
src/platform/windows/display_base.cpp	`9.33% <0.00%> (+0.08%)`	⬆️
src/platform/linux/cuda.cpp	`1.93% <0.00%> (-0.01%)`	⬇️
src/platform/common.h	`28.39% <0.00%> (-3.12%)`	⬇️
src/platform/linux/kmsgrab.cpp	`2.84% <0.00%> (-0.01%)`	⬇️
src/platform/linux/wlgrab.cpp	`0.00% <0.00%> (ø)`
src/platform/linux/x11grab.cpp	`34.28% <0.00%> (-0.11%)`	⬇️

... and 28 files with indirect coverage changes

ReenigneArcher · 2024-03-30T23:55:53Z

Don't worry about the patch "failure" but the lint failure will need to be addressed. https://github.com/LizardByte/Sunshine/actions/runs/8493409915/job/23268083421?pr=2333#step:5:12

peperunas · 2024-04-01T15:19:43Z

I will test this PR ASAP. I'll look also into supporting NvFBC - if applicable.

gschintgen · 2024-04-01T19:58:09Z

I force-pushed the revised commit. Unfortunately it's late again and my time for actual testing is limited. I'll do my best. (At least testing something game related is fun ;-))

kmsgrab with va-api hardware encoding on AMD is ok. (no code change compared to initial test)
kmsgrab with x264 encoding needs testing.
x11grab with va-api hardware encoding needs testing.
x11grab with x264 encoding needs testing.

gschintgen · 2024-04-01T20:30:09Z

As for the nvfcb codepath, I highly suggest to open a new issue (cc @peperunas ) and make it only about nvfbc on linux. The framerate is configured here:

Sunshine/src/platform/linux/cuda.cpp

Line 757 in bb7c2d5

capture_params.dwSamplingRateMs = 1000 /* ms */ / config.framerate;

and then it's simply passed to nvidia's software:

Sunshine/src/platform/linux/cuda.cpp

Line 651 in bb7c2d5

if (func.nvFBCCreateCaptureSession(handle, &capture_params)) {

At least that's my understanding.
In essence the looping and waiting is all done by nvidia.

What I find peculiar though is that dwSamplingRateMs is defined as an uint32 as far as I can tell. See here:
https://gitlab.com/fzwoch/obs-nvfbc/-/blob/master/NvFBC.h#L935
(I don't seem to have a copy of this header file on my system even though I installed all the development dependencies.)

This could be problematic depending on how this is working in detail. As the sampling rate parameter is an integer, the delay between successive iterations will be 16ms instead of 16.66666666. IF the nvfbc code does not wait and block until the next vsync, and vsync is probably disabled on the host, this will lead to an inexact timing of the captured frames. In essence it would be similar to the situation that this PR addresses for kmsgrab and x11grab.

What I find puzzling though is that this would in effect mean that this aspect of the API itself were broken, since it forces the interval to be quite imprecise.

I think that a) someone a bit more familiar with this should have a look at this and think it through, and b) there should be some in-depth testing as I did in my original bug report, i.e. check the long-term average framerate that moonlight-qt writes to its log at the end of the stream and post it in the newly opened issue. Even a tiny (but supposedly regular) deviation will easily lead to microstutter.

Theoretically it would lead to 62.5 frames per second, some of which would have to be dropped somewhere. Or if nvidia only emits a new frame when the display content has actually changed (let's say at a precise but unrealistic 16.66666ms interval), there will regularly be situations where a whole 16ms interval will fit entirely in the 16.666ms interval between two successive frames. This would then lead to an iteration without capture of a new frame. The next frame, a few microseconds later, would then have to wait almost a complete frame interval until it's finally captured by the next iteration and then encoded end emitted. I can't imagine how this could result in a perfectly smooth stream.

gschintgen · 2024-04-01T20:50:08Z

And even without that 16ms interval fitting right into a 16.666ms interval, you'd still have two frequencies close to each other. That will always lead to beating and hence issues with frame pacing. Unless each 16ms interval is dragged out via some synching mechanism until the next 16.666ms frame render interval is done.

gschintgen · 2024-04-01T21:01:18Z

Oh well, I'll probably have to take all that stuff about nvfbc back. I don't know how, but somehow I must have missed this part:

Sunshine/src/platform/linux/cuda.cpp

Lines 808 to 812 in bb7c2d5

    
           while (next_frame > now) { 
        
             std::this_thread::sleep_for(1ns); 
        
             now = std::chrono::steady_clock::now(); 
        
           } 
        
           next_frame = now + delay;

which does exhibit the same issue as for kmsgrab and x11grab. The dwSamplingRateMs thing is probably just to populate the structure passed to nvfbc.

I must admit that I'm struggling a bit with following the code.

gschintgen · 2024-04-01T21:23:49Z

I've gone ahead and copy-pasted the change from (kms|x11)grab to cuda.cpp.

Let's see how it turns out... cc @peperunas

peperunas · 2024-04-02T12:55:15Z

As me tioned in the related issue, the patch works well for me.

Great job!

gschintgen · 2024-04-02T13:29:38Z

As me tioned in the related issue, the patch works well for me.

Great job!

Thanks for testing! Just to make sure: That's still using NvFBC for capturing (using patched drivers)?

ReenigneArcher · 2024-04-02T14:34:18Z

We should probably remove this note: https://github.com/LizardByte/Sunshine/pull/1438/files

gschintgen · 2024-04-02T18:42:52Z

We should probably remove this note: https://github.com/LizardByte/Sunshine/pull/1438/files

I'm not sure either way, but I can of course remove that note. There seem to be two separate issues though in #1429:

Gamescope causing stutter on the host,
a disabled Gamescope still leaving a bit of remaining microstutter.

That second microstutter could well be the one fixed here. Not sure about the first one. Also: Does it introduce that stutter only while streaming or also when using gamescope as compositor and playing locally on the host? (cc @Arbitrate3280 )

As for the first one it does remind me of my oldish mini pc: if I start moonlight from inside a desktop environment using ubuntu's default compositor (gnome's mutter) it's an uneven mess, but if I start from a plain virtual console it's buttery smooth (with my PR that is...). That's client-side though, not host-side.

cgutman

Can you also update wlgrab.cpp?

gschintgen · 2024-04-03T11:55:58Z

Can you also update wlgrab.cpp?

Sure, I just didn't think of it!

But just now I'm noticing that it does raise all of those questions that I didn't want to raise in this PR. (Because I'm not sure I'll have the time to properly follow through and do all the investigations and experimentation.)

Anyway, the waiting loop in (kms|x11)grab is like this (before my PR):

Sunshine/src/platform/linux/kmsgrab.cpp

Lines 1199 to 1206 in 2da6fb0

    
           if (next_frame > now) { 
        
             std::this_thread::sleep_for((next_frame - now) / 3 * 2); 
        
           } 
        
           while (next_frame > now) { 
        
             std::this_thread::sleep_for(1ns); 
        
             now = std::chrono::steady_clock::now(); 
        
           } 
        
           next_frame = now + delay;

It first waits using sleep_for and then it tries to do a kind of busy-loop until the right nanosecond (which is unrealistic). I was wondering what the point of this approach is? Is it
a) because no sleep function in the standard library is precise enough?
b) a vague attempt to spin up the cpu and have it ramp up its frequency?
I suppose it's a). But won't any sleep_for yield to the kernel scheduler anyway? In that case line 1203 would negate the whole idea of first having the supposedly inexact sleep_for() for 2/3 of the interval complemented by a busy-wait. (I'm noticing that top gives some 7-9% cpu usage when sunshine is streaming and not the expected 33% of a core if it were indeed busy-waiting. How is cpu usage currently using wlgrab?)

Within wlgrab.cpp the sleeping approach is the same, except for the "missing" nanosecond sleep in the busy-waiting-loop:

Sunshine/src/platform/linux/wlgrab.cpp

Lines 135 to 141 in 2da6fb0

    
           if (next_frame > now) { 
        
             std::this_thread::sleep_for((next_frame - now) / 3 * 2); 
        
           } 
        
           while (next_frame > now) { 
        
             now = std::chrono::steady_clock::now(); 
        
           } 
        
           next_frame = now + delay;

So in essence I'm wondering if wlgrab capture is even affected by this issue! (But it should have higher cpu usage.)

Here is the commit that introduced the 1ns-sleep: e3f642a
It was intended to reduce CPU usage. Which it probably does, since the busy-wait is probably no longer kept busy!

To be honest I'm wondering how important all of that timing micro-optimization really is. In particular if we ensure that on average the pacing is the theoretically exact 60.00 fps (or whatever is requested by moonlight). How large could this overshoot be? Is it even reasonable to reserve a third(!) of the time for busy-waiting until the precise nanosecond? I googled around a bit, and all I could find was that in practice the sleep_for overshoot seems to be <= 1ms. Even for 120fps, with a frametime interval of 8.3ms it should be sufficient to only reserve the last 1/8 (instead of 1/3) for busy-waiting. Or better yet, given the overshoot is probably rather constant, it may be best to define that busy-wait in absolute terms (e.g. 1.5ms) instead of relative.

What do you think?

gschintgen · 2024-04-03T21:21:46Z

BTW software encoding (kmsgrab) also leads to the expected 60.00 fps:

01:33:55 - SDL Info (0): Global video stats
01:33:55 - SDL Info (0): ----------------------------------------------------------
Incoming frame rate from network: 60.00 FPS
Decoding frame rate: 60.00 FPS
Rendering frame rate: 60.00 FPS
Host processing latency min/max/average: 6.1/61.9/10.1 ms
Frames dropped by your network connection: 0.00%
Frames dropped due to network jitter: 0.00%
Average network latency: 1 ms (variance: 0 ms)
Average decoding time: 0.54 ms
Average frame queue delay: 0.08 ms
Average rendering time (including monitor V-sync latency): 6.08 ms

(I'm not sure why the encoding misbehaved 3 or 4 times in those 90 minutes (could well be that the host did some unrelated background processing), but apart from that it was as smooth as can be expected given those global stats.

gschintgen · 2024-04-04T19:33:30Z

Investigating all those timing questions in my previous post was unexpectedly straightforward, thanks to the contributor that already added all the overshoot measurement code to the Windows side of Sunshine... IOW I just copied over the code from Windows to Linux kmsgrab and did some tests.

First test

Current sleep methodology (sleep_for for 2/3 of the time, then suspicious "busy"-waiting), streaming my Gnome desktop, mostly idle, running glxgears, wiggling it around a bit, light stuff.

As you can see below (in the first part of the log output) the maximum overshoot is negligible (<1ms).

Then I loaded the CPU with stress -c 14 where 14 is the number of physical cores of my CPU (6 performance cores supporting hyperthreading and 8 efficiency cores). You can see the timings breaking down a bit and the overshoot even surpasses 3ms at one point.

[2024:04:04:20:00:44]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.1ms/0.0ms
[2024:04:04:20:01:04]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.1ms/0.0ms
[2024:04:04:20:01:25]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.5ms/0.0ms
[2024:04:04:20:01:45]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.5ms/0.0ms
[2024:04:04:20:02:05]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.1ms/0.0ms
[2024:04:04:20:02:25]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.1ms/0.0ms
[2024:04:04:20:02:45]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.1ms/0.0ms
[2024:04:04:20:03:05]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.1ms/0.0ms
[2024:04:04:20:03:25]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.1ms/0.0ms
[2024:04:04:20:03:45]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.5ms/0.0ms
[2024:04:04:20:04:05]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.5ms/0.0ms
[2024:04:04:20:04:25]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.5ms/0.0ms
[2024:04:04:20:04:45]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.7ms/0.0ms
[2024:04:04:20:05:05]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.4ms/0.0ms
[2024:04:04:20:05:25]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.1ms/0.0ms
[2024:04:04:20:05:45]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.1ms/0.0ms
[2024:04:04:20:06:05]: Debug: Sleep overshoot (min/max/avg): 0.0ms/3.3ms/0.0ms
[2024:04:04:20:06:25]: Debug: Sleep overshoot (min/max/avg): 0.0ms/1.9ms/0.0ms
[2024:04:04:20:06:45]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.3ms/0.0ms
[2024:04:04:20:07:05]: Debug: Sleep overshoot (min/max/avg): 0.0ms/1.5ms/0.0ms
[2024:04:04:20:07:25]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.7ms/0.0ms
[2024:04:04:20:07:45]: Debug: Sleep overshoot (min/max/avg): 0.0ms/2.5ms/0.0ms
[2024:04:04:20:08:05]: Debug: Sleep overshoot (min/max/avg): 0.0ms/3.9ms/0.0ms
[2024:04:04:20:08:25]: Debug: Sleep overshoot (min/max/avg): 0.0ms/1.4ms/0.0ms
[2024:04:04:20:08:45]: Debug: Sleep overshoot (min/max/avg): 0.0ms/1.1ms/0.0ms
[2024:04:04:20:09:05]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.7ms/0.0ms
[2024:04:04:20:09:25]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.9ms/0.0ms
[2024:04:04:20:09:45]: Debug: Sleep overshoot (min/max/avg): 0.0ms/1.2ms/0.0ms
[2024:04:04:20:10:05]: Debug: Sleep overshoot (min/max/avg): 0.0ms/1.6ms/0.0ms
[2024:04:04:20:10:25]: Debug: Sleep overshoot (min/max/avg): 0.0ms/1.8ms/0.0ms
[2024:04:04:20:10:45]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.3ms/0.0ms

Second test

Same as before, except that I removed the busy-wait logic and replaced it by a single sleep_for with the theoretical waiting time, leaving it up to the kernel scheduler to get the timing mostly right. I find the results quite
interesting: I can't even tell at what point I launched the 14-core stress test! The timings are arguably better by just leaving it up to the kernel. I had to double check the filestamps to make sure that I even replaced (and reloaded)
the binary.

[2024:04:04:20:22:54]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.3ms/0.1ms
[2024:04:04:20:23:14]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.1ms/0.1ms
[2024:04:04:20:23:34]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.3ms/0.1ms
[2024:04:04:20:23:54]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.3ms/0.1ms
[2024:04:04:20:24:14]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.3ms/0.1ms
[2024:04:04:20:24:34]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.1ms/0.1ms
[2024:04:04:20:24:54]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.1ms/0.1ms
[2024:04:04:20:25:14]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.1ms/0.1ms
[2024:04:04:20:25:34]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.3ms/0.1ms
[2024:04:04:20:25:54]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.1ms/0.1ms
[2024:04:04:20:26:14]: Debug: Sleep overshoot (min/max/avg): 0.1ms/0.1ms/0.1ms
[2024:04:04:20:26:34]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.1ms/0.1ms
[2024:04:04:20:26:54]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.1ms/0.1ms
[2024:04:04:20:27:14]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.3ms/0.1ms
[2024:04:04:20:27:34]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.3ms/0.1ms
[2024:04:04:20:27:54]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.5ms/0.1ms
[2024:04:04:20:28:14]: Debug: Sleep overshoot (min/max/avg): 0.0ms/2.1ms/0.1ms
[2024:04:04:20:28:34]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.2ms/0.1ms

Only by adding more stress (starting a youtube video in firefox in addition to stress) I could get it to break down and show a maximum of 3 or even 4ms. (Not shown above.)

Of course this test might not correspond perfectly to the main use case, but I think it should be a good enough test of how stable the timings are when under CPU load.

My plan for this PR is now:

add the overshoot stats code to all linux capture paths
unify the sleeping code to a single sleep_for

If it turns out after more varied testing of the nightlies that the overshoot is deemed problematic after all, it's fairly easy to add a busy-wait back in, but in that case it probably should not call sleep_for itself.

3rd test

I also tested (still with kmsgrab) the true busy-wait loop as it can currently be found in wlgrab.cpp, i.e. without that sleep_for(1ns). As expected, the CPU usage increased dramatically. From 7-8% of a core in both previous cases (single sleep or sleep_for followed by a loop of sleep_for's) to around 40%. On the plus side it reduced the maximum sleep overshoot to a perfect 0.0ms. That is, until I started up stress -c 14 again. (peak of 2.9ms). If I additionally played around with youtube in firefox, the timings couldn't keep up at two specific time points. (See below. The average still stays at 0.0 though)

[2024:04:04:21:00:58]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.0ms/0.0ms
[2024:04:04:21:01:18]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.0ms/0.0ms
[2024:04:04:21:01:38]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.0ms/0.0ms
[2024:04:04:21:01:58]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.0ms/0.0ms
[2024:04:04:21:02:18]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.0ms/0.0ms
[2024:04:04:21:02:38]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.0ms/0.0ms
[2024:04:04:21:02:58]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.0ms/0.0ms
[2024:04:04:21:03:18]: Debug: Sleep overshoot (min/max/avg): 0.0ms/2.9ms/0.0ms
[2024:04:04:21:03:38]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.0ms/0.0ms
[2024:04:04:21:03:58]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.0ms/0.0ms
[2024:04:04:21:04:18]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.0ms/0.0ms
[2024:04:04:21:04:38]: Debug: Sleep overshoot (min/max/avg): 0.0ms/13.3ms/0.0ms
[2024:04:04:21:04:58]: Debug: Sleep overshoot (min/max/avg): 0.0ms/13.5ms/0.0ms
[2024:04:04:21:05:18]: Debug: Sleep overshoot (min/max/avg): 0.0ms/1.3ms/0.0ms

peperunas · 2024-04-04T20:42:55Z

Outstanding work @gschintgen!

Before this commit kmsgrab, x11grab, cuda used a "busy" wait that did not in fact keep the thread busy since it called `sleep_for()`. The whole waiting period is now slept for at once. Wlgrab used a true busy wait for a third of the waiting time. This entailed a rather high CPU usage. The watiting method has been aligned to the simplified method of the other Linux capture backends. (Benchmark data can be found in the discussion of PR LizardByte#2333.)

gschintgen · 2024-04-05T08:55:16Z

src/platform/windows/display_base.cpp

-            sleep_overshoot_tracker.collect_and_callback_on_interval(overshoot_ns.count() / 1000000., print_info, 20s);
-          }
+          std::chrono::nanoseconds overshoot_ns = std::chrono::steady_clock::now() - sleep_target;
+          log_sleep_overshoot(overshoot_ns);


In order to avoid too much code duplication, I moved out the overshoot logging to common platform code. Unfortunately I don't have a Windows build environment set up, so I couldn't test these changes.

gschintgen · 2024-04-05T09:06:11Z

In the latest commits I harmonized the sleeping code for all Linux capture methods. I did only minor testing to confirm that nothing broke and that the timings remain as good as with the original changes in this PR. (I couldn't test the minor changes to the Windows capture code.)
I'm not super satisfied of having multiple lines of actual code (log_sleep_overshoot) in a header file, but I didn't know where else to put it. It should be fine though.

Please review. (And re-testing would be nice too.)

peperunas · 2024-04-06T09:05:37Z

I am testing your code as I write. One input I have is that maybe the logging might be done in a debug build - if such a build is supported by Sunshine's build system - to save some calculations.

gschintgen · 2024-04-06T09:31:28Z

I am testing your code as I write. One input I have is that maybe the logging might be done in a debug build - if such a build is supported by Sunshine's build system - to save some calculations.

Thanks for testing! The logging & printing code is only executed when the log level in sunshine's configuration is set to debug or higher. The log level can easily be changed in the UI. Here is the relevant if:
https://github.com/gschintgen/Sunshine/blob/87a5d93327070323a6b0b9b4f9e33d8150a878ac/src/platform/common.h#L512

peperunas · 2024-04-07T20:31:38Z

No issues from my side, LGTM!

This commit computes the time of the next screen capture based on the current frame's theoretical time point instead of the actual capture time. This should be slightly more precise and lead to better frame timing.

Before this commit kmsgrab, x11grab, cuda used a "busy" wait that did not in fact keep the thread busy since it called `sleep_for()`. The whole waiting period is now slept for at once. Wlgrab used a true busy wait for a third of the waiting time. This entailed a rather high CPU usage. The watiting method has been aligned to the simplified method of the other Linux capture backends. (Benchmark data can be found in the discussion of PR LizardByte#2333.)

Logging code for sleep overshoot analysis has been added to all Linux capture backends. Duplicated code has been moved out to common platform code.

gschintgen · 2024-04-12T07:38:55Z

Thanks for reviewing and merging @cgutman!

gschintgen mentioned this pull request Mar 30, 2024

Microstuttering due to wrong capture or encoding rate: 59.94Hz instead of 60Hz #2286

Closed

3 tasks

ReenigneArcher requested a review from cgutman March 30, 2024 22:15

ReenigneArcher added the pr:planned label Mar 30, 2024

gschintgen force-pushed the fix-frametiming-linux branch from b8f6c4d to 7a7522d Compare April 1, 2024 19:35

ReenigneArcher changed the title ~~Improve frametiming for kmsgrab and x11grab~~ Improve frametiming for linux capture Apr 1, 2024

This was referenced Apr 2, 2024

Gamescope causing stuttering issues #1429

Closed

Framepacing issues under linux #259

Closed

cgutman requested changes Apr 3, 2024

View reviewed changes

gschintgen commented Apr 5, 2024

View reviewed changes

gschintgen requested a review from cgutman April 5, 2024 08:57

cgutman approved these changes Apr 11, 2024

View reviewed changes

gschintgen added 5 commits April 11, 2024 22:36

Fix frametiming for kmsgrab and x11grab

d5d6bf6

This commit computes the time of the next screen capture based on the current frame's theoretical time point instead of the actual capture time. This should be slightly more precise and lead to better frame timing.

Fix frametiming for NvFCB (Linux)

e696a7b

kmsgrab: add sleep overshoot debugging

d11cc14

Capture/Linux: add sleep overshoot logging

5737f0d

Logging code for sleep overshoot analysis has been added to all Linux capture backends. Duplicated code has been moved out to common platform code.

cgutman force-pushed the fix-frametiming-linux branch from 87a5d93 to 5737f0d Compare April 12, 2024 03:36

cgutman added this to the v0.23.1 milestone Apr 12, 2024

ReenigneArcher merged commit fcd4c07 into LizardByte:nightly Apr 12, 2024
50 of 51 checks passed

gschintgen mentioned this pull request May 6, 2024

NvFBC retrieves slightly outdated images. #2472

Open

3 tasks

gschintgen deleted the fix-frametiming-linux branch May 6, 2024 17:27

This was referenced May 7, 2024

Support XDG portal and Pipewire #2507

Draft

Any kind of frame limiting makes the stream (micro)stutter #1998

Closed

KuleRucket pushed a commit to KuleRucket/Sunshine that referenced this pull request Jun 6, 2024

Improve frametiming for linux capture (LizardByte#2333)

c2159a1

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve frametiming for linux capture #2333

Improve frametiming for linux capture #2333

gschintgen commented Mar 30, 2024

CLAassistant commented Mar 30, 2024 •

edited

Loading

codecov bot commented Mar 30, 2024 •

edited

Loading

ReenigneArcher commented Mar 30, 2024

peperunas commented Apr 1, 2024

gschintgen commented Apr 1, 2024

gschintgen commented Apr 1, 2024

gschintgen commented Apr 1, 2024

gschintgen commented Apr 1, 2024 •

edited

Loading

gschintgen commented Apr 1, 2024

peperunas commented Apr 2, 2024

gschintgen commented Apr 2, 2024

ReenigneArcher commented Apr 2, 2024

gschintgen commented Apr 2, 2024

cgutman left a comment

gschintgen commented Apr 3, 2024

gschintgen commented Apr 3, 2024

gschintgen commented Apr 4, 2024

peperunas commented Apr 4, 2024

gschintgen Apr 5, 2024

gschintgen commented Apr 5, 2024

peperunas commented Apr 6, 2024

gschintgen commented Apr 6, 2024

peperunas commented Apr 7, 2024

gschintgen commented Apr 12, 2024

Improve frametiming for linux capture #2333

Improve frametiming for linux capture #2333

Conversation

gschintgen commented Mar 30, 2024

Description

Screenshot

Issues Fixed or Closed

Type of Change

Checklist

Branch Updates

CLAassistant commented Mar 30, 2024 • edited Loading

codecov bot commented Mar 30, 2024 • edited Loading

Codecov Report

ReenigneArcher commented Mar 30, 2024

peperunas commented Apr 1, 2024

gschintgen commented Apr 1, 2024

gschintgen commented Apr 1, 2024

gschintgen commented Apr 1, 2024

gschintgen commented Apr 1, 2024 • edited Loading

gschintgen commented Apr 1, 2024

peperunas commented Apr 2, 2024

gschintgen commented Apr 2, 2024

ReenigneArcher commented Apr 2, 2024

gschintgen commented Apr 2, 2024

cgutman left a comment

Choose a reason for hiding this comment

gschintgen commented Apr 3, 2024

gschintgen commented Apr 3, 2024

gschintgen commented Apr 4, 2024

First test

Second test

3rd test

peperunas commented Apr 4, 2024

gschintgen Apr 5, 2024

Choose a reason for hiding this comment

gschintgen commented Apr 5, 2024

peperunas commented Apr 6, 2024

gschintgen commented Apr 6, 2024

peperunas commented Apr 7, 2024

gschintgen commented Apr 12, 2024

CLAassistant commented Mar 30, 2024 •

edited

Loading

codecov bot commented Mar 30, 2024 •

edited

Loading

gschintgen commented Apr 1, 2024 •

edited

Loading