Improve frametiming for linux capture #2333

Merged
5 commits merged into LizardByte:nightly on Apr 12, 2024

Conversation

gschintgen
Contributor

Description

This commit computes the time of the next screen capture based on the current frame's theoretical time point instead of the actual capture time. This should be slightly more precise and lead to better frame timing.
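
In pseudocode the change boils down to something like this (a simplified sketch, not the actual Sunshine capture loop; streaming, capture_frame and delay are placeholders):

    // delay = theoretical frame interval, e.g. 1s / 60 for a 60 fps stream
    auto next_frame = std::chrono::steady_clock::now() + delay;
    while (streaming) {
      // ... wait until next_frame, then capture and encode one frame ...
      capture_frame();

      // before: next_frame = std::chrono::steady_clock::now() + delay;  // drifts a little later every frame
      next_frame += delay;  // after: stays on the exact 1/fps grid
    }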

I made the same change in three parts of the codebase:

  • twice in kmsgrab.cpp (display_ram_t and display_vram_t). IIUC these correspond to capture to system RAM for software encoding and capture to VRAM for hardware encoding respectively. Please correct me if I'm wrong; I did not try to understand the whole pipeline.
  • once in x11grab.cpp which presented the same issue.

At this point I have only done preliminary testing of the kmsgrab & VA-API case on an AMD 6650. The results were very encouraging: a precise 60.0 Hz instead of the previous average of 59.94 Hz. (See #2286 for details.) I'll vary my testing over the next few days and report back. Any feedback and review is welcome, of course.

If I'm not mistaken this fix should benefit all Linux users with the exception of NvFBC configurations.

Screenshot

Issues Fixed or Closed

Fixes #2286

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Dependency update (updates to dependencies)
  • Documentation update (changes to documentation)
  • Repository update (changes to repository files, e.g. .github/...)

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have added or updated the in code docstring/documentation-blocks for new or existing methods/components

Branch Updates

LizardByte requires that branches be up-to-date before merging. This means that after any PR is merged, this branch must be updated before it can be merged. You must also allow edits from maintainers.

  • I want maintainers to keep my branch updated

@CLAassistant

CLAassistant commented Mar 30, 2024

CLA assistant check
All committers have signed the CLA.


codecov bot commented Mar 30, 2024

Codecov Report

Attention: Patch coverage is 0%, with 46 lines in your changes missing coverage. Please review.

Project coverage is 7.41%. Comparing base (5c1bad7) to head (5737f0d).

Additional details and impacted files
@@            Coverage Diff             @@
##           nightly   #2333      +/-   ##
==========================================
+ Coverage     7.35%   7.41%   +0.06%     
==========================================
  Files           95      95              
  Lines        18949   18965      +16     
  Branches      8130    8070      -60     
==========================================
+ Hits          1393    1406      +13     
- Misses       15857   16519     +662     
+ Partials      1699    1040     -659     
Flag       Coverage         Δ
Linux      6.96% <0.00%>    +0.03% ⬆️
Windows    2.05% <0.00%>    +0.03% ⬆️
macOS-12   8.47% <ø>        +0.08% ⬆️
macOS-13   7.78% <ø>        +0.06% ⬆️
macOS-14   8.12% <ø>        +0.07% ⬆️

Flags with carried forward coverage won't be shown.

Files                                   Coverage         Δ
src/platform/windows/display_base.cpp   9.33% <0.00%>    +0.08% ⬆️
src/platform/linux/cuda.cpp             1.93% <0.00%>    -0.01% ⬇️
src/platform/common.h                   28.39% <0.00%>   -3.12% ⬇️
src/platform/linux/kmsgrab.cpp          2.84% <0.00%>    -0.01% ⬇️
src/platform/linux/wlgrab.cpp           0.00% <0.00%>    ø
src/platform/linux/x11grab.cpp          34.28% <0.00%>   -0.11% ⬇️

... and 28 files with indirect coverage changes

@ReenigneArcher
Member

Don't worry about the patch "failure", but the lint failure will need to be addressed. https://github.com/LizardByte/Sunshine/actions/runs/8493409915/job/23268083421?pr=2333#step:5:12

@peperunas

I will test this PR ASAP. I'll also look into supporting NvFBC, if applicable.

@gschintgen
Contributor Author

I force-pushed the revised commit. Unfortunately it's late again and my time for actual testing is limited. I'll do my best. (At least testing something game-related is fun ;-))

  • kmsgrab with va-api hardware encoding on AMD is ok. (no code change compared to initial test)
  • kmsgrab with x264 encoding needs testing.
  • x11grab with va-api hardware encoding needs testing.
  • x11grab with x264 encoding needs testing.

@gschintgen
Contributor Author

As for the NvFBC codepath, I suggest opening a new issue (cc @peperunas) and making it only about NvFBC on Linux. The framerate is configured here:

capture_params.dwSamplingRateMs = 1000 /* ms */ / config.framerate;

and then it's simply passed to NVIDIA's software:
if (func.nvFBCCreateCaptureSession(handle, &capture_params)) {

At least that's my understanding. In essence, the looping and waiting is all done by NVIDIA.

What I find peculiar though is that dwSamplingRateMs is defined as a uint32, as far as I can tell. See here:
https://gitlab.com/fzwoch/obs-nvfbc/-/blob/master/NvFBC.h#L935
(I don't seem to have a copy of this header file on my system even though I installed all the development dependencies.)

This could be problematic depending on how it works in detail. Since the sampling rate parameter is an integer, the delay between successive iterations will be 16 ms instead of 16.67 ms. If the NvFBC code does not wait and block until the next vsync (and vsync is probably disabled on the host), this will lead to inexact timing of the captured frames. In essence it would be similar to the situation that this PR addresses for kmsgrab and x11grab.
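
For illustration, the truncation can be checked in a few lines (the 60 fps figure is just an example, not taken from the Sunshine code):

    #include <cstdint>
    #include <iostream>

    int main() {
      const std::uint32_t framerate = 60;                  // requested stream fps (example value)
      const std::uint32_t sampling_ms = 1000 / framerate;  // integer division truncates: 16, not 16.67
      std::cout << "interval: " << sampling_ms << " ms\n"
                << "effective rate: " << 1000.0 / sampling_ms << " fps"  // 62.5 fps
                << " vs requested " << framerate << " fps\n";
    }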

What I find puzzling, though, is that this would in effect mean that this aspect of the API itself is broken, since it forces the interval to be quite imprecise.

I think that a) someone a bit more familiar with this should have a look at this and think it through, and b) there should be some in-depth testing as I did in my original bug report, i.e. check the long-term average framerate that moonlight-qt writes to its log at the end of the stream and post it in the newly opened issue. Even a tiny (but supposedly regular) deviation will easily lead to microstutter.

Theoretically it would lead to 62.5 frames per second, some of which would have to be dropped somewhere. Or, if NVIDIA only emits a new frame when the display content has actually changed (let's say at a precise but unrealistic 16.67 ms interval), there will regularly be situations where a whole 16 ms interval fits entirely inside the 16.67 ms interval between two successive frames. That iteration would then capture no new frame. The next frame, arriving a few microseconds later, would have to wait almost a complete frame interval until it's finally captured by the next iteration and then encoded and emitted. I can't imagine how this could result in a perfectly smooth stream.

@gschintgen
Contributor Author

And even without that 16 ms interval fitting right into a 16.67 ms interval, you'd still have two frequencies close to each other. That will always lead to beating and hence issues with frame pacing, unless each 16 ms interval is dragged out via some syncing mechanism until the next 16.67 ms frame render interval is done.
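
Back-of-the-envelope, using the 16 ms vs 16.67 ms figures from above (an illustration, not measured data): the two intervals drift apart by roughly 0.67 ms per capture iteration, so they realign about every 16.67 / 0.67 ≈ 25 iterations. At 60 Hz that means a duplicated or skipped frame roughly every 0.4 s, which is exactly the kind of regular hiccup that shows up as microstutter.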

@gschintgen
Contributor Author

gschintgen commented Apr 1, 2024

Oh well, I'll probably have to take all that stuff about NvFBC back. I don't know how, but somehow I must have missed this part:

while (next_frame > now) {
  std::this_thread::sleep_for(1ns);
  now = std::chrono::steady_clock::now();
}
next_frame = now + delay;

which does exhibit the same issue as kmsgrab and x11grab. The dwSamplingRateMs thing is probably just there to populate the structure passed to NvFBC.

I must admit that I'm struggling a bit with following the code.

@gschintgen
Contributor Author

I've gone ahead and copy-pasted the change from (kms|x11)grab to cuda.cpp.

Let's see how it turns out... cc @peperunas

@ReenigneArcher changed the title from "Improve frametiming for kmsgrab and x11grab" to "Improve frametiming for linux capture" on Apr 1, 2024
@peperunas

As mentioned in the related issue, the patch works well for me.

Great job!

@gschintgen
Contributor Author

As mentioned in the related issue, the patch works well for me.

Great job!

Thanks for testing! Just to make sure: That's still using NvFBC for capturing (using patched drivers)?

@ReenigneArcher
Member

We should probably remove this note: https://github.com/LizardByte/Sunshine/pull/1438/files

@gschintgen
Contributor Author

We should probably remove this note: https://github.com/LizardByte/Sunshine/pull/1438/files

I'm not sure either way, but I can of course remove that note. There seem to be two separate issues though in #1429:

  • Gamescope causing stutter on the host,
  • a disabled Gamescope still leaving a bit of remaining microstutter.

That second microstutter could well be the one fixed here. Not sure about the first one. Also: does it introduce that stutter only while streaming, or also when using Gamescope as compositor and playing locally on the host? (cc @Arbitrate3280)

As for the first one, it does remind me of my oldish mini PC: if I start Moonlight from inside a desktop environment using Ubuntu's default compositor (GNOME's Mutter) it's an uneven mess, but if I start it from a plain virtual console it's buttery smooth (with my PR, that is...). That's client-side though, not host-side.

@cgutman (Collaborator) left a comment

Can you also update wlgrab.cpp?

@gschintgen
Contributor Author

Can you also update wlgrab.cpp?

Sure, I just didn't think of it!

But just now I'm noticing that it does raise all of those questions that I didn't want to raise in this PR. (Because I'm not sure I'll have the time to properly follow through and do all the investigations and experimentation.)

Anyway, the waiting loop in (kms|x11)grab is like this (before my PR):

if (next_frame > now) {
  std::this_thread::sleep_for((next_frame - now) / 3 * 2);
}
while (next_frame > now) {
  std::this_thread::sleep_for(1ns);
  now = std::chrono::steady_clock::now();
}
next_frame = now + delay;

It first waits using sleep_for and then does a kind of busy-loop until the right nanosecond (which is unrealistic). I was wondering what the point of this approach is. Is it
a) because no sleep function in the standard library is precise enough?
b) a vague attempt to spin up the CPU and have it ramp up its frequency?
I suppose it's a). But won't any sleep_for yield to the kernel scheduler anyway? In that case line 1203 would negate the whole idea of first having the supposedly inexact sleep_for() for 2/3 of the interval complemented by a busy-wait. (I'm noticing that top gives some 7-9% CPU usage when Sunshine is streaming, not the expected 33% of a core if it were indeed busy-waiting. What is the CPU usage currently with wlgrab?)

Within wlgrab.cpp the sleeping approach is the same, except for the "missing" nanosecond sleep in the busy-waiting-loop:

if (next_frame > now) {
  std::this_thread::sleep_for((next_frame - now) / 3 * 2);
}
while (next_frame > now) {
  now = std::chrono::steady_clock::now();
}
next_frame = now + delay;

So in essence I'm wondering if wlgrab capture is even affected by this issue! (But it should have higher CPU usage.)

Here is the commit that introduced the 1ns sleep: e3f642a
It was intended to reduce CPU usage. Which it probably does, since the busy-wait is probably no longer kept busy!

To be honest, I'm wondering how important all of that timing micro-optimization really is, in particular if we ensure that on average the pacing is the theoretically exact 60.00 fps (or whatever is requested by Moonlight). How large could this overshoot be? Is it even reasonable to reserve a third(!) of the time for busy-waiting until the precise nanosecond? I googled around a bit, and all I could find was that in practice the sleep_for overshoot seems to be <= 1 ms. Even for 120 fps, with a frame-time interval of 8.3 ms, it should be sufficient to reserve only the last 1/8 (instead of 1/3) for busy-waiting. Or better yet, given that the overshoot is probably rather constant, it may be best to define that busy-wait in absolute terms (e.g. 1.5 ms) instead of relative ones.
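
Something along these lines, purely as a sketch of that last idea (the 1.5 ms margin and the function name are made-up illustrations, not proposed code):

    #include <chrono>
    #include <thread>

    using namespace std::chrono_literals;

    // Sleep for everything except a fixed margin, then spin only for the last stretch.
    void wait_for_next_frame(std::chrono::steady_clock::time_point &next_frame,
                             std::chrono::nanoseconds delay) {
      constexpr auto busy_wait_margin = 1500us;  // absolute margin instead of 1/3 of the interval
      auto now = std::chrono::steady_clock::now();
      if (next_frame - busy_wait_margin > now) {
        std::this_thread::sleep_for(next_frame - busy_wait_margin - now);
      }
      while (std::chrono::steady_clock::now() < next_frame) {
        // true busy-wait, but only for ~1.5 ms at most
      }
      next_frame += delay;  // keep the theoretical grid, as in this PR
    }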

What do you think?

@gschintgen
Contributor Author

BTW software encoding (kmsgrab) also leads to the expected 60.00 fps:

01:33:55 - SDL Info (0): Global video stats
01:33:55 - SDL Info (0): ----------------------------------------------------------
Incoming frame rate from network: 60.00 FPS
Decoding frame rate: 60.00 FPS
Rendering frame rate: 60.00 FPS
Host processing latency min/max/average: 6.1/61.9/10.1 ms
Frames dropped by your network connection: 0.00%
Frames dropped due to network jitter: 0.00%
Average network latency: 1 ms (variance: 0 ms)
Average decoding time: 0.54 ms
Average frame queue delay: 0.08 ms
Average rendering time (including monitor V-sync latency): 6.08 ms

(I'm not sure why the encoding misbehaved 3 or 4 times in those 90 minutes (it could well be that the host did some unrelated background processing), but apart from that it was as smooth as can be expected given those global stats.)

@gschintgen
Contributor Author

Investigating all those timing questions from my previous post was unexpectedly straightforward, thanks to the contributor who already added all the overshoot measurement code to the Windows side of Sunshine... IOW I just copied the code over from Windows to Linux kmsgrab and did some tests.

First test

Current sleep methodology (sleep_for for 2/3 of the time, then suspicious "busy"-waiting), streaming my Gnome desktop, mostly idle, running glxgears, wiggling it around a bit, light stuff.

As you can see below (in the first part of the log output) the maximum overshoot is negligible (<1ms).

Then I loaded the CPU with stress -c 14 where 14 is the number of physical cores of my CPU (6 performance cores supporting hyperthreading and 8 efficiency cores). You can see the timings breaking down a bit and the overshoot even surpasses 3ms at one point.

[2024:04:04:20:00:44]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.1ms/0.0ms
[2024:04:04:20:01:04]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.1ms/0.0ms
[2024:04:04:20:01:25]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.5ms/0.0ms
[2024:04:04:20:01:45]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.5ms/0.0ms
[2024:04:04:20:02:05]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.1ms/0.0ms
[2024:04:04:20:02:25]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.1ms/0.0ms
[2024:04:04:20:02:45]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.1ms/0.0ms
[2024:04:04:20:03:05]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.1ms/0.0ms
[2024:04:04:20:03:25]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.1ms/0.0ms
[2024:04:04:20:03:45]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.5ms/0.0ms
[2024:04:04:20:04:05]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.5ms/0.0ms
[2024:04:04:20:04:25]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.5ms/0.0ms
[2024:04:04:20:04:45]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.7ms/0.0ms
[2024:04:04:20:05:05]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.4ms/0.0ms
[2024:04:04:20:05:25]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.1ms/0.0ms
[2024:04:04:20:05:45]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.1ms/0.0ms
[2024:04:04:20:06:05]: Debug: Sleep overshoot (min/max/avg): 0.0ms/3.3ms/0.0ms
[2024:04:04:20:06:25]: Debug: Sleep overshoot (min/max/avg): 0.0ms/1.9ms/0.0ms
[2024:04:04:20:06:45]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.3ms/0.0ms
[2024:04:04:20:07:05]: Debug: Sleep overshoot (min/max/avg): 0.0ms/1.5ms/0.0ms
[2024:04:04:20:07:25]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.7ms/0.0ms
[2024:04:04:20:07:45]: Debug: Sleep overshoot (min/max/avg): 0.0ms/2.5ms/0.0ms
[2024:04:04:20:08:05]: Debug: Sleep overshoot (min/max/avg): 0.0ms/3.9ms/0.0ms
[2024:04:04:20:08:25]: Debug: Sleep overshoot (min/max/avg): 0.0ms/1.4ms/0.0ms
[2024:04:04:20:08:45]: Debug: Sleep overshoot (min/max/avg): 0.0ms/1.1ms/0.0ms
[2024:04:04:20:09:05]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.7ms/0.0ms
[2024:04:04:20:09:25]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.9ms/0.0ms
[2024:04:04:20:09:45]: Debug: Sleep overshoot (min/max/avg): 0.0ms/1.2ms/0.0ms
[2024:04:04:20:10:05]: Debug: Sleep overshoot (min/max/avg): 0.0ms/1.6ms/0.0ms
[2024:04:04:20:10:25]: Debug: Sleep overshoot (min/max/avg): 0.0ms/1.8ms/0.0ms
[2024:04:04:20:10:45]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.3ms/0.0ms

Second test

Same as before, except that I removed the busy-wait logic and replaced it with a single sleep_for for the theoretical waiting time, leaving it up to the kernel scheduler to get the timing mostly right. I find the results quite interesting: I can't even tell at what point I launched the 14-core stress test! The timings are arguably better when just leaving it up to the kernel. I had to double-check the file timestamps to make sure that I had even replaced (and reloaded) the binary.

[2024:04:04:20:22:54]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.3ms/0.1ms
[2024:04:04:20:23:14]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.1ms/0.1ms
[2024:04:04:20:23:34]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.3ms/0.1ms
[2024:04:04:20:23:54]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.3ms/0.1ms
[2024:04:04:20:24:14]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.3ms/0.1ms
[2024:04:04:20:24:34]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.1ms/0.1ms
[2024:04:04:20:24:54]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.1ms/0.1ms
[2024:04:04:20:25:14]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.1ms/0.1ms
[2024:04:04:20:25:34]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.3ms/0.1ms
[2024:04:04:20:25:54]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.1ms/0.1ms
[2024:04:04:20:26:14]: Debug: Sleep overshoot (min/max/avg): 0.1ms/0.1ms/0.1ms
[2024:04:04:20:26:34]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.1ms/0.1ms
[2024:04:04:20:26:54]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.1ms/0.1ms
[2024:04:04:20:27:14]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.3ms/0.1ms
[2024:04:04:20:27:34]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.3ms/0.1ms
[2024:04:04:20:27:54]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.5ms/0.1ms
[2024:04:04:20:28:14]: Debug: Sleep overshoot (min/max/avg): 0.0ms/2.1ms/0.1ms
[2024:04:04:20:28:34]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.2ms/0.1ms

Only by adding more stress (starting a YouTube video in Firefox in addition to stress) could I get it to break down and show a maximum of 3 or even 4 ms. (Not shown above.)

Of course this test might not correspond perfectly to the main use case, but I think it should be a good enough test of how stable the timings are when under CPU load.

My plan for this PR is now:

  • add the overshoot stats code to all linux capture paths
  • unify the sleeping code to a single sleep_for

If it turns out after more varied testing of the nightlies that the overshoot is deemed problematic after all, it's fairly easy to add a busy-wait back in, but in that case it probably should not call sleep_for itself.
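
For reference, the unified waiting code boils down to roughly this (a sketch in the style of the snippets above, not the exact diff):

    auto now = std::chrono::steady_clock::now();
    if (next_frame > now) {
      std::this_thread::sleep_for(next_frame - now);  // single sleep, let the scheduler handle it
    }
    next_frame += delay;  // still derived from the theoretical time point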

Third test

I also tested (still with kmsgrab) the true busy-wait loop as it can currently be found in wlgrab.cpp, i.e. without the sleep_for(1ns). As expected, CPU usage increased dramatically: from 7-8% of a core in both previous cases (single sleep, or sleep_for followed by a loop of sleep_for's) to around 40%. On the plus side, it reduced the maximum sleep overshoot to a perfect 0.0 ms. That is, until I started up stress -c 14 again (peak of 2.9 ms). If I additionally played around with YouTube in Firefox, the timings couldn't keep up at two specific points in time. (See below; the average still stays at 0.0 though.)

[2024:04:04:21:00:58]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.0ms/0.0ms
[2024:04:04:21:01:18]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.0ms/0.0ms
[2024:04:04:21:01:38]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.0ms/0.0ms
[2024:04:04:21:01:58]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.0ms/0.0ms
[2024:04:04:21:02:18]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.0ms/0.0ms
[2024:04:04:21:02:38]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.0ms/0.0ms
[2024:04:04:21:02:58]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.0ms/0.0ms
[2024:04:04:21:03:18]: Debug: Sleep overshoot (min/max/avg): 0.0ms/2.9ms/0.0ms
[2024:04:04:21:03:38]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.0ms/0.0ms
[2024:04:04:21:03:58]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.0ms/0.0ms
[2024:04:04:21:04:18]: Debug: Sleep overshoot (min/max/avg): 0.0ms/0.0ms/0.0ms
[2024:04:04:21:04:38]: Debug: Sleep overshoot (min/max/avg): 0.0ms/13.3ms/0.0ms
[2024:04:04:21:04:58]: Debug: Sleep overshoot (min/max/avg): 0.0ms/13.5ms/0.0ms
[2024:04:04:21:05:18]: Debug: Sleep overshoot (min/max/avg): 0.0ms/1.3ms/0.0ms

@peperunas

Outstanding work @gschintgen!

gschintgen added a commit to gschintgen/Sunshine that referenced this pull request Apr 5, 2024
Before this commit kmsgrab, x11grab, cuda used a "busy" wait that did
not in fact keep the thread busy since it called `sleep_for()`.
The whole waiting period is now slept for at once.

Wlgrab used a true busy wait for a third of the waiting time. This
entailed rather high CPU usage. The waiting method has been aligned
with the simplified method of the other Linux capture backends.

(Benchmark data can be found in the discussion of PR LizardByte#2333.)
sleep_overshoot_tracker.collect_and_callback_on_interval(overshoot_ns.count() / 1000000., print_info, 20s);
}
std::chrono::nanoseconds overshoot_ns = std::chrono::steady_clock::now() - sleep_target;
log_sleep_overshoot(overshoot_ns);
Contributor Author

In order to avoid too much code duplication, I moved the overshoot logging out to common platform code. Unfortunately I don't have a Windows build environment set up, so I couldn't test these changes.

@gschintgen requested a review from cgutman on April 5, 2024 08:57
@gschintgen
Contributor Author

In the latest commits I harmonized the sleeping code for all Linux capture methods. I did only minor testing to confirm that nothing broke and that the timings remain as good as with the original changes in this PR. (I couldn't test the minor changes to the Windows capture code.)
I'm not super satisfied with having multiple lines of actual code (log_sleep_overshoot) in a header file, but I didn't know where else to put it. It should be fine though.

Please review. (And re-testing would be nice too.)

@peperunas

I am testing your code as I write. One suggestion: maybe the logging could be done only in a debug build (if such a build is supported by Sunshine's build system) to save some computation.

@gschintgen
Contributor Author

I am testing your code as I write. One suggestion: maybe the logging could be done only in a debug build (if such a build is supported by Sunshine's build system) to save some computation.

Thanks for testing! The logging & printing code is only executed when the log level in Sunshine's configuration is set to debug or higher. The log level can easily be changed in the UI. Here is the relevant if:
https://github.com/gschintgen/Sunshine/blob/87a5d93327070323a6b0b9b4f9e33d8150a878ac/src/platform/common.h#L512
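
Roughly speaking, the helper does something like this (a heavily simplified sketch; only sleep_overshoot_tracker, print_info and the 20 s interval are taken from the diff above, while the guard function is a hypothetical placeholder for the actual log-level check):

    inline void log_sleep_overshoot(std::chrono::nanoseconds overshoot_ns) {
      // Assumed guard: skip all bookkeeping unless debug-level logging is enabled.
      if (!debug_logging_enabled()) {
        return;
      }
      sleep_overshoot_tracker.collect_and_callback_on_interval(
        overshoot_ns.count() / 1000000.,  // overshoot in milliseconds
        print_info,                       // prints the min/max/avg lines seen in the logs above
        20s);
    }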

@peperunas

No issues from my side, LGTM!

This commit computes the time of the next screen capture based on the
current frame's theoretical time point instead of the actual capture
time. This should be slightly more precise and lead to better frame
timing.
Before this commit kmsgrab, x11grab, cuda used a "busy" wait that did
not in fact keep the thread busy since it called `sleep_for()`.
The whole waiting period is now slept for at once.

Wlgrab used a true busy wait for a third of the waiting time. This
entailed rather high CPU usage. The waiting method has been aligned
with the simplified method of the other Linux capture backends.

(Benchmark data can be found in the discussion of PR LizardByte#2333.)
Logging code for sleep overshoot analysis has been added to all Linux
capture backends. Duplicated code has been moved out to common platform
code.
@cgutman added this to the v0.23.1 milestone on Apr 12, 2024
@gschintgen
Contributor Author

Thanks for reviewing and merging @cgutman!

@ReenigneArcher merged commit fcd4c07 into LizardByte:nightly on Apr 12, 2024
50 of 51 checks passed
@gschintgen deleted the fix-frametiming-linux branch on May 6, 2024 17:27
KuleRucket pushed a commit to KuleRucket/Sunshine that referenced this pull request Jun 6, 2024
Labels: none yet
Projects: Status: Done

Successfully merging this pull request may close these issues:
Microstuttering due to wrong capture or encoding rate: 59.94Hz instead of 60Hz

5 participants