Input lag with vsync enabled due to limited polling rates #3317

Open
Tracked by #76
alice-i-cecile opened this issue Dec 13, 2021 · 18 comments
Labels
A-Input Player input via keyboard, mouse, gamepad, and more C-Performance A change motivated by improving speed, memory usage or compile times

Comments

@alice-i-cecile
Member

Problem

@aevyrie has observed noticeable input lag in Bevy applications when vsync is enabled.

The most immediate source of this is quite obvious: we're only fetching the input state at the start of each frame, but rendering is done at the end of the frame.

Input events appear to be moved into the Bevy app by the winit runner:

pub fn winit_runner_with(mut app: App) {

At 60 fps, this means 16 ms of lag, which is noticeable for some applications: namely for precise cursor movement (FPS, GUI applications) and rhythm games.
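
The 16 ms figure is just the refresh interval. A minimal sketch of the arithmetic, assuming input is sampled once per frame and the frame is presented at the next vblank (the function names are hypothetical, not Bevy APIs):

```rust
use std::time::Duration;

/// Time between two presented frames at a given refresh rate.
fn frame_interval(refresh_hz: f64) -> Duration {
    Duration::from_secs_f64(1.0 / refresh_hz)
}

/// Age of the input sample at present time, given how far into the frame
/// interval the input was read (0 = read at the very start of the frame).
fn input_age_at_present(interval: Duration, read_offset: Duration) -> Duration {
    interval.saturating_sub(read_offset)
}
```

With `frame_interval(60.0)` and a read offset of zero, the input is a full interval (about 16.7 ms) old when the frame is shown; reading it later in the interval shrinks that age by the same amount.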

Possible solutions

  • Poll more regularly (somewhat limited by the OS I believe)
    • I suspect this could be done at the end of each stage
    • Perhaps a dedicated input-polling thread architecture would help?
  • Poll closer to rendering time

Either solution will involve some trickiness, as we must pierce the Bevy schedule in some fashion in order to insert fresh input events into the World at the right time, rather than merely at the beginning of each pass over the schedule.

@alice-i-cecile alice-i-cecile added A-Input Player input via keyboard, mouse, gamepad, and more C-Performance A change motivated by improving speed, memory usage or compile times labels Dec 13, 2021
@aevyrie
Member

aevyrie commented Dec 13, 2021

IIRC, we are limited by the winit event loop, and we can't poll mouse input asynchronously.

I had some ideas that I could prototype, that effectively act as a frame-limiter, which is related to #1343.

How bevy currently behaves with vsync on:

0ms -----------------------
        Start of event loop. Get input. Do stuff.
4ms     Done.
        --
        Sit around and do nothing
        --
        Send to GPU, present frame (input is 16ms out of date)
16ms ----------------------

How bevy currently behaves with vsync off:

0ms -----------------------
        Start of event loop. Get input. Do stuff.
        Send to GPU, present frame (input is 4ms out of date)
4ms     Done.
        Start of event loop. Get input. Do stuff.
        Send to GPU, present frame (input is 4ms out of date)
8ms     Done.
        Start of event loop. Get input. Do stuff.
        Send to GPU, present frame (input is 4ms out of date)
12ms    Done.
        Start of event loop. Get input. Do stuff.
        Send to GPU, present frame (input is 4ms out of date)
16ms ----------------------

What I'd like bevy to do with vsync on:

0ms -----------------------
        Sleep until we have just enough time to render a frame, based on how long it took previously.
12ms    Start of event loop. Get input. Do stuff.
        Send to GPU, present frame (input is 4ms out of date)
16ms ----------------------

We could prototype this by adding a system at the end of the event loop that sleeps for a while after the frame has been sent to the GPU. I'm not intimately familiar with how all that works, but I can try something out.
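
The third timeline boils down to one subtraction. As a rough sketch, assuming the previous frame's work time is a good predictor of the next one (the helper name is hypothetical):

```rust
use std::time::Duration;

/// How long to sleep at the start of a frame so that the frame's work
/// finishes just before the vblank deadline. Returns zero if the last
/// frame already used up the whole budget.
fn pre_frame_sleep(frame_budget: Duration, last_work_time: Duration) -> Duration {
    frame_budget.saturating_sub(last_work_time)
}
```

With a 16 ms budget and 4 ms of work this gives the 12 ms sleep shown in the timeline above.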

@bjorn3
Contributor

bjorn3 commented Dec 13, 2021

I believe mailbox vsync should behave similar to disabled vsync in terms of latency.

> What I'd like bevy to do with vsync on:

This is called frame pacing, right? You will have to be careful to ensure that variation in render time doesn't cause frames to be submitted too late. Predicting the right time to sleep can be difficult.

@aevyrie
Member

aevyrie commented Dec 13, 2021

> I believe mailbox vsync should behave similar to disabled vsync in terms of latency.

That's what I would expect, but not what I experience in bevy apps.

> Predicting the right time to sleep can be difficult.

Definitely. My use case for this would be in applications, though: I care more about reducing input latency without just letting the app run at 300fps and draining the battery. In fact, for the application use case you could safely add a 2x safety factor to the predicted frame render time and still get a huge improvement in latency without risking dropped frames, because applications like this take very little time to render, on the order of 1-3ms.
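
To put numbers on the safety factor, here is a small sketch. The function name and the 16 ms budget are assumptions for illustration only:

```rust
use std::time::Duration;

/// Sleep time at the start of a frame when the predicted render time is
/// padded by `safety_factor` (e.g. 2.0 reserves twice the predicted time).
fn padded_sleep(frame_budget: Duration, predicted_render: Duration, safety_factor: f64) -> Duration {
    // Time reserved for the frame's work, including headroom for variance.
    let reserved = predicted_render.mul_f64(safety_factor);
    frame_budget.saturating_sub(reserved)
}
```

For a 3 ms render and a 2x factor, 6 ms is reserved and the app sleeps for about 10 ms, so input is roughly 6 ms old at present time instead of 16 ms.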

@minecrawler

minecrawler commented Dec 14, 2021

Why sleep before the frame, if you can also sleep at the end?

0ms -----------------------
        Start of event loop. Get input. Do stuff.
12ms    Send to GPU, present frame (input is 4ms out of date)
        Sleep until the remaining frame-time is over.
16ms ----------------------

To be honest, this is what I'd expect a frame-limiting strategy to look like. Just wait at the end until the specified time-frame is over, so that there is no timing issue later on in case the renderer is slower than expected :)

@aevyrie
Member

aevyrie commented Dec 14, 2021

That doesn't get around the timing problem; presenting it this way just makes it seem like the problem doesn't exist. That's why I presented it in reverse: it makes the timing problem more apparent.

You need to make sure the time between "send to GPU" is always <16ms to prevent frame drops. In the order you present, you would need to add a sleep between the time the frame is finished and sent to the GPU to act as your factor of safety for any frame time variability. The "gotcha" here is that it seems like you can just sleep until you've hit a total of 16ms, because your total frametime is always 16ms, but it's masking the fact that what you actually care about is the time between "send to GPU" being 16ms.

Anywho, the proof is in the pudding. We should make some prototypes to see if we can make something that works. 😄

@aevyrie
Member

aevyrie commented Dec 20, 2021

I did some work on this.

  1. It appears we are still using FIFO vsync:

    present_mode: if window.vsync {
    wgpu::PresentMode::Fifo

    See revert default vsync mode to Fifo #1416

  2. I made a frame limiter app that adds a frame limiting system in the renderer sub app, in the final cleanup stage. You can check it out here: https://github.com/aevyrie/bevy_latency

It works like this:

0ms -----------------------
        Start of event loop. Get input. Do stuff.
 4ms    Send to GPU, present frame (input is 4ms out of date)
New!    Run a stopwatch in a system in RenderStage::Cleanup to time how long a frame takes (not including sleep)
New!    Sleep the thread for the predicted amount of time, with some safety factor added to prevent frame drops, 
            in case it actually takes longer to render the frame than predicted.
16ms ----------------------

Here's a trace to better visualize, notice the large blue bar with the label "framerate limiter"
image

I don't have any empirical measurements, but so far it seems promising.
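
For readers who want the shape of the idea without pulling the repo, here is a standalone sketch of the stopwatch-then-sleep loop using only the standard library. It is not the code from bevy_latency; `run_limited`, its parameters, and the `work` closure are hypothetical stand-ins:

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Runs `frames` iterations of `work`, sleeping after each one so that the
/// next iteration starts later. Returns the measured work times.
fn run_limited<F: FnMut()>(
    frames: usize,
    target: Duration,
    safety_margin: Duration,
    mut work: F,
) -> Vec<Duration> {
    let mut work_times = Vec::with_capacity(frames);
    for _ in 0..frames {
        let stopwatch = Instant::now();
        work(); // stand-in for "get input, do stuff, send to GPU"
        let work_time = stopwatch.elapsed(); // measured before sleeping, so sleep is excluded
        work_times.push(work_time);
        // Use this frame's time as the prediction for the next one and keep
        // a margin in case the next frame takes longer.
        let sleep_for = target.saturating_sub(work_time + safety_margin);
        thread::sleep(sleep_for);
    }
    work_times
}
```

Note that `thread::sleep` only guarantees a minimum duration, so this version can overshoot the target.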

Mailbox vsync, framerate limiter enabled

bevy.2021-12-19.17-07-51_Trim_Trim.mp4

Mailbox vsync, framerate limiter disabled:

bevy.2021-12-19.17-08-14_Trim.mp4

With the framerate limiter, the 3d cursor feels perceptibly less sluggish, but it's still not as good as with vsync off. I tested without vsync both with and without the framerate limiter; however, I was still seeing some tearing with the framerate limited to ~60, though on the plus side I did see significantly less GPU/CPU usage.

@aevyrie
Member

aevyrie commented Dec 20, 2021

I modified the prototype to add a second sleep system that caps the frame rate to exactly what you specify.

0ms -----------------------
            Estimate how long the next frame will take, minus a small margin to give us space if it takes longer
            Sleep for this duration. (Forward estimation)
8ms         Start of event loop. Get input. Do stuff.
11.8ms      See how close our estimate was to the requested frame time, sleep if required to get the frametime just right
12ms        Send to GPU, present frame (input is 4ms out of date)
16ms ----------------------

image

Red annotation: forward estimation. Blue annotation: precise frametime limiter accounting for error and margin in the last frame's forward estimation.
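
The two sleeps can be written as two small calculations. This is an illustrative sketch with hypothetical names and made-up numbers, not the prototype's actual code:

```rust
use std::time::Duration;

/// First sleep (forward estimation): wait at the top of the frame so that
/// input is read as late as the estimate and margin allow.
fn forward_sleep(target: Duration, estimated_work: Duration, margin: Duration) -> Duration {
    target.saturating_sub(estimated_work + margin)
}

/// Second sleep (precise limiter): once the work is done, pad the frame out
/// to the requested frame time. Zero if the frame already ran long.
fn correction_sleep(target: Duration, elapsed_in_frame: Duration) -> Duration {
    target.saturating_sub(elapsed_in_frame)
}
```

For example, with a 16 ms target, a 3.5 ms estimate and a 0.5 ms margin, the forward sleep is 12 ms. If the work then takes 3.8 ms, 15.8 ms have elapsed and the correction sleep is the remaining 0.2 ms.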

I'm seeing some really awesome results, the 3d cursor is basically glued to the mouse cursor:

bevy.2021-12-19.23-48-46_Trim.mp4

In addition, I can bring my safety margin pretty low without frame drops, on the order of 100μs. I've bumped it up to 500μs to reduce the chance of frame drops, at the cost of only 400μs more input lag.

Edit: this wasn't possible without spin_sleep to get precise sleep times. Thanks for the suggestion @cwfitzgerald!
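
For context on why precise sleeping matters here: the OS sleep can overshoot by more than the margins discussed above. The general technique is to sleep for most of the interval and busy-wait for the rest. The sketch below shows that idea with the standard library only; it is not the spin_sleep crate's API, and `precise_sleep` and `spin_threshold` are hypothetical names:

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Sleeps until `duration` has passed, using the OS sleep for the bulk of
/// the wait and a busy loop for the final `spin_threshold`.
fn precise_sleep(duration: Duration, spin_threshold: Duration) {
    let deadline = Instant::now() + duration;
    // Coarse part: hand the CPU back to the OS, but wake up early.
    if let Some(coarse) = duration.checked_sub(spin_threshold) {
        thread::sleep(coarse);
    }
    // Fine part: spin until the deadline. If the OS overslept past the
    // deadline, this loop simply does not run.
    while Instant::now() < deadline {
        std::hint::spin_loop();
    }
}
```

The trade-off is some CPU time burned during the spin, so the threshold would normally be kept small.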

@aevyrie
Member

aevyrie commented Dec 20, 2021

Another neat byproduct of this: it's now really easy to frame-limit to an arbitrarily low FPS for power use or other reasons. However, because our input -> render latency is constant (3.5ms in my case), the game/app still feels really responsive at low framerates! Here's the demo locked to only 20fps, yet the 3d cursor still doesn't lag behind the OS cursor very much:

bevy.2021-12-20.01-42-14.mp4

It doesn't feel laggy or jello-y, because the motion-to-photon time is still low, instead it only feels choppy because it doesn't update very frequently.

This brings up an interesting idea for system scheduling too. If a game or application is sensitive to input responsiveness, but needs a large frametime budget, they could schedule everything compute intensive after the render stage, instead of between input and the render stage. This means those changes would take up to a full frametime to display, but you now have the ability to only put critical things (like transforming objects in the world based on user input) in the pre-render schedule.

@dafteran4

I posted this in discussions, but it's also relevant for this issue... Here is my quick-and-dirty fix based on #6503 (pipelined rendering), which enables multiple app/input updates per single rendered frame, thus processing input immediately even when using VSync. It probably breaks all sorts of stuff that implicitly assumes one app frame equals one render frame. It also probably doesn't work on all archs; I tested only on Linux, where it works fine.

https://github.com/dafteran4/bevy/tree/multi-app-step-while-rendering

@What42Pizza

What's the current progress of improving input lag? I have my own ideas on how to improve input timings, but it seems like the problem is MUCH worse than that. All the Bevy programs I've tested have absolutely terrible input lag, taking many frames to process inputs.

Tests:
I used carnac (and recording) to see how long it takes for bevy projects to process inputs (60 fps, release mode, Windows 10, Rust nightly 1.70).
For my project (bevy 0.10), it consistently updated 4 or 5 frames (~75 ms) after carnac did
For vx_bevy (bevy 0.9.1), it consistently updated 3 or 4 frames (~60 ms) after carnac did

This might not sound bad, but it feels absolutely horrible and I can't even consider using Bevy unless this is fixed

@alice-i-cecile
Member Author

Have you experimented with bevy_framepace? That was designed in large part to reduce input lag in sensitive GUI applications and in my experience it helps quite a bit.

We're looking to upstream that, and to expose and use UI time stamps as well.

@What42Pizza

How exactly do you use that with Bevy 0.10? The version on crates.io is for bevy 0.9 and I can't figure out how to use cargo workspaces to use the version in the pull request

@bjorn3
Contributor

bjorn3 commented Mar 11, 2023

Bevy 0.10 was released on the 6th of this month. I can see it just fine on crates.io.

@SUPERCILEX
Contributor

The question was how do you use this: aevyrie/bevy_framepace#32.

You should be able to specify Alice's fork like so: https://doc.rust-lang.org/cargo/reference/specifying-dependencies.html#specifying-dependencies-from-git-repositories

@What42Pizza

That worked, thanks!

@aevyrie
Member

aevyrie commented Mar 24, 2023

> This might not sound bad, but it feels absolutely horrible and I can't even consider using Bevy unless this is fixed

Using bevy_framepace, you should be getting <1 frame of latency unless you've created some sort of system order bug. It can be used with and without Vsync, and should work with all PresentModes.

The latency you experience depends on the PresentMode you are using. Fifo will accumulate frames (I think it caps out at 3?). Mailbox should give you near-perfect results, though it isn't supported on all platforms. I've spent quite a lot of time on this issue, and I'm pretty happy with the results I've been able to achieve; there is nothing inherently wrong with Bevy.

The other thing worth mentioning is the new parallel pipelined renderer will add latency if enabled, as the CPU simulation + GPU render end-to-end can take longer than a single frame.

@rambip

rambip commented Feb 17, 2024

Thanks a lot for this help!

With bevy_framepace, I was able to greatly reduce the latency of my piano app. It was unusable before, and now it works!

@morr

morr commented Mar 23, 2024

While I'm not a professional game developer and am just playing with Bevy, such behaviour looks weird and broken, no matter what causes it. There should be a way to get rid of input lag without disabling vsync or adding a third-party crate.

PresentMode::AutoNoVsync
Monosnap screencast 2024-03-23 14-17-43

PresentMode::AutoVsync or bevy_framepace
Monosnap screencast 2024-03-23 14-40-27

9 participants