
Memory leak on AMD when using HEVC only #151

Closed
Skerga opened this issue Nov 23, 2024 · 13 comments


@Skerga

Skerga commented Nov 23, 2024

Hey, I observed a memory leak when connecting to Wolf via an Android client.

  • Clients used: Pixel 6 and Pixel 9 with the newest app store version of Moonlight.

On my laptop no memory leak occurs, so I think it's Android-exclusive.

  • Images tested: Firefox, Lutris and Retroarch

The memory leak begins the moment the stream starts (no action needs to be performed on the client or in the running container).

  • Host: AMD GPU (Vega 64).

System:
Kernel: 6.8.0-49-generic arch: x86_64 bits: 64 compiler: gcc v: 13.2.0 clocksource: tsc
Desktop: Cinnamon v: 6.2.9 tk: GTK v: 3.24.41 wm: Sway with: docker,waybar vt: 7 dm: LightDM
v: 1.30.0 Distro: Linux Mint 22 Wilma base: Ubuntu 24.04 noble
Machine:
Type: Desktop Mobo: Micro-Star model: B350 GAMING PLUS (MS-7A34) v: 4.0
serial: uuid: UEFI: American Megatrends v: M.B0
date: 07/24/2018
CPU:
Info: 8-core model: AMD Ryzen 7 2700X bits: 64 type: MT MCP smt: enabled arch: Zen+ rev: 2 cache:
L1: 768 KiB L2: 4 MiB L3: 16 MiB
Speed (MHz): avg: 2752 high: 4200 min/max: 2200/3700 boost: enabled cores: 1: 2096 2: 2438
3: 2802 4: 2800 5: 4200 6: 4181 7: 2097 8: 2087 9: 2098 10: 2462 11: 4199 12: 4187 13: 2100
14: 2097 15: 2100 16: 2094 bogomips: 118401
Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm
Graphics:
Device-1: AMD Vega 10 XL/XT [Radeon RX 56/64] vendor: ASUSTeK driver: amdgpu v: kernel
arch: GCN-5 pcie: speed: 8 GT/s lanes: 16 ports: active: HDMI-A-2 empty: DP-1, DP-2, DVI-D-1,
HDMI-A-1 bus-ID: 22:00.0 chip-ID: 1002:687f class-ID: 0300

Here is my compose file:

version: "3"
services:
  wolf:
    image: ghcr.io/games-on-whales/wolf:stable
    environment:
      - XDG_RUNTIME_DIR=/tmp/sockets
      - HOST_APPS_STATE_FOLDER=/etc/wolf
      - WOLF_LOG_LEVEL=DEBUG
    volumes:
      - /etc/wolf/:/etc/wolf
      - /tmp/sockets:/tmp/sockets:rw
      - /var/run/docker.sock:/var/run/docker.sock:rw
      - /dev/:/dev/:rw
      - /run/udev:/run/udev:rw
    device_cgroup_rules:
      - 'c 13:* rmw'
    devices:
      - /dev/dri
      - /dev/uinput
      - /dev/uhid
    network_mode: host
    restart: unless-stopped

Here are debug logs from a short Firefox session:
wolf_log_firefox.log
and a debug log from Lutris until all RAM was exhausted:
wolf_log_lutris..log

And here is the crash dump created after running out of memory:
dump.zip

@ABeltramo
Member

Before I dive into this, a quick question: are you running the latest Wolf image?
You can update using

docker pull ghcr.io/games-on-whales/wolf:stable

and then restart the container.

@Skerga
Author

Skerga commented Nov 24, 2024

I pulled all images just before I tested, after I realised there was a problem.
I just tried again and it didn't happen. But I have had this kind of problem once before, so my guess is that it only happens under certain conditions; I'm not yet sure what triggers it.
I'll keep an eye on it; maybe I'll find the action that causes it.

@Skerga
Author

Skerga commented Nov 24, 2024

I managed to trigger it again while playing. Using the newest image, the problem started after this message appeared in the log:

01:07:16.824051423 WARN | [GSTREAMER] Size of frame too large, 274 packets is bigger than the max (255); skipping FEC
01:07:16.824504348 WARN | [GSTREAMER] Size of frame too large, 274 packets is bigger than the max (255); skipping FEC
01:07:16.825594343 WARN | [GSTREAMER] Size of frame too large, 272 packets is bigger than the max (255); skipping FEC
01:07:17.910738303 WARN | [GSTREAMER] Size of frame too large, 347 packets is bigger than the max (255); skipping FEC
01:07:17.911219778 WARN | [GSTREAMER] Size of frame too large, 347 packets is bigger than the max (255); skipping FEC
01:07:17.911686213 WARN | [GSTREAMER] Size of frame too large, 345 packets is bigger than the max (255); skipping FEC
01:07:20.189441936 WARN | [GSTREAMER] Size of frame too large, 290 packets is bigger than the max (255); skipping FEC
01:07:20.189912230 WARN | [GSTREAMER] Size of frame too large, 290 packets is bigger than the max (255); skipping FEC
01:07:20.190350313 WARN | [GSTREAMER] Size of frame too large, 288 packets is bigger than the max (255); skipping FEC
01:07:21.343157691 WARN | [GSTREAMER] Size of frame too large, 304 packets is bigger than the max (255); skipping FEC
01:07:21.343544932 WARN | [GSTREAMER] Size of frame too large, 304 packets is bigger than the max (255); skipping FEC
01:07:21.343936404 WARN | [GSTREAMER] Size of frame too large, 303 packets is bigger than the max (255); skipping FEC
01:07:22.446560728 WARN | [GSTREAMER] Size of frame too large, 304 packets is bigger than the max (255); skipping FEC
01:07:22.447525966 WARN | [GSTREAMER] Size of frame too large, 304 packets is bigger than the max (255); skipping FEC
01:07:22.448460603 WARN | [GSTREAMER] Size of frame too large, 304 packets is bigger than the max (255); skipping FEC
01:07:23.750222159 WARN | [GSTREAMER] Size of frame too large, 304 packets is bigger than the max (255); skipping FEC
01:07:23.751280250 WARN | [GSTREAMER] Size of frame too large, 304 packets is bigger than the max (255); skipping FEC
01:07:23.752074872 WARN | [GSTREAMER] Size of frame too large, 304 packets is bigger than the max (255); skipping FEC
01:07:25.472555129 WARN | [GSTREAMER] Size of frame too large, 304 packets is bigger than the max (255); skipping FEC
01:07:25.473117444 WARN | [GSTREAMER] Size of frame too large, 304 packets is bigger than the max (255); skipping FEC
01:07:25.473545686 WARN | [GSTREAMER] Size of frame too large, 304 packets is bigger than the max (255); skipping FEC
01:07:27.173342997 WARN | [GSTREAMER] Size of frame too large, 304 packets is bigger than the max (255); skipping FEC
01:07:27.173721248 WARN | [GSTREAMER] Size of frame too large, 304 packets is bigger than the max (255); skipping FEC
01:07:27.174083437 WARN | [GSTREAMER] Size of frame too large, 304 packets is bigger than the max (255); skipping FEC
01:07:28.903762255 WARN | [GSTREAMER] Size of frame too large, 304 packets is bigger than the max (255); skipping FEC
01:07:28.904470703 WARN | [GSTREAMER] Size of frame too large, 304 packets is bigger than the max (255); skipping FEC
01:07:28.905169092 WARN | [GSTREAMER] Size of frame too large, 304 packets is bigger than the max (255); skipping FEC
01:07:30.622409449 WARN | [GSTREAMER] Size of frame too large, 304 packets is bigger than the max (255); skipping FEC
01:07:30.623186959 WARN | [GSTREAMER] Size of frame too large, 304 packets is bigger than the max (255); skipping FEC
01:07:30.623892587 WARN | [GSTREAMER] Size of frame too large, 304 packets is bigger than the max (255); skipping FEC
01:07:32.338994366 WARN | [GSTREAMER] Size of frame too large, 305 packets is bigger than the max (255); skipping FEC
01:07:32.339721225 WARN | [GSTREAMER] Size of frame too large, 305 packets is bigger than the max (255); skipping FEC
01:07:32.340424042 WARN | [GSTREAMER] Size of frame too large, 303 packets is bigger than the max (255); skipping FEC
01:07:34.074014516 WARN | [GSTREAMER] Size of frame too large, 304 packets is bigger than the max (255); skipping FEC
01:07:34.074525768 WARN | [GSTREAMER] Size of frame too large, 304 packets is bigger than the max (255); skipping FEC
01:07:34.075008940 WARN | [GSTREAMER] Size of frame too large, 303 packets is bigger than the max (255); skipping FEC
01:07:35.704969999 WARN | [GSTREAMER] Size of frame too large, 304 packets is bigger than the max (255); skipping FEC
01:07:35.705529872 WARN | [GSTREAMER] Size of frame too large, 304 packets is bigger than the max (255); skipping FEC
01:07:35.706085466 WARN | [GSTREAMER] Size of frame too large, 304 packets is bigger than the max (255); skipping FEC

Afterwards every image I use gets the leak, and it persists even after restarting the Wolf container; only a host restart seemed to fix it.
A symptom of this problem is an extremely sluggish mouse, delayed by more than a second.
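The warning above compares the number of packets a frame needs against a fixed cap of 255. A minimal Python sketch of that arithmetic follows; it is an illustration, not Wolf's code. The 1024-byte payload size is a hypothetical value (the real packet size is negotiated per session and isn't given in this thread), and the Reed-Solomon explanation for the cap is an assumption: Reed-Solomon codes over GF(2^8) are commonly limited to 255 shards per block.

```python
# Assumption: the cap mirrors the "(255)" printed in the warning; a likely
# reason is Reed-Solomon FEC over GF(2^8), limited to 255 shards per block.
MAX_FEC_PACKETS = 255


def packets_for_frame(frame_bytes: int, payload_bytes: int) -> int:
    """Number of packets needed to carry one encoded frame (ceiling division)."""
    if payload_bytes <= 0:
        raise ValueError("payload_bytes must be positive")
    return (frame_bytes + payload_bytes - 1) // payload_bytes


def should_skip_fec(num_packets: int, max_packets: int = MAX_FEC_PACKETS) -> bool:
    """The check the log line describes: skip FEC when a frame needs more
    packets than a single FEC block can hold."""
    return num_packets > max_packets


if __name__ == "__main__":
    # Hypothetical payload size per packet, in bytes.
    payload = 1024
    for frame_bytes in (83_334, 311_000):
        n = packets_for_frame(frame_bytes, payload)
        print(frame_bytes, n, "skip FEC" if should_skip_fec(n) else "FEC ok")
```

Under these assumptions an average frame at 60 fps and 40 Mbit/s (40,000,000 / 60 / 8 ≈ 83,333 bytes) fits in 82 packets, while a frame of roughly 311 KB would need 304 packets, the count seen in the log, and would be sent without FEC.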

@ABeltramo
Member

I've tested a few things in code and I can't reproduce a memory leak when hitting that edge case, so I don't think that's the underlying cause. I'm much more suspicious of the fact that this is only triggered by Moonlight Android, and I'd like to dig deeper there: what are the differences between the laptop and the Android client?

  • Can you try forcing, on your phone, the exact same resolution + FPS + bitrate that works on your laptop?
    • Can you also double-check, while the stream is running, that Moonlight reports the same codec on both clients? On the laptop press CTRL+ALT+SHIFT+S; on Android enable the stats overlay in the Moonlight settings before connecting.
  • How are the devices connected? All on Wi-Fi? All on LAN?
  • Can you please run all these tests with one and only one client connected at a time?
  • Does it still happen if you reduce the resolution?

I think that what's happening here is that for some reason the host can't keep up the requested framerate (could be GPU, network or other) and when that happens we are leaking some frames in our encoding pipeline.
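The failure mode described here, a producer (capture) that outpaces a consumer (encoder or network), can be modelled in a few lines. Below is a hypothetical Python sketch of the usual mitigation: a bounded buffer that drops the oldest frame once full, so memory stays flat instead of growing with the backlog. It is not Wolf's implementation; in GStreamer terms it is roughly what a `queue` element with `leaky=downstream` and a `max-size-buffers` limit does.

```python
from collections import deque
from typing import Optional


class BoundedFrameQueue:
    """Hypothetical frame buffer that discards the oldest frame when full,
    so a slow consumer cannot make memory grow without bound."""

    def __init__(self, max_frames: int) -> None:
        self._frames: deque = deque(maxlen=max_frames)
        self.dropped = 0  # how many old frames were discarded

    def push(self, frame: bytes) -> None:
        # A deque with maxlen silently evicts from the left on append when
        # full; count those evictions so they could be logged.
        if len(self._frames) == self._frames.maxlen:
            self.dropped += 1
        self._frames.append(frame)

    def pop(self) -> Optional[bytes]:
        """Return the oldest buffered frame, or None if the queue is empty."""
        return self._frames.popleft() if self._frames else None

    def __len__(self) -> int:
        return len(self._frames)


if __name__ == "__main__":
    q = BoundedFrameQueue(max_frames=3)
    # The producer pushes every tick; the consumer drains only every other tick.
    for tick in range(60):
        q.push(bytes(1024))
        if tick % 2 == 0:
            q.pop()
    print(len(q), q.dropped)
```

Dropping old frames trades a few skipped frames for bounded memory. For live streaming that is usually the right trade, since a stale frame has no value once a newer one exists; an unbounded buffer in the same situation grows until RAM runs out.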

@Skerga
Author

Skerga commented Nov 26, 2024

I think I found what causes this for me:
The Android decoder setting defaults to Auto, which led Android to use "c2.exynos.hvec.decoder" (H.265). After forcing H.264, which is what my laptop uses (its setting is also Auto), the problem disappeared.
Just for completeness' sake:

  • Both use 1080p at 60 fps and 40 Mbit/s; Android: H.265, laptop: H.264.
  • Android uses Wi-Fi only; the laptop works over both Wi-Fi and LAN.
  • The problem also disappeared at 420p with H.265 on my phone.

In conclusion: the H.265 implementation of my GPU is just bad, which most likely leads to frames not being processed, causing some buffer to grow until all memory is used up.

@ABeltramo ABeltramo changed the title Memory leak on Android client Memory leak on AMD when using HEVC only Nov 26, 2024
@kode54
Contributor

kode54 commented Jan 9, 2025

I appear to have triggered this with an AMD RX 7700 XT as the host GPU, running Wolf in Docker on an Arch machine and streaming to an M4 Pro Mac mini. It doesn't happen with most games, but it did this time while running Borderlands 3.

Borderlands 3 also tried to start while shader compilation was still running. I wanted to cancel that but could not, because the leak made things spiral out of control.

@kode54
Contributor

kode54 commented Jan 9, 2025

Nope, Borderlands 3 makes it continuously leak memory even with H.264.

@ABeltramo ABeltramo reopened this Jan 9, 2025
@ABeltramo
Member

I need a bit more info:

  • what's leaking? RAM or GPU VRAM?
  • does it only happen when starting Borderlands? Is it reliably broken, or does it only happen randomly?
  • can you check the stats on Moonlight (CTRL+ALT+SHIFT+S on desktop) and report what you see there?

@ABeltramo
Member

Btw, there are tons of reports on ProtonDB https://www.protondb.com/app/397540 with all kinds of issues; it might be something specific to this game and Proton.

@kode54
Contributor

kode54 commented Jan 10, 2025

> I need a bit more info:
>
>   • what's leaking? RAM or GPU VRAM?

The wolf process's "RES" stat climbs past 45g.

>   • does it only happen when starting Borderlands? Is it reliably broken, or does it only happen randomly?

Reliably, shortly after starting the game, by the time it reaches the menu.

>   • can you check the stats on Moonlight (CTRL+ALT+SHIFT+S on desktop) and report what you see there?

I'll check and report back when I'm at my desktop again.

> Btw, there are tons of reports on ProtonDB https://www.protondb.com/app/397540 with all kinds of issues; it might be something specific to this game and Proton.

Figures it would break in the time since I last checked it, in November.
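To answer the RAM-vs-VRAM question with a trend rather than a single reading, one option on a Linux host is to sample the process's resident set size (the VmRSS field of /proc/&lt;pid&gt;/status, the same figure top shows as RES) at intervals. A minimal Python sketch follows; the script name `rss_watch.py` and finding the wolf PID with something like `pgrep` are assumptions. It only measures system RAM, not GPU VRAM.

```python
import re
import sys
import time

# Matches the VmRSS line of /proc/<pid>/status, e.g. "VmRSS:    123456 kB".
_VMRSS = re.compile(r"^VmRSS:\s+(\d+)\s+kB$", re.MULTILINE)


def parse_vmrss_kib(status_text: str) -> int:
    """Return resident memory in KiB from the text of /proc/<pid>/status."""
    match = _VMRSS.search(status_text)
    if match is None:
        raise ValueError("no VmRSS line found (kernel thread or non-Linux system?)")
    return int(match.group(1))


def read_rss_kib(pid: int) -> int:
    """Read the current resident set size of a process (Linux only)."""
    with open(f"/proc/{pid}/status", encoding="utf-8") as f:
        return parse_vmrss_kib(f.read())


def watch(pid: int, interval_s: float = 5.0, samples: int = 12) -> None:
    """Print RSS every interval; a steady climb while streaming points to a RAM leak."""
    previous = read_rss_kib(pid)
    for _ in range(samples):
        time.sleep(interval_s)
        current = read_rss_kib(pid)
        print(f"rss={current / 1024:.1f} MiB  delta={(current - previous) / 1024:+.1f} MiB")
        previous = current


if __name__ == "__main__":
    if len(sys.argv) > 1:
        watch(int(sys.argv[1]))
    else:
        print("usage: rss_watch.py <pid>")
```

Run it on the host with the host-side PID of the wolf process; a delta that stays positive sample after sample while nothing changes on the client is the pattern described in this thread.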

@ABeltramo
Member

I've just pushed a fix that should address this; you can update to the latest with:

docker pull ghcr.io/games-on-whales/wolf:stable

Let me know how it goes!

@kode54
Contributor

kode54 commented Jan 10, 2025

> I've just pushed a fix that should address this; you can update to the latest with:
>
> docker pull ghcr.io/games-on-whales/wolf:stable
>
> Let me know how it goes!

That fixed it, thanks!

@ABeltramo
Member

Thanks for testing it out!


3 participants