Fossilize influenced by I/O load on unrelated drives #182

Open
SimplyCorbett opened this issue Feb 3, 2022 · 4 comments

Comments

@SimplyCorbett

Your system information

  • Steam client version (build number or date): latest (Jan 16 2022)
  • Distribution (e.g. Ubuntu): Gentoo
  • Opted into Steam client beta?: No
  • Have you checked for system updates?: Yes

Hardware:
3900x
64GB RAM
Rest described below.

With the following setup under I/O load, Fossilize drops from maxing out my processor to using only one core at 50%, and then to no CPU usage at all.

Setup:
BTRFS RAID0 (2x512GB /)
BTRFS RAID0 (2x10TB /mnt/RAID)
BTRFS Single drive (1x3TB /mnt/RAID/3TB)
Steam installs and downloads games in /home/user

How to replicate:
Start processing Vulkan shaders;
Start transferring files from /mnt/RAID/3TB to /mnt/RAID (this shouldn't cause any problems, as the I/O is not on the main drive).

Fossilize now vanishes from the system and stops working until the transfer is completed, cancelled, or paused.

Reason for bug:

The I/O is not on the main drive that Steam is installed on, so it shouldn't affect Fossilize. On my 3900x the load average is 3, which leaves plenty of CPU for Fossilize to run.

Pausing the transfer results in the CPU suddenly hitting 100% with fossilize using all cores after a second or two.

kisak-valve transferred this issue from ValveSoftware/steam-for-linux Feb 3, 2022
kisak-valve changed the title from "Fossilize (vulkan shaders) and I/O load issues" to "Fossilize influenced by I/O load on unrelated drives" Feb 3, 2022
@kakra
Contributor

kakra commented Feb 3, 2022

This has mainly been discussed here: #99

The behavior you observe comes from Fossilize watching IO PSI (pressure stall information) in the kernel, which is global across all drives. You could try turning the kernel PSI feature off by adding psi=0 to your kernel command line (https://facebookmicrosites.github.io/psi/docs/overview). Your kernel may enable it by default.

That change, which included PSI support, fixed desktop stalls for most users: 200b19c

It was introduced because shader compilation is not actually a CPU-only task: it also involves a lot of inefficient IO. The volume of IO is small, but it is quite random in the driver caches and thus inefficient, especially on btrfs.
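To see the system-wide IO pressure described above, the kernel's PSI file can be read directly. The following is a minimal sketch, not Fossilize's actual code: the function names `parse_psi` and `read_io_pressure` and the sample values are made up for illustration, and it assumes the usual `/proc/pressure/io` text layout (`some` and `full` lines with `avg10`, `avg60`, `avg300` and `total` fields).

```python
from pathlib import Path


def parse_psi(text):
    """Parse PSI text into {"some": {"avg10": ...}, "full": {...}}."""
    result = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        kind, *fields = line.split()
        values = {}
        for field in fields:
            key, _, value = field.partition("=")
            # "total" is a stall-time counter; avg* fields are percentages.
            values[key] = int(value) if key == "total" else float(value)
        result[kind] = values
    return result


def read_io_pressure(path="/proc/pressure/io"):
    """Return parsed IO pressure, or None if the file cannot be read
    (for example when PSI is disabled or not built into the kernel)."""
    try:
        return parse_psi(Path(path).read_text())
    except OSError:
        return None


# Made-up sample in the expected format, for illustration only.
SAMPLE = (
    "some avg10=12.50 avg60=4.20 avg300=1.00 total=123456\n"
    "full avg10=8.00 avg60=2.10 avg300=0.50 total=65432\n"
)

if __name__ == "__main__":
    print(read_io_pressure() or "PSI not available on this system")
```

Note that the file has no per-device breakdown, so a copy between two unrelated drives raises the same numbers as IO on the drive Steam is installed on.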

@HansKristian-Work
Collaborator

Yes, Fossilize has to go out of its way to not make other stuff go slow on the system, especially anything related to IO since we can quickly swarm IO caches when 10+ threads hammer out shader caches with non-ideal access patterns.

@kakra
Contributor

kakra commented Feb 4, 2022

@HansKristian-Work The dirty pages watcher may be a bit too aggressive: when copying large files, dirty data is expected. Maybe it should watch PSI only and, if the PSI feature is not available, fall back to a less aggressive dirty pages watcher? Or just make it less aggressive in general?

Personally, I don't care if it pauses when running in the background. People would probably care about it mainly in foreground mode. That said, it has worked perfectly fine for me in background mode ever since that change, with no issues whatsoever. I'm not sure whether the Steam client already uses the control channel to switch to aggressive mode when running in the foreground. But then again, if such a control option is implemented and Steam doesn't use it, this is probably a Steam bug, not a Fossilize bug.
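The suggestion above (prefer PSI, and only fall back to a more lenient dirty pages check when PSI is unavailable) could be sketched roughly as follows. This is a hypothetical policy, not how Fossilize is implemented: the names `parse_meminfo_kb` and `should_pause` and both thresholds are assumptions chosen for illustration. It relies only on the `Dirty:` line of `/proc/meminfo`, which is reported in kB.

```python
def parse_meminfo_kb(text, key):
    """Return the value in kB for `key` (e.g. "Dirty") from /proc/meminfo text."""
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        if name == key and rest.split():
            return int(rest.split()[0])
    return None


def should_pause(io_some_avg10, dirty_kb,
                 psi_limit=10.0, dirty_limit_kb=1024 * 1024):
    """Hypothetical throttle decision.

    io_some_avg10: PSI "some avg10" percentage, or None if PSI is unavailable.
    dirty_kb: dirty page cache in kB, or None if unknown.
    Both limits are illustrative assumptions, not tuned values.
    """
    if io_some_avg10 is not None:
        # PSI is available: ignore dirty pages, since a large file copy
        # is expected to produce dirty data.
        return io_some_avg10 > psi_limit
    if dirty_kb is not None:
        # No PSI: fall back to a deliberately lenient dirty pages limit.
        return dirty_kb > dirty_limit_kb
    return False


# Made-up /proc/meminfo excerpt, for illustration only.
MEMINFO_SAMPLE = (
    "MemTotal:       65536000 kB\n"
    "Dirty:            204800 kB\n"
    "Writeback:             0 kB\n"
)
```

With this shape, a large copy that produces many dirty pages but little measured IO stall would not pause the workers while PSI is available.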

@kakra
Contributor

kakra commented Feb 4, 2022

> since we can quickly swarm IO caches when 10+ threads hammer out shader caches with non-ideal access patterns

That's actually the point here, even when copying data on another drive: Fossilize needs to get out of the way of other software using the page cache, no matter whether that's on a different device.

3 participants