
Full system hang on Apple M1 8GB #548

Closed
jrmoulton opened this issue Apr 12, 2024 · 41 comments

@jrmoulton

I'm experimenting with swapping in Vello as the renderer for floem, and I'm running into an issue where, when using Vello:

  • my entire system will hang
  • everything but the mouse is unresponsive
  • within about a minute the screen and mouse will also become unresponsive (screen dark with only dim backlight showing)
  • the system requires a reboot.

I get this issue both in Vello's with_winit example and in floem's editor example.

In the with_winit example it happens when I zoom in too far on the Ghostscript tiger. I've noticed there is a limit to how far I can zoom in, and that causes some stuttering, but it is separate from when the system becomes unresponsive.

In the editor example, the issue is triggered when I delete several characters from the starting text.

Prior to the hang, macOS Activity Monitor doesn't indicate high memory pressure.

The issue isn't consistent (and reproducing it takes a long time), but it does happen regularly (within about 30 seconds of performing the actions listed above).

Apple M1 MacBook Air
8GB RAM
Sonoma 14.2.1 (23C71)

@bram209

bram209 commented Apr 13, 2024

I had the same issue when aggressively zooming in and out. No stuttering, but suddenly the whole system hangs. I am on an M2 Pro 32GB.

@raphlinus
Contributor

raphlinus commented Apr 13, 2024

We'll look into it. A good way of isolating this is to turn on --use-cpu, then edit the code to set force_gpu_from to various values. That said, when doing aggressive zooming, it's most likely that there will be a panic from buffer overflow - one of the things to do soon is to change the CPU shaders to match the behavior of the GPU shaders (report a failure and continue).
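
A minimal sketch of that bisection knob, assuming the force_gpu_from variable as it appears in the with_winit example (its exact location may differ between revisions):

    // With --use-cpu enabled, `force_gpu_from` names the first pipeline stage
    // that still runs on the GPU; earlier stages run as CPU shaders, which
    // panic on buffer overflow instead of hanging the whole machine.
    let force_gpu_from: Option<&str> = Some("coarse"); // try "fine_area", "coarse", ...

Bisecting on this value narrows down which stage is responsible.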

@armansito
Collaborator

I ran into this with some scene content today. I'll investigate.

@waywardmonkeys added this to the Vello 0.2 release milestone May 3, 2024
@DJMcNab
Member

DJMcNab commented May 16, 2024

Hi @jrmoulton. We think that #551 or #553 might have fixed this issue. Does this still happen for you on main?

@raphlinus
Contributor

I'm able to repro this fairly straightforwardly on M1. There are two separate issues, both of which need to be fixed.

The first is that zooming in causes unboundedly large memory usage, specifically in flatten. This is because it's flattening all outlines, using the transform to determine the number of subdivisions. What needs to happen is culling the flattened line segments to the viewport. That's also related to vello#542.

The second problem is that when the "line soup" buffer overflows, that should be detected and all downstream work should early-out. This is what #553 was trying to address, but it seems it didn't catch every case. What should happen is that a test in the binning stage sees that bump.lines > config.lines_size, sets a bit in failure based on that, and then all downstream shaders stop work.
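
To make that concrete, here is a minimal Rust sketch of the intended pattern (illustrative only; the names follow the prose above rather than the exact Vello source, where the check lives in the binning shader):

    // Hypothetical failure flag; the real encoding of `failure` may differ.
    const LINE_SOUP_OVERFLOW: u32 = 1;

    /// Record an overflow of the line soup buffer so that downstream stages
    /// can early-out instead of reading past the end of the buffer.
    fn binning_overflow_check(bump_lines: u32, config_lines_size: u32, failure: &mut u32) -> bool {
        if bump_lines > config_lines_size {
            *failure |= LINE_SOUP_OVERFLOW;
            return false; // all downstream shaders should stop work
        }
        true
    }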

I did a little validation of this, mostly observing a panic when using --use-cpu. To really track it down, I'll want to use a machine that doesn't hard-lock on failure, probably my AMD on Windows.

@waywardmonkeys
Contributor

waywardmonkeys commented May 16, 2024

My M3 Pro Max 64GB doesn't want to start correctly again after this. I can't seem to log in successfully.

edit: third time was the charm

@raphlinus
Contributor

A couple of updates; I'm digging into this. First, it does not seem to repro on v0.1.0. Second, I had originally suspected #537, but it repros in the parent of that. My current working hypothesis is that flatten itself is getting stuck. I'm starting to do some testing with all downstream shaders ablated, and haven't seen a full hard hang, but have seen the GPU get into a bad state.

If this is the case, then it's likely that there's a reasonably quick patch to just limit the amount of subdivision in flatten. I'm also starting to wonder whether the best approach is going to be aggressive culling to the viewport in flatten; if nothing else, that will be a major performance improvement in the highly zoomed-in case.

It's not clear to me why this would be triggered from floem; I'm wondering whether there's something invalid about the scene. Could you provide more detailed repro steps?

@raphlinus
Contributor

I've locally tried applying this patch:

diff --git a/shader/flatten.wgsl b/shader/flatten.wgsl
index 80da188..2126ad3 100644
--- a/shader/flatten.wgsl
+++ b/shader/flatten.wgsl
@@ -434,7 +434,7 @@ fn flatten_euler(
                 let integrand_peak = sqrt(abs(k_peak * (k_peak * dist_scaled + 1.0)));
                 n_frac = integral * integrand_peak / a;
             }
-            let n = max(ceil(n_frac * scale_multiplier), 1.0);
+            let n = clamp(ceil(n_frac * scale_multiplier), 1.0, 100.0);
             for (var i = 0u; i < u32(n); i++) {
                 var lp1: vec2f;
                 if i + 1u == u32(n) && t1 == 1.0 {

Rendering becomes very slow when zoomed in at large factors, but it doesn't cause a full system hang. That's possibly something to try with the floem use case. Another thing to try is turning on the CPU shaders - I expect it to panic with an out-of-bounds when writing to LineSoup from flatten.

@jrmoulton
Author

@DJMcNab noticed that in my usage of Vello in Floem I wasn't ever doing a scene reset.

After adding a scene reset I no longer experience a hang. I don't know whether that counts as resolving the issue, but it does unblock me from further integration with Vello.
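
For reference, a minimal sketch of the fix, assuming the vello 0.2 Scene API (the surrounding frame function is hypothetical):

    // Clear the retained encoding each frame before re-encoding, so the
    // scene's command list doesn't grow without bound across frames.
    fn encode_frame(scene: &mut vello::Scene) {
        scene.reset(); // discard the previous frame's encoding
        // ... encode this frame's fills, strokes, and glyphs into `scene` ...
    }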

raphlinus added a commit that referenced this issue May 29, 2024
As the scale factor becomes unreasonably large, the number of subdivisions of an Euler spiral into lines grows, making the flatten stage take too long.

This patch just bounds the number of subdivisions, so it will eventually make progress. It will be slow, but not hang.

A better solution would be to aggressively cull so it only generates geometry inside the viewport, but that is considerably more complicated.

Workaround for #548
@XdaTk

XdaTk commented Jul 18, 2024

The problem still exists.

OS: macOS Monterey 12.6.7 x86_64
Host: MacBook Pro (16-inch, 2019)
Kernel: Darwin 21.6.0
Display (Color LCD): 4096x2560 @ 60Hz (as 2048x1280) [Built-in] *
Display (DELL P2422H): 1080x1920 @ 60Hz [External]
CPU: Intel(R) Core(TM) i7-9750H (12) @ 2.60 GHz
GPU 1: Intel UHD Graphics 630 [Integrated]
GPU 2: AMD Radeon Pro 5300M (0 B / 3.98 GiB, 0%) 
Memory: 11.64 GiB / 16.00 GiB (73%)
Swap: 637.50 MiB / 2.00 GiB (31%)
Locale: zh_CN.UTF-8

@DJMcNab
Member

DJMcNab commented Jul 18, 2024

@XdaTk can you please provide reproduction steps for what you're seeing?

@XdaTk

XdaTk commented Jul 22, 2024

@XdaTk can you please provide reproduction steps for what you're seeing?

git log 

commit 94ce032d53f3ec48d90e7bbbdf739aaae8a40714 (HEAD -> main, origin/main, origin/HEAD)
Author: Daniel McNab <36049421+DJMcNab@users.noreply.github.com>
Date:   Mon Jul 22 10:27:20 2024 +0100


rustc --version
rustc 1.81.0-nightly (5315cbe15 2024-07-11)

cargo run -p with_winit


# Reboot

rustup default stable
rustc 1.79.0 (129f3b996 2024-06-10)
cargo run -p with_winit
cargo build                                                 main
  Downloaded wasm-bindgen-wasm-conventions v0.2.92
  Downloaded wasm-bindgen-shared v0.2.92
  Downloaded leb128 v0.2.5
  Downloaded fallible-iterator v0.2.0
  Downloaded walrus-macro v0.19.0
  Downloaded id-arena v2.2.1
  Downloaded heck v0.3.3
  Downloaded wasm-bindgen-multi-value-xform v0.2.92
  Downloaded wasm-bindgen-wasm-interpreter v0.2.92
  Downloaded base64 v0.21.7
  Downloaded wasm-bindgen-threads-xform v0.2.92
  Downloaded wasm-encoder v0.29.0
  Downloaded miniz_oxide v0.7.3
  Downloaded indexmap v1.9.3
  Downloaded wasmparser v0.80.2
  Downloaded hashbrown v0.12.3
  Downloaded wasm-bindgen-cli-support v0.2.92
  Downloaded cc v1.0.98
  Downloaded serde_json v1.0.117
  Downloaded walrus v0.20.3
  Downloaded gimli v0.26.2
  Downloaded devserver_lib v0.4.2
  Downloaded nv-flip-sys v0.1.1
  Downloaded wasm-bindgen-externref-xform v0.2.92
  Downloaded cargo-run-wasm v0.4.0
  Downloaded nv-flip v0.1.2
  Downloaded 26 crates (2.0 MB) in 1.79s
   Compiling proc-macro2 v1.0.85
   Compiling unicode-ident v1.0.12
   Compiling autocfg v1.3.0
   Compiling libc v0.2.155
   Compiling cfg-if v1.0.0
   Compiling log v0.4.21
   Compiling bitflags v2.5.0
   Compiling arrayvec v0.7.4
   Compiling thiserror v1.0.61
   Compiling smallvec v1.13.2
   Compiling hashbrown v0.14.5
   Compiling anyhow v1.0.86
   Compiling termcolor v1.4.1
   Compiling equivalent v1.0.1
   Compiling cfg_aliases v0.1.1
   Compiling unicode-width v0.1.13
   Compiling bitflags v1.3.2
   Compiling hexf-parse v0.2.1
   Compiling core-foundation-sys v0.8.6
   Compiling rustc-hash v1.1.0
   Compiling unicode-xid v0.2.4
   Compiling parking_lot_core v0.9.10
   Compiling codespan-reporting v0.11.1
   Compiling foreign-types-shared v0.3.1
   Compiling once_cell v1.19.0
   Compiling scopeguard v1.2.0
   Compiling num-traits v0.2.19
   Compiling lock_api v0.4.12
   Compiling bit-vec v0.6.3
   Compiling paste v1.0.15
   Compiling indexmap v1.9.3
   Compiling wgpu-hal v0.21.1
   Compiling indexmap v2.2.6
   Compiling bit-set v0.5.3
   Compiling block v0.1.6
   Compiling raw-window-handle v0.6.2
   Compiling syn v1.0.109
   Compiling wgpu-core v0.21.1
   Compiling wgpu-types v0.20.0
   Compiling libloading v0.8.3
   Compiling hashbrown v0.12.3
   Compiling profiling v1.0.15
   Compiling litrs v0.4.1
   Compiling quote v1.0.36
   Compiling unicode-segmentation v1.11.0
   Compiling wgpu v0.20.1
   Compiling kurbo v0.11.0
   Compiling syn v2.0.66
   Compiling svg_fmt v0.4.3
   Compiling heck v0.3.3
   Compiling leb128 v0.2.5
   Compiling utf8parse v0.2.1
   Compiling objc-sys v0.3.5
   Compiling document-features v0.2.8
   Compiling stable_deref_trait v1.2.0
   Compiling euclid v0.22.10
   Compiling fallible-iterator v0.2.0
   Compiling simd-adler32 v0.3.7
   Compiling anstyle-parse v0.2.4
   Compiling core-foundation v0.9.4
   Compiling malloc_buf v0.0.6
   Compiling objc v0.2.7
   Compiling wasm-encoder v0.29.0
   Compiling gimli v0.26.2
   Compiling static_assertions v1.1.0
   Compiling id-arena v2.2.1
   Compiling parking_lot v0.12.3
   Compiling core-graphics-types v0.1.3
   Compiling peniko v0.1.1
   Compiling is_terminal_polyfill v1.70.0
   Compiling anstyle-query v1.1.0
   Compiling colorchoice v1.0.1
   Compiling wasmparser v0.80.2
   Compiling adler v1.0.2
   Compiling anstyle v1.0.7
   Compiling miniz_oxide v0.7.3
   Compiling guillotiere v0.6.2
   Compiling anstream v0.6.14
   Compiling getrandom v0.2.15
   Compiling crc32fast v1.4.2
   Compiling futures-core v0.3.30
   Compiling memchr v2.7.2
   Compiling objc2-encode v4.0.3
   Compiling flate2 v1.0.30
   Compiling futures-intrusive v0.5.0
   Compiling rand_core v0.6.4
   Compiling fdeflate v0.3.4
   Compiling objc2 v0.5.2
   Compiling regex-syntax v0.8.3
   Compiling strsim v0.11.1
   Compiling aho-corasick v1.1.3
   Compiling crossbeam-utils v0.8.20
   Compiling clap_lex v0.7.0
   Compiling zune-core v0.4.12
   Compiling heck v0.5.0
   Compiling ppv-lite86 v0.2.17
   Compiling clap_builder v4.5.2
   Compiling zune-jpeg v0.4.11
   Compiling png v0.17.13
   Compiling cfg_aliases v0.2.1
   Compiling rustix v0.38.34
   Compiling rand_chacha v0.3.1
   Compiling block2 v0.5.1
   Compiling serde v1.0.203
   Compiling regex-automata v0.4.6
   Compiling dispatch v0.2.0
   Compiling byteorder v1.5.0
   Compiling objc2-foundation v0.2.2
   Compiling rand v0.8.5
   Compiling winit v0.30.3
   Compiling walrus-macro v0.19.0
   Compiling errno v0.3.9
   Compiling tracing-core v0.1.32
   Compiling pin-project-lite v0.2.14
   Compiling pollster v0.3.0
   Compiling serde_json v1.0.117
   Compiling same-file v1.0.6
   Compiling roxmltree v0.20.0
   Compiling wasm-bindgen-shared v0.2.92
   Compiling cc v1.0.98
   Compiling walrus v0.20.3
   Compiling walkdir v2.5.0
   Compiling crossbeam-channel v0.5.13
   Compiling filetime v0.2.23
   Compiling fsevent-sys v4.1.0
   Compiling humantime v2.1.0
   Compiling ryu v1.0.18
   Compiling dpi v0.1.1
   Compiling cursor-icon v1.1.0
   Compiling fastrand v2.1.0
   Compiling smol_str v0.2.2
   Compiling itoa v1.0.11
   Compiling notify v6.1.1
   Compiling regex v1.10.4
   Compiling base64 v0.21.7
   Compiling rustc-demangle v0.1.24
   Compiling nv-flip-sys v0.1.1
   Compiling env_filter v0.1.0
   Compiling env_logger v0.11.3
   Compiling thiserror-impl v1.0.61
   Compiling foreign-types-macros v0.2.3
   Compiling bytemuck_derive v1.7.0
   Compiling clap_derive v4.5.4
   Compiling wasm-bindgen-wasm-conventions v0.2.92
   Compiling tracing-attributes v0.1.27
   Compiling foreign-types v0.5.0
   Compiling metal v0.28.0
   Compiling core-graphics v0.23.2
   Compiling bytemuck v1.16.0
   Compiling naga v0.20.0
   Compiling font-types v0.5.5
   Compiling read-fonts v0.19.3
   Compiling image v0.25.1
   Compiling tracing v0.1.40
   Compiling objc2-app-kit v0.2.2
   Compiling clap v4.5.4
   Compiling wasm-bindgen-wasm-interpreter v0.2.92
   Compiling wasm-bindgen-threads-xform v0.2.92
   Compiling wasm-bindgen-multi-value-xform v0.2.92
   Compiling wasm-bindgen-externref-xform v0.2.92
   Compiling tempfile v3.10.1
   Compiling wasm-bindgen-cli-support v0.2.92
   Compiling notify-debouncer-mini v0.4.1
   Compiling vello_tests v0.0.0 (*rust/vello/vello_tests)
   Compiling devserver_lib v0.4.2
   Compiling pico-args v0.5.0
   Compiling nv-flip v0.1.2
   Compiling cargo-run-wasm v0.4.0
   Compiling run_wasm v0.0.0 (*rust/vello/examples/run_wasm)
   Compiling skrifa v0.19.3
   Compiling vello_shaders v0.2.0 (*rust/vello/vello_shaders)
warning: fields `module` and `module_info` are never read
  --> vello_shaders/src/compile/mod.rs:80:9
   |
78 | pub struct ShaderInfo {
   |            ---------- fields in this struct
79 |     pub source: String,
80 |     pub module: Module,
   |         ^^^^^^
81 |     pub module_info: ModuleInfo,
   |         ^^^^^^^^^^^
   |
   = note: `ShaderInfo` has a derived impl for the trait `Debug`, but this is intentionally ignored during dead code analysis
   = note: `#[warn(dead_code)]` on by default

warning: field `name` is never read
  --> vello_shaders/src/types.rs:34:9
   |
33 | pub struct BindingInfo {
   |            ----------- field in this struct
34 |     pub name: Option<String>,
   |         ^^^^
   |
   = note: `BindingInfo` has derived impls for the traits `Debug` and `Clone`, but these are intentionally ignored during dead code analysis

warning: fields `size_in_bytes` and `index` are never read
  --> vello_shaders/src/types.rs:41:9
   |
40 | pub struct WorkgroupBufferInfo {
   |            ------------------- fields in this struct
41 |     pub size_in_bytes: u32,
   |         ^^^^^^^^^^^^^
42 |     /// The order in which th...
43 |     pub index: u32,
   |         ^^^^^
   |
   = note: `WorkgroupBufferInfo` has derived impls for the traits `Debug` and `Clone`, but these are intentionally ignored during dead code analysis

warning: `vello_shaders` (build script) generated 3 warnings
   Compiling vello_encoding v0.2.0 (*rust/vello/vello_encoding)
warning: field `th1` is never read
  --> vello_shaders/src/cpu/euler.rs:36:9
   |
34 | pub struct EulerParams {
   |            ----------- field in this struct
35 |     pub th0: f32,
36 |     pub th1: f32,
   |         ^^^
   |
   = note: `EulerParams` has a derived impl for the trait `Debug`, but this is intentionally ignored during dead code analysis
   = note: `#[warn(dead_code)]` on by default

warning: `vello_shaders` (lib) generated 1 warning
   Compiling wgpu-profiler v0.17.0
   Compiling vello v0.2.0 (*rust/vello/vello)
   Compiling scenes v0.0.0 (*rust/vello/examples/scenes)
   Compiling simple v0.0.0 (*rust/vello/examples/simple)
   Compiling with_winit v0.0.0 (*rust/vello/examples/with_winit)
   Compiling headless v0.0.0 (*rust/vello/examples/headless)
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 58.75s

It got stuck and nothing responded except the mouse. I had to restart it.

@DJMcNab
Member

DJMcNab commented Jul 22, 2024

Oh, I didn't realise that your machine isn't an M1 machine; by posting in this issue you had implied it was.

Would you mind creating a new issue for the behaviour you're seeing? As a starting point, could you please determine which GPU is seeing this crash?

@sfjohnson

I'm still getting the full system hang on the main branch, but a little different from the above. I'm on an 8GB M2, Ventura 13.6.7. The issue happens immediately when launching with_winit, without zooming.

let force_gpu_from = Some("fine_area"); doesn't hang, but let force_gpu_from = Some("coarse"); does, so I'm getting the hang even with flatten on CPU. macOS is reporting:

Termination Reason:    Namespace WATCHDOG, Code 1 monitoring timed out for service
(1 monitored services unresponsive): checkin with service: WindowServer returned not alive with context:
unresponsive work processor(s): WindowServer main thread

@DJMcNab
Member

DJMcNab commented Aug 8, 2024

Can you please confirm which commit you're using @sfjohnson? We have had a memory leak issue which was solved today (#661), which I could see causing this kind of issue.

@sfjohnson

sfjohnson commented Aug 8, 2024

It was the latest 59c0fa5 with the fix applied.

@DJMcNab
Member

DJMcNab commented Aug 9, 2024

Can you determine whether this was a regression? If so, which commit introduced it?

We have several developers on M1 family chips, so your experience is surprising to me.

@sfjohnson

Found it! It's 1daf2a4. If I apply compilation_options: PipelineCompilationOptions::default() back onto the main branch, it works.
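
For context, a hedged sketch of what that setting controls, assuming wgpu 0.20's API: by default wgpu zero-initializes workgroup memory, and the bisected commit appears to opt out of that.

    // Hypothetical construction; the key field is zero_initialize_workgroup_memory,
    // which defaults to true (wgpu emits code to zero shared memory at dispatch).
    let options = wgpu::PipelineCompilationOptions {
        zero_initialize_workgroup_memory: false, // skip zeroing workgroup memory
        ..Default::default()
    };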

@DJMcNab
Member

DJMcNab commented Aug 9, 2024

Hmm, that's concerning. Do you think you can extract the full MSL for the relevant shader with and without that setting?

Is it coarse which is failing, or path_tiling?

@sfjohnson

Looks like it's the same MSL regardless of compilation_options. It's coarse which is failing: coarse.metal.txt

@DJMcNab
Member

DJMcNab commented Aug 9, 2024

Hmm, the same MSL being generated doesn't track with my expectations. The only thing that compilation_options does is pass a different argument to naga, to change what MSL is being generated.

@sfjohnson

sfjohnson commented Aug 9, 2024

I double-checked and got the same result. I'm not sure if I collected the MSL correctly, though. Here's what I did:

  1. Change to default = ["wgsl", "full", "cpu", "msl"] in vello_shaders/Cargo.toml
  2. cargo build
  3. Copy the MSL string out of target/debug/build/vello_shaders-<some hex>/out/shaders.rs
  4. Write to a file, parsing out all the newlines
  5. Undo 1daf2a4
  6. cargo clean; cargo build
  7. Repeat 3 and 4

Is that right?

@DJMcNab
Member

DJMcNab commented Aug 9, 2024

Those aren't the shaders being generated by wgpu - those are shaders generated by vello_shaders for third-party users of our shaders.

You wouldn't need to change any features to get the shaders from wgpu, although unfortunately I don't know the best way. I think it might involve either adding debug prints inside wgpu, or using your system's GPU debugging tools.

@sfjohnson

OK, I think this makes more sense now; I am logging from inside wgpu. The diff is as follows (it hangs when this block is not present):

if (metal::all(local_id == metal::uint3(0u))) {
    for (int __i0 = 0; __i0 < 8; __i0++) {
        for (int __i1 = 0; __i1 < 256; __i1++) {
            metal::atomic_store_explicit(&sh_bitmaps.inner[__i0].inner[__i1], 0, metal::memory_order_relaxed);
        }
    }
    sh_part_count = {};
    sh_part_offsets = {};
    sh_drawobj_ix = {};
    sh_tile_stride = {};
    sh_tile_width = {};
    sh_tile_x0y0_ = {};
    sh_tile_count = {};
    sh_tile_base = {};
}
metal::threadgroup_barrier(metal::mem_flags::mem_threadgroup);

Full sources:
coarse-ziwm-false.metal.txt
coarse-ziwm-true.metal.txt

@DJMcNab
Member

DJMcNab commented Aug 9, 2024

I realise that this will be quite hard to do, but do you think you could isolate which of those is required?

The easiest one to validate would be the barrier, because you can just add a workgroupBarrier() at the start of the shader.

Thanks so much for being so patient with debugging this so far!

@sfjohnson

It seems to hang unless everything is cleared, and I had to add some extra barriers. Here's what I have working, added to the start of fn main in coarse.wgsl:

    for (var i = 0u; i < N_SLICE; i += 1u) {
        for (var j = 0u; j < N_TILE; j += 1u) {
            atomicStore(&sh_bitmaps[i][j], 0u);
        }
    }

    for (var i = 0u; i < WG_SIZE; i += 1u) {
        workgroupBarrier();
        sh_part_count[i] = 0u;
        sh_part_offsets[i] = 0u;
        sh_drawobj_ix[i] = 0u;
        sh_tile_stride[i] = 0u;
        sh_tile_width[i] = 0u;
        sh_tile_x0y0[i] = 0u;
        sh_tile_count[i] = 0u;
        sh_tile_base[i] = 0u;
    }

    workgroupBarrier();

Note that this might not be completely optimal as I've never written WGSL before and I'm trying to minimise subjecting my computer to lots of hard reboots. Fortunately it seems this is all that is required; all other shaders work without zero initialisation.

@DJMcNab
Member

DJMcNab commented Aug 9, 2024

That clearing routine is UB: every invocation writes the same non-atomic shared-memory locations, which is a data race. The pattern you actually want in this case is:

buffer[local_id.x] = 0;

for each buffer, and not in a loop

I wonder if the buffers start in a poison state, so Metal now decides that it can just do UB?

@sfjohnson

Oh I see, like this right? (removed one barrier and it still works):

    for (var i = 0u; i < N_SLICE; i += 1u) {
        for (var j = 0u; j < N_TILE; j += 1u) {
            atomicStore(&sh_bitmaps[i][j], 0u);
        }
    }

    sh_part_count[local_id.x] = 0u;
    sh_part_offsets[local_id.x] = 0u;
    sh_drawobj_ix[local_id.x] = 0u;
    sh_tile_stride[local_id.x] = 0u;
    sh_tile_width[local_id.x] = 0u;
    sh_tile_x0y0[local_id.x] = 0u;
    sh_tile_count[local_id.x] = 0u;
    sh_tile_base[local_id.x] = 0u;

    workgroupBarrier();

@raphlinus
Contributor

raphlinus commented Aug 9, 2024

I'm desk-checking the code now to see if there's any uninitialized read. Would it be possible to isolate which of these initializations is responsible?

Also, the pattern of initializing sh_bitmaps is way less efficient than it could be (though not undefined behavior, as the store is atomic). A better pattern is the initialization on lines 205-207.

@sfjohnson

Hmm, unfortunately while trying to isolate each initialisation things stopped being predictable. Now the code I posted above sometimes causes a hang. It looks like the bug might not actually be isolated to coarse.wgsl. It's quite troublesome to debug due to all the hard reboots, and I'm concerned about data corruption. Maybe there's a way to test in a VM with GPU access?

@raphlinus
Contributor

I'm also quite willing to dig into this myself, but it's unclear how to repro. Just so I understand, it's failing just running the default scene, nothing special? That certainly works on my machine (M1 Pro, 14.2.1).

It's certainly possible that there's an uninitialized memory read elsewhere in the pipeline, that was getting masked by the zeroing.

@sfjohnson

I just double-checked, and yeah, it's super easy to repro for me just by cloning the repo and running cargo run -p with_winit. I get an instant hang with nothing rendered in the window. I'm on the latest stable Rust. Not sure what is different about my computer, but I might look into this and see how it runs Metal.

@94bryanr

94bryanr commented Aug 23, 2024

I am also running into this problem on an M2 Mac. In my case:

  • The application runs fine when the resolution is "small" - window taking less than half of a 4k screen.
  • When the window is resized with the mouse up to 4k size I see some of the bins in the lower right corner stop rendering.
  • The larger the window, the more bins fail - if I make the application full screen almost half of the canvas isn't rendered.
  • If I leave the application running for about 15 seconds in this state it freezes completely, freezing the display and forcing me to reboot my computer (computer reboots automatically after another minute if I don't do anything).
  • The failure depends on what I have rendering in the scene. If I remove certain scene elements I can prevent the issue, but it isn't clear to me which ones are causing the issue yet. I think it might have something to do with the scene elements using clipping masks.
  • Changing antialiasing method does not change behavior (tried Area, Msaa8 and Msaa16).

This happens consistently on version 0.2.1.
Here is a video of the behavior: https://www.youtube.com/watch?v=y5-IIJHvLgY.
I am not changing the scene at all during that video, just resizing the screen.

The behavior from the video is happening here: https://www.cocube.com/console. Happy to work with you to fix this (it's not great UX to crash someone's computer from your website) and I'd like to stick with vello. I've looked over the code for the shaders and have a decent enough high-level understanding to try making some changes, but I could still use some guidance.

@waywardmonkeys
Contributor

@94bryanr Just for extra info, how much memory does your M2 Mac have?

@raphlinus
Contributor

I have a hypothesis: this might be an uninitialized memory read of workgroup shared memory. That would be consistent with zeroing the memory mitigating the problem, and would also explain why it manifests after a long running time - it may be a low probability that a particular value causes an infinite loop.

It's somewhat frustrating, because decent tooling could help catch it, but we don't have that. A couple of things can be done. One is to carefully desk-check the shaders for UMR (I looked over coarse, didn't find anything, but I could have missed something, and it might be a different shader). Another is to deliberately inject garbage initial values (3735928559u, i.e. 0xDEADBEEF, etc.) and see if that changes behavior.
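
A tiny Rust illustration of that second idea (hypothetical helper; in practice the poisoning would happen in the shader or in the buffer setup):

    // Pre-fill memory with a recognizable sentinel so that an uninitialized
    // read produces a value that is easy to spot in a debugger or capture.
    const GARBAGE: u32 = 0xDEAD_BEEF; // == 3735928559

    fn poison(buf: &mut [u32]) {
        buf.fill(GARBAGE);
    }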

Another pathway is to get a repro case I can observe. @94bryanr's application sounds promising if we can get that to happen on my machine.

It would be really good to get this tracked down.

@sfjohnson

I recently upgraded from macOS 12 to 14 and now the issue is gone, even when zooming in close multiple times on with_winit. Interestingly, after the testing I did a few weeks ago my system would randomly hang every few days without running Vello, with the same WindowServer returned not alive error. I'm thinking my Metal drivers were updated with a fix.

@DJMcNab
Member

DJMcNab commented Aug 26, 2024

Thanks for that report. I'm glad to hear it.

This is the third report we've received of this kind of hang happening on macOS 12, and the second of it being fixed after a macOS update. I don't think we can meaningfully take any action here. @XdaTk, please update your macOS version; I'm going to close this on the assumption that updating will fix it.

We can always re-open if that hypothesis is wrong.

@DJMcNab closed this as not planned Aug 26, 2024
@94bryanr

My M2 Mac Pro is the 32GB version and it is running macOS Ventura 13.6.4.
I'll go ahead and update my system to macOS Sonoma and see if anything changes. If not, I'll try digging into the code more to see if I can isolate the issue.

@94bryanr

94bryanr commented Aug 27, 2024

The problem persists even after updating to macOS Sonoma. Just to recap, I am experiencing the issue on an M2 Mac 32GB while using vello 0.2.1. The problem happens on macOS Ventura and on a fully updated macOS Sonoma. The issue does not happen on Windows. For most people experiencing this, it sounds like the issue is related to zooming in and out, but for me it only happens when the resolution of the rendering context (in my case an HTML canvas) is increased to nearly 4k.

I haven't been able to test the main branch yet, since it looks like most of the wgpu types are now re-exported under vello::wgpu, which required more refactoring than I was able to get done at the time. (Without sidetracking too much: please reconsider re-exporting those types, as it makes vello take over the entire wgpu pipeline; I need to change all of my wgpu::GPU and wgpu::Device etc. to vello::wgpu::*.) I'm going to take another look at this, though.

@raphlinus You should be able to at least repro in the browser at https://www.cocube.com/console if you stretch the window to 4k and try scrolling up and down, but I'm not sure how valuable that will be.

And thanks for the amazing work on vello so far - very excited about the future of the project!

Update: It seems like the display freezing and requiring a reboot is no longer happening on the updated macOS version, but I am still seeing the visual artifact of nothing rendering below a certain line, with the line rising as the resolution is expanded.

@dominikh

but I am still seeing the visual artifact of nothing rendering below a certain line, with the line rising the more the resolution is expanded.

That problem reproduces for me on Linux, but my assumption is that it's another instance of #366.

@DJMcNab
Member

DJMcNab commented Aug 28, 2024

Yes, I suspect that is probably one of the drive-by fixes I have done in #606; give me half an hour to make a small PR fixing it. That is, segments wasn't properly write-protected; see #673.
I'm slightly surprised that your scene is large enough to run past our bump buffer limits, based on the videos you've sent.

I haven't been able to test the main branch yet, since it looks like most of the wgpu types are now re-exported under vello::wgpu, which required more refactoring than I was able to get done at the time. (Without sidetracking too much: please reconsider re-exporting those types, as it makes vello take over the entire wgpu pipeline; I need to change all of my wgpu::GPU and wgpu::Device etc. to vello::wgpu::*.) I'm going to take another look at this, though.

I don't understand what you're saying here, sorry. Our wgpu re-export shouldn't have any impact on whether you can add your own dependency on wgpu.

If the hangs are not happening, then that vindicates the decision to close this issue.
