Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Freezes on Metal #89

Open
cwfitzgerald opened this issue May 20, 2020 · 8 comments
Open

Freezes on Metal #89

cwfitzgerald opened this issue May 20, 2020 · 8 comments
Labels
bug Something isn't working crash Program panicked or worse, hard crashed platform - mac Mac specific issues

Comments

@cwfitzgerald
Copy link
Member

Reports of the current builds freezing up Macs. The entire machine freezes, likely due to a graphics issue.

@cwfitzgerald cwfitzgerald added bug Something isn't working crash Program panicked or worse, hard crashed platform - mac Mac specific issues labels May 20, 2020
@StarArawn
Copy link

More info:

validateComputeFunctionArguments:771: failed assertion `Compute Function(main0): argument _80[0] from buffer(0) with offset(0) and length(136) has space for 136 bytes, but argument has a length(144).'

This might be the metal shader that is created:

    #pragma clang diagnostic ignored "-Wmissing-prototypes"
    #pragma clang diagnostic ignored "-Wmissing-braces"
    
    #include <metal_stdlib>
    #include <simd/simd.h>
    
    using namespace metal;
    
    template<typename T, size_t Num>
    struct spvUnsafeArray
    {
        T elements[Num ? Num : 1];
        
        thread T& operator [] (size_t pos) thread
        {
            return elements[pos];
        }
        constexpr const thread T& operator [] (size_t pos) const thread
        {
            return elements[pos];
        }
        
        device T& operator [] (size_t pos) device
        {
            return elements[pos];
        }
        constexpr const device T& operator [] (size_t pos) const device
        {
            return elements[pos];
        }
        
        constexpr const constant T& operator [] (size_t pos) const constant
        {
            return elements[pos];
        }
        
        threadgroup T& operator [] (size_t pos) threadgroup
        {
            return elements[pos];
        }
        constexpr const threadgroup T& operator [] (size_t pos) const threadgroup
        {
            return elements[pos];
        }
    };
    
    struct _74
    {
        packed_float3 _m0;
        float _m1;
    };
    
    struct _77
    {
        _74 _m0[4];
    };
    
    struct _78
    {
        float4x4 _m0;
        _77 _m1;
        uint2 _m2;
    };
    
    struct _234
    {
        packed_float3 _m0;
        float _m1;
    };
    
    struct _236
    {
        _234 _m0[4];
    };
    
    struct _238
    {
        _236 _m0[1];
    };
    
    constant uint3 gl_WorkGroupSize [[maybe_unused]] = uint3(8u, 8u, 1u);
    
    kernel void main0(constant _78& _80 [[buffer(0)]], device _238& _240 [[buffer(1)]], uint3 gl_GlobalInvocationID [[thread_position_in_grid]])
    {
        float2 _86 = float2(_80._m2);
        float2 _87 = float2(float3(gl_GlobalInvocationID).xy) / _86;
        float2 _99 = float2(float3(gl_GlobalInvocationID + uint3(1u)).xy) / _86;
        float _108 = mix(-1.0, 1.0, _87.x);
        float _111 = mix(-1.0, 1.0, _87.y);
        spvUnsafeArray<float4, 4> _102;
        _102[0] = float4(_108, _111, 1.0, 1.0);
        float _118 = mix(-1.0, 1.0, _99.x);
        _102[1] = float4(_118, _111, 1.0, 1.0);
        float _129 = mix(-1.0, 1.0, _99.y);
        _102[2] = float4(_108, _129, 1.0, 1.0);
        _102[3] = float4(_118, _129, 1.0, 1.0);
        spvUnsafeArray<float3, 4> _171;
        for (int _431 = 0; _431 < 4; )
        {
            float4 _159 = _80._m0 * _102[_431];
            _171[_431] = (_159.xyz / float3(_159.w)).xyz;
            _431++;
            continue;
        }
        uint _373 = (gl_GlobalInvocationID.y * _80._m2.x) + gl_GlobalInvocationID.x;
        _240._m0[_373]._m0[0]._m0 = -normalize(cross(_171[0], _171[2]));
        _240._m0[_373]._m0[0]._m1 = 0.0;
        _240._m0[_373]._m0[1]._m0 = -normalize(cross(_171[3], _171[1]));
        _240._m0[_373]._m0[1]._m1 = 0.0;
        _240._m0[_373]._m0[2]._m0 = -normalize(cross(_171[1], _171[0]));
        _240._m0[_373]._m0[2]._m1 = 0.0;
        _240._m0[_373]._m0[3]._m0 = -normalize(cross(_171[2], _171[3]));
        _240._m0[_373]._m0[3]._m1 = 0.0;
    }

@kvark
Copy link

kvark commented May 30, 2020

I think it's saying that the shader expects a uniform struct of size 144, but your buffer binding only has space for 136. This means that accessing ._m2 fields is going outside of bounds, which is a good reason for the HW to freeze.

@cwfitzgerald
Copy link
Member Author

I think this is possibly related to gfx-rs/wgpu#185

@kvark
Copy link

kvark commented Jun 12, 2020

Can't you easily test if that's the case (by providing more space to the buffers), or you have no platform to reproduce this?

@cwfitzgerald
Copy link
Member Author

I don't actually have access to a mac with xcode on it to really test thoroughly, though my friend does, so when I put some effort into solving it, I can bug them. I am probably also running afowl of alignment requirements that 0.5 didn't test for, and I've yet to update this codebase fully to master.

@StarArawn
Copy link

I found out what the issue was on metal, and thought I should document it here.

In froxels.cs.glsl:

layout(set = 0, binding = 0) uniform Uniforms {
    mat4 inv_proj;
    Frustum frustum;
    uvec2 froxel_count;
};

gets turned into:

struct _78
{
    float4x4 _m0;
    _77 _m1;
    uint2 _m2;
};

Metal expects that the alignment matches perfectly even for uniforms and not just buffers because uniforms and buffers are treated the same. If you use a uint4 or rather a uvec4 instead the alignment would be fine.

@kvark
Copy link

kvark commented Jul 21, 2020

Metal validation catches that, we saw similar errors reported. Make sure to run with it from time to time, until wgpu's own validation is getting there!

@cwfitzgerald
Copy link
Member Author

Ooo thank you for documenting this! So the solution is to use uvec4 for the froxel count? Is there anything I could do on my end to keep the shader the same? I guess the fundamental issue is that I'm uploading the padding bytes after the struct...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working crash Program panicked or worse, hard crashed platform - mac Mac specific issues
Projects
None yet
Development

No branches or pull requests

3 participants