Runtime Error: Misaligned Address in fused_add_rmsnorm with hidden_dim=3584
I encountered a runtime error when using the fused_add_rmsnorm operator with a model configured for hidden_dim=3584 (28*128). The error message is as follows:
RuntimeError: CUDA error: misaligned address
This issue can be reproduced by modifying the test case in tests/test_norm.py. Specifically, setting:
@pytest.mark.parametrize("hidden_size", [3584])
will trigger the error during testing.
Upon investigation, I identified the problematic line in the code:
…apes (#636)
This PR fixes issue #634, which was introduced by #592.
To use 16-byte vectorized reads/writes, the address must be aligned to
16 bytes.
When `num_warps` is not a multiple of 4 (4 * sizeof(float) = 16 bytes),
the address `smem + num_warps` may not be aligned to 16 bytes.
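The short host-side sketch below shows why hidden_dim=3584 hits this case. It assumes a launch configuration where `vec_size = gcd(16 / sizeof(T), d)`, each thread handles one vector, the block is capped at 1024 threads, and the element type is 2 bytes (fp16/bf16). These formulas are an assumption for illustration, not quoted from norm.cuh. It also assumes `smem` is a `float*` whose base is 16-byte aligned.

```python
from math import gcd

SIZEOF_FLOAT = 4
VEC_BYTES = 16  # width of one 16-byte vectorized load/store


def ceil_div(a: int, b: int) -> int:
    return (a + b - 1) // b


def assumed_num_warps(hidden_dim: int, elem_bytes: int = 2) -> int:
    # Hypothetical launch config (assumption): one thread per 16-byte vector,
    # at most 1024 threads per block, warps of 32 threads.
    vec_size = gcd(VEC_BYTES // elem_bytes, hidden_dim)
    block_size = min(1024, hidden_dim // vec_size)
    return ceil_div(block_size, 32)


def offset_is_aligned(num_warps: int) -> bool:
    # Byte offset of `smem + num_warps` relative to a 16-byte-aligned float* base.
    return (num_warps * SIZEOF_FLOAT) % VEC_BYTES == 0


for d in (4096, 3584):
    w = assumed_num_warps(d)
    print(d, w, offset_is_aligned(w))
# 4096 16 True
# 3584 14 False
```

Under these assumptions, 3584 / 8 = 448 threads gives 14 warps, so `smem + 14` sits at byte offset 56, which is 8 bytes short of a 16-byte boundary.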
We fix this by shifting the start offset of the vectorized read/write to
`smem + ceil_div(num_warps, 4) * 4`, which forces the alignment.
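The offset arithmetic behind this fix can be checked on the host with a minimal sketch. It assumes, as above, that `smem` is a `float*` with a 16-byte-aligned base. `vec_region_start` is a hypothetical helper name used only for illustration. Rounding `num_warps` up to a multiple of 4 floats always lands on a 16-byte boundary, at the cost of at most 3 floats of padding.

```python
SIZEOF_FLOAT = 4
VEC_BYTES = 16
FLOATS_PER_VEC = VEC_BYTES // SIZEOF_FLOAT  # 4 floats per 16-byte vector


def ceil_div(a: int, b: int) -> int:
    return (a + b - 1) // b


def vec_region_start(num_warps: int) -> int:
    # Float index where the vectorized region begins (hypothetical helper):
    # round num_warps up to the next multiple of 4 floats.
    return ceil_div(num_warps, FLOATS_PER_VEC) * FLOATS_PER_VEC


for w in range(1, 33):
    start = vec_region_start(w)
    assert (start * SIZEOF_FLOAT) % VEC_BYTES == 0  # 16-byte aligned
    assert 0 <= start - w < FLOATS_PER_VEC  # at most 3 floats of padding

print(vec_region_start(14))  # 16
```

For the 14-warp case, the vectorized region starts at float index 16 (byte offset 64) instead of 14 (byte offset 56).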
cc @ovowei @Abatom
The problematic line is located in flashinfer/include/flashinfer/norm.cuh, line 203 (commit ae501ed).
This change resolves the issue; however, it may lead to performance degradation.
Below is the detailed pytest error output for reference: