Test more WMMA configurations #171

Merged
merged 3 commits into from
Jan 2, 2024


thomasfaingnaert
Member

No description provided.

Base automatically changed from tf/refactor-configs to master November 14, 2023 15:57
@thomasfaingnaert thomasfaingnaert force-pushed the tf/test-more-configs branch 3 times, most recently from 07ff68b to 4078daa Compare November 15, 2023 16:22
@thomasfaingnaert thomasfaingnaert marked this pull request as ready for review November 15, 2023 16:26
catch err
    # Count tests with config errors as "broken".
    if isa(err, GemmKernels.ConfigError)
        @test true skip=true
Member Author

@maleadt For now, I've marked unsupported configurations as "Broken". The term is not really appropriate, as broken tests should be tests that ought to pass, but do not currently. We could also just mark these as "pass", but I quite like that they are reported separately so we can easily see how many configurations are skipped due to ConfigErrors. WDYT?
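For readers without the diff at hand, the pattern being discussed might look roughly like the sketch below. It is not the code from this PR: `ConfigError` and `run_config` are hypothetical stand-ins for `GemmKernels.ConfigError` and the real kernel launch, so that the example runs without a GPU. The `skip` keyword of `@test` assumes Julia 1.7 or later.

```julia
using Test

# Hypothetical stand-in for GemmKernels.ConfigError.
struct ConfigError <: Exception
    msg::String
end

# Hypothetical "kernel launch": only tile sizes that are multiples of 16 are accepted.
function run_config(tile)
    tile % 16 == 0 || throw(ConfigError("tile size $tile is not a multiple of 16"))
    return tile
end

@testset "generated configurations" begin
    for tile in (16, 24, 32)
        try
            # Run outside of @test, so that a ConfigError reaches the catch block
            # instead of being recorded as an errored test.
            result = run_config(tile)
            @test result == tile
        catch err
            # Anything other than a ConfigError is a real problem.
            err isa ConfigError || rethrow()
            # Recorded as "Broken" in the summary rather than as a pass or a failure.
            @test true skip=true
        end
    end
end
```

With this shape, the test summary reports the unsupported configurations in their own column, which is the behaviour described in the comment above.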

Member

Ideally, we would hard-code which configurations are unsupported, i.e., which ones we know are supposed to throw a ConfigError. That way, the broken status would work as intended: the test suite would fail whenever a configuration starts failing or passing without the tests being updated. However, I guess that would be hard to do, given how we now generate configurations using multiple loops. Maybe we could maintain a separate list of known-broken configurations? I guess it would also depend on the device though (because of the shmem limitation), so maybe that's not feasible either...
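A separate list of known-unsupported configurations could be sketched as follows. This only illustrates the idea and is not something the PR implements: `ConfigError`, `run_config`, `KNOWN_UNSUPPORTED`, `check_config`, and the 48 KiB shared memory budget are all hypothetical, and the sketch ignores the device dependence raised above (a real list would probably have to be keyed by device as well).

```julia
using Test

# Hypothetical stand-in for GemmKernels.ConfigError.
struct ConfigError <: Exception
    msg::String
end

# Hypothetical "kernel launch": rejects block shapes that exceed a made-up
# shared memory budget.
function run_config(block_m, block_n; shmem_limit = 48 * 1024)
    needed = block_m * block_n * 4  # bytes, assuming 4-byte elements
    needed > shmem_limit && throw(ConfigError("needs $needed bytes of shared memory"))
    return true
end

# Hypothetical hard-coded list of configurations expected to throw a ConfigError.
const KNOWN_UNSUPPORTED = Set([(256, 256)])

# Returns true when the observed behaviour matches the list: listed
# configurations must throw a ConfigError, all others must run.
function check_config(cfg)
    threw = try
        run_config(cfg...)
        false
    catch err
        err isa ConfigError || rethrow()
        true
    end
    return threw == (cfg in KNOWN_UNSUPPORTED)
end

@testset "configurations vs. known-unsupported list" begin
    for cfg in [(64, 64), (128, 64), (256, 256)]
        @test check_config(cfg)
    end
end
```

The trade-off is that the test fails as soon as a listed configuration starts working, or an unlisted one starts throwing, so the list has to be updated together with the kernels.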

@thomasfaingnaert thomasfaingnaert marked this pull request as draft December 7, 2023 22:24
@thomasfaingnaert thomasfaingnaert marked this pull request as ready for review December 7, 2023 22:29
@thomasfaingnaert
Member Author

Remaining CI failure is due to illegal memory access during benchmarking, which should be fixed in CUDA 12.3 Update 2.

@thomasfaingnaert thomasfaingnaert merged commit 3c328d1 into master Jan 2, 2024
1 check failed
@thomasfaingnaert thomasfaingnaert deleted the tf/test-more-configs branch January 2, 2024 12:12