[CI] Update CI docker and suppress warnings #2333
@JehandadKhan @atamazov
Another issue:
which might be related to the sanitizer checks. Making this change
is unfortunately not helping.
@junliume BTW I see that submodule
@junliume I've been addressing many of the ROCm 5.7-related make check issues in an MIOpen-internal branch; I'll send it to you.
This seems like a problem in the implementation of the precompiled HIP headers. I do not know what
#ifndef NDEBUG
#define __hip_assert(ex) ... // normal definition of assert
#else
#define __hip_assert(ex) (void(0))
#endif
The code like
Again, AFAICS this happens in the PCH, not in our code. I recommend informing the people responsible for the PCH, then silencing the warning in MIOpen permanently (I do not think it is worth a full-featured "workaround") and forgetting about this.
@atamazov I agree, this should be fixed in the runtime headers.
This is to keep track of FIN's
I think we need a standard procedure to update FIN :) Notably, Clang formatting will change one file in FIN, and I wanted to make that change a while ago.
 new(buffer) T(x); // NOLINT (clang-analyzer-cplusplus.PlacementNew)
 new(buffer + second_index) U(y);
 }
-char buffer[second_index + sizeof(U)] = {};
+alignas(U) char buffer[second_index + sizeof(U)] = {};
This may change the layout of kernel arguments in memory. Without alignas(U), the required buffer alignment is 1 (i.e., no alignment is required). Therefore, the alignment of KernelArgsPair instances is also 1, and several such instances reside in memory without any gaps.
With alignas(U), the alignment required for KernelArgsPair becomes alignof(U), which may lead to gaps (padding) between instances of KernelArgsPair.
If you see kernel failures, then please revert this change.
@atamazov I do see lots of failures on gfx906
with
handlehip.cpp:80: Failed getting available memory: invalid argument
However, it looks more like a runtime issue, since it cannot be reproduced on other, newer ASICs. But I will revert this change and try again.
@junliume This is not related and I see the same on navi21.
Reverting this change won't resolve the issue with getting available memory.
@junliume It seems like this change is indeed correct and should be kept.
@atamazov Can you verify that it happens on Navi21?
Some Vega nodes have problems, but others do not.
@junliume Of course this happens on Navi21, as I reported a while ago at #2307 (comment), where you can find the dirty hacks for this. I am working on a more or less regular W/A that should be suitable for merging into develop.
@junliume I am working on this. Please expect an update soon.
…age. Fix: Removed unnecessary params.TARGET_NAVI21 from AnyGPU stages.
# RESOLVED Conflicts: # requirements.txt
@junliume I think that https://github.com/atamazov/MIOpen/blob/ci_rocm57_ata1 is ready for merging here.
🌀 Testing
Tested on Navi21 and gfx906/60 (Radeon VII) systems. Note that my Navi21 system has the "hipMemGetInfo" issue, but my Radeon VII doesn't. Tried the following tests on both systems:
@atamazov I am not reproducing the hipMemGetInfo() error with the 5.7 RC3 docker (at least so far), but indeed many CI nodes are not as stable now, and I have asked for the base OS on these nodes to be updated.
@junliume Are you going to merge https://github.com/atamazov/MIOpen/blob/ci_rocm57_ata1 into this branch? If yes, then I will double-check whether the hipMemGetInfo problem is resolved in RC3 and narrow (or remove) the W/A.
I wish to, and need to figure out how to sync from
Update: done.
@junliume The hipMemGetInfo problem is NOT resolved in 5.7 RC3. This PR can be merged as is.
@junliume @JehandadKhan [Notice] Interesting detail:
"Thanks" to some problems with
Ping @JehandadKhan and @atamazov for review after CI has passed.