rocm: fix bug in intercept mode path #106
Merged
Pull Request Description
The intercept mode path keeps track of the events of intercepted kernels using the same hash table that maps event names to entries in the native event table (the hash table normally used for `ntv_name_to_code` conversions). The event names in the hash table don't collide because intercept mode keeps track of the base name of the event (discarding device and instance number), while native event table entries are referenced through `name:device=N:instance=M` keys.
The reason for storing only the name of the event in intercept mode is that events are set up on all devices' dispatch queues, regardless of the device specified by the user (this approach follows the `rocprof` strategy). However, using only the event name without the instance number causes problems. Instances represent separate events and should not be treated as a single event (`XXXX:instance=2 != XXXX:instance=3`).
The proposed patch uses a separate hash table for intercept mode and inserts the feature name rather than the event base name. This means that events with more than one instance will have a hash table key of the form `name[M]`, where M represents the instance number. If the event has only one instance, the key will be `name`.
Author Checklist
Why this PR exists. Reference all relevant information, including background, issues, test failures, etc
Commits are self contained and only do one thing
Commits have a header of the form:
module: short description
Commits have a body (whenever relevant) containing a detailed description of the addressed problem and its solution
The PR needs to pass all the tests
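The key scheme from the pull request description can be sketched as follows. This is a minimal illustration and not the code in the patch: the helper names `base_name_key` and `intercept_key`, their signatures, and the buffer handling are all assumptions made for the example. The first helper mimics the old behaviour (base name only), the second the proposed `name[M]` form.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the patch): old behaviour.
 * Reduces "name:device=N:instance=M" to its base name by cutting at the
 * first ':' so every instance of an event ends up with the same key. */
static int base_name_key(const char *native_name, char *key, size_t len)
{
    size_t n = strcspn(native_name, ":");
    if (n >= len)
        return -1;              /* key buffer too small */
    memcpy(key, native_name, n);
    key[n] = '\0';
    return 0;
}

/* Hypothetical helper (not from the patch): proposed behaviour.
 * Events with more than one instance get the key "name[M]";
 * events with a single instance keep the plain "name". */
static int intercept_key(const char *name, int instance, int num_instances,
                         char *key, size_t len)
{
    int n;
    if (num_instances > 1)
        n = snprintf(key, len, "%s[%d]", name, instance);
    else
        n = snprintf(key, len, "%s", name);
    return (n >= 0 && (size_t) n < len) ? 0 : -1;   /* -1 on truncation */
}
```

With the base-name helper, `XXXX:device=0:instance=2` and `XXXX:device=0:instance=3` both reduce to `XXXX` and would share one hash table entry. With the second helper, the same two instances yield the distinct keys `XXXX[2]` and `XXXX[3]`.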