Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MI200][NCHW][SWDEV-312112] The find-db NCHW records for ConvAsmImplicitGemmGTCDynamic*XdlopsNHWC must be updated #1349

Closed
atamazov opened this issue Dec 17, 2021 · 18 comments · Fixed by #1354
Assignees
Milestone

Comments

@atamazov
Copy link
Contributor

The find-db NCHW records for

  • ConvAsmImplicitGemmGTCDynamicBwdXdlopsNHWC
  • ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC
  • ConvAsmImplicitGemmGTCDynamicWrwXdlopsNHWC

must be refreshed.

Originally posted by @atamazov in #1324 (comment)

@atamazov
Copy link
Contributor Author

atamazov commented Dec 17, 2021

#1324 increases WS up to 3 bytes, but the system find-db remains the same. The library may run into fatal errors from time to time.

The problem may manifest in develop, Staging tests and in 5.0 Release Staging tests. The latter is potentially the most problematic.

How we can fix this

We do not need to actually run tuning sessions to update find-db files. Adding 3 to the "required workspace size" field for ConvAsmImplicitGemmGTCDynamic*XdlopsNHWC solvers in NCHW records should be enough. However it is better to accommodate for #1327 in advance and increase requires WS size not by 3, but by (255*3).

Alternatives:

  • Regenerate find-db for MI100/MI200 (@JehandadKhan) from the existing Tuna db.
  • Patch existing find-db files using some script (@DrizztDoUrden )

Let's decide which way we prefer (assuming that the goal is to have the fix ASAP).

@JehandadKhan
Copy link
Collaborator

@atamazov Thanks for highlighting this. Can we hold off on merging #1324 until January ?

@atamazov
Copy link
Contributor Author

Sure. But the immediate problem is that we have this issue in 5.0 staging.

@junliume
Copy link
Collaborator

AFAIK, we tune after the branch is cut, and will need to regenerate find-db then anyway?

@atamazov
Copy link
Contributor Author

Of course, but we do not want to receive bug reports before that, right?

@atamazov
Copy link
Contributor Author

atamazov commented Dec 17, 2021

...And currently we have this problem in Staging branch. After promotion it will be replicated to Master branch and will remain there (i.e. in all Mainline builds) until 5.0 tuning is done, backported to develop, promoted to Staging, tested, and finally promoted to Mainline.

@junliume
Copy link
Collaborator

@atamazov @JehandadKhan : which way is "faster", a.k.a. can be completed by next Tuesday? I will vote for "Patch existing find-db files using some script (@DrizztDoUrden )" since Chris is off and patching looks like a more "direct" way to ensure problems are fixed. What do you think?

@atamazov
Copy link
Contributor Author

Either variant works for me.

@JehandadKhan
Copy link
Collaborator

I vote for @DrizztDoUrden to update it since that sounds the safest to me as well.

@atamazov atamazov assigned atamazov and unassigned JehandadKhan and junliume Dec 17, 2021
@atamazov
Copy link
Contributor Author

@DrizztDoUrden Let's update MI100/MI200 fdb as per #1349 (comment). I am available for a call at any time.

@atamazov
Copy link
Contributor Author

@shaojiewang FYI. Please share your opinion about this. Thanks.

@shaojiewang
Copy link
Contributor

@atamazov Thanks for findings and updates. I think that patching the existing find-db is a better way.

In the future, for the find db part, could it be possible to remove the Workspace_size from the db and just let it be computed on the fly? Or else, is it possible to have some reserved space for WS?

@atamazov
Copy link
Contributor Author

@JehandadKhan FYI There are none records with ConvAsmImplicitGemmGTCDynamic*XdlopsNHWC in MI100 find-db files.

@atamazov atamazov changed the title [MI100][MI200] The find-db NCHW records for ConvAsmImplicitGemmGTCDynamic*XdlopsNHWC must be updated [MI200] The find-db NCHW records for ConvAsmImplicitGemmGTCDynamic*XdlopsNHWC must be updated Dec 20, 2021
@atamazov atamazov changed the title [MI200] The find-db NCHW records for ConvAsmImplicitGemmGTCDynamic*XdlopsNHWC must be updated [MI200][NCHW][SWDEV-312112] The find-db NCHW records for ConvAsmImplicitGemmGTCDynamic*XdlopsNHWC must be updated Dec 20, 2021
@atamazov
Copy link
Contributor Author

@JehandadKhan FYI There are none records with ConvAsmImplicitGemmGTCDynamic*XdlopsNHWC in MI100 find-db files.

I mean, is it expected in develop?

@JehandadKhan
Copy link
Collaborator

@JehandadKhan FYI There are none records with ConvAsmImplicitGemmGTCDynamic*XdlopsNHWC in MI100 find-db files.

I mean, is it expected in develop?

Not according to my data

@atamazov
Copy link
Contributor Author

@JehandadKhan Still unclear... ;) So there is nothing to fix in MI100 System find-db files, right?

@JehandadKhan
Copy link
Collaborator

@atamazov That is correct, no records for gfx908 of the above/mentioned solvers

@JehandadKhan
Copy link
Collaborator

Perhaps I could have worded my earlier response better !

junliume pushed a commit that referenced this issue Dec 21, 2021
…mmGTCDynamic*XdlopsNHWC in find-db (leftover of 1324) (#1354)

Resolves #1349

Added 3 * 255 bytes to workspace to the following solvers workspace size in fdb where it was non-zero:
ConvAsmImplicitGemmGTCDynamicBwdXdlopsNHWC
ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC
ConvAsmImplicitGemmGTCDynamicWrwXdlopsNHWC
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment