[AutoScheduler] Separate shapes from DAG hash and enable schedule sharing #7317
Because this PR breaks compatibility, should we add some warning messages and point users to this PR?
How about adding the message to the existing "schedule not found" warning and pointing to this PR?
The added warning message is too general: there are many other reasons to hit this warning, so I prefer not to have it.
I agree that the added warning message was too general, so I reverted it. Instead, the warning is now emitted when loading the records, so users will see the following message when loading logs in the old format:
Thanks @comaniac @merrymercy. It is merged.
…ring (apache#7317)
* [AutoScheduler] Separate shapes from DAG hash and enable schedule sharing
* Update CI logs
* lint
* fix registry
* add message; fix layout rewrite mismatch
* update message
* support other formats
In this PR, we attempt to enable schedule sharing as a workaround until dynamic shape support fully lands. The idea is that a schedule for batch size 1 is actually applicable to all other batch sizes (regardless of performance). This is useful when we only tune the workload with batch size 1 but want to use the result for all batch sizes, so that at least the flow works.
To do so, we introduce a "workload distance factor", which indicates the similarity of two workloads. Specifically, it is calculated by the following rules:
* If the two workloads are not for the same ComputeDAG or compute function, the factor is `inf`.
* Otherwise, `factor = prod(a / b) for a, b in zip(wkl1.args, wkl2.args)`.
* If any `a / b` is not an integer, the factor is `inf`.

As a result, the distance factor ranges from 1 to `inf`. When the distance factor is not `inf`, it is safe to apply the schedule of workload 2 to workload 1.

The above mechanism works well for registered TE computes but not for ComputeDAGs extracted from Relay programs. This is because, when extracting tasks from Relay, we currently use the MD5 hash of the serialized ComputeDAG string as its key. That string includes not only the DAG structure but also the shapes, so the distance factor cannot be calculated from the key. To make it work, this PR also improves the hashing mechanism of ComputeDAG by separating the input/output tensor shapes from the hash so that they can be accessed. For example, the workload key of a ComputeDAG was:
and it now becomes:
Please note that because this PR changes the workload key format of ComputeDAG, existing tuning logs will no longer match. To make them work again, we can use the following script to update the keys in existing log files. This is also how I updated the CI logs:
cc @merrymercy @jcf94
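The PR description states the distance-factor rules and the shape-separated ComputeDAG key only in prose. The following is a minimal Python sketch of how the two ideas fit together, assuming a key laid out as a JSON list `[dag_hash, *flattened_shapes]`. It is not TVM's implementation: `make_workload_key`, `distance_factor`, `pick_closest`, the key layout, and the handling of zero-sized dimensions and non-integer arguments are hypothetical choices for illustration.

```python
import hashlib
import json
import math
from typing import List, Optional, Sequence


def make_workload_key(dag_structure: str, shapes: Sequence[Sequence[int]]) -> str:
    """Hypothetical shape-separated key: hash only the shape-free DAG
    structure, then append the flattened input/output shapes as integers."""
    dag_hash = hashlib.md5(dag_structure.encode("utf-8")).hexdigest()
    flat_dims = [dim for shape in shapes for dim in shape]
    return json.dumps([dag_hash] + flat_dims)


def distance_factor(target_key: str, candidate_key: str) -> float:
    """Distance factor of a tuned candidate workload to a target workload.
    Dimensions are assumed to be non-negative integers."""
    target = json.loads(target_key)
    candidate = json.loads(candidate_key)
    # Different DAG/function (or different number of arguments): incompatible.
    if target[0] != candidate[0] or len(target) != len(candidate):
        return math.inf
    factor = 1.0
    for a, b in zip(target[1:], candidate[1:]):
        if isinstance(a, int) and isinstance(b, int):
            if a == 0 or b == 0:
                # Assumption: zero-sized dims must match exactly (avoids dividing by 0).
                if a != b:
                    return math.inf
            elif a % b != 0:
                # a / b is not an integer: the candidate schedule is not applicable.
                return math.inf
            else:
                factor *= a // b
        elif a != b:
            # Assumption: non-integer arguments must match exactly.
            return math.inf
    return factor


def pick_closest(target_key: str, tuned_keys: List[str]) -> Optional[str]:
    """Return the tuned key with the smallest finite distance factor, or None."""
    best_key, best_factor = None, math.inf
    for key in tuned_keys:
        f = distance_factor(target_key, key)
        if f < best_factor:
            best_key, best_factor = key, f
    return best_key
```

For instance, with a hypothetical dense DAG whose flattened shapes are `[1, 512, 512, 512, 1, 512]` for batch 1 and `[8, 512, 512, 512, 8, 512]` for batch 8, the factor of the batch-1 record for the batch-8 target is 8 * 8 = 64.0, so that schedule can be reused. The reverse direction gives `inf` because 1 / 8 is not an integer. With the old key, which hashed the full serialized string including shapes, the two hashes would differ and the first check would always return `inf`.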