
[ANSOR][AUTOTVM] Combine Ansor and AutoTVM to Improve Scheduling #16499

Open · wants to merge 5 commits into base branch main

Conversation

canesche (Contributor)

Description

This pull request aims to enhance model optimization by combining parts of Ansor and AutoTVM. The proposed approach involves the following steps:

  1. Execution of Ansor over an end-to-end model that requires optimization.

  2. Selection of the best implementation identified by Ansor for the given model.

  3. Utilization of AutoTVM's Droplet Search to exploit the selected candidate.

By integrating Ansor with AutoTVM's Droplet Search (droplet paper), we anticipate a reduction in the number of trials Ansor needs to explore while still producing faster kernels. Our experiments show significant improvements in kernel speed with reduced search times across several architectures, including Nvidia A100, Nvidia RTX3080, AMD x86, and ARM A64FX. The results can be found in this report: bennu paper
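For readers who want to see where these pieces sit in the TVM workflow, here is a minimal sketch using the public auto_scheduler (Ansor) API. The Relay inputs `mod` and `params` are assumed to exist already, and the Droplet Search exploitation step is only described in comments, because the exact entry point introduced by this PR is not named here and any concrete function name would be an assumption.

```python
# Minimal sketch: run Ansor with a reduced trial budget, then refine the best
# candidates with AutoTVM's Droplet Search. Assumes a Relay module `mod` and
# `params` are already defined; names below are illustrative only.
import tvm
from tvm import relay, auto_scheduler

target = tvm.target.Target("llvm")
log_file = "ansor_tuning.json"

# Step 1: run Ansor end to end, but with far fewer trials than the usual 10,000.
tasks, task_weights = auto_scheduler.extract_tasks(mod["main"], params, target)
tuner = auto_scheduler.TaskScheduler(tasks, task_weights)
tune_option = auto_scheduler.TuningOptions(
    num_measure_trials=300,
    measure_callbacks=[auto_scheduler.RecordToFile(log_file)],
)
tuner.tune(tune_option)

# Steps 2-3 (this PR): pick the best schedule recorded in `log_file` for each
# task and exploit it with AutoTVM's Droplet Search, appending the improved
# records to the same log. The concrete API added by the PR is not reproduced
# here to avoid guessing its name.

# Compile with the best records found.
with auto_scheduler.ApplyHistoryBest(log_file):
    with tvm.transform.PassContext(
        opt_level=3, config={"relay.backend.use_auto_scheduler": True}
    ):
        lib = relay.build(mod, target=target, params=params)
```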

Proposed Changes

  • Integration of Ansor and Droplet Search methodologies.

  • Utilization of Droplet Search to exploit the best candidates identified by Ansor.

Motivation

The motivation behind this pull request is to streamline the model optimization process by leveraging the complementary strengths of Ansor and Droplet Search. By combining these techniques, we aim to enhance the efficiency and effectiveness of kernel search and optimization, ultimately improving overall model performance across different hardware architectures.

Testing and Validation

Extensive testing has been conducted to validate the efficacy and performance improvements achieved by integrating Ansor and Droplet Search. Benchmarks were run on Nvidia A100, AMD x86, and ARM A64FX architectures to assess kernel speed and search-time reduction relative to Ansor running 10,000 trials. These results are available in Section 3 of this manuscript: bennu paper

Additional Notes

This pull request builds upon prior research and experimentation in model optimization. The proposed approach improves end-to-end models across diverse hardware platforms while still reducing Ansor's search time. We welcome the community’s feedback, suggestions, and contributions to further refine and enhance these methodologies.

Thank you.

Sincerely,

Michael Canesche, Gaurav Verma, and Fernando Pereira

@pfk-beta (Contributor) left a comment


I'm not a main developer, but I have a few suggestions to improve code quality:

  • Try not to mix two formatting methods, f-strings and %-formatting (see the short example after this list). I think we should prefer f-strings, because they are more robust and more current.
  • Try not to use one-letter variables.
  • I'm not sure whether your PR is suitable for tests, but it would be great to have some unit tests for your code.
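A tiny illustration of the first two points (added for clarity; this is not code from the PR, just a generic example):

```python
best_latency_ms = 0.42
task_name = "conv2d_nchw"

# Mixing styles and using a one-letter name:
t = task_name
print("task %s: %.2f ms" % (t, best_latency_ms))

# Preferred: a single style (f-strings) and a descriptive name:
print(f"task {task_name}: {best_latency_ms:.2f} ms")
```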

@canesche (Contributor, Author) commented Feb 1, 2024

> I'm not a main developer, but I have a few suggestions to improve code quality:
>
>   • Try not to mix two formatting methods, f-strings and %-formatting. I think we should prefer f-strings, because they are more robust and more current.
>   • Try not to use one-letter variables.
>   • I'm not sure whether your PR is suitable for tests, but it would be great to have some unit tests for your code.

Thank you, @pfk-beta! All comments are very welcome. I'll improve my code.

@tqchen (Member) commented Feb 1, 2024

Thanks for the contribution. Just want to bring in some of the context from https://discuss.tvm.apache.org/t/discuss-tvm-core-strategy-for-operator-scheduling-and-tuning/16352

Would love to see how we can leverage some of the techniques in MetaSchedule and TensorIR in the future.

@canesche (Contributor, Author) commented Feb 2, 2024

> Thanks for the contribution. Just want to bring in some of the context from https://discuss.tvm.apache.org/t/discuss-tvm-core-strategy-for-operator-scheduling-and-tuning/16352
>
> Would love to see how we can leverage some of the techniques in MetaSchedule and TensorIR in the future.

Thanks, @tqchen! We already have a plan to work with MetaSchedule. I hope to bring contributions in the near future.

@canesche force-pushed the main branch 4 times, most recently from e3ac901 to 5572dc4, on February 10, 2024 at 11:59
@canesche requested a review from @pfk-beta on February 10, 2024 at 18:39
@canesche (Contributor, Author)

@pfk-beta Could you review my code? Thanks!

@pfk-beta (Contributor) left a comment


In general:

  • I have commented on only one instance of each problem (e.g. one one-letter variable), but a given problem may appear several times. It is sometimes tedious (for both reviewer and author) to mark the same problem every time it occurs.
  • There are many levels of reviewing (important vs. less important, Pythonic vs. not Pythonic, readable vs. not readable). I just picked the problems that are most annoying and simplest for me to flag.
  • What I spotted is that you are mixing two styles, e.g. a with statement in one place and no with statement in another, or %-formatting and f-strings (a small illustration follows this list).
  • One-letter variables.
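To make the with/no-with point concrete, here is a generic illustration (added for clarity; not code from the PR):

```python
# Mixing styles: manual open/close in one place...
log = open("tuning.log", "w")
log.write("manual close\n")
log.close()

# ...and a context manager elsewhere. Sticking to `with` keeps the code uniform
# and closes the file even if an exception is raised.
with open("tuning.log", "a") as log:
    log.write("closed automatically\n")
```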

Inline review comments (now resolved) on python/tvm/auto_scheduler/task_scheduler.py, python/tvm/auto_scheduler/utils.py (two comments), and python/tvm/auto_scheduler/space.py (two comments).
@canesche (Contributor, Author)

@pfk-beta Thanks for the review! I applied modifications to each point you commented on. Could you see if further modifications need to be made?

@pfk-beta (Contributor) left a comment


In general it looks very, very good. To me, it's about 95% of the way there :)

Inline review comments on python/tvm/auto_scheduler/space.py (resolved) and python/tvm/auto_scheduler/task_scheduler.py.
@canesche (Contributor, Author)

@pfk-beta Thanks for the review! Could you see if further modifications need to be made?

@pfk-beta (Contributor)

@canesche Thanks for your effort. LGTM :)

@canesche (Contributor, Author)

@pfk-beta Thanks!
@tqchen Could you see if anything else is needed to accept the PR?

@canesche (Contributor, Author)

Hi @pfk-beta, I'm not that familiar with the whole PR process, but I think you forgot to approve my PR. Could you take a look?

@pronesto

Hi! I'd like to share some updates on the experiments conducted for this pull request. We've included performance data for an RTX3080 in addition to the existing dataset in our report. The report now covers four hardware configurations: AMD x86-64 R7, ARM aarch64 A64FX, Nvidia A100, and Nvidia RTX3080. Across all these scenarios, reducing the number of trials for Ansor while using Droplet Search to exploit the best results tends to outperform Ansor with 10,000 trials per model, considering both search time and model quality.

We've also conducted a study on the impact of the model size on the combination of Ansor and AutoTVM's Droplet Search. That's Section 3.3 of the manuscript. Here are our conclusions:

  1. The larger the model, the fewer samples the combined approach needs to observe to outperform Ansor (in terms of the speed of the final model), when Ansor uses a budget of 10,000 samples.
  2. The larger the model, the smaller the search-time advantage of the combined approach over Ansor, although there is still an improvement.
