
[ANSOR][AUTOTVM] Combine Ansor and AutoTVM to Improve Scheduling #16499

Open · wants to merge 5 commits into base branch main

Conversation

canesche (Contributor)

Description

This pull request aims to enhance model optimization by combining parts of Ansor and AutoTVM. The proposed approach involves the following steps:

  1. Execution of Ansor over an end-to-end model that requires optimization.

  2. Selection of the best implementation identified by Ansor for the given model.

  3. Utilization of AutoTVM's Droplet Search to exploit the selected candidate.

By integrating Ansor with AutoTVM's Droplet Search (droplet paper), we anticipate a reduction in the number of trials Ansor needs to explore while still producing faster kernels. Our experiments show significant improvements in kernel speed with reduced search times across several architectures, including Nvidia A100, Nvidia RTX3080, AMD x86, and ARM A64FX. The results can be found in this report: bennu paper
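For readers who want to see where these pieces sit in the TVM workflow, here is a minimal sketch using the public auto_scheduler (Ansor) API. The Relay inputs `mod` and `params` are assumed to exist already, and the Droplet Search exploitation step is only described in comments, because the exact entry point introduced by this PR is not named here and any concrete function name would be an assumption.

```python
# Minimal sketch: run Ansor with a reduced trial budget, then refine the best
# candidates with AutoTVM's Droplet Search. Assumes a Relay module `mod` and
# `params` are already defined; names below are illustrative only.
import tvm
from tvm import relay, auto_scheduler

target = tvm.target.Target("llvm")
log_file = "ansor_tuning.json"

# Step 1: run Ansor end to end, but with far fewer trials than the usual 10,000.
tasks, task_weights = auto_scheduler.extract_tasks(mod["main"], params, target)
tuner = auto_scheduler.TaskScheduler(tasks, task_weights)
tune_option = auto_scheduler.TuningOptions(
    num_measure_trials=300,
    measure_callbacks=[auto_scheduler.RecordToFile(log_file)],
)
tuner.tune(tune_option)

# Steps 2-3 (this PR): pick the best schedule recorded in `log_file` for each
# task and exploit it with AutoTVM's Droplet Search, appending the improved
# records to the same log. The concrete API added by the PR is not reproduced
# here to avoid guessing its name.

# Compile with the best records found.
with auto_scheduler.ApplyHistoryBest(log_file):
    with tvm.transform.PassContext(
        opt_level=3, config={"relay.backend.use_auto_scheduler": True}
    ):
        lib = relay.build(mod, target=target, params=params)
```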

Proposed Changes

  • Integration of Ansor and Droplet Search methodologies.

  • Utilization of Droplet Search to exploit the best candidates identified by Ansor.

Motivation

The motivation behind this pull request is to streamline the model optimization process by leveraging the complementary strengths of Ansor and Droplet Search. By combining these techniques, we aim to enhance the efficiency and effectiveness of kernel search and optimization, ultimately improving overall model performance across different hardware architectures.

Testing and Validation

Extensive testing has been conducted to validate the efficacy and performance improvements achieved by integrating Ansor and Droplet Search. Benchmarks were run on Nvidia A100, AMD x86, and ARM A64FX architectures to assess kernel speed and search-time reduction relative to Ansor running 10,000 trials. These results are available in Section 3 of this manuscript: bennu paper

Additional Notes

This pull request builds upon prior research and experimentation in model optimization. The proposed approach improves end-to-end models across diverse hardware platforms while still reducing Ansor's search time. We welcome the community’s feedback, suggestions, and contributions to further refine and enhance these methodologies.

Thank you.

Sincerely,

Michael Canesche, Gaurav Verma, and Fernando Pereira

@pfk-beta (Contributor) left a comment


I'm not a main developer, but I have a few suggestions to improve code quality:

  • Try not to mix two formatting methods, f-strings and %-formatting (see the short example after this list). I think we should prefer f-strings, because they are more robust and more current.
  • Try not to use one-letter variables.
  • I'm not sure whether your PR is suitable for tests, but it would be great to have some unit tests for your code.
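A tiny illustration of the first two points (added for clarity; this is not code from the PR, just a generic example):

```python
best_latency_ms = 0.42
task_name = "conv2d_nchw"

# Mixing styles and using a one-letter name:
t = task_name
print("task %s: %.2f ms" % (t, best_latency_ms))

# Preferred: a single style (f-strings) and a descriptive name:
print(f"task {task_name}: {best_latency_ms:.2f} ms")
```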

@canesche (Contributor, Author) commented Feb 1, 2024

> I'm not a main developer, but I have a few suggestions to improve code quality:
>
>   • Try not to mix two formatting methods, f-strings and %-formatting. I think we should prefer f-strings, because they are more robust and more current.
>   • Try not to use one-letter variables.
>   • I'm not sure whether your PR is suitable for tests, but it would be great to have some unit tests for your code.

Thank you, @pfk-beta! All comments are very welcome. I'll improve my code.

@tqchen (Member) commented Feb 1, 2024

Thanks for the contribution. Just want to bring in some of the context from https://discuss.tvm.apache.org/t/discuss-tvm-core-strategy-for-operator-scheduling-and-tuning/16352

Would love to see how we can leverage some of the techniques in MetaSchedule and TensorIR in the future.

@canesche (Contributor, Author) commented Feb 2, 2024

> Thanks for the contribution. Just want to bring in some of the context from https://discuss.tvm.apache.org/t/discuss-tvm-core-strategy-for-operator-scheduling-and-tuning/16352
>
> Would love to see how we can leverage some of the techniques in MetaSchedule and TensorIR in the future.

Thanks, @tqchen! We already have a plan to work with MetaSchedule. I hope to bring contributions in the near future.

@canesche force-pushed the main branch 4 times, most recently from e3ac901 to 5572dc4, on February 10, 2024 at 11:59
@canesche requested a review from @pfk-beta on February 10, 2024 at 18:39
@canesche (Contributor, Author)

@pfk-beta Could you review my code? Thanks!

@pfk-beta (Contributor) left a comment


In general:

  • I have commented on only one instance of each problem (e.g. one one-letter variable), but a given problem may appear several times. It is sometimes tedious (for both reviewer and author) to mark the same problem every time it occurs.
  • There are many levels of reviewing (important vs. less important, Pythonic vs. not Pythonic, readable vs. not readable). I just picked the problems that are most annoying and simplest for me to flag.
  • What I spotted is that you are mixing two styles, e.g. a with statement in one place and no with statement in another, or %-formatting and f-strings (a small illustration follows this list).
  • One-letter variables.
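To make the with/no-with point concrete, here is a generic illustration (added for clarity; not code from the PR):

```python
# Mixing styles: manual open/close in one place...
log = open("tuning.log", "w")
log.write("manual close\n")
log.close()

# ...and a context manager elsewhere. Sticking to `with` keeps the code uniform
# and closes the file even if an exception is raised.
with open("tuning.log", "a") as log:
    log.write("closed automatically\n")
```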

Inline review comments (now resolved) on python/tvm/auto_scheduler/task_scheduler.py, python/tvm/auto_scheduler/utils.py (two comments), and python/tvm/auto_scheduler/space.py (two comments).
@canesche (Contributor, Author)

@pfk-beta Thanks for the review! I applied modifications to each point you commented on. Could you see if further modifications need to be made?

@pfk-beta (Contributor) left a comment


In general it looks very, very good. To me, it's about 95% of the way there :)

Inline review comments on python/tvm/auto_scheduler/space.py (resolved) and python/tvm/auto_scheduler/task_scheduler.py.
@canesche (Contributor, Author)

@pfk-beta Thanks for the review! Could you see if further modifications need to be made?

@pfk-beta (Contributor)

@canesche Thanks for your effort. LGTM :)

@canesche (Contributor, Author)

@pfk-beta Thanks!
@tqchen Could you see if anything else is needed to accept the PR?

@canesche (Contributor, Author)

Hi @pfk-beta, I'm not that familiar with the whole PR process, but I think you forgot to approve my PR. Could you take a look?

@pronesto

Hi! I'd like to share some updates on the experiments conducted for this pull request. We've included performance data for an RTX3080 in addition to the existing dataset in our report. The report now covers four hardware configurations: AMD x86-64 R7, ARM aarch64 A64FX, Nvidia A100, and Nvidia RTX3080. Across all these scenarios, reducing the number of trials for Ansor while using Droplet Search to exploit the best results tends to outperform Ansor with 10,000 trials per model, considering both search time and model quality.

We've also conducted a study on the impact of the model size on the combination of Ansor and AutoTVM's Droplet Search. That's Section 3.3 of the manuscript. Here are our conclusions:

  1. The larger the model, the fewer samples the combined approach needs to observe to outperform Ansor (in terms of the speed of the final model), when Ansor uses a budget of 10,000 samples.
  2. The larger the model, the smaller the search-time advantage of the combined approach over Ansor, although there is still an improvement.
