Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CI] Make Graviton3 default AArch64 job runner node #15352

Merged
merged 1 commit into from
Jul 19, 2023

Conversation

Mousius
Copy link
Member

@Mousius Mousius commented Jul 18, 2023

In order to support SVE testing, migrating the current default AArch64 nodes to Graviton3 based nodes. Using r7g.large instances which have the memory requirements to support the TVM workloads.

In order to support SVE testing, migrating the current default AArch64
nodes to Graviton3 based nodes. Using r7g.large instances which have the
memory requirements to support the TVM workloads.
@tvm-bot
Copy link
Collaborator

tvm-bot commented Jul 18, 2023

Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

Generated by tvm-bot

@@ -19,7 +19,7 @@

{% call m.invoke_build(
name='BUILD: arm',
node='ARM-SMALL',
node='ARM-GRAVITON3',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, it would be useful to have a analysis of cost and the way we structure the tests.

As of now running the UTs directly through e2e compilation can take up a lot of CI time. A lot of that comes from tests that are as a matter of fact integration tests

Would be great for us to isolate out a limited set of integration tests (in cases with tests/arm_sve/) and only run limited set of testcases over these would be useful. Like our require_cuda tag, while majority of tests do not have to go through the specific nodes

@tqchen
Copy link
Member

tqchen commented Jul 18, 2023

Thanks, it would be useful to have a analysis of cost of the new instance.

As of now running the UTs directly through e2e compilation can take up a lot of CI time. A lot of that comes from tests that likely do not need SVE.

My understanding is that we will need SVE for some of the integration tests. Ideally we should isolate out a limited set of integration tests(e.g. via tests/arm_sve/) and only run those. We can also have tests can be enabled through require_sve tag, that optionally disables the related tests when they are not available. As such majority of tests do not have to go through the specific hw.

Most remainder of the tests can be structured through UTs and likely do not need SVE

@Mousius
Copy link
Member Author

Mousius commented Jul 18, 2023

@tqchen the new instance type is slightly more expensive on paper, as detailed below:

CI Instance Type On-Demand Reserved Minimum Spot
ARM-GRAVITON3 r7g.large $87.5270 monthly $57.8890 monthly $49.1290 monthly
ARM-SMALL r6g.large $82.3440 monthly $51.9030 monthly $48.0340 monthly

However, the new generation of instance has been proven to improve performance (see: Re:invent presentation). Which indicates this is an improvement for CI costs.

If you look at the diff, this replaces the r6g.large instances with r7g.large instances and does not add any additional nodes, replacing the existing AArch64 CI with a new instance type that can also run the SVE tests. That means the SVE tests will already be targeted appropriately to AArch64, and this isn't an entirely new set of nodes specifically catering to just SVE.

@tqchen
Copy link
Member

tqchen commented Jul 18, 2023

OK get it, seems to be good on this

@ashutosh-arm ashutosh-arm merged commit 0603cce into main Jul 19, 2023
@junrushao junrushao deleted the Make_Graviton3_default_AArch64_job_runner_node branch July 21, 2023 06:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants