How can we (Arm) help? #33

Open
diegorusso opened this issue Nov 20, 2023 · 16 comments

Comments

@diegorusso

Hello @mdboom, @gvanrossum pointed me to this repo and suggested I raise an issue to introduce myself.

I'm Diego from Arm Ltd and I was wondering if there is anything we could do to help you out with the benchmarks story you are maintaining.
Recently I've been working with Łukasz Langa to have aarch64 benchmarks on speed.python.org (more info on this thread) and Łukasz is fixing a few issues on the website front.

For instance I see you have arm64 results but not aarch64 results. What's the reason for that? What about Windows on Arm?

This is really an initial contact to see if we could help each other, start a discussion, and help fill any gaps you might have in your infrastructure.

Thanks!

@mdboom
Contributor

mdboom commented Nov 20, 2023

Hi! Thanks for reaching out. I saw your work with Łukasz and that's great to see.

There's no real reason we don't have aarch64 or Windows on Arm yet other than prioritizing the Tier 1 platforms (and 1.5 in the case of darwin-arm64) first. We'd obviously need dedicated, bare metal hardware to run the GitHub self-hosted runner on.

@diegorusso
Author

Michael, ok thanks for the update. Let me see what we can do to help you out.

@gvanrossum

Did this discussion move elsewhere? Can this issue be closed? Or are we still waiting for @diegorusso ?

@diegorusso
Author

Hello, I'm still busy fixing things up for speed.python.org and enabling aarch64 metrics. In theory (but I want to check with Łukasz first) we could use that machine to run more benchmarks. At the moment we are running nightly benchmarks (at midnight) to mimic the behaviour of the x86 counterpart. We should utilise it more: it's a pity to keep 80 cores and 256GB of RAM idle :)
We set up the machine with CPU isolation, so we could run pyperformance in parallel using CPU affinity (ATM up to 8 parallel runs of pyperformance).
I'll ping Łukasz so we can work out what the best plan is to maximise the use of this machine.
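
For readers unfamiliar with the approach, here is a minimal sketch of what running several jobs in parallel on disjoint CPU sets could look like on Linux. It is not the actual setup on that machine: `partition_cpus` and `run_pinned` are hypothetical helpers, the placeholder command only prints its own affinity, and the sketch treats whatever CPUs the current process may use as if they were the isolated set.

```python
import os
import subprocess
import sys


def partition_cpus(cpus, groups):
    """Split CPU ids into `groups` contiguous, equally sized chunks.

    CPUs left over when the count is not divisible are simply unused.
    """
    cpus = sorted(cpus)
    size = len(cpus) // groups
    if size == 0:
        raise ValueError("not enough CPUs for the requested number of groups")
    return [cpus[i * size:(i + 1) * size] for i in range(groups)]


def run_pinned(command, cpu_group):
    """Start `command` with its affinity restricted to `cpu_group` (Linux only)."""
    return subprocess.Popen(
        command,
        # Runs in the child before exec, so the job and any worker
        # processes it spawns inherit the restricted CPU set.
        preexec_fn=lambda: os.sched_setaffinity(0, cpu_group),
    )


if __name__ == "__main__":
    # Assumption: stand-in for the isolated CPU list of a real benchmark host.
    available = os.sched_getaffinity(0)
    groups = partition_cpus(available, min(8, len(available)))
    # Placeholder job; a real setup would launch a pyperformance run here.
    command = [sys.executable, "-c",
               "import os; print(sorted(os.sched_getaffinity(0)))"]
    procs = [run_pinned(command, group) for group in groups]
    for proc in procs:
        proc.wait()
```

Keeping the groups disjoint is the point: each run gets cores that no other run touches, which is what makes parallel timings comparable at all.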

@mdboom
Contributor

mdboom commented Mar 29, 2024

@diegorusso: We are at a point where we could definitely use native aarch64 hardware. CPython is currently developing a JIT, and while it does work on aarch64/Linux, we currently use emulation for CI, and we have no visibility into its performance. Would you be available to discuss how we could get access to that machine (or some other)? Our benchmarking infrastructure is currently based on GitHub Actions self-hosted runners, so the main lift would be getting the GHA software installed on it and talking to our benchmarking repo.

@diegorusso
Author

hello @mdboom, how are you? Thanks for reaching out. Before exploring the options, I have a few questions:

  • how many runs do you have per day?
  • how long do they last? Is it the standard pyperformance run?
  • who can kick the build/run? Is it just a set of people or anyone from the community?
  • How are the builds kicked off? Is it a cronjob, automatic (via PR), or manual?

Apologies for the list of questions but it will help to understand the use case and see if there is a viable solution here.

Thanks

@mdboom
Contributor

mdboom commented Apr 2, 2024

> hello @mdboom, how are you? Thanks for reaching out. Before exploring the options, I have a few questions:
>
>   • how many runs do you have per day?

Probably 4-5 times a day on average.

>   • how long do they last? Is it the standard pyperformance run?

Yes, it's the standard pyperformance run -- usually about 20 minutes for a PGO compile and 1h15m for the benchmark runs.

>   • who can kick the build/run? Is it just a set of people or anyone from the community?

It's just a restricted set of people we trust -- security on bare metal is challenging, and it's just easier that way (and what GitHub recommends).

>   • How are the builds kicked off? Is it a cronjob, automatic (via PR), or manual?

They are usually kicked off manually by a developer wanting to test a particular change, but we also run a weekly cronjob. We don't trigger runs automatically from PRs, due to the same security concerns.

> Apologies for the list of questions but it will help to understand the use case and see if there is a viable solution here.

No problem.
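
As a rough back-of-the-envelope check, the figures above (4-5 runs a day, about 20 minutes of PGO build plus 1h15m of benchmarks per run) can be turned into a daily machine load. The sketch below assumes the runs execute one after another on a single machine; the names are purely illustrative.

```python
from datetime import timedelta

# Figures quoted in the thread.
PGO_BUILD = timedelta(minutes=20)
BENCHMARKS = timedelta(hours=1, minutes=15)
PER_RUN = PGO_BUILD + BENCHMARKS  # 1h35m per run


def daily_load(runs_per_day):
    """Total busy time per day, assuming runs are strictly serial."""
    return runs_per_day * PER_RUN


def share_of_day(runs_per_day):
    """Busy time as a fraction of a 24-hour day."""
    return daily_load(runs_per_day) / timedelta(days=1)


for runs in (4, 5):
    print(f"{runs} runs/day: {daily_load(runs)} busy, "
          f"{share_of_day(runs):.0%} of the day")
# 4 runs/day: 6:20:00 busy, 26% of the day
# 5 runs/day: 7:55:00 busy, 33% of the day
```

Under those assumptions a single runner would be busy for roughly a quarter to a third of each day.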

We also have the different use case of just needing occasional direct access to a Linux-on-ARM machine to debug things when code generation isn't working. We currently use emulation for this, but I think having access to real hardware would be simpler than dealing with cross compilation, etc. @brandtbucher can probably provide more details.

@brandtbucher
Member

> We also have the different use case of just needing occasional direct access to a Linux-on-ARM machine to debug things when code generation isn't working. We currently use emulation for this, but I think having access to real hardware would be simpler than dealing with cross compilation, etc. @brandtbucher can probably provide more details.

Just to clarify: I already have AArch64 Linux hardware that I've been using to develop and debug the JIT for that platform. As far as I see it, our needs are currently:

  • Individual access to a WoA machine for development/debugging.
  • Benchmarking infrastructure for AArch64 Linux.
  • Benchmarking infrastructure for WoA.

Less important, but still on the wish list:

  • AArch64 Linux JIT CI for CPython (non-emulated)
  • WoA JIT CI for CPython (non-emulated)

JIT buildbots are something we may want to consider at some point to help fill CI gaps.

@diegorusso
Author

Hello,

thanks for providing more information about your use case.

I've put in a request (https://github.com/WorksOnArm/equinix-metal-arm64-cluster/issues/325) for a bare metal AArch64 machine via WorksOnArm to help you out with AArch64 Linux.

Regarding "AArch64 Linux JIT CI for CPython": if you mean generic AArch64 testing for the public CPython project, you can follow the latest here: https://discuss.python.org/t/pep-11-proposal-to-promote-aarch64-plaftorms-to-tier-1/44774/24 TL;DR: we're working on it.

@brandtbucher
Member

Thanks! Looks like that issue got an "approved" label, which seems promising. :)

@diegorusso
Author

Correct, the request has been approved. I'm going to provision the machine by tomorrow. In the meantime can you suggest who the best person is to discuss access/admin to the machine?

@brandtbucher
Member

That would be @mdboom.

@mdboom
Contributor

mdboom commented Apr 25, 2024

> That would be @mdboom.

Yep. I just now got in touch via e-mail. Looking forward to chatting.

@diegorusso
Author

Quick update on this. The machine has been provisioned and it has an FQDN. On Wednesday we will have a catch-up with Mike so we can decide on access to this machine.

@diegorusso
Author

diegorusso commented May 1, 2024

@mdboom now has full access to the AArch64 machine. It runs Ubuntu 22.04 and has 80 cores, 256GB of memory, and ~870GB of NVMe storage.
I've installed the basic dependencies to build CPython, but I'll leave it to Mike to install whatever he needs to have the machine hooked into the CI system.

@brandtbucher
Member

Thanks @diegorusso!
