Hyperthreading #32
There are two sockets, with 16 CPUs each. HyperThreading is disabled.
...why is HyperThreading disabled?
Hyperthreading is typically off on HPC clusters, as it is expected that each processor is almost always fully utilized. If you oversubscribe threads to cores, you can lose performance due to the constant context switching. Of course, it depends a lot on what the typical workload looks like. For CPU-bound workloads, hyperthreading is better left disabled. It might be different for other workloads.
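For reference, the current state is easy to verify at runtime; a minimal check, assuming a reasonably recent Linux kernel with sysfs mounted:

```shell
# The kernel reports the SMT (hyperthreading) state at runtime:
# prints 1 if enabled, 0 if disabled.
cat /sys/devices/system/cpu/smt/active

# Root can also toggle it without a BIOS change (lasts until reboot):
#   echo off > /sys/devices/system/cpu/smt/control
```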
Ok, I think this needs to be revisited. Trixie is not a "general" HPC machine - it was designed for GPU and data intensive workloads, so we should be tuning it for the purpose. Hyperthreading should be turned back on, as I suspect we are bottlenecking the cards right now (or at least we have the potential to).
Have you looked at
That just indicates there isn't a bottleneck with that particular model. I am not aware of any vendor who is supplying deep learning gear with hyperthreading disabled. The onus of proof is the other way here.
I am fairly sure this is how the cluster was first delivered. If you want to enable hyperthreading, we can queue that work for the next compute node refresh. We'll probably want to involve the working group on that decision. My own experience is with CPU-based workloads, so I won't argue for/against hyperthreading here. We can implement whatever the working group thinks is best for the cluster.
On the Niagara supercomputer, if you request 40 CPUs you get 40 physical cores plus the hyperthreads that go along with them (80 threads total). The user can effectively turn hyperthreading off on the fly by capping the thread count, e.g. with OMP_NUM_THREADS=1.
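A concrete illustration of that cap: GNU coreutils' nproc honors OMP_NUM_THREADS, so the effect is visible without launching a real OpenMP job.

```shell
# OMP_NUM_THREADS caps how many threads an OpenMP runtime spawns, so a
# job can simply ignore the sibling hyperthreads. GNU nproc honors the
# same variable, which makes the cap easy to observe:
nproc                     # all hardware threads visible to the job
OMP_NUM_THREADS=1 nproc   # prints 1
```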
Documentation update to reflect that hyperthreading is currently off. |
Agreement has been reached to turn hyperthreading on for all compute nodes during the next scheduled maintenance window. Details will be communicated once this occurs. |
Did this happen? Can this issue be closed? |
Closing issue; HT is enabled and will remain enabled until requested otherwise by the users |
The command top shows 32 CPUs. Are Trixie compute nodes dual-socket, i.e. do they have two Xeon 6130 processors? If so, I don't understand why we aren't seeing 64 CPUs in top (because of hyperthreading).
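A sketch of the arithmetic behind that count, using standard util-linux/coreutils tools (the node-specific numbers are the ones quoted in this thread):

```shell
# Logical CPUs = sockets x cores-per-socket x threads-per-core.
# With 2 sockets x 16 cores and hyperthreading off (1 thread/core):
# 2 x 16 x 1 = 32, which is what top shows; enabling hyperthreading
# (2 threads/core) would double that to 64.
lscpu | grep -iE 'socket|core|thread'
nproc --all   # total logical CPUs the kernel sees
```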