meeting July 1 2021
Kenneth Hoste edited this page Jul 1, 2021 · 5 revisions
- date & time: Thu July 1st 2021 - 2pm CEST (12:00 UTC)
- (every first Thursday of the month)
- venue: (online, see mail for meeting link, or ask in Slack)
- agenda:
- Quick introduction by new people
- EESSI-related meetings in last month
- Progress update per EESSI layer
- 2021.06 version of pilot repository
- AWS/Azure update
- Infrastructure in AWS
- Progress towards automating deployment of software installations
- Q&A
(by Bob, Kenneth)
- Ward Poelmans (VUB)
- Laura Redfern (Microsoft, HPC specialist)
- ...
- new multi-arch containers available through GitHub container registry
- container images should be removed from Docker Hub?
- notification on EESSI mailing list + Slack
- To solve the issue with the GitHub Action, let's go for the solution that provides a Prefix installation on ci.eessi-hpc.org.
- user-level CUDA driver update via `cuda-compat`
- Alan could use some help
- We bumped the limits on the number of cores that we can request in AWS.
- project approved by MS to provide $40k in Azure credits (through SURF)
- could be leveraged for
- build nodes
- GitHub Actions (CI stuff)
- Caspar's PR for the GROMACS ReFrame test should be more or less ready (https://github.com/EESSI/software-layer/pull/115). It's a very generic test, where a user just has to provide a few required parameters (e.g. number of tasks). It would be nice if people could try it out, even on their local GROMACS installations.
- Christian: end result of Arm/AWS hackathon (https://a-hug.org/hackathons/aws-hackathon/ + https://github.com/arm-hpc-user-group/Cloud-HPC-Hackathon-2021) could be a nice set of ReFrame tests to play with...
- Some PRs open that need to be finished and merged first
- If you want to help out with building software, let us know
- Terje ran into and solved a lot of issues with Terraform, permissions, etc.
- Perhaps it's useful to share how all those issues have been solved and how we set up everything for other AWS users?
- The limit on the total number of vCPUs has been raised significantly
- We can now look into replicating things like the GROMACS scaling results posted in https://aws.amazon.com/blogs/hpc/gromacs-price-performance-optimizations-on-aws/
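When comparing our own runs against scaling results like those in the AWS blog post, speedup and parallel efficiency relative to the smallest run are a convenient summary. The following is only a minimal sketch: the `ScalingPoint` type, the `parallel_efficiency` helper, and the numbers in the usage part are hypothetical placeholders, not measurements from EESSI or from the blog post.

```python
from dataclasses import dataclass


@dataclass
class ScalingPoint:
    cores: int          # number of cores (or vCPUs) used for the run
    ns_per_day: float   # GROMACS throughput reported for that run


def parallel_efficiency(points):
    """Return (cores, speedup, efficiency) tuples relative to the smallest run."""
    ordered = sorted(points, key=lambda p: p.cores)
    base = ordered[0]
    results = []
    for p in ordered:
        speedup = p.ns_per_day / base.ns_per_day   # measured gain over the baseline
        ideal = p.cores / base.cores               # gain under perfect scaling
        results.append((p.cores, speedup, speedup / ideal))
    return results


if __name__ == "__main__":
    # Placeholder numbers for illustration only, not measured results.
    runs = [ScalingPoint(36, 10.0), ScalingPoint(72, 19.0), ScalingPoint(144, 34.0)]
    for cores, speedup, eff in parallel_efficiency(runs):
        print(f"{cores:4d} cores  speedup {speedup:5.2f}  efficiency {eff:6.1%}")
```

The throughput values would come from the `ns/day` figure that GROMACS reports at the end of each run; dividing throughput by the hourly instance cost would give a comparable price-performance number.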
- Caspar: why does the Stratum 0 make the PR?
- This was mostly done because the machine that opens the PR needs a GitHub token, and the Stratum 0 is a secure machine for storing it.
- Jörg: how can we ensure that nothing malicious gets injected into the tarball on a build node?
- Hard to guarantee that this will be impossible, but we try to automate the builds by creating/destroying the build machines on the fly, to prevent human access as much as possible.
- Jörg went to an EOSC workshop, and thinks that EESSI could fit in well there. Alan also comments that it could be a good opportunity for EESSI to get funding. There are several EOSC working groups already (https://eosc-portal.eu/eosc-working-groups), but it's not clear if there's one for software.