AWS meeting 2023 06 08
Kenneth Hoste edited this page Jun 8, 2023 · 1 revision
- link to AWS project doc: https://docs.google.com/document/d/1CHG9fCh2LkfJ-EI8J-_Wr5NpHL5iwm8Wu6syfK9h7-c
- Thu 13 July 2023, 12:00 UTC
- 11 May 2023: https://github.com/EESSI/meetings/wiki/AWS-meeting-2023-05-11
- 13 April 2023: https://github.com/EESSI/meetings/wiki/AWS-meeting-2023-04-13
- 9 Mar 2023: https://github.com/EESSI/meetings/wiki/AWS-meeting-2023-03-09
- 11 Jan 2023: https://github.com/EESSI/meetings/wiki/AWS-meeting-2023-01-11
- status update on sponsored credits
- Costs are about $3k/month for March-May'23 (up from ~$1.5k/month)
- EFS costs cut in half after cleanup
- $15k left until end of Nov'23
- Is there a tutorial about how to explore and manage costs?
- Should we have alarms for everything? Is this possible?
- Need to get familiar with Cost Explorer
- Allows breaking down costs by tag, which lets us classify things
- tags already there for CitC Slurm cluster (fair-mastodon)
- EBS is the thing that may cost us in the end
- Current EFS cost is about 800USD a month (used to be double that - before cleanup)
- ~1.8TB in size (only for Slurm cluster, CVMFS Stratum-1 is using 1 TB dedicated disk)
- should look into tiering in our EFS filesystem, to offload data that is not being accessed to cold storage - see https://aws.amazon.com/efs/features/infrequent-access/ + https://aws.amazon.com/blogs/aws/new-amazon-efs-intelligent-tiering-optimizes-costs-for-workloads-with-changing-access-patterns/
- New S3 POSIX plugin that may be interesting
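The credit figures above can be turned into a quick back-of-the-envelope check. This is only a sketch: the dollar amounts come from the notes, while the assumption that six calendar months (June through November 2023) still have to be covered is ours and not something stated in the meeting.

```python
def months_of_runway(credits_left: float, monthly_cost: float) -> float:
    """How many months the remaining credits last at a constant monthly cost."""
    if monthly_cost <= 0:
        raise ValueError("monthly_cost must be positive")
    return credits_left / monthly_cost


def sustainable_monthly_spend(credits_left: float, months_remaining: int) -> float:
    """Monthly spend that exactly uses up the credits over the remaining months."""
    if months_remaining <= 0:
        raise ValueError("months_remaining must be positive")
    return credits_left / months_remaining


# Figures from the notes (USD).
credits_left = 15_000
monthly_cost = 3_000
efs_monthly_cost = 800

# Assumption (not from the notes): June through November counts as 6 months.
months_remaining = 6

runway = months_of_runway(credits_left, monthly_cost)
target = sustainable_monthly_spend(credits_left, months_remaining)
efs_share = efs_monthly_cost / monthly_cost

print(f"Runway at current spend: {runway:.1f} months")
print(f"Spend that lasts {months_remaining} months: ${target:,.0f}/month")
print(f"EFS share of monthly cost: {efs_share:.0%}")
```

Under these assumptions, the current spend rate would use up the credits slightly before the end of November, which is one reason the EFS and EBS items above matter.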
- Looking into using a CDN
- partially motivated by really low bandwidth when talking to CVMFS Stratum-1 mirror server
- even different CDNs for different contents (CloudFront in AWS vs CloudFlare, Azure CDN, ...)
- could help with costs, bandwidth, ...
- observed ~40MB/sec when having both CVMFS mirror and ParallelCluster in eu-west
- vs 2MB/sec when having ParallelCluster in eu-central (and CVMFS mirror in eu-west)
- what speedup would we get when adopting a CDN?
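To put the two observed rates in perspective, the sketch below compares transfer times for a fixed amount of data. The 40 MB/sec and 2 MB/sec figures are from the notes; the 1000 MB payload is a hypothetical size chosen for illustration, and the calculation ignores latency and caching effects, so it says nothing about what a CDN would actually deliver.

```python
def transfer_seconds(size_mb: float, rate_mb_per_s: float) -> float:
    """Time to move size_mb at a steady rate, ignoring latency and caching."""
    if rate_mb_per_s <= 0:
        raise ValueError("rate_mb_per_s must be positive")
    return size_mb / rate_mb_per_s


# Observed rates from the notes (MB/sec).
same_region_rate = 40.0   # mirror and ParallelCluster both in eu-west
cross_region_rate = 2.0   # ParallelCluster in eu-central, mirror in eu-west

# Hypothetical payload size, for illustration only.
payload_mb = 1000.0

same_region_time = transfer_seconds(payload_mb, same_region_rate)
cross_region_time = transfer_seconds(payload_mb, cross_region_rate)
slowdown = cross_region_time / same_region_time

print(f"Same region:  {same_region_time:.0f} s")
print(f"Cross region: {cross_region_time:.0f} s")
print(f"Cross-region slowdown: {slowdown:.0f}x")
```

Repeating the same measurement with a CDN endpoint in front of the mirror would give a direct answer to the speedup question.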
- Tech short on EESSI
- Edit looks good
- Tweaked some of our links to new domain (eessi.io, eessi.io/docs)
- Integration with ParallelCluster (separate video?)
- Situation with libfabric is a bit tricky but solvable
- We control the linker, so there are tricks we can use
- However, have noticed issues that need discussion (demo prepared)
- GPU support is in the works
- Proof of concept working
- Permission from NVIDIA to distribute CUDA compat libraries under CUDA EULA
- Need to figure out how to integrate things with our linker
- Relevant libraries are listed in https://github.com/apptainer/apptainer/blob/main/etc/nvliblist.conf
- We want a specific order for library resolution (need to handle compat libraries, host libraries and containers)
- We control the linker in EESSI so this is do-able
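The idea of a fixed resolution order can be illustrated with a first-match-wins search over an ordered list of directories. This is a minimal sketch and not how EESSI's linker is configured: the function name, the directory names and the order shown (compat libraries before host libraries) are all hypothetical, since the notes do not say which order is wanted.

```python
import tempfile
from pathlib import Path
from typing import Iterable, Optional


def resolve_library(name: str, search_dirs: Iterable[Path]) -> Optional[Path]:
    """Return the first file called `name` found in search_dirs, in order."""
    for directory in search_dirs:
        candidate = Path(directory) / name
        if candidate.is_file():
            return candidate
    return None


# Demo with throwaway directories; the names "compat" and "host" are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    compat_dir = Path(tmp) / "compat"
    host_dir = Path(tmp) / "host"
    for directory in (compat_dir, host_dir):
        directory.mkdir()
        (directory / "libcuda.so.1").touch()

    # Both directories provide the library; the one listed first wins.
    found = resolve_library("libcuda.so.1", [compat_dir, host_dir])
    print(found.parent.name)  # compat
```

Swapping the order of the list changes which copy is picked, which is the property that has to be pinned down for compat libraries, host libraries and containers.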
- Situation with