Orka allocated storage Alert #3257

Closed
UlisesGascon opened this issue Mar 25, 2023 · 13 comments
@UlisesGascon
Member

We just received the following notification for this ticket via email:

Hey OpenJS Foundation,
This is an automated message from MacStadium monitoring. You are currently using 90.60% of your allocated storage.

Customers using more than their allocated storage (i.e. >100%) have 30 days to delete unused or unwanted data.

After 30 days, any storage overage will be added to your billing plan. Please contact your sales rep if you’d like to discuss this further.

Thank you,
Team MacStadium

There is more relevant information in the ticket thread, as it contains several automated messages:

Host or Device Name: MS-A-51-PSB01
Host or Device IP Address: 10.86.32.60
LMD2922272393 MS-A-51-PSB01 - error - File Systems Utilization-ORKA-A-BETA-20 PercentUsed
Host: MS-A-51-PSB01
Datasource: File Systems Utilization-ORKA-A-BETA-20
Datapoint: PercentUsed
Level: error
Start: 2023-03-25 13:54:40 EDT
Duration: 0h 0m
Value: 90.6087
Reason: PercentUsed > 80 90
Alert Rule:

I can't find any reference to MS-A-51-PSB01 or 10.86.32.60 in the MacStadium inventory. 🤔
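For anyone who wants to double-check, a quick local search over a checkout of the nodejs/build repo is one way to confirm that neither identifier is tracked on our side; this is just a sketch (the MacStadium-side inventory can only be checked in their portal):

```sh
# Quick local check (sketch): grep a checkout of the nodejs/build repo for the
# host name and IP from the alert. This only covers what we track in git, not
# whatever MacStadium shows in their own portal.
git clone https://github.com/nodejs/build.git
cd build
grep -rn -e 'MS-A-51-PSB01' -e '10.86.32.60' . || echo "no references found"
```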

@UlisesGascon
Member Author

Maybe this is related to #3240 (comment), but I'm not sure at all. 🤔

@UlisesGascon
Member Author

As agreed in #3299, I asked the support team about the available space and how the snapshots are stored. Ticket reference: SERVICE-157767

@UlisesGascon
Member Author

Current status: no technical answer from support yet (they are following up on the request internally).

@UlisesGascon
Member Author

Current Status:

  • The support team didn't provide the details requested a few weeks ago, so I re-asked the question.
  • 🚨 A new ticket, SERVICE-160622, was auto-generated on 2023-05-01 at 23:47 with the same alert template as the previous one. According to the new ticket: You are currently using 94.20% of your allocated storage.

@nodejs/build I am not sure why the storage usage has increased, as I removed several VMs for #3087. If we exceed 100% of the allocation they may add additional charges, but it seems the Orka cluster will keep working. I am not 100% confident about that, though.

@UlisesGascon
Member Author

UlisesGascon commented May 8, 2023

Current Status:

  • The support team just confirmed that we are currently using 89% of the allocated storage. This makes sense, as the changes from the MacOS 10.14 deprecation plan (#3087) were made after the alert was created.
  • They also confirmed what the storage is used for: This storage is associated with images and ISOs in your environment.

Next Steps

  • Remove the non-relevant images to free some space.
  • Investigate whether the VMs can be smaller: some VMs use 80G (e.g., macos1014-x64-1_01-may-2023) while others use only 40G (e.g., macos1014-x64-2_01-may-2023).
  • Ask support how to check the storage in use; as far as I can see, there is no direct command in the CLI for this (see the sketch below) 🤔
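For reference, the closest I can get from the CLI is listing images and VMs and adding up the reported sizes by hand. This is only a sketch and assumes the `orka image list` / `orka vm list` subcommands of our Orka CLI version behave as below (worth re-checking with `orka --help`):

```sh
# Sketch only: subcommand names and output columns are assumptions to verify
# against the Orka CLI version deployed in our environment.
orka login        # authenticate against the Orka environment first
orka image list   # base images (.img files) and their sizes
orka vm list      # VM configs/deployments currently in the cluster
# There is no single "total storage used" command that I'm aware of, hence the
# question to support about how the 90% figure is actually computed.
```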

UlisesGascon self-assigned this May 8, 2023
@UlisesGascon
Member Author

UlisesGascon commented May 8, 2023

Legacy images removed: 90GBigSurSSH.img, 90GCatalinaSSH.img, Mojave.img, empty90G.img, nodejs-test-1014.img and nodejs-release-1015.img. In total, 108GB of space was freed.
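For future reference, a cleanup like this can be scripted against the Orka CLI. The loop below is only a sketch, and the exact `orka image delete` syntax/flags are an assumption to verify with `orka image delete --help` before running anything:

```sh
# Sketch only: verify the actual "orka image delete" syntax/flags first.
for img in 90GBigSurSSH.img 90GCatalinaSSH.img Mojave.img \
           empty90G.img nodejs-test-1014.img nodejs-release-1015.img; do
  orka image delete --image "$img"   # assumed flag name; may be positional
done
orka image list                      # confirm the images are gone
```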

Before
Screenshot 2023-05-08 at 20:09:57

After

Screenshot 2023-05-08 at 20:19:06

@targos
Member

targos commented May 10, 2023

Nice! I wonder if we could have only one image per macos version? All instances of the same version could be based on the same image.

@UlisesGascon
Member Author

Yes @targos, I think it is a good idea to keep a base image with SSH keys (prior to the Ansible step) as the common base. That way we can save a lot of space. I also noticed that some of the backups are not restoring well, at least for 10.15 (#3218 (comment)).

If we want to reduce the VM size from 80G to 40G, we will need to purge the working VMs and generate new ones from the new slim images (see the sketch after the list of limitations below). This will help us have enough space for the 12.x machines (#3240 (comment)).

Known limitations:

  • You cannot reduce the size of a disk.
  • You cannot delete an image that is used by a VM.
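Given those limitations, the purge-and-recreate flow I have in mind looks roughly like the sketch below; the VM name is a placeholder and the Orka CLI subcommands/flags are assumptions that need checking against the docs:

```sh
# Rough outline only; verify each command against the Orka CLI docs first.
# 1. Remove the existing 80G VM config (and its deployments).
orka vm purge --vm test-macos1014-x64-1              # placeholder VM name
# 2. Recreate the VM config on top of a 40G slim base image, so the new VM
#    inherits the smaller disk (disks cannot be shrunk in place).
orka vm create --vm test-macos1014-x64-1 --base-image macos1014-x64-2_01-may-2023
# 3. Re-run the Ansible playbooks against the fresh VM as usual.
```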

@AshCripps
Member

Making the images smaller is risky, as we've struggled in the past with macOS bloating quite suddenly and the nodejs repo itself bloating when built. Also, did removing the 90G* images save space? I thought they were MacStadium-provided and therefore not part of our pool.

@UlisesGascon
Member Author

@AshCripps it seems the images are part of the pool, based on the latest response in ticket SERVICE-160622:

Your environment ORKA-A-BETA-20 currently has 89.25% full.

This is associated with images and ISOs. You are welcome to delete any images you think you may not need or set up to purchase more storage.


Do you think a 60G VM size will work? Or is it better to keep 80G?

@AshCripps
Member

IIRC the VMs on NearForm were around that size and they were always a problem, but things might be different now. The only real issue is if a machine gets selected for a node-commit-test and then, say, a debug build and a citgm build, rapidly eating up the disk space.

@UlisesGascon
Member Author

After the discussion in #3362, we agreed to simplify the number of images as follows: 1 image per MacOS version and usage (e.g., test-macos11-x64, release-macos11-x64). This will help us keep only 6 base images in total in the inventory.

So, after a basic calculation (6 images at 90GB each = 540GB), we will be more than okay with the pool size. This also confirms that backing up each running VM will be impossible given the pool size.

Next steps

  • Delete unused images and keep 1 image per MacOS version and usage

@UlisesGascon
Member Author

Previous status
Images:
Screenshot 2023-07-13 at 20:00:23

VMs
Screenshot 2023-07-13 at 20:00:13

Current status

Images:
Screenshot 2023-07-13 at 20:03:05

VMs:

Screenshot 2023-07-13 at 20:03:25

Images deleted

macos11-x64-2_11012023
macos1015-x64-2_11012023
macos1015-x64-1_11012023
macos1015-x64-1_08052023
macos1014-x64-3_01-may-2023
macos1014-x64-2_01-may-2023
macos1014-x64-1_01-may-2023

Future steps

  • In the next Orka upgrade I will rename the image macos11-x64-1_11012023 to nodejs-release-11.img, just to follow the convention.
  • Ensure that Infrastructure for MacOS 13.x (#3240) follows the naming convention for the related macOS 12 images.

I will close the issue as the alert has been resolved and there are no actionable items pending.
