[ci.jenkins.io] Use a new VM instance type #3535
Comments
Should use a backup policy (merged #3527):
…m resource group (#348)

Related to jenkins-infra/helpdesk#3535, this PR is the first major step to create a new (faster + cheaper) ci.jenkins.io VM managed as code.

Please note the following elements:

- A new resource group is created to avoid interfering with the current one (and to avoid any confusion). The goal is to remove the older resource group once the migration is complete.
- Neither the DNS record, the subnets nor the snapshot (of the data disk for migration) are constrained by this.
- StandardSSD is used instead of PremiumSSD (V1/V2). The goal is to start with cheaper storage and watch the current IOPS usage (since we [removed the plugin-config-history plugin](jenkins-infra/helpdesk#3528) and [added an S3 artifact caching proxy](jenkins-infra/helpdesk#3496), the need for IOPS is lower).
- Keep using a 512 GB SSD though, because current usage is ~240 GB: if we want to stay under 80% usage, and since [Azure managed disks](https://learn.microsoft.com/en-us/azure/virtual-machines/disks-types#standard-ssds) make you pay for either 256 or 512 GB (unless Premium SSD v2), it is better to take the bigger disk.
- No need for a storage account (it was only used for boot diagnostics, which we should not need with a Terraform IaC-based VM).
- Registration to Puppet is done under the name `ci.jenkins.io`: the current VM is [registered as `azure.ci.jenkins.io`](https://github.com/jenkins-infra/jenkins-infra/blob/e425203e0ffafdcf8b2d5e675ad838c5e73cd687/manifests/site.pp#L76-L79), so there is no risk of conflicts.

Please note that no security group is added... yet. I want to start with an "open VM" before applying security groups in a subsequent PR (everything will be removed and re-created once verified functional).

---------

Signed-off-by: Damien Duportal <damien.duportal@gmail.com>
Co-authored-by: Tim Jacomb <21194782+timja@users.noreply.github.com>
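The disk sizing argument above can be checked with a few lines of arithmetic. This is only an illustrative sketch: the ~240 GB usage, the 256/512 GB sizes and the 80% ceiling come from the commit message, while the function names are made up for the example.

```python
from typing import Iterable, Optional


def usage_ratio(used_gb: float, disk_gb: float) -> float:
    """Fraction of the disk that would be in use."""
    return used_gb / disk_gb


def smallest_size_under(used_gb: float, sizes_gb: Iterable[int],
                        max_ratio: float = 0.80) -> Optional[int]:
    """Return the smallest disk size keeping usage at or below max_ratio."""
    for size in sorted(sizes_gb):
        if usage_ratio(used_gb, size) <= max_ratio:
            return size
    return None  # no candidate size is large enough


USED_GB = 240           # approximate current usage, from the commit message
SIZES_GB = [256, 512]   # the two billed sizes discussed in the commit message

if __name__ == "__main__":
    for size in SIZES_GB:
        print(f"{size} GB disk: {usage_ratio(USED_GB, size):.1%} used")
    print("Smallest size under 80%:", smallest_size_under(USED_GB, SIZES_GB))
```

With these numbers a 256 GB disk would sit at roughly 94% usage, well above the 80% target, while the 512 GB disk sits at roughly 47%.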
…om internet (#349)

Related to jenkins-infra/helpdesk#3535, this PR follows up #348.

By default, all accesses are forbidden in the security group, so we cannot reach the VM. This change adds a set of security group rules to the ci.jenkins.io controller subnet to:

- Allow incoming SSH requests from the private VPN (as public and private networks are peered) to the private IP of the VM
  - Nice to have once the access is validated: a private DNS record in the VPN subnet
- Allow incoming HTTP, HTTPS and TCP Inbound protocols from the internet to the VM

---------

Signed-off-by: Damien Duportal <damien.duportal@gmail.com>
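As a rough mental model of the rules described above (deny everything by default, then allow SSH from the VPN and HTTP/HTTPS from the internet), here is a small Python sketch. It is not the Terraform code from the PR: the VPN CIDR, the rule names and the priorities are hypothetical, and the "TCP Inbound" agent rule is left out because the commit message does not give its port.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Sequence


@dataclass(frozen=True)
class Rule:
    name: str
    priority: int   # lower number is evaluated first, as in Azure NSGs
    source: str     # CIDR range the traffic must come from
    port: int       # destination port on the VM
    allow: bool


VPN_CIDR = "10.0.0.0/24"  # hypothetical: the real VPN range is not in the source

RULES = [
    Rule("allow-ssh-from-vpn", 100, VPN_CIDR, 22, True),
    Rule("allow-http-from-internet", 110, "0.0.0.0/0", 80, True),
    Rule("allow-https-from-internet", 120, "0.0.0.0/0", 443, True),
]


def is_allowed(src_ip: str, port: int, rules: Sequence[Rule] = RULES) -> bool:
    """Return the decision of the first matching rule, denying by default."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if port == rule.port and ip_address(src_ip) in ip_network(rule.source):
            return rule.allow
    return False  # nothing matched: all other inbound traffic is forbidden


if __name__ == "__main__":
    print(is_allowed("10.0.0.5", 22))      # SSH from inside the assumed VPN range
    print(is_allowed("203.0.113.7", 22))   # SSH from a public address
    print(is_allowed("203.0.113.7", 443))  # HTTPS from a public address
```

The point of the sketch is the ordering: SSH is only reachable from the VPN range because no rule allows port 22 from anywhere else, so that traffic falls through to the default deny.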
Update: ci.jenkins.io is now using inbound agents running from the new virtual network.
Watching the builds (ping @lemeurherve: not urgent, but I'll try to check the CI integration in Datadog to see if any pattern arises here - #3573)
Next step: bootstrapping a fully operational VM for the new ci.jenkins.io
Related to jenkins-infra/helpdesk#3535

- Adds 2 DNS A records to reach the VM (without changing ci.jenkins.io, yet)
- Renames resources to stick to the "controller" naming like we did with trusted.ci (the proposal from @timja to make a module makes sense for a controller setup as we instantiate it 3 times, so let's get trusted and CI close in order to make the module happen later)
- Sets up a first set of NSG rules for controller outbound traffic

---------

Signed-off-by: Damien Duportal <damien.duportal@gmail.com>
Update (4th of July):
Update (5th of July)
…426)

The NSG rule names are global: I had collisions with trusted.ci.jenkins.io's NSG while working on jenkins-infra/helpdesk#3535. This PR renames the ci.jenkins.io NSG rules that could be confusing. That kind of problem would ideally be solved by creating our own Terraform module.

Signed-off-by: Damien Duportal <damien.duportal@gmail.com>
Todo list to close this issue:
Related to jenkins-infra/helpdesk#3535 (comment)

This PR removes:

- The storage account `cijenkinsiovmagents`, which was used by the Azure VM plugin of the former VM
- The resource group `eastus-cijenkinsio`, which was used by the Azure VM and ACI plugins of the former VM

Signed-off-by: Damien Duportal <damien.duportal@gmail.com>
Closing the issue as the work is finished
What is the problem?
The current VM for ci.jenkins.io is starting to show issues:
Also, this VM was sized a few years ago in a slightly different context: JDK8 for running the controller (e.g. less CPU usage but more memory usage), no UEFI bootloader (v1 generation), Ubuntu 18.04.
Finally, the infrastructure layer of this VM is managed manually (it was initially created with Terraform, but then switched to manual management).
What should we do?
There are numerous tasks for this VM:
How could we do it?
Proposal: to avoid any maintenance overhead and migration risk, the infra team thought of the following plan:
`ci.jenkins.io` (reminder: `ci.jenkins.io` used to be the name under Puppet for the AWS VM; the current name of the Azure VM is `azure.ci.jenkins.io`) => this would avoid disrupting the current ci.jenkins.io service until the effective migration
Validation steps would be:
Checking an empty controller is set up under Ubuntu 22.04 by Puppet management
Checking that a manual `bom` build (chore(pipeline) switch to podTemplate instead of label to use the new node pool jenkinsci/bom#1969 (comment)) is not worse than the current one (ideally, it should have a bit less overhead for `sh` steps)
Then, the next step would be to add a data disk, smaller than the current one (thanks to `ci.jenkins.io` disk almost full #3492 and [ci.jenkins.io] Use Artifact Manager to store archived artifacts (and stashes) #3496, we are able to use less disk space), to decrease costs
Finally migrating the service: