
build-scaleway-x64-ubuntu-16-04-2 "installer" machine is build pipeline bottleneck #952

Closed
andrew-m-leonard opened this issue Oct 7, 2019 · 30 comments

@andrew-m-leonard
Contributor

Nightly builds are getting bottlenecked on this machine because it is the only machine capable of running the installer job, due to the required GPG keys. It does not help that some very long-running jobs also run on it, e.g. Dockerfile builds taking 4+ hours and the openjdk_build_docker_multiarch docker & x64 jobs, which take 5-6 hours every day!

@sxa sxa added the enhancement label Oct 7, 2019
@sxa sxa self-assigned this Oct 7, 2019
@karianna karianna added bug and removed enhancement labels Oct 15, 2019
@sxa
Member

sxa commented Oct 29, 2019

We need more machines capable of performing this function - one machine shared across docker and installer work is not appropriate.

For the docker work, only docker is required on the machine, so that should be easy to offload.

@sxa sxa modified the milestones: November 2019, October 2019 Oct 29, 2019
@sxa
Member

sxa commented Oct 29, 2019

System being created - docker-godaddy-ubuntu1604-x64-1 - to offload this work

@sxa
Member

sxa commented Oct 29, 2019

Docker containers are struggling to connect to external systems, therefore this is not currently working on the machine ...

@sxa
Member

sxa commented Oct 29, 2019

@cmdc0de Do you know why GoDaddy-provisioned machines appear to have issues with external connectivity from docker containers? We've seen this in issue 721 as well.

@Haroon-Khel
Contributor

A proposed fix is to spin up the Ubuntu image by running docker run --network=host -it ubuntu. The docker container should then be able to connect to external systems without error.
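A quick way to sanity-check that workaround (a sketch only; apt-get update just stands in for any outbound network access):

# Default bridge networking - this is where containers on the affected machines fail to reach external systems
docker run --rm ubuntu bash -c "apt-get update"

# Proposed workaround - share the host's network stack instead
docker run --rm --network=host ubuntu bash -c "apt-get update"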

@sxa sxa modified the milestones: October 2019, November 2019 Nov 1, 2019
@sxa sxa pinned this issue Nov 1, 2019
@sxa
Member

sxa commented Nov 16, 2019

Considering switching away from godaddy for this purpose since I'd rather have a provider that works out of the box - will investigate...

@sxa sxa modified the milestones: November 2019, December 2019 Nov 29, 2019
@sxa sxa unpinned this issue Dec 12, 2019
@sxa sxa modified the milestones: December 2019, January 2020 Dec 31, 2019
@karianna karianna modified the milestones: January 2020, February 2020 Feb 3, 2020
@sxa
Member

sxa commented Feb 25, 2020

@Haroon-Khel Do you know if that option can be set as the default so that we don't need to update the scripts to make it work properly?

@Haroon-Khel
Contributor

@sxa555 I've looked through the documentation, but I can't seem to find a way to set that option globally/as a default. There's a way to do it using Docker Compose files, but I think that would be overkill. Updating our existing build scripts would be our best bet, though this issue only affects our GoDaddy machines, yes?

@sxa
Member

sxa commented Mar 3, 2020

Correct ... I suppose it would depend on how many places we needed to make the change in. It may be good to get @dinogun involved at this point to see if adding that option to each docker command is feasible and/or whether he knows of a way to default it globally.
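If the per-command route were taken, the change in the build scripts would presumably look something like this (a sketch only; BUILD_IMAGE and build.sh are placeholders, not the actual script contents):

# Before: the container uses the default bridge network, which fails on the GoDaddy hosts
docker run --rm "$BUILD_IMAGE" ./build.sh

# After: the container shares the host's network stack
docker run --rm --network=host "$BUILD_IMAGE" ./build.sh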

@karianna karianna modified the milestones: February 2020, March 2020 Mar 6, 2020
@karianna
Contributor

Adding Top priority to this as we're holding up pipelines

@sxa
Member

sxa commented Mar 31, 2020

@karianna Is it still holding pipelines up? The original problem was the docker builds chewing up all the resources on the machine, and the machine now has two executors to prevent that.

@karianna
Contributor

It's still only one host that we're relying on though right? I think we should get rid of the single point of failure in that case.

@sxa
Member

sxa commented Mar 31, 2020

Yes absolutely, but it's not currently holding up pipelines.

@aahlenst
Contributor

aahlenst commented Mar 31, 2020

Yesterday, two parallel Docker jobs blocked that machine for hours. There's a thread on Slack in #infrastructure started by Simon.

@sxa
Member

sxa commented Mar 31, 2020

OK thanks - I hadn't seen the system in a state where two docker jobs were running on it. That job should be single threaded I suspect as I'm not sure it's safe to run it in parallel. @dinogun can you comment/confirm?

[EDIT: Just checked and openjdk_build_docker_multiarch is set not to allow concurrent builds]

@sxa
Member

sxa commented Mar 31, 2020

It's still only one host that we're relying on though right? I think we should get rid of the single point of failure in that case.

We had looked at moving this to another machine at GoDaddy, but the GoDaddy servers have unresolved issues with networking in docker containers. See also adoptium/temurin-build#1044, where we have logged a few single points of failure that exist in the build systems today.

@dinogun

dinogun commented Mar 31, 2020

OK thanks - I hadn't seen the system in a state where two docker jobs were running on it. That job should be single threaded I suspect as I'm not sure it's safe to run it in parallel. @dinogun can you comment/confirm?

If two docker jobs of the same type were running in parallel, that would be very strange (and ideally should never happen). I wonder if for some reason the multiarch job and the manifest job were running at the same time. Though that should not happen either, as Jenkins should mark the machine as busy once a job begins executing, right?

@sxa
Member

sxa commented Mar 31, 2020

Though that should not happen either as jenkins should mark the machine as busy once a job begins executing right ?

Incorrect for this case - that machine has two executors and therefore allows two jobs to run in parallel (which is why I said that, despite being locked to this machine, the docker build shouldn't hold up other things, as they'll run on the second executor).

@dinogun

dinogun commented Mar 31, 2020

Hmm, can we somehow restrict all docker-related jobs to a single executor then?
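One low-tech way to get that effect, without changing the Jenkins executor count, would be to serialise the docker jobs on the node behind an exclusive lock (a sketch only; run_docker_job.sh and the lock path are hypothetical):

# Only one docker job runs at a time on this node; a second invocation
# blocks on the lock file until the first has finished.
flock /tmp/adopt-docker-jobs.lock ./run_docker_job.sh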

@sxa
Member

sxa commented Mar 31, 2020

As it happens, right now it's running two, and it is indeed the multiarch and manifest jobs that are running together:

[Screenshot: Jenkins executor view showing the multiarch and manifest jobs running in parallel on the machine]

I hadn't appreciated that both of those run for over 10 hours, so both executors are getting clogged up, preventing other jobs from running on this machine.

@dinogun

dinogun commented Mar 31, 2020

I've stopped the multiarch job for now. These jobs are not designed to run together, as they periodically clean up all docker images on the box and so would cause each other to fail. We need a way to restrict the docker jobs to only one executor.
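To illustrate why they conflict: if either job's periodic cleanup is anything like the following, it removes every unused image on the box, including layers the other job is still building from (the actual cleanup command in the scripts may differ):

# Aggressive cleanup - removes ALL unused images and build cache,
# not just this job's, so a concurrently running job will fail.
docker system prune --all --force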

@sxa
Member

sxa commented Apr 1, 2020

OK - that hasn't been the case for some time - the builds will likely have been running together unless anything else has changed, although I don't recall anyone reporting it until this week.

@sxa sxa pinned this issue Apr 1, 2020
@sxa
Member

sxa commented Apr 1, 2020

Looking at setting up one or two more machines for this. It will potentially destabilise daily docker image creation on the Linux/x64 machine until we have clear setup instructions for the docker jobs. I've got a system set up with the docker keys available and am running a test job on it at the moment. For obvious reasons this will take a while ;-)

For future reference, the server used for this appears to require at least 8GB of RAM (4GB without swap wasn't enough - I might also try 4GB with a swapfile just to check)
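For reference, the 4GB-plus-swapfile experiment mentioned above would be roughly the standard Linux steps (sizes illustrative, run as root):

# Create and enable a 4GB swapfile
fallocate -l 4G /swapfile
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
# Keep it across reboots
echo '/swapfile none swap sw 0 0' >> /etc/fstab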

@sxa sxa modified the milestones: March 2020, April 2020 Apr 1, 2020
@sxa
Member

sxa commented Apr 2, 2020

The docker job ran successfully on docker-aws-ubuntu1604-x64-1. It also completed in 6h08m, compared with 31 hours for the last completed run on build-scaleway-x64-ubuntu-16-04-2.

The manifest job triggered after that build ran on docker-aws-ubuntu1604-x64-2 and completed in 2h12m, whereas previous runs on the scaleway box took up to 11 hours (possibly due to contention; the fastest recent run on that system was 5h29m).

The follow-on job on docker-aws-ubuntu1604-x64-2 has also completed (slightly faster at 4h41m compared to the aforementioned 6h08m on the other new machine). I would not be surprised if these times dropped as the machines re-run the jobs and build up more locally cached data.

I have locked ("Keep this build forever") one multiarch and one manifest job from the old machine temporarily so we can compare output if needed.

@sxa sxa closed this as completed Apr 2, 2020
@sxa sxa unpinned this issue Apr 2, 2020
@dinogun

dinogun commented Apr 3, 2020

@sxa555 docker push to DockerHub is failing on docker-aws-ubuntu1604-x64-1

The push refers to repository [docker.io/adoptopenjdk/openjdk14]
27480ab25448: Preparing
25866305528d: Preparing
16542a8fc3be: Preparing
6597da2e2e52: Preparing
977183d4e999: Preparing
c8be1b8f4d60: Preparing
c8be1b8f4d60: Waiting
denied: requested access to the resource is denied

This can happen if the auth is missing. Can you please check whether ~jenkins/.docker/config.json has been copied over as well?
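A quick check along those lines (commands are a sketch; the path is as described above):

# Confirm the Jenkins user's Docker auth config made it onto the new machine
ls -l ~jenkins/.docker/config.json
# It should contain an "auths" entry covering Docker Hub (docker.io)
grep -A 3 '"auths"' ~jenkins/.docker/config.json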

@sxa
Member

sxa commented Apr 3, 2020

Fixed - I'd copied the file over with the wrong name onto that machine - apologies

@dinogun

dinogun commented Apr 3, 2020

Can you do a quick check and see whether docker login works without any prompt for a password?

@sxa
Member

sxa commented Apr 3, 2020

Yep it's fine - I'm also re-running multiarch on x64 (we can restrict the jobs to specific combinations now instead of running all architectures!) to verify it

@sxa
Member

sxa commented Apr 3, 2020

I think it would be good if we could modify the scripts to return suitable non-zero exit codes in that situation (and others), to make it easier to judge the success or otherwise of the job from the Jenkins job status. I might have a go at that or assign one of my team to look at it.
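As a sketch of the kind of change meant here (not the actual scripts; IMAGE_TAG is a placeholder), the docker steps could propagate failure explicitly so the Jenkins job status reflects it:

# Abort on any failing command rather than carrying on silently
set -euo pipefail

if ! docker push "$IMAGE_TAG"; then
    echo "ERROR: push of $IMAGE_TAG failed" >&2
    exit 1
fi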

@dinogun

dinogun commented Apr 3, 2020

I have some upcoming changes that will fix this - in general, better reporting of failures and a summary of which specific docker images failed to build, if any.
