System unavailable: build-alibaba-win2012r2-x64-[12] #1818

Closed
sxa opened this issue Jan 7, 2021 · 78 comments

Comments

@sxa
Member

sxa commented Jan 7, 2021

This will prevent alibaba windows builds working as they are currently tied to these machines.

@Willsparker
Contributor

-1 was back up by the time I got to it, so I'll look at -2 :-)

@Willsparker Willsparker self-assigned this Jan 11, 2021
@Willsparker
Contributor

Interestingly, neither machine has the Jenkins agent installed as a service; they appear to have been running the agent in a Cygwin terminal window. I'll install it as a service on both.

@Willsparker
Contributor

I can't install the service on either machine because IcedTea-Web is missing. I could install it, but the machines seem to have had only a stripped-down version of the playbook run on them, and I'm unsure why (asked about it here: https://adoptopenjdk.slack.com/archives/C53GHCXL4/p1610356948467700 ).
In the meantime, I'll get the Jenkins agent running in a Cygwin terminal again so they're at least usable.

@Willsparker
Contributor

@Haroon-Khel said he's installing the missing packages on the machines (ref: https://adoptopenjdk.slack.com/archives/C53GHCXL4/p1610356948467700 ).

@Haroon-Khel
Contributor

Haroon-Khel commented Jan 11, 2021

Missing packages have been installed on both alibaba machines, except for the OpenSSL packages. Both failed with this error:

TASK [Install OpenSSL-1.1.1i 64-bit (VS2013)] ******************************************************************************************************************************************
task path: /Users/hkhel/AdoptOpenJDK/openjdk-infrastructure/ansible/playbooks/AdoptOpenJDK_Windows_Playbook/roles/OpenSSL/tasks/main.yml:73
fatal: [8.208.87.18]: FAILED! => {"changed": true, "cmd": "set PATH=C:\\Strawberry\\perl\\bin;C:\\openjdk\\nasm-2.13.03;%PATH% && .\\vcvarsall.bat AMD64 && cd C:\\temp\\OpenSSL-1.1.1i && perl C:\\temp\\OpenSSL-1.1.1i\\Configure VC-WIN64A --prefix=C:\\openjdk\\OpenSSL-1.1.1i-x86_64-VS2013 && nmake install > C:\\temp\\openssl64-VS2013.log && nmake -f makefile clean", "delta": "0:00:03.448210", "end": "2021-01-11 12:34:31.186987", "msg": "non-zero return code", "rc": 1, "start": "2021-01-11 12:34:27.738777", "stderr": "'nmake' is not recognized as an internal or external command,\r\noperable program or batch file.\r\n", "stderr_lines": ["'nmake' is not recognized as an internal or external command,", "operable program or batch file."], "stdout": "The specified configuration type is missing.  The tools for the\r\nconfiguration might not be installed.\r\nConfiguring OpenSSL version 1.1.1i (0x1010109fL) for VC-WIN64A\r\nUsing os-specific seed configuration\r\nCreating configdata.pm\r\n

Looking into it

@sxa
Member Author

sxa commented Jan 13, 2021

We're having some issues on these machines after (a) running the rest of the playbooks and (b) switching the Jenkins agent to run as the jenkins user. While most of them have now been resolved, I'm still getting the following error on -1 (even after a reboot), which I haven't yet been able to fully diagnose. Still working on it, but any crazy ideas welcome :-)

17:00:23  Running gradle with /cygdrive/c/openjdk/jdk-11 at /cygdrive/c/workspace/openjdk-build/workspace/.gradle
17:00:23  Exception in thread "main" java.io.FileNotFoundException: \cygdrive\c\workspace\openjdk-build\workspace\.gradle\wrapper\dists\gradle-6.5-bin\6nifqtx7604sqp1q6g8wikw7p\gradle-6.5-bin.zip.lck (Access is denied)
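Worth noting in this log: the JVM was given the Cygwin-style path `/cygdrive/c/workspace/...` and reports it back as `\cygdrive\c\workspace\...`, because a native Windows JDK does not understand the `/cygdrive/` prefix by itself. Whether that contributed to the "Access is denied" error was not established in this thread. As a minimal sketch, assuming only the default `/cygdrive/<letter>/` mount prefix (the real `cygpath -w` tool also handles other Cygwin mounts such as `/home`), the translation a Windows-native tool needs looks like this:

```python
import re

# Matches the default Cygwin drive prefix, e.g. /cygdrive/c/...
# (assumption: the machine uses the standard /cygdrive mount point).
_CYGDRIVE = re.compile(r"^/cygdrive/([a-zA-Z])(/.*)?$")


def cygwin_to_windows(path: str) -> str:
    """Translate a /cygdrive/<x>/... path into X:\\... form.

    Paths without the /cygdrive prefix are returned unchanged; the real
    cygpath -w also resolves other Cygwin mounts, which this sketch skips.
    """
    match = _CYGDRIVE.match(path)
    if match is None:
        return path
    drive = match.group(1).upper()
    rest = match.group(2) or "/"
    return drive + ":" + rest.replace("/", "\\")


if __name__ == "__main__":
    print(cygwin_to_windows("/cygdrive/c/workspace/openjdk-build/workspace/.gradle"))
    # C:\workspace\openjdk-build\workspace\.gradle
```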

@Haroon-Khel
Contributor

Haroon-Khel commented Jan 13, 2021

OpenSSL 64-bit (VS2013) also isn't installed on either -1 or -2, because vcvarsall.bat isn't present in the VS2013 folders. Reinstalling VS2013 didn't seem to solve this.
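In the OpenSSL log earlier in this issue, `vcvarsall.bat AMD64` printed "The specified configuration type is missing", so `nmake` never made it onto the PATH and the task only failed later. A pre-flight check could fail earlier with a clearer message. This is a minimal sketch that assumes the default VS2013 (Visual Studio 12.0) install location and the usual layout where `vcvarsall.bat` hands `AMD64` off to `bin\amd64\vcvars64.bat`; both paths are assumptions to verify on these machines:

```python
from __future__ import annotations

from pathlib import Path

# Assumed default VS2013 location; adjust if Visual Studio lives elsewhere.
VS2013_VC_DIR = Path(r"C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC")


def missing_vs2013_files(vc_dir: Path = VS2013_VC_DIR) -> list[str]:
    """Return the expected VS2013 64-bit environment scripts that are absent.

    Assumed layout: vcvarsall.bat dispatches "AMD64" to
    bin\\amd64\\vcvars64.bat; if either is missing, nmake is never put on
    the PATH for the OpenSSL build.
    """
    expected = [
        vc_dir / "vcvarsall.bat",
        vc_dir / "bin" / "amd64" / "vcvars64.bat",
    ]
    return [str(p) for p in expected if not p.is_file()]


if __name__ == "__main__":
    missing = missing_vs2013_files()
    if missing:
        raise SystemExit("VS2013 64-bit tools missing: " + ", ".join(missing))
    print("VS2013 64-bit environment scripts found")
```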

@Willsparker
Contributor

@sxa Have you tried running it with a different JDK (or reinstalling JDK 11)? I'm presuming you've already checked the permissions on all the folders.

@Haroon-Khel
Contributor

Latest failure: https://ci.adoptopenjdk.net/job/build-scripts/job/jobs/job/jdk11u/job/jdk11u-windows-x64-dragonwell/38/console
Still the same error, but running the same build command in a Cygwin shell, as the jenkins user, on build-alibaba-win2012r2-x64-1 in an RDP session doesn't seem to hit it.

@Haroon-Khel
Contributor

I changed the CYGWIN_WORKSPACE variable to C:\Users\Jenkins\workspace (it was C:\Jenkins\workspace before). This may have done the trick: https://ci.adoptopenjdk.net/job/build-scripts/job/jobs/job/jdk11u/job/jdk11u-windows-x64-hotspot/884/console (the hotspot builds were failing for the same reason too).

@Haroon-Khel
Contributor

https://ci.adoptopenjdk.net/job/build-scripts/job/jobs/job/jdk11u/job/jdk11u-windows-x64-dragonwell/40/console
A dragonwell build on alibaba-1 compiled successfully but failed at the installer stage. I think the variable change helped to get around the gradle error.

@Haroon-Khel
Contributor

Running the jdk8 dragonwell job on alibaba-1 (https://ci.adoptopenjdk.net/job/build-scripts/job/jobs/job/jdk8u/job/jdk8u-windows-x64-dragonwell/46/console), Jenkins seems to have a problem clearing the C:\Users\Jenkins\workspace workspace.

@Haroon-Khel
Contributor

Re-ran the jdk11 dragonwell job on alibaba-1 and got the same error: https://ci.adoptopenjdk.net/job/build-scripts/job/jobs/job/jdk11u/job/jdk11u-windows-x64-dragonwell/41/console. Oddly, this wasn't a problem yesterday when I ran both the jdk11 hotspot and dragonwell jobs on the same machine one after the other.

@Haroon-Khel
Contributor

Haroon-Khel commented Jan 19, 2021

I deleted the C:\Users\Jenkins\workspace directory and re-ran the jdk11 hotspot, jdk11 dragonwell and jdk8 dragonwell jobs one after the other. Jenkins didn't complain about being unable to delete workspaces. The CYGWIN_WORKSPACE variable is still C:\Users\Jenkins\workspace for alibaba-1.

@Haroon-Khel
Contributor

Regarding the 2013 compiler on alibaba-2: jdk8 hotspot builds fine, but jdk8 dragonwell exits with this error:

Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
Note: Some input files use unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
27 errors
make[2]: *** [CompileJavaClasses.gmk:336: /cygdrive/c/cygwin/home/jenkins/openjdk-build/workspace/build/src/build/windows-x86_64-normal-server-release/jdk/classes/_the.BUILD_JDK_batch] Error 1
make[1]: *** [BuildJdk.gmk:64: classes-only] Error 2
make: *** [/home/jenkins/openjdk-build/workspace/build/src//make/Main.gmk:117: jdk-only] Error 2

@sxa
Member Author

sxa commented Jan 20, 2021

Looking at https://ci.adoptopenjdk.net/job/build-scripts/job/jobs/job/jdk8u/job/jdk8u-windows-x64-dragonwell/53/consoleFull, I think that might be the same error occurring on one of the other build machines, so it could well be a problem in the codebase at the moment rather than with that machine. At least for now, I wouldn't worry too much about that error.

@Haroon-Khel
Contributor

Just an update:
It was identified that the alibaba machines have the same problem as #1662, where the leftover _the.. file prevents Jenkins from deleting the workspace before running its job. This has affected other Windows boxes, hence the PR adoptium/temurin-build#2204, so I have put in a similar PR, adoptium/temurin-build#2400.

Related issue adoptium/temurin-build#2205

@Haroon-Khel
Contributor

I have also changed the CYGWIN_WORKSPACE variable on both alibaba machines to C:\Jenkins\temp, since C:\Jenkins\workspace results in the gradle error:

17:00:23  Running gradle with /cygdrive/c/openjdk/jdk-11 at /cygdrive/c/workspace/openjdk-build/workspace/.gradle
17:00:23  Exception in thread "main" java.io.FileNotFoundException: \cygdrive\c\workspace\openjdk-build\workspace\.gradle\wrapper\dists\gradle-6.5-bin\6nifqtx7604sqp1q6g8wikw7p\gradle-6.5-bin.zip.lck (Access is denied)

@Haroon-Khel
Contributor

Haroon-Khel commented Feb 3, 2021

Yeah, until now I assumed that the cmake role was responsible for installing cmake in cygwin64\bin\cmake. I guess not.

@Haroon-Khel
Contributor

I'm going to install cmake on alibaba-2 and see if I get the same error.

@sxa
Member Author

sxa commented Feb 3, 2021

I assume the way our cmake role works is that it checks cygwin64\bin\ for a prepackaged cmake, and otherwise installs it separately in Program Files.

While that looks like what it's doing, it doesn't seem particularly sensible, since it'll repeatedly try to install the separate one whenever there isn't a copy in c:\cygwin64\bin. It shouldn't try to reinstall the separate cmake if it's already there under C:\Program Files.
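The behaviour suggested here amounts to a small decision table: only install when neither copy of cmake is present. A minimal Python sketch of that logic follows. The two `cmake.exe` paths are assumptions based on the locations mentioned in this thread, and in the playbook itself this would be expressed with Ansible checks (for example `win_stat` plus a `when:` condition) rather than Python:

```python
from pathlib import Path

# Assumed locations, based on the directories discussed in this issue.
CYGWIN_CMAKE = Path(r"C:\cygwin64\bin\cmake.exe")
PROGRAM_FILES_CMAKE = Path(r"C:\Program Files\CMake\bin\cmake.exe")


def cmake_action(cygwin_cmake: Path = CYGWIN_CMAKE,
                 program_files_cmake: Path = PROGRAM_FILES_CMAKE) -> str:
    """Decide what an idempotent cmake role should do on one machine."""
    if cygwin_cmake.exists():
        # Prepackaged Cygwin cmake is present: nothing to install.
        return "skip: using Cygwin cmake"
    if program_files_cmake.exists():
        # Separate install already done on a previous run: don't repeat it.
        return "skip: separate cmake already installed"
    return "install: download and run the MSI"


if __name__ == "__main__":
    print(cmake_action())
```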

@Haroon-Khel
Contributor

Haroon-Khel commented Feb 3, 2021

I just ran the cmake role on -2. It didn't install cmake in either Program Files or cygwin\bin, despite reporting "changed". That's confusing:

TASK [Download cmake installer] *********************************************************************************************************************************************************************************************************************************************************
task path: /Users/hkhel/AdoptOpenJDK/openjdk-infrastructure/ansible/playbooks/AdoptOpenJDK_Windows_Playbook/roles/cmake/tasks/main.yml:12
changed: [8.208.87.18] => {"changed": true, "checksum_dest": "8b0cbfc6be83e31a058c8ef282fe204862809ffcd8788bc19a8f0eb457f71187", "checksum_src": "8b0cbfc6be83e31a058c8ef282fe204862809ffcd8788bc19a8f0eb457f71187", "dest": "C:\\temp\\cmake.msi", "elapsed": 3.1962816, "msg": "OK", "size": 18235900, "status_code": 200, "url": "https://cmake.org/files/v3.7/cmake-3.7.2-win64-x64.msi"}

TASK [Install cmake] ********************************************************************************************************************************************************************************************************************************************************************
task path: /Users/hkhel/AdoptOpenJDK/openjdk-infrastructure/ansible/playbooks/AdoptOpenJDK_Windows_Playbook/roles/cmake/tasks/main.yml:22
changed: [8.208.87.18] => {"changed": true, "rc": 0, "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}

@sxa
Member Author

sxa commented Feb 3, 2021

You'll need to go onto the machine and try installing it manually using the command in the playbook to see what happens, and/or search the whole machine for cmake.exe to see whether the playbook has put it anywhere. But the point I was making before is whether it's even used at all on the other machines, or whether they actually use cmake from Cygwin for the OpenJ9 builds.

@Haroon-Khel
Contributor

Haroon-Khel commented Feb 3, 2021

Both build-azure machines use 3.14.5, while both build-ibmcloud machines use 3.17.3. All four use Cygwin's cmake.

@Haroon-Khel
Contributor

Before your comment, I installed cmake on -2 via the msi. The same error appeared on -2
https://ci.adoptopenjdk.net/job/build-scripts/job/jobs/job/jdk11u/job/jdk11u-windows-x64-openj9/914/console

@sxa
Member Author

sxa commented Feb 3, 2021

Before your comment, I installed cmake on -2 via the msi. The same error appeared on -2
https://ci.adoptopenjdk.net/job/build-scripts/job/jobs/job/jdk11u/job/jdk11u-windows-x64-openj9/914/console

Is that different from what happened on the machine before you installed cmake from the MSI? If every other machine is using the cmake from cygwin I'm not sure we have a requirement for the other one

@Haroon-Khel
Contributor

Is that different from what happened on the machine before you installed cmake from the MSI?

Yes. Before installing it, there wasn't a cmake on the machine, so the build failed with a "cmake not found" error. I installed it via the MSI on -2 just to see if I could recreate the error. It can always be uninstalled.

In terms of next steps, the only thing I can think of is reinstalling Cygwin using the playbooks to get the Cygwin cmake.

@sxa
Member Author

sxa commented Feb 3, 2021

Yes. Before installing it, there wasn't a cmake on the machine, so the build failed with a "cmake not found" error. I installed it via the MSI on -2 just to see if I could recreate the error. It can always be uninstalled.

Hmmm, has it also added itself to the system PATH? We don't add that directory in the build scripts.
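One way to answer this on the machine is to check whether the MSI's directory appears in the Windows PATH, and to ask `shutil.which("cmake")` which cmake would actually be found first. This is a minimal sketch; `C:\Program Files\CMake\bin` is an assumed default MSI install location, and the comparison mimics Windows semantics (`;`-separated, case-insensitive, trailing slashes ignored) explicitly, so it behaves the same wherever it runs:

```python
import os
import shutil


def dir_on_windows_path(directory: str, path_value: str) -> bool:
    """Return True if `directory` is one of the ';'-separated PATH entries.

    Windows paths are case-insensitive, so entries are compared lowercased,
    with surrounding quotes and any trailing slash or backslash removed.
    """
    def norm(p: str) -> str:
        return p.strip().strip('"').rstrip("\\/").lower()

    wanted = norm(directory)
    return any(norm(entry) == wanted for entry in path_value.split(";") if entry)


if __name__ == "__main__":
    # Assumed default MSI location for cmake.
    print(dir_on_windows_path(r"C:\Program Files\CMake\bin", os.environ.get("PATH", "")))
    # The first cmake executable found on PATH, or None if there isn't one.
    print(shutil.which("cmake"))
```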

@sxa
Member Author

sxa commented Feb 3, 2021

In terms of next steps, the only thing I can think of is reinstalling Cygwin using the playbooks to get the Cygwin cmake.

I don't think it needs a reinstall. From memory, you can add packages by replicating the command-line parameters to the Cygwin installer, like https://github.com/AdoptOpenJDK/openjdk-infrastructure/blob/81d61d27006e6832c063905c80421a6ca3cd0db9/ansible/playbooks/AdoptOpenJDK_Windows_Playbook/roles/cygwin/tasks/main.yml#L20

Or worst case you just re-run the installer and select the new packages.
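For reference, the Cygwin installer (`setup-x86_64.exe`) accepts `--quiet-mode` and a comma-separated `--packages` list, so adding packages to an existing install can be scripted. This is a minimal sketch that only builds the command line; the installer path, mirror URL, package cache directory and package names (for example `cmake`, assuming the chosen mirror carries it) are assumptions, not the values the playbook uses:

```python
import subprocess
from typing import List, Sequence

# Assumed values; the playbook's installer path, mirror and cache may differ.
SETUP_EXE = r"C:\temp\setup-x86_64.exe"
CYGWIN_ROOT = r"C:\cygwin64"
MIRROR = "https://mirrors.kernel.org/sourceware/cygwin/"
PACKAGE_CACHE = r"C:\temp\cygwin-packages"


def cygwin_add_packages_cmd(packages: Sequence[str]) -> List[str]:
    """Build an unattended setup-x86_64.exe command adding packages to CYGWIN_ROOT."""
    return [
        SETUP_EXE,
        "--quiet-mode",
        "--root", CYGWIN_ROOT,
        "--site", MIRROR,
        "--local-package-dir", PACKAGE_CACHE,
        "--packages", ",".join(packages),
    ]


def add_cygwin_packages(packages: Sequence[str]) -> None:
    """Run the installer; only meaningful on the Windows build host."""
    subprocess.run(cygwin_add_packages_cmd(packages), check=True)


if __name__ == "__main__":
    print(" ".join(cygwin_add_packages_cmd(["cmake"])))
```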

@Haroon-Khel
Contributor

[Screenshot 2021-02-04 at 15 56 14]

Using the installer, there isn't an option to install cmake as a package.

@Haroon-Khel
Contributor

Even the command-line arguments you posted don't include cmake. I can only assume it comes with one of the Cygwin packages from the checklist I posted?

@sxa
Member Author

sxa commented Feb 4, 2021

Using the installer, there isn't an option to install cmake as a package.

Hmmm, that's a bit odd. Since we have it on the other machines, it must have come from somewhere.

Even the command-line arguments you posted don't include cmake. I can only assume it comes with one of the Cygwin packages from the checklist I posted?

It's possible it disappeared in the changes made to speed up the Cygwin installs a few months ago, in which case it needs to be added back in. Although that should have been picked up by VagrantPlaybookCheck (NOTE: it's just about possible we haven't run a JDK11/J9 build on VPC since it was added, I suppose).

@Haroon-Khel
Contributor

Haroon-Khel commented Feb 5, 2021

It's possible it disappeared in the changes made to speed up the Cygwin installs a few months ago, in which case it needs to be added back in. Although that should have been picked up by VagrantPlaybookCheck (NOTE: it's just about possible we haven't run a JDK11/J9 build on VPC since it was added, I suppose).

I checked for changes in the cygwin role; I don't think cmake was ever in the arguments list.

On alibaba-1, I tried uninstalling Cygwin by removing the cygwin directory. This partially worked: some files couldn't be deleted due to a permission error, even though I tried to delete the directory as the admin user. Nonetheless, I ran the playbook's cygwin role on the machine, which ran fine. Cygwin installed itself in the C:\cygwin64 directory, independent of the existing cygwin directory. The only problem is that this didn't come with a cmake install, so I don't know where cmake comes from on the azure or ibmcloud machines.

@Haroon-Khel
Copy link
Contributor

As before, I've created a symlink to Program Files\CMake in the cygwin64\bin directory so Jenkins can use it, but as before it will likely result in a CMake error.

@Haroon-Khel
Contributor

Haroon-Khel commented Feb 24, 2021

Update: #1958 allows cmake to be installed alongside Cygwin. Cmake is now on both alibaba machines.
The openj9 jdk11 job passed on -2: https://ci.adoptopenjdk.net/job/build-scripts/job/jobs/job/jdk11u/job/jdk11u-windows-x64-openj9/929/console

The same job is now running on -1
https://ci.adoptopenjdk.net/job/build-scripts/job/jobs/job/jdk11u/job/jdk11u-windows-x64-openj9/933/console

OpenSSL has been updated from 1.1.1i to 1.1.1j on both machines too.

@sxa
Member Author

sxa commented Feb 24, 2021

Build seems to have worked ok :-)

@sxa
Member Author

sxa commented Feb 24, 2021

I've added the build tag back onto -1 so it should get used for tonight's builds

@sxa
Member Author

sxa commented Mar 1, 2021

@Haroon-Khel Is there any outstanding work to be done here?

@Haroon-Khel
Contributor

This issue can be closed. The cmake issue was the last missing thing for these machines, and that's now been resolved.

@sxa sxa unpinned this issue Mar 1, 2021
@karianna karianna modified the milestones: February 2021, March 2021 Mar 8, 2021