Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

All tomee related test issues #2057

Closed
5 tasks
smlambert opened this issue Nov 17, 2020 · 25 comments
Closed
5 tasks

All tomee related test issues #2057

smlambert opened this issue Nov 17, 2020 · 25 comments
Labels
Milestone

Comments

@smlambert
Copy link
Contributor

Collapsing all open tomee issues into this one epic issue:

Will close each sub-issue, as all of the above can be gathered under this one issue and addressed (and if we can not resurrect the health and stability of this set of tests, we can consider removing them).

@andrew-m-leonard
Copy link
Contributor

andrew-m-leonard commented Dec 8, 2020

I've done some local testing of running the external/tomee locally running just the fault-tolerance sub-tests:
Local Ubuntu VM:

  • Build time: 13mins
  • Test time: 1hr13mins

Jenkins Grinder:
https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/5240/consoleFull

  • Build time: 1h02mins
  • Test time: 2hr45mins

What seems to be the main difference is the JBoss repository downloads taking 4 times longer on average in Jenkins. The TCK is designed to download many many dependencies and some of these are slow downloads,eg:

Local:

[INFO] Downloaded from JBoss Repository: https://repository.jboss.org/nexus/content/groups/public-jboss/net/sourceforge/htmlunit/htmlunit-core-js/2.33/htmlunit-core-js-2.33.jar
 (1.2 MB at 399 kB/s)

Jenkins:

17:23:26  [INFO] Downloaded from JBoss Repository: https://repository.jboss.org/nexus/content/groups/public-jboss/net/sourceforge/htmlunit/htmlunit-core-js/2.33/htmlunit-core-js-2.33.jar
 (1.2 MB at 56 kB/s)

@aahlenst
Copy link
Contributor

aahlenst commented Dec 8, 2020

@andrew-m-leonard htmlunit-core-js is on Maven Central. I don't see a reason why we download from jboss.org. Is there a way to override that in the tests? Otherwise we can override on the level of settings.xml, but that has probably to go into the playbooks.

@andrew-m-leonard
Copy link
Contributor

@aahlenst The tck pom points at JBoss https://github.com/apache/tomee/blob/5f4da4236c914e249091afdd7f356e7977747e43/tck/pom.xml#L52
However, not everything comes from there, it does download stuff from maven central as well, eg:

[INFO] Downloaded from central: https://repo.maven.apache.org/maven2/org/apache/maven/maven-toolchain/3.0-alpha-2/maven-toolchain-3.0-alpha-2.jar (36 kB at 222 kB/s)

@andrew-m-leonard
Copy link
Contributor

@smlambert Shelley, the full tomee tck is going to take a long time to "build" and "run tests", it's exaggerated by the slower dependency download time from Jenkins for doing the "build" stage, but even so the test run is going to take a long time...
My suggestion would be to split tomee into parallel sub-modules:

    <module>config</module>
    <module>jwt</module>
    <module>fault-tolerance</module>
    <module>health</module>
    <module>metrics</module>
    <module>rest-client</module>
    <module>openapi</module>
    <module>opentracing</module>

Would that entail duplicating the external/tomee folder for each sub-module and updating dockerfile/tomee-test.sh to target the sub-module in each?

We would only want it to run "weekly", xLinux to start with.

@aahlenst
Copy link
Contributor

aahlenst commented Dec 8, 2020

You could try to override that with http://maven.apache.org/settings.html#Mirrors. But that has to happen in the settings.xml in ~/.m2. Alternatively, using a profile could work (https://maven.apache.org/guides/mini/guide-multiple-repositories.html).

If all that does not help or the JBoss repository contains artifacts that aren't on Maven Central, we should think about placing a proxy within Adopt.

@andrew-m-leonard
Copy link
Contributor

andrew-m-leonard commented Dec 8, 2020

A single test:
Run locally:

[INFO] Running org.eclipse.microprofile.fault.tolerance.tck.config.ConfigPropertyGlobalVsClassVsMethodTest
Dec 07, 2020 11:48:57 AM org.apache.openejb.arquillian.common.Setup findHome
INFO: Unable to find home in: /tomee/tck/microprofile-tck/fault-tolerance/target/tomee
Dec 07, 2020 11:48:57 AM org.apache.openejb.arquillian.common.MavenCache getArtifact
INFO: Downloading org.apache.tomee:apache-tomee:8.0.2:zip:microprofile please wait...
Dec 07, 2020 11:48:57 AM org.apache.openejb.arquillian.common.Zips unzip
INFO: Extracting '/root/.m2/repository/org/apache/tomee/apache-tomee/8.0.2/apache-tomee-8.0.2-microprofile.zip' to '/tomee/tck/microprofile-tck/fault-tolerance/target/tomee'
Dec 07, 2020 11:48:58 AM org.apache.tomee.arquillian.remote.RemoteTomEEContainer configure
INFO: Downloaded container to: /tomee/tck/microprofile-tck/fault-tolerance/target/tomee/apache-tomee-microprofile-8.0.2
Dec 07, 2020 11:48:58 AM org.apache.openejb.arquillian.common.Setup synchronizeFolder
WARNING: Can't find /tomee/tck/microprofile-tck/fault-tolerance/src/test/conf
==>Dec 07, 2020 11:49:03 AM org.apache.openejb.client.EventLogger log
==>INFO: RemoteInitialContextCreated{providerUri=http://localhost:41329/tomee/ejb}
(long delay here)
==>Dec 07, 2020 11:50:27 AM org.apache.openejb.client.EventLogger log
==>INFO: RemoteInitialContextCreated{providerUri=http://localhost:41329/tomee/ejb}
Dec 07, 2020 11:50:28 AM org.apache.openejb.arquillian.common.TomEEContainer undeploy
INFO: cleaning /tomee/tck/microprofile-tck/fault-tolerance/target/workdir/0/ftConfigTest.war
Dec 07, 2020 11:50:28 AM org.apache.openejb.arquillian.common.TomEEContainer undeploy
INFO: cleaning /tomee/tck/microprofile-tck/fault-tolerance/target/workdir/0/ftConfigTest
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 91.31 s - in org.eclipse.microprofile.fault.tolerance.tck.config.ConfigPropertyGlobalVsClassVsMethodTest

Same test from job: https://ci.adoptopenjdk.net/job/Test_openjdk8_hs_sanity.external_x86-64_linux_tomee/484/consoleFull

12:13:51  [INFO] Running org.eclipse.microprofile.fault.tolerance.tck.config.ConfigPropertyGlobalVsClassVsMethodTest
12:13:51  Nov 02, 2020 12:13:51 PM org.apache.openejb.arquillian.common.Setup findHome
12:13:51  INFO: Unable to find home in: /tomee/tck/microprofile-tck/fault-tolerance/target/tomee
12:13:51  Nov 02, 2020 12:13:51 PM org.apache.openejb.arquillian.common.MavenCache getArtifact
12:13:51  INFO: Downloading org.apache.tomee:apache-tomee:8.0.2:zip:microprofile please wait...
12:13:51  Nov 02, 2020 12:13:51 PM org.apache.openejb.arquillian.common.Zips unzip
12:13:51  INFO: Extracting '/root/.m2/repository/org/apache/tomee/apache-tomee/8.0.2/apache-tomee-8.0.2-microprofile.zip' to '/tomee/tck/microprofile-tck/fault-tolerance/target/tomee'
12:13:52  Nov 02, 2020 12:13:52 PM org.apache.tomee.arquillian.remote.RemoteTomEEContainer configure
12:13:52  INFO: Downloaded container to: /tomee/tck/microprofile-tck/fault-tolerance/target/tomee/apache-tomee-microprofile-8.0.2
12:13:52  Nov 02, 2020 12:13:52 PM org.apache.openejb.arquillian.common.Setup synchronizeFolder
12:13:52  WARNING: Can't find /tomee/tck/microprofile-tck/fault-tolerance/src/test/conf
12:13:56  Nov 02, 2020 12:13:55 PM org.apache.openejb.client.EventLogger log
12:13:56  INFO: RemoteInitialContextCreated{providerUri=http://localhost:37036/tomee/ejb}
12:13:59  Nov 02, 2020 12:13:59 PM org.apache.openejb.client.EventLogger log
12:13:59  INFO: RemoteInitialContextCreated{providerUri=http://localhost:37036/tomee/ejb}
12:13:59  Nov 02, 2020 12:13:59 PM org.apache.openejb.arquillian.common.TomEEContainer undeploy
12:13:59  INFO: cleaning /tomee/tck/microprofile-tck/fault-tolerance/target/workdir/0/ftConfigTest.war
12:13:59  Nov 02, 2020 12:13:59 PM org.apache.openejb.arquillian.common.TomEEContainer undeploy
12:13:59  INFO: cleaning /tomee/tck/microprofile-tck/fault-tolerance/target/workdir/0/ftConfigTest
12:13:59  [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 8.521 s - in org.eclipse.microprofile.fault.tolerance.tck.config.ConfigPropertyGlobalVsClassVsMethodTest

@andrew-m-leonard
Copy link
Contributor

andrew-m-leonard commented Dec 11, 2020

so i’ve been trying to debug why the tomee microprofile-tck is very slow on some VMs, and I have replicated it on my local VM, but I am seeing an odd behaviour I can’t explain. The tests seem to go slow around where it is doing EJB lookups I believe, but I am seeing the following behaviour that seems to make it go 10 times quicker!

  • Tests running locally on my VirtualBox Ubuntu VM which I am ssh’d into to run the tests
  • I have a separate VirtualBox Terminal UI window attach to the VM
  • The tests run slow when my Terminal Window is asleep (signin screen window)
  • If I go to the Terminal and login to the UI, the separate tests session starts to run 10 times quicker!!

Why??

I am wondering if its some sort of XTerm session thingy…?

@andrew-m-leonard
Copy link
Contributor

Examing "top" while running tests provided no obvious clues

@andrew-m-leonard
Copy link
Contributor

The biggest clue we have is it run very fast on this node: https://ci.adoptopenjdk.net/label/test-ibmcloud-ubuntu1604-x64-1/

@andrew-m-leonard
Copy link
Contributor

@smlambert I think we have at least two options for this issue:

  1. Target external/tomee to run on node https://ci.adoptopenjdk.net/label/test-ibmcloud-ubuntu1604-x64-1/
  2. Split external/tomee down into 8 sub-parallel jobs, one for each of the tomee microprofile tck modules
    The testcases seem to run successfully, they just take a long time to run on some nodes...

@andrew-m-leonard andrew-m-leonard removed their assignment Dec 11, 2020
@smlambert
Copy link
Contributor Author

and option 3 is get to the core issue and figure out why they run fast on ibmcloud and if its a 'solvable' infra issue, correct and/or improve the config of the other nodes so all machines are improved.

@andrew-m-leonard
Copy link
Contributor

andrew-m-leonard commented Dec 11, 2020

and option 3 is get to the core issue and figure out why they run fast on ibmcloud and if its a 'solvable' infra issue, correct and/or improve the config of the other nodes so all machines are improved.

Yes ideally... i've tried for several days, and can't work it out, so i'm leaving it for the moment as it's frustrating me!!
It's difficult when you don't really know what the 3rd party code is doing..

@smlambert
Copy link
Contributor Author

smlambert commented Dec 11, 2020

gotcha, if we are to do an interim mitigation, I guess I would not vote for option 2, as:

  • you have already said the external tests are confusing since there are so many child jobs running in parallel, this would add grandchildren running in parallel
  • provides no continuity in job targets as we would be creating new ones, affects the view of history in TRSS etc
  • the other flavours of mp-tcks tests (openliberty, payara, thorntail) are running in a single target

@smlambert
Copy link
Contributor Author

smlambert commented Dec 11, 2020

Noting that a few other actions may be worth trying:

  • pose questions to the upstream tomee community
  • grab an older release of a JDK (back when these tests were added, when they ran 'better') and see if same issues exist
  • presume you already tried with other JVMs, and are seeing same issues with them?
  • run on a different tag of tomee

@andrew-m-leonard
Copy link
Contributor

thanks for the ideas @smlambert I have been investigating the tomee community, i've search their Jira bugs database and have requested access to their maillist, awaiting response...
I've not tried openj9 or an older jvm, i'll give that a try today

@andrew-m-leonard
Copy link
Contributor

jdk8u275_openj9-0.23.0 exhibits the same slow performance

@andrew-m-leonard
Copy link
Contributor

Discovered the Catalina log file, and this shows on every test where it starts the server during the long time it sits here:

14-Dec-2020 09:21:24.483 INFO [main] sun.reflect.DelegatingMethodAccessorImpl.invoke Server initialization in [3,530] milliseconds
14-Dec-2020 09:21:24.551 INFO [main] org.apache.tomee.catalina.OpenEJBNamingContextListener.bindResource Importing a Tomcat Resource with id 'UserDatabase' of type 'org.apache.catalina.UserDatabase'.
14-Dec-2020 09:21:24.554 INFO [main] org.apache.openejb.assembler.classic.Assembler.createRecipe Creating Resource(id=UserDatabase)
14-Dec-2020 09:21:24.596 INFO [main] sun.reflect.DelegatingMethodAccessorImpl.invoke Starting service [Catalina]
14-Dec-2020 09:21:24.601 INFO [main] sun.reflect.DelegatingMethodAccessorImpl.invoke Starting Servlet engine: [Apache Tomcat (TomEE)/9.0.35 (8.0.2)]
14-Dec-2020 09:21:24.753 INFO [main] org.apache.catalina.core.StandardContext.setClassLoaderProperty Unable to set the web application class loader property [clearReferencesRmiTargets] to [true] as the property does not exist.
14-Dec-2020 09:21:24.783 INFO [main] org.apache.catalina.core.StandardContext.setClassLoaderProperty Unable to set the web application class loader property [clearReferencesObjectStreamClassCaches] to [true] as the property does not exist.
14-Dec-2020 09:21:24.785 INFO [main] org.apache.catalina.core.StandardContext.setClassLoaderProperty Unable to set the web application class loader property [clearReferencesObjectStreamClassCaches] to [true] as the property does not exist.
14-Dec-2020 09:21:24.785 INFO [main] org.apache.catalina.core.StandardContext.setClassLoaderProperty Unable to set the web application class loader property [clearReferencesThreadLocals] to [true] as the property does not exist.

@andrew-m-leonard
Copy link
Contributor

It seems wasteful that the Catalina server is starting & stopping on every test...

@andrew-m-leonard
Copy link
Contributor

andrew-m-leonard commented Dec 14, 2020

ah hah!

14-Dec-2020 09:28:18.597 WARNING [main] org.apache.catalina.util.SessionIdGeneratorBase.createSecureRandom Creation of SecureRandom instance for session ID generation using [SHA1PRNG] took [96,387] milliseconds.

https://stackoverflow.com/questions/28201794/slow-startup-on-tomcat-7-0-57-because-of-securerandom

@andrew-m-leonard
Copy link
Contributor

The following PR fixes the problem: https://github.com/AdoptOpenJDK/openjdk-tests/pull/2113/files

@andrew-m-leonard
Copy link
Contributor

andrew-m-leonard commented Dec 14, 2020

Following maybe just glitches, not seen during my many re-runs, suggest we get these re-enabled and see if it re-occurs:

@andrew-m-leonard
Copy link
Contributor

My Grinder re-runs show the gradle dependency downloads are all downloaded using https, suggest the following has already been fixed upstream:

@smlambert
Copy link
Contributor Author

#2113 is now merged, will close this but monitor to see if glitches re-occur.

@andrew-m-leonard
Copy link
Contributor

For #1993 that includes openliberty tck on s390x, I ran a test and that one still takes 10hrs, but that is down to using ZeroVM, since it runs about 10x slower so that would make sense.
https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/5507/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants