-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Flaky tests in trino-main & others: surefire sometimes fails: Cannot invoke "java.io.File.getAbsolutePath()" because the return value of "org.apache.maven.artifact.Artifact.getFile()" #14618
Comments
cc @kokosing |
https://github.com/trinodb/trino/actions/runs/3241747761/jobs/5314824158 second time for same PR (#14506) |
It seems possibly related to junit, since it fails after running the junit tests, but before it starts the testng tests. |
Can we try updating it to surefire plugin to a recent version like mentioned here - apache/maven-surefire#504 and https://issues.apache.org/jira/browse/SUREFIRE-2051 |
I was just looking to surefire plugin upgrade for this but it turns out we rolled that back for some other reasons airlift/airbase#313 |
I think I've found a way to trigger it:
Stacktrace:
|
Log tail with
|
Is this testng specific? Is there any issue for this in maven or for surefire plugin? |
Looks like a bug in Surefire, supposedly "fixed" in |
Latest master build also failed with this https://github.com/trinodb/trino/actions/runs/3328105585/jobs/5504695733 This looks like we're not running all trino-main tests on master I will therefore mark this as a release blocker. |
Looks like it fails to download testng 5.10, which is a really old version and it's jar is missing on maven central. I wonder what requires it and if that changed recently. |
@findepi I see we rolled back Surefire from |
Is it gone permanently or intermittently? I can see it on Maven Central's web page |
This fails:
|
I guess we should update the dependecy anyway, regardless of any other fixes, as that version is 12 years old. |
Airbase should be using 6.10: https://github.com/airlift/airbase/blob/master/airbase/pom.xml#L205 and we don't override it in Trino. I'm checking where does 5.10 come from. |
It seems Maven Central TestNG 5.10 repo has specific JARs for JDK 1.4 and JDK 1.5, but there's no general-purpose JAR. |
It turns out Surefire only needs TestNG 5.10 POM, it doesn't need the JAR. Once I re-enabled my laptop's wifi, tests started passing beyond the failure point, but my local repo only has TestNG 5.10 POM:
|
The only thing we can do to improve resiliency is to make sure this version of TestNG is cached in the local Maven repo. Right now the cache can be created by one of the jobs like
|
|
i don't know what the problems were.
I guess because surefire is compiled against that version, but use is still in control which version is used at runtime. @nineinchnick @MiguelWeezardo I added the |
Why? Because they get interrupted and we don't wait for a green run before merging more changes or doing releases? AFAIK we run all tests on master. There was a release recently, so is the |
We attempt to.
That was my understanding. I don't know what's the current state.
I think for this particular issue it was offline deemed as "not really a blocker". Not my decision. However, if we can safely remove the label from here, it would be good. For label in general -- |
@MiguelWeezardo did you solve the issue? I haven't experienced it recently |
I haven't done anything, maybe it was temporary or someone re-uploaded. Or it was fixed by improved caching, although I don't see how it could get cached if it was missing. |
I think this helped: 527abc5. We also did improve how we cache the local Maven repository a few months ago, like making sure even a slightly stale cache is restored, hoping that most artifacts will be reused: #14882 If you're wondering if this issue can be closed, I think yes. We can reopen if necessary. |
After refreshing my memory of this issue I will credit this to putting the responsibility of updating GitHub cache on one CI job 😆 |
thank you guys for confirming! |
https://github.com/trinodb/trino/actions/runs/4064936716/jobs/6998972779
|
https://github.com/trinodb/trino/actions/runs/4065666261/jobs/7000699640
|
That job failed on downloading That's probably a similar case that we observed with the Surefire plugin having some hard-coded dependencies on an old version of TestNG and trying to download it when executing tests, but it's not detected by other tools that build the dependency tree using Maven APIs. I don't know what we can do about it other than downloading these explicitly in the |
@nineinchnick thanks for through analysis of the problem!
let's do that |
#15797 will make sure these dependencies are downloaded. |
https://github.com/trinodb/trino/actions/runs/4075799341/jobs/7022685485
but it could have been run before #15797 got merged ... let's optimistically close and reopen if re-observed |
eg https://github.com/trinodb/trino/actions/runs/3237106331/jobs/5306229110
The text was updated successfully, but these errors were encountered: