Gradle PR build failing #6130
Tried so far
Observed
Also now occurring on data manager:
Running more tests in parallel, it seems as if every FVT may now be failing with GUID search issues - most likely every test. This may only have become visible because I upgraded the failing image to more CPU/memory resource, so more tests run in parallel without timing out.
Noting configuration on failing system:
Tips on debugging the graph.
Change properties
Follow the tips to connect and check the vertices - it is then possible to connect remotely.
A query on the graph (for the analytics modeling FVT) seems to show iBASEMODULE is present within the underlying graph -> https://gist.github.com/443b5f9c1f14b40ce007814b6399c9f0
@planetf1 Can you provide some info on what steps you followed to be able to debug the PR, please?
Possible fix merged. Will summarise after.
To summarise what happened here in terms of FVT failures:
Note: the investigation was delayed by a few days due to writing and running the Egeria dojo. In summary:
I think this came about because
Using a GitHub-hosted runner we DO NOT have fine-grained control over the workers - see https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners
One remaining concern is that the exact reason for the failure is not known - we should revisit this, possibly leaving it until after a further image update in case it is a more transient issue.
Closing, and will open a new issue to debug and to revert back to Ubuntu 20.04.
Reopening. This has continued to occur. Most recently, in #6450 I addressed some issues with overlapping port numbers being used in tests. While that was a potential problem, it appears not to be the only cause, since PR builds have continued to fail intermittently. The most recent failure was at
Having checked the Asset Consumer FVT code, the exception above is normal. It occurs when a tag is not found - and in this case that is exactly what the FVT is verifying (perhaps not ideal that we get an exception in a not-found case). Therefore the error is NOT related to the exceptions above. Instead the failure point seems to be around:
which is the daemon disappearing. The most likely cause of this is resource exhaustion - max processes, files, memory etc. Will disable the gradle build daemon to see if this helps (or provides a clearer error).
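For reference, a minimal sketch of how the daemon could be switched off, assuming it is done via gradle.properties (the command-line equivalent is --no-daemon); the actual change made for the CI build may differ:

```properties
# Sketch: turn off the Gradle build daemon so each CI build runs in its own JVM
# (equivalent to passing --no-daemon on the gradle command line).
org.gradle.daemon=false
```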
Disable gradle daemon for CI build #6130
The change has resulted in clean builds so far. Closing for now; will re-open if the issue returns.
Reopening as this is still ongoing (e.g. #6611).
#6130 disable parallel gradle build on ci/cd (github runner)
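For reference, a sketch of the kind of setting this change involves, assuming it is applied via gradle.properties (the command-line equivalent is --no-parallel); the actual commit may configure it differently:

```properties
# Sketch: disable parallel project execution on the CI runner.
org.gradle.parallel=false
# Optionally cap the number of worker processes on a small runner
# (hypothetical value, not the project's actual setting).
org.gradle.workers.max=2
```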
The latest change has given stable builds for now.
Occurred twice more today; again the gradle daemon disappeared. Looked through recent gradle changes and issues, and gradle/gradle#19750 seems a possible explanation. We are setting JVM opts - and perhaps it would be better not to, since this overrides gradle's curated defaults. Combined with leaving parallel builds etc. off, we should try this. The failure cannot be reproduced in an external environment, only on the GitHub runners, so this will have to be another exploratory change.
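To illustrate the JVM opts point, a hedged sketch of the kind of gradle.properties entry in question (the values shown are hypothetical, not the project's actual settings) - an explicit org.gradle.jvmargs line replaces Gradle's curated defaults for the daemon JVM, whereas leaving it unset lets Gradle pick its own values:

```properties
# Hypothetical example: an explicit daemon JVM setting like this overrides
# Gradle's curated defaults and can add memory pressure on a constrained
# GitHub-hosted runner.
org.gradle.jvmargs=-Xmx4g -XX:MaxMetaspaceSize=512m
# Removing (or commenting out) the property lets Gradle fall back to its
# own defaults for the daemon JVM.
```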
Build logs continue to show a similar failure, as if the JVM - or the gradle process - has gone away. Tried
Both of these have been backed off; they were harmless but did not address the issue. Then
Tests have been stable, so closing.
The gradle PR build is failing with errors such as:
or
This was first seen with PR #6116 - see https://github.com/odpi/egeria/actions/runs/1674299644
At the time of this initial failure, other PRs were continuing to pass. This continued for up to 3 days, including the last good build at https://github.com/odpi/egeria/actions/runs/1687765378. Subsequently, all PR checks are now failing.