
edu.harvard.iq.dataverse.api.FilesIT.testForceReplaceAndUpdate Test Failing #6846

Closed
djbrooke opened this issue Apr 21, 2020 · 18 comments · Fixed by #8858

@djbrooke (Contributor):

Expected status code <200> doesn't match actual status code <500>.

See build 425 on dataverse.org Jenkins for more info.

@pdurbin pdurbin self-assigned this Apr 22, 2020
@pdurbin (Member) commented Apr 22, 2020

At first I could reproduce the FilesIT.testForceReplaceAndUpdate failure locally, but the error went away after I doubled the following JVM options:

-XX:MaxMetaspaceSize=512m
-XX:MetaspaceSize=256m

(I did this because I was getting java.lang.OutOfMemoryError: Metaspace in my logs when running even a single API test.)
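For reference, a sketch of how the doubled values could be applied to a Payara/Glassfish domain via asadmin (the 1024m/512m values are assumed to be the doubled ones implied above; the colon in -XX options must be escaped for asadmin):

```shell
# Sketch: double the Metaspace sizing on a running domain (assumed defaults).
# Remove the old values first, then set the doubled ones.
./asadmin delete-jvm-options "-XX\:MaxMetaspaceSize=512m"
./asadmin delete-jvm-options "-XX\:MetaspaceSize=256m"
./asadmin create-jvm-options "-XX\:MaxMetaspaceSize=1024m"
./asadmin create-jvm-options "-XX\:MetaspaceSize=512m"
```

A restart of the domain is needed for the new options to take effect.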

Now I can't reproduce the original errors, but when I run the whole API test suite (which I couldn't do with Glassfish 4.1 without it falling over), I get a seemingly random assortment of failing tests, just like we're seeing on Jenkins.

When I look at my logs, I see many cases of org.postgresql.util.PSQLException: ERROR: deadlock detected which we discussed but never fixed in #2460. I think these deadlocks should be discussed in tech hours.

I'm not sure what to do with this issue.

@pdurbin pdurbin removed their assignment Apr 22, 2020
@djbrooke (Contributor, author) commented Apr 22, 2020

Thanks @pdurbin for the details.

@scolapasta can you get automated testing on the docket for a future tech hours? We've been picking off the failing tests one by one, but I saw on some recent runs that 4 new ones were failing so I'm not sure if we should re-evaluate. Generally I'm happy to spend sprint time in the automated test area. :)

@poikilotherm (Contributor) commented Apr 23, 2020

About MetaspaceSize and MaxMetaspaceSize: I stumbled over https://dzone.com/articles/permgen-and-metaspace, and this passage caught my attention:

Whenever there is a need to resize PermGen/Metaspace, the JVM will do it as it does with the standard heap. Resizing those spaces requires a full GC, which is always an expensive operation. It can usually be observed during startup, when a lot of classes are being loaded, especially if the application depends on many external libraries. If there are a lot of full GCs during startup, it's usually because of that. In that case, increasing the initial size can boost startup performance.

Has anyone taken a look at this during startup? While playing with container memory limits, I noticed memory use rising and then dropping suddenly a few times. I wonder whether this is related to GC, and whether the metaspace contributes to our very long deploy times, on top of the massive number of beans we are loading...

@donsizemore (Contributor) commented:

@poikilotherm I sent the Metaspace settings along, together with https://blog.payara.fish/fine-tuning-payara-server-5-in-production, to be rolled in as part of the switch to Payara 5:

./asadmin $ASADMIN_OPTS create-jvm-options "-XX\:MaxMetaspaceSize=512m"

While tinkering with Prometheus I did see a series of fairly steep memory reclamations during garbage collection; I can try again with some test Metaspace settings during deployment.

@djbrooke (Contributor, author) commented May 20, 2020

@pdurbin (Member) commented Jul 28, 2020

@djbrooke @scolapasta now that #6865 is closed (🎉 ) should we close this issue as well?

@scolapasta (Contributor) commented:

I am not sure if this was caused by deadlocks, but more importantly, the past two Jenkins builds (I'm not including the one that couldn't connect) passed. So unless we see this occur again, yes, we can close.

@djbrooke (Contributor, author) commented:

@pdurbin Sure, we can reopen if we start to see consistent failures from this test!

@djbrooke (Contributor, author) commented:

@scolapasta 👍

@scolapasta (Contributor) commented:

@djbrooke beat me to it, while I was typing my comment! (but I got the comment in first at least :))

@pdurbin (Member) commented Mar 3, 2022

We just noticed another case of testForceReplaceAndUpdate failing at https://jenkins.dataverse.org/blue/organizations/jenkins/IQSS-Dataverse-Develop-PR/detail/PR-8440/2/tests for pull request #8440.

@pdurbin (Member) commented Mar 15, 2022

Another case of testForceReplaceAndUpdate failing at https://jenkins.dataverse.org/blue/organizations/jenkins/IQSS-Dataverse-Develop-PR/detail/PR-8486/2/tests for PR #8486.

@pdurbin (Member) commented May 5, 2022

testForceReplaceAndUpdate just failed at https://jenkins.dataverse.org/blue/organizations/jenkins/IQSS-Dataverse-Develop-PR/detail/PR-8624/4/tests for #8624.

That's it. Time to reopen this issue. 😄

@donsizemore (Contributor) commented:

testForceReplaceAndUpdate failure seen again at https://jenkins.dataverse.org/job/IQSS-Dataverse-Develop-PR/view/change-requests/job/PR-8689/1/consoleFull

The Payara log doesn't seem particularly helpful, sorry.

@pdurbin (Member) commented May 24, 2022

https://jenkins.dataverse.org/job/IQSS-Dataverse-Develop-PR/view/change-requests/job/PR-8689/1/testReport/edu.harvard.iq.dataverse.api/FilesIT/testForceReplaceAndUpdate/ is showing a 500 error on FilesIT.testForceReplaceAndUpdate(FilesIT.java:668)

That's this line:

assertEquals(OK.getStatusCode(), updateMetadataResponse.getStatusCode());

From server.log.txt we see that the metadata of the file couldn't be edited due to a dataset lock:

[#|2022-05-11T13:30:47.039+0000|WARNING|Payara 5.2021.6|edu.harvard.iq.dataverse.api.Files|_ThreadID=69;_ThreadName=http-thread-pool::http-listener-1(2);_TimeMillis=1652275847039;_LevelValue=900;|
Dataset publication finalization: exception while exporting:{0}
edu.harvard.iq.dataverse.api.AbstractApiBean$WrappedResponse: edu.harvard.iq.dataverse.engine.command.exception.IllegalCommandException: Dataset cannot be edited due to dataset lock.
at edu.harvard.iq.dataverse.api.AbstractApiBean.execCommand(AbstractApiBean.java:633)
at edu.harvard.iq.dataverse.api.Files.updateFileMetadata(Files.java:423)

So, I think we just need to add our normal UtilIT.sleepForLock in the test code just before the failing line.
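The real helper lives in Dataverse's UtilIT, and its exact signature isn't shown in this thread, so here is a minimal, self-contained sketch of the poll-until-unlocked pattern it implements; the BooleanSupplier is a hypothetical stand-in for the API call that checks the dataset's locks:

```java
import java.util.function.BooleanSupplier;

public class SleepForLockSketch {

    // Poll once per second until the lock check reports "no lock" or the
    // timeout elapses. Returns true if the lock cleared within maxSeconds.
    static boolean sleepForLock(BooleanSupplier isLocked, int maxSeconds)
            throws InterruptedException {
        for (int i = 0; i < maxSeconds; i++) {
            if (!isLocked.getAsBoolean()) {
                return true;
            }
            Thread.sleep(1000);
        }
        return !isLocked.getAsBoolean();
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated lock that clears after two checks (stand-in for the
        // finalize-publication lock the test is racing against).
        int[] checksLeft = {2};
        boolean cleared = sleepForLock(() -> --checksLeft[0] >= 0, 5);
        System.out.println("lock cleared: " + cleared);
        // prints "lock cleared: true"
    }
}
```

In the test itself, the idea is to call the helper just before the failing assertEquals so the update-metadata request isn't sent while the publication-finalization lock is still held.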

@mreekie commented May 25, 2022

sprint

  • sized small

@mreekie commented Jun 8, 2022

Sprint:

  • pm.sprint.2022_05_25 ended OnDeck

@pdurbin pdurbin self-assigned this Jul 25, 2022
@pdurbin pdurbin removed their assignment Jul 25, 2022
kcondon added a commit that referenced this issue Jul 27, 2022
add sleep in FilesIT, clean up assertions #6846
@pdurbin pdurbin added this to the 5.12 milestone Aug 2, 2022