Jt400: tests are not cleaning after themselves and parallel run fails #6001

Merged
merged 1 commit into from
Apr 16, 2024

JiriOndrusek
Contributor

fixes #5999

This PR fixes test behavior in several corner cases.

  1. Better data clearing - whenever a test writes any data, the entry is registered for removal at the end of the execution. In addition, a clear-all action can be executed, which clears all queues used by the test. This behavior is described in readme.adoc and is recommended during development.
  2. Parallel executions - it can happen (very easily) that execution A influences execution B. I wasn't able to use any locking mechanism that might be present in jt400, so I added a simple locking mechanism based on the keyed data queue.
    • Before all tests, an attempt to acquire the lock is made. The lock is released after all tests. The timeout for acquiring the lock is set to 5 minutes, after which the test fails (if the lock is not acquired).
    • The lock is written to a keyed data queue under the hardcoded key cq.jt400.global-lock (described in readme.adoc).
    • Each execution is able to remove an old lock if it is more than 5 minutes old.
    • Clear-all also clears the lock queue, so tests cannot be blocked forever.
  3. More debug logging is added.
  4. Helper method for dumping data is added.
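
To illustrate the data clearing described in point 1, here is a minimal sketch of the idea: every write is recorded so that it can be removed at the end of the run, while a separate clear-all wipes everything. The class `QueueCleanupSketch`, its methods and the queue name `demo-queue` are hypothetical, and the in-memory map only stands in for the real queues. It does not use the jt400 API and is not the code from this PR.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the remote queues (assumption, not jt400).
class QueueCleanupSketch {
    // queue name -> entries currently stored in that queue
    private final Map<String, List<String>> queues = new HashMap<>();
    // (queue, entry) pairs written by this execution, removed at the end of the run
    private final Deque<String[]> written = new ArrayDeque<>();

    // Simulates data left behind by some other (e.g. crashed) execution.
    void seedLeftover(String queue, String entry) {
        queues.computeIfAbsent(queue, q -> new ArrayList<>()).add(entry);
    }

    // Writes an entry and registers it for removal.
    void write(String queue, String entry) {
        queues.computeIfAbsent(queue, q -> new ArrayList<>()).add(entry);
        written.push(new String[] {queue, entry});
    }

    // End-of-run cleanup: removes only what this execution wrote.
    void removeWritten() {
        while (!written.isEmpty()) {
            String[] w = written.pop();
            List<String> entries = queues.get(w[0]);
            if (entries != null) {
                entries.remove(w[1]);
            }
        }
    }

    // Clear-all: empties every known queue, including leftovers of other runs.
    void clearAll() {
        queues.values().forEach(List::clear);
        written.clear();
    }

    int size(String queue) {
        return queues.getOrDefault(queue, List.of()).size();
    }

    public static void main(String[] args) {
        QueueCleanupSketch sketch = new QueueCleanupSketch();
        sketch.seedLeftover("demo-queue", "left by another run");
        sketch.write("demo-queue", "written by this run");
        sketch.removeWritten();
        System.out.println("after removeWritten: " + sketch.size("demo-queue"));
        sketch.clearAll();
        System.out.println("after clearAll: " + sketch.size("demo-queue"));
    }
}
```

The point of the split is that the end-of-run cleanup leaves data from other executions untouched, whereas clear-all is the heavier reset intended for development.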

The coverage and the number of tests stay the same.

@JiriOndrusek JiriOndrusek force-pushed the jt400-parallel-global-lock branch 3 times, most recently from c507f8a to 4f07cae Compare April 15, 2024 15:39
@JiriOndrusek
Contributor Author

The failure is not related to this change:

Failures: 
2024-04-15T16:06:35.5931973Z [ERROR]   GroupedAws2KinesisTest>BaseAWs2TestSupport.failingDefaultCredentialsProviderTest:99 Expected java.lang.AssertionError to be thrown, but nothing was thrown.

@jamesnetherton
Contributor

Just so I understand things better. The docs state "Simple locking mechanism is implemented for the test to allow parallel executions."

What that means is that if you start parallel test executions, they will be queued and wait for each other to complete?

@JiriOndrusek
Contributor Author

> Just so I understand things better. The docs state "Simple locking mechanism is implemented for the test to allow parallel executions."
>
> What that means is that if you start parallel test executions, they will be queued and wait for each other to complete?

I used a keyed data queue (FIFO).

  • Each participant saves a unique token under the key cq.jt400.global-lock
  • Each participant then reads the FIFO queue, and if the resulting string is its own unique token, execution is allowed
  • When the execution ends, the token is removed

If the token is not its own:

  • the read of the token is repeated until the timeout expires or its own token is returned (so the second participant waits until the first participant removes its token)

Deadlock prevention:

  • part of the unique token is a timestamp; if a participant finds a token that is too old, the token is removed
  • the clear-all action also removes the locking tokens

I tested the scenario several times with 9 parallel executions.
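
The protocol above can be sketched roughly as follows. This is only an illustration under stated assumptions: the keyed data queue is simulated in memory, and the class `GlobalLockSketch`, its method names, the token format `<timestampMillis>:<uuid>` and the polling interval are all hypothetical. Only the key cq.jt400.global-lock and the 5-minute limit come from the description; the real test code talks to a jt400 keyed data queue instead.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

class GlobalLockSketch {
    static final String LOCK_KEY = "cq.jt400.global-lock";
    static final long MAX_AGE_MS = 5 * 60 * 1000L; // tokens older than this are stale

    // Hypothetical stand-in for the keyed data queue: key -> FIFO of tokens.
    private final Map<String, Deque<String>> keyedQueue = new ConcurrentHashMap<>();

    private Deque<String> tokens() {
        return keyedQueue.computeIfAbsent(LOCK_KEY, k -> new ArrayDeque<>());
    }

    // Each participant appends its own unique token (assumed format: "<millis>:<uuid>").
    synchronized String enqueue(long nowMs) {
        String token = nowMs + ":" + UUID.randomUUID();
        tokens().addLast(token);
        return token;
    }

    // Returns true if the oldest token is ours. Stale tokens at the head are dropped.
    synchronized boolean tryAcquire(String myToken, long nowMs) {
        Deque<String> queue = tokens();
        while (!queue.isEmpty()) {
            String head = queue.peekFirst();
            if (head.equals(myToken)) {
                return true;
            }
            long created = Long.parseLong(head.substring(0, head.indexOf(':')));
            if (nowMs - created > MAX_AGE_MS) {
                queue.pollFirst(); // deadlock prevention: remove a token that is too old
            } else {
                return false; // somebody else holds the lock, keep waiting
            }
        }
        return false; // our token is gone, e.g. after a clear-all
    }

    // Called when the execution ends, so the next participant can proceed.
    synchronized void release(String myToken) {
        tokens().remove(myToken);
    }

    // Polls until our token is at the head or the timeout expires.
    boolean acquire(String myToken, long timeoutMs, long pollMs) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (tryAcquire(myToken, System.currentTimeMillis())) {
                return true;
            }
            Thread.sleep(pollMs);
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        GlobalLockSketch lock = new GlobalLockSketch();
        String first = lock.enqueue(System.currentTimeMillis());
        String second = lock.enqueue(System.currentTimeMillis());
        System.out.println("first acquires: " + lock.acquire(first, 1000, 50));
        System.out.println("second acquires while first holds: " + lock.acquire(second, 200, 50));
        lock.release(first);
        System.out.println("second acquires after release: " + lock.acquire(second, 1000, 50));
    }
}
```

In this sketch the time is passed into `enqueue` and `tryAcquire` explicitly, which makes the stale-token rule easy to exercise without waiting five minutes.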

@JiriOndrusek
Contributor Author

Therefore only 1 token (and thus 1 participant) is allowed to run the tests; the others have to wait.

@JiriOndrusek
Contributor Author

@jamesnetherton I'll add my previous comment (with the explanation) as a code comment to the locking method.


@jamesnetherton jamesnetherton merged commit 5b74e44 into apache:main Apr 16, 2024
24 checks passed