Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ThreadGroup instance leaked when using Timeout rule #1517

Merged
merged 4 commits into from
Jun 20, 2018

Conversation

elrodro83
Copy link
Contributor

@elrodro83 elrodro83 commented Apr 25, 2018

This was seen with heavy impact because Spring uses the ThreadGroupContext to store internal information.

The thread group created for handling the timeout of tests is never destroy, so the main thread group keeps a reference to all timeout groups created during the tests. This causes the threadGroup to remain in memory, and all of its context along with it.

This change avoids that by destroying the ThreadGroup after the test.

However, in some cases (like if a thread from the group is hanged), the ThreadGroup destruction fails. In that case, the exception is ignored so the behavior is not changed.

@marcphilipp marcphilipp requested a review from kcooney April 28, 2018 14:47
@marcphilipp
Copy link
Member

@kcooney Can you please take a look? IIRC you're most familiar with this part of the codebase.

@marcphilipp marcphilipp added this to the 4.13 milestone Apr 28, 2018
fail("A 'FailOnTimeoutGroup' thread group remains referenced after the test execution.");
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The readability of the test can be improved by extracting a method subGroupsOfCurrentThread

@Test
public void threadGroupNotLeaked()
        throws Throwable {

    Collection<ThreadGroup> groupsBeforeSet = subGroupsOfCurrentThread();

    evaluateWithWaitDuration(0);

    for (ThreadGroup group: subGroupsOfCurrentThread()) {
        if(!groupsBeforeSet.contains(group) && "FailOnTimeoutGroup".equals(group.getName())) {
            fail("A 'FailOnTimeoutGroup' thread group remains referenced after the test execution.");
        }
    }
}

private Collection<ThreadGroup> subGroupsOfCurrentThread() {
    ThreadGroup[] subGroups = new ThreadGroup[256];
    int numGroups = currentThread().getThreadGroup().enumerate(subGroups);
    return asList(subGroups).subList(0, numGroups);
}

@stefanbirkner
Copy link
Contributor

Thanks for extracting the method subGroupsOfCurrentThread.

throw throwable;
}
} finally {
thread.join(100);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason why we can't pass 1 to thread.join()? At this point, the test either completed (in which case this should return right away) or the test timed out (in which case the thread will likely take more than 100ms to exit). Waiting extra time just makes the test run longer.

In fact, maybe we shouldn't call join at all. thread.join() can throw InterruptedException.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reduced the timeout as suggested.

join has to be called because otherwise the thread will still be active when calling threadGroup.destroy();, which will throw an IllegalThreadStateException and keep the group referenced, causing the leak.

Copy link
Member

@kcooney kcooney left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix!

@kcooney
Copy link
Member

kcooney commented Jun 3, 2018

@elrodro83 Note I applied a minor code change to your branch, so you'll need to pull it back to your local repo before making more changes.

* Handle thread interruption
@elrodro83
Copy link
Contributor Author

@kcooney pushed the requested change.

@kcooney kcooney merged commit 63fd277 into junit-team:master Jun 20, 2018
@kcooney
Copy link
Member

kcooney commented Jun 20, 2018

@elrodro83 Could you please update the release notes in the wiki to document this change? Thanks!

@elrodro83
Copy link
Contributor Author

@elrodro83
Copy link
Contributor Author

elrodro83 commented Jun 22, 2018

Thank you for the review and merge! @kcooney @stefanbirkner

@jglick
Copy link
Contributor

jglick commented Apr 3, 2020

@kcooney since this change caused regressions for tests of HBase and EclEmma and now also at least one Jenkins plugin, perhaps it should be either reverted, or made opt-in (e.g. Builder withCleanup())? Destroying the thread group means that if some executor pool happened to have been initialized during one test with a timeout, it will be broken henceforth. You can work around that by resetting the pool or setting the test runner to start a new JVM for each test, but the symptom is far from obvious; I had to spend some hours with git bisect to find this.

@kcooney
Copy link
Member

kcooney commented Apr 3, 2020

FYI: @jglick is referring to the issue being discussed in #1652

@boaks
Copy link

boaks commented Dec 29, 2020

My experience with the ThreadGroup.destroy() of this PR isn't positive as well.

In Eclipse Californium it causes unit tests to fail randomly .
The randomness is caused by a "race" of terminating threads and the call of ThreadGroup.destroy().
And there is also a not to obvious order condition between the Timeout rule and static ThreadGroups. If the group is instantiated outside the Timeout rule, it's preserved, otherwise destroyed.

In my opinion, if such a feature is required, please

  1. add a "opt-in" (or "opt-out").
  2. One simple idea maybe to not "destroy", if any of the sub-groups are "none-daemon-groups" (daemon groups are destroy with the last terminating thread, for these groups the destroy should work. For none-daemon-groups, it may cause trouble, if such thread-groups are destroyed, they maybe considered to be static.)
  3. prevent/reduce the race-condition with terminating threads. One idea maybe to use a ThreadGroup.interrupt() before joining the "Thread". Maybe it's required to first join all Threads of the groups and sub-groups.

As temporary workaround I try to ensure, that all "static" thread-groups are initialized before any Timeout rule.
But that's not too easy to see.

@kcooney
Copy link
Member

kcooney commented Jan 2, 2021

FYI: I'm proposing we undo this change and instead mark the thread group as a daemon group; see #1687

Copy link

@eshabaddie98 eshabaddie98 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gg

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants