Skip to content

Commit

Permalink
Use a threadlocal to hold the NodeIterator
Browse files Browse the repository at this point in the history
Ensures that an Evaluator query can be multi-threaded

Fixes #2088
  • Loading branch information
jhy committed Dec 22, 2023
1 parent 4fb1036 commit 374ded2
Show file tree
Hide file tree
Showing 4 changed files with 67 additions and 3 deletions.
3 changes: 3 additions & 0 deletions CHANGES.md
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,9 @@
* When generating XML-syntax output from parsed HTML, script nodes containing (pseudo) CData sections would have an
extraneous CData section added, causing script execution errors. Now, the data content is emitted in a HTML/XML/XHTML
polyglot format, if the data is not already within a CData section. [2078](https://github.com/jhy/jsoup/issues/2078)
* The `:has` evaluator held a non-thread-safe Iterator, and so if an Evaluator object was shared across multiple
concurrent threads, a NoSuchElement exception may be thrown, and the selected results may be incorrect. Now, the
iterator object is a thread-local. [2088](https://github.com/jhy/jsoup/issues/2088)

---
Older changes for versions 0.1.1 (2010-Jan-31) through 1.17.1 (2023-Nov-27) may be found in
Expand Down
8 changes: 6 additions & 2 deletions src/main/java/org/jsoup/select/StructuralEvaluator.java
Original file line number Diff line number Diff line change
Expand Up @@ -58,16 +58,20 @@ public boolean matches(Element root, Element element) {
}
}

static class Has extends StructuralEvaluator {
final NodeIterator<Element> it = new NodeIterator<>(new Element("html"), Element.class);
final ThreadLocal<NodeIterator<Element>> threadHasIter =
ThreadLocal.withInitial(() -> new NodeIterator<>(new Element("html"), Element.class));
// the element here is just a placeholder so this can be final - gets set in restart()

static class Has extends StructuralEvaluator {

public Has(Evaluator evaluator) {
super(evaluator);
}

@Override public boolean matches(Element root, Element element) {
// for :has, we only want to match children (or below), not the input element. And we want to minimize GCs
NodeIterator<Element> it = threadHasIter.get();

it.restart(element);
while (it.hasNext()) {
Element el = it.next();
Expand Down
58 changes: 58 additions & 0 deletions src/test/java/org/jsoup/select/SelectorIT.java
Original file line number Diff line number Diff line change
@@ -0,0 +1,58 @@
package org.jsoup.select;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.junit.jupiter.api.Test;

import java.util.concurrent.atomic.AtomicInteger;

import static org.junit.jupiter.api.Assertions.assertEquals;

public class SelectorIT {

@Test
public void multiThreadHas() throws InterruptedException {
final String html = "<div id=1></div><div id=2><p>One</p><p>Two</p>";
final Evaluator eval = QueryParser.parse("div:has(p)");

int numThreads = 20;
int numThreadLoops = 5;

SelectorIT.ThreadCatcher catcher = new SelectorIT.ThreadCatcher();

Thread[] threads = new Thread[numThreads];
for (int threadNum = 0; threadNum < numThreads; threadNum++) {
Thread thread = new Thread(() -> {
Document doc = Jsoup.parse(html);
for (int loop = 0; loop < numThreadLoops; loop++) {
Elements els = doc.select(eval);
assertEquals(1, els.size());
assertEquals("2", els.get(0).id());
}
});
thread.setName("Runner-" + threadNum);
thread.start();
thread.setUncaughtExceptionHandler(catcher);
threads[threadNum] = thread;
}

// now join them all
for (Thread thread : threads) {
thread.join();
}

assertEquals(0, catcher.exceptionCount.get());
}

static class ThreadCatcher implements Thread.UncaughtExceptionHandler {
AtomicInteger exceptionCount = new AtomicInteger();

@Override
public void uncaughtException(Thread t, Throwable e) {

e.printStackTrace();
exceptionCount.incrementAndGet();
}
}

}
1 change: 0 additions & 1 deletion src/test/java/org/jsoup/select/SelectorTest.java
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,6 @@

import org.jsoup.Jsoup;
import org.jsoup.MultiLocaleExtension.MultiLocaleTest;
import org.jsoup.TextUtil;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;
Expand Down

0 comments on commit 374ded2

Please sign in to comment.