Skip to content

Commit

Permalink
consumeSubQuery should not drop leading combinators
Browse files Browse the repository at this point in the history
Refactored so that it eats until a combinator is seen after non-combinator content, and returns it all.

And corrected unit tests that were incorrectly relying on that behavior.

Note that a leading combinator will combine against the root element, which is either the Document, or the context element.

Fixes #1707
  • Loading branch information
jhy committed Oct 30, 2023
1 parent 2a4a9cf commit d126488
Show file tree
Hide file tree
Showing 4 changed files with 26 additions and 6 deletions.
4 changes: 4 additions & 0 deletions CHANGES
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,10 @@ Release 1.17.1 [PENDING]
elements to be returned when used on elements other than the root document.
<https://github.com/jhy/jsoup/issues/2018>

* Bugfix: in a sub-query such as `p:has(> span, > i)`, combinators following the `,` Or combinator would be
incorrectly skipped, such that the sub-query was parsed as `i` instead of `> i`.
<https://github.com/jhy/jsoup/issues/1707>

Release 1.16.2 [20-Oct-2023]
* Improvement: optimized the performance of complex CSS selectors, by adding a cost-based query planner. Evaluators
are sorted by their relative execution cost, and executed in order of lower to higher cost. This speeds the
Expand Down
9 changes: 6 additions & 3 deletions src/main/java/org/jsoup/select/QueryParser.java
Original file line number Diff line number Diff line change
Expand Up @@ -145,18 +145,21 @@ private void combinator(char combinator) {

private String consumeSubQuery() {
StringBuilder sq = StringUtil.borrowBuilder();
boolean seenNonCombinator = false; // eat until we hit a combinator after eating something else
while (!tq.isEmpty()) {
if (tq.matches("("))
sq.append("(").append(tq.chompBalanced('(', ')')).append(")");
else if (tq.matches("["))
sq.append("[").append(tq.chompBalanced('[', ']')).append("]");
else if (tq.matchesAny(Combinators))
if (sq.length() > 0)
if (seenNonCombinator)
break;
else
tq.consume();
else
sq.append(tq.consume());
else {
seenNonCombinator = true;
sq.append(tq.consume());
}
}
return StringUtil.releaseBuilder(sq);
}
Expand Down
6 changes: 3 additions & 3 deletions src/test/java/org/jsoup/select/QueryParserTest.java
Original file line number Diff line number Diff line change
Expand Up @@ -18,10 +18,10 @@ public class QueryParserTest {
"<a><li><strong>l2</strong></li></a>" +
"<p><strong>yes</strong></p>" +
"</body></html>");
assertEquals("l1 l2 yes", doc.body().select(">p>strong,>*>li>strong").text());
assertEquals("l1 yes", doc.body().select(">p>strong,>li>strong").text()); // selecting immediate from body
assertEquals("l2 yes", doc.select("body>p>strong,body>*>li>strong").text());
assertEquals("l2 yes", doc.select("body>*>li>strong,body>p>strong").text());
assertEquals("l2 yes", doc.select("body>p>strong,body>*>li>strong").text());
assertEquals("yes", doc.select(">body>*>li>strong,>body>p>strong").text());
assertEquals("l2", doc.select(">body>p>strong,>body>*>li>strong").text());
}

@Test public void testImmediateParentRun() {
Expand Down
13 changes: 13 additions & 0 deletions src/test/java/org/jsoup/select/SelectorTest.java
Original file line number Diff line number Diff line change
Expand Up @@ -1206,4 +1206,17 @@ public void wildcardNamespaceMatchesNoNamespace() {
Elements innerLisFromParent = li2.select("ul li");
assertEquals(innerLis, innerLisFromParent);
}

@Test public void rootImmediateParentSubquery() {
// a combinator at the start of the query is applied to the Root selector. i.e. "> p" matches a P immediately parented
// by the Root (which is <html> for a top level query, or the context element in :has)
// in the sub query, the combinator was dropped incorrectly
String html = "<p id=0><span>A</p> <p id=1><b><i><span>B</p> <p id=2><i>C</p>\n";
Document doc = Jsoup.parse(html);

Elements els = doc.select("p:has(> span, > i)"); // should match a p with an immediate span or i
assertEquals(2, els.size());
assertEquals("0", els.get(0).id());
assertEquals("2", els.get(1).id());
}
}

0 comments on commit d126488

Please sign in to comment.