Multiple selectors for direct descendants catches indirect descendants as well #1707

Closed
Odepax opened this issue Jan 18, 2022 · 3 comments

@Odepax

Odepax commented Jan 18, 2022

Using org.jsoup:jsoup:1.14.3, a grouped selector such as .select("> .direct > .foo, > .direct > .bar") also selects elements matching .direct > .bar at any depth, as if the leading > of the second selector were dropped.

As a workaround, .selectFirst("> .direct")!!.select("> .foo, > .bar") seems to work fine.

package bug

import org.intellij.lang.annotations.*
import org.jsoup.*
import org.junit.*
import org.junit.Assert.*

class JsoupLearningTests {
   @Test
   fun direct_descendant_bug_1() { // Fails.
      @Language("HTML")
      val html = """
         <!DOCTYPE html>
         <html lang="en">
            <head>
               <meta charset="utf-8"/>
            </head>
            <body>
               <div class="entry">
                  <div class="entry__header">
                     <div class="interesting-container">
                        <span class="interesting-item">Y</span>
                        <span class="also-interesting-item">Y</span>
                     </div>
                  </div>
                  <div class="entry__body">
                     <p> ... </p>
                     <p> ... </p>
                     <div class="sub-entry entry">
                        <div class="entry__header">
                           <div class="interesting-container">
                              <span class="interesting-item">N</span>
                              <span class="also-interesting-item">N</span>
                           </div>
                        </div>
                        <div class="entry__body">
                           <p> ... </p>
                           <p> ... </p>
                        </div>
                     </div>
                  </div>
               </div>
            </body>
         </html>
      """

      val document = Jsoup.parse(html)
      val entry = document.selectFirst(".entry")!!
      val interestingItems = entry.select("> .entry__header > .interesting-container > .interesting-item, > .entry__header > .interesting-container > .also-interesting-item")
      val actual = interestingItems.joinToString("") { it.text() }

      assertEquals("YY", actual)
   }

   @Test
   fun direct_descendant_bug_2() { // Passes.
      @Language("HTML")
      val html = """
         <!DOCTYPE html>
         <html lang="en">
            <head>
               <meta charset="utf-8"/>
            </head>
            <body>
               <div class="entry">
                  <div class="entry__header">
                     <div class="interesting-container">
                        <span class="interesting-item">Y</span>
                        <span class="interesting-item">Y</span>
                     </div>
                  </div>
                  <div class="entry__body">
                     <p> ... </p>
                     <p> ... </p>
                     <div class="sub-entry entry">
                        <div class="entry__header">
                           <div class="interesting-container">
                              <span class="interesting-item">N</span>
                              <span class="interesting-item">N</span>
                           </div>
                        </div>
                        <div class="entry__body">
                           <p> ... </p>
                           <p> ... </p>
                        </div>
                     </div>
                  </div>
               </div>
            </body>
         </html>
      """

      val document = Jsoup.parse(html)
      val entry = document.selectFirst(".entry")!!
      val interestingItems = entry.select("> .entry__header > .interesting-container > .interesting-item")
      val actual = interestingItems.joinToString("") { it.text() }

      assertEquals("YY", actual)
   }

   @Test
   fun direct_descendant_bug_3() { // Passes.
      @Language("HTML")
      val html = """
         <!DOCTYPE html>
         <html lang="en">
            <head>
               <meta charset="utf-8"/>
            </head>
            <body>
               <div class="entry">
                  <div class="entry__header">
                     <div class="interesting-container">
                        <span class="interesting-item">Y</span>
                        <span class="also-interesting-item">Y</span>
                     </div>
                  </div>
                  <div class="entry__body">
                     <p> ... </p>
                     <p> ... </p>
                     <div class="sub-entry entry">
                        <div class="entry__header">
                           <div class="interesting-container">
                              <span class="also-interesting-item">N</span>
                           </div>
                        </div>
                        <div class="entry__body">
                           <p> ... </p>
                           <p> ... </p>
                        </div>
                     </div>
                  </div>
               </div>
            </body>
         </html>
      """

      val document = Jsoup.parse(html)
      val entry = document.selectFirst(".entry")!!
      val interestingItems = entry.select("> .entry__header > .interesting-container > .also-interesting-item, > .entry__header > .interesting-container > .interesting-item")
      val actual = interestingItems.joinToString("") { it.text() }

      assertEquals("YY", actual)
   }

   @Test
   fun direct_descendant_bug_4() { // Fails.
      @Language("HTML")
      val html = """
         <!DOCTYPE html>
         <html lang="en">
            <head>
               <meta charset="utf-8"/>
            </head>
            <body>
               <div class="entry">
                  <div class="entry__header">
                     <div class="interesting-container">
                        <span class="interesting-item">Y</span>
                        <span class="also-interesting-item">Y</span>
                     </div>
                  </div>
                  <div class="entry__body">
                     <p> ... </p>
                     <p> ... </p>
                     <div class="sub-entry entry">
                        <div class="entry__header">
                           <div class="interesting-container">
                              <span class="also-interesting-item">N</span>
                           </div>
                        </div>
                        <div class="entry__body">
                           <p> ... </p>
                           <p> ... </p>
                        </div>
                     </div>
                  </div>
               </div>
            </body>
         </html>
      """

      val document = Jsoup.parse(html)
      val entry = document.selectFirst(".entry")!!
      val interestingItems = entry.select("> .entry__header > .interesting-container > .interesting-item, > .entry__header > .interesting-container > .also-interesting-item")
      val actual = interestingItems.joinToString("") { it.text() }

      assertEquals("YY", actual)
   }
}

Not sure if it's a bug or a feature: by comparison, JS's .querySelectorAll("> .direct") throws an invalid-selector error.

@QAQGaeBolg

My group and I will be fixing this issue this semester.

@JeffXiesk

JeffXiesk commented Apr 20, 2022

Hi, I may have found the problem.
When there are multiple subqueries, the method consumeSubQuery ignores the leading '>' of the next subquery. The query is therefore evaluated as .select("> .direct > .foo") and .select(".direct > .bar") instead of the intended .select("> .direct > .foo") and .select("> .direct > .bar").
My fix is to check whether what follows is a subquery and, if so, add the '>' back to the query.
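
To make the description above concrete, here is a toy model in plain Kotlin. It is not jsoup's parser: the function names `splitGroupDroppingCombinator` and `splitGroupKeepingCombinator` are made up for this illustration, and the model only mimics the reported effect (the leading '>' of every subquery after the first going missing) on a comma-separated selector group.

```kotlin
// Toy model only: mimics the reported symptom, not jsoup's actual parsing code.

// Reported behaviour: every subquery after the first loses its leading '>'.
fun splitGroupDroppingCombinator(selector: String): List<String> =
    selector.split(",").mapIndexed { index, part ->
        val trimmed = part.trim()
        if (index == 0) trimmed else trimmed.removePrefix(">").trim()
    }

// Intended behaviour: each subquery keeps its own leading combinator.
fun splitGroupKeepingCombinator(selector: String): List<String> =
    selector.split(",").map { it.trim() }

fun main() {
    val group = "> .direct > .foo, > .direct > .bar"
    println(splitGroupDroppingCombinator(group)) // second entry has no leading '>'
    println(splitGroupKeepingCombinator(group))  // both entries start with '>'
}
```

With the leading '>' gone, the second subquery is no longer anchored to the context element, which would explain why nested matches show up in the failing tests.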

JeffXiesk added a commit to JeffXiesk/jsoup that referenced this issue Apr 23, 2022
JeffXiesk added a commit to JeffXiesk/jsoup that referenced this issue Apr 23, 2022
JeffXiesk added a commit to JeffXiesk/jsoup that referenced this issue May 28, 2022
@jhy jhy closed this as completed in d126488 Oct 30, 2023
@jhy jhy added bug Confirmed bug that we should fix fixed labels Oct 30, 2023
@jhy jhy self-assigned this Oct 30, 2023
@jhy jhy added this to the 1.17.1 milestone Oct 30, 2023
@jhy
Owner

jhy commented Oct 30, 2023

Thanks, fixed!

Not sure if it's a bug or a feature: by comparison, JS's .querySelectorAll("> .direct") throws an invalid-selector error.

In jsoup, if the query starts with a combinator, we combine it against the root element. The root element is the Document or the context element.
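
A small sketch of that rule with a context element, assuming jsoup is on the classpath. The HTML snippet and the helper name `directAndAllNotes` are invented for this example. It uses a single selector rather than a group, so it should not depend on the fix discussed in this issue.

```kotlin
import org.jsoup.Jsoup

// Returns (texts matched by "> .note", texts matched by ".note"),
// both evaluated against the first .entry element as the context element.
fun directAndAllNotes(html: String): Pair<List<String>, List<String>> {
    val outer = Jsoup.parse(html).selectFirst(".entry")!!

    // The leading '>' is combined against the context element `outer`,
    // so only its direct children are candidates.
    val direct = outer.select("> .note").eachText()

    // Without a leading combinator, every descendant of `outer` is a candidate.
    val all = outer.select(".note").eachText()

    return direct to all
}

fun main() {
    val html = """
        <div class="entry">
           <p class="note">outer</p>
           <div class="entry">
              <p class="note">inner</p>
           </div>
        </div>
    """
    val (direct, all) = directAndAllNotes(html)
    println(direct)
    println(all)
}
```

Here the direct-child query should return only the "outer" paragraph, while the plain descendant query also picks up the nested "inner" one.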

jhy added a commit to chibenwa/jsoup that referenced this issue Oct 30, 2023
Refactored so that it eats until a combinator is seen after non-combinator content, and returns it all.

And corrected unit tests that were incorrectly relying on that behavior.

Note that a leading combinator will combine against the root element, which is either the Document, or the context element.

Fixes jhy#1707
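
The consumption rule in the commit message ("eats until a combinator is seen after non-combinator content, and returns it all") can be sketched roughly as follows. This is a simplified stand-in, not the code from the commit: the name `consumeSubQuerySketch`, the `combinators` set, and the index-based return value are assumptions made for the example, and bracketed or parenthesised parts of a selector are not handled.

```kotlin
// Assumed combinator set for this sketch; a space stands for the descendant combinator.
val combinators = setOf(',', '>', '+', '~', ' ')

// Consumes characters from `start` until a combinator appears after some
// non-combinator content. Leading combinators are kept in the result.
// Returns the consumed text and the index where consumption stopped.
fun consumeSubQuerySketch(query: String, start: Int = 0): Pair<String, Int> {
    val consumed = StringBuilder()
    var seenNonCombinator = false
    var i = start
    while (i < query.length) {
        val c = query[i]
        if (c in combinators) {
            // A combinator after real content ends this subquery.
            if (seenNonCombinator) break
        } else {
            seenNonCombinator = true
        }
        consumed.append(c)
        i++
    }
    return consumed.toString() to i
}

fun main() {
    val (text, stoppedAt) = consumeSubQuerySketch("> .direct > .bar")
    println("'$text' stopped at index $stoppedAt")
}
```

The point of the sketch is that a leading '>' is returned as part of the subquery instead of ending it immediately, so a caller can still combine it against the root element.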