Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

further improvement of pagination query #57694

Closed
mzhang77 opened this issue Nov 26, 2024 · 3 comments · Fixed by #58029
Closed

further improvement of pagination query #57694

mzhang77 opened this issue Nov 26, 2024 · 3 comments · Fixed by #58029
Assignees
Labels
sig/planner SIG: Planner type/enhancement The issue or PR belongs to an enhancement.

Comments

@mzhang77
Copy link

customer has a query similar to #56005, but more complicated. In #56005, the query is:

explain SELECT
  dt.*
FROM
  it 
  LEFT JOIN dt ON it.pk = dt.pk
WHERE
  it.a = "a"
  AND (
    (
      it.a > "a"
    )
    OR (
      it.a = "a" AND it.pk > 1
    )
  )
ORDER BY
  it.pk
LIMIT
  240;

Now the query is

explain SELECT
  dt.*
FROM
  it 
  LEFT JOIN dt ON it.pk = dt.pk
WHERE
  it.a in ("a","b")
  AND (
    (
      it.a > "b"
    )
    OR (
      it.a = "b" AND it.pk > 1
    )
  )
ORDER BY
  it.pk
LIMIT
  240;

The optimizer should see that there is no need to scan index for it.a = "a".

@mzhang77 mzhang77 added the type/enhancement The issue or PR belongs to an enhancement. label Nov 26, 2024
@fixdb fixdb added the sig/planner SIG: Planner label Nov 26, 2024
@larat7
Copy link

larat7 commented Nov 26, 2024

it's a bit trickier than that, the optimizer should see that it can pushdown all three where clauses down to the tikv layer and scan fewer rows. for example, the query below is also problematic

explain SELECT
  dt.*
FROM
  it 
  LEFT JOIN dt ON it.pk = dt.pk
WHERE
  it.a in ("b", "c")
  AND (
    (
      it.a > "b"
    )
    OR (
      it.a = "b" AND it.pk > 1
    )
  )
ORDER BY
  it.pk
LIMIT
  240;

@ghazalfamilyusa
Copy link
Contributor

I simplified the problem to the test below. The problem is that the optimizer is picking point ranges on b1 in ( 44, 70, 76) and not intersecting it with the (t1.b1 > 70 OR (t1.b1 = 70 AND t1.a1 > 41)). The optimal result should be the index range (b1=76) OR (b1=70 AND a1>41). This should be covered by the enhancement made by #54166. However, the test shows it is not working and should be fixed as a follow up of 54166.

drop table if exists t1

CREATE TABLE t1 (
  a1 int,
  b1 int,
  c1 varbinary(767) DEFAULT NULL,
  KEY twoColIndex (b1,a1)
)

set @@tidb_opt_fix_control = "54337:ON"

EXPLAIN  format = brief 
SELECT 1
     FROM t1 FORCE INDEX (twoColIndex)
     WHERE 
       t1.b1 IN ( 44, 70, 76)
       AND (t1.b1 > 70 OR (t1.b1 = 70 AND t1.a1 > 41))

id      estRows task    access object   operator info
Projection      10.01   root            1->Column#5
└─IndexReader   10.01   root            index:Selection
  └─Selection   10.01   cop[tikv]               or(gt(test.t1.b1, 70), and(eq(test.t1.b1, 70), gt(test.t1.a1, 41)))
    └─IndexRangeScan    30.00   cop[tikv]       table:t1, index:twoColIndex(b1, a1)     range:[44,44], [70,70], [76,76], keep order:false, stats:pseudo
--------------

@ghazalfamilyusa
Copy link
Contributor

I have a fix that extends the scope of #54166
The result for the above test case get the most selective scan and no filter applied. @mzhang77 and @larat7 : let me know if that covers the use cases you have? To be on the safe side, please add an end to end use case including DDL.

drop table if exists t1
CREATE TABLE t1 (
  a1 int,
  b1 int,
  c1 varbinary(767) DEFAULT NULL,
  KEY twoColIndex (a1,b1)
)
set @@tidb_opt_fix_control = "54337:ON"

EXPLAIN  format = brief 
SELECT 1
     FROM t1 FORCE INDEX (twoColIndex)
     WHERE 
       t1.a1 IN ( 44, 70, 76)
       AND (t1.a1 > 70 OR (t1.a1 = 70 AND t1.b1 > 41))

id      estRows task    access object   operator info
Projection      43.33   root            1->Column#5
└─IndexReader   54.17   root            index:IndexRangeScan
  └─IndexRangeScan      54.17   cop[tikv]       table:t1, index:twoColIndex(a1, b1)     range:(70 41,70 +inf], [76,76], keep order:false, stats:pseudo

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sig/planner SIG: Planner type/enhancement The issue or PR belongs to an enhancement.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants