Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

planner: support vector index in planner #56204

Merged
merged 25 commits into from
Oct 10, 2024

Conversation

winoros
Copy link
Member

@winoros winoros commented Sep 21, 2024

What problem does this PR solve?

Issue Number: ref #54245, close #56510

Problem Summary:

What changed and how does it work?

Support to choose vector index by CBO and USE INDEX hint.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

@ti-chi-bot ti-chi-bot bot added release-note-none Denotes a PR that doesn't merit a release note. sig/planner SIG: Planner size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Sep 21, 2024
@winoros winoros force-pushed the support-planner-cbo branch from a475b88 to 2bdb3b6 Compare September 21, 2024 16:31
@ti-chi-bot ti-chi-bot bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Sep 21, 2024
Copy link

codecov bot commented Sep 21, 2024

Codecov Report

Attention: Patch coverage is 77.88462% with 46 lines in your changes missing coverage. Please review.

Project coverage is 76.0249%. Comparing base (5e8bb8e) to head (8b9840f).
Report is 1 commits behind head on master.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #56204        +/-   ##
================================================
+ Coverage   73.3611%   76.0249%   +2.6637%     
================================================
  Files          1626       1649        +23     
  Lines        449213     457362      +8149     
================================================
+ Hits         329548     347709     +18161     
+ Misses        99482      87974     -11508     
- Partials      20183      21679      +1496     
Flag Coverage Δ
integration 51.7887% <15.3846%> (?)
unit 72.4581% <77.8846%> (-0.0068%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
dumpling 52.9478% <ø> (ø)
parser ∅ <ø> (∅)
br 63.1288% <ø> (+17.5649%) ⬆️

@winoros winoros force-pushed the support-planner-cbo branch from 2bdb3b6 to 35bdfee Compare September 22, 2024 13:33
@ti-chi-bot ti-chi-bot bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Sep 22, 2024
@ti-chi-bot ti-chi-bot bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Sep 26, 2024
@zimulala zimulala requested a review from AilinKid September 27, 2024 06:25
Copy link
Contributor

@AilinKid AilinKid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

other parts lgtm

TiFlashMem: costusage.CostVer2Factor{Name: "tiflash_mem_factor", Value: 0.05},
TiDBDisk: costusage.CostVer2Factor{Name: "tidb_disk_factor", Value: 200.00},
TiDBRequest: costusage.CostVer2Factor{Name: "tidb_request_factor", Value: 6000000.00},
ANNIndexStart: costusage.CostVer2Factor{Name: "ann_index_start_factor", Value: 0.000144},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how does this value be decided?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to benchmark of comparing:

  • TiFlash Ann Top K
  • TiFlash Read K rows

}
}
buffer.WriteString(", limit:")
buffer.WriteString(fmt.Sprint(p.AnnIndexExtra.TopK))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

strconv.FormatUint()

"Warning 1105 No predicate column has been collected yet for table test.t, so only indexes and the columns composing the indexes will be analyzed",
"Warning 1105 The version 2 would collect all statistics not only the selected indexes",
"Warning 1105 analyzing vector index is not supported, skip idx",
"Warning 1105 analyzing vector index is not supported, skip idx2"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

analyze table t index idx
we specify idx only, while idx2 is checked and throw warnings, it that normal?

// ExtractVectorSearch extracts a VectorSearchExpr from an expression.
// NOTE: not all VectorSearch functions are supported by the index. The caller
// needs to check the distance function name.
func ExtractVectorSearch(expr Expression) (*VectorSearchExpr, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not let VectorSearchExpr be an another expression Expression implementor

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then we could do this in getFunction verify args

and the handling in pkg/planner/core/task.go xx31 will be more clean

return false
}
ts.AnnIndexExtra.RefVecF32 = vs.Vec.SerializeTo(nil)
ts.AnnIndexExtra.TopK = uint32(p.Count)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better add some comments for this function

we could only see: that it checks some elements in topN's by item, then fills some fields in ts. what is it used for? another projection injection logic is not seen here?

@JaySon-Huang

This comment was marked as resolved.

pkg/planner/core/find_best_task.go Outdated Show resolved Hide resolved
pkg/planner/core/find_best_task.go Show resolved Hide resolved
@breezewish breezewish force-pushed the support-planner-cbo branch from f5f94db to afa9ff7 Compare October 8, 2024 07:35
@breezewish breezewish changed the base branch from feature/vector-index to master October 8, 2024 07:35
@ti-chi-bot ti-chi-bot bot added the component/dumpling This is related to Dumpling of TiDB. label Oct 8, 2024
@winoros winoros force-pushed the support-planner-cbo branch from b62ad3d to add4469 Compare October 8, 2024 08:33
@winoros winoros removed the component/dumpling This is related to Dumpling of TiDB. label Oct 8, 2024
breezewish and others added 3 commits October 8, 2024 16:46
Signed-off-by: Wish <breezewish@outlook.com>
tidbcloud/tidb-cse#940
tidbcloud/tidb-cse#920

Signed-off-by: Wish <breezewish@outlook.com>
@winoros winoros force-pushed the support-planner-cbo branch from 2d015d8 to bf3a8e7 Compare October 8, 2024 09:51
…s/tidb into support-planner-cbo

* 'support-planner-cbo' of ssh://ssh.github.com:443/winoros/tidb:
  address comments
Signed-off-by: Wish <breezewish@outlook.com>
Signed-off-by: Wish <breezewish@outlook.com>
go.mod Outdated
@@ -307,7 +307,7 @@ require (
google.golang.org/genproto/googleapis/rpc v0.0.0-20240515191416-fc5f0ca64291 // indirect
google.golang.org/protobuf v1.34.2
gopkg.in/inf.v0 v0.9.1 // indirect
gopkg.in/natefinch/lumberjack.v2 v2.2.1 // indirect
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to revert it.

Copy link

tiprow bot commented Oct 9, 2024

@breezewish: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

  • /test fast_test_tiprow
  • /test fast_test_tiprow_ddlargsv1
  • /test tidb_parser_test

Use /test all to run the following jobs that were automatically triggered:

  • fast_test_tiprow
  • tidb_parser_test

In response to this:

/test mysql-test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@breezewish
Copy link
Member

/test pull-br-integration-test

Copy link

tiprow bot commented Oct 9, 2024

@breezewish: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

  • /test fast_test_tiprow
  • /test fast_test_tiprow_ddlargsv1
  • /test tidb_parser_test

Use /test all to run the following jobs that were automatically triggered:

  • fast_test_tiprow
  • tidb_parser_test

In response to this:

/test pull-br-integration-test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ti-chi-bot ti-chi-bot bot added the lgtm label Oct 10, 2024
@ti-chi-bot ti-chi-bot bot removed the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Oct 10, 2024
Copy link

ti-chi-bot bot commented Oct 10, 2024

[LGTM Timeline notifier]

Timeline:

  • 2024-10-09 08:26:15.555359972 +0000 UTC m=+1034530.975572984: ☑️ agreed by AilinKid.
  • 2024-10-10 02:39:36.975351743 +0000 UTC m=+1100132.395564769: ☑️ agreed by XuHuaiyu.

Copy link
Contributor

@tangenta tangenta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm for ddl part

Copy link

ti-chi-bot bot commented Oct 10, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: AilinKid, tangenta, XuHuaiyu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added the approved label Oct 10, 2024
@breezewish
Copy link
Member

/test unit-test

Copy link

tiprow bot commented Oct 10, 2024

@breezewish: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

  • /test fast_test_tiprow
  • /test fast_test_tiprow_ddlargsv1
  • /test tidb_parser_test

Use /test all to run the following jobs that were automatically triggered:

  • fast_test_tiprow
  • tidb_parser_test

In response to this:

/test unit-test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

if len(lt.ByItems) != 1 {
return ret
}
vs := expression.ExtractVectorHelper(lt.ByItems[0].Expr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about multiple order by items.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is no longer a vector search.

@zimulala zimulala added the needs-cherry-pick-release-8.4 Should cherry pick this PR to release-8.4 branch. label Oct 10, 2024
@winoros winoros changed the title planner: support vector index by CBO planner: support vector index in planner Oct 10, 2024
@winoros
Copy link
Member Author

winoros commented Oct 10, 2024

/retest

@ti-chi-bot ti-chi-bot bot merged commit 7aefb08 into pingcap:master Oct 10, 2024
25 checks passed
@ti-chi-bot
Copy link
Member

In response to a cherrypick label: new pull request created to branch release-8.4: #56546.

@winoros winoros deleted the support-planner-cbo branch October 10, 2024 14:35
ti-chi-bot bot pushed a commit that referenced this pull request Oct 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved lgtm needs-cherry-pick-release-8.4 Should cherry pick this PR to release-8.4 branch. release-note-none Denotes a PR that doesn't merit a release note. sig/planner SIG: Planner size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support to choose vector index by CBO and USE INDEX hint
8 participants