Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds support for ordering on character fields for sharded keyspace queries #7678

Merged
merged 20 commits into from
Mar 23, 2021

Conversation

GuptaManan100
Copy link
Member

@GuptaManan100 GuptaManan100 commented Mar 14, 2021

Description

This PR adds support for ordering on character fields for sharded keyspace queries. We now add weighted_strings for all the columns that we do not know the types for in order by clause and use it only in the case for VARCHAR, TEXT and such data types. After this change adding weight_strings will become the default and will support ordering for any columns.

For now, the planner is unable to figure out that MySQL functions like MAX, MIN etc will return integral data if the input column in integral which means that it will add weight_strings for them i.e.

# max aggregate will add weight_string
"select max(col) k, a from user group by a order by k"
{
  "QueryType": "SELECT",
  "Original": "select max(col) k, a from user group by a order by k",
  "Instructions": {
    "OperatorType": "Sort",
    "Variant": "Memory",
    "OrderBy": "0 ASC",
    "Inputs": [
      {
        "OperatorType": "Aggregate",
        "Variant": "Ordered",
        "Aggregates": "max(0)",
        "Distinct": "false",
        "GroupBy": "1",
        "Inputs": [
          {
            "OperatorType": "Route",
            "Variant": "SelectScatter",
            "Keyspace": {
              "Name": "user",
              "Sharded": true
            },
            "FieldQuery": "select max(col) as k, a, weight_string(max(col)), weight_string(a) from `user` where 1 != 1 group by a",
            "OrderBy": "1 ASC",
            "Query": "select max(col) as k, a, weight_string(max(col)), weight_string(a) from `user` group by a order by a asc",
            "Table": "`user`"
          }
        ]
      }
    ]
  }
}

Another shortcoming is that while ordering for all columns is supported, GROUP BY will not work for varchar columns since for now orderedAggregate still uses the original column for merging. i.e

# scatter group by a text column, reuse existing weight_string
"select count(*) k, a, textcol1, b from user group by a, textcol1, b order by k, textcol1"
{
  "QueryType": "SELECT",
  "Original": "select count(*) k, a, textcol1, b from user group by a, textcol1, b order by k, textcol1",
  "Instructions": {
    "OperatorType": "Sort",
    "Variant": "Memory",
    "OrderBy": "0 ASC, 2 ASC",
    "Inputs": [
      {
        "OperatorType": "Aggregate",
        "Variant": "Ordered",
        "Aggregates": "count(0)",
        "Distinct": "false",
        "GroupBy": "1, 4, 3",
        "Inputs": [
          {
            "OperatorType": "Route",
            "Variant": "SelectScatter",
            "Keyspace": {
              "Name": "user",
              "Sharded": true
            },
            "FieldQuery": "select count(*) as k, a, textcol1, b, weight_string(textcol1), weight_string(a), weight_string(b) from `user` where 1 != 1 group by a, textcol1, b",
            "OrderBy": "2 ASC, 1 ASC, 3 ASC",
            "Query": "select count(*) as k, a, textcol1, b, weight_string(textcol1), weight_string(a), weight_string(b) from `user` group by a, textcol1, b order by textcol1 asc, a asc, b asc",
            "Table": "`user`"
          }
        ]
      }
    ]
  }
}

Notice that in the plan output the ordered aggregate is using 1,4 and 3 columns instead of 5,4 and 6. To support this the orderedAggregate will also have to store both the column and the weight_string column. The work on this has been postponed for later.

Related Issue(s)

Checklist

Deployment Notes

Impacted Areas in Vitess

Components that this PR will affect:

  • Query Serving
  • VReplication
  • Cluster Management
  • Build/CI
  • VTAdmin

…ype is known

Signed-off-by: GuptaManan100 <manan@planetscale.com>
Signed-off-by: GuptaManan100 <manan@planetscale.com>
Signed-off-by: GuptaManan100 <manan@planetscale.com>
@GuptaManan100 GuptaManan100 self-assigned this Mar 14, 2021
@GuptaManan100 GuptaManan100 added Component: Query Serving Type: Enhancement Logical improvement (somewhere between a bug and feature) labels Mar 14, 2021
Signed-off-by: GuptaManan100 <manan@planetscale.com>
@GuptaManan100 GuptaManan100 marked this pull request as draft March 15, 2021 12:03
Signed-off-by: GuptaManan100 <manan@planetscale.com>
Signed-off-by: GuptaManan100 <manan@planetscale.com>
Signed-off-by: GuptaManan100 <manan@planetscale.com>
Signed-off-by: GuptaManan100 <manan@planetscale.com>
Signed-off-by: GuptaManan100 <manan@planetscale.com>
Signed-off-by: GuptaManan100 <manan@planetscale.com>
Signed-off-by: GuptaManan100 <manan@planetscale.com>
Signed-off-by: GuptaManan100 <manan@planetscale.com>
Signed-off-by: Andres Taylor <andres@planetscale.com>
Signed-off-by: GuptaManan100 <manan@planetscale.com>
Signed-off-by: GuptaManan100 <manan@planetscale.com>
Signed-off-by: GuptaManan100 <manan@planetscale.com>
@GuptaManan100 GuptaManan100 requested a review from systay March 22, 2021 11:53
…string

Signed-off-by: GuptaManan100 <manan@planetscale.com>
Signed-off-by: GuptaManan100 <manan@planetscale.com>
Signed-off-by: GuptaManan100 <manan@planetscale.com>
Signed-off-by: Andres Taylor <andres@planetscale.com>
@systay systay merged commit 24732d9 into vitessio:master Mar 23, 2021
@systay systay deleted the order-by branch March 23, 2021 13:42
@deepthi
Copy link
Member

deepthi commented Mar 23, 2021

Nice work! Does this close #7636?

@askdba askdba added this to the v10.0 milestone Mar 26, 2021
systay pushed a commit to systay/vitess that referenced this pull request Sep 22, 2021
This is a backport of vitessio#8856

The regression in the linked issue was found to be occurring from adding weight_string function due to order by as introduced in vitessio#7678. MySQL however does not support the generated query -

```
select ascii(val1) as a, count(*), weight_string(ascii(val1)) from aggr_test group by a order by a asc
```

In order to fix this, we should also add the weight_string function to the group by clause as follows -

```
select ascii(val1) as a, count(*), weight_string(ascii(val1)) from aggr_test group by a, weight_string(ascii(val1)) order by a asc
```

Fixes vitessio#8855

Signed-off-by: Manan Gupta <manan@planetscale.com>
Signed-off-by: Andres Taylor <andres@planetscale.com>
systay added a commit to planetscale/vitess that referenced this pull request Sep 22, 2021
The regression in the linked issue was found to be occurring from adding weight_string function due to order by as introduced in vitessio#7678.
MySQL does not support the generated query -

```sql
select ascii(val1) as a, count(*), weight_string(ascii(val1)) from aggr_test group by a order by a asc
```

In order to fix this, we also add the weight_string function to the group by clause as follows -

```
select ascii(val1) as a, count(*), weight_string(ascii(val1)) from aggr_test group by a, weight_string(ascii(val1)) order by a asc
```

Co-authored-by: Manan Gupta <manan@planetscale.com>

Signed-off-by: Andres Taylor <andres@planetscale.com>
systay added a commit to planetscale/vitess that referenced this pull request Sep 22, 2021
The regression in the linked issue was found to be occurring from adding weight_string function due to order by as introduced in vitessio#7678.
MySQL does not support the generated query -

```sql
select ascii(val1) as a, count(*), weight_string(ascii(val1)) from aggr_test group by a order by a asc
```

In order to fix this, we also add the weight_string function to the group by clause as follows -

```
select ascii(val1) as a, count(*), weight_string(ascii(val1)) from aggr_test group by a, weight_string(ascii(val1)) order by a asc
```

Co-authored-by: Manan Gupta <manan@planetscale.com>

Signed-off-by: Andres Taylor <andres@planetscale.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: Query Serving Type: Enhancement Logical improvement (somewhere between a bug and feature)
Projects
None yet
4 participants