[Feature] Z-Order Implement #7149

Merged 4 commits on Dec 2, 2021

@xinghuayu007 (Contributor) commented Nov 18, 2021

Proposed changes

For issue: #6359

Background

image

image
image

Z-Order

image
image

Application Situation

image

Grammar

CREATE TABLE table2 (
siteid int(11) NULL DEFAULT "10" COMMENT "",
citycode int(11) NULL COMMENT "",
username varchar(32) NULL DEFAULT "" COMMENT "",
pv bigint(20) NULL DEFAULT "0" COMMENT ""
) ENGINE=OLAP
DUPLICATE KEY(siteid, citycode)
COMMENT "OLAP"
DISTRIBUTED BY HASH(siteid) BUCKETS 1
PROPERTIES (
"replication_allocation" = "tag.location.default: 1",
"data_sort.sort_type" = "ZORDER",
"data_sort.col_num" = "2",
"in_memory" = "false",
"storage_format" = "V2"
);

data_sort.sort_type: the sort type, either LEXICAL or ZORDER; the default is LEXICAL.
data_sort.col_num: the number of leading columns used as the sort key.
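
As a rough illustration of what a ZORDER sort key does, the sketch below interleaves the bits of two integer columns into a single Morton code and sorts rows by that code. The names `ZOrderSketch`, `spread`, `interleave` and `sortByZOrder` are hypothetical, and this is not the comparator added in this PR. It assumes two sort columns (as with "data_sort.col_num" = "2") holding non-negative values, so sign handling is left out.

```java
import java.util.Arrays;

// Hypothetical sketch, not Doris code: Z-order (Morton) key for two int columns.
final class ZOrderSketch {
    // Spread the 32 bits of v so that bit i moves to bit 2*i (even positions).
    static long spread(int v) {
        long r = v & 0xFFFFFFFFL;
        r = (r | (r << 16)) & 0x0000FFFF0000FFFFL;
        r = (r | (r << 8)) & 0x00FF00FF00FF00FFL;
        r = (r | (r << 4)) & 0x0F0F0F0F0F0F0F0FL;
        r = (r | (r << 2)) & 0x3333333333333333L;
        r = (r | (r << 1)) & 0x5555555555555555L;
        return r;
    }

    // Interleave x and y: bits of x take the odd positions, bits of y the even ones.
    static long interleave(int x, int y) {
        return (spread(x) << 1) | spread(y);
    }

    // Return a copy of the rows ({col0, col1} pairs) sorted by interleaved key.
    // The key is a bit pattern, so it is compared as an unsigned value.
    static int[][] sortByZOrder(int[][] rows) {
        int[][] sorted = Arrays.copyOf(rows, rows.length);
        Arrays.sort(sorted, (a, b) ->
                Long.compareUnsigned(interleave(a[0], a[1]), interleave(b[0], b[1])));
        return sorted;
    }
}
```

With this key, the row (1, 0) sorts before (0, 2), whereas a lexical sort would put (0, 2) first. Neither column dominates the ordering, which is the property a Z-order sort relies on.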

Performance Test

Load Performance
Env: SSB scale 100, stream load
Stream Load Performance(s)

Query Performance
Env: TPC-H scale 25
Table:

CREATE TABLE LINEITEM (
L_PARTKEY int(11) NOT NULL COMMENT "",
L_SUPPKEY int(11) NOT NULL COMMENT "",
L_ORDERKEY int(11) NOT NULL COMMENT "",
L_LINENUMBER int(11) NOT NULL COMMENT "",
L_QUANTITY decimal(15, 2) NOT NULL COMMENT "",
L_EXTENDEDPRICE decimal(15, 2) NOT NULL COMMENT "",
L_DISCOUNT decimal(15, 2) NOT NULL COMMENT "",
L_TAX decimal(15, 2) NOT NULL COMMENT "",
L_RETURNFLAG char(1) NOT NULL COMMENT "",
L_LINESTATUS char(1) NOT NULL COMMENT "",
L_SHIPDATE date NOT NULL COMMENT "",
L_COMMITDATE date NOT NULL COMMENT "",
L_RECEIPTDATE date NOT NULL COMMENT "",
L_SHIPINSTRUCT char(25) NOT NULL COMMENT "",
L_SHIPMODE char(10) NOT NULL COMMENT "",
L_COMMENT varchar(44) NOT NULL COMMENT "",
L_DEFAULT varchar(44) NULL COMMENT ""
) ENGINE=OLAP
DUPLICATE KEY(L_PARTKEY, L_SUPPKEY)
COMMENT "OLAP"
DISTRIBUTED BY HASH(L_PARTKEY) BUCKETS 1
PROPERTIES (
"replication_num" = "3",
"in_memory" = "false",
"storage_format" = "V2"
);

Q1: select count(1) from LINEITEM where L_PARTKEY=999852;
Q2: select count(1) from LINEITEM where L_SUPPKEY=125019;
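
Q1 filters on the first key column and Q2 on the second, which is where the two sort orders are expected to differ. The toy model below is a sketch under stated assumptions, not Doris's storage layout: it builds a 64 x 64 grid of (a, b) rows, splits it into 64-row pages, and counts how many pages a per-page min/max check cannot skip for an equality filter. All names (`PagePruningSketch`, `pagesToScan`, and so on) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical toy model of min/max page pruning; not Doris code.
final class PagePruningSketch {
    // Spread the bits of v to the even bit positions (bit i -> bit 2*i).
    static long spread(int v) {
        long r = v & 0xFFFFFFFFL;
        r = (r | (r << 16)) & 0x0000FFFF0000FFFFL;
        r = (r | (r << 8)) & 0x00FF00FF00FF00FFL;
        r = (r | (r << 4)) & 0x0F0F0F0F0F0F0F0FL;
        r = (r | (r << 2)) & 0x3333333333333333L;
        r = (r | (r << 1)) & 0x5555555555555555L;
        return r;
    }

    // Morton code of a row; assumes non-negative column values.
    static long zValue(int a, int b) {
        return (spread(a) << 1) | spread(b);
    }

    // All (a, b) pairs with 0 <= a, b < n, in lexical (a-major) order.
    static List<int[]> grid(int n) {
        List<int[]> rows = new ArrayList<>();
        for (int a = 0; a < n; a++) {
            for (int b = 0; b < n; b++) {
                rows.add(new int[] {a, b});
            }
        }
        return rows;
    }

    // Copy of the rows sorted by Morton code.
    static List<int[]> zSorted(List<int[]> rows) {
        List<int[]> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator.comparingLong(r -> zValue(r[0], r[1])));
        return sorted;
    }

    // Count pages whose [min, max] range on column col contains target,
    // i.e. pages that a min/max check cannot skip.
    static int pagesToScan(List<int[]> rows, int pageSize, int col, int target) {
        int hits = 0;
        for (int start = 0; start < rows.size(); start += pageSize) {
            int end = Math.min(start + pageSize, rows.size());
            int min = Integer.MAX_VALUE;
            int max = Integer.MIN_VALUE;
            for (int[] r : rows.subList(start, end)) {
                min = Math.min(min, r[col]);
                max = Math.max(max, r[col]);
            }
            if (min <= target && target <= max) {
                hits++;
            }
        }
        return hits;
    }
}
```

In this model the lexical order scans 1 of 64 pages for a filter on the first column but all 64 for a filter on the second, while the Z-order layout scans 8 pages in either case. Real tables will differ; the point is only the shape of the trade-off.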

Query Performance(s)

Percent of Filtered Data Pages

Limitation

  1. Only the duplicate model supports Z-order.
  2. The short key index is not available for Z-order tables.
  3. Only the V2 storage format supports Z-order.

Future work

  1. Support Z-order in the aggregate/unique models.
  2. Try the Hilbert curve to improve data clustering.

@morningman (Contributor) commented:

Nice work!

@morningman added the kind/improvement and kind/feature labels on Nov 18, 2021

Resolved review threads (outdated): be/src/olap/skiplist.h (1), be/src/util/tuple_row_zorder_compare.h (2)

@hf200012 (Contributor) commented:

Apache CarbonData also has this kind of min/max index. Nice!

@@ -268,7 +271,7 @@ class ConcurrentTest {
 ConcurrentTest()
 : _mem_tracker(new MemTracker(-1)),
 _mem_pool(new MemPool(_mem_tracker.get())),
-_list(TestComparator(), _mem_pool.get(), false) {}
+_list(new TestComparator(), _mem_pool.get(), false) {}
Contributor: Add ~ConcurrentTest() to delete _list.

Contributor Author: _list is an object; does it need to be deleted?

Contributor: Yes. You pass a raw pointer to the skiplist, and nothing deletes this TestComparator.

Contributor Author: Done.

return true;
}

public String getProperties() {
Contributor: Would toString() be better?

Contributor Author: getProperties() follows the existing style, as with dynamic partition.

Contributor: Oh, so I think we should change the dynamic partition one too. It is more like toString() or toSql().

Contributor Author: Changed it to toSql().

@@ -137,6 +151,11 @@ public void modifyTableProperties(Map<String, String> modifyProperties) {
properties.putAll(modifyProperties);
}

public void modifyTableProperties(DataSortInfo dataSortInfo) {
Contributor: Change the method name.

Contributor Author: modifyTableProperties() follows the existing style.

Contributor: They are not the same. The two original modifyTableProperties() methods take key-value pairs, which means they can modify any table property. Here you only want to modify the data sort info.

Contributor Author: Changed it to modifyDataSortInfoProperties().

@@ -165,6 +202,10 @@ public TCreateTabletReq toThrift() {
tSchema.setSchemaHash(schemaHash);
tSchema.setKeysType(keysType.toThrift());
tSchema.setStorageType(storageType);
if (dataSortInfo != null) {
tSchema.setSortType(dataSortInfo.getSortType());
tSchema.setSortColNum(dataSortInfo.getColNum());
Contributor: Would it be better to use a default value instead of setting these optionally?

Contributor Author: dataSortInfo may be null, which would cause toThrift() to fail.

@morningman (Contributor) commented:

I tested it myself and it works well.
Three more things:

  1. Some minor problems, which I have commented on.
  2. Rebase to resolve the conflict.
  3. Add documentation for this new feature.

@xinghuayu007 force-pushed the new_zorder_index branch 3 times, most recently from 282981f to d677397 (November 29, 2021 04:06)

public boolean isZOrderSort() {
return tableProperty != null
&& tableProperty.getDataSortInfo() != null
&& tableProperty.getDataSortInfo().getSortType() != TSortType.LEXICAL;
Contributor: Suggested change:
-&& tableProperty.getDataSortInfo().getSortType() != TSortType.LEXICAL;
+&& tableProperty.getDataSortInfo().getSortType() == TSortType.ZORDER;

That way we don't need to modify it if we add another sort type later.
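
The difference between the two checks can be sketched with a stand-in enum. `SketchSortType` and `SortTypeCheck` below are hypothetical, and `HILBERT` is an invented value standing for a possible future sort type (the PR lists the Hilbert curve only as future work); the real `TSortType` is generated from Thrift and is not reproduced here.

```java
// Hypothetical stand-in for TSortType; HILBERT is an invented future value.
enum SketchSortType { LEXICAL, ZORDER, HILBERT }

final class SortTypeCheck {
    // Check by exclusion: every non-lexical type is reported as Z-order.
    static boolean isZOrderByExclusion(SketchSortType t) {
        return t != SketchSortType.LEXICAL;
    }

    // Check by equality: only ZORDER is reported as Z-order.
    static boolean isZOrderByEquality(SketchSortType t) {
        return t == SketchSortType.ZORDER;
    }
}
```

Once a third value exists, the exclusion check reports it as Z-order, while the equality check keeps returning false for it without any further change.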

@@ -1714,6 +1729,13 @@ public TStorageFormat getStorageFormat() {
return tableProperty.getStorageFormat();
}

public DataSortInfo getDataSortInfo() {
if (tableProperty == null) {
return new DataSortInfo();
Contributor: It would be better to assign default values to the fields of DataSortInfo, so that this return value is semantically accurate.
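
One way to read this suggestion, as a minimal sketch: give the fields their defaults at declaration, so that a no-argument instance already describes a valid lexical sort. `DataSortInfoSketch` is a hypothetical class, not the `DataSortInfo` from this PR, and the default column count of 1 is a placeholder assumption rather than Doris's actual default.

```java
// Hypothetical sketch of a sort-info holder with field-level defaults.
final class DataSortInfoSketch {
    enum SortType { LEXICAL, ZORDER }

    // Defaults live on the fields, so new DataSortInfoSketch() is meaningful.
    private SortType sortType = SortType.LEXICAL;
    private int colNum = 1; // placeholder default, assumed for illustration

    DataSortInfoSketch() {}

    DataSortInfoSketch(SortType sortType, int colNum) {
        this.sortType = sortType;
        this.colNum = colNum;
    }

    SortType getSortType() {
        return sortType;
    }

    int getColNum() {
        return colNum;
    }
}
```

A caller that receives the no-argument instance can then read both fields without a null check, which also bears on the null handling discussed for toThrift().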

@xinghuayu007 (Contributor Author) commented:

> I tested it myself and it works well. Three more things:
>
>   1. Some minor problems, which I have commented on.
>   2. Rebase to resolve the conflict.
>   3. Add documentation for this new feature.

Documentation has been added to the 'create table' doc. ALTER TABLE is prohibited for Z-order tables.

@morningman (Contributor) previously approved these changes on Nov 30, 2021:

LGTM

@github-actions bot added the approved label on Nov 30, 2021

@github-actions (Contributor) commented:

PR approved by at least one committer and no changes requested.

@github-actions (Contributor) commented:

PR approved by anyone and no changes requested.

@github-actions bot removed the approved label on Dec 1, 2021

@morningman (Contributor) left a comment:

LGTM

@github-actions (Contributor) commented Dec 1, 2021:

PR approved by at least one committer and no changes requested.

@github-actions bot added the approved label on Dec 1, 2021
Labels: approved (indicates a PR has been approved by one committer), kind/feature (categorizes issue or PR as related to a new feature), kind/improvement, reviewed
3 participants