Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support substring #487

Merged
merged 8 commits into from
May 9, 2021
Merged

Conversation

tlightsky
Copy link
Contributor

Summary

support substring

Changelog

  • New Feature

Related Issues

#429

Test Plan

unit tests

@github-actions github-actions bot added the A-query Area: databend query label May 7, 2021
@tlightsky tlightsky force-pushed the dev-support-substring branch from 04b54de to 78ce427 Compare May 7, 2021 12:47
from: args[1].clone(),
len: args[2].clone()
})),
_ => Result::Err(ErrorCodes::BadArguments(format!(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need format here, just use to_string()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@github-actions github-actions bot added the C-testing Category: testing label May 7, 2021
@tlightsky tlightsky force-pushed the dev-support-substring branch 3 times, most recently from 15a0315 to d39a89d Compare May 7, 2021 16:05
@tlightsky tlightsky changed the title [WIP] support substring support substring May 7, 2021
@BohuTANG
Copy link
Member

BohuTANG commented May 8, 2021

@tlightsky Hi, is there any way add a substring test to performance suites?
We can measure it on mind.

@tlightsky
Copy link
Contributor Author

tlightsky commented May 8, 2021

@BohuTANG
I'll try to add it,
it seems very slow running on my macbook pro,
how long should it be to run through the performance tests on macbook pro:

./fuse-test --run-dir '1_performance'

@tlightsky tlightsky force-pushed the dev-support-substring branch from 8d62558 to de41b90 Compare May 8, 2021 14:14
@tlightsky
Copy link
Contributor Author

just rebased, and the performance test seems failed on this branch:

[2021-05-08T14:14:38Z INFO  warp::server] listening on http://127.0.0.1:8080 
[2021-05-08T14:14:50Z INFO  fuse_query::pipelines::processors::pipeline_builder] Received for plan:
    Projection: 1:UInt64
      ReadDataSource: scan partitions: [1], scan schema: [dummy:UInt8], statistics: [read_rows: 0, read_bytes: 0]
[2021-05-08T14:14:50Z INFO  fuse_query::pipelines::processors::pipeline_builder] Pipeline:
    ProjectionTransform × 1 processor
      SourceTransform × 1 processor
[2021-05-08T14:14:50Z INFO  fuse_query::pipelines::processors::pipeline_builder] Received for plan:
    AggregatorFinal: groupBy=[[]], aggr=[[avg([number])]]
      RedistributeStage[state: AggregatorMerge, id: 0]
        AggregatorPartial: groupBy=[[]], aggr=[[avg([number])]]
          ReadDataSource: scan partitions: [12], scan schema: [number:UInt64], statistics: [read_rows: 100000000000, read_bytes: 800000000000]
[2021-05-08T14:14:50Z INFO  fuse_query::pipelines::processors::pipeline_builder] Pipeline:
    AggregatorFinalTransform × 1 processor
      Merge (AggregatorPartialTransform × 12 processors) to (AggregatorFinalTransform × 1)
        AggregatorPartialTransform × 12 processors
          SourceTransform × 12 processors
thread 'tokio-runtime-worker' panicked at 'attempt to add with overflow', /Users/me/source/rust/datafuse/common/datavalues/src/data_value_arithmetic.rs:118:21
stack backtrace:
   0: rust_begin_unwind
             at /rustc/673d0db5e393e9c64897005b470bfeb6d5aec61b/library/std/src/panicking.rs:493:5
   1: core::panicking::panic_fmt
             at /rustc/673d0db5e393e9c64897005b470bfeb6d5aec61b/library/core/src/panicking.rs:92:14
   2: core::panicking::panic
             at /rustc/673d0db5e393e9c64897005b470bfeb6d5aec61b/library/core/src/panicking.rs:50:5
   3: common_datavalues::data_value_arithmetic::DataValueArithmetic::data_value_arithmetic_op
             at ./common/datavalues/src/data_value_arithmetic.rs:118:21
   4: <common_functions::aggregators::aggregator_avg::AggregatorAvgFunction as common_functions::function::IFunction>::accumulate
             at ./common/functions/src/aggregators/aggregator_avg.rs:70:23
   5: <fuse_query::pipelines::transforms::transform_aggregator_partial::AggregatorPartialTransform as fuse_query::pipelines::processors::processor::IProcessor>::execute::{{closure}}
             at ./fusequery/query/src/pipelines/transforms/transform_aggregator_partial.rs:78:17
   6: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /Users/me/.rustup/toolchains/nightly-2021-03-24-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/future/mod.rs:80:19
   7: <core::pin::Pin<P> as core::future::future::Future>::poll
             at /Users/me/.rustup/toolchains/nightly-2021-03-24-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/future/future.rs:119:9
   8: <fuse_query::pipelines::processors::processor_merge::MergeProcessor as fuse_query::pipelines::processors::processor::IProcessor>::execute::{{closure}}::{{closure}}
             at ./fusequery/query/src/pipelines/processors/processor_merge.rs:60:48

@BohuTANG
Copy link
Member

BohuTANG commented May 8, 2021

Try to change the numbers_mt(100000000000) to numbers_mt(10000000000)

@tlightsky
Copy link
Contributor Author

removed 2 zero for all and the tests can finish after 41s

@tlightsky
Copy link
Contributor Author

tlightsky commented May 8, 2021

took 46s on MacBook Pro

❯ ./fuse-test --run-dir '1_performance'

Running 7 performance tests.

00_performance_avg:                                                     [ OK ]

1 tests passed. 0 tests skipped.
00_performance_count:                                                   [ OK ]

2 tests passed. 0 tests skipped.
00_performance_group:                                                   [ OK ]

3 tests passed. 0 tests skipped.
00_performance_max:                                                     [ OK ]

4 tests passed. 0 tests skipped.
00_performance_sort:                                                    [ OK ]

5 tests passed. 0 tests skipped.
00_performance_substring:                                               [ OK ]

6 tests passed. 0 tests skipped.
00_performance_sum:                                                     [ OK ]

7 tests passed. 0 tests skipped.

@BohuTANG
Copy link
Member

BohuTANG commented May 8, 2021

@tlightsky

Hi, this patch almost done.
For thtis patch, we only change the one case to numbers_mt(1000000000), (sorry for the confusion):

SELECT substring(cast(number as text) from 3) from numbers_mt(1000000000) where number > 10000000 order by number desc limit 10;

Performance tests are used to anchor to a fixed amount of data(normal case is 100bllion, and slow case such as GroupBy is 1000000000) and then see if there is performance degradation, it's mainly for the github CI.

@BohuTANG
Copy link
Member

BohuTANG commented May 8, 2021

PR:
tlightsky#1

@tlightsky
Copy link
Contributor Author

cool,
I think the target branch should be:
tlightsky:dev-support-substring
and then this pr should be OK

Revert performance tests data size
Copy link
Member

@BohuTANG BohuTANG left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
Thank you @tlightsky

@BohuTANG BohuTANG merged commit 43862a7 into databendlabs:master May 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-query Area: databend query C-testing Category: testing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants