HLL support default value #1825
@HangyuanLiu
be/src/olap/aggregate_func.h
Outdated
@@ -408,7 +408,11 @@ struct AggregateFuncTraits<OLAP_FIELD_AGGREGATION_HLL_UNION, OLAP_FIELD_TYPE_HLL
dst_slice->size = sizeof(HyperLogLog);
// use 'placement new' to allocate HyperLogLog on arena, so that we can control the memory usage.
char* mem = arena->Allocate(dst_slice->size);
dst_slice->data = (char*) new (mem) HyperLogLog(src_slice->data);
if (src_slice->empty()) {
dst_slice->data = (char*) new (mem) HyperLogLog();
When would the slice be empty?
When HLL columns are NULL in the LOAD data
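For readers following this thread, here is a minimal sketch of the pattern under discussion: placement-new onto arena memory, falling back to a default-constructed sketch when the source slice is empty. `ToySlice`, `ToyArena`, `ArenaSketch`, and `init_sketch` are hypothetical stand-ins and not the Doris types; the 16-byte register layout is likewise an assumption made only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <new>
#include <vector>

// Hypothetical stand-in for a (pointer, length) slice.
struct ToySlice {
    const char* data;
    std::size_t size;
    ToySlice() : data(nullptr), size(0) {}
    ToySlice(const char* d, std::size_t n) : data(d), size(n) {}
    bool empty() const { return size == 0; }
};

// Hypothetical sketch type with a trivial destructor, so memory owned by the
// arena can simply be released without an explicit destructor call.
struct ArenaSketch {
    static constexpr std::size_t kRegisters = 16;  // assumed size
    unsigned char registers[kRegisters];

    ArenaSketch() { std::memset(registers, 0, kRegisters); }
    explicit ArenaSketch(const char* src) { std::memcpy(registers, src, kRegisters); }

    bool is_empty() const {
        for (unsigned char r : registers) {
            if (r != 0) return false;
        }
        return true;
    }
};

// Hypothetical arena: owns every block it hands out.
class ToyArena {
public:
    char* allocate(std::size_t n) {
        blocks_.emplace_back(new char[n]);
        return blocks_.back().get();
    }

private:
    std::vector<std::unique_ptr<char[]>> blocks_;
};

// Constructs the sketch in arena memory. An empty source (for example a NULL
// column in the load data) yields an empty sketch instead of reading from a
// null pointer.
ArenaSketch* init_sketch(ToyArena* arena, const ToySlice& src) {
    char* mem = arena->allocate(sizeof(ArenaSketch));
    if (src.empty()) {
        return new (mem) ArenaSketch();
    }
    assert(src.size >= ArenaSketch::kRegisters);  // precondition of this sketch
    return new (mem) ArenaSketch(src.data);
}
```

The point of the branch is that the non-empty constructor dereferences `src.data`, which is not safe when nothing was supplied for the column.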
Update example @imay
I think we don't support this usage. When loading data with an HLL column, we must use the hll_hash function, and hll_hash will handle null values.
This PR will support this capability.
@HangyuanLiu Hi, I see.
@kangkaisen LOAD LABEL test.uv LOAD LABEL test.uv uv2 may be another business stream
@HangyuanLiu I think using If the user assigns nothing to an HLL column, the column's content will be undefined, because HLL is unlike other column types, which can be filled with a default value or null.
And also, should will add a function
@morningman I agree with you.
I agree too. Will you add a hll_empty() function? It would also be better if you could create an issue to record this improvement.
OK, I can add a function
…to hll_default Conflicts: be/src/olap/aggregate_func.h
be/src/exprs/hll_function.cpp
Outdated
const int HLL_EMPTY_SIZE = 1;
std::string buf;
std::unique_ptr<HyperLogLog> hll;
hll.reset(new HyperLogLog());
A stack variable is enough
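To illustrate the suggestion, a rough sketch of the same serialization step using an object with automatic storage instead of a heap allocation behind `std::unique_ptr`. `StackSketch` and `serialize_empty_sketch` are hypothetical names, and the one-byte "empty" encoding is an assumption that only mirrors `HLL_EMPTY_SIZE = 1` from the snippet above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in; not the Doris HyperLogLog class.
class StackSketch {
public:
    // Writes the assumed one-byte "empty" marker and returns the bytes written.
    std::size_t serialize(char* dst) const {
        dst[0] = 0;
        return 1;
    }
};

std::string serialize_empty_sketch() {
    const std::size_t kEmptySize = 1;   // assumed serialized size of an empty sketch
    std::string buf(kEmptySize, '\0');  // sized up front so &buf[0] is writable

    StackSketch sketch;                 // automatic storage; destroyed at scope exit
    const std::size_t written = sketch.serialize(&buf[0]);
    assert(written == kEmptySize);
    (void)written;                      // silences unused-variable warnings under NDEBUG
    return buf;
}
```

Since the object never outlives the function, heap allocation and the smart pointer add cost without buying anything here.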
6ed3dfb to c8de048
c8de048 to 7105674
# Conflicts: # be/src/exprs/hll_function.cpp
be/src/olap/hll.h
Outdated
@@ -103,6 +103,14 @@ class HyperLogLog {

int64_t estimate_cardinality();

std::string empty() {
why not make this function static?
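A small sketch of what the static variant could look like, assuming the serialized empty value does not depend on any instance state. `StaticSketch` and `kEmptySize` are hypothetical, and the single zero byte is an assumed encoding rather than the format used in the PR.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in; not the Doris HyperLogLog class.
class StaticSketch {
public:
    // Assumed encoding: a single zero byte marks an empty sketch.
    static const std::size_t kEmptySize = 1;

    // Static because the result is the same for every instance, so callers
    // do not need to construct an object first.
    static std::string empty() {
        return std::string(kEmptySize, '\0');
    }
};
```

A caller would then write `StaticSketch::empty()` directly, which also removes the need for the temporary object discussed in the earlier comment.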
@@ -18,6 +18,9 @@

HLL_HASH(column_name)
Generates an HLL column value; used for INSERT or load. For usage when loading, see the related documentation.

EMPTY_HLL()
Please add the English doc as well.
fe966b0 to 25f3a6a
When the load data does not provide an HLL value, the HLL value is an empty HyperLogLog.
e.g.:
CREATE TABLE test_uv_9 (
pin_id bigint(20) NOT NULL COMMENT "",
id bigint(20) NULL COMMENT "",
uv1 hll HLL_UNION NOT NULL COMMENT "",
uv2 hll HLL_UNION NOT NULL COMMENT ""
) ENGINE=OLAP
AGGREGATE KEY(pin_id, id)
DISTRIBUTED BY HASH(pin_id) BUCKETS 16
PROPERTIES (
"storage_type" = "COLUMN"
)
1. curl --location-trusted -u root:123456 -H column_separator:, -H label:test_uv_14 -H "columns:pin_id,idx,u1,u2,id=12 ,uv1=hll_hash(u1)" -T uv_test http://11.40.166.162:8030/api/test/test_uv_9/_stream_load
Result: hll_union_agg(uv2) should be 0, and hll_union_agg(uv1) should be greater than 0.
2. curl --location-trusted -u root:123456 -H column_separator:, -H label:test_uv_15 -H "columns:pin_id,idx,u1,u2,id=12" -T uv_test http://11.40.166.162:8030/api/test/test_uv_10/_stream_load
Result: hll_union_agg(uv1) and hll_union_agg(uv2) should both be 0.