Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support hll_raw_agg in Aggregate Function #832

Merged
merged 1 commit into from
Apr 1, 2019

Conversation

zhaidongbo
Copy link
Contributor

Support hll_raw_agg in Aggregate Function
hll_raw_agg Function aggregates the HLL type value, and return the HLL type value

Copy link
Contributor

@imay imay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add some doc/help for this function?

DCHECK(!src.is_null);
DCHECK_EQ(src.len, HLL_SETS_BYTES_NUM);

StringVal result(ctx, src.len+1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
StringVal result(ctx, src.len+1);
StringVal result(ctx, src.len + 1);

FunctionContext* ctx,
StringVal* dst) {
hll_union_agg_init(ctx, dst);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All calculations can be based on full type, so dst will be initted to a hll full type .

@@ -25,6 +23,7 @@
#include "runtime/string_value.h"
#include "runtime/datetime_value.h"
#include "exprs/anyval_util.h"
#include "exprs/aggregate_functions.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'd better not move #include "exprs/aggregate_functions.h", we put this first to make sure that all need includes is written in header files

select HLL_CARDINALITY(uv_set) from test_uv;
select dt, HLL_CARDINALITY(uv_set) from test_uv;
select dt, HLL_CARDINALITY(uv) from (select dt, HLL_RAW_AGG(set1) as uv from test group by dt) tmp;
select dt, HLL_UNION_AGG(set1) as uv from test group by dt;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that these three queries don't lead to same result.

1 would return many rows for one dt value
2 and 3 queries would group by dt column and return one row for each value

So, you should split them into different examples

be/src/udf/udf.h Outdated
return (doris::HllDataType)(ptr[0]);
}

uint8_t &getHllData(int idx) const {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't return a reference value, you can add a function named setHllData

return hll_algorithm(dst.ptr, dst.len);
}
static int64_t hll_algorithm(const HllVal &dst) {
return hll_algorithm(dst.ptr+1, dst.len-1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return hll_algorithm(dst.ptr+1, dst.len-1);
return hll_algorithm(dst.ptr + 1, dst.len - 1);

be/src/udf/udf.h Outdated
@@ -762,6 +766,45 @@ struct LargeIntVal : public AnyVal {
}
};

struct HllVal : public StringVal {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think HllFullVal is a better name, because your HllVal only represent the Full type.

}

void AggregateFunctions::hll_union_parse_and_cal(HllSetResolver& resolver, StringVal* dst) {
void AggregateFunctions::hll_union_parse_and_cal(HllSetResolver& resolver, HllVal* dst) {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hll_union_parse_and_cal can be a member of HllVal, and it support a function that merge other type or src StringVal, so we will get a better encapsulation for hll.


for (int i = 0; i < src.len; ++i) {
dst->ptr[i] = std::max(dst->ptr[i], src.ptr[i]);
dst->setHllData(i, std::max(dst->getHllData(i), src.getHllData(i)));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HLLVal can support a member of merge other HllVal.

be/src/udf/udf.h Outdated
DCHECK(!is_null);

return (doris::HllDataType)(ptr[0]);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return doris::HllDataType::HLL_DATA_FULL.

hll_raw_agg Function aggregates the HLL type value, and return the HLL type value
Copy link
Contributor

@chenhao7253886 chenhao7253886 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LTMG

@imay imay merged commit a1bfc90 into apache:master Apr 1, 2019
platoneko pushed a commit to platoneko/doris that referenced this pull request Oct 27, 2022
…he#832)

Also, change some config variable names of file cache.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants