Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pg_approx_hll #601

Merged
merged 3 commits into from
Apr 20, 2020
Merged

pg_approx_hll #601

merged 3 commits into from
Apr 20, 2020

Conversation

milanbella
Copy link
Contributor

Check List

  • Tests has been run in packages where changes made if available
  • Linter has been run for changed code
  • Tests for the changes have been added if not covered yet
  • Docs have been added / updated if required

Issue Reference this PR resolves

[For example #12]

Description of Changes Made (if issue reference is not provided)

[Description goes here]

@norbertbede
Copy link
Contributor

xref:
DisctinctCountApprox Postgres HLL support #563

@norbertbede
Copy link
Contributor

norbertbede commented Apr 19, 2020

manual testing result

tested schema:
Orders.txt

cubejs create valid hll column included table when hit distinct apprx - preaggregation see valid DDL command
`-- Table: stb_pre_aggregations.orders__distinct_approx_3yrotyzx_gymlzk4i_1587213010940

DROP TABLE stb_pre_aggregations.orders__distinct_approx_3yrotyzx_gymlzk4i_1587213010940;

CREATE TABLE stb_pre_aggregations.orders__distinct_approx_3yrotyzx_gymlzk4i_1587213010940
(
orders__dateordered timestamp(6) without time zone,
orders__ordersource character varying COLLATE pg_catalog."default",
orders__dateordered_day timestamp without time zone,
orders__order_distinct_approx hll
)
WITH (
OIDS = FALSE
)
TABLESPACE pg_default;

ALTER TABLE stb_pre_aggregations.orders__distinct_approx_3yrotyzx_gymlzk4i_1587213010940
OWNER to cubejsrole;`

then generate valid sql

` Executing SQL: 03601c10-804d-4a40-8c8b-f2894ddb5b21-span-1
select max(("orders".dateordered::timestamptz AT TIME ZONE 'UTC')) from (
SELECT
o.c_order_id,
ol.c_orderline_id,
o.dateordered,
ol.ad_client_id,
ol.QtyOrdered,
COALESCE(sr.lastname,'Empty') order_salesrep_name,

COALESCE(ords.name,'Empty') as order_source_name,
o.issotrx
FROM c_orderline ol
JOIN c_order o ON (o.c_order_id=ol.c_order_id)
LEFT JOIN ad_user sr ON o.salesrep_id = sr.ad_user_id
LEFT JOIN c_ordersource ords ON o.c_ordersource_id = ords.c_ordersource_id

WHERE ol.ad_client_id=1000026
AND (o.docstatus = ANY (ARRAY['CO'::text, 'CL'::text]))
AND o.issotrx='Y'
AND o.dateordered = '2020-04-01'

) AS "orders"

Performing query completed: 03601c10-804d-4a40-8c8b-f2894ddb5b21-span-1 (242ms)

Executing Load Pre Aggregation SQL: 03601c10-804d-4a40-8c8b-f2894ddb5b21-span-1

CREATE TABLE stb_pre_aggregations.orders__distinct_approx_3yrotyzx_gymlzk4i_1587213010940 AS SELECT
"orders".dateordered "orders__dateordered", "orders".order_source_name "orders__ordersource", date_trunc('day', ("orders".dateordered::tim$
stamptz AT TIME ZONE 'UTC')) "orders__dateordered_day", hll_add_agg(hll_hash_any("orders".c_order_id)) "orders__order_distinct_approx"
FROM
(
SELECT
o.c_order_id,
ol.c_orderline_id,
o.dateordered,
ol.ad_client_id,
ol.QtyOrdered,
COALESCE(sr.lastname,'Empty') order_salesrep_name,

COALESCE(ords.name,'Empty') as order_source_name,
o.issotrx
FROM c_orderline ol
JOIN c_order o ON (o.c_order_id=ol.c_order_id)
LEFT JOIN ad_user sr ON o.salesrep_id = sr.ad_user_id
LEFT JOIN c_ordersource ords ON o.c_ordersource_id = ords.c_ordersource_id

WHERE ol.ad_client_id=1000026
AND (o.docstatus = ANY (ARRAY['CO'::text, 'CL'::text]))
AND o.issotrx='Y'
AND o.dateordered = '2020-04-01'

) AS "orders"
GROUP BY 1, 2, 3

Executing SQL: 03601c10-804d-4a40-8c8b-f2894ddb5b21-span-1

SELECT "orders__dateordered_day" "orders__dateordered_day", hll_cardinality(hll_union_agg("orders__order_distinct_approx")) "orders__order_dist
inct_approx" FROM stb_pre_aggregations.orders__distinct_approx_3yrotyzx_gymlzk4i_1587213010940 GROUP BY 1 ORDER BY 1 ASC LIMIT 10000

Performing query completed: 03601c10-804d-4a40-8c8b-f2894ddb5b21-span-1 (147ms)
Load Request Success: 03601c10-804d-4a40-8c8b-f2894ddb5b21-span-1 (2431ms)

{
"measures": [
"Orders.orderDistinctApprox"
],
"timeDimensions": [
{
"dimension": "Orders.dateordered",
"granularity": "day"
}
],
"filters": []
}

Load Request Success: d8a71e17-4965-43fc-bf71-b1abc7fdd6de-span-1 (6ms)
`


hllMerge(sql) {
return `hll_cardinality(hll_union_agg(${sql}))`;
return sql;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this one is redundant.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will remove that 'return'

}

countDistinctApprox(sql) {
return `hll_cardinality(${sql})`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems incorrect. Could you please test it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This worked for our test schema. I will change it for hll_cardinality(hll_add_agg(hll_hash_any(${sql}))) .

@paveltiunov
Copy link
Member

Hey @milanbella Hey Milan! Overall looks great! Could you please check comments I provided.

@paveltiunov paveltiunov merged commit be85ac6 into cube-js:master Apr 20, 2020
@paveltiunov
Copy link
Member

@milanbella Looks good! Thanks for contributing this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants