Vector/Formula Aggregation Inconsistency #4023
Comments
Some of these differences or issues are indeed deliberate. Others are worth discussing.
Contents of the example file:
from deephaven.column import int_col, long_col, double_col, float_col, string_col
from deephaven import agg, new_table
tbl = new_table([
    string_col('group', ['1B', '1B']),
    int_col('values', [1_000_000_000, 2_000_000_000])
])
grouped = tbl.group_by(['group'])
results = grouped.update([
    'int_vector_sum = sum(values)',
    'int_formula_sum = values[0] + values[1]',
    'long_formula_sum = (long)values[0] + values[1]',
    'int_vector_avg = avg(values)',
    'int_formula_avg = (values[0] + values[1]) / 2',
    'long_formula_avg = ((long)values[0] + values[1]) / 2'
])
results2 = tbl.agg_by(
    [agg.avg('int_aggby_avg=values'), agg.sum_('int_aggby_sum=values')],
    ['group'])
More ...
from deephaven.column import int_col, long_col, double_col, float_col, string_col
from deephaven import agg, new_table
tbl = new_table([
    string_col('group', ['1B', '1B']),
    int_col('values', [1_000_000_000, 2_000_000_000])
])
grouped = tbl.group_by(['group'])
results = grouped.update([
    'int_vector_sum = sum(values)',
    'int_formula_sum = values[0] + values[1]',
    'long_formula_sum = (long)values[0] + values[1]',
    'int_vector_avg = avg(values)',
    'int_formula_avg = (values[0] + values[1]) / 2',
    'long_formula_avg = ((long)values[0] + values[1]) / 2'
])
results2 = tbl.agg_by(
    [agg.avg('int_aggby_avg=values'), agg.sum_('int_aggby_sum=values')],
    ['group'])
m = results.meta_table
m2 = results2.meta_table
Regarding:
I encountered the type-preserving behavior in these numeric functions when using them to verify Pointing out more inconsistencies,
Aggregation operations in query library functions and built-in query aggregations are inconsistent. This PR makes them consistent.

Query library functions were changed.
* `percentile` now returns the primitive type.
* `sum` returns a widened type of `double` for floating point inputs or `long` for integer inputs.
* `product` returns a widened type of `double` for floating point inputs or `long` for integer inputs.
* `cumsum` returns a widened type of `double[]` for floating point inputs or `long[]` for integer inputs.
* `cumprod` returns a widened type of `double[]` for floating point inputs or `long[]` for integer inputs.
* `wsum` returns a widened type of `long` for all integer inputs and `double` for inputs containing floating points.

Note: Because the types have changed, the NULL return values have changed as well.

Resolves #4023
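The widening rule in the list above can be sketched in plain Python. This is only an illustration, not Deephaven code: the helper names `widened_result_type` and `wsum_result_type` are hypothetical, and the assumption that `byte` and `short` count as integer inputs is mine rather than something the PR description states.

```python
# Illustrative sketch of the widening rule described in the PR text.
# The helper names and the exact membership of the type sets are assumptions.
INTEGER_TYPES = {"byte", "short", "int", "long"}
FLOATING_TYPES = {"float", "double"}


def widened_result_type(input_type: str) -> str:
    """Result type for sum/product: long for integers, double for floats."""
    if input_type in INTEGER_TYPES:
        return "long"
    if input_type in FLOATING_TYPES:
        return "double"
    raise ValueError(f"unsupported input type: {input_type}")


def wsum_result_type(value_type: str, weight_type: str) -> str:
    """Result type for wsum: double if either input is floating point."""
    types = {value_type, weight_type}
    unknown = types - INTEGER_TYPES - FLOATING_TYPES
    if unknown:
        raise ValueError(f"unsupported input types: {sorted(unknown)}")
    return "double" if types & FLOATING_TYPES else "long"


def cumulative_result_type(input_type: str) -> str:
    """Result type for cumsum/cumprod: the widened type as an array."""
    return widened_result_type(input_type) + "[]"
```

Under these assumptions, an `int` column summed through the query library would come back as `long`, which matches what the issue asks for.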
Vector aggregations handle return types and type overflows very differently from in-line formulas or agg_by aggregations that compute the same thing.
Here are some observations:
This could confuse users. Fixing the inline formula is simple: just cast the operands to long. But even if there are workarounds for all scenarios, is it not possible to handle overflows consistently?
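To make the overflow concrete, here is a minimal sketch in plain Python that simulates Java-style 32-bit and 64-bit integer addition with `ctypes`. It uses the two values from the example script. The helpers `add_int32` and `add_int64` are hypothetical stand-ins for the arithmetic, not Deephaven functions, and the sketch says nothing about what `sum(values)` or `agg.sum_` return.

```python
import ctypes


def add_int32(a: int, b: int) -> int:
    """Add two integers and wrap the result to a signed 32-bit value."""
    return ctypes.c_int32(a + b).value


def add_int64(a: int, b: int) -> int:
    """Add two integers and wrap the result to a signed 64-bit value."""
    return ctypes.c_int64(a + b).value


values = [1_000_000_000, 2_000_000_000]

# Mirrors 'int_formula_sum = values[0] + values[1]': the true sum of
# 3,000,000,000 exceeds the int maximum of 2,147,483,647 and wraps around.
int_formula_sum = add_int32(values[0], values[1])

# Mirrors 'long_formula_sum = (long)values[0] + values[1]': casting one
# operand to long promotes the addition to 64 bits, so nothing wraps.
long_formula_sum = add_int64(values[0], values[1])

print(int_formula_sum)   # -1294967296
print(long_formula_sum)  # 3000000000
```

The cast works because it widens the arithmetic before the addition happens. Widening the result afterwards would be too late, since the wrapped value has already been computed.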
vector_function_inconsistency.py.txt