bug: cumsum function does not work as expected in Bigquery backend due to ibis default window frame logic #10699
Comments
/take
Do you have a simple reproducible failure case where Ibis is producing incorrect results, that doesn't depend on anything in your BigQuery project? For example, using The following code behaves correctly for me on
t = ibis.memtable({"ranking": [1, 2, 3, 4], "rewards": [10, 20, 30, 40]})
expr = t.rewards.cumsum(order_by="ranking")
result = con.to_pyarrow(expr)
expected = [10, 30, 60, 100]
assert result.to_pylist() == expected
Hi @cpcloud, as described in this issue, the problem is specific to the BigQuery backend. To reproduce it, I would like to modify the data a bit as
And my testing code snippet is:
The last statement printed the result:
To conclude, the last two records have incorrect I am using corporate resources to debug this issue because it occurred in our company projects, so I can't create a publicly accessible table for you to try. Sorry about that.
The root cause is that ibis ignores the Furthermore, I don't think it is correct for the BigQuery backend implementation to simply drop user-defined window frames when they are defined as
You don't need to do that. Just provide a dataframe with literal values that reproduces the problem, right here on GitHub, and we can work from there.
GitHub does not support CSV, so I packaged the exported CSV file in a zip file. I hope this is helpful.
What happened?
I would like to calculate the cumulative sum of a target column row by row using ibis. Here is the sample code snippet I used:
The generated SQL is:
According to the paper I found through Google here, the BigQuery logic with the default window frame is:
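To make the ROWS versus RANGE distinction concrete, here is a minimal pure-Python sketch. It is not BigQuery or ibis code. It assumes, as standard SQL specifies, that an ORDER BY without an explicit frame defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so rows that tie on the ordering key are peers and share one running total. The sample data and the function names `running_sum_rows` and `running_sum_range` are hypothetical and are not taken from the issue's dataset.

```python
from itertools import groupby
from operator import itemgetter


def running_sum_rows(rows):
    """Mimic ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.

    Each row sees only the physical rows up to and including itself.
    Note: Python's sort is stable, so ties keep input order; a real SQL
    engine gives no such guarantee for tied ordering keys.
    """
    total = 0
    out = []
    for _, value in sorted(rows, key=itemgetter(0)):
        total += value
        out.append(total)
    return out


def running_sum_range(rows):
    """Mimic RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.

    Each row sees every row whose ordering key is <= its own key,
    so rows that tie on the key (peers) all get the same total.
    """
    total = 0
    out = []
    ordered = sorted(rows, key=itemgetter(0))
    for _, peers in groupby(ordered, key=itemgetter(0)):
        values = [value for _, value in peers]
        total += sum(values)
        out.extend([total] * len(values))
    return out


# Hypothetical (ranking, rewards) pairs; the last two rows tie on ranking.
data = [(1, 10), (2, 20), (3, 30), (3, 40)]

rows_result = running_sum_rows(data)
range_result = running_sum_range(data)

print("ROWS frame: ", rows_result)
print("RANGE frame:", range_result)
```

With unique ordering keys the two functions agree, which would be consistent with the maintainer's four-row example passing. They diverge only when the ordering key has duplicates, where the RANGE version assigns the same total to every tied row.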
I ran an experiment to demonstrate this BigQuery behavior with different window frames by calculating the cumulative sum by rows:
From the above result, one can tell that only rows_window_frame_running_rewards, with an explicit rows window frame, gives the correct running sum. Therefore, ibis must respect an explicit rows window frame instead of dropping the window spec and falling back to the default window frame when the range is BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. The BigQuery default window frame behavior shouldn't be used by ibis to implement the cumsum function, and this logic further impacts cumulative_window, rows_window, et al.
What version of ibis are you using?
Based on limited testing, the problem exists in:
What backend(s) are you using, if any?
BigQuery
Relevant log output
(venv) ➜ ibis-debug pip list | grep ibis
ibis-framework 9.5.0
(venv) ➜ ibis-debug python -V
Python 3.10.16
(venv) ➜ ibis-debug