Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrap rolling sum updateBy operation #3399

Merged
merged 4 commits into from
Feb 3, 2023

Conversation

jmao-denver
Copy link
Contributor

Fixes #3388

  1. added rolling_sum_tick() and rolling_sum_time()
  2. added test_rolling_sum and test_rolling_proxy in test_updateby.py
  3. some housekeeping in test_updateby and ugp.auto_locking (default true now) related code in tests, should make it easier to add test cases for new updateBy operations in the near future

py/server/deephaven/updateby.py Outdated Show resolved Hide resolved
py/server/deephaven/updateby.py Show resolved Hide resolved
py/server/tests/test_updateby.py Outdated Show resolved Hide resolved
lbooker42
lbooker42 previously approved these changes Feb 2, 2023
Copy link
Contributor

@lbooker42 lbooker42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@@ -267,3 +266,85 @@ def forward_fill(cols: Union[str, List[str]]) -> UpdateByOperation:
return UpdateByOperation(j_updateby_op=_JUpdateByOperation.Fill(cols))
except Exception as e:
raise DHError(e, "failed to create a forward fill UpdateByOperation.") from e


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seeing this made me wonder if we made an API mistake. For the rolling sum, ema, etc, we have separate methods for different flavors (e.g. ema_tick_decay and ema_time_decay). It seems like it might be more pythonic to have ema and rolling_sum, where the flavor is determined by the inputs. I'm not sure how strong my thoughts are, so this is more of a discussion item.
@jmao-denver @jjbrosnan

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see the benefits of folding them into one function, but I prefer to have separate functions (especially when the parameter sets are quite different), as it is more explicit, cleaner, less chance for users to make mistake as far as arguments are concerned, also easier for us to document/maintain the code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mind the explicitness of the ticks/time signatures but I have near-zero python experience in a professional role.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure the parameter sets need to be that different. If these were combined into one function, you could have the first argument ts_col be optional with a default value of None. The rest of the arguments could look like they do in the time-based rolling sum functions, where each argument must be an int or string. In a tick case, you give ints. In the time case, if you give an int, it assumes seconds, and if you give a string, it can be whatever.

Then, the try/except block at the end can be separated into two parts, which depend on whether or not a timestamp column is given.

I think the benefits of combining each into one function outweigh the drawbacks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Essentially, for the rolling sum, it would be (without comments):

def rolling_sum(ts_col: str = None, cols: Union[str, List[str]], rev_time: Union[int, str],
                     fwd_time: Union[int, str] = 0) -> UpdateByOperation:
    if ts_col:
        try:
            cols = to_sequence(cols)
            rev_time = _JDateTimeUtils.expressionToNanos(rev_time) if isinstance(rev_time, str) else rev_time
            fwd_time = _JDateTimeUtils.expressionToNanos(fwd_time) if isinstance(fwd_time, str) else fwd_time
            return UpdateByOperation(j_updateby_op=_JUpdateByOperation.RollingSum(ts_col, rev_time, fwd_time, *cols))
        except Exception as e:
            raise DHError(e, "failed to create a rolling sum (time) UpdateByOperation.") from e
    else:
        try:
            cols = to_sequence(cols)
            return UpdateByOperation(j_updateby_op=_JUpdateByOperation.RollingSum(rev_ticks, fwd_ticks, *cols))
        except Exception as e:
            raise DHError(e, "failed to create a rolling sum (tick) UpdateByOperation.") from e

Copy link
Contributor

@lbooker42 lbooker42 Feb 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jjbrosnan some extra testing in the ticks case will be needed. Time strings need to be rejected as they make no sense for row-based computations. You will get an exception from the call, so as long as the error is clear this looks good.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is doable (we'll need to add checks for invalid argument combinations) but to me the extra cost and the potential cognitive burden on a user to understand the difference outweighs the gains.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also having ts_col as optional pretty much forces the users to use keyword arguments in the tick case, which may not be a bad thing but then it is not consistent with the time case.

@@ -267,3 +266,85 @@ def forward_fill(cols: Union[str, List[str]]) -> UpdateByOperation:
return UpdateByOperation(j_updateby_op=_JUpdateByOperation.Fill(cols))
except Exception as e:
raise DHError(e, "failed to create a forward fill UpdateByOperation.") from e


def rolling_sum_tick(cols: Union[str, List[str]], rev_ticks: int, fwd_ticks: int = 0) -> UpdateByOperation:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jmao-denver thoughts on the argument ordering vs other methods in this module? e.g. did we do ema_tick_decay in the wrong order?

Copy link
Contributor Author

@jmao-denver jmao-denver Feb 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parameter ordering of ema_tick_decay/ema_time_decay matches the Java ones more closely, which is nice. But it is not something we can do with rolling sums. I have no objection to making them consistent if you feel strongly about it. Even though it is a breaking change, I doubt we'll affect users so much as our own docs/examples.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also would prefer a closer match with Java/Groovy for rolling_sum, but I think the argument is about optional parameter(s). If we want to harmonize, it makes as much sense to me to update the Java signatures and move these optional parameters to the end of the signature.

py/server/deephaven/updateby.py Outdated Show resolved Hide resolved
py/server/deephaven/updateby.py Outdated Show resolved Hide resolved
py/server/deephaven/updateby.py Outdated Show resolved Hide resolved
py/server/deephaven/updateby.py Outdated Show resolved Hide resolved
py/server/deephaven/updateby.py Show resolved Hide resolved
@jmao-denver jmao-denver merged commit 3bef77a into deephaven:main Feb 3, 2023
@github-actions github-actions bot locked and limited conversation to collaborators Feb 3, 2023
@deephaven-internal
Copy link
Contributor

@jmao-denver jmao-denver deleted the 3388-wrap-RollingSum branch March 31, 2023 03:27
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement Python wrapping for UpdateBy::RollingSum feature
5 participants