Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Support remove_blink to remove blink table semantics in server-side Python #5958

Merged
merged 4 commits into from
Sep 4, 2024

Conversation

alexpeters1208
Copy link
Contributor

removeBlink is used by the Java API to disable the specialized aggregation semantics that blink tables implement. This wrapper adds the functionality to disable those semantics from the Python API.

except Exception as e:
raise DHError(e, "failed to remove blink table semantics.") from e
else:
raise RuntimeError("Table is not a blink table, so blink table semantics cannot be removed.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this:

  1. Error (like here)?
  2. return self?

I'm not sure which provides the most natural behavior in the wild or what is most consistent with other API operations we have. I might guess 2 is more consistent, but I'm not sure.
@rcaudy @jmao-denver @rbasralian

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should just return self.j_table.removeBlink() without 'thinking about it' (i.e. remove the if and lean on the behavior of the Java implementation, which the javadocs say happens to be option 2: @return A non-blink child table, or this table if it is not a blink table)

@jmao-denver
Copy link
Contributor

My understanding of a blink-table is a refreshing one that discards all the previous-cycle-added rows in the current cycle. Does remove_blink change this behavior? If not, then the name is 'misleading'.

@alexpeters1208
Copy link
Contributor Author

alexpeters1208 commented Aug 22, 2024

I spoke with @jmao-denver at length about this naming / behavior, and I want to summarize that discussion here.

It seems that there are three possible competing definitions of "blink table":

  1. t is blink table if and only if t obeys blink aggregation semantics
    Per this definition, if t does not obey blink aggregation semantics, it is not a blink table. So, calling remove_blink to disable those semantics means that the result is not a blink table, even though it appears to still be blinking.

  2. t is a blink table if and only if t obeys blink aggregation semantics and t dispenses of all of its existing rows every update cycle
    Per this definition, if t fails to satisfy either property, it is not a blink table. So again, calling remove_blink results in a non-blink table.

  3. t is a blink table if and only if t dispenses of all of its existing rows every update cycle
    Per this definition, calling remove_blink results in another blink table without specialized aggregation semantics. Any table that dispenses of all of its rows every update cycle is a blink table, regardless of how it aggregates.

Now, here's the difficulty / confusion:
The existing code and naming suggests that the engine is employing the first definition. IE, calling removeBlink on the underlying Java table sets the isBlink attribute to false. Anyone reading this code would conclude that the result is not a blink table.

I would argue that the third definition is the most intuitive. It seems that the key property of a "blink" table would be the "blinking" property, not how such a table handles aggregations. In fact, I didn't know about these specialized aggregation semantics until very recently.

@jmao-denver called me to discuss confusion over this, and we agreed that while definitions 2 or 3 might make sense, defintion 1 does not. One possible partial solution is to consider renaming remove_blink to something like standard_aggs, no_mem, no_agg, remove_blink_aggs, no_blink_agg, etc. This does not do anything to address the apparent tension in definitions of "blink", but maybe if we don't expose an API that uses those words, we don't have to talk about that.

Comment on lines 819 to 820
def remove_blink(self) -> Table:
"""Returns a new version of this table without specialized blink table aggregation semantics."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to document what happens if this table is not a blink table?

@chipkent
Copy link
Member

There is a lot of discussion above, but it seems to be overkill. The Javadocs are fairly straightforward:
https://deephaven.io/core/javadoc/io/deephaven/engine/table/Table.html#removeBlink()

If this table is a blink table, i.e. it has [BLINK_TABLE_ATTRIBUTE](https://deephaven.io/core/javadoc/io/deephaven/engine/table/Table.html#BLINK_TABLE_ATTRIBUTE) set to true, return a child without the attribute, restoring standard semantics for aggregation operations.
Returns:
A non-blink child table, or this table if it is not a blink table

Is there any reason to make this PR more complex than the javadocs?

@alexpeters1208 alexpeters1208 merged commit ea0dcb2 into deephaven:main Sep 4, 2024
16 checks passed
@alexpeters1208 alexpeters1208 deleted the add-remove-blink branch September 4, 2024 19:50
@github-actions github-actions bot locked and limited conversation to collaborators Sep 4, 2024
@deephaven-internal
Copy link
Contributor

Labels indicate documentation is required. Issues for documentation have been opened:

Community: deephaven/deephaven-docs-community#303

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants