Python wrapper for MultiJoin feature #4356

Merged
merged 9 commits into from
Sep 1, 2023

Conversation

lbooker42

@lbooker42 lbooker42 commented Aug 22, 2023

The call syntax is the following:

from deephaven.table import MultiJoinInput, MultiJoinTable, multi_join

# complex join
mj_input = [
    MultiJoinInput(table=t1, on="key"), # all columns added
    MultiJoinInput(table=t2, on="key=otherKey", joins=["col1", "col2"]), # specific columns added
]
multitable = multi_join(input=mj_input)

# simple joins
multitable = multi_join(tables=[t1, t2], on="common_key")  # all columns from t1, t2 included in output
multitable = multi_join(tables=[t1, t2], on=["common_key1", "common_key2"])  # all columns from t1, t2 included

Closes #4280
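
To make the intent of the simple-join calls above concrete, here is a minimal plain-Python sketch of a multi-table natural join on a single key column. It is not the Deephaven implementation and does not use the Deephaven API. The function name `multi_join_sketch`, the list-of-dicts table representation, and the assumed semantics (one output row per distinct key across all inputs, `None` for missing values, an error on duplicate keys within one input) are illustrative assumptions only.

```python
from typing import Dict, List, Sequence

Row = Dict[str, object]


def multi_join_sketch(tables: Sequence[List[Row]], on: str) -> List[Row]:
    """Hypothetical stand-in for a multi-table natural join on one key column."""
    # Collect every non-key column so missing values can be filled with None.
    value_cols: List[str] = []
    for rows in tables:
        for row in rows:
            for col in row:
                if col != on and col not in value_cols:
                    value_cols.append(col)

    result: Dict[object, Row] = {}
    for rows in tables:
        seen = set()
        for row in rows:
            key = row[on]
            # Assumption: a key may appear at most once per input table.
            if key in seen:
                raise ValueError(f"duplicate key {key!r} within one input table")
            seen.add(key)
            # Create the output row on first sight of this key.
            out = result.setdefault(key, {on: key, **{c: None for c in value_cols}})
            for col, val in row.items():
                if col != on:
                    out[col] = val
    return list(result.values())


t1 = [{"key": "a", "x": 1}, {"key": "b", "x": 2}]
t2 = [{"key": "b", "y": 20}, {"key": "c", "y": 30}]
joined = multi_join_sketch([t1, t2], on="key")
```

In this sketch, key "b" appears in both inputs and so carries values from both, while "a" and "c" each have one column left as `None`.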

@@ -0,0 +1,126 @@
#

Good question!

  1. Based on the precedents of PartitionedTable, TreeTable, and RollupTable, it does seem to make sense to place MultiJoinTable in table.py.
  2. I am not sure what level of 'table-ness' MultiJoinTable has. My guess is that it is at the same level as the ones mentioned above. The only difference is that those are created by methods on Table, not unbound functions. In that respect, it is more similar to table_factory.merge().
  3. One of the reasons we put PartitionedTable etc. in table.py is to avoid cyclical imports. MultiJoinTable doesn't seem to have this problem.

Although a bit torn, my personal preference is for consistency by having MultiJoinTable in table.py. Maybe we can move the function multi_join to table_factory?

@jmao-denver jmao-denver left a comment


Please move multi_join to table.py

@lbooker42 lbooker42 changed the title from "Initial commit of python wrapper for MultiJoin." to "Python wrapper for MultiJoin feature" Aug 25, 2023
Co-authored-by: Jianfeng Mao <4297243+jmao-denver@users.noreply.github.com>
jmao-denver previously approved these changes Aug 25, 2023

@jmao-denver jmao-denver left a comment


LGTM

MultiJoinTable: the result of the multi-table natural join operation. To access the underlying Table, use the
table() method.
"""
return MultiJoinTable(input, on)

General API question. I don't know the answer, but I'm asking it....

Is there any reason the user would need to know about MultiJoinInputTable? Right now, the query syntax looks like:

t = multi_join([t1,t2,t3]).table()

Any reason it shouldn't just be

t = multi_join([t1,t2,t3])


Yes, but only because in the future we intend to allow addition (and potentially removal) of tables from this join.

mt = multi_join([t1, t2, t3])
mt_new = mt.add_table(t4)  # returns an immutable MultiJoinTable
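
The design point in this reply, a wrapper object that exposes the joined result through table() and returns a new object when a table is added, can be sketched in plain Python. Everything here is hypothetical: `MultiJoinSketch` is not the Deephaven `MultiJoinTable`, table names stand in for real Table objects, and `add_table` mirrors only the proposed future method mentioned above, not an existing API.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MultiJoinSketch:
    """Hypothetical stand-in for MultiJoinTable; not the Deephaven class."""

    inputs: Tuple[str, ...]  # table names stand in for real Table objects
    on: str

    def table(self) -> str:
        # Stand-in for returning the underlying joined Table.
        return f"join({', '.join(self.inputs)}) on {self.on}"

    def add_table(self, name: str) -> "MultiJoinSketch":
        # Return a new object rather than mutating this one.
        return MultiJoinSketch(self.inputs + (name,), self.on)


mt = MultiJoinSketch(("t1", "t2", "t3"), on="key")
mt_new = mt.add_table("t4")
```

Because the dataclass is frozen, `mt` still describes the original three-table join after `add_table` is called, which is the property that returning a wrapper (instead of a bare table) would preserve.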

@lbooker42 lbooker42 merged commit 5b8dee9 into deephaven:main Sep 1, 2023
10 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Sep 1, 2023
@deephaven-internal

Labels indicate documentation is required. Issues for documentation have been opened:

How-to: https://github.com/deephaven/deephaven.io/issues/3127
Reference: https://github.com/deephaven/deephaven.io/issues/3128

@lbooker42 lbooker42 deleted the lab-mj-python branch June 26, 2024 19:58
Successfully merging this pull request may close these issues.

Python support for MultiJoin feature
4 participants