_apply_agg_function uses to_pandas functions that are not allowed with omnisci backend #2328

Closed
gshimansky opened this issue Oct 28, 2020 · 4 comments
Labels: bug 🦗 Something isn't working

Comments

@gshimansky
Collaborator

gshimansky commented Oct 28, 2020

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 20.04.1 LTS

  • Modin version (modin.__version__): 0.8.1.1+41.ge2f628c

  • Python version: Python 3.7.8

  • Code we can use to reproduce:
diff --git a/modin/experimental/engines/omnisci_on_ray/test/test_dataframe.py b/modin/experimental/engines/omnisci_on_ray/test/test_dataframe.py
index 22a471f..9445e6a 100644
--- a/modin/experimental/engines/omnisci_on_ray/test/test_dataframe.py
+++ b/modin/experimental/engines/omnisci_on_ray/test/test_dataframe.py
@@ -547,6 +547,14 @@ class TestGroupby:

         run_and_compare(groupby_count, data=self.data, cols=cols, as_index=as_index)

+    @pytest.mark.parametrize("cols", cols_value)
+    @pytest.mark.parametrize("as_index", bool_arg_values)
+    def test_groupby_mean(self, cols, as_index):
+        def groupby_mean(df, cols, as_index, **kwargs):
+            return df.groupby(cols, as_index=as_index).mean()
+
+        run_and_compare(groupby_mean, data=self.data, cols=cols, as_index=as_index)
+
     @pytest.mark.parametrize("cols", cols_value)
     @pytest.mark.parametrize("as_index", bool_arg_values)
     def test_groupby_proj_sum(self, cols, as_index):

Describe the problem

The test above calls a groupby aggregation that goes through _apply_agg_function. This function calls self._default_to_pandas and self._by.to_pandas().squeeze(). Both calls break the omnisci backend with a RuntimeError: "unexpected execution triggered on lazy frame" for to_pandas, and "unexpected to_pandas triggered on lazy frame" for _default_to_pandas.

Traceback for to_pandas
Traceback (most recent call last):
  File "/localdisk/gashiman/modin/modin/experimental/engines/omnisci_on_ray/test/test_dataframe.py", line 556, in test_groupby_mean
    run_and_compare(groupby_mean, data=self.data, cols=cols, as_index=as_index)
  File "/localdisk/gashiman/modin/modin/experimental/engines/omnisci_on_ray/test/test_dataframe.py", line 104, in run_and_compare
    **kwargs
  File "/localdisk/gashiman/modin/modin/experimental/engines/omnisci_on_ray/test/test_dataframe.py", line 70, in run_modin
    exp_res = fn(lib=pd, **kwargs)
  File "/localdisk/gashiman/modin/modin/experimental/engines/omnisci_on_ray/test/test_dataframe.py", line 554, in groupby_mean
    return df.groupby(cols, as_index=as_index).mean()
  File "/localdisk/gashiman/modin/modin/pandas/groupby.py", line 131, in mean
    return self._apply_agg_function(lambda df: df.mean(*args, **kwargs))
  File "/localdisk/gashiman/modin/modin/pandas/groupby.py", line 842, in _apply_agg_function
    by = self._by.to_pandas().squeeze()
  File "/localdisk/gashiman/modin/modin/experimental/backends/omnisci/query_compiler.py", line 92, in to_pandas
    return self._modin_frame.to_pandas()
  File "/localdisk/gashiman/modin/modin/experimental/engines/omnisci_on_ray/frame/data.py", line 1239, in to_pandas
    self._execute()
  File "/localdisk/gashiman/modin/modin/experimental/engines/omnisci_on_ray/frame/data.py", line 991, in _execute
    raise RuntimeError("unexpected execution triggered on lazy frame")
RuntimeError: unexpected execution triggered on lazy frame
Traceback for _default_to_pandas
Traceback (most recent call last):
  File "/localdisk/gashiman/modin/modin/experimental/engines/omnisci_on_ray/test/test_dataframe.py", line 556, in test_groupby_mean
    run_and_compare(groupby_mean, data=self.data, cols=cols, as_index=as_index)
  File "/localdisk/gashiman/modin/modin/experimental/engines/omnisci_on_ray/test/test_dataframe.py", line 104, in run_and_compare
    **kwargs
  File "/localdisk/gashiman/modin/modin/experimental/engines/omnisci_on_ray/test/test_dataframe.py", line 70, in run_modin
    exp_res = fn(lib=pd, **kwargs)
  File "/localdisk/gashiman/modin/modin/experimental/engines/omnisci_on_ray/test/test_dataframe.py", line 554, in groupby_mean
    return df.groupby(cols, as_index=as_index).mean()
  File "/localdisk/gashiman/modin/modin/pandas/groupby.py", line 131, in mean
    return self._apply_agg_function(lambda df: df.mean(*args, **kwargs))
  File "/localdisk/gashiman/modin/modin/pandas/groupby.py", line 839, in _apply_agg_function
    return self._default_to_pandas(f, *args, **kwargs)
  File "/localdisk/gashiman/modin/modin/pandas/groupby.py", line 902, in _default_to_pandas
    return self._df._default_to_pandas(groupby_on_multiple_columns, *args, **kwargs)
  File "/localdisk/gashiman/modin/modin/pandas/base.py", line 369, in _default_to_pandas
    pandas_obj = self._to_pandas()
  File "/localdisk/gashiman/modin/modin/pandas/dataframe.py", line 2254, in _to_pandas
    return self._query_compiler.to_pandas()
  File "/localdisk/gashiman/modin/modin/experimental/backends/omnisci/query_compiler.py", line 92, in to_pandas
    return self._modin_frame.to_pandas()
  File "/localdisk/gashiman/modin/modin/experimental/engines/omnisci_on_ray/frame/data.py", line 1242, in to_pandas
    raise RuntimeError("unexpected to_pandas triggered on lazy frame")
RuntimeError: unexpected to_pandas triggered on lazy frame
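
For readers unfamiliar with the failure mode, here is a minimal sketch of it in plain Python. It is not Modin code: `LazyFrame`, `allow_materialize`, `apply_agg_function`, and `raises_on_lazy` are hypothetical names invented for illustration. The only thing taken from the tracebacks above is the error message and the fact that the aggregation helper materializes the frame before applying the function.

```python
class LazyFrame:
    """Hypothetical stand-in for a lazily evaluated frame (not Modin's class)."""

    def __init__(self, rows, allow_materialize=False):
        self._rows = rows
        self._allow_materialize = allow_materialize

    def to_pandas(self):
        # Assumption: materialization is rejected unless explicitly allowed,
        # which is the behavior the tracebacks suggest for a lazy frame.
        if not self._allow_materialize:
            raise RuntimeError("unexpected to_pandas triggered on lazy frame")
        return list(self._rows)


def apply_agg_function(frame, func):
    # Same shape as the failing path: materialize first, then aggregate.
    return func(frame.to_pandas())


def raises_on_lazy(frame):
    """Return True if aggregating the frame trips the lazy-frame guard."""
    try:
        apply_agg_function(frame, lambda rows: sum(rows) / len(rows))
    except RuntimeError:
        return True
    return False
```

In this sketch, any aggregation that falls back to materializing the data fails on a guarded frame, regardless of what the aggregation function itself does.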
@gshimansky gshimansky added the bug 🦗 Something isn't working label Oct 28, 2020
@gshimansky gshimansky self-assigned this Oct 28, 2020
@gshimansky gshimansky changed the title _apply_agg_function uses to_pandas function that is not allowed with omnisci backend _apply_agg_function uses to_pandas functions that are not allowed with omnisci backend Oct 28, 2020
@gshimansky
Collaborator Author

This bug is related to #2269, because _default_to_pandas should be moved into the backend. to_pandas should be moved there as well.
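
A rough sketch of what moving the fallback into the backend could look like, in plain Python. All class and method names here (`BaseCompiler`, `LazyCompiler`, `GroupByFrontend`) are hypothetical and are not the actual Modin classes; the point is only that the API layer delegates the fallback decision instead of calling to_pandas itself.

```python
class BaseCompiler:
    """Hypothetical backend object that owns the data and the fallback."""

    def __init__(self, rows):
        self._rows = rows

    def to_pandas(self):
        return list(self._rows)

    def default_to_pandas(self, func):
        # Eager backends can simply materialize and apply the function.
        return func(self.to_pandas())


class LazyCompiler(BaseCompiler):
    """Hypothetical lazy backend that decides for itself how to fall back."""

    def default_to_pandas(self, func):
        # Assumption: a lazy backend may refuse, or run its plan first.
        raise NotImplementedError("fallback is not supported on a lazy frame")


class GroupByFrontend:
    """Hypothetical API layer: it never materializes data directly."""

    def __init__(self, compiler):
        self._compiler = compiler

    def mean(self):
        # The backend, not the frontend, chooses how the fallback happens.
        return self._compiler.default_to_pandas(
            lambda rows: sum(rows) / len(rows)
        )
```

With this split, the frontend code path is identical for every backend, and a lazy backend can raise a clear error or handle the request in its own way.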

@gshimansky
Collaborator Author

Bug #2269 has been fixed, but this issue should remain open while the test in this issue still fails. It now fails because a lambda function is passed into the omnisci backend's groupby_agg, which should be fixed in #2317. Once that is fixed, the test still fails because the index name cannot be assigned, due to #2363.
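
To illustrate why a lambda is a problem for a backend that translates aggregations into its own operations, here is a small sketch in plain Python. The `groupby_agg` function and `SUPPORTED_AGGS` table below are hypothetical and only share a name with the Modin method; the assumption is that such a backend can map a named aggregation to a known operation but cannot inspect an opaque callable.

```python
# Hypothetical table of aggregations the backend knows how to translate.
SUPPORTED_AGGS = {
    "sum": sum,
    "count": len,
    "mean": lambda values: sum(values) / len(values),
}


def groupby_agg(rows, key, value, agg):
    """Group `rows` (a list of dicts) by `key` and aggregate `value`."""
    if callable(agg):
        # An arbitrary callable cannot be mapped to a backend operation.
        raise NotImplementedError("opaque callables cannot be translated")
    if agg not in SUPPORTED_AGGS:
        raise ValueError(f"unsupported aggregation: {agg}")
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row[value])
    return {k: SUPPORTED_AGGS[agg](v) for k, v in groups.items()}
```

Passing the aggregation by name, for example `groupby_agg(rows, "a", "b", "mean")`, works in this sketch, while passing `lambda df: df.mean()` is rejected up front.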

@pyrito
Collaborator

pyrito commented Aug 22, 2022

@gshimansky what's the status of this issue? What would its priority be?

@gshimansky
Collaborator Author

I think it is safe to close this. The problems described here apply to code that has changed since then.
