Fix bug that failed to execute query when there are multiple arguments #2490

Merged (5 commits) on Sep 29, 2021

Conversation

perfumescent (Contributor) commented Sep 28, 2021

What do these changes do?

Fixes "[BUG] Failed to execute query when there are multiple arguments" (#2463) by adding support for SERIES_TYPE in dataframe.base.eval.CollectionVisitor.visit().

Before

The bug occurs when a query combines more than two atomic condition expressions, as in the example below:

import numpy as np
import mars.dataframe as md
df = md.DataFrame({'a': np.random.rand(100),
                   'b': np.random.rand(100),
                   'c c': np.random.rand(100)})
df.query('a < 0.5 and a != 0.1 and b != 0.2').execute()



Traceback (most recent call last):
  File "/Users/wenjun.swj/Code/mars/mars/dataframe/base/eval.py", line 131, in visit
    visitor = getattr(self, method)
AttributeError: 'CollectionVisitor' object has no attribute 'visit_Series'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/Users/wenjun.swj/miniconda3/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3417, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-5-f0b7eeac5829>", line 1, in <module>
    df.query('a < 0.5 and a != 0.1 and b != 0.2').execute()
  File "/Users/wenjun.swj/Code/mars/mars/dataframe/base/eval.py", line 773, in df_query
    predicate = mars_eval(expr, resolvers=(df,), level=level + 1, **kwargs)
  File "/Users/wenjun.swj/Code/mars/mars/dataframe/base/eval.py", line 507, in mars_eval
    result = visitor.eval(expr)
  File "/Users/wenjun.swj/Code/mars/mars/dataframe/base/eval.py", line 112, in eval
    return self.visit(node)
  File "/Users/wenjun.swj/Code/mars/mars/dataframe/base/eval.py", line 134, in visit
    return visitor(node)
  File "/Users/wenjun.swj/Code/mars/mars/dataframe/base/eval.py", line 141, in visit_Module
    result = self.visit(expr)
  File "/Users/wenjun.swj/Code/mars/mars/dataframe/base/eval.py", line 134, in visit
    return visitor(node)
  File "/Users/wenjun.swj/Code/mars/mars/dataframe/base/eval.py", line 145, in visit_Expr
    return self.visit(node.value)
  File "/Users/wenjun.swj/Code/mars/mars/dataframe/base/eval.py", line 134, in visit
    return visitor(node)
  File "/Users/wenjun.swj/Code/mars/mars/dataframe/base/eval.py", line 178, in visit_BoolOp
    return reduce(func, node.values)
  File "/Users/wenjun.swj/Code/mars/mars/dataframe/base/eval.py", line 177, in func
    return self.visit(binop)
  File "/Users/wenjun.swj/Code/mars/mars/dataframe/base/eval.py", line 134, in visit
    return visitor(node)
  File "/Users/wenjun.swj/Code/mars/mars/dataframe/base/eval.py", line 148, in visit_BinOp
    left = self.visit(node.left)
  File "/Users/wenjun.swj/Code/mars/mars/dataframe/base/eval.py", line 133, in visit
    raise SyntaxError('Query string contains unsupported syntax: {}'.format(node_name))
SyntaxError: Query string contains unsupported syntax: Series
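
The traceback passes through visit_BoolOp and a reduce call, which hints at why the number of conditions matters. Python's own ast module parses a chained `and` into a single BoolOp node whose `values` list holds every operand, rather than into nested pairs. A minimal sketch using only the standard library (`bool_operands` is a hypothetical helper name, and Mars may preprocess the query string before parsing, which this sketch does not attempt to reproduce):

```python
import ast


def bool_operands(expr):
    """Return the source text of each operand of a top-level and/or chain."""
    node = ast.parse(expr, mode="eval").body
    if not isinstance(node, ast.BoolOp):
        # Not a boolean chain: treat the whole expression as one operand.
        return [expr]
    # A chain like "x and y and z" is ONE BoolOp with three values.
    return [ast.get_source_segment(expr, value) for value in node.values]


print(bool_operands("a < 0.5 and a != 0.1 and b != 0.2"))
```

With three operands, a left-to-right fold over `values` needs two steps, and the second step starts from the result of the first instead of from an AST node.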

The root cause is that dataframe.base.eval.CollectionVisitor.visit() does not support SERIES_TYPE:

SERIES_TYPE = (Series, SeriesData)
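
To illustrate the failure mode and the general shape of such a fix, here is a toy visitor. It is not the Mars implementation: `Column`, `ToyVisitor`, `evaluate`, and the `passthrough` parameter are all hypothetical names, and `Column` only stands in for an evaluated Series. As in the traceback, visit_BoolOp folds its operands with reduce and wraps each pair in a BinOp, so from the second fold step onward the left operand is an already-evaluated object rather than an AST node. Without a pass-through for that type, visit() looks for a `visit_Column` method and fails.

```python
import ast
import operator
from functools import reduce


class Column:
    """Hypothetical stand-in for an evaluated Series: element-wise ops on a list."""

    def __init__(self, values):
        self.values = list(values)

    def _apply(self, other, op):
        # Broadcast scalars so that comparisons against constants work.
        if isinstance(other, Column):
            others = other.values
        else:
            others = [other] * len(self.values)
        return Column(op(a, b) for a, b in zip(self.values, others))

    def __lt__(self, other):
        return self._apply(other, operator.lt)

    def __ne__(self, other):
        return self._apply(other, operator.ne)

    def __and__(self, other):
        return self._apply(other, operator.and_)


class ToyVisitor:
    """Toy expression visitor; supports only `<`, `!=` and `and`."""

    def __init__(self, columns, passthrough=()):
        self.columns = columns
        self.passthrough = tuple(passthrough)

    def visit(self, node):
        # The assumed fix: hand back objects that are already evaluated.
        if self.passthrough and isinstance(node, self.passthrough):
            return node
        name = type(node).__name__
        method = getattr(self, "visit_" + name, None)
        if method is None:
            raise SyntaxError("unsupported syntax: " + name)
        return method(node)

    def visit_Expression(self, node):
        return self.visit(node.body)

    def visit_Name(self, node):
        return self.columns[node.id]

    def visit_Constant(self, node):
        return node.value

    def visit_Compare(self, node):
        # Single comparisons only, to keep the sketch short.
        ops = {ast.Lt: operator.lt, ast.NotEq: operator.ne}
        left = self.visit(node.left)
        right = self.visit(node.comparators[0])
        return ops[type(node.ops[0])](left, right)

    def visit_BinOp(self, node):
        return self.visit(node.left) & self.visit(node.right)

    def visit_BoolOp(self, node):
        def func(lhs, rhs):
            # After the first step, lhs is a Column, not an AST node.
            return self.visit(ast.BinOp(left=lhs, op=ast.BitAnd(), right=rhs))

        return reduce(func, node.values)


def evaluate(expr, columns, passthrough=()):
    tree = ast.parse(expr, mode="eval")
    return ToyVisitor(columns, passthrough).visit(tree)
```

In this sketch, `evaluate(expr, columns)` works for two conditions but raises SyntaxError for three, while `evaluate(expr, columns, passthrough=(Column,))` handles both.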

Now

After adding support for this type, the query runs correctly, as shown below.

>>> import numpy as np
>>> import mars.dataframe as md
>>> df = md.DataFrame({'a': np.random.rand(100),'b': np.random.rand(100),'c c': np.random.rand(100)})
>>> df.query('a < 0.5 and a != 0.1 and b != 0.2').execute()
           a         b       c c
0   0.385509  0.441026  0.950278
1   0.412703  0.386704  0.002776
4   0.280908  0.098562  0.309283
5   0.164744  0.364552  0.292891
6   0.195790  0.944170  0.653790
8   0.322010  0.095338  0.163584
10  0.129747  0.162292  0.904699
11  0.351739  0.124027  0.098137
13  0.213469  0.912870  0.229358
15  0.480980  0.154703  0.330692
16  0.268900  0.084565  0.167768
20  0.144316  0.679124  0.544623
22  0.156903  0.077582  0.335868
24  0.488973  0.564780  0.878692
28  0.271576  0.978732  0.007744
31  0.446813  0.671235  0.103683
33  0.323263  0.358730  0.864071
34  0.123714  0.758012  0.974905
35  0.231321  0.042523  0.260384
36  0.161210  0.948433  0.569217
37  0.311590  0.338948  0.354738
39  0.312912  0.829889  0.446416
40  0.301717  0.018264  0.310472
42  0.315535  0.792631  0.202715
45  0.272704  0.192104  0.119337
46  0.032126  0.595038  0.380832
48  0.308186  0.788221  0.080091
49  0.266853  0.108976  0.492379
51  0.416537  0.585269  0.982781
59  0.368765  0.880367  0.554242
61  0.246360  0.109812  0.377478
62  0.183949  0.609077  0.890214
64  0.318250  0.512868  0.608051
65  0.459107  0.376621  0.253770
66  0.237597  0.379776  0.827282
68  0.402874  0.956666  0.441957
70  0.263144  0.901552  0.381242
72  0.218650  0.623446  0.773795
75  0.314948  0.181935  0.801919
76  0.214923  0.157466  0.493052
77  0.378646  0.562853  0.852832
79  0.074559  0.843526  0.936090
81  0.173659  0.872561  0.950733
82  0.340242  0.256600  0.014353
86  0.322746  0.987032  0.210265
89  0.391583  0.692540  0.583078
90  0.096801  0.466157  0.361595
91  0.241045  0.452441  0.174794
92  0.008451  0.075798  0.820568
93  0.021548  0.364346  0.880776
99  0.273657  0.143548  0.349023

Related issue number

Fixes #2463

@qinxuye qinxuye changed the title Fix [BUG] Failed to execute query when there are multiple arguments #2463 Fix bug Failed to execute query when there are multiple arguments #2463 Sep 28, 2021
@qinxuye qinxuye changed the title Fix bug Failed to execute query when there are multiple arguments #2463 Fix bug that failed to execute query when there are multiple arguments #2463 Sep 28, 2021
hekaisheng (Contributor) commented Sep 28, 2021

Could you add some cases to test_eval_query_execution (https://github.com/mars-project/mars/blob/master/mars/dataframe/base/tests/test_base_execution.py#L1831) to cover your code?

@wjsi wjsi changed the title Fix bug that failed to execute query when there are multiple arguments #2463 Fix bug that failed to execute query when there are multiple arguments Sep 28, 2021
qinxuye (Collaborator) commented Sep 28, 2021

> Could you add some cases in test_eval_query_execution(https://github.com/mars-project/mars/blob/master/mars/dataframe/base/tests/test_base_execution.py#L1831) to cover your code?

@perfumescent please refer to this comment when adding a unit test; the example in the description can be used.

@qinxuye qinxuye added the mod: dataframe, to be backported, and type: bug labels Sep 28, 2021
@qinxuye qinxuye added this to the v0.8.0b2 milestone Sep 28, 2021
perfumescent (Contributor, Author) commented

> Please remove unused imports given the hints in https://dev.azure.com/mars-project/mars/_build/results?buildId=1647&view=logs&jobId=435ca956-9126-505f-566f-fff31072e2ba&j=435ca956-9126-505f-566f-fff31072e2ba&t=456eced4-5f9f-56bd-1d85-873f530e72f3.

Done. I didn't realize that when I pushed unfinished code to my forked repo, the commit would go directly into this PR.

wjsi (Member) left a comment

LGTM

qinxuye (Collaborator) left a comment

LGTM

hekaisheng pushed a commit that referenced this pull request Sep 30, 2021
…le arguments (#2490) (#2491)

* Fix wrong translation in cluster deployment. (#2489)

* Fix bug that failed to execute query when there are multiple arguments (#2490)

Co-authored-by: Alexander Yi <31856209+perfumescent@users.noreply.github.com>
@hekaisheng hekaisheng added the backported already label and removed the to be backported label Sep 30, 2021
chaokunyang added a commit to chaokunyang/mars that referenced this pull request May 31, 2022
Merge branch merge_github_2524 of git@gitlab.alipay-inc.com:ray-project/mars.git into master
https://code.alipay.com/ray-project/mars/pull_requests/58?tab=diff

Signed-off-by: 捕牛 <hejialing.hjl@antgroup.com>


* [Ray] Support reconstructing worker (mars-project#2413)


* Make cmdline support third party modules (mars-project#2454)

Co-authored-by: hanguang <zhusiyuan.zsy@alibaba-inc.com>
* Support visualizing subtask graphs on Mars Web (mars-project#2426)


* Fix timeout error when waiting for a submitted task (mars-project#2457)


* Print the error message when error happens in `TaskProcessor` (mars-project#2458)


* Add nightly builds for docker images (mars-project#2456)


* Fix misuse of `name` parameter in DataFrame align (mars-project#2469)


* Fix hang when start sub pool fails (mars-project#2468)


* Refine and unify subtask detail APIs (mars-project#2465)


* Fix coverage for Azure pipeline (mars-project#2474)


* Split tileable information and subtask graph into two tabs (mars-project#2480)


* Support specified vineyard socket and skip the launching vineyardd process (mars-project#2481)


* Basic reschedule subtask (mars-project#2467)


* Compatible with scikit-learn 1.0 (mars-project#2486)

Co-authored-by: hekaisheng <kaisheng.hks@alibaba-inc.com>
* Fix wrong translation in cluster deployment. (mars-project#2489)


* Fix bug that failed to execute query when there are multiple arguments (mars-project#2490)


* Include tileable property in detail api (mars-project#2493)


* Fix version of statsmodels to pass CI (mars-project#2497)


* Implements `glm.LogisticRegression` (mars-project#2466)


* Implements bagging sampling (mars-project#2496)


* Refine MarsDMatrix & support more parameters for XGB classifier and regressor (mars-project#2498)


* Fix output of df.groupby(as_index=False).size() (mars-project#2507)


* Add preliminary implementations for ufunc methods (mars-project#2510)


* Add doc for reading csv in oss (mars-project#2514)


* [Ray] Fix serializing lambdas in web (mars-project#2512)


* Add `make_regression` support for learn module (mars-project#2515)


* Fix reduction result on empty series (mars-project#2520)


* Fix df.loc when df is empty (mars-project#2524)


* fix start subpool

* fix test_kill_and_wait_timeout

* fix autoscale timeout

* fix ray larger clsuter fixture

* Update ci ray package to 1.2.2

* remove python3.6 3.8 .39 ut and upgrade ray 3.7 image

* echo python path

* fix json decode error

* fix bundle release timeout

* fix remove placement group timeout

* fix no_restart

* fix ci

* fix autoscale