Refactor Expr architecture #2562
Comments
Hi @st-pasha, I think it would also be helpful if documentation could be created for code contributors, describing what the building blocks are, so to speak. That way, when dabbling in and writing the code, I would know which parts to use and for what. Admittedly, my knowledge of C++ is limited, so maybe the documentation is already clear and the challenge is with me. Also, with this new refactoring, is it advisable for code contributors to wait until the refactoring is complete?
Hi @samukweku, this new refactoring effort aims at simplifying code inside the |
- Created C++ class `py::FExpr`, which is equivalent to the current pure-python class `datatable.expr.Expr`;
- Created C++ class `py::Namespace`, replacing the current pure-python class `datatable.expr.FrameProxy`;
- Class `dt::expr::Expr` renamed into `OldExpr`, in anticipation that it will be removed in the near future;
- Added C++ class `dt::expr::FExpr`, which will eventually replace `dt::expr::Expr` (and `dt::expr::Head`), but currently serves as its base class;
- Extended class `py::XObject` so that it now supports adding numeric methods (such as `__add__()`, `__mul__()`, etc) and comparison methods (`__lt__()`, `__eq__()`, etc);
- The evaluation methods were altered so that the boolean flag `allow_new` is no longer necessary.

The new `py::FExpr` class is still "dormant" in the sense that it is not used by the library anywhere. Thus, the current functionality has (mostly) not been changed yet. Instead, I'm laying down a foundation where the new classes can gradually supplant the current python class `Expr` without causing any large-scale disturbances.

WIP for #2562
New classes `FExpr_ColumnAsAttr` and `FExpr_ColumnAsArg` (derived directly from `FExpr`) replace the previous `Head_Func_Column`. With these changes the new FExpr-based internal API is now actually usable and can be used to implement any new functionality. WIP for #2562
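To illustrate the split between the two column selectors, here is a minimal pure-Python model. It is only a sketch: the real classes are C++, and everything here (the dict-of-lists stand-in for a Frame, the class names without the `FExpr_` prefix, the `evaluate(frame)` signature) is an assumption made for illustration, not the library's actual API.

```python
class FExpr:
    """Hypothetical base class: a node that can evaluate and print itself."""

    def evaluate(self, frame):
        raise NotImplementedError

    def __repr__(self):
        raise NotImplementedError


class ColumnAsAttr(FExpr):
    """Column selected by attribute access, e.g. `f.A`."""

    def __init__(self, name):
        self.name = name

    def evaluate(self, frame):
        return frame[self.name]

    def __repr__(self):
        return f"f.{self.name}"


class ColumnAsArg(FExpr):
    """Column selected by subscript, e.g. `f[0]` or `f["A"]`."""

    def __init__(self, selector):
        self.selector = selector

    def evaluate(self, frame):
        if isinstance(self.selector, int):
            # Integer selectors pick a column by position.
            return list(frame.values())[self.selector]
        return frame[self.selector]

    def __repr__(self):
        return f"f[{self.selector!r}]"


class Namespace:
    """Model of the `f` object: turns attribute/subscript access into nodes."""

    def __getattr__(self, name):
        if name.startswith("__"):
            # Do not hijack Python's own special-method lookups.
            raise AttributeError(name)
        return ColumnAsAttr(name)

    def __getitem__(self, selector):
        return ColumnAsArg(selector)


f = Namespace()
frame = {"A": [1, 2, 3], "B": [10, 20, 30]}  # toy stand-in for a Frame
```

With this model, `f.A.evaluate(frame)` and `f[0].evaluate(frame)` both resolve to the first column, while their printed forms (`f.A` versus `f[0]`) preserve how the user wrote the selector.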
Old `Expr`s corresponding to various python literals are hereby converted into the new `FExpr` format. WIP for #2562
Converted `ifelse()` function into the new `FExpr` format. WIP for #2562
- Implemented standard arithmetic operators such as `+`, `-`, `*`, `/`, `//`, `%` and `**` in the new FExpr API;
- Added tentative documentation for `__add__` into the docs build;
- Class `py::FExpr` renamed into `PyFExpr` and moved into the `dt::expr` namespace;
- The old `datatable.expr.Expr` now uses the new `FExpr` approach;
- `FExpr` objects can now be constructed from python.

Overall there were more substantial changes when implementing the `FExpr__add__`, `FExpr__sub__`, etc. classes, compared to previous PRs. This is because our current implementation of binary operators is overly complicated, and I'm trying to simplify it by removing all the layers of indirection.

WIP for #2562
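The "one class per operator" idea can be sketched in pure Python as follows. This is a hedged model rather than the actual C++ code: the names `FExprAdd`, `FExprSub`, `FExprMul`, `Column`, `Literal`, `as_fexpr` and the dict-of-lists frame are all hypothetical, and only three operators are shown.

```python
class FExpr:
    """Hypothetical base: operators build new nodes instead of opcodes."""

    def evaluate(self, frame):
        raise NotImplementedError

    def __add__(self, other):
        return FExprAdd(self, as_fexpr(other))

    def __radd__(self, other):
        return FExprAdd(as_fexpr(other), self)

    def __sub__(self, other):
        return FExprSub(self, as_fexpr(other))

    def __rsub__(self, other):
        return FExprSub(as_fexpr(other), self)

    def __mul__(self, other):
        return FExprMul(self, as_fexpr(other))

    def __rmul__(self, other):
        return FExprMul(as_fexpr(other), self)


class Column(FExpr):
    def __init__(self, name):
        self.name = name

    def evaluate(self, frame):
        return frame[self.name]

    def __repr__(self):
        return f"f.{self.name}"


class Literal(FExpr):
    def __init__(self, value):
        self.value = value

    def evaluate(self, frame):
        # Broadcast the scalar to the number of rows in the toy frame.
        nrows = len(next(iter(frame.values())))
        return [self.value] * nrows

    def __repr__(self):
        return repr(self.value)


def as_fexpr(x):
    """Wrap plain python values so both operands are always nodes."""
    return x if isinstance(x, FExpr) else Literal(x)


class BinaryOp(FExpr):
    symbol = "?"

    def __init__(self, lhs, rhs):
        self.lhs, self.rhs = lhs, rhs

    def apply(self, a, b):
        raise NotImplementedError

    def evaluate(self, frame):
        left, right = self.lhs.evaluate(frame), self.rhs.evaluate(frame)
        return [self.apply(a, b) for a, b in zip(left, right)]

    def __repr__(self):
        return f"({self.lhs!r} {self.symbol} {self.rhs!r})"


class FExprAdd(BinaryOp):
    symbol = "+"

    def apply(self, a, b):
        return a + b


class FExprSub(BinaryOp):
    symbol = "-"

    def apply(self, a, b):
        return a - b


class FExprMul(BinaryOp):
    symbol = "*"

    def apply(self, a, b):
        return a * b


frame = {"A": [1, 2, 3], "B": [10, 20, 30]}  # toy stand-in for a Frame
expr = Column("A") + Column("B") * 2
```

Each operator carries its own behavior in an overridden method, so no opcode lookup table is needed; `expr.evaluate(frame)` simply walks the tree.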
FExpr implementation for `<`, `>`, `<=`, `>=`, `==` and `!=`. WIP for #2562
This is a tutorial-style detailed documentation, which replaces old `!readme.md` (whose information is now completely out-of-date). WIP for #2562
While #2562 is still WIP, we need to have proper documentation for both `Expr` and `FExpr` input/return types. This PR fixes it.
- refactor existing unary `FExpr` reducers to inherit from `ReduceUnary_ColumnImpl`;
- make `is_grouped` a template parameter.

WIP for #2562
Our current approach for creating new `Expr`s is overly complicated, as outlined here: https://github.com/h2oai/datatable/blob/master/src/core/expr/!readme.md. This complexity stems mostly from the fact that the `Expr` class which performs arithmetic on `f-expressions` is defined in pure python, and then needs to be bridged into the C++ core.

A more sane approach would be to define everything in C++, eliminating most of the "middle-man" code. In particular, the following architecture is proposed:

- `py::FExpr` to replace current python `Expr` class;
- `py::ColumnNamespace` to replace current python `FrameProxy` class;
- `dt::expr::FExpr` is a merged version of current `dt::expr::Expr` and `dt::expr::Head`. The class is virtual, with the hierarchy following that of the `Head` class;
- `py::FExpr` contains a `shared_ptr<dt::expr::FExpr>`;
- `dt::expr::FExpr` class defines virtual methods for evaluation and reproing;
- `Op` enum is removed.

Subtasks
- `py::XObject<C>`;
- `py::FExpr` (which will eventually replace the pure-python `datatable.expr.Expr`);
- `dt::expr::FExpr` which is a backend for `py::FExpr`;
- `FExpr`s can be used alongside old pure-python `Expr`s;
- `py::Namespace` to replace pure-python `datatable.expr.FrameProxy`;
- `OldExpr`-based functionality into FExprs:
  - `f.A` / `f[0]`;
  - `f.extend()`;
  - `f.remove()`;
  - `shift()`;
  - `ifelse()`;
  - `cut()`;
  - `qcut()`;
  - `+`;
  - `-`;
  - `*`;
  - `/`;
  - `//`;
  - `%`;
  - `**`;
  - `&`;
  - `|`;
  - `^`;
  - `<<`;
  - `>>`;
  - `+`;
  - `-`;
  - `~`;
  - `<`;
  - `>`;
  - `<=`;
  - `>=`;
  - `==`;
  - `!=`;
  - `len()`
  - `re_match()`;
  - `mean`, `min`, `max`, `stdev`, `first`, `last`, `sum`, `count`, `count0`, `median`, `cov`, `corr`;
  - `sin`, `cos`, `tan`, `arcsin`, `arccos`, `arctan`, `arctan2`, `hypot`, `deg2rad`, `rad2deg`;
  - `sinh`, `cosh`, `tanh`, `arsinh`, `arcosh`, `artanh`;
  - `cbrt`, `exp`, `exp2`, `expm1`, `log`, `log10`, `log1p`, `log2`, `logaddexp`, `logaddexp2`, `pow`, `sqrt`, `square`;
  - `erf`, `erfc`, `gamma`, `lgamma`;
  - `abs`, `ceil`, `copysign`, `fabs`, `floor`, `frexp`, `isclose`, `isfinite`, `isinf`, `isna`, `ldexp`, `modf`, `rint`, `sign`, `signbit`, `trunc`;
  - `clip`, `divmod`, `fmod`, `maximum`, `minimum`;
  - `rowall`, `rowany`, `rowcount`, `rowfirst`, `rowlast`, `rowmin`, `rowmax`, `rowmean`, `rowsum`, `rowsd`;
- `py::Namespace` class;
- `py::FExpr` class;
- `datatable.expr.FrameProxy`;
- `datatable.expr.Expr`;
- `datatable.expr.OpCodes`;
- `dt::expr::Op` enum;
- `dt::expr::OldExpr` class;
- `args_registry`.
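As a rough illustration of the proposed wrapper/backend split, the pure-Python model below mimics a thin user-facing object that holds only a reference to a backend node, where the backend defines the "virtual" evaluation and repr methods. The real design is C++ with a `shared_ptr`; here a plain Python reference plays that role, and the names `ColumnExpr`, `NegExpr`, the `repr()` method, the `FExpr<...>` print format, and the dict-of-lists frame are assumptions made for this sketch.

```python
from abc import ABC, abstractmethod


class FExpr(ABC):
    """Backend node: stands in for `dt::expr::FExpr` in this model."""

    @abstractmethod
    def evaluate(self, frame):
        """Compute the node's values against a toy frame."""

    @abstractmethod
    def repr(self):
        """Return the node's printable form."""


class ColumnExpr(FExpr):
    def __init__(self, name):
        self.name = name

    def evaluate(self, frame):
        return frame[self.name]

    def repr(self):
        return f"f.{self.name}"


class NegExpr(FExpr):
    def __init__(self, arg):
        self.arg = arg  # child node is shared, not copied

    def evaluate(self, frame):
        return [-x for x in self.arg.evaluate(frame)]

    def repr(self):
        return f"-{self.arg.repr()}"


class PyFExpr:
    """User-facing wrapper: stands in for `py::FExpr` in this model."""

    def __init__(self, backend):
        self._backend = backend  # plays the role of the shared_ptr

    def __neg__(self):
        # Build a new backend node that points at the existing one.
        return PyFExpr(NegExpr(self._backend))

    def evaluate(self, frame):
        return self._backend.evaluate(frame)

    def __repr__(self):
        return f"FExpr<{self._backend.repr()}>"


a = PyFExpr(ColumnExpr("A"))
b = -a
```

Because the wrapper only forwards to its backend, all behavior lives in one class hierarchy, and a sub-expression such as `a` can be reused inside `b` without copying.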