-
Notifications
You must be signed in to change notification settings - Fork 653
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FEAT-#3603: add experimental read_custom_text
function that can read custom line-by-line text files
#3441
FEAT-#3603: add experimental read_custom_text
function that can read custom line-by-line text files
#3441
Conversation
Codecov Report
@@ Coverage Diff @@
## master #3441 +/- ##
==========================================
+ Coverage 86.96% 89.97% +3.01%
==========================================
Files 207 208 +1
Lines 17049 17095 +46
==========================================
+ Hits 14827 15382 +555
+ Misses 2222 1713 -509
Continue to review full report at Codecov.
|
42cf3ae
to
9acee0f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks really interesting, what problem is it solving?
This implementation allows us to work with the source data formats that are used in DIEN workload, in a broader sense - any text line-by-line format that does not fit standard parsers. P.S. added tests showing how this function works. |
383d52c
to
d211d9a
Compare
d211d9a
to
4521a98
Compare
4521a98
to
e98afd7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@anmyachev Will this be delayed to 0.13.0?
I don't think we should delay it, my understanding is that #3540 being merged makes it much easier to implement custom parser now. |
I restarted CI, please let me know when it is ready for review! |
e98afd7
to
6c69160
Compare
Why cannot it re-use the same mechanism |
6c69160
to
cd88814
Compare
…ustom libe-by-line text files Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
d409c97
to
9c24116
Compare
@@ -152,7 +152,7 @@ def generic_parse(fname, **kwargs): | |||
num_splits = kwargs.pop("num_splits", None) | |||
start = kwargs.pop("start", None) | |||
end = kwargs.pop("end", None) | |||
header_size = kwargs.pop("header_size", None) | |||
header_size = kwargs.pop("header_size", 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
otherwise, we are getting TypeError: 'NoneType' object cannot be interpreted as an integer
error
f452341
to
ef8e4cb
Compare
Co-authored-by: Vasily Litvinov <vasilij.n.litvinov@intel.com>
08e723f
to
b3a5395
Compare
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall much cleaner, thanks
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
@@ -955,7 +951,7 @@ def _read(cls, filepath_or_buffer, **kwargs): | |||
new_query_compiler : BaseQueryCompiler | |||
Query compiler with imported data for further processing. | |||
""" | |||
filepath_or_buffer_md = ( | |||
filepath = ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same about filepath_or_buffer_md
variable.
read_custom_text
function that can read custom line-by-line text files
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
12a16b8
to
5b9b591
Compare
@vnlitvinov ready for review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@vnlitvinov ready to merge |
"PickleExperimentalDispatcher", | ||
"CustomTextExperimentalDispatcher", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@anmyachev, why do we put experimental features in the base core instead of experimental one?
from modin.config import NPartitions | ||
|
||
|
||
class CustomTextExperimentalDispatcher(TextFileDispatcher): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am in favor of ExperimentalCustomTextDispatcher. The use of Experimental in the middle of a name looks weird to me.
What do these changes do?
flake8 modin
black --check modin
git commit -s
docs/developer/architecture.rst
is up-to-date