-
Notifications
You must be signed in to change notification settings - Fork 260
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add delimiter option for reading CSV files for Feathr #307
Conversation
Signed-off-by: Chang Yong Lik <theeahlag@gmail.com>
2b1c0a6
to
ddb5643
Compare
c764d1e
to
36fe0b3
Compare
Signed-off-by: Chang Yong Lik <theeahlag@gmail.com> (cherry picked from commit bc71fad93c08f6d06e40f7e289456c6a1b4d45e0)
Signed-off-by: Chang Yong Lik <theeahlag@gmail.com>
Signed-off-by: Chang Yong Lik <theeahlag@gmail.com>
Signed-off-by: Chang Yong Lik <theeahlag@gmail.com>
Signed-off-by: Chang Yong Lik <theeahlag@gmail.com>
Signed-off-by: Chang Yong Lik <theeahlag@gmail.com>
@ahlag thanks for the contribution! This PR looks good to me, but I'm not sure why the test fails. Spent a bit time to investigate and feel it might be caused by the newly added tests? |
@xiaoyongzhu
|
Talked with @ahlag offline and asked him to run |
Signed-off-by: changyonglik <theeahlag@gmail.com>
@xiaoyongzhu TestFileFormat.scala val sqlContext = ss.sqlContext
sqlContext.setConf("spark.feathr.inputFormat.csvOptions.sep", "\t") FileFormat.scala val csvDelimiterOption = ss.sparkContext.getConf.get("spark.feathr.inputFormat.csvOptions.sep", ",") |
Hmm it's a bit weird. Is it possible to force set the delimiters? |
Look like there can only be one SparkContext
|
I think I will try a new approach. Since an existing SparkContext cannot be edited nor can another one be created, I will test it with e2e by passing the config from the client. |
I did some research and found this answer:
Maybe you can try sqlContext.sql? |
Ok, I'll give this a shot |
@xiaoyongzhu |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also update wiki that we support tsv.
src/main/scala/com/linkedin/feathr/offline/source/dataloader/BatchDataLoader.scala
Outdated
Show resolved
Hide resolved
src/main/scala/com/linkedin/feathr/offline/source/dataloader/BatchDataLoader.scala
Outdated
Show resolved
Hide resolved
src/main/scala/com/linkedin/feathr/offline/source/dataloader/BatchDataLoader.scala
Outdated
Show resolved
Hide resolved
src/main/scala/com/linkedin/feathr/offline/source/dataloader/BatchDataLoader.scala
Outdated
Show resolved
Hide resolved
src/main/scala/com/linkedin/feathr/offline/source/dataloader/hdfs/FileFormat.scala
Outdated
Show resolved
Hide resolved
src/main/scala/com/linkedin/feathr/offline/source/dataloader/hdfs/FileFormat.scala
Outdated
Show resolved
Hide resolved
src/main/scala/com/linkedin/feathr/offline/source/dataloader/hdfs/FileFormat.scala
Show resolved
Hide resolved
src/test/scala/com/linkedin/feathr/offline/util/TestSourceUtils.scala
Outdated
Show resolved
Hide resolved
src/main/scala/com/linkedin/feathr/offline/util/SourceUtils.scala
Outdated
Show resolved
Hide resolved
Signed-off-by: changyonglik <theeahlag@gmail.com>
Signed-off-by: changyonglik <theeahlag@gmail.com>
Signed-off-by: changyonglik <theeahlag@gmail.com>
880ea90
to
3ec3fdf
Compare
Signed-off-by: changyonglik <theeahlag@gmail.com>
3842194
to
7711253
Compare
Signed-off-by: changyonglik <theeahlag@gmail.com>
062d916
to
2d41211
Compare
Signed-off-by: changyonglik <theeahlag@gmail.com>
f2ecf06
to
0b01cc6
Compare
Signed-off-by: changyonglik <theeahlag@gmail.com>
78b3cbd
to
b4e55fb
Compare
@xiaoyongzhu @hangfei |
@ahlag would you mind merge latest main and resolve the conflicts so that we can get this merged, thanks for your time! |
@xiaoyongzhu @blrchen |
* Added documentation Signed-off-by: Chang Yong Lik <theeahlag@gmail.com> * Added delimiter to CSVLoader Signed-off-by: Chang Yong Lik <theeahlag@gmail.com> (cherry picked from commit bc71fad93c08f6d06e40f7e289456c6a1b4d45e0) * Added delimiter to BatchDataLoader, FileFormat and SourceUtils Signed-off-by: Chang Yong Lik <theeahlag@gmail.com> * Added test case for BatchDataLoader Signed-off-by: Chang Yong Lik <theeahlag@gmail.com> * Added test case for FileFormat Signed-off-by: Chang Yong Lik <theeahlag@gmail.com> * Added test case for BatchDataLoader Signed-off-by: Chang Yong Lik <theeahlag@gmail.com> * Added test case and fixed indent Signed-off-by: Chang Yong Lik <theeahlag@gmail.com> * Passing failure Signed-off-by: changyonglik <theeahlag@gmail.com> * Removed unused imports from BatchDataLoader Signed-off-by: changyonglik <theeahlag@gmail.com> * Fixed test failures Signed-off-by: changyonglik <theeahlag@gmail.com> * Added release version Signed-off-by: changyonglik <theeahlag@gmail.com> * Removed tailing space Signed-off-by: changyonglik <theeahlag@gmail.com> * Removed wildcard imports Signed-off-by: changyonglik <theeahlag@gmail.com> * Paraphrased comments and docstring Signed-off-by: changyonglik <theeahlag@gmail.com> * Added DelimiterUtils Signed-off-by: changyonglik <theeahlag@gmail.com> * Refactored utils Signed-off-by: changyonglik <theeahlag@gmail.com> * Updated wiki to support both tsv and csv Signed-off-by: changyonglik <theeahlag@gmail.com> * Fixed spelling error Signed-off-by: changyonglik <theeahlag@gmail.com> * trigger GitHub actions Signed-off-by: Chang Yong Lik <theeahlag@gmail.com> Signed-off-by: changyonglik <theeahlag@gmail.com>
Signed-off-by: Chang Yong Lik theeahlag@gmail.com
Description
Resolves #241
How was this patch tested?
Tested locally with
sbt
Progress Tracker
Does this PR introduce any user-facing changes?
Allows users to specify delimiters