Read and writable partitioned sources #969
}
}

// Create the underlying scrooge-parquet scheme and explicitly set the sink fields to be only the
Looks like these comments are out of sync.
Fixed.
This looks great! Thanks for sharing this. Once we have tests and address some minor issues we will merge.
I have added some tests for writing. I'm not sure how to set up tests for reading though. I don't really know how the JobTest works for reading. Does it actually create files on disk?
I think you will want to use the hadoop-platform test. It actually spins up a minicluster and behaves very much like Hadoop. There are methods in HadoopPlatformJobTest to initialize the sources (feel free to add one or two if you think they are needed). By default, JobTest mocks the sources and sinks, so it only tests job logic.
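For illustration, here is a minimal sketch of the mocked style of test that JobTest supports. The job `LineLengthJob`, the argument names, and the sample data are hypothetical and not part of this PR; nothing is read from or written to disk, because the source and sink are replaced with in-memory buffers.

```scala
import com.twitter.scalding._

// Hypothetical job, used only to illustrate the test style:
// reads lines of text and writes each line with its length.
class LineLengthJob(args: Args) extends Job(args) {
  TypedPipe.from(TextLine(args("input")))
    .map { line => (line, line.length) }
    .write(TypedTsv[(String, Int)](args("output")))
}

object LineLengthJobCheck {
  def main(unused: Array[String]): Unit = {
    JobTest(new LineLengthJob(_))
      .arg("input", "in")
      .arg("output", "out")
      // Mocked TextLine input: (offset, line) pairs held in memory.
      .source(TextLine("in"), List((0, "a"), (1, "bb")))
      // Mocked sink: the job's output arrives as an in-memory buffer.
      .sink[(String, Int)](TypedTsv[(String, Int)]("out")) { buf =>
        assert(buf.toSet == Set(("a", 1), ("bb", 2)))
      }
      .run
      .finish
  }
}
```

Because a test like this only exercises the job logic, checking that partitioned files are really read back from a filesystem would need the hadoop-platform approach described above.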
Actually, your test looks great. Merging this. Thanks a ton!
This is an initial implementation of partitioned versions of the TypeDelimited and TextLine sources.
I'll add the tests next.