ISSUE #96: Specifying a read schema #109
I'm splitting the build changes into a separate PR, #112, so you may have to rebase this.
faa69db to 5a789c4
OK, rebased.
We are looking forward to this change, as it would enable Spark DataFrame schema evolution using Avro. It would also enable my scenario of reading Parquet files defined with an Avro schema:
val avroSchema = new Schema.Parser().parse(new File("avsc/user.avsc"))
val df = sqlContext.read.schema(avroSchema).format("parquet").load(path)
When will this PR be merged, or has it been abandoned? Thanks.
/**
 * Adds a method, `schema`, to DataFrameReader that allows you to specify an Avro schema
 */
Can you please add a test (including a scenario with unknown columns)? Thanks for working on this!
@vlyubin I will fix the code style and add a test.
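For readers picturing the unknown-columns scenario raised above, here is a minimal, dependency-free sketch in plain Scala. It is not code from this PR and does not use Spark or Avro: it assumes a record can be modeled as a map from field name to value and a read schema as an ordered list of field names. `NullFillSketch` and `project` are hypothetical names chosen for illustration. Fields missing from the record come back as null rather than raising an exception.

```scala
// Hypothetical illustration only; not the PR's implementation or test.
object NullFillSketch {
  // A record is modeled as a map from field name to value (an assumption).
  type Record = Map[String, Any]

  // Project a record onto the requested read schema, given as an ordered
  // list of field names. Fields the record does not contain become null
  // instead of causing an exception.
  def project(readSchema: Seq[String], record: Record): Seq[Any] =
    readSchema.map(name => record.getOrElse(name, null))

  def main(args: Array[String]): Unit = {
    // The read schema asks for "email", which the older record lacks.
    val readSchema = Seq("name", "favorite_color", "email")
    val oldRecord: Record = Map("name" -> "Alyssa", "favorite_color" -> "blue")
    println(project(readSchema, oldRecord))
  }
}
```

A stricter variant could return an `Option` per field or fail fast on unknown names. Null-filling trades that safety for the ability to read older files with a newer schema.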
5a789c4 to 1558a75
@bottleimp I just made a PR regarding the "try-catch" code. Hope it helps!
replace try-exception in buildScan
What's the status of this one? It's a really important feature for those of us who need schema evolution and who have run into the stack overflow problem when reading recursive schemas!
@pauldwolfe, you may check #95 to see if it meets your needs.
This looks like the best solution to this issue, and reading multiple Avro files with evolved schemas is very important to me. What is the consensus here?
Just following up to say that this PR has met my schema evolution needs very well, but I'm reluctant to use a forked repo in production. Shall we rebase?
@jamesmatanle, I'll do a rebase today and check what I can do next.
My PR is outdated. With #155, I think I will just close it.
Addressing issue #96.
My solution is, err, a bit violent: all unknown columns are set to null instead of throwing an exception. I hope someone can improve it. Right now it works for me. Here's the test example: