Updating to spark 3.0.1 and hadoop 3.2.1 #141
Conversation
force-pushed from 6aead20 to 5a31e40
force-pushed from 5a31e40 to 3e3a2b7
I would really like to; that depends mostly on how soon AWS EMR and the other cloud providers add support for Spark 3. As far as this particular issue goes, I will be updating all our Spark 3 related pull requests to use the 3.0 release version this week. I expect to run into other runtime issues and will investigate this along with everything else I find.
@heuermh Have you gotten a chance to take a look at this at all?
Thanks for the ping! Yeah, we have released ADAM and downstream cross-building with Scala 2.12 and Spark 3. For Disq, going forward I would be fine with only releasing against Spark 3. I have not had a chance to investigate this issue specifically.
@lbergelson One comment, otherwise looks good to me
            throws IOException {
        final FileSystem fileSystem = p.getFileSystem(conf);
        if (fileSystem instanceof LocalFileSystem) {
            return ((LocalFileSystem) fileSystem).getRawFileSystem();
Add a comment explaining this special casing of LocalFileSystem
(or, if we can't explain it, at least provide a comment with a reference for where the fix came from).
From the Travis failure log:
Whoops! I always forget to run the linter locally.
I'm going to merge this. If we ever come to understand it better, we should revisit it.
Thank you, @lbergelson!
Fixes #130, fixes #142
@heuermh Can you weigh in on this? I needed to make a weird change to use the RawLocalFileSystem in order to avoid a checksum issue, but I'm not sure why we're getting the checksum failure in the first place. I suspect the checksum isn't being recomputed correctly after some operation, but I don't know why.
If I don't force it to use the raw filesystem, we get:
There may be other mechanisms to avoid this check. A better solution would be to make the check pass, but I'm not sure why it's failing in the first place.
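The failure mode being worked around can be sketched with plain JDK classes. Hadoop's ChecksumFileSystem stores checksums in a hidden .crc sidecar file next to the data and verifies them on read; if the data file is rewritten without regenerating the sidecar, reads fail with a ChecksumException, while the raw filesystem skips verification entirely. The class below is illustrative only (Hadoop actually checksums fixed-size chunks, CRC32C by default, rather than the whole file):

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class ChecksumDemo {
    // Stand-in for the checksum Hadoop writes to the .crc sidecar file.
    public static long checksum(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data);
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] original = "record-v1".getBytes(StandardCharsets.UTF_8);
        // Sidecar checksum recorded when the file is first written.
        long storedCrc = checksum(original);

        // The data file is later rewritten (e.g. parts merged), but the
        // sidecar checksum is not regenerated.
        byte[] rewritten = "record-v1record-v2".getBytes(StandardCharsets.UTF_8);

        // Verification against the stale sidecar now fails; in Hadoop this
        // surfaces as a ChecksumException on read. Reading through the raw
        // filesystem bypasses this check, which is what the workaround does.
        boolean verifies = checksum(rewritten) == storedCrc;
        System.out.println(verifies); // prints "false"
    }
}
```

As an aside, one other mechanism that might avoid the check is Hadoop's FileSystem#setVerifyChecksum(false), though that disables verification for all reads through that FileSystem instance rather than just this path.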
Would you be in favor of dropping support for Spark 2 and Scala 2.11? I'm for it because it makes my life easier, but I'm not sure what versions you need to support.
This would close #130
@tomwhite If you happen to have any insight into the checksum issue, it would be valuable. I believe we had a similar problem back in the Hadoop-BAM days, but it went away in Disq.