fix compile error for hadoop CDH 4.4+ #151
Using a macro, we work around the difference between the Hadoop 2.0-alpha and 2.1-beta APIs and fix the compilation error that occurs when SPARK_HADOOP_VERSION is set to 2.0.0-cdh4.4.0. That is, the yarn-alpha project should work with Hadoop CDH 4.4 and later.
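The idea of one code path serving two incompatible API generations can be sketched as follows. This is only an illustration: the patch itself uses a macro, whereas this sketch swaps in plain runtime reflection, and every class and method name below (`AlphaResponse`, `AlphaInner`, `BetaResponse`, `ResponseCompat`) is a hypothetical stand-in rather than a real YARN or Spark type.

```scala
// Hypothetical stand-ins for two incompatible API generations.
// These are NOT real YARN classes; they only mimic the shape of the problem.

// "Alpha" style: the payload is wrapped in an inner object.
class AlphaInner(val containers: Seq[String]) {
  def getAllocatedContainers: Seq[String] = containers
}
class AlphaResponse(inner: AlphaInner) {
  def getAMResponse: AlphaInner = inner
}

// "Beta" style: the payload is exposed directly on the response.
class BetaResponse(containers: Seq[String]) {
  def getAllocatedContainers: Seq[String] = containers
}

object ResponseCompat {
  // Returns the allocated containers from either response shape.
  // If the response has a getAMResponse method, unwrap it first;
  // otherwise treat the response itself as the payload holder.
  def allocatedContainers(response: AnyRef): Seq[String] = {
    val target: AnyRef =
      try {
        response.getClass.getMethod("getAMResponse").invoke(response)
      } catch {
        case _: NoSuchMethodException => response
      }
    target.getClass
      .getMethod("getAllocatedContainers")
      .invoke(target)
      .asInstanceOf[Seq[String]]
  }
}
```

Calling `ResponseCompat.allocatedContainers` with either an `AlphaResponse` or a `BetaResponse` goes through the same call site. A macro, as used in the patch, would resolve the difference at compile time instead, avoiding the reflection cost and the loss of static type checking shown here.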
Can one of the admins verify this patch?
@@ -736,7 +736,7 @@ class JavaPairDStream[K, V](val dstream: DStream[(K, V)])(
 }

 object JavaPairDStream {
-  implicit def fromPairDStream[K: ClassTag, V: ClassTag](dstream: DStream[(K, V)]) = {
+  implicit def fromPairDStream[K: ClassTag, V: ClassTag](dstream: DStream[(K, V)]): JavaPairDStream[K, V] = {
     new JavaPairDStream[K, V](dstream)
Unrelated change?
Yeah, I will move it to another PR.
Can you please file a JIRA for this? Can't you just use the "yarn" profile for this (instead of yarn-alpha)? @sryza can you confirm?
@tgravescs FWIW I think something like this is the case, yes. The change happened in https://issues.apache.org/jira/browse/YARN-396 which occurred for the first(?) YARN TLP release with Hadoop 2.1. And CDH 4.4 was the first release I see that picked up this change. I assume it was useful/necessary to float this 'alpha' API ahead. I also would have thought it's possible the Otherwise yeah it looks like a question of supporting another intermediate flavor of YARN here, since it did change in breaking ways several times between 0.23.x and 2.2.
@tgravescs @sryza here is another description of the yarn-beta changes from yarn-alpha: http://hortonworks.com/blog/stabilizing-yarn-apis-for-apache-hadoop-2-beta-and-beyond/
I think @sryza is off this week, but when he's back it would be good to get a sense of the various YARN APIs and whether this is something we'll have to deal with over multiple versions of CDH. The fact that there is fragmentation here among CDH versions and the upstream project is unfortunate.
TBC this is an upstream YARN thing, and not anything specific to CDH. There are to my knowledge at least three incompatible versions, and they're all in use out there. It is shipped as non-stable everywhere before 2.2 and is actually a separate project from core Hadoop. It may in truth require 3 separate profiles, or a tweak to make 1 profile work across two versions. I had hoped
As you say, it is a YARN thing, but YARN was alpha/beta before the 2.2.0 release, with no API guarantees. YARN 0.23 was also a stable release. I would definitely hesitate about supporting all the various combinations of releases between those. Any user could have checked out Hadoop at any time, built it, and ended up with any combination of API changes. I think we should pick the ones we want to support. Even with just the 2 of them, the building and testing of things takes a lot of time.
This is the same issue reported in https://issues.apache.org/jira/browse/SPARK-1479
Do we have a decision on whether to accept this patch, or a timeline for removing all support for the unstable YARN API?
We currently support two YARN versions: the stable APIs in Hadoop 2.2.0+ and the 0.23 release that Yahoo runs internally. The main reason we support Yahoo 0.23 is that @tgravescs, who is the primary YARN committer on Spark, has offered to maintain it. I'd love to see us moving away from 0.23 support and only supporting YARN's stable APIs. It will depend a bit on the timeline on which Yahoo upgrades; Tom might have information on that (?). In terms of supporting other offshoots of YARN that were packaged by vendors or in other intermediate releases, my feeling is that unless we have a committer come and champion this and agree to support it, we shouldn't do it.
I'm with Patrick on this. Unfortunately we don't have resources at Cloudera right now to maintain Spark/YARN on CDH4.
I am hoping to deprecate Hadoop 0.23/yarn-alpha at some point, hopefully late this year, but we'll have to figure out which Spark release makes sense to deprecate it in, and the exact timeline for getting everything off of it.
In that case let's close this issue. If there are a few users who are really dying to support this they can apply this patch manually.
## What changes were proposed in this pull request?

This patch modifies the DB Spark ACL Client interface to use `Seq` instead of `Traversable`. There will be a corresponding patch on the Databricks side too.

## How was this patch tested?

- [x] Existing Tests
- [x] Manual Tests

Author: Sameer Agarwal <sameerag@cs.berkeley.edu>
Author: Srinath Shankar <srinath@databricks.com>

Closes apache#151 from sameeragarwal/branch-2.1-acl.
Currently we execute `make build` in pre.yaml of the k8s integration jobs to compile the code, but if a compile error happens, the job's test result falls into RETRY_LIMIT status, which is confusing. We should move `make build` into run.yaml so that a compile error produces an exact FAILURE status. Closes apache#151
Co-authored-by: chenliang.lu <chenliang.lu@kyligence.io>
Fix the compilation error that occurs when SPARK_HADOOP_VERSION is set to 2.0.0-cdh4.4.0. That is, the yarn-alpha project should work with Hadoop CDH 4.4.0 and later.
Also passes tests on branch-0.9.
Here is the JIRA (thanks @srowen for the reminder): https://issues.apache.org/jira/browse/SPARK-1479