fix compile error for hadoop CDH 4.4+ #151

Closed
wants to merge 1 commit into from


gzm55

@gzm55 gzm55 commented Mar 15, 2014

Fix the compilation error that occurs when SPARK_HADOOP_VERSION is set to 2.0.0-cdh4.4.0. That is, the yarn-alpha project should work with Hadoop CDH 4.4.0 and later.

The tests also pass on branch-0.9.

Here is the JIRA (thanks to @srowen for the reminder): https://issues.apache.org/jira/browse/SPARK-1479

Using a macro, we work around the differences between the Hadoop 2.0-alpha and
2.1-beta APIs and fix the compilation error that occurs when
SPARK_HADOOP_VERSION is set to 2.0.0-cdh4.4.0. That is, the yarn-alpha project
should work with Hadoop CDH 4.4 and later.
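
The macro used by the patch is not shown on this page. As a rough illustration of the underlying problem (one code base, two API generations with differently named methods), here is a minimal sketch that swaps in runtime reflection instead of a macro. The classes `AlphaRequest` and `BetaRequest` and their method names are hypothetical stand-ins, not real YARN types, and this is not the approach the patch takes.

```scala
import scala.util.Try

// Hypothetical stand-ins for two incompatible API generations (NOT real YARN classes).
class AlphaRequest { def setMemory(mb: Int): String = s"alpha:memory=$mb" }
class BetaRequest { def setMemorySize(mb: Int): String = s"beta:memory=$mb" }

object ApiShim {
  // Try each candidate method name in order and invoke the first one
  // that the target's class actually declares.
  def setMemory(target: AnyRef, mb: Int): String = {
    val candidates = Seq("setMemory", "setMemorySize")
    val method = candidates.iterator
      .map(name => Try(target.getClass.getMethod(name, classOf[Int])).toOption)
      .collectFirst { case Some(m) => m }
      .getOrElse(throw new NoSuchMethodException(candidates.mkString(" or ")))
    method.invoke(target, Int.box(mb)).asInstanceOf[String]
  }
}
```

A reflective shim like this resolves the difference at run time, at the cost of losing compile-time checking; a macro moves the same decision to compile time, which is presumably why the patch chose one.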
@AmplabJenkins

Can one of the admins verify this patch?

@@ -736,7 +736,7 @@ class JavaPairDStream[K, V](val dstream: DStream[(K, V)])(
 }

 object JavaPairDStream {
-  implicit def fromPairDStream[K: ClassTag, V: ClassTag](dstream: DStream[(K, V)]) = {
+  implicit def fromPairDStream[K: ClassTag, V: ClassTag](dstream: DStream[(K, V)]): JavaPairDStream[K, V] = {
     new JavaPairDStream[K, V](dstream)
Contributor

Unrelated change?

Author

yeah, I will move it to another pr.
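
For context, the one-line change in the diff above adds an explicit result type to an implicit conversion. The following is a minimal, self-contained sketch of that pattern; `MiniDStream` and `MiniJavaPairDStream` are hypothetical stand-ins for Spark's `DStream` and `JavaPairDStream`, not the real classes.

```scala
import scala.language.implicitConversions

// Hypothetical stand-ins for DStream and JavaPairDStream (NOT the Spark classes).
class MiniDStream[T](val items: Seq[T])

class MiniJavaPairDStream[K, V](val inner: MiniDStream[(K, V)]) {
  def keys: Seq[K] = inner.items.map(_._1)
}

object MiniJavaPairDStream {
  // The result type is written out instead of being left to inference,
  // mirroring the shape of the change in the diff.
  implicit def fromPairDStream[K, V](s: MiniDStream[(K, V)]): MiniJavaPairDStream[K, V] =
    new MiniJavaPairDStream[K, V](s)
}

object Demo {
  def keysOf(s: MiniDStream[(String, Int)]): Seq[String] = {
    // The expected type triggers the implicit conversion from the companion object.
    val wrapped: MiniJavaPairDStream[String, Int] = s
    wrapped.keys
  }
}
```

Spelling out the result type is commonly recommended for implicit definitions, since an inferred type can make implicit resolution depend on where the definition appears in the file.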

@tgravescs
Contributor

Can you please file a JIRA for this? Can't you just use the "yarn" profile for this (instead of yarn-alpha)?
Is that because the Cloudera version (CDH 4.4.0+, i.e. 2.1.0-beta) has some of the APIs changed, but not all of the ones changed in Hadoop 2.2 and later?

@sryza can you confirm?

@srowen
Member

srowen commented Mar 19, 2014

@tgravescs FWIW I think something like this is the case, yes. The change happened in https://issues.apache.org/jira/browse/YARN-396 which occurred for the first(?) YARN TLP release with Hadoop 2.1. And CDH 4.4 was the first release I see that picked up this change. I assume it was useful/necessary to float this 'alpha' API ahead.

I also would have thought it's possible the yarn profile works with this release, but I do not know. Just making sure that has been tried?

Otherwise, yeah, it looks like a question of supporting another intermediate flavor of YARN here, since it changed in breaking ways several times between 0.23.x and 2.2.

@gzm55
Author

gzm55 commented Mar 19, 2014

@tgravescs @sryza here is another description of yarn-beta changes from yarn-alpha: http://hortonworks.com/blog/stabilizing-yarn-apis-for-apache-hadoop-2-beta-and-beyond/

@pwendell
Contributor

I think @sryza is off this week, but when he's back it would be good to get a sense of the various YARN APIs and whether this is something we'll have to deal with over multiple versions of CDH. The fragmentation here among CDH versions and the upstream project is unfortunate.

@srowen
Member

srowen commented Mar 19, 2014

TBC, this is an upstream YARN thing, and not anything specific to CDH. There are, to my knowledge, at least three incompatible versions, and they're all in use out there. It is shipped as non-stable everywhere before 2.2 and is actually a separate project from core Hadoop. It may in truth require 3 separate profiles, or a tweak to make 1 profile work across two versions. I had hoped yarn might happen to work across beta/stable, but I doubt that even that is true, because of protobuf. Worth verifying.

@tgravescs
Contributor

As you say, it is a YARN thing, but YARN was alpha/beta before the 2.2.0 release, with no API guarantees. YARN 0.23 was also a stable release. I would definitely hesitate to support all the various combinations of releases between those. Any user could have checked out Hadoop at any time, built it, and ended up with any combination of API changes. I think we should pick the ones we want to support. Even with just the 2 of them, building and testing things takes a lot of time.

@AmplabJenkins

Can one of the admins verify this patch?

@srowen
Member

srowen commented Apr 13, 2014

This is the same issue reported in https://issues.apache.org/jira/browse/SPARK-1479

@gzm55
Author

gzm55 commented Jul 16, 2014

Do we have a decision on whether to accept this patch, or a timeline for completely removing support for the unstable YARN APIs?

@pwendell
Contributor

We currently support two YARN versions: the stable APIs in Hadoop 2.2.0+ and the 0.23 release that Yahoo runs internally. The main reason we support Yahoo 0.23 is that @tgravescs, who is the primary YARN committer on Spark, has offered to maintain it. I'd love to see us move away from 0.23 support and only support YARN's stable APIs. It will depend a bit on when Yahoo upgrades; Tom might have information on that timeline (?).

In terms of supporting other offshoots of YARN that were packaged by vendors or in other intermediate releases, my feeling is that unless we have a committer come and champion this and agree to support it, we shouldn't do it.

@sryza
Contributor

sryza commented Jul 18, 2014

I'm with Patrick on this. Unfortunately we don't have resources at Cloudera right now to maintain Spark/YARN on CDH4.

@tgravescs
Contributor

I am hoping to deprecate Hadoop 0.23/yarn-alpha at some point, hopefully late this year, but we'll have to figure out in which Spark release it makes sense to deprecate it and the exact timeline for getting everything off of it.

@SparkQA

SparkQA commented Sep 5, 2014

Can one of the admins verify this patch?

@pwendell
Contributor

In that case let's close this issue. If there are a few users who are really dying to support this they can apply this patch manually.

@asfgit asfgit closed this in a48956f Sep 19, 2014
ericl pushed a commit to ericl/spark that referenced this pull request Dec 30, 2016
## What changes were proposed in this pull request?

This patch modifies the DB Spark ACL Client interface to use `Seq` instead of `Traversable`. There will be a corresponding patch on the Databricks side too.

## How was this patch tested?

- [x] Existing Tests
- [x] Manual Tests

Author: Sameer Agarwal <sameerag@cs.berkeley.edu>
Author: Srinath Shankar <srinath@databricks.com>

Closes apache#151 from sameeragarwal/branch-2.1-acl.
Igosuki pushed a commit to Adikteev/spark that referenced this pull request Jul 31, 2018
bzhaoopenstack pushed a commit to bzhaoopenstack/spark that referenced this pull request Sep 11, 2019
)

Currently we execute make build in pre.yaml of the k8s integration jobs to
compile code, but if a compile error happens, the job's test result
falls into RETRY_LIMIT status, which is confusing. We should move
make build into run.yaml so that a compile error produces an exact FAILURE status.

Closes apache#151
microbearz pushed a commit to microbearz/spark that referenced this pull request Aug 21, 2020
Co-authored-by: chenliang.lu <chenliang.lu@kyligence.io>
8 participants