[SPARK-4759] Fix driver hanging from coalescing partitions #3633

Closed
wants to merge 4 commits into from

andrewor14
Contributor

The driver sometimes hangs when we coalesce RDD partitions. See the JIRA (SPARK-4759) for more details and a reproduction.

This is because our use of the empty string as the default preferred location in CoalescedRDDPartition causes the TaskSetManager (TSM) to schedule the corresponding task on host "" (empty string). The intended semantics, however, is that the partition has no preferred location, and the TSM should schedule the corresponding task accordingly.
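
To illustrate the distinction described above, here is a minimal sketch of how a locality-aware queue might treat `""` versus an absent location. It is a simplified, hypothetical model and not Spark's actual `TaskSetManager`; the names `PendingTasks`, `byHost`, `noPrefs`, `hostsFromString`, and `hostsFromOption` are assumptions made for this example only.

```scala
import scala.collection.mutable

// Simplified, hypothetical model of locality-aware task queuing.
// This is NOT Spark's TaskSetManager; all names here are illustrative.
object PreferredLocationSketch {
  final class PendingTasks {
    // Tasks queued per preferred host.
    val byHost = mutable.Map.empty[String, mutable.ArrayBuffer[Int]]
    // Tasks that may run on any host.
    val noPrefs = mutable.ArrayBuffer.empty[Int]

    // A task with no preferred hosts goes to noPrefs;
    // otherwise it is queued once per preferred host.
    def add(taskId: Int, preferredHosts: Seq[String]): Unit = {
      if (preferredHosts.isEmpty) {
        noPrefs += taskId
      } else {
        preferredHosts.foreach { host =>
          byHost.getOrElseUpdate(host, mutable.ArrayBuffer.empty[Int]) += taskId
        }
      }
    }
  }

  // Old convention: "" stands for "no preference", but the result
  // is still a one-element list containing the host "".
  def hostsFromString(prefLoc: String): Seq[String] = Seq(prefLoc)

  // Option-based convention: absence is explicit, so None
  // becomes an empty list of preferred hosts.
  def hostsFromOption(prefLoc: Option[String]): Seq[String] = prefLoc.toSeq
}
```

In this model, `hostsFromString("")` queues the task under the host `""`, which no real executor will ever match, while `hostsFromOption(None)` places it in `noPrefs`, where any host can pick it up.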

@SparkQA

SparkQA commented Dec 8, 2014

Test build #24219 has started for PR 3633 at commit 2f7dfb6.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Dec 8, 2014

Test build #24219 has finished for PR 3633 at commit 2f7dfb6.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24219/

@SparkQA

SparkQA commented Dec 8, 2014

Test build #24231 has started for PR 3633 at commit f370a4e.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Dec 8, 2014

Test build #24231 has finished for PR 3633 at commit f370a4e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24231/

@andrewor14 andrewor14 changed the title from "[SPARK-4759] Avoid using empty string as default preferred location" to "[SPARK-4759] Fix driver hanging from coalescing partitions" on Dec 9, 2014
  def size = arr.size
}

private object PartitionGroup {
  def apply(prefLoc: String): PartitionGroup = PartitionGroup(Some(prefLoc))
Contributor

Is this for backwards-compatibility? Can you have a case class with multiple constructors? If so, it might be nice to just add this to the PartitionGroup class as a secondary constructor.

Also, what do you think about adding a require(prefLoc != "") to guard against code that uses the old empty string technique?

Contributor Author

This is mainly because you can instantiate case classes without the new keyword. In fact, this is how we instantiate this particular class in this file. Adding a secondary constructor means we would need the new keyword to call it, and I believe many users of case classes don't actually do that. (Also, this is a private class, so this argument probably doesn't matter at all.)

Yeah, I'll add the guard against empty string.
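
For readers following the thread, the companion-object approach with the suggested guard could look roughly like the sketch below. It is an illustration under assumptions rather than the code that was merged: the wrapper object `PartitionGroupSketch`, the `Int` element type of `arr`, and the error message text are all hypothetical, and the class is left non-private so the example can be exercised from outside.

```scala
import scala.collection.mutable

// Illustrative sketch only; wrapper object, element type, and
// error message are assumptions, not taken from the PR.
object PartitionGroupSketch {
  // No preferred location is modelled as None rather than "".
  case class PartitionGroup(prefLoc: Option[String] = None) {
    // Element type simplified to Int for this sketch.
    val arr = mutable.ArrayBuffer.empty[Int]
    def size: Int = arr.size
  }

  object PartitionGroup {
    // Overloaded factory so call sites can keep writing
    // PartitionGroup("host") without the new keyword.
    def apply(prefLoc: String): PartitionGroup = {
      // Guard against callers still using the old empty-string convention.
      require(prefLoc != "", "Preferred location must not be empty")
      PartitionGroup(Some(prefLoc))
    }
  }
}
```

With this shape, `PartitionGroup("host1")` and `PartitionGroup()` both work without `new`, while `PartitionGroup("")` fails fast with an `IllegalArgumentException` from `require`.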

@JoshRosen
Contributor

Left a couple of small comments, but this looks good to me overall. +1 for refactoring "stringly-typed" fields into better types.

@SparkQA

SparkQA commented Dec 10, 2014

Test build #24283 has started for PR 3633 at commit 3ebf8bd.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Dec 10, 2014

Test build #24283 has finished for PR 3633 at commit 3ebf8bd.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24283/

@SparkQA

SparkQA commented Dec 10, 2014

Test build #24292 has started for PR 3633 at commit e520d6b.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Dec 10, 2014

Test build #24292 has finished for PR 3633 at commit e520d6b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24292/

@andrewor14
Contributor Author

Alright, I'm merging this into master and branch-1.1. Thanks for the comments. I'll backport this into branch-1.2 later.

asfgit pushed a commit that referenced this pull request Dec 10, 2014
The driver hangs sometimes when we coalesce RDD partitions. See JIRA for more details and reproduction.

This is because our use of empty string as default preferred location in `CoalescedRDDPartition` causes the `TaskSetManager` to schedule the corresponding task on host `""` (empty string). The intended semantics here, however, is that the partition does not have a preferred location, and the TSM should schedule the corresponding task accordingly.

Author: Andrew Or <andrew@databricks.com>

Closes #3633 from andrewor14/coalesce-preferred-loc and squashes the following commits:

e520d6b [Andrew Or] Oops
3ebf8bd [Andrew Or] A few comments
f370a4e [Andrew Or] Fix tests
2f7dfb6 [Andrew Or] Avoid using empty string as default preferred location

(cherry picked from commit 4f93d0c)
Signed-off-by: Andrew Or <andrew@databricks.com>
@asfgit asfgit closed this in 4f93d0c Dec 10, 2014
@andrewor14 andrewor14 deleted the coalesce-preferred-loc branch December 10, 2014 22:50
asfgit pushed a commit that referenced this pull request Jan 21, 2015