-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-6304][Streaming] Fix checkpointing doesn't retain driver port issue. #5060
Conversation
Test build #28698 has finished for PR 5060 at commit
|
Hi @tdas , I think this is actually a bug, would you please help to review this one? |
Since it is pending for a long time. |
Test build #30926 has finished for PR 5060 at commit
|
@@ -94,6 +94,11 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli | |||
// contains a map from hostname to a list of input format splits on the host. | |||
private[spark] var preferredNodeLocationData: Map[String, Set[SplitInfo]] = Map() | |||
|
|||
// This is used for Spark Streaming to check whether driver host and port are set by user, | |||
// if these two configurations are set by user, so the recovery mechanism should not remove this. | |||
private[spark] val isDriverHostSetByUser = config.contains("spark.driver.host") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't seem worth tacking on yet more little fields in SparkContext
just for a niche use case in a submodule. Use the config object in Checkpoint
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure it is a good idea to clutter SparkContext further with such functions, especially when Spark core itself does not use it. Would be good to find a different solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I think there has to be a place in Spark Core to judge whether this configuration is set by user or Spark itself before SparkContext is initialized, either in SparkConf or somewhere else. It cannot be gotten from Spark Streaming, where all the SparkContext things have already been initialized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how to track this in Checkpoint
, since SparkEnv
will reset this configuration if user not set it, so in the Checkpoint
how to differentiate whether this is set by user or SparkEnv
?
I thought more about this particular issue. These two lines in the SparkContext actually sets the two parameters explicitly with host = local hostname and port = 0, ONLY IF they are not set from the user-provided conf. For each of them here is my observation.
I think this simplifies the whole idea - never save host, always save port. Isnt it? |
Test build #31052 has started for PR 5060 at commit |
I see, that's the problem. Its not clear from the streaming code what the value that the user had set before the SparkEnv set the port value. Without that this problem cannot be solved. |
bump. What is the status on this PR @jerryshao @tdas? |
retest this please |
Hi @andrewor14 , I think this is indeed a bug for checkpointing in Spark Streaming, my implementation just add two fields in SparkContext to save the snapshot of this two configurations, but this is no so elegant as TD suggested, Currently I cannot figure out any other better place to hold this port number. So any suggestion is greatly appreciated. |
Test build #35191 has finished for PR 5060 at commit
|
The tricky part for this PR is to figure out when the port was specified by On Thu, Jun 18, 2015 at 6:39 PM, UCB AMPLab notifications@github.com
|
@jerryshao let finalize this PR. I think what we should do is. Currently, we remove spark.driver.* from conf used to create the recovered streaming context, ignoring the fact that the user may be explicitly setting those conf in spark-defaults.conf. That is wrong, the general policy should be never recover from spark.driver.* from checkpointed conf. Then if the those properties are set in the defaults, they would be present in the final conf for restarting context, other they wont be. This solves the original problem in the JIRA. If someone wants to set the port explicitly, then they can set if in the spark-defaults.conf. With the above change, it will not be explicitly deleted when recovering and will be automatically used in the recovered context. Sounds good? If so, please update the PR. |
Yeah, that sounds good :), let me update the code. |
@@ -50,7 +50,9 @@ class Checkpoint(@transient ssc: StreamingContext, val checkpointTime: Time) | |||
val propertiesToReload = List( | |||
"spark.master", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you keep these alphabetically sorted. Looks cleaner.
Test build #37341 has finished for PR 5060 at commit
|
val newCpConf = newCp.createSparkConf() | ||
assert(newCpConf.contains("spark.driver.host")) | ||
assert(newCpConf.contains("spark.driver.port")) | ||
assert(newCpConf.get("spark.driver.host") === "localhost") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This just tests whether its correctly set in the new conf when it is set as system property. you should also test the other case, where the new conf does not have them when they are not in the property, even though it was present in the original conf.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I will update the test
Test build #37352 has finished for PR 5060 at commit
|
Test build #37446 has finished for PR 5060 at commit
|
All right merging this. |
👍 Thanks guys! (I was the original JIRA author.) |
No description provided.