Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SPARK-1830 Deploy failover, Make Persistence engine and LeaderAgent Pluggable #771

Closed

Conversation

ScrapCodes
Copy link
Member

No description provided.

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@AmplabJenkins
Copy link

Merged build finished.

@AmplabJenkins
Copy link

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14979/

@ScrapCodes
Copy link
Member Author

Jenkins, test this please.

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@AmplabJenkins
Copy link

Merged build finished. All automated tests passed.

@AmplabJenkins
Copy link

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15021/

@ScrapCodes
Copy link
Member Author

@aarondav Mind taking a look ?

@aarondav
Copy link
Contributor

@ScrapCodes, I definitely like the idea. I think we need to update the API a little before it's ready to be publicly accessible, though. First, LeaderElectionAgent and PersistenceEngine should be @DeveloperApi and not private[spark].

Second, LeaderElectionAgent should be refactored to not be an Actor (there is a TODO since the Curator refactor, I think it just needs to be done before we can finalize this API). This should be pretty straightforward to do, I would just add a trait like

trait LeaderElectable {
  def electedLeader()
  def revokedLeadership()
}

which Master implements by simply calling self ! ElectedLeader, for instance, to ensure we maintain the thread safety of the actor without leaking the actor abstraction.

@@ -143,6 +146,9 @@ private[spark] class Master(
leaderElectionAgent = RECOVERY_MODE match {
case "ZOOKEEPER" =>
context.actorOf(Props(classOf[ZooKeeperLeaderElectionAgent], self, masterUrl, conf))
case "CUSTOM" =>
val clazz = Class.forName(conf.get("spark.deploy.recoveryMode.leaderAgentActor"))
context.actorOf(Props(clazz, self, masterUrl, conf))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be a good idea to abstract out the creation of the persistenceEngine and leaderElectionAgent into something like

abstract class StandaloneRecoveryModeFactory(conf: SparkConf) {
  def createPersistenceEngine(): PersistenceEngine
  def createLeaderElectionAgent(master: LeaderElectable, masterUrl: String): LeaderElectionAgent
}

with 2 subclasses, FileSystemRecoveryModeFactory and ZooKeeperRecoveryModeFactory (maybe the name "RecoveryModeFactory" is not so good).

The advantage of this approach is that we formalize the arguments needed for constructing the 2 APIs, and centralize the creation of everything necessary for one recovery mode. Thoughts?

@AmplabJenkins
Copy link

Merged build triggered.

@ScrapCodes
Copy link
Member Author

@aarondav I am not 100% sure about the idea but for a moment I felt may be we could club PersistenceEngine and LeaderElectionAgent together into one and call it FailoverAgent ? I am not supporting it but it sort of occurred to me so thought of sharing it. Take a look at this updated patch !

@AmplabJenkins
Copy link

Merged build started.

@AmplabJenkins
Copy link

Merged build finished.

@AmplabJenkins
Copy link

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15479/

@aarondav
Copy link
Contributor

aarondav commented Jun 5, 2014

That does sound reasonable, especially since the LeaderElectionAgent relies on the behavior of the PersistenceEngine in order to actually enable recovery.

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@AmplabJenkins
Copy link

Merged build finished.

@AmplabJenkins
Copy link

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15496/

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@AmplabJenkins
Copy link

Merged build finished. All automated tests passed.

@AmplabJenkins
Copy link

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15498/

@ScrapCodes
Copy link
Member Author

Hey @aarondav, Do you think its worth having in its current condition ? I can rebase it ofcourse. I was actually unsure of changing it further.

@pwendell
Copy link
Contributor

@aarondav could you give feedback one way or the other on this?

val out = new FileOutputStream(file)
out.write(serialized)
out.write(serialized.array())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we could use serializeStream() instead to avoid relying on a ByteBuffer's array()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not understand, what you meant here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was just suggesting that we could do

val out = serializer.serializeStream(new FileOutputStream(file))
out.writeObject(value)
out.close()

It's a very minor point about ByteBuffers not always having a valid array().

@DeveloperApi
trait PersistenceEngine {

def persist(name: String, obj: Object)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add documentation to these three methods to make the requirements clear.

…luggable.

Refactored Leader Election agent and added a RecoveryModeFactory.

Implemented new proposal with some convenient modifications.

Added a read method.
@SparkQA
Copy link

SparkQA commented Oct 22, 2014

QA tests have started for PR 771 at commit fef35ec.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Oct 22, 2014

QA tests have finished for PR 771 at commit fef35ec.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • trait LeaderElectionAgent
    • trait LeaderElectable
    • trait PersistenceEngine
    • abstract class StandaloneRecoveryModeFactory(conf: SparkConf)

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22028/
Test FAILed.

@SparkQA
Copy link

SparkQA commented Oct 22, 2014

QA tests have started for PR 771 at commit 29ba440.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Oct 22, 2014

QA tests have finished for PR 771 at commit 29ba440.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • trait LeaderElectionAgent
    • trait LeaderElectable
    • trait PersistenceEngine
    • abstract class StandaloneRecoveryModeFactory(conf: SparkConf)

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22030/
Test FAILed.

@ScrapCodes
Copy link
Member Author

Jenkins, retest this please.

@SparkQA
Copy link

SparkQA commented Oct 24, 2014

Test build #22127 has started for PR 771 at commit 29ba440.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Oct 24, 2014

Test build #22127 has finished for PR 771 at commit 29ba440.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • trait LeaderElectionAgent
    • trait LeaderElectable
    • trait PersistenceEngine
    • abstract class StandaloneRecoveryModeFactory(conf: SparkConf)

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22127/
Test PASSed.

@ScrapCodes
Copy link
Member Author

Did I miss something ?

@aarondav
Copy link
Contributor

LGTM, merging into master. Thanks!

@asfgit asfgit closed this in deefd9d Nov 11, 2014
tianyi pushed a commit to asiainfo/spark that referenced this pull request Dec 4, 2014
…luggable

Author: Prashant Sharma <prashant.s@imaginea.com>

Closes apache#771 from ScrapCodes/deploy-failover-pluggable and squashes the following commits:

29ba440 [Prashant Sharma] fixed a compilation error
fef35ec [Prashant Sharma] Code review
57ee6f0 [Prashant Sharma] SPARK-1830 Deploy failover, Make Persistence engine and LeaderAgent Pluggable.
tianyi pushed a commit to asiainfo/spark that referenced this pull request Dec 4, 2014
…luggable

Author: Prashant Sharma <prashant.s@imaginea.com>

Closes apache#771 from ScrapCodes/deploy-failover-pluggable and squashes the following commits:

29ba440 [Prashant Sharma] fixed a compilation error
fef35ec [Prashant Sharma] Code review
57ee6f0 [Prashant Sharma] SPARK-1830 Deploy failover, Make Persistence engine and LeaderAgent Pluggable.
tianyi pushed a commit to asiainfo/spark that referenced this pull request Dec 4, 2014
…luggable

Author: Prashant Sharma <prashant.s@imaginea.com>

Closes apache#771 from ScrapCodes/deploy-failover-pluggable and squashes the following commits:

29ba440 [Prashant Sharma] fixed a compilation error
fef35ec [Prashant Sharma] Code review
57ee6f0 [Prashant Sharma] SPARK-1830 Deploy failover, Make Persistence engine and LeaderAgent Pluggable.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants