[SPARK-20540][CORE] Fix unstable executor requests. #17813

Closed

Commits on May 1, 2017

  1. SPARK-20540: Fix unstable executor requests.

    There are two problems fixed in this commit. First, the
    ExecutorAllocationManager sets a timeout to avoid requesting executors
    too often. However, the next request time is always computed from its
    previous value plus the timeout, not from the current time. If a call
    is delayed by locking for longer than the ongoing scheduler timeout,
    the next request time falls behind the current time, and the manager
    requests more executors on every run until it catches up.
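
The timer problem can be illustrated with a small simulation. This is a minimal sketch, not Spark's code: `count_requests`, `run_times`, `timeout`, and `from_now` are hypothetical names chosen for illustration, and times are plain integers in seconds.

```python
def count_requests(run_times, timeout, from_now):
    """Count how many periodic checks end up issuing an executor request.

    run_times: times at which the periodic check actually runs
    timeout:   intended minimum gap between two requests
    from_now:  True  -> re-arm the deadline from the current time
               False -> add the timeout to the previous deadline
                        (the behavior the commit message describes as the bug)
    """
    deadline = run_times[0]  # assumption: the first check may request at once
    requests = 0
    for now in run_times:
        if now >= deadline:
            requests += 1
            base = now if from_now else deadline
            deadline = base + timeout
    return requests


# The check runs at t=0, is then stalled (for example by lock contention)
# until t=30, and afterwards runs once per second.
runs = [0, 30, 31, 32, 33, 34]

stale = count_requests(runs, timeout=5, from_now=False)  # 6: a request on every run
fixed = count_requests(runs, timeout=5, from_now=True)   # 2: one before, one after the stall
```

With the deadline derived from its own previous value, it stays behind the clock after the stall, so every run passes the check. Re-arming from the current time restores the intended gap.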
    
    The second problem is that the total number of requested executors is
    not tracked by the CoarseGrainedSchedulerBackend. Instead, it
    calculates the value from the current state of three variables: the
    number of known executors, the number of executors that have been
    killed, and the number of pending executors. However, the number of
    pending executors is never less than 0, even when more executors are
    known than were requested. When executors are killed and not replaced,
    the request sent to YARN can therefore be incorrect: the scheduler's
    state is slightly out of date and counts more executors than were
    requested.
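
The effect of clamping the pending count at zero can be shown with a few lines of arithmetic. This is a sketch under stated assumptions: `derived_total` is a hypothetical helper, and the formula is reconstructed from the description above rather than quoted from Spark.

```python
def derived_total(known, pending_removal, requested):
    """Recompute the executor total from scheduler state instead of tracking it.

    known:           executors the scheduler currently knows about
    pending_removal: executors that have been killed but are still counted as known
    requested:       the total the allocation manager actually asked for
    """
    # Assumption: pending is whatever is still missing, never less than zero.
    pending = max(requested - known + pending_removal, 0)
    return known + pending - pending_removal


# Normal case: fewer executors known than requested, the derived total matches.
normal = derived_total(known=3, pending_removal=0, requested=5)      # 5

# Stale case: 8 executors are known although only 5 were requested, and 2 are
# killed without replacement. Pending is clamped to 0, so the total sent to
# the cluster manager is 6 rather than a value based on the requested 5.
stale_state = derived_total(known=8, pending_removal=2, requested=5)  # 6
```

Tracking the requested total directly, as the commit message proposes, avoids deriving it from state that may lag behind.
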
    rdblue committed May 1, 2017
    3e46f4f