[SPARK-32003][CORE] When external shuffle service is used, unregister outputs for executor on fetch failure after executor is lost #28848

Closed
wants to merge 10 commits into from

Commits on Jul 15, 2020

  1. [SPARK-32003][CORE] Unregister outputs for executor on fetch failure …

    …after executor is lost
    
    If an executor is lost, the `DAGScheduler` handles the executor loss by
    removing the executor but does not unregister its outputs if the external
    shuffle service is used. However, if the node on which the executor runs
    is lost, the shuffle service may not be able to serve the shuffle files.
    In such a case, when fetches from the executor's outputs fail in the
    same stage, the `DAGScheduler` again removes the executor and should
    also unregister its outputs. It does not, because the epoch used to track
    the executor failure has not increased.
    
    We track the epoch for failed executors that result in lost file output
    separately, so we can unregister the outputs in this scenario.
    wypoon committed Jul 15, 2020
    e36b442
  2. [SPARK-32003] Update fileLostEpoch.

    I inadvertently left out a line when transferring code. The fileLostEpoch
    needs to be updated with an entry for the failed executor with lost output.
    
    Adopted suggestions from wuyi and attilapiros.
    wypoon committed Jul 15, 2020
    fca8a6e
  3. [SPARK-32003] Clean up test.

    wypoon committed Jul 15, 2020
    973e385
  4. b9e55a4
  5. 06ea411
  6. 17393eb
  7. d450c3e
  8. a8e619c
  9. 1923598
  10. 0e00862
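
The epoch bookkeeping described in the first two commit messages can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the `DAGScheduler` code from this PR: the class `EpochTracker`, the method `handle_failure`, the `unregistered` list, and the `failed_epoch` map are hypothetical names chosen for illustration; only `fileLostEpoch` (here `file_lost_epoch`) is named in the commits. It shows why a second map lets a later fetch failure unregister outputs even though the executor-failure epoch has not increased.

```python
class EpochTracker:
    """Toy model of tracking two epochs per executor (hypothetical, not Spark code)."""

    def __init__(self):
        # exec_id -> epoch at which the executor loss was last handled
        self.failed_epoch = {}
        # exec_id -> epoch at which lost shuffle output was last handled
        self.file_lost_epoch = {}
        # Record of executors whose outputs were unregistered, in order
        self.unregistered = []

    def handle_failure(self, exec_id, current_epoch, file_lost):
        # Executor removal is gated on the executor-failure epoch only.
        if self.failed_epoch.get(exec_id, -1) < current_epoch:
            self.failed_epoch[exec_id] = current_epoch

        # Output unregistration is gated on its own epoch, so it can still
        # happen when the executor failure was already seen at this epoch.
        if file_lost and self.file_lost_epoch.get(exec_id, -1) < current_epoch:
            self.file_lost_epoch[exec_id] = current_epoch
            self.unregistered.append(exec_id)


tracker = EpochTracker()

# Executor lost while the external shuffle service is in use:
# the executor is removed, but its outputs are kept.
tracker.handle_failure("exec-1", current_epoch=5, file_lost=False)

# A fetch from that executor's outputs fails in the same stage (same epoch):
# the outputs are now treated as lost and are unregistered.
tracker.handle_failure("exec-1", current_epoch=5, file_lost=True)

# A repeated fetch failure at the same epoch does nothing further.
tracker.handle_failure("exec-1", current_epoch=5, file_lost=True)
```

With a single shared epoch map, the second call would have been ignored as an already-handled failure, which is the behavior the PR description identifies as the bug.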