Dr. Elephant Tez Support working patch #313

Merged (6 commits) on Apr 4, 2018

@chinmayms chinmayms commented Dec 14, 2017

Dr. Elephant Version with support for Tez based on patch (PR#278).

Summary:
Working Version with Tez Support currently running on Electronic Arts Production Data Infrastructure.

  • Added 10 heuristics to analyze and flag Tez jobs.
  • Added Resources used and Resources wasted logic based on MR version.
  • Added 'flow history' and 'job history' graph support for Tez.
  • Changed DAG selection: when one application contains multiple DAGs, the DAG with the highest number of tasks is analyzed instead of the first DAG.
  • Replicated mapreduce test suite for Tez fetcher, heuristics and data structure classes.
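
The DAG selection rule in the list above can be illustrated with a minimal sketch. `DagInfo` and `DagSelector` are hypothetical names, not classes from the patch; the sketch only assumes that each DAG exposes an id and a task count, and it keeps the first DAG when several share the highest count.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical holder for the two fields the selection needs; not a class from the patch.
final class DagInfo {
  final String dagId;
  final int taskCount;

  DagInfo(String dagId, int taskCount) {
    this.dagId = dagId;
    this.taskCount = taskCount;
  }
}

final class DagSelector {
  private DagSelector() {
  }

  // Returns the DAG with the most tasks, or empty if the application has no DAGs.
  // The strict comparison keeps the earliest DAG when task counts are tied.
  static Optional<DagInfo> pickLargest(List<DagInfo> dags) {
    DagInfo best = null;
    for (DagInfo dag : dags) {
      if (best == null || dag.taskCount > best.taskCount) {
        best = dag;
      }
    }
    return Optional.ofNullable(best);
  }
}
```

Returning an `Optional` makes the "application with no DAGs" case explicit for the caller, which is a design choice of this sketch rather than something stated in the PR.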

Future Work:
Fundamentally, one Tez application returned by the RM may contain multiple DAGs, but the original Dr. Elephant interfaces do not allow analysis below the application level, which is the granularity used for MapReduce. Design changes are needed to allow vertex- and edge-level analysis.

@chinmayms chinmayms changed the title from "EA Tez Working Commit" to "Dr. Elephant Tez Support working patch" on Dec 14, 2017

@shankar37 shankar37 left a comment

Overall looks good from a Dr. Elephant design perspective. I did not look at the accuracy of the fetcher logic or heuristics code.
@akshayrai, can you take a look as well?

private List<Long> finishTimes = new ArrayList<Long>();
private List<Long> durations = new ArrayList<Long>();

private static final double MEMORY_BUFFER = 1.5;

make these configurable
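
One way to act on this comment is to read the buffer factor from a parameter map and fall back to the current hard-coded value. This is only a sketch: `MemoryBufferConfig` and the `memory_buffer` key are hypothetical names, and a plain `Map<String, String>` stands in for whatever parameter source the heuristic or aggregator actually receives.

```java
import java.util.Map;

final class MemoryBufferConfig {
  // Hypothetical parameter name; the patch does not define one.
  static final String MEMORY_BUFFER_PARAM = "memory_buffer";
  // Same value as the hard-coded constant under review.
  static final double DEFAULT_MEMORY_BUFFER = 1.5;

  private MemoryBufferConfig() {
  }

  // Reads the buffer factor from the parameter map, falling back to the default
  // when the key is missing or the value is not a positive finite number.
  static double memoryBuffer(Map<String, String> params) {
    String raw = params.get(MEMORY_BUFFER_PARAM);
    if (raw == null) {
      return DEFAULT_MEMORY_BUFFER;
    }
    try {
      double value = Double.parseDouble(raw.trim());
      boolean usable = value > 0 && !Double.isInfinite(value);
      return usable ? value : DEFAULT_MEMORY_BUFFER;
    } catch (NumberFormatException e) {
      return DEFAULT_MEMORY_BUFFER;
    }
  }
}
```

Keeping 1.5 as the default means existing deployments would behave the same unless the parameter is set.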

@@ -29,6 +29,13 @@
</fetcher>
-->
<fetchers>
<!--
REST based fetcher for Tez jobs which pulls job metrics and data from Timeline Server API

Call out this dependency in the docs and setup instructions.

@akshayrai akshayrai left a comment

@chinmayms, I just quickly scanned through the PR and overall it looks good. Couple of issues to discuss.

  • Are all the heuristics copied over from MR? If so, can we consider maintaining a single copy of the heuristics rather than so much duplicated code?
  • Please apply and use the Apache license header in all the files.
  • Use 2 space formatting throughout.
  • It would be great if someone with Tez background can also take a look at this PR.

@chinmayms

@akshayrai , thanks for your review. Addressing your issues one by one. Let me know your thoughts.

  1. The heuristics use the same logic as the original MR ones; however, while implementing the Heuristic interface, they internally use Tez-specific data structures and classes in the "apply" function. We could make design-level changes to have standard heuristics for both, but even then they would need to implement Tez-specific features at some point. I am ready to take this effort forward in future versions if you think it's a good idea.
  2. Applied the Apache license header to all files.
  3. Converted indentation to 2 spaces.
  4. Will look for Tez specific changes once @shkhrgpt completes his review.
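
The trade-off discussed in point 1 can be sketched as follows: the rule itself is written once against an engine-neutral view, and each engine supplies a thin adapter over its own data structures. All names here (`TaskView`, `LongTaskRule`, `TezTaskView`) are hypothetical and are not part of Dr. Elephant or this patch; the rule is a made-up example, not one of the ten heuristics.

```java
import java.util.List;

// Hypothetical engine-neutral view of a task; not an interface from Dr. Elephant.
interface TaskView {
  long runtimeMs();
}

// Hypothetical shared rule: flags a job whose slowest task runs far longer than the mean.
final class LongTaskRule {
  private final double ratioLimit;

  LongTaskRule(double ratioLimit) {
    this.ratioLimit = ratioLimit;
  }

  // True when the maximum runtime exceeds ratioLimit times the mean runtime.
  boolean isFlagged(List<? extends TaskView> tasks) {
    if (tasks.isEmpty()) {
      return false;
    }
    long max = 0;
    long total = 0;
    for (TaskView task : tasks) {
      max = Math.max(max, task.runtimeMs());
      total += task.runtimeMs();
    }
    double mean = (double) total / tasks.size();
    return max > ratioLimit * mean;
  }
}

// Stand-in adapter; a real one would wrap the Tez-specific task data class.
final class TezTaskView implements TaskView {
  private final long runtimeMs;

  TezTaskView(long runtimeMs) {
    this.runtimeMs = runtimeMs;
  }

  @Override
  public long runtimeMs() {
    return runtimeMs;
  }
}
```

Even in this shape, the Tez-specific part does not disappear; it moves into the adapter, which matches the point that some Tez-specific code is needed either way.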


public TezFetcher(FetcherConfigurationData fetcherConfData) throws IOException {
this._fetcherConfigurationData = fetcherConfData;
final String applicationHistoryAddr = new Configuration().get("yarn.timeline-service.webapp.address");

Declare yarn.timeline-service.webapp.address as a private static final variable.
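
A minimal sketch of the suggested change is shown here. `TimelineAddressLookup` and the constant name `TIMELINE_SERVER_URL` are hypothetical, and `java.util.Properties` stands in for Hadoop's `Configuration` only so the example runs without Hadoop on the classpath.

```java
import java.util.Properties;

final class TimelineAddressLookup {
  // The configuration key named in the review comment, declared once.
  private static final String TIMELINE_SERVER_URL = "yarn.timeline-service.webapp.address";

  private TimelineAddressLookup() {
  }

  // Properties is a stand-in for Hadoop's Configuration (an assumption of this sketch).
  // Returns null when the key is not set.
  static String timelineAddress(Properties conf) {
    return conf.getProperty(TIMELINE_SERVER_URL);
  }
}
```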

}
}

final class ThreadContextMR2 {

ThreadContextMR2 is also used by MapReduceFetcherHadoop2. Refactor code here so there is only one instance of ThreadContextMR2.

}
}

if(mapperListAggregate.isEmpty() && reducerListAggregate.isEmpty()){

Why not move this check to after line 113, with something like

if (state.equals("FAILED")) {
  jobData.setSucceeded(false);
}

@shkhrgpt shkhrgpt left a comment

+1
LGTM.
Thanks for writing this change.

@ray-harrison

@chinmayms Our team at Comcast also has a Tez PR, and we're also working with the Pepperdata folks to roll this functionality in. Would you be willing to have a short call at a convenient time with me and a couple of folks here to maybe combine forces?

@chinmayms

@ray-harrison
Sure, sounds good.
Can we do it on Tuesday 03/27 or Wednesday 03/28, between 1:30 pm and 3:30 pm?
Let me know if these don't work for you.

@chrevanthreddy

@chinmayms Can we do it on Wednesday at 1:30 PM EST, please? I will send an invite accordingly.

@shkhrgpt

@ray-harrison and @chrevanthreddy, thanks for initiating the conversation. I would also like to be part of this discussion; can you please send me an invite too?

@chinmayms

chinmayms commented Mar 26, 2018

@chrevanthreddy Sure. 1:30 EST on Wednesday works. Thanks

@chrevanthreddy

chrevanthreddy commented Mar 26, 2018

@shkhrgpt Could you please let me know where I can send you the invite? @chinmayms I have sent the invite to your Gmail Id.

@shkhrgpt

@chrevanthreddy Please send it to sgupta@pepperdata.com

@akshayrai

+1 LGTM

@chinmayms

@akshayrai @shankar37
As discussed, could we merge this with linkedin:master?

@shkhrgpt

shkhrgpt commented Apr 3, 2018

+1 for merging.

@akshayrai akshayrai merged commit a0470a3 into linkedin:master Apr 4, 2018
akshayrai added a commit that referenced this pull request Jun 14, 2018
akshayrai pushed a commit that referenced this pull request Jun 14, 2018
…tribution.

This reverts commit e3fd598.

Co-authored-by: Abhishek Das <abhishekdas99@users.noreply.github.com>
arpang pushed a commit to arpang/dr-elephant that referenced this pull request Jul 11, 2018
arpang pushed a commit to arpang/dr-elephant that referenced this pull request Jul 11, 2018
…uding attribution.

This reverts commit e3fd598.

Co-authored-by: Abhishek Das <abhishekdas99@users.noreply.github.com>
pralabhkumar pushed a commit that referenced this pull request Aug 9, 2018
* Revert "Dr. Elephant Tez Support working patch (#313)"

This reverts commit a0470a3.

* Rerevert "Dr. Elephant Tez Support working patch (#313)" including attribution.

This reverts commit e3fd598.

Co-authored-by: Abhishek Das <abhishekdas99@users.noreply.github.com>

* Auto tuning: Support for parameter set multi-try (#386)

* Changes in some of the Spark Heuristics

* Adding test for changes executor gc heuristic and unified memory heuristic

* Update ExecutorGcHeuristic.scala

* Update UnifiedMemoryHeuristic.scala

* Changed some hard coded values to variables

* Due to strict inequality changing the other thereshold levels for executor and driver
varunsaxena added a commit that referenced this pull request Aug 30, 2018
varunsaxena added a commit that referenced this pull request Aug 30, 2018
pralabhkumar pushed a commit to pralabhkumar/dr-elephant that referenced this pull request Aug 31, 2018
pralabhkumar pushed a commit to pralabhkumar/dr-elephant that referenced this pull request Aug 31, 2018
pralabhkumar pushed a commit to pralabhkumar/dr-elephant that referenced this pull request Aug 31, 2018
…uding attribution.

This reverts commit e3fd598.

Co-authored-by: Abhishek Das <abhishekdas99@users.noreply.github.com>
pralabhkumar pushed a commit to pralabhkumar/dr-elephant that referenced this pull request Aug 31, 2018
varunsaxena pushed a commit that referenced this pull request Oct 16, 2018