Dr. Elephant Tez Support working patch #313

chinmayms · 2017-12-14T18:53:06Z

Dr. Elephant Version with support for Tez based on patch (PR#278).

Summary:
Working Version with Tez Support currently running on Electronic Arts Production Data Infrastructure.

Added 10 heuristics to analyze and flag Tez jobs.
Added 'resources used' and 'resources wasted' logic based on the MR version.
Added 'flow history' and 'job history' graph support for Tez.
When one application contains multiple DAGs, the DAG with the highest number of tasks is now analyzed instead of the first DAG.
Replicated mapreduce test suite for Tez fetcher, heuristics and data structure classes.
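
To illustrate the DAG selection rule described in the summary, here is a minimal sketch. `DagSummary` and `DagSelector` are hypothetical names used only for this example and are not classes from the patch; breaking ties in favor of the earliest DAG is also an assumption.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for per-DAG data; not a class from the patch.
final class DagSummary {
  final String dagId;
  final int taskCount;

  DagSummary(String dagId, int taskCount) {
    this.dagId = dagId;
    this.taskCount = taskCount;
  }
}

final class DagSelector {
  private DagSelector() {}

  // Returns the DAG with the most tasks, or empty when the list has no DAGs.
  // Assumption: on a tie, the DAG that appears first in the list is kept.
  static Optional<DagSummary> pickLargest(List<DagSummary> dags) {
    DagSummary best = null;
    for (DagSummary dag : dags) {
      if (best == null || dag.taskCount > best.taskCount) {
        best = dag;
      }
    }
    return Optional.ofNullable(best);
  }
}
```

Returning an `Optional` keeps the "application with no DAGs" case explicit for the caller instead of relying on a null check.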

Future Work:
Fundamentally, one Tez application returned by the RM may contain multiple DAGs. The original Dr. Elephant interfaces do not allow us to go any lower than the application level, as is the case for MapReduce. Design changes are needed to allow Vertex- and Edge-level analysis.

shankar37

Overall looks good from the Dr. Elephant design perspective. I did not look at the accuracy of the fetcher logic or the heuristics code.
@akshayrai can you take a look as well?

shankar37 · 2018-02-23T09:50:11Z

app/com/linkedin/drelephant/tez/TezTaskLevelAggregatedMetrics.java

+    private List<Long> finishTimes = new ArrayList<Long>();
+    private List<Long> durations = new ArrayList<Long>();
+
+    private static final double MEMORY_BUFFER = 1.5;


Make these configurable.
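
One possible shape for this, as a sketch only: read the buffer from configuration and fall back to the current value of 1.5. The property name `tez.aggregated.metrics.memory.buffer` and the class `MemoryBufferConfig` are hypothetical, and `java.util.Properties` stands in for whatever configuration object the project actually passes around.

```java
import java.util.Properties;

final class MemoryBufferConfig {
  // Hypothetical property name, used only for this sketch.
  static final String MEMORY_BUFFER_KEY = "tez.aggregated.metrics.memory.buffer";
  // Default taken from the hard-coded MEMORY_BUFFER in the diff above.
  static final double DEFAULT_MEMORY_BUFFER = 1.5;

  private MemoryBufferConfig() {}

  // Returns the configured buffer, or the default when the value is
  // missing, unparsable, non-positive, NaN, or infinite.
  static double memoryBuffer(Properties props) {
    String raw = props.getProperty(MEMORY_BUFFER_KEY);
    if (raw == null) {
      return DEFAULT_MEMORY_BUFFER;
    }
    try {
      double value = Double.parseDouble(raw.trim());
      boolean usable = value > 0 && !Double.isInfinite(value);
      return usable ? value : DEFAULT_MEMORY_BUFFER;
    } catch (NumberFormatException e) {
      return DEFAULT_MEMORY_BUFFER;
    }
  }
}
```

Falling back to the default on bad input keeps existing behavior unchanged for deployments that never set the property.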

shankar37 · 2018-02-23T10:01:44Z

app-conf/FetcherConf.xml

@@ -29,6 +29,13 @@
  </fetcher>
 -->
 <fetchers>
+  <!--
+     REST based fetcher for Tez jobs which pulls job metrics and data from Timeline Server API


Call out this dependency in the docs and setup instructions.

akshayrai

@chinmayms, I just quickly scanned through the PR and overall it looks good. A couple of issues to discuss.

Are all the heuristics copied over from MR? If so, can we consider maintaining a single copy of the heuristics rather than so much duplicate code?
Please apply and use the Apache license header in all the files.
Use 2 space formatting throughout.
It would be great if someone with Tez background can also take a look at this PR.

chinmayms · 2018-03-02T19:52:40Z

@akshayrai, thanks for your review. Addressing your issues one by one. Let me know your thoughts.

The heuristics use the same logic as the original MR ones. However, while implementing the Heuristic interface, they internally use Tez-specific data structures and classes in the "apply" function. We could make design-level changes to have standard heuristics for both, but even then they would need to implement Tez-specific features at some point. I am ready to take this effort forward in future versions if you think it's a good idea.
Applied Apache license for all files.
Converted Indent to 2 space.
Will look for Tez specific changes once @shkhrgpt completes his review.
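
As a rough sketch of what a shared heuristic could look like under such a design change: the common logic depends only on a small engine-neutral interface, and MR and Tez each supply an adapter over their own data classes. Everything here is hypothetical (`TaskRuntimeSource`, `RuntimeSkew`, and the max-to-mean check itself are illustrative and not one of the heuristics in this PR).

```java
import java.util.List;

// Hypothetical engine-neutral view of an application's task runtimes.
// An MR adapter and a Tez adapter would each implement this over their own data.
interface TaskRuntimeSource {
  List<Long> taskRuntimesMs();
}

final class RuntimeSkew {
  private RuntimeSkew() {}

  // Shared logic: ratio of the slowest task runtime to the mean task runtime.
  // Returns 0.0 when there are no tasks or every runtime is zero.
  static double maxToMeanRatio(TaskRuntimeSource source) {
    List<Long> runtimes = source.taskRuntimesMs();
    if (runtimes.isEmpty()) {
      return 0.0;
    }
    long max = 0;
    long sum = 0;
    for (long runtime : runtimes) {
      max = Math.max(max, runtime);
      sum += runtime;
    }
    if (sum == 0) {
      return 0.0;
    }
    double mean = (double) sum / runtimes.size();
    return max / mean;
  }
}
```

The Tez-specific part would then be limited to the adapter that maps Tez task data onto `TaskRuntimeSource`, which matches the point above that some Tez-specific code remains necessary.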

shkhrgpt · 2018-03-14T06:21:57Z

app/com/linkedin/drelephant/tez/fetchers/TezFetcher.java

+
+  public TezFetcher(FetcherConfigurationData fetcherConfData) throws IOException {
+    this._fetcherConfigurationData = fetcherConfData;
+    final String applicationHistoryAddr = new Configuration().get("yarn.timeline-service.webapp.address");


Declare yarn.timeline-service.webapp.address as a private static final variable.
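
A minimal sketch of the suggestion, under stated assumptions: the property name comes from the diff above, while the constant name, the class `TimelineAddressLookup`, and failing fast on a missing value are illustrative choices. A `Map` stands in for the Hadoop `Configuration` so the example stays self-contained.

```java
import java.util.Map;

final class TimelineAddressLookup {
  // Property name taken from the diff above; the constant name is illustrative.
  private static final String TIMELINE_SERVER_URL = "yarn.timeline-service.webapp.address";

  private TimelineAddressLookup() {}

  // conf stands in for a Hadoop Configuration in this sketch.
  // Assumption: a missing or empty address is treated as a hard error.
  static String timelineAddress(Map<String, String> conf) {
    String address = conf.get(TIMELINE_SERVER_URL);
    if (address == null || address.isEmpty()) {
      throw new IllegalStateException("Missing configuration: " + TIMELINE_SERVER_URL);
    }
    return address;
  }
}
```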

shkhrgpt · 2018-03-21T20:31:04Z

app/com/linkedin/drelephant/tez/fetchers/TezFetcher.java

+  }
+}
+
+final class ThreadContextMR2 {


ThreadContextMR2 is also used by MapReduceFetcherHadoop2. Refactor code here so there is only one instance of ThreadContextMR2.

shkhrgpt · 2018-03-22T21:17:08Z

app/com/linkedin/drelephant/tez/fetchers/TezFetcher.java

+      }
+    }
+
+    if(mapperListAggregate.isEmpty() && reducerListAggregate.isEmpty()){


Why not move this check to after line 113, with something like

if (state.equals("FAILED")) { jobData.setSucceeded(false); }

shkhrgpt

+1
LGTM.
Thanks for writing this change.

ray-harrison · 2018-03-26T19:22:42Z

@chinmayms Our team at Comcast also has a Tez PR, and we're also working with the Pepperdata folks to roll this functionality in. Would you be willing to have a short call at a convenient time with myself and a couple of folks here to maybe combine forces?

chinmayms · 2018-03-26T21:09:26Z

@ray-harrison
Sure, sounds good.
Can we do it on Tuesday 03/27 or Wednesday 03/28, between 1:30 pm and 3:30 pm?
Let me know if these don't work for you.

chrevanthreddy · 2018-03-26T21:41:23Z

@chinmayms Can we do it on Wednesday at 1:30 PM EST, please? I will send an invite accordingly.

shkhrgpt · 2018-03-26T21:44:30Z

@ray-harrison and @chrevanthreddy, thanks for initiating the conversation. I would also like to be part of this discussion. Can you please send me an invite too?

chinmayms · 2018-03-26T21:51:59Z

@chrevanthreddy Sure. 1:30 EST on Wednesday works. Thanks

chrevanthreddy · 2018-03-26T22:15:44Z

@shkhrgpt Could you please let me know where I can send you the invite? @chinmayms I have sent the invite to your Gmail Id.

shkhrgpt · 2018-03-26T22:18:49Z

@chrevanthreddy Please send it to sgupta@pepperdata.com

akshayrai · 2018-03-29T07:00:15Z

+1 LGTM

chinmayms · 2018-04-03T21:51:06Z

@akshayrai @shankar37
As discussed, could we merge this with linkedin:master?

shkhrgpt · 2018-04-03T21:59:14Z

+1 for merging.

* Revert "Dr. Elephant Tez Support working patch (#313)" This reverts commit a0470a3.
* Rerevert "Dr. Elephant Tez Support working patch (#313)" including attribution. This reverts commit e3fd598. Co-authored-by: Abhishek Das <abhishekdas99@users.noreply.github.com>
* Auto tuning: Support for parameter set multi-try (#386)
* Changes in some of the Spark Heuristics
* Adding test for changes executor gc heuristic and unified memory heuristic
* Update ExecutorGcHeuristic.scala
* Update UnifiedMemoryHeuristic.scala
* Changed some hard coded values to variables
* Due to strict inequality changing the other threshold levels for executor and driver

Sumant added 2 commits December 13, 2017 12:45

EA Tez Working Commit

9b9d285

Added Apache License

9cc5264

chinmayms changed the title ~~EA Tez Working Commit~~ Dr. Elephant Tez Support working patch Dec 14, 2017

shankar37 reviewed Feb 23, 2018

akshayrai suggested changes Mar 2, 2018

Indent Change and License header update

805eb61

shkhrgpt reviewed Mar 14, 2018

Tez fetcher variable change

fb6b83c

shkhrgpt reviewed Mar 21, 2018

shkhrgpt reviewed Mar 22, 2018

Removed default 100 Limit for Vertex URL task Fetch and code review changes

74af5c0

shkhrgpt approved these changes Mar 23, 2018

shkhrgpt mentioned this pull request Mar 26, 2018

Adding Tez Heuristics to Dr. Elephant #256

Closed

Resolving merge conflicts

9bc14bc

akshayrai approved these changes Apr 4, 2018

akshayrai merged commit a0470a3 into linkedin:master Apr 4, 2018

akshayrai added a commit that referenced this pull request Jun 14, 2018

Revert "Dr. Elephant Tez Support working patch (#313)"

e3fd598

This reverts commit a0470a3.

akshayrai pushed a commit that referenced this pull request Jun 14, 2018

Rerevert "Dr. Elephant Tez Support working patch (#313)" including attribution.

860dbe6

This reverts commit e3fd598. Co-authored-by: Abhishek Das <abhishekdas99@users.noreply.github.com>

arpang pushed a commit to arpang/dr-elephant that referenced this pull request Jul 11, 2018

Revert "Dr. Elephant Tez Support working patch (linkedin#313)"

8d7b64d

This reverts commit a0470a3.

varunsaxena added a commit that referenced this pull request Aug 30, 2018

Revert "Rerevert "Dr. Elephant Tez Support working patch (#313)" including attribution."

5caff67

This reverts commit d5476c1.

varunsaxena added a commit that referenced this pull request Aug 30, 2018

Revert "Revert "Dr. Elephant Tez Support working patch (#313)""

0004b21

This reverts commit 8d7b64d.

pralabhkumar pushed a commit to pralabhkumar/dr-elephant that referenced this pull request Aug 31, 2018

Dr. Elephant Tez Support working patch (linkedin#313)

4474162

pralabhkumar pushed a commit to pralabhkumar/dr-elephant that referenced this pull request Aug 31, 2018

Revert "Dr. Elephant Tez Support working patch (linkedin#313)"

3c3ddc3

This reverts commit a0470a3.
