Update #224 to add FSFetcher as a standalone fetcher #232

shankar37 · 2017-04-04T09:40:16Z

No description provided.

…m to the new interface

… fetcher. Change the default fetcher in the config

superbobry · 2017-04-04T11:59:13Z

app-conf/FetcherConf.xml

@@ -29,14 +29,15 @@
  </fetcher>
 -->
 <fetchers>
+  <!--


This disables MR fetcher by default. Is it intentional or committed by accident?

By accident. I disabled it to test the Spark fetcher alone. Will put it back.

…erPR

shkhrgpt · 2017-04-06T20:58:16Z

app/com/linkedin/drelephant/spark/fetchers/SparkFetcher.scala

@@ -62,31 +64,31 @@ class SparkFetcher(fetcherConfigurationData: FetcherConfigurationData)
  }

  override def fetchData(analyticJob: AnalyticJob): SparkApplicationData = {
+    doFetchData(analyticJob) match {


Maybe this match block is not needed? Just call the doFetchData method, which either returns SparkApplicationData or throws an exception if an error occurs.

shkhrgpt · 2017-04-06T21:04:10Z

@shankar37 I am a little confused about the overall goal. Like MapReduce, do we want to have two separate fetchers, one REST API based and one FileSystem based? Or do we just want one fetcher with an option to specify whether to use REST or FileSystem?

superbobry · 2017-04-10T12:48:16Z

A related question: if we are to introduce two fetchers, would they be mutually exclusive? If not, how do we ensure there're no data races between them?

Fixed up existing tests. Need to add a couple more tests.

shankar37 · 2017-04-11T05:06:20Z

The REST-based and FS-based fetchers will be two mutually exclusive fetchers, just like in MapReduce. I have updated the PR accordingly. Please take a look and let me know your comments.

akshayrai · 2017-04-11T10:22:15Z

+1 LGTM

shkhrgpt · 2017-04-11T16:05:50Z

A general note: I think PR #225 should be merged first so that changes can be made in this PR accordingly.

shkhrgpt · 2017-04-11T16:14:36Z

app/com/linkedin/drelephant/spark/fetchers/FSFetcher.scala

+  lazy val legacyFetcher = new SparkFSFetcher(fetcherConfigurationData)
+
+  override def fetchData(analyticJob: AnalyticJob): SparkApplicationData = {
+    val legacyData = legacyFetcher.fetchData(analyticJob)


I don't think we need the legacy package and the other legacy classes in it. Since we will be using the file system fetcher, it is no longer legacy code. We should instead have all the relevant classes in this fetchers package.

The way SparkFSFetcher reads the event logs needs to be revised so that it no longer relies on older APIs like replaybus. That's why I have kept it as legacy for now. I will fix that and then move it to fetchers in a separate PR.

Alright. That makes sense.
Thank you.

shkhrgpt · 2017-04-11T16:16:13Z

app/com/linkedin/drelephant/spark/SparkMetricsAggregator.scala

@@ -63,10 +63,18 @@ class SparkMetricsAggregator(private val aggregatorConfigurationData: Aggregator
      case false => 0.0


This change is not related to fetchers. Maybe it should go in a separate PR with more context.

…erPR # Conflicts: # app/com/linkedin/drelephant/spark/fetchers/SparkFetcher.scala # app/com/linkedin/drelephant/spark/fetchers/SparkLogClient.scala

shkhrgpt · 2017-04-14T20:15:07Z

@akshayrai @shankar37 I was still reviewing this change, and it is now merged without those reviews. I am sorry but I don't think it is appropriate to merge such a big change (more than 3000 lines) without enough reviews.

superbobry · 2017-04-14T21:40:24Z

app/com/linkedin/drelephant/spark/legacydata/SparkEnvironmentData.java

+ * This data class holds Spark environment data (Spark properties, JVM properties and etc.)
+ */
+public class SparkEnvironmentData {
+  private final Properties _sparkProperties;


I know it is too late, but is there a reason to store it as Properties and not as Map<String, String>?

That's because the HadoopApplicationData interface expects it to be Properties, probably a legacy of what MapReduce's getconf returns. Changing it now would be a big change, so I am going to leave it as it is for now.
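
For callers that would still prefer a Map<String, String>, a minimal sketch of a conversion at the call site, leaving the stored Properties untouched. The class name PropertiesAsMapSketch and the method toStringMap are hypothetical and not part of this PR; only java.util classes are used.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper, not part of the PR: copies a Properties object into an
// unmodifiable Map<String, String> without changing how the data is stored.
class PropertiesAsMapSketch {

  static Map<String, String> toStringMap(Properties properties) {
    Map<String, String> result = new HashMap<>();
    // stringPropertyNames() only returns keys whose key and value are both
    // Strings, and it also includes keys from the defaults list, if any.
    for (String name : properties.stringPropertyNames()) {
      result.put(name, properties.getProperty(name));
    }
    return Collections.unmodifiableMap(result);
  }

  public static void main(String[] args) {
    Properties sparkProperties = new Properties();
    sparkProperties.setProperty("spark.executor.memory", "4g");
    sparkProperties.setProperty("spark.executor.cores", "2");

    Map<String, String> asMap = toStringMap(sparkProperties);
    System.out.println(asMap.get("spark.executor.memory"));
  }
}
```

This copies the entries on each call, so it is probably only worth doing where a typed map is actually needed; the HadoopApplicationData contract stays as it is.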

superbobry · 2017-04-14T21:41:21Z

app/com/linkedin/drelephant/spark/legacydata/SparkExecutorData.java

+    public long shuffleRead = 0L;
+    public long shuffleWrite = 0L;
+
+    public String toString() {


Consider using MoreObjects.toStringHelper from Guava.

superbobry · 2017-04-14T21:43:19Z