Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update #224 to add FSFetcher as a standalone fetcher #232

Merged
merged 13 commits into from
Apr 14, 2017

Conversation

shankar37
Copy link
Contributor

No description provided.

@@ -29,14 +29,15 @@
</fetcher>
-->
<fetchers>
<!--
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This disables MR fetcher by default. Is it intentional or committed by accident?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

accident. I disabled it to test the Spark Fetcher alone. Will put it back

@@ -62,31 +64,31 @@ class SparkFetcher(fetcherConfigurationData: FetcherConfigurationData)
}

override def fetchData(analyticJob: AnalyticJob): SparkApplicationData = {
doFetchData(analyticJob) match {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this match block is not needed? Just call doFetchData method which either returns SparkApplicationData or throws an exception if an error occurs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@shkhrgpt
Copy link
Contributor

shkhrgpt commented Apr 6, 2017

@shankar37 I am a little confused about the overall goal. Like MapReduce, do we want to have two separate fetchers, REST API based and FileSystem based? Or we just want to have one fetcher with an option to specify whether to use REST or FileSystem?

@superbobry
Copy link
Contributor

A related question: if we are to introduce two fetchers, would they be mutually exclusive? If not, how do we ensure there're no data races between them?

shankar added 2 commits April 11, 2017 09:59
Fixed up existing tests. Need to add a couple more tests.
@shankar37
Copy link
Contributor Author

The REST and FS based fetcher be two mutually exclusive fetcher just like in MapReduce. I have updated the PR accordingly. Please take a look and let me know your comments.

@akshayrai
Copy link
Contributor

+1 LGTM

@shkhrgpt
Copy link
Contributor

A general note, I think the PR, #225, should be merged first so changes can be made in this PR accordingly.

lazy val legacyFetcher = new SparkFSFetcher(fetcherConfigurationData)

override def fetchData(analyticJob: AnalyticJob): SparkApplicationData = {
val legacyData = legacyFetcher.fetchData(analyticJob)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need the legacy package and other legacy classes in it. Since we will be using the file system fetcher, so it's no longer a legacy code. We should instead have all the relevant classes in this fetchers package.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way SparkFSFetcher reads the event logs needs to be revisioned to not rely on older API like replaybus. That's why I have kept it as legacy for now. I will fix those and then move it to fetchers in a separate PR

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright. That makes sense.
Thank you.

@@ -63,10 +63,18 @@ class SparkMetricsAggregator(private val aggregatorConfigurationData: Aggregator
case false => 0.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is not related to fetchers, maybe there should be a separate PR for this change with more context.

shankar added 4 commits April 12, 2017 09:08
…erPR

# Conflicts:
#	app/com/linkedin/drelephant/spark/fetchers/SparkFetcher.scala
#	app/com/linkedin/drelephant/spark/fetchers/SparkLogClient.scala
@akshayrai akshayrai merged commit c8a7009 into linkedin:master Apr 14, 2017
@shkhrgpt
Copy link
Contributor

@akshayrai @shankar37 I was still reviewing this change, and it is now merged without those reviews. I am sorry but I don't think it is appropriate to merge such a big change (more than 3000 lines) without enough reviews.

* This data class holds Spark environment data (Spark properties, JVM properties and etc.)
*/
public class SparkEnvironmentData {
private final Properties _sparkProperties;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it is too late, but is there a reason to store it as Properties and not as Map<String, String>?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's because HadoopApplicationData interface expects it to be Properties. Probably a legacy of what mapreduce's getconf returns. Now, it will be a big change to change it. I am going to leave it as it is for now.

public long shuffleRead = 0L;
public long shuffleWrite = 0L;

public String toString() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider using MoreObjects.toStringHelper from Guava.

}
if (id == -1) {
logger.error("Spark Job id [" + jobId + "] does not contain any stage.");
return null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider annotating the method as @Nullable, otherwise nothing suggests null is a valid result.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

for (int stageId : stageIds) {
id = Math.max(id, stageId);
}
if (id == -1) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just the empty-list case, right? Maybe write it explicitly instead of checking for init value.

Also, for nonempty lists, you could replace the loop with Ordering from Guava:

int id = Ordering.<Integer> natural().max(stageIds);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

public List<String> getFailedJobDescriptions() {
List<String> result = new ArrayList<String>();
for (int id : _failedJobs) {
result.add(getJobDescription(id));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this could add null. Is this the desired behaviour?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed


@Override
public int hashCode() {
return new Integer(stageId).hashCode() * 31 + new Integer(attemptId).hashCode();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use Ints.hashCode from Guava instead of allocating a new Integer on each call, or just the int value as is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

}
}

private static String getListString(Collection collection) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider using Collection<Integer> instead of a raw generic.

Also, I think collection.toString() would produce the same result.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed to use Collection. I dont see collection.toString producing the same formatted output and hence keeping that part as is.

@shankar37
Copy link
Contributor Author

@shkhrgpt I had address all your comments and there were no outstanding comments. And there was no indication that you were still looking at this. We waited couple more days after akshay gave +1 to see if there are more comments before we decided to merge. Having said that, if you have more comments I will be happy to take a look and address them in a different PR. That' what I am planning to do with @superbobry's comments.

@shkhrgpt
Copy link
Contributor

@shankar37 Yeah, my apologies that I didn't submit any reviews for a couple of days. Honestly, I didn't expect that such a big PR will be merged so soon, given that some other recent PRs were open for a while. There was actually one comment from me which left unaddressed.

Anyways, I think the problem is that I am not familiar with the review process of Dr Elephant. In my experience, before merging a PR, it is common to ask initial reviewers whether or not they are satisfied with the current change, which didn't happen here. I think it would be a good idea if you and @akshayrai can provide more details about the review and merging policy of Dr Elephant. That will also be useful for other contributors and reviewers.

Meanwhile, I will add my comments here, and either you or someone else can address in a separate PR.

Thank you.

@shankar37
Copy link
Contributor Author

@shkhrgpt I missed the comment you mentioned, my apologies. Will address that along with others. The PR has been out for a while if you include the original PR#224. In fact, it was increasingly resulting in more and more conflicts. And I thought all comments are done.

The process we follow is to merge only after all outstanding comments are resolved to the commenter's satisfaction and there is at least one +1 on the overall review. We don't wait for +1 from every reviewer as that is impractical in my experience.

@shankar37 shankar37 deleted the FixRaysSparkFetcherPR branch April 27, 2017 09:37
skakker pushed a commit to skakker/dr-elephant that referenced this pull request Dec 14, 2017
…alone fetcher (linkedin#232)

Remove backup for Rest Fetcher and make Legacy FSFetcher as top level fetcher. Change the default fetcher in the config.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants