LIHADOOP-23735: Added new DistributedCacheLimit heuristic. #187

Merged: 1 commit into linkedin:master on Feb 6, 2017

Conversation

rajagopr
Contributor

Jobs which put large files (> 500 MB) in the distributed cache are flagged.
Files referenced by the following properties are considered:
mapreduce.job.cache.files
mapreduce.job.cache.archives

<classname>com.linkedin.drelephant.mapreduce.heuristics.DistributedCacheLimitHeuristic</classname>
<viewname>views.html.help.mapreduce.helpDistributedCacheLimit</viewname>
<params>
<distributed.cache.file.size.limit>500000000</distributed.cache.file.size.limit>
</params>
Contributor

Can you suffix the property with bytes?
distributed.cache.file.size.limit.bytes

Contributor Author

Done.
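
For reference, a minimal sketch of how the renamed parameter could be read from the heuristic's param map. The class name, the resolveLimit helper, and the plain Map parameter are illustrative assumptions, not the exact dr-elephant API.

import java.util.Map;

// Hypothetical illustration: read the renamed limit from the heuristic's params,
// falling back to the 500 MB default when the key is absent or malformed.
final class DistributedCacheLimitParams {
  static final String LIMIT_KEY = "distributed.cache.file.size.limit.bytes";
  static final long DEFAULT_LIMIT_BYTES = 500L * 1024 * 1024;

  static long resolveLimit(Map<String, String> paramMap) {
    String raw = paramMap.get(LIMIT_KEY);
    if (raw == null) {
      return DEFAULT_LIMIT_BYTES;
    }
    try {
      return Long.parseLong(raw.trim());
    } catch (NumberFormatException e) {
      return DEFAULT_LIMIT_BYTES; // keep the default rather than fail the heuristic
    }
  }
}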

private static final String MAPREDUCE_JOB_CACHE_ARCHIVES_FILESIZES = "mapreduce.job.cache.archives.filesizes";
private static final String MAPREDUCE_JOB_CACHE_FILES = "mapreduce.job.cache.files";
private static final String MAPREDUCE_JOB_CACHE_ARCHIVES = "mapreduce.job.cache.archives";
private static long distributedCacheFileSizeLimit = 500000000; // 500MB default
Contributor

500 * FileUtils.ONE_MB would be cleaner and more accurate.

Contributor Author

Done.
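
With that change, the constant would look roughly as below. FileUtils.ONE_MB from commons-io is 1024 * 1024, so 500 * ONE_MB is 524,288,000 bytes rather than the earlier literal 500000000.

import org.apache.commons.io.FileUtils;

class DistributedCacheLimits {
  // 500 * ONE_MB = 524,288,000 bytes (ONE_MB is 1024 * 1024),
  // versus the earlier hard-coded literal 500000000.
  private static long distributedCacheFileSizeLimit = 500 * FileUtils.ONE_MB;
}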


if (checkFileSizeLimit(result, cacheFileToSizeMap)) {
result.setSeverity(Severity.MODERATE);
logger.warn("Cache file size was found to be exceeding the max limit of " + distributedCacheFileSizeLimit
Contributor

I don't think we should be logging job-specific messages.

In fact, any logging in the heuristics class will simply increase our log file size without much benefit.

Contributor

I am not sure why you want to avoid job-specific messages. All messages will be in the context of analyzing some job, and it's useful to have the job context for better debugging.
However, in this case, I think this message is redundant and has the same info that we can find in the DB. Hence, I suggest removing it.
Also, app_id is a better identifier than the job name.

Contributor Author

Ok, removed the message.

private boolean checkFileSizeLimit(HeuristicResult result, Map<String, String> cacheFileToSizeMap) {
boolean limitViolated = false;
for (String file : cacheFileToSizeMap.keySet()) {
long size = Long.parseLong(cacheFileToSizeMap.get(file));
Contributor

Can you handle the parse exception here?

Contributor Author

Done.
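
A sketch of one way the parse could be guarded; skipping the unparsable entry (rather than, say, flagging it) is an assumption, since the merged handling isn't shown in this excerpt.

import java.util.Map;

class CacheSizeCheck {
  // Hypothetical sketch: skip entries whose size cannot be parsed instead of
  // letting a NumberFormatException abort the whole heuristic run.
  static boolean exceedsLimit(Map<String, String> cacheFileToSizeMap, long limitBytes) {
    boolean limitViolated = false;
    for (Map.Entry<String, String> entry : cacheFileToSizeMap.entrySet()) {
      long size;
      try {
        size = Long.parseLong(entry.getValue());
      } catch (NumberFormatException e) {
        continue; // size not reported in a usable form; ignore this file
      }
      if (size > limitBytes) {
        limitViolated = true;
      }
    }
    return limitViolated;
  }
}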

0);
}

String[] archiveCacheFilesArray = archiveCacheFiles.split(",");
Contributor

I'd prefer using an ArrayList. (http://stackoverflow.com/questions/716597/array-or-list-in-java-which-is-faster)

List<String> archiveCacheFilesArray = new ArrayList<String>(Arrays.asList(archiveCacheFiles.split(",")));

Contributor Author

I'm not very convinced about the use case here; there's a cost to convert it. However, I have changed it per your suggestion.
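
For context, the copy into a fresh ArrayList (instead of using Arrays.asList directly) only matters if the list is mutated later; a small hedged illustration with made-up values, since this excerpt does not show how the list ends up being used.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CacheFileListExample {
  public static void main(String[] args) {
    String archiveCacheFiles = "a.jar,b.tar.gz,c.zip"; // hypothetical example value
    // Arrays.asList returns a fixed-size view; copying into an ArrayList
    // allows add/remove later without an UnsupportedOperationException.
    List<String> archiveCacheFilesList =
        new ArrayList<String>(Arrays.asList(archiveCacheFiles.split(",")));
    archiveCacheFilesList.remove("c.zip"); // would throw on the fixed-size view
    System.out.println(archiveCacheFilesList); // [a.jar, b.tar.gz]
  }
}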


String[] cacheFileSizesArray = cacheFileSizes.split(",");

if (cacheFilesArray.length != cacheFileSizesArray.length) {
result.setSeverity(Severity.MODERATE);
Contributor

Is it either NONE or MODERATE? I think exceeding 500 MB should be critical. Also, we should be able to have intermediate limits for severe and moderate. If you are not sure what those limits should be, Jack should be able to suggest some.

Contributor Author

I will get clarification on whether to have intermediate limits or not.
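
If intermediate limits were adopted, the check could grade the largest cached file against ascending thresholds instead of a single cut-off. The thresholds and the local Severity enum below are purely illustrative (the real code uses dr-elephant's Severity class); they are not values agreed in this PR.

class CacheSeveritySketch {
  // Illustrative thresholds only (in bytes); the PR as merged kept a single 500 MB limit.
  static final long MODERATE_BYTES = 500L * 1024 * 1024;
  static final long SEVERE_BYTES = 1024L * 1024 * 1024;       // 1 GB, hypothetical
  static final long CRITICAL_BYTES = 2L * 1024 * 1024 * 1024; // 2 GB, hypothetical

  enum Severity { NONE, MODERATE, SEVERE, CRITICAL }

  static Severity grade(long largestCacheFileBytes) {
    if (largestCacheFileBytes > CRITICAL_BYTES) return Severity.CRITICAL;
    if (largestCacheFileBytes > SEVERE_BYTES) return Severity.SEVERE;
    if (largestCacheFileBytes > MODERATE_BYTES) return Severity.MODERATE;
    return Severity.NONE;
  }
}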

String[] archiveCacheFileSizesArray = archiveCacheFileSizes.split(",");

if (archiveCacheFilesArray.length != archiveCacheFileSizesArray.length) {
result.setSeverity(Severity.MODERATE);
Contributor

I don't think this inconsistency should change our heuristic. We should just go by the sum of the sizes.

Contributor Author

To be clarified.

result.setSeverity(Severity.MODERATE);
logger
.warn("Mismatch in the number of files and their corresponding sizes for " + MAPREDUCE_JOB_CACHE_ARCHIVES);
result.addResultDetail(MAPREDUCE_JOB_CACHE_ARCHIVES, Integer.toString(archiveCacheFilesArray.length));
Contributor

How big can this get? I think we have a limit of 255 chars on this field, and this can easily exceed that. You should cut it off at that limit to avoid DB update errors.

Contributor Author

Not sure what you mean; I'm just representing a number as a string here.

When there's a mismatch in the number of files and their corresponding sizes, it will look as follows.

mapreduce.job.cache.archives : 10
mapreduce.job.cache.archives.filesizes : 9

cacheFileToSizeMap.put(cacheFilesArray[i], cacheFileSizesArray[i]);
}

if (checkFileSizeLimit(result, cacheFileToSizeMap)) {
Contributor

You are checking whether files and archives are separately > 500 MB. I think the intent is that the combined size should not exceed 500 MB. Can you get that clarified from Jack? Also, some comments on what the difference between the two is would be good.

Contributor Author

Ok, to be clarified.

boolean limitViolated = false;
for (String file : cacheFileToSizeMap.keySet()) {
long size = Long.parseLong(cacheFileToSizeMap.get(file));
if (size > distributedCacheFileSizeLimit) {
Contributor

You are checking each jar's size to be > 500 MB. Shouldn't it be the sum of all sizes > 500 MB?

Contributor Author

Again, this is to be clarified.
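
For comparison, a combined-size variant along the lines the reviewer suggests would sum all entries before comparing against the limit. This is only a sketch of the alternative; per the clarification further down, Jack did mean a single file's size, so this is not the behaviour that was merged.

import java.util.Map;

class CombinedCacheSizeSketch {
  // Hedged alternative: flag when the total cached bytes exceed the limit,
  // rather than any single file.
  static boolean totalExceedsLimit(Map<String, String> cacheFileToSizeMap, long limitBytes) {
    long total = 0;
    for (String size : cacheFileToSizeMap.values()) {
      try {
        total += Long.parseLong(size);
      } catch (NumberFormatException e) {
        // ignore entries whose size is not reported in a usable form
      }
    }
    return total > limitBytes;
  }
}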

@Test
public void testHeuristicResult() {
jobConf.setProperty("mapreduce.job.cache.files.filesizes", "100,200,300");
jobConf.setProperty("mapreduce.job.cache.archives.filesizes", "400,500,600");
Contributor

Are these tests passing? You are specifying 500 here, but I don't see it getting multiplied with FILEUTILS*100MB before checking against the limit.

Contributor Author

The Travis check would have failed if the tests didn't pass. There's no need to multiply by 100MB, as a file can be of any size. Here 500 means 500 bytes, which is less than 500 MB.
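
A quick way to see the point: the fixture values are byte counts, so even the largest one (600) is far below 500 * FileUtils.ONE_MB. A tiny standalone check, just to illustrate the arithmetic:

import org.apache.commons.io.FileUtils;

class TestSizeSanityCheck {
  public static void main(String[] args) {
    long limitBytes = 500 * FileUtils.ONE_MB; // 524,288,000 bytes
    long largestFixtureSize = 600;            // bytes, from "400,500,600" in the test
    System.out.println(largestFixtureSize < limitBytes); // true, so the limit is not violated
  }
}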

@akshayrai
Contributor

+1, LGTM except for the things you wanted to get clarified from Shankar.

@rajagopr
Contributor Author

rajagopr commented Feb 2, 2017

@akshayrai: Everything stands clarified now; Jack did mean a single file size.

akshayrai merged commit 4df9ba9 into linkedin:master on Feb 6, 2017
skakker pushed a commit to skakker/dr-elephant that referenced this pull request Dec 14, 2017