-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
Showing
1 changed file
with
126 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,126 @@ | ||
--- | ||
title: "bebida-optimization-result-analysis" | ||
author: "Michael Mercier" | ||
date: '`r Sys.Date()`' | ||
output: html_document | ||
--- | ||
|
||
```{r setup, include=FALSE} | ||
knitr::opts_chunk$set(echo = TRUE) | ||
install.packages('rjson') | ||
``` | ||
|
||
## Experiment Design | ||
|
||
Running on Gros cluster on Nancy on 16 nodes (288 cores) with the following heuristics: | ||
- NoHPC: In order to have a control experiment, run the Bebida app without HPC workload | ||
- None: Raw bebida implementation without optimization | ||
- Punch: Create jobs at submission time with arbitrary resource requests | ||
- Annotated: Create jobs on submission using resource annotations to shape the job | ||
- TimeCritical: Use resource quota to dedicate a dynamic set of resources to Bebida application | ||
|
||
The Deadline heuristic is similar to Punch Annotated but a deadline provided. It's goal is to run before a given deadline. Because it does not follow the same objective it is not compared to the others. | ||
|
||
## Execution time | ||
|
||
Initialize Libraries | ||
```{r} | ||
library(tidyverse) | ||
library(viridisLite) | ||
library(rjson) | ||
``` | ||
|
||
Get data from the experiment result directory: | ||
```{r bdaDf hpcDf metadata} | ||
resultDir = "/home/mmercier/Projects/bebida-optimization-service/result-all-2/" | ||
expeDirs = Sys.glob(sprintf("%s/*", resultDir)) | ||
expeDirs | ||
``` | ||
|
||
Extract BDA applications metrics: | ||
```{r} | ||
bdaDf <- data.frame(heuristic = character(), execution_time = integer()) | ||
for (expeDir in expeDirs) { | ||
execTime = array() | ||
metadata <- fromJSON(file=sprintf("%s/metadata.json", expeDir)) | ||
for (iteration in 1:metadata$nb_app_run) { | ||
podState <- fromJSON(file=sprintf("%s/spark-app-pi-%d-pod.json", expeDir, iteration)) | ||
startTime = parse_datetime(podState$status$containerStatuses[[1]]$state$terminated$startedAt) | ||
endTime = parse_datetime(podState$status$containerStatuses[[1]]$state$terminated$finishedAt) | ||
bdaDf <- bdaDf %>% add_row(heuristic = metadata$heuristic, execution_time = as.numeric(difftime(endTime,startTime),units="secs")) | ||
} | ||
} | ||
bdaDf | ||
``` | ||
|
||
Extract HPC Jobs metrics. | ||
|
||
WARNING: Some job give a submission time after the starting time. which gives a negative waiting time. These jobs are excluded. | ||
```{r} | ||
hpcDf <- data.frame(heuristic = character(), job_id = character(), job_name = character(), waiting_time = integer()) | ||
for (expeDir in expeDirs) { | ||
hpcJobs <- fromJSON(file=sprintf("%s/oar-jobs.json", expeDir)) | ||
metadata <- fromJSON(file=sprintf("%s/metadata.json", expeDir)) | ||
for (job in hpcJobs) { | ||
submitTime = as.POSIXct(job$submission_time, origin="1970-01-01") | ||
startTime = as.POSIXct(job$start_time, origin="1970-01-01") | ||
job_id = as.character(job$id) | ||
waiting_time = as.numeric(difftime(startTime,submitTime),units="secs") | ||
if (waiting_time > 0 && ! grepl("BEBIDA_NOOP", job$name, fixed = TRUE)) { | ||
hpcDf <- hpcDf %>% add_row( | ||
heuristic=metadata$heuristic, job_id=job_id, job_name=job$name, waiting_time=waiting_time) | ||
} | ||
} | ||
} | ||
hpcDf | ||
``` | ||
|
||
## Plots of BDA app | ||
|
||
```{r} | ||
bdaDf %>% | ||
ggplot() + | ||
geom_boxplot(aes(x=heuristic, y=execution_time, fill=heuristic, group=heuristic, ymin=0)) | ||
``` | ||
|
||
Transform raw data to stats | ||
```{r} | ||
bdaDfStats <- bdaDf %>% | ||
group_by(heuristic) %>% | ||
summarise( | ||
n=n(), | ||
mean=mean(execution_time), | ||
sd=sd(execution_time) | ||
) %>% | ||
mutate( se=sd/sqrt(n)) %>% | ||
mutate( ic=se * qt((1-0.05)/2 + .5, n-1)) | ||
bdaDfStats | ||
``` | ||
|
||
|
||
```{r} | ||
bdaDfStats %>% | ||
ggplot() + | ||
geom_bar( aes(x=heuristic, y=mean, fill=heuristic), stat="identity", alpha=0.5) + | ||
geom_errorbar(aes(x=heuristic, ymin=mean-sd, ymax=mean+sd), width=0.4, colour="orange", alpha=0.9, size=1.3) + | ||
ggtitle("Mean execution time in seconds (with standard deviation)") | ||
``` | ||
## Plot for HPC | ||
|
||
```{r} | ||
hpcDf %>% | ||
ggplot() + | ||
geom_boxplot(aes(x=heuristic, y=waiting_time, fill=heuristic, group=heuristic, ymin=0)) | ||
``` |