
Table 3 & Figure 7 Comparing Performance and Cost of Dorylus Variants

Yifan Qiao edited this page May 10, 2021 · 6 revisions


Goal

The goal of these experiments is to compare the performance and cost of the different Dorylus variants: the serverless, CPU, and GPU backends. Table 3 contains the raw results for these two metrics. Figure 7 combines them into value, which measures the performance per dollar of a backend, and reports the value of each variant normalized to that of the GPU-only version. The Figure 7 values can therefore be calculated from Table 3.

As noted in the paper, we chose the Dorylus async (s=0) variant for the majority of tests because it provided the best performance while still maintaining accuracy (except on reddit-large). We therefore run only the Dorylus version reported in the paper here.

Because cost is a major focus of this section, we list the hourly prices of the instance types used in our clusters (link to source). Use these prices to determine the Cost column of Table 3.

  • c5.2xlarge: $0.34
  • c5n.2xlarge: $0.432
  • c5n.4xlarge: $0.864
  • p3.2xlarge: $3.06

NOTE: These experiments can be quite time consuming, as each of them involves training a neural network to convergence. In addition, the script that calculates Lambda costs can take a very long time when a run spans many epochs, because it has to parse a huge amount of logs to get the runtimes. We welcome you to run the full benchmark if time allows. As a time saver, we also suggest extrapolating the time to convergence from a shorter run, since the epoch times are highly consistent.

GCN

Reddit-small

To run this experiment, the command is similar to that which we ran before for Figure 5:

./benchmarks/run-reddit-gcn --st 0 --s 0 --e 100

The graphserver-out.txt file will contain the average epoch time. Only the average is reported for each run, so to approximate the total time, multiply the average epoch time by the number of epochs.

Update 05/09/2021: Refer to the "Understand the output" section for reproducing figures and tables.

To run the GPU and CPU versions, repeat the same process but pass the "mode" flag to tell Dorylus to run with a different backend. The "st" and "s" flags are not required here, as they are asynchrony parameters for lambdas. At each step, record the run times and the costs to reproduce the entries in the table.

./benchmarks/run-reddit-gcn --e 100 -mode cpu
./benchmarks/run-reddit-gcn --e 100 -mode gpu

Amazon

The process for all the following graphs is similar. Run the corresponding benchmark script for each backend (commands below), then record the runtime from graphserver-out.txt. Also run calculate-price.py to get the cost of training. Update 05/09/2021: Refer to the "Understand the output" section and run parse-graphserver.py to get the "Time" and "Cost" columns.

To avoid redundancy, I will list only the benchmark commands from this point on; the process is the same as described here. Don't hesitate to reach out if there are questions.

./benchmarks/run-amazon-gcn --e 100 --st 0 --s 0
./benchmarks/run-amazon-gcn --e 100 -mode cpu
./benchmarks/run-amazon-gcn --e 100 -mode gpu

Reddit-large

./benchmarks/run-reddit-large-gcn --e 100 --st 0 --s 0
./benchmarks/run-reddit-large-gcn --e 100 -mode cpu
./benchmarks/run-reddit-large-gcn --e 100 -mode gpu

Friendster

./benchmarks/run-friendster-gcn --e 100 --st 0 --s 0
./benchmarks/run-friendster-gcn --e 100 -mode cpu
./benchmarks/run-friendster-gcn --e 100 -mode gpu

GAT

For GAT the process is the same as above; the only difference is that the benchmark scripts now run the GAT model:

  1. Run the benchmark script
  2. Note down its runtime for the "Time" column of the table
  3. Run calculate-price.py to get the "Cost" column

Update 05/09/2021: Refer to the "Understand the output" section and run parse-graphserver.py to get the "Time" and "Cost" columns.

Reddit-small

./benchmarks/run-reddit-gat --e 100 --st 0 --s 0
./benchmarks/run-reddit-gat --e 100 -mode cpu
./benchmarks/run-reddit-gat --e 100 -mode gpu

Amazon

./benchmarks/run-amazon-gat --e 100 --st 0 --s 0
./benchmarks/run-amazon-gat --e 100 -mode cpu
./benchmarks/run-amazon-gat --e 100 -mode gpu
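Since every dataset follows the same three-backend pattern, the full list of commands on this page can be generated rather than typed by hand. The following is a minimal sketch, not part of the Dorylus repository: the function name benchmark_commands and the dictionaries are hypothetical, the script names and flags are copied from the commands above, and it assumes flag order does not matter. It only prints the commands; running them is left to you.

```python
# Datasets per model, as listed on this page (GAT is run on two graphs only).
DATASETS = {
    "gcn": ["reddit", "amazon", "reddit-large", "friendster"],
    "gat": ["reddit", "amazon"],
}

# Backend-specific flags, copied verbatim from the commands above.
BACKEND_FLAGS = {
    "lambda": ["--st", "0", "--s", "0"],
    "cpu": ["-mode", "cpu"],
    "gpu": ["-mode", "gpu"],
}


def benchmark_commands(epochs=100):
    """Return (model, dataset, backend, argv) for every run on this page."""
    cmds = []
    for model, datasets in DATASETS.items():
        for dataset in datasets:
            script = f"./benchmarks/run-{dataset}-{model}"
            for backend, flags in BACKEND_FLAGS.items():
                argv = [script, "--e", str(epochs)] + flags
                cmds.append((model, dataset, backend, argv))
    return cmds


if __name__ == "__main__":
    for model, dataset, backend, argv in benchmark_commands():
        print(f"{model:3s} {dataset:12s} {backend:6s} {' '.join(argv)}")
```

Each argv list can be passed to subprocess.run(argv, check=True) from the repository root if you prefer to launch the runs from Python.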

Understand the output

Reproduce Table 3

We need the graph server log files to reproduce Table 3. The end of the graph server log graphserver-out.txt is easy to read and looks like this:

...
<till the end of the file>
...
[ Node   0 ]  FINISHED epoch 100. Total finished 2
[ Node   0 ]  <EM>: Run start time: Sun May  9 20:02:02 2021
[ Node   0 ]  <EM>: Run end time: Sun May  9 20:14:33 2021
[ Node   0 ]  <EM>: Backend LAMBDA:gcn
[ Node   0 ]  <EM>: Dataset: /filepool/reddit/parts_2/ (602, 128, 41)
[ Node   0 ]  <EM>: staleness: 0
[ Node   0 ]  <EM>: 1 sync epochs and 99 async epochs
[ Node   0 ]  <EM>: Using 80 lambdas
[ Node   0 ]  <EM>: Initialization takes 4982.981 ms
[ Node   0 ]  Relaunched Lambda Cnt: 1
[ Node   0 ]  <EM>: Average  sync epoch time 8923.699 ms
[ Node   1 ]  <EM>: Average  sync epoch time 8923.588 ms
[ Node   1 ]  <EM>: Average async epoch time 7492.999 ms
[ Node   0 ]  <EM>: Average async epoch time 7495.777 ms

We also provide a parse-graphserver.py script that parses the log file and reports the total training time and the cost.

python parse-graphserver.py <path to graphserver-out file>

The example output of the script looks like:

1. Calculate server running time...
Training time in Table 3: 751.01s
Total summed graph server running time: 1501.74s
Look up the instance price on https://aws.amazon.com/ec2/pricing/on-demand/, and the server cost is the secondly price rate times summed running time.

2. Calculating total lambda running time and cost...
Lambda function name: gcn
Start time: Sun May 9 20:02:02 2021
End time: Sun May 9 20:14:33 2021
Lambda memory size: 192.0 MB
awslogs get /aws/lambda/gcn --filter-pattern="Billed Duration" --start='Sun May 9 22:51:17 2021' --end='Sun May 9 23:01:03 2021'
Total time of lambdas for this run (ms): 161884575
Price of this lambda (based on memory): 3.12500625e-09
Total cost of lambdas: $0.51

Here the "Training time in Table 3" corresponds to the "Time (s)" column in Table 3.
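The timing figures in the sample output can be cross-checked by hand against the log excerpt above. The sketch below multiplies each node's average sync and async epoch times by the corresponding epoch counts; for the sample log this gives 751.01s for node 0 and 1501.74s summed over both nodes, which is consistent with the script output. This is an observation about the sample numbers, not a description of how parse-graphserver.py is implemented, and the function name per_node_seconds is hypothetical.

```python
import re
from collections import defaultdict

# Lines copied from the sample graphserver-out.txt excerpt above.
SAMPLE_LOG = """\
[ Node   0 ]  <EM>: 1 sync epochs and 99 async epochs
[ Node   0 ]  <EM>: Average  sync epoch time 8923.699 ms
[ Node   1 ]  <EM>: Average  sync epoch time 8923.588 ms
[ Node   1 ]  <EM>: Average async epoch time 7492.999 ms
[ Node   0 ]  <EM>: Average async epoch time 7495.777 ms
"""

COUNT_RE = re.compile(r"(\d+) sync epochs and (\d+) async epochs")
AVG_RE = re.compile(
    r"\[ Node\s+(\d+) \].*Average\s+(sync|async) epoch time ([\d.]+) ms"
)


def per_node_seconds(log_text):
    """Estimate each node's training time as sum(epoch count * average epoch time)."""
    counts = COUNT_RE.search(log_text)
    if counts is None:
        raise ValueError("epoch-count line not found in log")
    n_epochs = {"sync": int(counts.group(1)), "async": int(counts.group(2))}

    totals_ms = defaultdict(float)
    for node, kind, avg_ms in AVG_RE.findall(log_text):
        totals_ms[int(node)] += n_epochs[kind] * float(avg_ms)
    return {node: ms / 1000.0 for node, ms in totals_ms.items()}


times = per_node_seconds(SAMPLE_LOG)
print(f"Node 0 training time:  {times[0]:.2f}s")             # 751.01s
print(f"Summed over all nodes: {sum(times.values()):.2f}s")  # 1501.74s
```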

The "Cost ($)" column in Table 3 consists of two parts: server cost and lambda cost (for the lambda backend only):

  • For the server cost, look up the AWS instance price at https://aws.amazon.com/ec2/pricing/on-demand/ and multiply the "Total summed graph server running time" by that price. Note that AWS lists hourly rates, which need to be converted to per-second rates (divide by 3600).
  • For the lambda cost, the script will collect running times of all lambdas during the whole training procedure. The lambda cost is shown as "Total cost of lambdas" in the script output.

NOTE: Collecting the running times of all lambda functions can take several minutes, since thousands of lambdas are invoked. It is normal for the script to run for minutes before printing the lambda cost.

For the CPU and GPU backends, no lambdas are invoked, so the total training cost is the server cost.
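The cost arithmetic described above can be sketched as follows. The hourly prices are the ones listed near the top of this page, and the running time, total lambda time, and lambda price come from the sample script output. The helper names server_cost and lambda_cost are hypothetical, and the instance type in the example is an assumption chosen only for illustration; substitute the type your graph servers actually ran on. The sketch also assumes all graph servers use the same instance type.

```python
# Hourly on-demand prices in USD, as listed near the top of this page.
HOURLY_PRICE = {
    "c5.2xlarge": 0.34,
    "c5n.2xlarge": 0.432,
    "c5n.4xlarge": 0.864,
    "p3.2xlarge": 3.06,
}


def server_cost(instance_type, summed_seconds):
    """Per-second rate (hourly price / 3600) times the summed server running time."""
    return HOURLY_PRICE[instance_type] / 3600.0 * summed_seconds


def lambda_cost(total_billed_ms, price_per_ms):
    """Total billed lambda milliseconds times the per-millisecond price."""
    return total_billed_ms * price_per_ms


# Values taken from the sample parse-graphserver.py output above.
summed_seconds = 1501.74
lam = lambda_cost(161884575, 3.12500625e-09)

# ASSUMPTION: c5n.2xlarge is used here purely as an example instance type.
srv = server_cost("c5n.2xlarge", summed_seconds)

print(f"Lambda cost: ${lam:.2f}")  # $0.51, matching "Total cost of lambdas"
print(f"Server cost: ${srv:.2f}")
print(f"Total cost:  ${srv + lam:.2f}")
```

For the CPU and GPU backends, the total is just the server_cost term.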

Reproduce Figure 7

To reproduce Figure 7 we can use the metrics from Table 3 to compute the value for each variant of Dorylus on each dataset. Value is computed as

V = 1 / (T x C)

where 'T' is the time in seconds and 'C' is the cost in dollars, both taken from Table 3.

In the figure, all of the values are reported relative to the baseline of the GPU, so the first step in creating the figure is to compute the value of the GPU variant for a given dataset.

Then compute the values of the CPU and serverless backends and divide each by the GPU value, for example Value(λ) / Value(GPU), to get the normalized scores plotted in Figure 7.
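As a minimal sketch of this calculation, the snippet below computes V = 1 / (T x C) for each backend and normalizes by the GPU value. The (time, cost) pairs are hypothetical placeholders, not numbers from Table 3, and the function names are ours; replace the pairs with the entries you measured for one dataset.

```python
def value(time_s, cost_usd):
    """Value as defined above: V = 1 / (T x C)."""
    return 1.0 / (time_s * cost_usd)


def normalized_values(results, baseline="gpu"):
    """Divide every backend's value by the baseline (GPU) value."""
    base = value(*results[baseline])
    return {backend: value(t, c) / base for backend, (t, c) in results.items()}


# HYPOTHETICAL (time in s, cost in $) pairs for one dataset, NOT from Table 3.
results = {
    "lambda": (800.0, 0.70),
    "cpu": (1600.0, 0.50),
    "gpu": (400.0, 2.00),
}

for backend, score in normalized_values(results).items():
    print(f"{backend:6s} {score:.2f}")
```

By construction the GPU entry is always 1.0, so a score above 1.0 means the backend delivers more performance per dollar than the GPU-only version.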