Update all files for the migration of CloudSuite 3. (#319)
xusine committed Aug 1, 2021
1 parent 329cac5 commit eb8224b
Showing 12 changed files with 147 additions and 140 deletions.
4 changes: 4 additions & 0 deletions README.md
@@ -1,5 +1,7 @@
# CloudSuite 3.0 #

**This branch is an archive of all CloudSuite 3.0 benchmarks. All prebuilt images are available from the [cloudsuite3][old] organization on Docker Hub. If you are looking for CloudSuite 4.0, please check out the [master][master] branch.**

[CloudSuite][csp] is a benchmark suite for cloud services. The third release consists of eight applications that have
been selected based on their popularity in today's datacenters. The benchmarks are based on real-world software
stacks and represent real-world setups.
@@ -26,3 +28,5 @@ We encourage CloudSuite users to use GitHub issues for requests for enhancements
[csl]: http://cloudsuite.ch/pages/license/ "CloudSuite License"
[csb]: http://cloudsuite.ch/#download "CloudSuite Benchmarks"
[pkb]: https://github.com/GoogleCloudPlatform/PerfKitBenchmarker "Google's PerfKit Benchmarker"
[old]: https://hub.docker.com/orgs/cloudsuite3/repositories "CloudSuite3 on Dockerhub"
[master]: https://github.com/parsa-epfl/cloudsuite "CloudSuite Master"
16 changes: 8 additions & 8 deletions docs/benchmarks/data-analytics.md
@@ -12,8 +12,8 @@ The benchmark consists of running a Naive Bayes classifier on a Wikimedia dataset
To obtain the images:

```bash
$ docker pull cloudsuite/hadoop
$ docker pull cloudsuite/data-analytics
$ docker pull cloudsuite3/hadoop
$ docker pull cloudsuite3/data-analytics
```

## Running the benchmark ##
@@ -30,16 +30,16 @@ Start the master with:

```bash
$ docker run -d --net hadoop-net --name master --hostname master \
cloudsuite/data-analytics master
cloudsuite3/data-analytics master
```

Start a number of slaves with:

```bash
$ docker run -d --net hadoop-net --name slave01 --hostname slave01 \
cloudsuite/hadoop slave
cloudsuite3/hadoop slave
$ docker run -d --net hadoop-net --name slave02 --hostname slave02 \
cloudsuite/hadoop slave
cloudsuite3/hadoop slave
...
```
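
Starting several slaves by hand quickly gets repetitive. The loop below is a minimal sketch that only *prints* one `docker run` command per slave (pipe its output to `sh` to actually launch the containers); the slave count `N` is an assumption, and the `slaveNN` naming mirrors `slave01`/`slave02` above.

```bash
# Dry run: print one "docker run" command per Hadoop slave.
# N (the number of slaves) is a hypothetical parameter.
N=2
for i in $(seq 1 "$N"); do
  name=$(printf 'slave%02d' "$i")
  echo docker run -d --net hadoop-net --name "$name" --hostname "$name" \
    cloudsuite3/hadoop slave
done
```

Once the printed commands look right, pipe the output to `sh` to run them.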

@@ -51,6 +51,6 @@ Run the benchmark with:
$ docker exec master benchmark
```

[dhrepo]: https://hub.docker.com/r/cloudsuite/data-analytics/ "DockerHub Page"
[dhpulls]: https://img.shields.io/docker/pulls/cloudsuite/data-analytics.svg "Go to DockerHub Page"
[dhstars]: https://img.shields.io/docker/stars/cloudsuite/data-analytics.svg "Go to DockerHub Page"
[dhrepo]: https://hub.docker.com/r/cloudsuite3/data-analytics/ "DockerHub Page"
[dhpulls]: https://img.shields.io/docker/pulls/cloudsuite3/data-analytics.svg "Go to DockerHub Page"
[dhstars]: https://img.shields.io/docker/stars/cloudsuite3/data-analytics.svg "Go to DockerHub Page"
26 changes: 13 additions & 13 deletions docs/benchmarks/data-caching.md
@@ -31,32 +31,32 @@ We will attach the launched containers to this newly created docker network.
### Starting the Server ###
To start the server, first `pull` the server image and then run it. To `pull` the server image, use the following command:

$ docker pull cloudsuite/data-caching:server
$ docker pull cloudsuite3/data-caching:server

It takes some time to download the image, but this is only required the first time.
The following command starts the server with four threads and 4096 MB of dedicated memory, with a minimum object size of 550 bytes, listening on the default port 11211:

$ docker run --name dc-server --net caching_network -d cloudsuite/data-caching:server -t 4 -m 4096 -n 550
$ docker run --name dc-server --net caching_network -d cloudsuite3/data-caching:server -t 4 -m 4096 -n 550

We assigned a name to this server to facilitate linking it with the client. We also used the `--net` option to attach the container to our prepared network.
As mentioned before, you can run multiple instances of the Memcached server; just remember to give each of them a unique name. For example, the following commands create four Memcached server instances:

$ docker run --name dc-server1 --net caching_network -d cloudsuite/data-caching:server -t 4 -m 4096 -n 550
$ docker run --name dc-server2 --net caching_network -d cloudsuite/data-caching:server -t 4 -m 4096 -n 550
$ docker run --name dc-server3 --net caching_network -d cloudsuite/data-caching:server -t 4 -m 4096 -n 550
$ docker run --name dc-server4 --net caching_network -d cloudsuite/data-caching:server -t 4 -m 4096 -n 550
$ docker run --name dc-server1 --net caching_network -d cloudsuite3/data-caching:server -t 4 -m 4096 -n 550
$ docker run --name dc-server2 --net caching_network -d cloudsuite3/data-caching:server -t 4 -m 4096 -n 550
$ docker run --name dc-server3 --net caching_network -d cloudsuite3/data-caching:server -t 4 -m 4096 -n 550
$ docker run --name dc-server4 --net caching_network -d cloudsuite3/data-caching:server -t 4 -m 4096 -n 550
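
Equivalently, the four commands above can be generated with a loop. This sketch is a dry run: it only *prints* the commands (pipe the output to `sh` to launch the containers), and the server count `N` is an assumption.

```bash
# Dry run: print one "docker run" command per Memcached server instance.
N=4
for i in $(seq 1 "$N"); do
  echo docker run --name "dc-server$i" --net caching_network -d \
    cloudsuite3/data-caching:server -t 4 -m 4096 -n 550
done
```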

### Starting the Client ###

To start the client, first `pull` the client image and then run it. To `pull` the client image, use the following command:

$ docker pull cloudsuite/data-caching:client
$ docker pull cloudsuite3/data-caching:client

It takes some time to download the image, but this is only required the first time.

To start the client container use the following command:

$ docker run -it --name dc-client --net caching_network cloudsuite/data-caching:client bash
$ docker run -it --name dc-client --net caching_network cloudsuite3/data-caching:client bash

This boots up the client container and you'll be logged in as the `memcache` user. Note that by using the `--net` option, you can easily make these containers visible to each other.

@@ -133,11 +133,11 @@ and the client on different sockets of the same machine

[memcachedWeb]: http://memcached.org/ "Memcached Website"

[serverdocker]: https://github.com/parsa-epfl/cloudsuite/blob/master/benchmarks/data-caching/server/Dockerfile "Server Dockerfile"
[serverdocker]: https://github.com/parsa-epfl/cloudsuite/blob/CSv3/benchmarks/data-caching/server/Dockerfile "Server Dockerfile"

[clientdocker]: https://github.com/parsa-epfl/cloudsuite/blob/master/benchmarks/data-caching/client/Dockerfile "Client Dockerfile"
[clientdocker]: https://github.com/parsa-epfl/cloudsuite/blob/CSv3/benchmarks/data-caching/client/Dockerfile "Client Dockerfile"

[repo]: https://github.com/parsa-epfl/cloudsuite "GitHub Repo"
[dhrepo]: https://hub.docker.com/r/cloudsuite/data-caching/ "DockerHub Page"
[dhpulls]: https://img.shields.io/docker/pulls/cloudsuite/data-caching.svg "Go to DockerHub Page"
[dhstars]: https://img.shields.io/docker/stars/cloudsuite/data-caching.svg "Go to DockerHub Page"
[dhrepo]: https://hub.docker.com/r/cloudsuite3/data-caching/ "DockerHub Page"
[dhpulls]: https://img.shields.io/docker/pulls/cloudsuite3/data-caching.svg "Go to DockerHub Page"
[dhstars]: https://img.shields.io/docker/stars/cloudsuite3/data-caching.svg "Go to DockerHub Page"
16 changes: 8 additions & 8 deletions docs/benchmarks/data-serving.md
@@ -21,22 +21,22 @@ We will attach the launched containers to this newly created docker network.
Start the server container, which runs the Cassandra server and installs a default keyspace, `usertable`:

```bash
$ docker run --name cassandra-server --net serving_network cloudsuite/data-serving:server cassandra
$ docker run --name cassandra-server --net serving_network cloudsuite3/data-serving:server cassandra
```
### Multiple Server Containers

For a cluster setup with multiple servers, we need to instantiate a seed server:

```bash
$ docker run --name cassandra-server-seed --net serving_network cloudsuite/data-serving:server
$ docker run --name cassandra-server-seed --net serving_network cloudsuite3/data-serving:server
```

Then we prepare the server as before.

The other server containers are instantiated as follows:

```bash
$ docker run --name cassandra-server(id) --net serving_network -e CASSANDRA_SEEDS=cassandra-server-seed cloudsuite/data-serving:server
$ docker run --name cassandra-server(id) --net serving_network -e CASSANDRA_SEEDS=cassandra-server-seed cloudsuite3/data-serving:server
```
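
For larger clusters, the per-server commands can be generated with a loop. A minimal sketch (dry run: it only *prints* the commands, so pipe the output to `sh` to run them; the server count `N` is an assumption):

```bash
# Dry run: print the "docker run" command for each additional Cassandra
# server; each one joins the cluster through the seed via CASSANDRA_SEEDS.
N=2
for i in $(seq 1 "$N"); do
  echo docker run --name "cassandra-server$i" --net serving_network \
    -e CASSANDRA_SEEDS=cassandra-server-seed cloudsuite3/data-serving:server
done
```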

You can find more details at the websites: http://wiki.apache.org/cassandra/GettingStarted and https://hub.docker.com/_/cassandra/.
@@ -46,7 +46,7 @@ After successfully creating the aforementioned schema, you are ready to benchmark
Start the client container specifying server name(s), or IP address(es), separated with commas, as the last command argument:

```bash
$ docker run --name cassandra-client --net serving_network cloudsuite/data-serving:client "cassandra-server-seed,cassandra-server1"
$ docker run --name cassandra-client --net serving_network cloudsuite3/data-serving:client "cassandra-server-seed,cassandra-server1"
```

More detailed instructions on generating the dataset can be found in Step 5 at [this](http://github.com/brianfrankcooper/YCSB/wiki/Running-a-Workload) link. Although Step 5 in the link describes the data loading procedure, other steps (e.g., 1, 2, 3, 4) are very useful to understand the YCSB settings.
@@ -71,9 +71,9 @@ Running the benchmark
---------------------
The benchmark is run automatically with the client container. One can modify the record count in the database and/or the number of operations performed by the benchmark by specifying the corresponding variables when running the client container:
```bash
$ docker run -e RECORDCOUNT=<#> -e OPERATIONCOUNT=<#> --name cassandra-client --net serving_network cloudsuite/data-serving:client "cassandra-server-seed,cassandra-server1"
$ docker run -e RECORDCOUNT=<#> -e OPERATIONCOUNT=<#> --name cassandra-client --net serving_network cloudsuite3/data-serving:client "cassandra-server-seed,cassandra-server1"
```

[dhrepo]: https://hub.docker.com/r/cloudsuite/data-serving/ "DockerHub Page"
[dhpulls]: https://img.shields.io/docker/pulls/cloudsuite/data-serving.svg "Go to DockerHub Page"
[dhstars]: https://img.shields.io/docker/stars/cloudsuite/data-serving.svg "Go to DockerHub Page"
[dhrepo]: https://hub.docker.com/r/cloudsuite3/data-serving/ "DockerHub Page"
[dhpulls]: https://img.shields.io/docker/pulls/cloudsuite3/data-serving.svg "Go to DockerHub Page"
[dhstars]: https://img.shields.io/docker/stars/cloudsuite3/data-serving.svg "Go to DockerHub Page"
30 changes: 15 additions & 15 deletions docs/benchmarks/graph-analytics.md
@@ -11,13 +11,13 @@ The Graph Analytics benchmark relies on the Spark framework to perform graph analytics

The current version of the benchmark is 3.0. To obtain the image:

$ docker pull cloudsuite/graph-analytics
$ docker pull cloudsuite3/graph-analytics

### Datasets

The benchmark uses a graph dataset generated from Twitter. To get the dataset image:

$ docker pull cloudsuite/twitter-dataset-graph
$ docker pull cloudsuite3/twitter-dataset-graph

More information about the dataset is available at
[cloudsuite/twitter-dataset-graph][ml-dhrepo].
@@ -30,8 +30,8 @@ spark-submit.

To run a benchmark with the Twitter dataset:

$ docker create --name data cloudsuite/twitter-dataset-graph
$ docker run --rm --volumes-from data cloudsuite/graph-analytics
$ docker create --name data cloudsuite3/twitter-dataset-graph
$ docker run --rm --volumes-from data cloudsuite3/graph-analytics

### Tweaking the Benchmark

@@ -41,7 +41,7 @@ has enough memory allocated to be able to execute the benchmark
in-memory, supply it with --driver-memory and --executor-memory
arguments:

$ docker run --rm --volumes-from data cloudsuite/graph-analytics \
$ docker run --rm --volumes-from data cloudsuite3/graph-analytics \
--driver-memory 1g --executor-memory 4g

### Multi-node deployment
@@ -54,30 +54,30 @@ with Docker look at [cloudsuite/spark][spark-dhrepo].
First, create a dataset image on every physical node where Spark
workers will be running.

$ docker create --name data cloudsuite/twitter-dataset-graph
$ docker create --name data cloudsuite3/twitter-dataset-graph

Start Spark master and Spark workers. They should all run within the
same Docker network, which we call spark-net here. The workers get
access to the datasets with --volumes-from data.

$ docker run -dP --net spark-net --hostname spark-master --name spark-master \
cloudsuite/spark master
cloudsuite3/spark master
$ docker run -dP --net spark-net --volumes-from data --name spark-worker-01 \
cloudsuite/spark worker spark://spark-master:7077
cloudsuite3/spark worker spark://spark-master:7077
$ docker run -dP --net spark-net --volumes-from data --name spark-worker-02 \
cloudsuite/spark worker spark://spark-master:7077
cloudsuite3/spark worker spark://spark-master:7077
$ ...
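
The worker commands elided by `...` above can be scripted. A minimal sketch (dry run: it only *prints* the commands, so pipe the output to `sh` to launch the workers; the worker count `N` is an assumption):

```bash
# Dry run: print one "docker run" command per Spark worker.
N=2
for i in $(seq 1 "$N"); do
  name=$(printf 'spark-worker-%02d' "$i")
  echo docker run -dP --net spark-net --volumes-from data --name "$name" \
    cloudsuite3/spark worker spark://spark-master:7077
done
```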

Finally, run the benchmark as the client to the Spark master:

$ docker run --rm --net spark-net --volumes-from data \
cloudsuite/graph-analytics \
cloudsuite3/graph-analytics \
--driver-memory 1g --executor-memory 4g \
--master spark://spark-master:7077

[dhrepo]: https://hub.docker.com/r/cloudsuite/graph-analytics/ "DockerHub Page"
[dhpulls]: https://img.shields.io/docker/pulls/cloudsuite/graph-analytics.svg "Go to DockerHub Page"
[dhstars]: https://img.shields.io/docker/stars/cloudsuite/graph-analytics.svg "Go to DockerHub Page"
[ml-dhrepo]: https://hub.docker.com/r/cloudsuite/twitter-dataset-graph/
[spark-dhrepo]: https://hub.docker.com/r/cloudsuite/spark/
[dhrepo]: https://hub.docker.com/r/cloudsuite3/graph-analytics/ "DockerHub Page"
[dhpulls]: https://img.shields.io/docker/pulls/cloudsuite3/graph-analytics.svg "Go to DockerHub Page"
[dhstars]: https://img.shields.io/docker/stars/cloudsuite3/graph-analytics.svg "Go to DockerHub Page"
[ml-dhrepo]: https://hub.docker.com/r/cloudsuite3/twitter-dataset-graph/
[spark-dhrepo]: https://hub.docker.com/r/cloudsuite3/spark/

43 changes: 23 additions & 20 deletions docs/benchmarks/in-memory-analytics.md
@@ -22,17 +22,17 @@ squares (ALS) algorithm which is provided by Spark MLlib.

The current version of the benchmark is 3.0. To obtain the image:

$ docker pull cloudsuite/in-memory-analytics
$ docker pull cloudsuite3/in-memory-analytics

### Datasets

The benchmark uses user-movie ratings datasets provided by Movielens. To get
the dataset image:

$ docker pull cloudsuite/movielens-dataset
$ docker pull cloudsuite3/movielens-dataset

More information about the dataset is available at
[cloudsuite/movielens-dataset][ml-dhrepo].
[cloudsuite3/movielens-dataset][ml-dhrepo].

### Running the Benchmark

@@ -41,14 +41,14 @@ distributed with Spark. It takes two arguments: the dataset to use for
training, and the personal ratings file to give recommendations for. Any
remaining arguments are passed to spark-submit.

The cloudsuite/movielens-dataset image has two datasets (one small and one
The cloudsuite3/movielens-dataset image has two datasets (one small and one
large), and a sample personal ratings file.

To run a benchmark with the small dataset and the provided personal ratings
file:

$ docker create --name data cloudsuite/movielens-dataset
$ docker run --rm --volumes-from data cloudsuite/in-memory-analytics \
$ docker create --name data cloudsuite3/movielens-dataset
$ docker run --rm --volumes-from data cloudsuite3/in-memory-analytics \
/data/ml-latest-small /data/myratings.csv

### Tweaking the Benchmark
Expand All @@ -58,7 +58,7 @@ be used to tweak execution. For example, to ensure that Spark has enough memory
allocated to be able to execute the benchmark in-memory, supply it with
--driver-memory and --executor-memory arguments:

$ docker run --rm --volumes-from data cloudsuite/in-memory-analytics \
$ docker run --rm --volumes-from data cloudsuite3/in-memory-analytics \
/data/ml-latest /data/myratings.csv \
--driver-memory 2g --executor-memory 2g

@@ -67,32 +67,35 @@ allocated to be able to execute the benchmark in-memory, supply it with
This section explains how to run the benchmark using multiple Spark workers
(each running in a Docker container) that can be spread across multiple nodes
in a cluster. For more information on running Spark with Docker look at
[cloudsuite/spark][spark-dhrepo].
[cloudsuite3/spark][spark-dhrepo].

First, create a dataset image on every physical node where Spark workers will
be running.

$ docker create --name data cloudsuite/movielens-dataset
$ docker create --name data cloudsuite3/movielens-dataset

Then, create a dedicated network for the Spark workers:

$ docker network create spark-net

Start Spark master and Spark workers. They should all run within the same
Docker network, which we call spark-net here. The workers get access to the
datasets with --volumes-from data.
Docker network, which we call spark-net here. The workers get access to the datasets with --volumes-from data.

$ docker run -dP --net spark-net --hostname spark-master --name spark-master cloudsuite/spark master
$ docker run -dP --net spark-net --volumes-from data --name spark-worker-01 cloudsuite/spark worker \
$ docker run -dP --net spark-net --hostname spark-master --name spark-master cloudsuite3/spark master
$ docker run -dP --net spark-net --volumes-from data --name spark-worker-01 cloudsuite3/spark worker \
spark://spark-master:7077
$ docker run -dP --net spark-net --volumes-from data --name spark-worker-02 cloudsuite/spark worker \
$ docker run -dP --net spark-net --volumes-from data --name spark-worker-02 cloudsuite3/spark worker \
spark://spark-master:7077
$ ...
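
As in the Graph Analytics setup, worker startup can be scripted rather than typed one command at a time. A minimal dry-run sketch (it only *prints* the commands; the `WORKERS` count is an assumption):

```bash
# Dry run: print a "docker run" line for each Spark worker container.
WORKERS=2
for i in $(seq 1 "$WORKERS"); do
  echo docker run -dP --net spark-net --volumes-from data \
    --name "spark-worker-$(printf '%02d' "$i")" \
    cloudsuite3/spark worker spark://spark-master:7077
done
```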

Finally, run the benchmark as the client to the Spark master:

$ docker run --rm --net spark-net --volumes-from data cloudsuite/in-memory-analytics \
$ docker run --rm --net spark-net --volumes-from data cloudsuite3/in-memory-analytics \
/data/ml-latest-small /data/myratings.csv --master spark://spark-master:7077

[dhrepo]: https://hub.docker.com/r/cloudsuite/in-memory-analytics/ "DockerHub Page"
[dhpulls]: https://img.shields.io/docker/pulls/cloudsuite/in-memory-analytics.svg "Go to DockerHub Page"
[dhstars]: https://img.shields.io/docker/stars/cloudsuite/in-memory-analytics.svg "Go to DockerHub Page"
[ml-dhrepo]: https://hub.docker.com/r/cloudsuite/movielens-dataset/
[spark-dhrepo]: https://hub.docker.com/r/cloudsuite/spark/
[dhrepo]: https://hub.docker.com/r/cloudsuite3/in-memory-analytics/ "DockerHub Page"
[dhpulls]: https://img.shields.io/docker/pulls/cloudsuite3/in-memory-analytics.svg "Go to DockerHub Page"
[dhstars]: https://img.shields.io/docker/stars/cloudsuite3/in-memory-analytics.svg "Go to DockerHub Page"
[ml-dhrepo]: https://hub.docker.com/r/cloudsuite3/movielens-dataset/
[spark-dhrepo]: https://hub.docker.com/r/cloudsuite3/spark/
