Update all files for the migration of CloudSuite 3. (#319)
xusine committed Aug 1, 2021
1 parent 329cac5 commit eb8224b
Showing 12 changed files with 147 additions and 140 deletions.
4 changes: 4 additions & 0 deletions README.md
@@ -1,5 +1,7 @@
# CloudSuite 3.0 #

**This branch is an archive of all CloudSuite 3.0 benchmarks. All prebuilt images are available from the [cloudsuite3][old] organization on Docker Hub. If you are looking for CloudSuite 4.0, please check out the [master][master] branch.**

[CloudSuite][csp] is a benchmark suite for cloud services. The third release consists of eight applications that have
been selected based on their popularity in today's datacenters. The benchmarks are based on real-world software
stacks and represent real-world setups.
@@ -26,3 +28,5 @@ We encourage CloudSuite users to use GitHub issues for requests for enhancements
[csl]: http://cloudsuite.ch/pages/license/ "CloudSuite License"
[csb]: http://cloudsuite.ch/#download "CloudSuite Benchmarks"
[pkb]: https://github.com/GoogleCloudPlatform/PerfKitBenchmarker "Google's PerfKit Benchmarker"
[old]: https://hub.docker.com/orgs/cloudsuite3/repositories "CloudSuite3 on Dockerhub"
[master]: https://github.com/parsa-epfl/cloudsuite "CloudSuite Master"
16 changes: 8 additions & 8 deletions docs/benchmarks/data-analytics.md
@@ -12,8 +12,8 @@ The benchmark consists of running a Naive Bayes classifier on a Wikimedia dataset
To obtain the images:

```bash
$ docker pull cloudsuite/hadoop
$ docker pull cloudsuite/data-analytics
$ docker pull cloudsuite3/hadoop
$ docker pull cloudsuite3/data-analytics
```

## Running the benchmark ##
@@ -30,16 +30,16 @@ Start the master with:

```bash
$ docker run -d --net hadoop-net --name master --hostname master \
cloudsuite/data-analytics master
cloudsuite3/data-analytics master
```

Start a number of slaves with:

```bash
$ docker run -d --net hadoop-net --name slave01 --hostname slave01 \
cloudsuite/hadoop slave
cloudsuite3/hadoop slave
$ docker run -d --net hadoop-net --name slave02 --hostname slave02 \
cloudsuite/hadoop slave
cloudsuite3/hadoop slave
...
```
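
Starting several slaves by hand quickly gets repetitive. The loop below is a minimal sketch that only *prints* one `docker run` command per slave (pipe its output to `sh` to actually launch the containers); the slave count `N` is an assumption, and the `slaveNN` naming mirrors `slave01`/`slave02` above.

```bash
# Dry run: print one "docker run" command per Hadoop slave.
# N (the number of slaves) is a hypothetical parameter.
N=2
for i in $(seq 1 "$N"); do
  name=$(printf 'slave%02d' "$i")
  echo docker run -d --net hadoop-net --name "$name" --hostname "$name" \
    cloudsuite3/hadoop slave
done
```

Once the printed commands look right, pipe the output to `sh` to run them.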

@@ -51,6 +51,6 @@ Run the benchmark with:
$ docker exec master benchmark
```

[dhrepo]: https://hub.docker.com/r/cloudsuite/data-analytics/ "DockerHub Page"
[dhpulls]: https://img.shields.io/docker/pulls/cloudsuite/data-analytics.svg "Go to DockerHub Page"
[dhstars]: https://img.shields.io/docker/stars/cloudsuite/data-analytics.svg "Go to DockerHub Page"
[dhrepo]: https://hub.docker.com/r/cloudsuite3/data-analytics/ "DockerHub Page"
[dhpulls]: https://img.shields.io/docker/pulls/cloudsuite3/data-analytics.svg "Go to DockerHub Page"
[dhstars]: https://img.shields.io/docker/stars/cloudsuite3/data-analytics.svg "Go to DockerHub Page"
26 changes: 13 additions & 13 deletions docs/benchmarks/data-caching.md
@@ -31,32 +31,32 @@ We will attach the launched containers to this newly created docker network.
### Starting the Server ###
To start the server, first `pull` the server image and then run it. To `pull` the server image, use the following command:

$ docker pull cloudsuite/data-caching:server
$ docker pull cloudsuite3/data-caching:server

It takes some time to download the image, but this is only required the first time.
The following command starts the server with four threads and 4096 MB of dedicated memory, with a minimum object size of 550 bytes, listening on the default port 11211:

$ docker run --name dc-server --net caching_network -d cloudsuite/data-caching:server -t 4 -m 4096 -n 550
$ docker run --name dc-server --net caching_network -d cloudsuite3/data-caching:server -t 4 -m 4096 -n 550

We assigned a name to this server to facilitate linking it with the client. We also used the `--net` option to attach the container to our prepared network.
As mentioned before, you can run multiple instances of the Memcached server; just remember to give each of them a unique name. For example, the following commands create four Memcached server instances:

$ docker run --name dc-server1 --net caching_network -d cloudsuite/data-caching:server -t 4 -m 4096 -n 550
$ docker run --name dc-server2 --net caching_network -d cloudsuite/data-caching:server -t 4 -m 4096 -n 550
$ docker run --name dc-server3 --net caching_network -d cloudsuite/data-caching:server -t 4 -m 4096 -n 550
$ docker run --name dc-server4 --net caching_network -d cloudsuite/data-caching:server -t 4 -m 4096 -n 550
$ docker run --name dc-server1 --net caching_network -d cloudsuite3/data-caching:server -t 4 -m 4096 -n 550
$ docker run --name dc-server2 --net caching_network -d cloudsuite3/data-caching:server -t 4 -m 4096 -n 550
$ docker run --name dc-server3 --net caching_network -d cloudsuite3/data-caching:server -t 4 -m 4096 -n 550
$ docker run --name dc-server4 --net caching_network -d cloudsuite3/data-caching:server -t 4 -m 4096 -n 550
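
Equivalently, the four commands above can be generated with a loop. This sketch is a dry run: it only *prints* the commands (pipe the output to `sh` to launch the containers), and the server count `N` is an assumption.

```bash
# Dry run: print one "docker run" command per Memcached server instance.
N=4
for i in $(seq 1 "$N"); do
  echo docker run --name "dc-server$i" --net caching_network -d \
    cloudsuite3/data-caching:server -t 4 -m 4096 -n 550
done
```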

### Starting the Client ###

To start the client, first `pull` the client image and then run it. To `pull` the client image, use the following command:

$ docker pull cloudsuite/data-caching:client
$ docker pull cloudsuite3/data-caching:client

It takes some time to download the image, but this is only required the first time.

To start the client container use the following command:

$ docker run -it --name dc-client --net caching_network cloudsuite/data-caching:client bash
$ docker run -it --name dc-client --net caching_network cloudsuite3/data-caching:client bash

This boots up the client container and you'll be logged in as the `memcache` user. Note that by using the `--net` option, you can easily make these containers visible to each other.

@@ -133,11 +133,11 @@ and the client on different sockets of the same machine

[memcachedWeb]: http://memcached.org/ "Memcached Website"

[serverdocker]: https://github.com/parsa-epfl/cloudsuite/blob/master/benchmarks/data-caching/server/Dockerfile "Server Dockerfile"
[serverdocker]: https://github.com/parsa-epfl/cloudsuite/blob/CSv3/benchmarks/data-caching/server/Dockerfile "Server Dockerfile"

[clientdocker]: https://github.com/parsa-epfl/cloudsuite/blob/master/benchmarks/data-caching/client/Dockerfile "Client Dockerfile"
[clientdocker]: https://github.com/parsa-epfl/cloudsuite/blob/CSv3/benchmarks/data-caching/client/Dockerfile "Client Dockerfile"

[repo]: https://github.com/parsa-epfl/cloudsuite "GitHub Repo"
[dhrepo]: https://hub.docker.com/r/cloudsuite/data-caching/ "DockerHub Page"
[dhpulls]: https://img.shields.io/docker/pulls/cloudsuite/data-caching.svg "Go to DockerHub Page"
[dhstars]: https://img.shields.io/docker/stars/cloudsuite/data-caching.svg "Go to DockerHub Page"
[dhrepo]: https://hub.docker.com/r/cloudsuite3/data-caching/ "DockerHub Page"
[dhpulls]: https://img.shields.io/docker/pulls/cloudsuite3/data-caching.svg "Go to DockerHub Page"
[dhstars]: https://img.shields.io/docker/stars/cloudsuite3/data-caching.svg "Go to DockerHub Page"
16 changes: 8 additions & 8 deletions docs/benchmarks/data-serving.md
@@ -21,22 +21,22 @@ We will attach the launched containers to this newly created docker network.
Start the server container, which runs the Cassandra server and installs a default keyspace, `usertable`:

```bash
$ docker run --name cassandra-server --net serving_network cloudsuite/data-serving:server cassandra
$ docker run --name cassandra-server --net serving_network cloudsuite3/data-serving:server cassandra
```
### Multiple Server Containers

For a cluster setup with multiple servers, we need to instantiate a seed server:

```bash
$ docker run --name cassandra-server-seed --net serving_network cloudsuite/data-serving:server
$ docker run --name cassandra-server-seed --net serving_network cloudsuite3/data-serving:server
```

Then we prepare the server as before.

The other server containers are instantiated as follows:

```bash
$ docker run --name cassandra-server(id) --net serving_network -e CASSANDRA_SEEDS=cassandra-server-seed cloudsuite/data-serving:server
$ docker run --name cassandra-server(id) --net serving_network -e CASSANDRA_SEEDS=cassandra-server-seed cloudsuite3/data-serving:server
```
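
For larger clusters, the per-server commands can be generated with a loop. A minimal sketch (dry run: it only *prints* the commands, so pipe the output to `sh` to run them; the server count `N` is an assumption):

```bash
# Dry run: print the "docker run" command for each additional Cassandra
# server; each one joins the cluster through the seed via CASSANDRA_SEEDS.
N=2
for i in $(seq 1 "$N"); do
  echo docker run --name "cassandra-server$i" --net serving_network \
    -e CASSANDRA_SEEDS=cassandra-server-seed cloudsuite3/data-serving:server
done
```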

You can find more details at the websites: http://wiki.apache.org/cassandra/GettingStarted and https://hub.docker.com/_/cassandra/.
@@ -46,7 +46,7 @@ After successfully creating the aforementioned schema, you are ready to benchmark
Start the client container specifying server name(s), or IP address(es), separated with commas, as the last command argument:

```bash
$ docker run --name cassandra-client --net serving_network cloudsuite/data-serving:client "cassandra-server-seed,cassandra-server1"
$ docker run --name cassandra-client --net serving_network cloudsuite3/data-serving:client "cassandra-server-seed,cassandra-server1"
```

More detailed instructions on generating the dataset can be found in Step 5 at [this](http://github.com/brianfrankcooper/YCSB/wiki/Running-a-Workload) link. Although Step 5 in the link describes the data loading procedure, other steps (e.g., 1, 2, 3, 4) are very useful to understand the YCSB settings.
@@ -71,9 +71,9 @@ Running the benchmark
---------------------
The benchmark is run automatically with the client container. One can modify the record count in the database and/or the number of operations performed by the benchmark by specifying the corresponding variables when running the client container:
```bash
$ docker run -e RECORDCOUNT=<#> -e OPERATIONCOUNT=<#> --name cassandra-client --net serving_network cloudsuite/data-serving:client "cassandra-server-seed,cassandra-server1"
$ docker run -e RECORDCOUNT=<#> -e OPERATIONCOUNT=<#> --name cassandra-client --net serving_network cloudsuite3/data-serving:client "cassandra-server-seed,cassandra-server1"
```

[dhrepo]: https://hub.docker.com/r/cloudsuite/data-serving/ "DockerHub Page"
[dhpulls]: https://img.shields.io/docker/pulls/cloudsuite/data-serving.svg "Go to DockerHub Page"
[dhstars]: https://img.shields.io/docker/stars/cloudsuite/data-serving.svg "Go to DockerHub Page"
[dhrepo]: https://hub.docker.com/r/cloudsuite3/data-serving/ "DockerHub Page"
[dhpulls]: https://img.shields.io/docker/pulls/cloudsuite3/data-serving.svg "Go to DockerHub Page"
[dhstars]: https://img.shields.io/docker/stars/cloudsuite3/data-serving.svg "Go to DockerHub Page"
30 changes: 15 additions & 15 deletions docs/benchmarks/graph-analytics.md
@@ -11,13 +11,13 @@ The Graph Analytics benchmark relies on the Spark framework to perform graph analytics

The current version of the benchmark is 3.0. To obtain the image:

$ docker pull cloudsuite/graph-analytics
$ docker pull cloudsuite3/graph-analytics

### Datasets

The benchmark uses a graph dataset generated from Twitter. To get the dataset image:

$ docker pull cloudsuite/twitter-dataset-graph
$ docker pull cloudsuite3/twitter-dataset-graph

More information about the dataset is available at
[cloudsuite/twitter-dataset-graph][ml-dhrepo].
@@ -30,8 +30,8 @@ spark-submit.

To run a benchmark with the Twitter dataset:

$ docker create --name data cloudsuite/twitter-dataset-graph
$ docker run --rm --volumes-from data cloudsuite/graph-analytics
$ docker create --name data cloudsuite3/twitter-dataset-graph
$ docker run --rm --volumes-from data cloudsuite3/graph-analytics

### Tweaking the Benchmark

@@ -41,7 +41,7 @@ has enough memory allocated to be able to execute the benchmark
in-memory, supply it with --driver-memory and --executor-memory
arguments:

$ docker run --rm --volumes-from data cloudsuite/graph-analytics \
$ docker run --rm --volumes-from data cloudsuite3/graph-analytics \
--driver-memory 1g --executor-memory 4g

### Multi-node deployment
@@ -54,30 +54,30 @@ with Docker look at [cloudsuite/spark][spark-dhrepo].
First, create a dataset image on every physical node where Spark
workers will be running.

$ docker create --name data cloudsuite/twitter-dataset-graph
$ docker create --name data cloudsuite3/twitter-dataset-graph

Start Spark master and Spark workers. They should all run within the
same Docker network, which we call spark-net here. The workers get
access to the datasets with --volumes-from data.

$ docker run -dP --net spark-net --hostname spark-master --name spark-master \
cloudsuite/spark master
cloudsuite3/spark master
$ docker run -dP --net spark-net --volumes-from data --name spark-worker-01 \
cloudsuite/spark worker spark://spark-master:7077
cloudsuite3/spark worker spark://spark-master:7077
$ docker run -dP --net spark-net --volumes-from data --name spark-worker-02 \
cloudsuite/spark worker spark://spark-master:7077
cloudsuite3/spark worker spark://spark-master:7077
$ ...
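
The worker commands elided by `...` above can be scripted. A minimal sketch (dry run: it only *prints* the commands, so pipe the output to `sh` to launch the workers; the worker count `N` is an assumption):

```bash
# Dry run: print one "docker run" command per Spark worker.
N=2
for i in $(seq 1 "$N"); do
  name=$(printf 'spark-worker-%02d' "$i")
  echo docker run -dP --net spark-net --volumes-from data --name "$name" \
    cloudsuite3/spark worker spark://spark-master:7077
done
```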

Finally, run the benchmark as the client to the Spark master:

$ docker run --rm --net spark-net --volumes-from data \
cloudsuite/graph-analytics \
cloudsuite3/graph-analytics \
--driver-memory 1g --executor-memory 4g \
--master spark://spark-master:7077

[dhrepo]: https://hub.docker.com/r/cloudsuite/graph-analytics/ "DockerHub Page"
[dhpulls]: https://img.shields.io/docker/pulls/cloudsuite/graph-analytics.svg "Go to DockerHub Page"
[dhstars]: https://img.shields.io/docker/stars/cloudsuite/graph-analytics.svg "Go to DockerHub Page"
[ml-dhrepo]: https://hub.docker.com/r/cloudsuite/twitter-dataset-graph/
[spark-dhrepo]: https://hub.docker.com/r/cloudsuite/spark/
[dhrepo]: https://hub.docker.com/r/cloudsuite3/graph-analytics/ "DockerHub Page"
[dhpulls]: https://img.shields.io/docker/pulls/cloudsuite3/graph-analytics.svg "Go to DockerHub Page"
[dhstars]: https://img.shields.io/docker/stars/cloudsuite3/graph-analytics.svg "Go to DockerHub Page"
[ml-dhrepo]: https://hub.docker.com/r/cloudsuite3/twitter-dataset-graph/
[spark-dhrepo]: https://hub.docker.com/r/cloudsuite3/spark/

43 changes: 23 additions & 20 deletions docs/benchmarks/in-memory-analytics.md
@@ -22,17 +22,17 @@ squares (ALS) algorithm which is provided by Spark MLlib.

The current version of the benchmark is 3.0. To obtain the image:

$ docker pull cloudsuite/in-memory-analytics
$ docker pull cloudsuite3/in-memory-analytics

### Datasets

The benchmark uses user-movie ratings datasets provided by Movielens. To get
the dataset image:

$ docker pull cloudsuite/movielens-dataset
$ docker pull cloudsuite3/movielens-dataset

More information about the dataset is available at
[cloudsuite/movielens-dataset][ml-dhrepo].
[cloudsuite3/movielens-dataset][ml-dhrepo].

### Running the Benchmark

@@ -41,14 +41,14 @@ distributed with Spark. It takes two arguments: the dataset to use for
training, and the personal ratings file to give recommendations for. Any
remaining arguments are passed to spark-submit.

The cloudsuite/movielens-dataset image has two datasets (one small and one
The cloudsuite3/movielens-dataset image has two datasets (one small and one
large), and a sample personal ratings file.

To run a benchmark with the small dataset and the provided personal ratings
file:

$ docker create --name data cloudsuite/movielens-dataset
$ docker run --rm --volumes-from data cloudsuite/in-memory-analytics \
$ docker create --name data cloudsuite3/movielens-dataset
$ docker run --rm --volumes-from data cloudsuite3/in-memory-analytics \
/data/ml-latest-small /data/myratings.csv

### Tweaking the Benchmark
Expand All @@ -58,7 +58,7 @@ be used to tweak execution. For example, to ensure that Spark has enough memory
allocated to be able to execute the benchmark in-memory, supply it with
--driver-memory and --executor-memory arguments:

$ docker run --rm --volumes-from data cloudsuite/in-memory-analytics \
$ docker run --rm --volumes-from data cloudsuite3/in-memory-analytics \
/data/ml-latest /data/myratings.csv \
--driver-memory 2g --executor-memory 2g

@@ -67,32 +67,35 @@ allocated to be able to execute the benchmark in-memory, supply it with
This section explains how to run the benchmark using multiple Spark workers
(each running in a Docker container) that can be spread across multiple nodes
in a cluster. For more information on running Spark with Docker look at
[cloudsuite/spark][spark-dhrepo].
[cloudsuite3/spark][spark-dhrepo].

First, create a dataset image on every physical node where Spark workers will
be running.

$ docker create --name data cloudsuite/movielens-dataset
$ docker create --name data cloudsuite3/movielens-dataset

Then, create a dedicated network for the Spark workers:

$ docker network create spark-net

Start Spark master and Spark workers. They should all run within the same
Docker network, which we call spark-net here. The workers get access to the
datasets with --volumes-from data.
Docker network, which we call spark-net here. The workers get access to the datasets with --volumes-from data.

$ docker run -dP --net spark-net --hostname spark-master --name spark-master cloudsuite/spark master
$ docker run -dP --net spark-net --volumes-from data --name spark-worker-01 cloudsuite/spark worker \
$ docker run -dP --net spark-net --hostname spark-master --name spark-master cloudsuite3/spark master
$ docker run -dP --net spark-net --volumes-from data --name spark-worker-01 cloudsuite3/spark worker \
spark://spark-master:7077
$ docker run -dP --net spark-net --volumes-from data --name spark-worker-02 cloudsuite/spark worker \
$ docker run -dP --net spark-net --volumes-from data --name spark-worker-02 cloudsuite3/spark worker \
spark://spark-master:7077
$ ...
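
As in the Graph Analytics setup, worker startup can be scripted rather than typed one command at a time. A minimal dry-run sketch (it only *prints* the commands; the `WORKERS` count is an assumption):

```bash
# Dry run: print a "docker run" line for each Spark worker container.
WORKERS=2
for i in $(seq 1 "$WORKERS"); do
  echo docker run -dP --net spark-net --volumes-from data \
    --name "spark-worker-$(printf '%02d' "$i")" \
    cloudsuite3/spark worker spark://spark-master:7077
done
```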

Finally, run the benchmark as the client to the Spark master:

$ docker run --rm --net spark-net --volumes-from data cloudsuite/in-memory-analytics \
$ docker run --rm --net spark-net --volumes-from data cloudsuite3/in-memory-analytics \
/data/ml-latest-small /data/myratings.csv --master spark://spark-master:7077

[dhrepo]: https://hub.docker.com/r/cloudsuite/in-memory-analytics/ "DockerHub Page"
[dhpulls]: https://img.shields.io/docker/pulls/cloudsuite/in-memory-analytics.svg "Go to DockerHub Page"
[dhstars]: https://img.shields.io/docker/stars/cloudsuite/in-memory-analytics.svg "Go to DockerHub Page"
[ml-dhrepo]: https://hub.docker.com/r/cloudsuite/movielens-dataset/
[spark-dhrepo]: https://hub.docker.com/r/cloudsuite/spark/
[dhrepo]: https://hub.docker.com/r/cloudsuite3/in-memory-analytics/ "DockerHub Page"
[dhpulls]: https://img.shields.io/docker/pulls/cloudsuite3/in-memory-analytics.svg "Go to DockerHub Page"
[dhstars]: https://img.shields.io/docker/stars/cloudsuite3/in-memory-analytics.svg "Go to DockerHub Page"
[ml-dhrepo]: https://hub.docker.com/r/cloudsuite3/movielens-dataset/
[spark-dhrepo]: https://hub.docker.com/r/cloudsuite3/spark/
