jschaub30/viz_workload

viz_workload

Measure, view and share data-rich, interactive timeseries charts of system performance metrics while running single- or multi-node linux workloads.

This tool:

  1. gathers system information from each node
  2. starts up system monitors (cpu, mem, etc.)
  3. executes a given workload
  4. stops the system monitors
  5. gathers data and creates interactive web pages displaying system performance while the workload was running.

Steps 2-4 above can also be repeated while sweeping a workload parameter, and the pages created in step 5 will include the parameter sweep.
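The start/run/stop cycle of steps 2-4, repeated over a swept parameter, can be sketched roughly as follows. This is not viz_workload's own implementation: the `monitor` function, the `THREADS` variable, the placeholder workload and the log file names are all hypothetical, chosen only to illustrate the control flow.

```shell
#!/bin/sh
# Illustrative sketch only; every name here is hypothetical.
outdir=$(mktemp -d)

# Hypothetical monitor: sample the load average once per second until killed.
monitor() {
    while true; do
        echo "$(date +%s) $(cat /proc/loadavg 2>/dev/null || echo n/a)"
        sleep 1
    done
}

# Sweep a workload parameter (here: a thread count of 1, 2, then 4).
for threads in 1 2 4; do
    monitor > "$outdir/monitor_$threads.log" &      # step 2: start the monitor
    monitor_pid=$!
    # step 3: placeholder workload; a real one would act on $THREADS
    THREADS=$threads sh -c 'echo "running with $THREADS thread(s)"; sleep 1'
    kill "$monitor_pid"                             # step 4: stop the monitor
    wait "$monitor_pid" 2>/dev/null || true
done

echo "Logs written to $outdir"
```

Each pass through the loop leaves one log per parameter value, which is the kind of per-run data the pages in step 5 are built from.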

Here's a blog post describing a mixed workload that featured high IO and CPU usage.

Here's an example page created by this tool. This workload featured:

  • 2 nodes, each with 4 threads and 14-16GB RAM
  • high CPU workload
  • sweeping a parameter (number of threads 1/2/4)
  • CPU heatmap measurement enabled

This tool is useful for identifying performance bottlenecks when, say, setting the sort benchmark record.

Version 1.1.1

Set up password-less SSH

These scripts use ssh to start/stop monitors and run the workload, even when running on only 1 host. To avoid typing a password each time, set up password-less ssh. Here are simple instructions for 1 host:

ssh-keygen -t rsa  # Press enter at the prompts (no passphrase!)
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys

To copy the public key to another server, use the following command:

ssh-copy-id <user_name>@<server_name>

Now verify that you don't need to type a password:

ssh localhost
ssh $(hostname)
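For a cluster, the same check can be scripted across hosts. The sketch below is not part of viz_workload; the function name `check_passwordless_ssh` is hypothetical. It relies on the OpenSSH `BatchMode=yes` option, which makes `ssh` fail rather than prompt for a password.

```shell
#!/bin/sh
# Sketch: check that each host accepts key-based SSH logins without prompting.
# BatchMode=yes makes ssh fail instead of asking for a password.
check_passwordless_ssh() {
    failed=0
    for host in "$@"; do
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
            echo "OK:   $host"
        else
            echo "FAIL: $host"
            failed=1
        fi
    done
    return "$failed"
}

# Check any hosts given on the command line.
if [ "$#" -gt 0 ]; then
    check_passwordless_ssh "$@"
fi
```

Saved under a name of your choice and run with `localhost "$(hostname)"` plus the other cluster hosts as arguments, it prints one line per host. A host whose key is not yet in `known_hosts` will also be reported as failing in batch mode, so log in to it once interactively first.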

Try it out

If you are running a multi-node/cluster workload, follow the installation instructions below on every host in the cluster.

Install prerequisites

Ubuntu / Debian

sudo apt-get install -y time python3  # required
sudo apt-get install -y hwloc         # optional--uses the `lstopo` tool in system summaries

Install dool

The viz_workload tool originally gathered data via the dstat utility, whose development stopped due to a conflict with Red Hat.

The project has since been renamed to dool.

Installing dool

git clone git@github.com:scottchiefbaker/dool.git --branch v1.3.1 /tmp/doolinstall

sudo /tmp/doolinstall/install.py  # to install as root
/tmp/doolinstall/install.py       # install as user--make sure the `dool` script is in your PATH variable
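After installing, a quick check that the tools are visible on each host can save a failed run later. This is a minimal sketch, not part of viz_workload; the helper name `have_cmd` is hypothetical, and the location `/usr/bin/time` for the standalone `time` binary is an assumption based on the usual Debian/Ubuntu package layout.

```shell
#!/bin/sh
# Sketch: report which of the tools mentioned above are available on this host.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

missing=0
for tool in python3 dool; do
    if have_cmd "$tool"; then
        echo "found:   $tool ($(command -v "$tool"))"
    else
        echo "missing: $tool"
        missing=1
    fi
done

# `time` is also a shell keyword, so look for the standalone binary instead
# (assumed here to live at /usr/bin/time).
if [ -x /usr/bin/time ]; then
    echo "found:   /usr/bin/time"
else
    echo "missing: /usr/bin/time"
    missing=1
fi

# lstopo (from hwloc) is optional.
have_cmd lstopo || echo "optional: lstopo not found"

if [ "$missing" -eq 0 ]; then
    echo "All required tools found."
fi
```

If `dool` is reported missing after a user-level install, the directory holding the `dool` script is probably not in your PATH.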

Install viz_workload

git clone https://github.com/jschaub30/viz_workload
cd viz_workload/scripts
cp example.sh your_workload.sh
# Now edit your_workload.sh
./your_workload.sh
./webserver.sh  # To view/share this measurement

Available measurement groups

Each measurement group enables collection and display of 1 or more charts.

| Group name  | Description                                      |
|-------------|--------------------------------------------------|
| sys-summary | Enabled by default. CPU, memory, IO and network  |
| cpu-heatmap | Heatmap of CPU usage on each thread              |
| interrupts  | Heatmap of interrupts on each CPU                |
| gpu         | For systems with Nvidia GPUs and CUDA installed  |

More details are described here.

Example scripts

Optional: set up your web server

To permanently share all measurements, enable a web server.

Ubuntu

sudo apt-get install apache2
cd /var/www/html
sudo ln -sf [full path to viz_workload/scripts/rundir directory]

CentOS

sudo yum install httpd
sudo systemctl start httpd
cd /var/www/html
sudo ln -sf [full path to viz_workload/scripts/rundir directory]
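On either distribution, the link step needs the absolute path of the rundir directory. One way to resolve it and create the link is sketched below; the function name `link_rundir` and the default web root argument are hypothetical conveniences, not part of viz_workload.

```shell
#!/bin/sh
# Sketch: link a measurement directory into a web root using its absolute path.
# Arguments: <rundir> [web_root]   (web_root defaults to /var/www/html)
link_rundir() {
    src=$(cd "$1" && pwd -P) || return 1      # resolve to an absolute path
    web_root=${2:-/var/www/html}
    ln -sfn "$src" "$web_root/$(basename "$src")"
}

# When run with arguments, create the link and list the web root.
if [ "$#" -gt 0 ]; then
    link_rundir "$@" && ls -l "${2:-/var/www/html}"
fi
```

Writing to `/var/www/html` normally requires root, so the script would be run with `sudo` and the path to `viz_workload/scripts/rundir` as its first argument.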
