Pdsh
This page compares ClusterShell with the well-known pdsh, which clush aims to replace while providing more extended features.
First of all, ClusterShell was designed to be easy to use for people previously using pdsh. As a consequence, command line tools such as clush and clubak support very similar behaviour and options.
clush
The standard command line is the same:
$ pdsh -w foo[1-5] echo "Hello World"
is equivalent to
$ clush -w foo[1-5] echo "Hello World"
- Host selection options are supported (-w -x -g -X).
- ssh related options are supported (-f -t -u -l).
- File copies are supported: equivalents of pdcp and rpdcp are available through the clush --copy and --rcopy options.
Many other options are supported as well. Any simple pdsh command can be adapted by just changing the command name to clush.
clubak is a replacement tool for dshbak, which is commonly used with pdsh to regroup similar outputs.
The clubak feature is directly available in clush (for instance with the -b option), so you do not have to call another external tool.
If you need it anyway, clubak and clubak -c are supported, mirroring dshbak and dshbak -c.
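To illustrate what this regrouping does, here is a minimal Python sketch, not ClusterShell's actual implementation: it groups nodes whose output is identical and prints each group under a dshbak -c like header. The function names merge_outputs and print_merged are hypothetical, and unlike clubak the sketch lists node names separated by commas instead of folding them into nodeset notation.

```python
from collections import defaultdict


def merge_outputs(results):
    """Group nodes that produced identical output.

    results: mapping of node name -> output text.
    Returns a list of (sorted node names, output) pairs, largest group first.
    """
    groups = defaultdict(list)
    for node, output in results.items():
        groups[output].append(node)
    merged = [(sorted(nodes), output) for output, nodes in groups.items()]
    # Largest groups first; ties broken by node names (plain string order).
    merged.sort(key=lambda item: (-len(item[0]), item[0]))
    return merged


def print_merged(merged):
    """Print each group of nodes once, followed by their common output."""
    for nodes, output in merged:
        print("-" * 15)
        print(f"{','.join(nodes)} ({len(nodes)})")
        print("-" * 15)
        print(output)


if __name__ == "__main__":
    sample = {
        "foo2": "2.6.31.6-145.fc11",
        "foo3": "2.6.31.6-145.fc11",
        "foo4": "2.6.31.6-145.fc11",
        "foo5": "2.6.18-164.11.1.el5",
    }
    print_merged(merge_outputs(sample))
```

Grouping by the output text as a dictionary key keeps the merge linear in the number of nodes, which matters when hundreds of nodes return the same line.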
pdsh can be extended with plugins, either to connect to nodes or to select them. Those plugins must be dynamic libraries using the pdsh C interface. ClusterShell provides three ways to extend its features, which can be simple shell commands or Python extensions:
- NodeGroups provides an easy way to plug clush into any external node database.
- Support for other software used to connect to nodes can easily be added by implementing a new Python class.
Most pdsh plugin features can thus be made available with clush.
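As a rough illustration of what a node group source does, the Python sketch below reads hypothetical "name: nodes" lines (a flat format assumed for this example, not a specification of ClusterShell's group files) and substitutes @name references in a node pattern. The names split_top_level, load_groups and resolve are hypothetical, and the result is not folded into a compact range.

```python
def split_top_level(pattern):
    """Split a node pattern on commas that are not inside [...] brackets."""
    items, current, depth = [], [], 0
    for char in pattern:
        if char == "[":
            depth += 1
        elif char == "]":
            depth -= 1
        if char == "," and depth == 0:
            items.append("".join(current))
            current = []
        else:
            current.append(char)
    items.append("".join(current))
    return [item.strip() for item in items if item.strip()]


def load_groups(lines):
    """Parse hypothetical 'name: nodes' lines; '#' starts a comment."""
    groups = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        name, sep, nodes = line.partition(":")
        if not sep:
            raise ValueError(f"missing ':' in group line: {line!r}")
        groups[name.strip()] = nodes.strip()
    return groups


def resolve(pattern, groups):
    """Replace every '@name' item of a pattern with that group's nodes."""
    resolved = []
    for item in split_top_level(pattern):
        if item.startswith("@"):
            resolved.append(groups[item[1:]])  # KeyError for an unknown group
        else:
            resolved.append(item)
    return ",".join(resolved)


if __name__ == "__main__":
    groups = load_groups(["oss: node[2-5]", "mds: node[6-9]  # metadata"])
    print(resolve("@oss,@mds", groups))  # node[2-5],node[6-9]
```

With these hypothetical definitions, folding the resolved pattern would give node[2-9], the same shape as the nodeset -f @oss,@mds example. In a real deployment the lookup would be delegated to the external node database rather than an in-memory dictionary.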
But ClusterShell does not merely aim to reimplement pdsh in Python. There are many more features!
ClusterShell introduces the nodeset command and its backend library, which make it easy to manipulate ranges of nodes.
$ nodeset -c nova[0-7,32-159]
136
$ nodeset -f nova[0-7,32-159] nova[160-163]
nova[0-7,32-163]
$ nodeset -f @oss,@mds
node[2-9]
All details are available in the nodeset, NodeSet and NodeGroups wiki pages.
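The range arithmetic behind nodeset -c and nodeset -f can be sketched in a few lines of Python. This is a simplified illustration, not the NodeSet implementation: it handles a single bracketed range list at the end of a name (prefix[a-b,c]), ignores zero-padded indexes and node groups, and the names expand, count and fold are hypothetical.

```python
import re

# prefix followed by an optional trailing "[ranges]" part, e.g. nova[0-7,32-159]
_ITEM = re.compile(r"^([^\[\]]*?)(?:\[([0-9,\-]+)\])?$")
# shortest prefix followed by the trailing run of digits, e.g. nova + 42
_NAME = re.compile(r"^(.*?)(\d+)$")


def split_top_level(pattern):
    """Split on commas that are not inside [...] brackets."""
    items, current, depth = [], [], 0
    for char in pattern:
        if char == "[":
            depth += 1
        elif char == "]":
            depth -= 1
        if char == "," and depth == 0:
            items.append("".join(current))
            current = []
        else:
            current.append(char)
    items.append("".join(current))
    return [item for item in items if item]


def expand(pattern):
    """Expand 'nova[0-2],foo' into ['nova0', 'nova1', 'nova2', 'foo']."""
    nodes = []
    for item in split_top_level(pattern):
        match = _ITEM.match(item)
        if match is None:
            raise ValueError(f"unsupported node pattern: {item!r}")
        prefix, ranges = match.groups()
        if ranges is None:
            nodes.append(prefix)
            continue
        for part in ranges.split(","):
            start, _, end = part.partition("-")
            for index in range(int(start), int(end or start) + 1):
                nodes.append(f"{prefix}{index}")
    return nodes


def count(pattern):
    """Number of distinct nodes, like nodeset -c."""
    return len(set(expand(pattern)))


def fold(names):
    """Fold node names into the compact bracket notation, like nodeset -f."""
    by_prefix, singles = {}, set()
    for name in set(names):
        match = _NAME.match(name)
        if match:
            prefix, digits = match.groups()
            by_prefix.setdefault(prefix, set()).add(int(digits))
        else:
            singles.add(name)
    parts = sorted(singles)
    for prefix in sorted(by_prefix):
        indexes = sorted(by_prefix[prefix])
        if len(indexes) == 1:
            parts.append(f"{prefix}{indexes[0]}")
            continue
        # Merge consecutive indexes into (start, end) ranges.
        ranges, start, prev = [], indexes[0], indexes[0]
        for index in indexes[1:]:
            if index != prev + 1:
                ranges.append((start, prev))
                start = index
            prev = index
        ranges.append((start, prev))
        text = ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)
        parts.append(f"{prefix}[{text}]")
    return ",".join(parts)


if __name__ == "__main__":
    print(count("nova[0-7,32-159]"))                       # 136
    print(fold(expand("nova[0-7,32-159],nova[160-163]")))  # nova[0-7,32-163]
```

Folding works on a set of integer indexes per prefix, so duplicates and input order do not matter, which is also why the two nova ranges merge into one.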
It is common to have to cancel a pdsh execution because a node hangs. If you are also using dshbak, the output of all nodes is lost because of the pipe:
$ pdsh -w foo[1-5] ls /remote/nfs/ | dshbak -c
Now hit Ctrl-C. No output is printed, even for the nodes that have already run the command successfully.
- With clush, output is not lost even if you hit Ctrl-C:
$ clush -b -w foo[1-5] uname -r
Warning: Caught keyboard interrupt!
---------------
foo[2-4] (3)
---------------
2.6.31.6-145.fc11
---------------
foo5
---------------
2.6.18-164.11.1.el5
Keyboard interrupt (foo1 did not complete).
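Why the pipeline loses everything: on Ctrl-C the terminal sends SIGINT to every process of the foreground job, so dshbak is killed together with pdsh before it can print the outputs it has buffered. A tool that gathers output itself can catch the interrupt and still report what it has. The Python sketch below shows that pattern; it is not clush's code, collect and simulated_run are hypothetical names, and results stands for any iterator yielding (node, output) pairs as nodes complete.

```python
def collect(results, nodes):
    """Gather (node, output) pairs as they arrive, surviving a Ctrl-C.

    results: iterator yielding (node, output) pairs as nodes complete.
    nodes: every targeted node name.
    Returns (outputs, pending, interrupted) where pending lists the nodes
    that had not completed when gathering stopped.
    """
    outputs = {}
    interrupted = False
    try:
        for node, output in results:
            outputs[node] = output
    except KeyboardInterrupt:
        # Ctrl-C raises KeyboardInterrupt in the main thread: stop waiting,
        # but keep everything gathered so far instead of discarding it.
        interrupted = True
    pending = sorted(set(nodes) - set(outputs))
    return outputs, pending, interrupted


def simulated_run():
    """Stand-in for remote execution: foo1 hangs, then Ctrl-C is hit."""
    yield "foo2", "2.6.31.6-145.fc11"
    yield "foo5", "2.6.18-164.11.1.el5"
    raise KeyboardInterrupt  # what pressing Ctrl-C would raise


if __name__ == "__main__":
    outputs, pending, interrupted = collect(
        simulated_run(), ["foo1", "foo2", "foo5"])
    for node in sorted(outputs):
        print(f"{node}: {outputs[node]}")
    if interrupted:
        print("interrupted, still pending:", ", ".join(pending))
```

Because collection and reporting happen in the same process, the interrupt only ends the waiting phase; the completed outputs are still available to print or merge.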
ClusterShell improves the administrator experience with several new features, such as:
- Automatic merging of identical outputs
- Stdout and stderr handling
- Nodeset size, colors, ...
- Passing options to ssh, for example X11 forwarding:
$ clush '-o -X' -w foo[1-5] xterm
- Diff the content of the local /etc/motd with the same file on a group of nodes:
$ cat /etc/motd | clush -b -w foo[1-5] diff - /etc/motd
- Binary content is supported:
$ tar -cf - /tmp | clush -w foo[1-5] tar xfv -
ClusterShell was first designed as an event-based, distributed command execution library written in Python. All features of the command line tools are accessible through the Python API, which makes it easy to write sequential or event-based programs.
Some of the possibilities are presented in the following topics:
One might expect ClusterShell to be slow because it is a Python library.
Here is a short benchmark comparing a clush command and a pdsh command, measuring the time each needs to run a simple command on a large number of nodes.
As you can see, ClusterShell outperforms pdsh most of the time. As soon as more than 100 nodes are involved, ClusterShell is faster and scales better; the more nodes you add, the larger the difference.
There is a small overhead due to the Python interpreter, which becomes insignificant when running real commands. Moreover, Python makes developing ClusterShell much easier than it would be in raw C.