-
Notifications
You must be signed in to change notification settings - Fork 23
Installation
Installation has been simplified in LMT version 3.
- Install cerebro and lmt-server-agent on Lustre servers (OSS, MDS)
- Install cerebro and lmt-server on management node
- Restart cerebrod on Lustre servers.
- Run ltop (included in lmt-server) on management node.
- Install mysql-server package
- Run mysql_secure_installation or equiv, then msyql -p /usr/share/lmt/mkusers.sql as root, after customizing for your site.
- Set up /etc/lmt/lmt.conf
- Create databases for each file system to be monitored with lmtinit -a fsname.
- Restart cerebrod.
The lmt-server-agent package contains Cerebro plugins that periodically read Lustre /proc values, convert them to strings, and push the strings into the Cerebro monitoring network. Install this package on Lustre servers, including all MDS, OSS, and optionally LNET routers, and test that LMT is correctly parsing /proc. Pick an OSS and run this on it (output edited for readability):
# /usr/sbin/lmtmetric -m ost
ost: 2;tycho1;0.137269;9.561285;
lc1-OST0000;112367742;113304682;449470968;1929120176;6344724;4294968016;0;137;0;0;0;694;245;COMPLETE 138/138 0s remaining;
lc1-OST0008;114812669;115748383;459250676;1929120176;6344724;4294968016;0;137;0;0;0;694;262;COMPLETE 138/138 0s remaining;
lc1-OST0010;113690420;114627066;454761680;1929120176;6344724;4294968016;0;137;0;0;0;694;240;COMPLETE 138/138 0s remaining;
Run this on your MDS (output edited for readability):
# /usr/sbin/lmtmetric -m osc
osc: 1;tycho-mds1;lc1-OST0000;F;lc1-OST0001;F;lc1-OST0002;F;lc1-OST0003;F;lc1-OST0004;F;lc1-OST0005;F;lc1-OST0006;F;...
# /usr/sbin/lmtmetric -m mdt
mdt: 1;tycho-mds1;0.015762;1.269578;
lc1-MDT0000;413833009;465522098;1655332036;1688473892;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;2;0;0...
If there are any errors, open a bug in the LMT issue tracker.
Now restart Cerebro on your Lustre servers:
# /sbin/service cerebrod restart
Verify that cerebrod is still running on your Lustre servers. Since LMT plugins are shared libraries running in cerebrod's address space, a segfault for example could crash cerebrod. If this occurs, open a bug in the LMT issue tracker.
LMT data should now be present on the Cerebro monitoring network. Install the lmt-server package on your LMT server node and verify that you can see live data with ltop:
$ /usr/bin/ltop
Filesystem: lc1
Inodes: 443.956m total, 49.295m used ( 11%), 394.662m free
Space: 172.188t total, 129.573t used ( 75%), 42.615t free
Bytes/s: 0.000g read, 0.000g write, 0 IOPS
MDops/s: 0 open, 0 close, 0 getattr, 0 setattr
0 link, 0 unlink, 0 mkdir, 0 rmdir
0 statfs, 0 rename, 0 getxattr
OST S OSS Exp CR rMB/s wMB/s IOPS LOCKS LGR LCR %cpu %mem %spc
0000 F tycho1 137 0 0 0 0 0 0 0 1 10 77
0001 F tycho2 137 0 0 0 0 0 0 0 0 9 76
0002 F tycho3 137 0 0 0 0 0 0 0 0 9 76
0003 F tycho4 137 0 0 0 0 0 0 0 0 10 76
0004 F tycho5 137 0 0 0 0 0 0 0 1 9 76
0005 F tycho6 137 0 0 0 0 0 0 0 0 10 76
0006 F tycho7 137 0 0 0 0 0 0 0 0 9 76
0007 F tycho8 137 0 0 0 0 0 0 0 0 9 76
0008 F tycho1 137 0 0 0 0 0 0 0 1 10 76
0009 F tycho2 137 0 0 0 0 0 0 0 0 9 76
000a F tycho3 137 0 0 0 0 0 0 0 0 9 76
000b F tycho4 137 0 0 0 0 0 0 0 0 10 77
000c F tycho5 137 0 0 0 0 0 0 0 1 9 75
000d F tycho6 137 0 0 0 0 0 0 0 0 10 76
000e F tycho7 137 0 0 0 0 0 0 0 0 9 76
000f F tycho8 137 0 0 0 0 0 0 0 0 9 74
If you don't, you may need to debug your Cerebro configuration. A few things to check are
- Does cerebro-stat -l show _lmt_ prefixed metrics?
- Does cerebro-stat -m metricname show live data?
- Are you speaking/listening on the correct interfaces?
- Can the network you are using pass multicast traffic?
Visit the Cerebro sourceforge project site for further cerebro info.
Install and configure the MySQL/MariaDB server on your LMT server node. Users starting from scratch with MySQL will want to do something like this:
# /sbin/service mysqld start
# msyql_secure_installation
LMT uses C libraries provided by MySQL and MariaDB to connect to the running database. To verify if the required libraries are installed, you can use the following command:
# ls /usr/lib64/*{mariadb,mysql}*/plugin /usr/lib64/*{mariadb,mysql}*.so* /usr/include/mysql/mysql.h
If the libraries are installed, you should see something similar to:
/usr/lib64/mariadb/plugin/caching_sha2_password.so
/usr/lib64/libmariadb.so.3
/usr/include/mysql/mysql.h
If not, you must install them. Depending on your distro, they may be provided by packages called
*MariaDB-shared* and *MariaDB-devel*.
or
*mysql-connector-c-shared* and *mysql-connector-c-devel*
or
*mariadb-connector-c-shared* and *mariadb-connector-c-devel*
The database will need two LMT users, one for read-write access to the database, and one for read-only access. Use /usr/share/lmt/mkusers.sql as a template, running as a privileged mysql user:
# Example script for creating LMT MySQL users.
CREATE USER 'lwatchclient'@'localhost';
GRANT SHOW DATABASES ON *.* TO 'lwatchclient'@'localhost';
GRANT SELECT ON *.* TO 'lwatchclient'@'localhost';
CREATE USER 'lwatchadmin'@'localhost' IDENTIFIED BY 'mypass';
GRANT SHOW DATABASES ON *.* TO 'lwatchadmin'@'localhost';
GRANT SELECT,INSERT,DELETE ON *.* TO 'lwatchadmin'@'localhost';
GRANT CREATE,DROP ON *.* TO 'lwatchadmin'@'localhost';
FLUSH PRIVILEGES;
Next, configure /etc/lmt/lmt.conf with the usernames and passwords you just added to the database:
lmt_db_host = nil
lmt_db_port = 0
lmt_db_rouser = "lwatchclient"
lmt_db_ropasswd = nil
lmt_db_rwuser = "lwatchadmin"
lmt_db_rwpasswd = "mypass"
Alternatively you can put the password (all by itself) in /etc/lmt/rwpasswd, make that file readable only by root, and use the following line in place of the lmt_db_rwuser line in your lmt.conf:
f = io.open("/etc/lmt/rwpasswd")
if (f) then
lmt_db_rwpasswd = f:read("*l")
f:close()
else
lmt_db_rwpasswd = nil
end
This restricts write access to the database to only the root user. Next, add a database for each file system you wish to monitor. For example if your file system is named test:
# /usr/sbin/lmtinit -a test
lmtinit will use the read-write account from lmt.conf when adding or deleting databases. You can list the file systems that have databases with:
# /usr/sbin/lmtinit -l
lmtinit will use the read-only account from lmt.conf when listing databases.
Restart Cerebro on the lmt-server node so the LMT Cerebro plugin will pick up the new database info:
# /sbin/service cerebrod restart
To see if data is being added to your database, run the lmtsh utility a couple of times and watch the tables that end in _DATA. The row count should be increasing:
# /usr/sbin/lmtsh -f test
test> t
Available tables for test:
Table Name Row Count
EVENT_DATA 0
EVENT_INFO 0
FILESYSTEM_AGGREGATE_DAY 225
FILESYSTEM_AGGREGATE_HOUR 4977
FILESYSTEM_AGGREGATE_MONTH 18
FILESYSTEM_AGGREGATE_WEEK 45
FILESYSTEM_AGGREGATE_YEAR 9
FILESYSTEM_INFO 1
MDS_AGGREGATE_DAY 125
MDS_AGGREGATE_HOUR 2770
MDS_AGGREGATE_MONTH 10
MDS_AGGREGATE_WEEK 25
MDS_AGGREGATE_YEAR 5
MDS_DATA 398042
MDS_INFO 1
MDS_OPS_DATA 31270065
MDS_VARIABLE_INFO 7
OPERATION_INFO 81
OSS_DATA 1589675
OSS_INFO 4
OSS_INTERFACE_DATA 0
OSS_INTERFACE_INFO 0
OSS_VARIABLE_INFO 7
OST_AGGREGATE_DAY 10800
OST_AGGREGATE_HOUR 238896
OST_AGGREGATE_MONTH 864
OST_AGGREGATE_WEEK 2160
OST_AGGREGATE_YEAR 432
OST_DATA 19074487
OST_INFO 48
OST_OPS_DATA 0
OST_VARIABLE_INFO 11
ROUTER_AGGREGATE_DAY 3
ROUTER_AGGREGATE_HOUR 3
ROUTER_AGGREGATE_MONTH 3
ROUTER_AGGREGATE_WEEK 3
ROUTER_AGGREGATE_YEAR 3
ROUTER_DATA 428097
ROUTER_INFO 1
ROUTER_VARIABLE_INFO 3
TIMESTAMP_INFO 470209
VERSION 0
test>