Home
LogWatcher is a Python daemon that gathers metrics from the access logs of web applications (Apache and Tomcat have been tested) and sends them downstream to either a Graphite server or a Ganglia gmond listener in near real time. Metrics are named with the prefix "LW_" for easy identification. An instance of LogWatcher is required for each access log that is to be watched. The log file can be either statically named or pre-rotated with a timestamp in the filename. Metrics are typically collected and averaged per minute, but this is configurable.
- Python 2.6+ (with time, os, re, sys, atexit, ConfigParser, getopt, string)
- Ganglia or Graphite
LogWatcher can be deployed as a Python package from ____, or you can build and deploy your own Python package from the GitHub source. A sample init (start/stop) script and ini (configuration) file are also provided, as these are not part of the available Python package.
There is also a sample Spec file that can be used to build and deploy LogWatcher as an RPM.
LogWatcher can currently send metrics either directly to a Graphite server or to Ganglia.
Ganglia is the default destination for metrics. LogWatcher will expect and use the /etc/gmond.conf file on your system.
To send to Graphite, use one of the following runtime options. Using either of them disables the default behavior of sending metrics to Ganglia.
-g --graphite-server <s> Use graphite, with server <s>
-G --use-graphite Use graphite, find server in /etc/graphite.conf
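For orientation, Graphite's plaintext protocol accepts one `path value timestamp` line per metric, by default on TCP port 2003. The sketch below shows what sending a single metric that way could look like. The function names, the host `graphite.example.com`, and the metric name `LW_example_Queries` are assumptions made for illustration; they are not taken from the LogWatcher source.

```python
import socket
import time


def format_graphite_line(name, value, timestamp=None):
    """Build one line of Graphite's plaintext protocol: 'path value timestamp'."""
    if timestamp is None:
        timestamp = int(time.time())
    return "%s %s %d\n" % (name, value, timestamp)


def send_to_graphite(host, lines, port=2003, timeout=5.0):
    """Send pre-formatted lines to a Graphite (carbon) plaintext listener over TCP."""
    sock = socket.create_connection((host, port), timeout)
    try:
        sock.sendall("".join(lines).encode("ascii"))
    finally:
        sock.close()


if __name__ == "__main__":
    # Hypothetical metric name and server; replace with your own.
    line = format_graphite_line("LW_example_Queries", 42, 1443308356)
    print(line.strip())
    # send_to_graphite("graphite.example.com", [line])
```

The network call is left commented out so the snippet can be run without a Graphite server available.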
LogWatcher can be used to generate metrics from anything that can be found in the access logs using regular expressions. Some basic log format suggestions for Tomcat and Apache are as follows.
<Valve className="org.apache.catalina.valves.AccessLogValve"
directory="/var/log/access"
prefix="access"
resolveHosts="false"
checkExists="true"
rotatable="false"
pattern="%a %v %u %t &quot;%r&quot; %s %b &quot;%{Referer}i&quot; &quot;%{User-agent}i&quot; &quot;%{REQUEST_DETAILS}r&quot; %D"
/>
LogFormat "%h %{Host}i %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %D" custom_fmt
Additional custom metrics can be added and found via regex in the config. The following is a recommended format for these metrics:
[key=value]
Example Tomcat Log Line with Additional Custom Metrics:
1.23.45.678 logwather.com - [26/Sep/2015:15:59:16 -0700] "GET /profile HTTP/1.1" 200 2989 "http://referrer.com/restaurants" "Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko" " [wsTime=7] [isCust=0] [ver=2] [showAds=true] [daoTimelisting=2.678338] [daoTimecgmdblisting=2.678338] [oTime=0] [pTime=1] [daoTimecontent=2.511219] [daoTimecgmdbcontent=2.511219] [clientIp=12.3.4.5.6]" 8
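As a rough illustration of why the `[key=value]` format is convenient, the sketch below pulls every bracketed pair out of a log line with a single regular expression. The regex, the helper name `extract_custom_metrics`, and the shortened sample line are assumptions for this example, not LogWatcher's own code or configuration.

```python
import re

# Matches one bracketed pair such as [wsTime=7];
# group 1 is the key, group 2 is the value.
KEY_VALUE_RE = re.compile(r"\[(\w+)=([^\]]*)\]")


def extract_custom_metrics(line):
    """Return a dict of every [key=value] pair found in a log line."""
    return dict(KEY_VALUE_RE.findall(line))


# Shortened version of the example log line above.
sample = ('- [26/Sep/2015:15:59:16 -0700] "GET /profile HTTP/1.1" 200 2989 '
          '" [wsTime=7] [isCust=0] [showAds=true] [daoTimecontent=2.511219]" 8')

metrics = extract_custom_metrics(sample)
print(metrics["wsTime"])   # 7
print(metrics["showAds"])  # true
```

The timestamp field is also bracketed, but it contains no `key=` prefix, so this pattern skips it. A regex in the config that targets one key could follow the same shape, for example `\[wsTime=(\d+)\]`, with the value captured as `$1`.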
Three primary types of metrics are supported, plus a type derived from the first two and special-use priming metrics. Most use a regular expression that finds a value.
The value saved as $1 in the regex is counted, and a separate metric is created for each distinct $1 value found (as well as a _NotSet metric that counts lines not matching your regex). Note that metric names are dynamically generated from the values found. The metrics are persisted for the run time of the LogWatcher instance, typically months.
The values saved as $1 since the last notify event (based on the notify_schedule) are added together and saved as a single metric.
These are derived from either counts or sums. The ratio is the value of the original metric divided by the Queries metric (unfiltered requests per minute). Ratios are used for alerting, since they don't vary much with traffic changes during the day.
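The count, sum, and ratio types described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the `prefix_value` naming scheme, and the sample lines are invented for the example, and LogWatcher's actual metric names and bookkeeping may differ.

```python
import re
from collections import defaultdict


def count_metric(lines, regex, prefix):
    """Count lines per captured value ($1); unmatched lines go to <prefix>_NotSet."""
    pattern = re.compile(regex)
    counts = defaultdict(int)
    for line in lines:
        match = pattern.search(line)
        key = match.group(1) if match else "NotSet"
        counts["%s_%s" % (prefix, key)] += 1
    return dict(counts)


def sum_metric(lines, regex):
    """Add up the captured value ($1) from every matching line."""
    pattern = re.compile(regex)
    total = 0.0
    for line in lines:
        match = pattern.search(line)
        if match:
            total += float(match.group(1))
    return total


def ratio_metric(value, queries):
    """Divide a count or sum by the number of queries in the same interval."""
    return value / float(queries) if queries else 0.0


# Hypothetical log fragments collected during one notify interval.
lines = ["[ver=2] [wsTime=7]", "[ver=2] [wsTime=3]", "[ver=3] [wsTime=5]", "no custom fields"]

counts = count_metric(lines, r"\[ver=(\d+)\]", "ver")
ws_total = sum_metric(lines, r"\[wsTime=(\d+)\]")
print(sorted(counts.items()))             # [('ver_2', 2), ('ver_3', 1), ('ver_NotSet', 1)]
print(ws_total)                           # 15.0
print(ratio_metric(ws_total, len(lines)))  # 3.75
```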
These ratios are derived from counts and/or sums using user-defined expressions. They can be used to configure ratio-style metrics on specific segments of traffic instead of all requests.
Each of these is a collection of counts showing the distribution of values over N buckets of size M. They are used primarily to provide data for processing-time histograms (typically 11 buckets of 100 ms each, with the last bucket counting any value over 1000 ms).
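A distribution of this kind can be sketched in a few lines. The function name and defaults below are assumptions for illustration. How LogWatcher treats a value that falls exactly on a bucket boundary is not stated here, so the sketch simply places a value of exactly 1000 ms in the last bucket.

```python
def bucket_counts(values_ms, bucket_size_ms=100, num_buckets=11):
    """Count values into fixed-size buckets; the last bucket catches everything larger."""
    counts = [0] * num_buckets
    for value in values_ms:
        index = int(value // bucket_size_ms)
        # Clamp to the valid range so oversized values land in the last bucket
        # and negative values (if any) land in the first.
        index = max(0, min(index, num_buckets - 1))
        counts[index] += 1
    return counts


# Hypothetical processing times in milliseconds for one interval.
times_ms = [5, 99, 100, 250, 999, 1000, 4500]
print(bucket_counts(times_ms))  # [2, 1, 1, 0, 0, 0, 0, 0, 0, 1, 2]
```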
Reported Units: seconds Units: seconds Description: The sum of the processing time value from every log line, in ms, since the last notify event (based on the notify_schedule). Requires the following parameters to be set: processing_time_regex and processing_time_units (see the Config Options table below for suggested settings).
Reported Units: seconds Units: seconds Description: LW_<distinguisher>_Total_Processing_Time / LW_<distinguisher>_Queries
Reported Units: seconds Units: seconds Description: The maximum value matching processing_time_regex since the last notify event (based on the notify_schedule).
Reported Units: percent Units: percent Description: The percent of not-ignored log lines (see ignore_pattern) processed since the last notify event (based on the notify_schedule) whose processing time exceeds sla_ms.
Reported Units: percent Units: decimal Description: The count of not-ignored log lines (see ignore_pattern) processed since the last notify event (based on the notify_schedule) whose processing time exceeds sla_ms.
Reported Units: count Units: decimal Description: The total number of not-ignored log lines (see ignore_pattern) processed since the last notify event (based on the notify_schedule).
Reported Units: qps Units: decimal Description: The (average?) QPS, not including ignored log lines (see ignore_pattern) since the last notify event (based on the notify_schedule).
Reported Units: string Units: string Description: The version of LogWatcher.
Reported Units: count Units: decimal Description: A count of the number of log lines ignored (matching ignore_pattern) since the last notify event (based on the notify_schedule).
Reported Units: qps Units: decimal Description: The (average?) QPS for log lines matching brand_regex, not including ignored log lines (see ignore_pattern) since the last notify event (based on the notify_schedule).
Reported Units: qps Units: decimal Description: The (average?) QPS for log lines that do not match brand_regex, not including ignored log lines (see ignore_pattern) since the last notify event (based on the notify_schedule).
Reported Units: seconds Units: decimal Description:
Reported Units: float Units: decimal Description: Possibly a count of new metrics that were not sent on the last cycle; the exact meaning is unconfirmed. It appears to exclude some, or all, of the built-in metrics.
Reported Units: float Units: decimal Description: A count of the number of metrics LogWatcher is sending, not counting this metric.
Reported Units: seconds Units: decimal Description:
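To make the relationships between the processing-time and traffic metrics described above concrete (total, average, maximum, count and percent over sla_ms, and QPS), here is a minimal sketch that computes them for one notify interval. The function name, the dictionary keys, and the 60-second default interval are assumptions for illustration; they are not LogWatcher's real metric names or internals, and ignored lines are assumed to have been filtered out already.

```python
def summarize_interval(times_ms, sla_ms, interval_seconds=60):
    """Summarize processing times (in ms) for the not-ignored lines of one interval."""
    queries = len(times_ms)
    total_seconds = sum(times_ms) / 1000.0
    over_sla = sum(1 for t in times_ms if t > sla_ms)
    return {
        "queries": queries,
        "total_processing_time": total_seconds,
        # Average is total processing time divided by the number of queries.
        "avg_processing_time": total_seconds / queries if queries else 0.0,
        "max_processing_time": max(times_ms) / 1000.0 if times_ms else 0.0,
        "count_over_sla": over_sla,
        "percent_over_sla": 100.0 * over_sla / queries if queries else 0.0,
        "qps": queries / float(interval_seconds),
    }


# Hypothetical processing times in milliseconds, with a 1000 ms SLA.
summary = summarize_interval([100, 200, 300, 1400], sla_ms=1000)
print(summary["avg_processing_time"])  # 0.5
print(summary["percent_over_sla"])     # 25.0
```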