devinkramer edited this page Sep 29, 2015 · 15 revisions

Description

LogWatcher is a Python daemon that gathers metrics from the access logs of web applications (Apache and Tomcat have been tested) and sends them downstream to either a Graphite server or a Ganglia gmond listener in near real time. Metric names carry the prefix "LW_" for easy identification. Each access log to be watched requires its own LogWatcher instance. The log file can be either statically named or pre-rotated with a timestamp in the filename. Metrics are typically collected and averaged per minute, but this is configurable.

Requirements

  • Python 2.6+ (with the time, os, re, sys, atexit, ConfigParser, getopt, and string modules)
  • Ganglia or Graphite

Deployment

LogWatcher can be deployed as a Python package from ____, or you can build and deploy your own package from the GitHub source. A sample init (start/stop) script and a sample ini (configuration) file are also provided, as these are not part of the available Python package.

There is also a sample Spec file that can be used to build and deploy LogWatcher as an RPM.

Configuration

Go here for details about the LogWatcher configuration file.

Metric Destinations

LogWatcher can currently send metrics either directly to a Graphite server or to Ganglia.

Ganglia

Ganglia is the default destination for metrics. LogWatcher expects and uses the /etc/gmond.conf file on your system.

Graphite

To send metrics to Graphite, use one of the following runtime options. Either one disables the default behavior of sending metrics to Ganglia.

  -g --graphite-server <s> Use graphite, with server <s>
  -G --use-graphite        Use graphite, find server in /etc/graphite.conf
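For orientation, Graphite's carbon listener accepts a plaintext line protocol of the form "<metric path> <value> <timestamp>", conventionally on TCP port 2003. The sketch below only illustrates that wire format; the function names, the sample metric name, and the default port are assumptions for illustration and are not taken from LogWatcher's source.

```python
import socket
import time


def format_graphite_line(name, value, timestamp=None):
    """Build one newline-terminated line of Graphite's plaintext protocol."""
    if timestamp is None:
        timestamp = int(time.time())
    return "{0} {1} {2}\n".format(name, value, int(timestamp))


def send_to_graphite(server, lines, port=2003):
    """Send pre-formatted lines to a carbon plaintext listener.

    The port default (2003) is the conventional carbon port; adjust as needed.
    """
    sock = socket.create_connection((server, port), timeout=5)
    try:
        sock.sendall("".join(lines).encode("ascii"))
    finally:
        sock.close()


# Hypothetical metric name using the LW_ prefix described above.
line = format_graphite_line("LW_myapp_Queries", 1234, 1443308400)
```

Calling send_to_graphite("graphite.example.com", [line]) would deliver the line to a (hypothetical) Graphite host.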

Log Formatting

LogWatcher can be used to generate metrics from anything that can be found in the access logs using regular expressions. Some basic log format suggestions for Tomcat and Apache are as follows.

Basic Tomcat Log Format

<Valve className="org.apache.catalina.valves.AccessLogValve"
    directory="/var/log/access"
    prefix="access"
    resolveHosts="false"
    checkExists="true"
    rotatable="false"
    pattern="%a %v %u %t &quot;%r&quot; %s %b &quot;%{Referer}i&quot; &quot;%{User-agent}i&quot; &quot;%{REQUEST_DETAILS}r&quot; %D"
/>

Basic Apache Log Format

LogFormat "%h %{Host}i %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %D" custom_fmt

Additional custom metrics can be added and found via regex in the config. The following is a recommended format for these metrics:

[key=value]

Example Tomcat Log Line with Additional Custom Metrics:

1.23.45.678 logwather.com  - [26/Sep/2015:15:59:16 -0700] "GET /profile HTTP/1.1" 200 2989 "http://referrer.com/restaurants" "Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko" " [wsTime=7] [isCust=0] [ver=2] [showAds=true] [daoTimelisting=2.678338] [daoTimecgmdblisting=2.678338]  [oTime=0] [pTime=1]  [daoTimecontent=2.511219] [daoTimecgmdbcontent=2.511219] [clientIp=12.3.4.5.6]" 8
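As a rough illustration of why the [key=value] form is convenient, the following sketch pulls every custom metric out of a shortened version of the example line with a single regular expression. The pattern and variable names are illustrative assumptions; the actual expressions are whatever you define in your LogWatcher configuration.

```python
import re

# Matches the recommended "[key=value]" form; key and value are captured.
KEY_VALUE_RE = re.compile(r"\[(\w+)=([^\]]*)\]")

# Shortened version of the example line (referrer and user agent abbreviated).
sample = ('1.23.45.678 logwather.com  - [26/Sep/2015:15:59:16 -0700] '
          '"GET /profile HTTP/1.1" 200 2989 "-" "Mozilla/5.0" '
          '" [wsTime=7] [isCust=0] [ver=2] [showAds=true] [oTime=0] [pTime=1]" 8')

# The bracketed timestamp is skipped because it contains no "key=" part.
custom = dict(KEY_VALUE_RE.findall(sample))

# The trailing %D field is the last run of digits on the line.
processing_time = int(re.search(r"(\d+)$", sample).group(1))
```

Here custom maps each key to its string value (for example, custom["wsTime"] is "7"), and processing_time is the integer at the end of the line.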

Supported Metric Types

Three primary types of metrics are supported, plus another type derived from the first two and special-use priming metrics. Most use a regular expression that captures a value.

counts (metrics_count)

The value captured as $1 by the regex is counted, and a separate metric is created for each distinct $1 value found (as well as a _NotSet metric that counts lines not matching your regex). Note that metric names are generated dynamically from the values found. The metrics persist for the run time of the LogWatcher instance, typically months.
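The counting behavior described above can be sketched as follows. This is a minimal illustration, not LogWatcher's implementation; the function name, the metric prefix, and the exact way the captured value is appended to the metric name are assumptions.

```python
import re
from collections import defaultdict


def count_values(lines, regex, prefix):
    """Count each distinct first capture group; non-matching lines go to _NotSet."""
    pattern = re.compile(regex)
    counts = defaultdict(int)
    for line in lines:
        match = pattern.search(line)
        key = match.group(1) if match else "NotSet"
        # Assumed naming scheme: <prefix>_<captured value>
        counts["{0}_{1}".format(prefix, key)] += 1
    return dict(counts)


lines = ["GET /a [ver=2]", "GET /b [ver=2]", "GET /c [ver=3]", "GET /d"]
counts = count_values(lines, r"\[ver=(\d+)\]", "LW_myapp_ver")
```

Because one metric is created per distinct captured value, a regex that captures a high-cardinality field will generate a correspondingly large number of metric names.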

sums (metrics_sum)

The values saved as $1 since the last notify event (based on the notify_schedule) are added together and saved as a single metric.
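A minimal sketch of the summing behavior, assuming the captured value is numeric and that lines without a match contribute nothing; the function name and sample regex are hypothetical.

```python
import re


def sum_values(lines, regex):
    """Add up the first capture group across all matching lines."""
    pattern = re.compile(regex)
    total = 0.0
    for line in lines:
        match = pattern.search(line)
        if match:
            total += float(match.group(1))
    return total


# Lines seen since the last notify event (illustrative data).
lines = ["[wsTime=7]", "[wsTime=3]", "no value here", "[wsTime=2.5]"]
total = sum_values(lines, r"\[wsTime=([\d.]+)\]")
```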

ratios (metrics_ratio)

These are derived from either counts or sums. The ratio is the value of the original metric divided by the Queries metric (unfiltered requests per minute). Ratios are useful for alerting, since they don't vary much with traffic changes during the day.
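The arithmetic is simple, but a short sketch shows why a ratio is steadier than a raw count for alerting. The function name and the choice to return 0.0 when there are no queries are assumptions, not documented LogWatcher behavior.

```python
def ratio(metric_value, queries):
    """Divide a count or sum by the Queries metric; 0.0 if there was no traffic."""
    if queries == 0:
        return 0.0
    return float(metric_value) / queries


# The raw count of some event grows tenfold with traffic, but the ratio is unchanged.
quiet_hour = ratio(25, 500)
busy_hour = ratio(250, 5000)
```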

calculated (metrics_calc)

These ratios are derived from counts and/or sums using user-defined expressions. They can be used to configure ratio-style metrics on specific segments of traffic instead of all requests.

distribution (metrics_dist)

Each of these is a collection of counts showing the distribution of values over N buckets of size M. They are used primarily to provide data for processing-time histograms (typically 11 buckets of 100 ms each, with the last bucket counting any value over 1000 ms).
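The bucketing can be sketched as below, using the typical 11 buckets of 100 ms. The function name is hypothetical, and the handling of a value of exactly 1000 ms (placed in the last bucket here) is an assumption about the boundary.

```python
def bucket_counts(values_ms, bucket_size=100, bucket_count=11):
    """Count values into fixed-width buckets; the last bucket is open-ended."""
    counts = [0] * bucket_count
    for value in values_ms:
        # Anything past the final boundary is clamped into the last bucket.
        index = min(int(value // bucket_size), bucket_count - 1)
        counts[index] += 1
    return counts


counts = bucket_counts([8, 99, 100, 250, 999, 1000, 4321])
```

With these inputs, the first bucket (0 to 99 ms) holds two values and the last bucket holds the two values at or above 1000 ms.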

Automatic/Default Metrics

| Metric Name | Reported Units | Units | Description |
| --- | --- | --- | --- |
| LW_<distinguisher>_Total_Processing_Time | seconds | seconds | The sum of the processing time values from every log line, in ms, since the last notify event (based on the notify_schedule). Requires the processing_time_regex and processing_time_units parameters to be set (see the Config Options table below for suggested settings). |
| LW_<distinguisher>_Avg_Processing_Time | seconds | seconds | LW_<distinguisher>_Total_Processing_Time / LW_<distinguisher>_Queries |
| LW_<distinguisher>_Max_Processing_Time | seconds | seconds | The maximum value matching processing_time_regex since the last notify event (based on the notify_schedule option). |
| LW_<distinguisher>_exceeding_SLA | percent | percent | The percent of not-ignored log lines (see ignore_pattern option) processed since the last notify event (based on the notify_schedule option) whose processing time exceeds the sla_ms value. |
| LW_<distinguisher>_exceeding_SLA_ct | percent | decimal | The count of not-ignored log lines (see ignore_pattern) processed since the last notify event (based on the notify_schedule) whose processing time exceeds sla_ms. |
| LW_<distinguisher>_Queries | count | decimal | The total number of not-ignored log lines (see ignore_pattern option) processed since the last notify event (based on the notify_schedule). |
| LW_<distinguisher>_QPS | qps | decimal | The (average?) QPS, not including ignored log lines (see ignore_pattern), since the last notify event (based on the notify_schedule). |
| LW_LW_Version | string | string | The version of LogWatcher. |
| LW_<distinguisher>_ignored | count | decimal | A count of the log lines ignored (matching ignore_pattern) since the last notify event (based on the notify_schedule). |
| LW_<distinguisher>QPS | qps | decimal | The (average?) QPS for log lines matching brand_regex, not including ignored log lines (see ignore_pattern), since the last notify event (based on the notify_schedule). |
| LW_<distinguisher>_QPS_NULL_brand | qps | decimal | The (average?) QPS for log lines that do not match brand_regex, not including ignored log lines (see ignore_pattern), since the last notify event (based on the notify_schedule). |
| LW_LW_LogTime | seconds | decimal | |
| LW_LW_NewMetrics | float | decimal | Appears to be a count of new metrics that were not sent on the previous cycle; it seems to exclude some, or all, of the built-in metrics. |
| LW_LW_TotalMetrics | float | decimal | A count of the number of metrics LogWatcher is sending, not counting this metric. |
| LW_LW_NotifyTime | seconds | decimal | |
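To make the relationships between several of these default metrics concrete, here is a minimal sketch that derives them from a list of per-request processing times collected during one notify interval. It is an illustration under assumptions, not LogWatcher's code: the function name, the 60-second interval, the 500 ms SLA, keeping all times in milliseconds, and treating "exceeds" as strictly greater than sla_ms are all hypothetical choices.

```python
def summarize(times_ms, interval_seconds=60, sla_ms=500):
    """Derive per-interval summary values from not-ignored request times (ms)."""
    queries = len(times_ms)
    total = sum(times_ms)
    over_sla = sum(1 for t in times_ms if t > sla_ms)
    return {
        "Queries": queries,
        "QPS": queries / float(interval_seconds),
        "Total_Processing_Time": total,
        # Avg = Total_Processing_Time / Queries, guarded against empty intervals.
        "Avg_Processing_Time": total / float(queries) if queries else 0.0,
        "Max_Processing_Time": max(times_ms) if times_ms else 0,
        "exceeding_SLA_ct": over_sla,
        "exceeding_SLA": 100.0 * over_sla / queries if queries else 0.0,
    }


stats = summarize([100, 200, 300, 600, 800, 1000], interval_seconds=60, sla_ms=500)
```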

Plugins

LogWatcher supports very simple LinePlugins. Plugins can modify the log lines, compute complex metrics, or even send some or all of the lines to a separate log file or another system (such as Kafka). Note that lines excluded by the exclude filter are not sent to plugins. Plugin details are available here.
