phrend edited this page Sep 28, 2015 · 15 revisions

Description

LogWatcher is a Python daemon that gathers metrics from the access logs of web applications (Apache and Tomcat have been tested) and sends them downstream to either a Graphite server or a Ganglia gmond listener in near real time. Metric names carry the prefix "LW_" for easy identification. Each access log to be watched requires its own LogWatcher instance. The log file can be either statically named or pre-rotated with a timestamp in the filename. Metrics are typically collected and averaged per minute, but this is configurable.

Requirements

  • Python 2.6+ (with the modules time, os, re, sys, atexit, ConfigParser, getopt, string)
  • Ganglia or Graphite

Deployment

LogWatcher can be deployed as a Python package from ____, or you can build and deploy your own Python package from the GitHub source. A sample init (start/stop) script and a sample ini (configuration) file are also provided, as these are not part of the available Python package.

There is also a sample Spec file that can be used to build and deploy LogWatcher as an RPM.

Metric Destinations

LogWatcher can currently send metrics either directly to a Graphite server or to Ganglia.

Ganglia

Ganglia is the default destination for metrics. LogWatcher expects and uses the /etc/gmond.conf file on your system.

Graphite

To send metrics to Graphite, use one of the following runtime options. Using either one disables the default behavior of sending metrics to Ganglia.

  -g --graphite-server <s> Use graphite, with server <s>
  -G --use-graphite        Use graphite, find server in /etc/graphite.conf
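
For orientation, the sketch below shows what a metric looks like on the wire in Graphite's plaintext protocol (one `<path> <value> <timestamp>` line per metric, sent to a Carbon listener, whose default plaintext port is 2003). It is not taken from the LogWatcher source; the function names and the `myapp` distinguisher are hypothetical.

```python
import socket
import time

def format_graphite_line(name, value, timestamp=None):
    """Build one line of Graphite's plaintext protocol: '<path> <value> <timestamp>'."""
    if timestamp is None:
        timestamp = int(time.time())
    return "%s %s %d\n" % (name, value, int(timestamp))

def send_to_graphite(lines, host, port=2003):
    """Send pre-formatted lines to a Carbon plaintext listener (hypothetical helper)."""
    sock = socket.create_connection((host, port), timeout=5)
    try:
        sock.sendall("".join(lines).encode("ascii"))
    finally:
        sock.close()

# "LW_myapp_Queries" is an illustrative metric name, not real output.
line = format_graphite_line("LW_myapp_Queries", 1234, 1443308356)
print(repr(line))
```

Only `format_graphite_line` is exercised here; `send_to_graphite` needs a reachable Carbon host.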

Log Formatting

LogWatcher can be used to generate metrics from anything that can be found in the access logs using regular expressions. Some basic log format suggestions for Tomcat and Apache are as follows.

Basic Tomcat Log Format

<Valve className="org.apache.catalina.valves.AccessLogValve"
    directory="/var/log/access"
    prefix="access"
    resolveHosts="false"
    checkExists="true"
    rotatable="false"
    pattern="%a %v %u %t &quot;%r&quot; %s %b &quot;%{Referer}i&quot; &quot;%{User-agent}i&quot; &quot;%{REQUEST_DETAILS}r&quot; %D"
/>

Basic Apache Log Format

LogFormat "%h %{Host}i %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %D" custom_fmt

Additional custom metrics can be added and found via regex in the config. The following is a recommended format for these metrics:

[key=value]

Example Tomcat Log Line with Additional Custom Metrics:

1.23.45.678 logwather.com  - [26/Sep/2015:15:59:16 -0700] "GET /profile HTTP/1.1" 200 2989 "http://referrer.com/restaurants" "Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko" " [wsTime=7] [isCust=0] [ver=2] [showAds=true] [daoTimelisting=2.678338] [daoTimecgmdblisting=2.678338]  [oTime=0] [pTime=1]  [daoTimecontent=2.511219] [daoTimecgmdbcontent=2.511219] [clientIp=12.3.4.5.6]" 8
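
As a rough illustration of why the `[key=value]` format is easy to match, the following sketch pulls such pairs out of a log line with Python's `re` module. The regexes and the shortened sample line are assumptions made for this example, not patterns shipped with LogWatcher.

```python
import re

# One "[key=value]" pair: the key is word characters, the value runs to the closing bracket.
PAIR_RE = re.compile(r"\[(\w+)=([^\]]*)\]")

def extract_pairs(log_line):
    """Return every [key=value] pair found in a log line as a dict of strings."""
    return dict(PAIR_RE.findall(log_line))

# Shortened version of the example line above (hypothetical host and client values).
sample = ('1.2.3.4 example.com - [26/Sep/2015:15:59:16 -0700] '
          '"GET /profile HTTP/1.1" 200 2989 '
          '" [wsTime=7] [isCust=0] [showAds=true]" 8')

pairs = extract_pairs(sample)
print(pairs)

# A single-metric regex with one capture group, the "$1" a metric definition would use.
ws_time = re.search(r"\[wsTime=(\d+)\]", sample)
print(ws_time.group(1))
```

The bracketed timestamp is not picked up as a pair because it contains no `key=` part.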

Supported Metric Types

Three primary types of metrics are supported, plus another derived from the first two, and special-use priming metrics. Most use a regular expression that captures a value.

counts (metrics_count)

The value captured as $1 by the regex is counted, and a separate metric is created for each distinct $1 value found (as well as a _NotSet metric that counts lines not matching your regex). Note that metric names are dynamically generated from the values found. The metrics persist for the run time of the LogWatcher instance, typically months.
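
The counting behavior described above can be sketched as follows. This is a minimal illustration and not LogWatcher's own code: the function name, the sample lines and the `ver` regex are hypothetical, and how the captured value is combined into the final metric name is not shown.

```python
import re

def count_values(lines, regex, not_set="_NotSet"):
    """Count occurrences of each captured value; lines without a match go to not_set."""
    pattern = re.compile(regex)
    counts = {}
    for line in lines:
        match = pattern.search(line)
        key = match.group(1) if match else not_set
        counts[key] = counts.get(key, 0) + 1
    return counts

lines = [
    "... [ver=2] ...",
    "... [ver=2] ...",
    "... [ver=3] ...",
    "... no version on this line ...",
]
counts = count_values(lines, r"\[ver=(\w+)\]")
print(counts)
```

Because a new key appears for every distinct captured value, a regex that captures high-cardinality data (for example a client IP) would create a very large number of metrics.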

sums (metrics_sum)

The values saved as $1 since the last notify event (based on the notify_schedule) are added together and saved as a single metric.
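
A minimal sketch of a sum metric over one notify interval, assuming the captured values are numeric and that lines without a match contribute nothing (both are assumptions; the function name and regex are hypothetical):

```python
import re

def sum_values(lines, regex):
    """Add up the numeric value captured by group 1 on every matching line."""
    pattern = re.compile(regex)
    total = 0.0
    for line in lines:
        match = pattern.search(line)
        if match:
            total += float(match.group(1))
    return total

interval_lines = ["... [wsTime=7] ...", "... [wsTime=3] ...", "... nothing here ..."]
total_ws_time = sum_values(interval_lines, r"\[wsTime=(\d+(?:\.\d+)?)\]")
print(total_ws_time)
```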

ratios (metrics_ratio)

These are derived from either counts or sums. The ratio is the value of the original metric divided by the Queries metric (unfiltered requests per minute). Ratios are used for alerting, since they don't vary much with traffic changes during the day.
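
The arithmetic can be illustrated with a small sketch. The numbers are made up, the function name is hypothetical, and returning 0.0 when there are no queries is an assumption of this example rather than documented behavior.

```python
def ratio(metric_value, queries):
    """Divide a count or sum by the Queries metric for the same interval."""
    if queries == 0:
        return 0.0  # assumption: report zero rather than fail on an idle interval
    return float(metric_value) / queries

# The same error rate at two very different traffic levels.
quiet_hour = ratio(5, 100)      # 5 errors out of 100 requests
busy_hour = ratio(50, 1000)     # 50 errors out of 1000 requests
print(quiet_hour, busy_hour)
```

The raw counts differ by a factor of ten, while the ratio is the same in both intervals, which is what makes a fixed alert threshold workable.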

calculated (metrics_calc)

These ratios are derived from counts and/or sums using user-defined expressions. They can be used to configure ratio-style metrics on specific segments of traffic instead of all requests.

distribution (metrics_dist)

Each of these is a collection of counts showing the distribution of values over N buckets of size M. They are used primarily to provide data for processing time histograms (typically 11 buckets of 100ms each, with the last bucket counting any value over 1000ms).
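
The bucketing for the typical case (11 buckets of 100ms) might look like the sketch below. The function names are hypothetical, and placing a value of exactly 1000ms in the last bucket is an assumption; the text above only says the last bucket holds values over 1000ms.

```python
def bucket_index(value_ms, bucket_size_ms=100, bucket_count=11):
    """Map a value to its bucket; anything past the last boundary lands in the last bucket."""
    index = int(value_ms // bucket_size_ms)
    return min(index, bucket_count - 1)

def distribution(values_ms, bucket_size_ms=100, bucket_count=11):
    """Return a list of counts, one per bucket."""
    buckets = [0] * bucket_count
    for value in values_ms:
        buckets[bucket_index(value, bucket_size_ms, bucket_count)] += 1
    return buckets

times_ms = [5, 99, 100, 250, 999, 1000, 4200]
histogram = distribution(times_ms)
print(histogram)
```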

Automatic/Default Metrics

LW_<distinguisher>_Total_Processing_Time

  • Reported Units: seconds
  • Units: seconds
  • Description: The sum of the processing time value from every log line, in ms, since the last notify event (based on the notify_schedule). Requires that the following parameters be set: processing_time_regex and processing_time_units (see the Config Options table below for suggested settings).

LW_<distinguisher>_Avg_Processing_Time

  • Reported Units: seconds
  • Units: seconds
  • Description: LW_<distinguisher>_Total_Processing_Time / LW_<distinguisher>_Queries

LW_<distinguisher>_Max_Processing_Time

  • Reported Units: seconds
  • Units: seconds
  • Description: The maximum value matching processing_time_regex since the last notify event (based on the notify_schedule).

LW_<distinguisher>_exceeding_SLA

  • Reported Units: percent
  • Units: percent
  • Description: The percent of not-ignored log lines (see ignore_pattern) processed since the last notify event (based on the notify_schedule) whose processing time exceeds sla_ms.

LW_<distinguisher>_exceeding_SLA_ct

  • Reported Units: percent
  • Units: decimal
  • Description: The count of not-ignored log lines (see ignore_pattern) processed since the last notify event (based on the notify_schedule) whose processing time exceeds sla_ms.
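
The relationship between the two SLA metrics can be sketched as follows. This is an illustration under stated assumptions: processing times are already in milliseconds, ignored lines have already been filtered out, "exceeds" means strictly greater than sla_ms, and an empty interval reports 0 percent. The function name is hypothetical.

```python
def sla_stats(times_ms, sla_ms):
    """Return (count over SLA, percent over SLA) for one notify interval."""
    total = len(times_ms)
    over = sum(1 for t in times_ms if t > sla_ms)
    percent = 100.0 * over / total if total else 0.0
    return over, percent

# Four not-ignored requests in the interval, with an SLA of 250ms.
over_count, over_percent = sla_stats([120, 300, 80, 950], 250)
print(over_count, over_percent)
```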

LW_<distinguisher>_Queries

  • Reported Units: count
  • Units: decimal
  • Description: The total number of not-ignored log lines (see ignore_pattern) processed since the last notify event (based on the notify_schedule).

LW_<distinguisher>_QPS

  • Reported Units: qps
  • Units: decimal
  • Description: The (average?) QPS, not including ignored log lines (see ignore_pattern), since the last notify event (based on the notify_schedule).

LW_LW_Version

  • Reported Units: string
  • Units: string
  • Description: The version of LogWatcher.

LW_<distinguisher>_ignored

  • Reported Units: count
  • Units: decimal
  • Description: A count of the number of log lines ignored (matching ignore_pattern) since the last notify event (based on the notify_schedule).

LW_<distinguisher>QPS

  • Reported Units: qps
  • Units: decimal
  • Description: The (average?) QPS for log lines matching brand_regex, not including ignored log lines (see ignore_pattern), since the last notify event (based on the notify_schedule).

LW_<distinguisher>_QPS_NULL_brand

  • Reported Units: qps
  • Units: decimal
  • Description: The (average?) QPS for log lines that do not match brand_regex, not including ignored log lines (see ignore_pattern), since the last notify event (based on the notify_schedule).

LW_LW_LogTime

  • Reported Units: seconds
  • Units: decimal
  • Description:

LW_LW_NewMetrics

  • Reported Units: float
  • Units: decimal
  • Description: Appears to be a count of new metrics that were not sent on the previous cycle. It seems to exclude some, or all, of the built-in metrics.

LW_LW_TotalMetrics

  • Reported Units: float
  • Units: decimal
  • Description: A count of the number of metrics LogWatcher is sending, not counting this metric.

LW_LW_NotifyTime

  • Reported Units: seconds
  • Units: decimal
  • Description:
