
Configuration guide

Źmicier Žaleźničenka edited this page May 8, 2014 · 8 revisions

Elfstatsd is configured through its settings.py file, a plain Python file that allows easy and flexible customization of all the settings available in elfstatsd.

### Basic configuration

The three most important options to configure first are ELF_FORMAT, DATA_FILES and VALID_REQUESTS.

ELF_FORMAT is the format string that your HTTP server uses to format access log records. Replace the default value with your web server's format string and make sure to surround it with r''.

Note that elfstatsd expects the %D parameter in the log format string. This parameter tells Apache to record requests' execution time and is used to calculate most of the elfstatsd metrics. Elfstatsd will fail to launch if this parameter is missing from the log format string.
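As an illustration, a common Apache-style format with %D appended might look as follows. The exact string is specific to your server; everything here besides the required %D is an assumption and should be replaced with your server's actual LogFormat:

```python
# settings.py (sketch): an Apache-style log format string.
# The %D directive (request duration in microseconds) must be present;
# the rest of this string is an example and should match your server's
# actual LogFormat.
ELF_FORMAT = r'%h %l %u %t "%r" %>s %b %D'
```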

DATA_FILES is a list of 3-tuples describing the location of input and output files used for processing. You can add as many comma-separated entries there as you want.

The first element of this tuple is a full path to the access log file. It may contain date and time placeholders in Python datetime format, if needed.

The second element is used if the access logs are rotated in-place. In this case, specify the location of the previous log file in this element. For example, suppose you have an in-place rotating log file /var/log/httpd/apache.log. After this file reaches its maximum length, it is copied to /var/log/httpd/apache.log.1 and emptied. In this situation, you have to specify /var/log/httpd/apache.log.1 as the second element of the tuple. This is needed to correctly read the last records in the file, which were added after the previous invocation of the daemon but before the rotation.

The third element is the location of the report file that will be generated for the respective access log.
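Putting the three elements together, a sketch of a DATA_FILES entry for the in-place rotation scenario above might look like this (the report file path is a hypothetical example):

```python
# settings.py (sketch): one 3-tuple per monitored access log.
DATA_FILES = [
    # (access log path, possibly with strftime placeholders;
    #  previous file used for in-place rotation, '' if not applicable;
    #  report file generated for this log -- path is an assumption)
    ('/var/log/httpd/apache.log',
     '/var/log/httpd/apache.log.1',
     '/var/tmp/elfstatsd/apache.data'),
]
```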

VALID_REQUESTS is the tricky part. This is a list of regular expressions that you have to write to track your requests. The expressions should be written in Python format and wrapped in re.compile(r''). Each expression should contain a <method> named group that will be used to extract the request identifier. It is also recommended to specify a <group> named group to combine the expressions into groups: many Munin plugins operate at the group level, and proper assignment of requests to groups leads to better-looking data and helps prevent name collisions. All requests without a <group> specifier are put into the nogroup group.
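A minimal sketch of VALID_REQUESTS, assuming a hypothetical URL layout (the paths and group/method names below are invented for illustration):

```python
import re

# settings.py (sketch): regexes with <group> and <method> named groups.
# The URL layout here is an assumption, not part of elfstatsd itself.
VALID_REQUESTS = [
    # /api/users/list -> group 'api', method 'list'
    re.compile(r'^/(?P<group>api)/users/(?P<method>\w+)$'),
    # no <group> given: matches land in the 'nogroup' group
    re.compile(r'^/static/(?P<method>[\w.]+)$'),
]
```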

### Advanced configuration

While changing the settings described in the Basic configuration section should be enough to get elfstatsd up and running, it has a number of additional awesome features that require some optional configuration. Pretty tough, I know.

All settings in settings.py are well documented there, and most of them require no additional explanation. Here we only cover the options that require specifying additional regular expressions.

#### Additional file processing parameters

Since v.1.18.7 elfstatsd supports specifying additional parameters when describing log files to process in the DATA_FILES option of settings.py. The parameters have to be specified after the ? symbol in the file name as key=value pairs separated with &. Currently, two parameters are supported:

  • ts specifies a positive or negative time shift in seconds. If you need to work with files whose records are from the past or future relative to the system time, this helps process them properly. For instance, suppose you have Apache logs stored in the format apache.log-%Y-%m-%d-%H. The log rotates every hour and is copied to a different machine that runs elfstatsd. It is impossible to monitor the logs in real time, as they are copied only after the full log file is available. In this case you can apply a one-hour back shift to process not the current logs, but the logs generated an hour ago. An example configuration for this setup may look like DATA_FILES = [('apache.log-%Y-%m-%d-%H?ts=-3600','','elfstatsd.data')]. By default, ts=0.

  • ts-name-only is needed in the rare case when you have an in-place rotating log, e.g. varnish.log, and keep the previous log files with date specifiers in their names, e.g. varnish.log-%Y-%m-%d-%H. To correctly read the remaining records in the previous log file after the rotation, elfstatsd should generate its name with a time shift, one hour back in this case. If ts-name-only is set to true, the time shift is only used to generate the correct file name; it won't affect the records contained in the file. If it is set to false, the records in the log file will also be processed with the shifted datetime. Example DATA_FILES configuration for this case: DATA_FILES = [('varnish.log', 'varnish.log-%Y-%m-%d-%H?ts=-3600&ts-name-only=true', 'elfstatsd.data')]. By default, ts-name-only is set to false.
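To make the effect of ts concrete, here is a minimal sketch (not elfstatsd's actual code) of how a time shift is applied before expanding the strftime placeholders in a file name:

```python
from datetime import datetime, timedelta

def shifted_log_name(template, ts_seconds, now=None):
    # Apply the configured time shift, then expand the strftime
    # placeholders in the log file name template.
    now = now or datetime.now()
    return (now + timedelta(seconds=ts_seconds)).strftime(template)

# With ts=-3600, the name of the log written an hour ago is resolved.
name = shifted_log_name('apache.log-%Y-%m-%d-%H', -3600,
                        datetime(2014, 5, 8, 12, 0))
# name == 'apache.log-2014-05-08-11'
```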

#### Skipping unneeded requests

There are always some requests that you don't need to track and see in the graphs, such as health checks. However, if you don't specify them in the VALID_REQUESTS setting, they will be reported to elfstatsd's internal log. To prevent elfstatsd from reporting them there, add these requests to the REQUESTS_TO_SKIP option.
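Assuming REQUESTS_TO_SKIP takes compiled regexes in the same way as VALID_REQUESTS (check the comments in settings.py for the authoritative shape), a sketch for the health-check case could look like this. The paths are hypothetical examples:

```python
import re

# settings.py (sketch): requests matching these patterns are ignored
# and not reported to the internal log. Paths are assumptions.
REQUESTS_TO_SKIP = [
    re.compile(r'^/health/?$'),
    re.compile(r'^/ping$'),
]
```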

#### Additional requests aggregation

You may want to rename or regroup some of the requests parsed by elfstatsd to organize the report in a more logical way. This is where the REQUESTS_AGGREGATION option comes in. It should contain a list of comma-separated 3-tuples defining additional aggregation rules. The first two elements of each tuple are <group> and <method>, respectively. The third is a regex. After a request passes its initial validation against the regexes specified in VALID_REQUESTS, it is validated against the rules in REQUESTS_AGGREGATION; if a match is found, its original group and method identifiers are rewritten with the ones specified in the matching tuple. This means you can omit <group> and <method> in VALID_REQUESTS if you plan to rename or rearrange them using REQUESTS_AGGREGATION.
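A sketch of an aggregation rule following the (group, method, regex) tuple layout described above; the group/method names and the URL pattern are invented for illustration:

```python
import re

# settings.py (sketch): (new group, new method, regex) 3-tuples.
# Requests matching the regex are reported as group 'search',
# method 'query'. Names and the URL pattern are assumptions.
REQUESTS_AGGREGATION = [
    ('search', 'query', re.compile(r'^/search/.*$')),
]
```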

#### Extraction of custom named patterns

Another flexible but tricky option is PATTERNS_TO_EXTRACT. It is used to extract arbitrary patterns from the requests and count the number of their occurrences. This option can be extremely helpful if you want to work not with whole requests but with specific parts of them. For example, say you want to count the number of unique logged-in users accessing the site. You know that each user accesses a /user/<uid> page and want to count the uids based on this information. To achieve this, you can configure a named pattern called uid with a regex matching the /user/<uid> pattern. Each matched uid is stored during log file processing, and their total number, as well as the number of unique matches, is reported. An example configuration is provided in settings.py.
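The exact structure of PATTERNS_TO_EXTRACT is documented in settings.py itself; assuming it accepts regexes with a named group for the value to extract, a sketch for the uid case might be:

```python
import re

# settings.py (sketch): the <uid> named group captures the value whose
# occurrences are counted. The list-of-regexes shape is an assumption;
# see the example in settings.py for the authoritative format.
PATTERNS_TO_EXTRACT = [
    re.compile(r'^/user/(?P<uid>\d+)'),
]
```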
