Here are some small ad-hoc analysis tools for web server access logs:
clf.py
: Log file parser and statistics

clfgrep.py
: Find log lines by arbitrary criteria
clf.py operator field [operator field]...
Operators:
count: Count lines, grouped by field
set: Extract unique field values
avg: Compute the average of a numeric field
max: Find the maximum field value
min: Find the minimum field value
sum: Add numeric field values
Available fields: host, identity, user, date, request, status, bytes, referer, user_agent, method, uri, protocol, utcoffset
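These field names map onto the components of a Combined Log Format line, with method, uri, and protocol split out of the request line. As a rough illustration of how such a line could be parsed, here is a minimal sketch; the regex and the parse_line helper are assumptions for illustration, not clf.py's actual internals:

```python
import re

# Illustrative pattern for one Combined Log Format line (an assumption,
# since clf.py's real parser is not shown here).
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<identity>\S+) (?P<user>\S+) '
    r'\[(?P<date>\S+) (?P<utcoffset>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+)'
    r'(?: "(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)")?'
)

def parse_line(line):
    """Return a dict mapping field names to values, or None for bad lines."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # method, uri and protocol are derived from the request line,
    # e.g. 'GET /index.html HTTP/1.1'.
    parts = (fields['request'] or '').split(' ', 2)
    fields['method'], fields['uri'], fields['protocol'] = (parts + [''] * 3)[:3]
    return fields
```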
clfgrep.py field operator value
Purpose: Scan web server log files.
Operators: = for exact matches, ~ for regular expressions.
Prefixes: ! to negate, * for a case-insensitive match; when combined, ! comes first (e.g. !*~).
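A sketch of how that operator grammar might be interpreted; the matches helper is hypothetical and only follows the usage text above, not clfgrep.py's actual code:

```python
import re

def matches(value, op, pattern):
    """Apply an operator such as '=', '~', '!*=' or '!*~' to one field value."""
    # Per the usage text, ! comes before * when both prefixes are present.
    negate = op.startswith('!')
    if negate:
        op = op[1:]
    ignore_case = op.startswith('*')
    if ignore_case:
        op = op[1:]
    if op == '=':
        if ignore_case:
            value, pattern = value.lower(), pattern.lower()
        result = value == pattern
    elif op == '~':
        flags = re.IGNORECASE if ignore_case else 0
        result = re.search(pattern, value, flags) is not None
    else:
        raise ValueError('unknown operator: %r' % op)
    return result != negate
```

Under this reading, method\*=post amounts to matches(record['method'], '*=', 'post').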
clf.py or clfgrep.py
: Usage instructions
clf.py count method < access.log
: Count different HTTP request methods

clf.py count protocol < access.log
: Count all HTTP protocol versions

clf.py set user_agent < access.log
: List all user agent strings

clf.py avg bytes < access.log
: Compute the average response size
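All six operators are simple reductions over a single field. A rough sketch of what they could look like, assuming each line has already been parsed into a dict of fields (as in the parse_line sketch above); the aggregate function is an illustration, not clf.py's code:

```python
from collections import Counter

def aggregate(op, field, records):
    """Reduce one field across parsed log records; '-' marks missing values."""
    values = [r[field] for r in records if r.get(field) not in (None, '-')]
    if op == 'count':
        return Counter(values).most_common()   # (value, occurrences) pairs
    if op == 'set':
        return sorted(set(values))
    numbers = [float(v) for v in values]       # avg/max/min/sum need numbers
    if op == 'avg':
        return sum(numbers) / len(numbers) if numbers else 0.0
    if op == 'max':
        return max(numbers)
    if op == 'min':
        return min(numbers)
    if op == 'sum':
        return sum(numbers)
    raise ValueError('unknown operator: %r' % op)
```

For example, aggregate('count', 'method', records) yields (method, occurrences) pairs, most frequent first.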
clfgrep.py method\*=post < access.log
: Find all POST requests

clfgrep.py user_agent\*~bot < access.log
: Find all requests whose user agent contains bot (case insensitive)

clfgrep.py status=404 < access.log
: Find dead links
clfgrep.py status=404 < access.log | clf.py count uri
: Count requests per broken link

clfgrep.py protocol=HTTP/1.0 < access.log | clf.py count user_agent
: Rank user agents still using HTTP/1.0

clfgrep.py user_agent\*~bot < access.log | clf.py count user_agent
: Rank search engine hits
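These pipelines work because clfgrep.py writes each matching log line to stdout verbatim, so its output is itself a valid log file for the next tool in the chain. A hypothetical sketch of that filter loop, reusing the parse_line and matches sketches from above:

```python
import sys

def filter_main(field, op, pattern):
    """Copy matching raw lines from stdin to stdout, keeping them parseable."""
    for line in sys.stdin:
        record = parse_line(line)            # from the earlier sketch
        if record and matches(record[field], op, pattern):
            sys.stdout.write(line)           # emit the original line unchanged
```

Emitting the raw line rather than the parsed record is what lets clfgrep.py invocations be chained with each other and with clf.py.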