Tutorials
- Write to OpenTSDB
- Collect operating system metrics
- Set the cluster tag
- Access the API
- Poll remote hosts
- Shed load
OpenTSDB is treated as a feed subscriber with special requirements:
- It scales best when receiving feeds directly from the forwarders (rather than indirectly via the aggregator).
- It doesn't need to receive data points that don't contribute to line plots.

Therefore, register it by adding the following entry to /etc/tsp-controller/network:

```xml
<subscriber id="tsd" host="tsd.example.com" direct="true" dedup="true"/>
```
In case your OpenTSDB servers aren't reachable via a single domain name (e.g. a VIP), provide the domain names of the OpenTSDB servers as a comma-separated list:

```xml
<subscriber
    id="tsd"
    host="tsd01.example.com,tsd02.example.com"
    direct="true"
    dedup="true"
/>
```
Apply by restarting the controller:

```
service tsp-controller restart
```
Once the config propagates (about 1 minute), OpenTSDB will start receiving feeds on TCP/4242 from all forwarders and the poller.
tsp-controller(8) has more details about registering subscribers.
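For reference, the feed arrives as plain-text OpenTSDB `put` commands, one data point per line. The metric names, timestamps, and values below are purely illustrative:

```
put sys.cpu.user 1510000000 3.5 host=web01.us.example.com
put proc.loadavg.1min 1510000000 0.42 host=web02.us.example.com
```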
TSP does not ship a plugin that reads operating system metrics. Instead, it makes it easy to reuse plugins developed by the tcollector project. For example, collecting CPU metrics is a matter of running:
```
cd /etc/tsp/collect.d
wget https://raw.githubusercontent.com/OpenTSDB/tcollector/bd7c16e47e7617e93a092da50b2f9be671d4ef47/collectors/0/procstats.py
chmod +x procstats.py
```
Use your favorite deployment system to install these scripts.
Once tsp-forwarder detects a new script (about 1 second), it is executed automatically. You can observe the newly added metrics by running the following command on the aggregator host:

```
tcpdump -Alnp -i any "dst net localnet and dst port 4242" | grep "put proc\."
```
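Collector plugins are ordinary executables that print data points to stdout. Here is a minimal sketch of a custom plugin, assuming the same output contract as the tcollector collectors; the metric name example.heartbeat is made up:

```shell
#!/bin/sh
# Minimal collector plugin sketch for /etc/tsp/collect.d.
# tsp-forwarder runs every executable in that directory and reads lines
# of the form "metric timestamp value [tag=value ...]" from its stdout
# (the tcollector output contract). "example.heartbeat" is hypothetical.
echo "example.heartbeat $(date +%s) 1"
```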
By default, data points obtained from TSP lack explicit server group information. The `host` tag identifies individual hosts, which in some cases can be sufficient to infer group membership. For example, host=web01.us.example.com can reasonably be expected to be a member of the web.us server group. However, this convention can be difficult to enforce. For this reason, TSP provides a method for supplying this group information explicitly by storing it in the `cluster` tag of every data point.

Provide the desired cluster information by creating the file /etc/tsp-controller/config:
```xml
<config>
    <hostgroup id="web">
        <cluster id="web.us">
            <host id="web01.us.example.com"/>
            <host id="web02.us.example.com"/>
            <host id="web03.us.example.com"/>
        </cluster>
    </hostgroup>
</config>
```
Apply by restarting the controller:

```
service tsp-controller restart
```
Once the config propagates (about 1 minute), the real-time feed will start including the new tag for all metrics originating at these three web hosts.
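A downstream consumer can then group series by the new tag. The sketch below shows one way to pull the tag out of a feed line; the metric name and values are made up:

```shell
# A feed line as it might look once the cluster tag is added
# (hypothetical metric and values).
line='put proc.loadavg.1min 1510000000 0.42 host=web01.us.example.com cluster=web.us'

# Extract the cluster tag the way a downstream job might.
cluster=$(echo "$line" | tr ' ' '\n' | grep '^cluster=' | cut -d= -f2)
echo "$cluster"   # → web.us
```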
tsp-controller(8) has more details about the format of the config file.
In order to access the real-time API, you must write a job that accepts a single TCP connection on TCP/4242 and reads the `put` commands from it. In addition, every 5 seconds the job has to respond to a heartbeat request (the `version` command). Once the job is up and listening on TCP/4242, register it by adding the following entry to /etc/tsp-controller/network:
```xml
<subscriber id="myjob" host="myjob.example.com"/>
```
Apply by restarting the controller:

```
service tsp-controller restart
```
Once the config propagates (about 1 minute), your job will receive the stream over a connection established by the aggregator.
Developing stream-processing jobs is easy. For example, here is a proof-of-concept threshold checker implemented in bash:
```bash
# heartbeat responds to the aggregator's heartbeat requests
# (the version command) every 5 seconds.
heartbeat() {
    while true
    do
        echo "Built on <unknown> (myjob)" || exit
        sleep 5
    done
}

# readfeed reads the feed looking for problems with blocked i/o.
readfeed() {
    while read _ metric t n tags
    do
        if [ "$metric" = "proc.stat.procs_blocked" ] && [ "$n" -gt 100 ]
        then
            series="$metric $tags"
            echo "$t,$series,too many blocked processes ($n>100)"
        fi
    done
}

heartbeat | nc -k -l 4242 | grep "^put " | readfeed
```
Its output is a three-column CSV, where the first column is the time in Unix epoch format, the second is the series used to detect the problem, and the third is the diagnostic message.
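The threshold logic can be exercised offline by piping a simulated feed line through the same readfeed function; the timestamp and value below are made up:

```shell
# Same check as the readfeed function above, fed one simulated data point.
readfeed() {
    while read _ metric t n tags
    do
        if [ "$metric" = "proc.stat.procs_blocked" ] && [ "$n" -gt 100 ]
        then
            echo "$t,$metric $tags,too many blocked processes ($n>100)"
        fi
    done
}

echo "put proc.stat.procs_blocked 1510000000 150 host=web01.us.example.com" | readfeed
# → 1510000000,proc.stat.procs_blocked host=web01.us.example.com,too many blocked processes (150>100)
```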
tsp-aggregator(8) has more details on the data contract of the feed.
The tsp-poller(8) service exists to allow polling of remote hosts. It accepts plugins just like tsp-forwarder(8), except these plugins are installed under /etc/tsp-poller/collect.d.
TODO: include collect-f5 and collect-netscaler examples.
Betfair experienced a complete failure of OpenTSDB caused by a partial failure of the underlying HBase layer. The right emergency response in such a scenario is to reduce the data rate of the real-time feed so that it stops exceeding the available capacity.
TSP gives the operator a mechanism for precise blocking of data points inserted into the feed. For example, in order to block all metrics matching foo.*, create the file /etc/tsp-controller/filter with the following content:
```bash
#!/bin/bash
program=$1
# noop outputs an empty rule set for programs other than tsp-forwarder.
noop() {
    echo "[]"
    exit 0
}
[ "$program" = "tsp-forwarder" ] || noop
# Note the doubled backslash: JSON requires \\ to encode the regex \.
cat <<'EOF'
[
    {
        "Match": ["^foo\\."],
        "Block": true
    }
]
EOF
```
Make it executable:

```
chmod +x /etc/tsp-controller/filter
```
Once the propagation delay elapses (about 1 minute), the forwarders will start dropping all matching data points.
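Before deploying a filter, the pattern can be sanity-checked against sample metric names. For a simple anchored pattern like ^foo\., grep -E matches the same way the filter's regular expressions presumably do; the metric names below are made up:

```shell
# Check which hypothetical metric names the ^foo\. pattern would block.
for m in foo.bar foo.baz.qux foobar.x sys.cpu.user
do
    if echo "$m" | grep -qE '^foo\.'
    then
        echo "$m: blocked"
    else
        echo "$m: passed"
    fi
done
# → foo.bar and foo.baz.qux are blocked; foobar.x and sys.cpu.user pass.
```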
tsp-controller(8) has more details about the filter mechanism.