Plugin Redis: Intial implementation #2

LukeHandle · 2018-06-07T21:35:51Z

If redis-cli is in the PATH, it will:

Run --stat showing key metric changes over 5 seconds
Run --latency for 3 seconds, 3 times
Run --intrinsic-latency, testing the local systems latency
Run info for general statistics

If the cli tool is not present locally, netcat is used to hit the specified host/port with an info.

Both methods support the options:

PLUGIN_OPTS_REDIS_HOST defaults to 127.0.0.1
PLUGIN_OPTS_REDIS_PORT defaults to 6379

I can re PR this against the upstream/master - but it looks like a much larger change until your change rackerlabs#158 is merged.

If redis-cli is in the PATH, it will: * Run the "--stat" showing key metric changes over 5 seconds * Run "--latency" for 3 seconds, 3 times * Run "--intrinsic-latency", testing the local systems latency * Run "info" for general statistics If the cli tool is not present locally, netcat is used to hit the specified host/port. Both methods support the options: `PLUGIN_OPTS_REDIS_HOST` defaults to 127.0.0.1 `PLUGIN_OPTS_REDIS_PORT` defaults to 6379

tonyskapunk

Overall this looks really good.

Few suggested changes, some additional ideas to handle things a little bit different.

I think doing the PR against the branch that is used for the PR on the upstream repo is the right thing to do.

Two ideas that triggered this.

When running this plugin, I immediately noticed an increase in execution time(see below).

Sounds like it would be nice to run plugins/core processes in parallel.

Maybe plugins could live in a different repo.

At the moment is too soon but wanted to write down them.

Thanks for the help!!

--

EDIT: Adding the timing examples:

Without plugins:

2018-06-08 10:11:15-05:00 [INFO] No plugins found.
2018-06-08 10:11:15-05:00 [INFO] Loading plugins from: /usr/lib/recap/plugins-enabled
2018-06-08 10:11:15-05:00 [INFO] Loading plugin: /usr/lib/recap/plugins-enabled/*
2018-06-08 10:11:15-05:00 [WARNING] No plugins found to execute in: /usr/lib/recap/plugins-enabled
2018-06-08 10:11:15-05:00 [INFO] recap[7252]: Caught signal - deleting /var/lock/recap.lock
2018-06-08 10:11:15-05:00 [INFO] Execution time: 11s
2018-06-08 10:11:15-05:00 [INFO] --- Ending recap[7252] ---

Enabling redis plugin:

2018-06-08 10:13:52-05:00 [INFO] 1 plugins found: redis
2018-06-08 10:13:52-05:00 [INFO] Loading plugins from: /usr/lib/recap/plugins-enabled
2018-06-08 10:13:52-05:00 [INFO] Loading plugin: /usr/lib/recap/plugins-enabled/redis
2018-06-08 10:13:52-05:00 [INFO] Starting 'redis' report - redis_20180608-101341.log
2018-06-08 10:14:05-05:00 [INFO] Ended 'redis' report
2018-06-08 10:14:05-05:00 [INFO] recap[7658]: Caught signal - deleting /var/lock/recap.lock
2018-06-08 10:14:05-05:00 [INFO] Execution time: 24s
2018-06-08 10:14:05-05:00 [INFO] --- Ending recap[7658] ---

tonyskapunk · 2018-06-08T14:07:32Z

src/plugins/redis

+  if [[ -n "${redis_cli}" ]]; then
+    print_blankline "${LOGFILE}"
+
+    echo "Redis stat (+delta / 0.5s)" >> "${LOGFILE}"


Might be worth defining another option variable for the stat? PLUGIN_OPTS_REDIS_STAT ? or is a value that will hardly change?

For the stat interval? As we are running it for 5 seconds, the value would only ever make sense below 1 second. It's possible that on a really busy server, you might see benefit lowering further, but you'd likely be better placed increasing the run time (or running proper Redis monitoring!). I don't think it will need changing that often.

Speaking of run time - I considered adding options for configuring the "timeout" values, but that felt like overload of variables.

tonyskapunk · 2018-06-08T14:26:09Z

src/plugins/redis

+  log INFO "Starting '${plugin_name}' report - ${LOGFILE##*/}"
+  print_redis_timings "${LOGFILE}"
+  print_blankline "${LOGFILE}"
+  print_redis_info "${LOGFILE}"


There is no need to pass ${LOGFILE} in here for the print_redis_*, since you are not receiving the value on those functions and also because you're calling those functions in the same scope where ${LOGFILE} is defined, that allows you to use it in the same way you use ${redis_cli}

Yeah, I initially had them redefined locally in each of the other functions, but then realised they were of course still in scope (but forgot to remove this remnant).

tonyskapunk · 2018-06-08T15:32:16Z

src/plugins/redis

+print_redis() {
+  local LOGFILE="$1"
+  local plugin_name=${FUNCNAME[0]/print_/}
+  local redis_cli=$( type -p redis-cli )


I see you test if redis-cli is available in two out of the three print_redis_* functions, maybe is worth to test that in here and bail out if not installed ?

Maybe you only call print_redis_info since you have another way to pull info through nc ?

Sure. It crossed my mind, but thought it'd get messy when then running print_redis_info and checking there anyway - I'll add a new function as suggested further down.

tonyskapunk · 2018-06-08T16:49:40Z

src/plugins/redis

+}
+
+print_redis_latency() {
+  local latency=$( timeout 3 "${redis_cli}" -h "${host}" -p "${port}" --latency 2>&1 )


Wow, that ANSI code you are removing(*2K) took me a while to understand it, it was until I ran a strace that I could see it, just for documenting purposes:

$ strace -o redis_cli.log -f -s255 timeout 0.1 "${redis_cli}" -h "${host}" -p "${port}" --latency 2>&1 min: 0, max: 1, avg: 0.62 (8 samples) $ grep min redis_cli.log |tail -1 9087 write(1, "\33[0G\33[2Kmin: 0, max: 1, avg: 0.62 (8 samples)", 45) = 45

The above means ( from this nice document https://www.real-world-systems.com/docs/ANSIcode.html)

\33 - Escape sequence

[0g - Clear horizontal tab stop at current position

[2K - erase entire current line

A quick example on how that affects the logs:

$ printf "$( date --rfc-3339=s ) \33[0G\33[2K[foo]\n" [foo] $ printf "$( date --rfc-3339=s ) [foo]\n" 2018-06-08 11:35:49-05:00 [foo]

I found another difference, I'm using Arch for testing and redis version 4.0.9-1 from community repo.

$ echo $( timeout 3 "${redis_cli}" -h "${host}" -p "${port}" --latency 2>&1 ) 0 17 0.53 94

While the version provided in CentOS 7, 3.2.10-2.el7 from EPEL. Print outs differently:

$ echo $(timeout 3 "${redis_cli}" -h "${host}" -p "${port}" --latency 2>&1 ) min: 0, max: 1, avg: 0.07 (294 samples)

I not sure what's going on with the redis client version 4, I guess is "OK" just printing the tree values knowing that they mean "min, max and avg".

I encountered a similar problem in another project where I was only splitting a string on \n, missing the \r (CR) character which would then reset the cursor position causing an overwrite...

I noticed the problem here purely by outputting to a file and viewing with less. cat interprets the escape codes by default unless ran with -e.

Did you want me to update the comment/add another line saying along lines of:

test this with "redis-cli --latency | cat -e" to see raw output and escape codes

For reference:

$ redis-cli --latency | cat -e ^[[0G^[[2Kmin: 11, max: 11, avg: 11.00 (1 samples)^[[0G^[[2Kmin: 0, max: 11, avg: 5.50 (2 samples)

Interestingly, it appears it also only runs for a second on 4.0:

$ time redis-cli --latency | cat -e 0 13 0.19 89$ real 0m1.025s user 0m0.001s sys 0m0.010s

With the help page showing the answer:

--latency Enter a special mode continuously sampling latency. If you use this mode in an interactive session it runs forever displaying real-time stats. Otherwise if --raw or --csv is specified, or if you redirect the output to a non TTY, it samples the latency for 1 second (you can use -i to change the interval), then produces a single output and exits.

I've tested setting the -i to 3, but it takes marginally longer and terminates early with no output (it doesn't output an update for each measurement like the 3.2 client). Setting this lower to 2.5 would still cause issues on an extremely loaded system:

$ time timeout 3 redis-cli --latency -i 2.5 | cat -e Terminated real 0m3.397s user 0m0.002s sys 0m0.013s

I'm inclined to still set it to 2.5 as it's fairly safe, but do you think I should instead add some version checking and rely purely on the builtin timer for 4.0?

tonyskapunk · 2018-06-08T16:52:10Z

src/plugins/redis

+  if [[ -n "${redis_cli}" ]]; then
+    "${redis_cli}" -h "${host}" -p "${port}" info all &>> "${LOGFILE}"
+  else
+    ( printf "INFO ALL\r\n"; sleep 1 ) | nc "${host}" "${port}" &>> "${LOGFILE}"


Couple of comments in here.

It would be a good idea to test if nc is available to use it. As plugins are optional and enabled by users package maintainers won't need to add dependencies to them.

I added a similar comment on the print_redis function, where it might be worth checking for redis_cli and maybe nc as well(?) and just call those print_redis_* functions that will produce results with the found commands. Maybe this function can be split into print_redis_info and print_redis_info_nc or something similar. Just throwing some ideas.

--

Netcat(nc) is hanging when not using nmap ncat, with Arch I attempted using gnu-netcat and openbsd-netcat and after receiving the all the info from redis it remains connected. CentOS works fine as I have tested with nmap-ncat. Debian has to use ncat as this is the nmap "netcat" tool, otherwise the other netcat tools(netcat-traditional, netcat-openbsd) hang as well.

In Debian the nmap ncat tool is not offered as an alternative of nc then one needs to be explicit about its use. In CentOS there is no issue as nc is symlinked to ncat. Then I suggest the use of ncat that should satisfy those distros to use the nmap netcat tool.

( printf "INFO ALL\r\n"; sleep 1 ) | ncat -C "${host}" "${port}"

I only tested with centos 7, debian 9 (and Arch, but that does not count)

Sure sure. Another function seems neatish, maybe I'll name it print_redis_info_fallback instead.

Ahh, so the issues with hanging are due to the line end differences between implementations?

I'll also add a -w 5 to try and handle timeouts (almost all the redis-cli calls are wrapped in timeout which would handle this).

If we still see further issues here, we could instead switch to a raw sockets implementation à la http://milhouseonsoftware.com/2015/07/27/writing-a-redis-client-in-pure-bash-part-1/

Would bypass issues with ncat being present or differing behaviours, but plenty of other problems would present themselves I also suspect.

$ exec {redis_socket}<>/dev/tcp/${host}/${port} $ printf "INFO ALL\r\nQUIT\r\n" >&${redis_socket} $ cat <&${redis_socket} [...] # Keyspace db0:keys=284,expires=5,avg_ttl=3127545 +OK

tonyskapunk · 2018-06-08T16:53:45Z

src/plugins/redis

@@ -0,0 +1,69 @@
+#!/bin/bash
+#
+#   Copyright (C) 2017 Rackspace, Inc.


I think your name should be here, I'll ask around as that would be my impression.

I can update this in the Contributors section, but as discussed, this is fine.

LukeHandle

Thanks for taking a look!

I think I'll wrap the options which take time under a flag which is default off to speed up execution.

I'm considering a further flag to run a CLIENT LIST (default off), but I'm hesitant on the useful-ness of this.

LukeHandle · 2018-06-09T14:36:10Z

src/plugins/redis

+  if [[ -n "${redis_cli}" ]]; then
+    print_blankline "${LOGFILE}"
+
+    echo "Redis stat (+delta / 0.5s)" >> "${LOGFILE}"


For the stat interval? As we are running it for 5 seconds, the value would only ever make sense below 1 second. It's possible that on a really busy server, you might see benefit lowering further, but you'd likely be better placed increasing the run time (or running proper Redis monitoring!). I don't think it will need changing that often.

Speaking of run time - I considered adding options for configuring the "timeout" values, but that felt like overload of variables.

LukeHandle · 2018-06-09T15:37:37Z

src/plugins/redis

+
+    echo "Redis stat (+delta / 0.5s)" >> "${LOGFILE}"
+    # "--stat" expects a TTY (does an ioctl with TCGETS), stdbuf resolves this neatly
+    timeout 5 stdbuf -oL "${redis_cli}" -h "${host}" -p "${port}" --stat -i 0.5 &>> "${LOGFILE}"


Funnily enough, I believe my initial understanding was false here. When setting the timeout larger without the stdbuf, it does output to the terminal, but (penny-drop), buffered. My solution was correct for the actual problem, but my assumption on what it must have been doing after seeing what happened in the strace with the ioctl was wrong.

I'll update the comment here to reflect this.

LukeHandle · 2018-06-09T15:50:04Z

src/plugins/redis

+}
+
+print_redis_latency() {
+  local latency=$( timeout 3 "${redis_cli}" -h "${host}" -p "${port}" --latency 2>&1 )


I encountered a similar problem in another project where I was only splitting a string on \n, missing the \r (CR) character which would then reset the cursor position causing an overwrite...

I noticed the problem here purely by outputting to a file and viewing with less. cat interprets the escape codes by default unless ran with -e.

Did you want me to update the comment/add another line saying along lines of:

test this with "redis-cli --latency | cat -e" to see raw output and escape codes

For reference:

$ redis-cli --latency | cat -e ^[[0G^[[2Kmin: 11, max: 11, avg: 11.00 (1 samples)^[[0G^[[2Kmin: 0, max: 11, avg: 5.50 (2 samples)

Interestingly, it appears it also only runs for a second on 4.0:

$ time redis-cli --latency | cat -e 0 13 0.19 89$ real 0m1.025s user 0m0.001s sys 0m0.010s

With the help page showing the answer:

--latency Enter a special mode continuously sampling latency. If you use this mode in an interactive session it runs forever displaying real-time stats. Otherwise if --raw or --csv is specified, or if you redirect the output to a non TTY, it samples the latency for 1 second (you can use -i to change the interval), then produces a single output and exits.

I've tested setting the -i to 3, but it takes marginally longer and terminates early with no output (it doesn't output an update for each measurement like the 3.2 client). Setting this lower to 2.5 would still cause issues on an extremely loaded system:

$ time timeout 3 redis-cli --latency -i 2.5 | cat -e Terminated real 0m3.397s user 0m0.002s sys 0m0.013s

I'm inclined to still set it to 2.5 as it's fairly safe, but do you think I should instead add some version checking and rely purely on the builtin timer for 4.0?

LukeHandle · 2018-06-09T16:28:15Z

src/plugins/redis

+  if [[ -n "${redis_cli}" ]]; then
+    "${redis_cli}" -h "${host}" -p "${port}" info all &>> "${LOGFILE}"
+  else
+    ( printf "INFO ALL\r\n"; sleep 1 ) | nc "${host}" "${port}" &>> "${LOGFILE}"


Sure sure. Another function seems neatish, maybe I'll name it print_redis_info_fallback instead.

Ahh, so the issues with hanging are due to the line end differences between implementations?

I'll also add a -w 5 to try and handle timeouts (almost all the redis-cli calls are wrapped in timeout which would handle this).

If we still see further issues here, we could instead switch to a raw sockets implementation à la http://milhouseonsoftware.com/2015/07/27/writing-a-redis-client-in-pure-bash-part-1/

Would bypass issues with ncat being present or differing behaviours, but plenty of other problems would present themselves I also suspect.

$ exec {redis_socket}<>/dev/tcp/${host}/${port} $ printf "INFO ALL\r\nQUIT\r\n" >&${redis_socket} $ cat <&${redis_socket} [...] # Keyspace db0:keys=284,expires=5,avg_ttl=3127545 +OK

LukeHandle · 2018-06-09T16:31:52Z

src/plugins/redis

+print_redis_info() {
+  echo "Redis INFO ALL" >> "${LOGFILE}"
+  if [[ -n "${redis_cli}" ]]; then
+    "${redis_cli}" -h "${host}" -p "${port}" info all &>> "${LOGFILE}"


I'll wrap this in a timeout 3 as well. Should not be needed, but the client doesn't seem to have better connection timeout handling and would otherwise hang (indefinitely?) when the other end is DROP-ing all packets.

(Note: --intrinsic-latency does not initiate any connection)

LukeHandle · 2018-06-09T16:33:03Z

src/plugins/redis

@@ -0,0 +1,69 @@
+#!/bin/bash
+#
+#   Copyright (C) 2017 Rackspace, Inc.


I can update this in the Contributors section, but as discussed, this is fine.

LukeHandle · 2018-06-09T16:34:10Z

src/plugins/redis

+print_redis() {
+  local LOGFILE="$1"
+  local plugin_name=${FUNCNAME[0]/print_/}
+  local redis_cli=$( type -p redis-cli )


Sure. It crossed my mind, but thought it'd get messy when then running print_redis_info and checking there anyway - I'll add a new function as suggested further down.

* Fixes from PR * Add flag for running timings (default no) * Add flag for printing client list (default no) * More comments in code

LukeHandle · 2018-06-11T09:26:35Z

Hey, pushed a new commit with following:

Adjusted ncat per recommendations for compatibility and added wait-timeout
Add flag for running timings (default no) - reduce the execution time
Add flag for printing client list (default no) - can be verbose, but might be handy
More comments in code - clarified some bits
Add timeout on redis-cli singular command
Separated handling for lack of CLI tool fallback
Improved Redis 4 compatibility

Hopefully not too many regressions in quality

tonyskapunk · 2018-06-18T03:19:43Z

This is looking good, I tested the new code and is working as expected, just last minor request, could you try to break the code and comments inside 80c when possible? e.g.

# from
local latency=$( timeout 3 "${redis_cli}" -h "${host}" -p "${port}" --latency -i 2.5 2>&1 )
# to
local latency=$( timeout 3 \
                   "${redis_cli}" \
                     -h "${host}" \
                     -p "${port}" \
                     --latency \
                     -i 2.5 \
                     2>&1
               )

I know in some cases is not possible to do so, but when possible let's do it. recaplog has some good examples about it.

LukeHandle · 2018-06-18T14:40:15Z

Hey, I made the change to be within 80c. I was rather conservative with my multi-line command breaks, but the recaplog seemed to only break when strictly needed vs. on every arg.

tonyskapunk · 2018-06-18T17:03:55Z

tested this and looking good :) 👍

Thanks for the contribution.

LukeHandle mentioned this pull request Jun 7, 2018

Reporting other/custom/redis information rackerlabs/recap#155

Closed

tonyskapunk reviewed Jun 8, 2018

View reviewed changes

LukeHandle commented Jun 9, 2018

View reviewed changes

Plugin Redis: changes from discussuion

4d2e4b5

* Fixes from PR * Add flag for running timings (default no) * Add flag for printing client list (default no) * More comments in code

Plugin Redis: trim line length

9a0ac77

LukeHandle mentioned this pull request Jun 18, 2018

Plugin support rackerlabs/recap#158

Merged

tonyskapunk merged commit 8c64521 into tonyskapunk:issue155 Jun 18, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Plugin Redis: Intial implementation #2

Plugin Redis: Intial implementation #2

LukeHandle commented Jun 7, 2018 •

edited

Loading

tonyskapunk left a comment •

edited

Loading

tonyskapunk Jun 8, 2018

LukeHandle Jun 9, 2018

tonyskapunk Jun 8, 2018

LukeHandle Jun 9, 2018

tonyskapunk Jun 8, 2018

LukeHandle Jun 9, 2018

tonyskapunk Jun 8, 2018

LukeHandle Jun 9, 2018

tonyskapunk Jun 8, 2018

LukeHandle Jun 9, 2018

tonyskapunk Jun 8, 2018

LukeHandle Jun 9, 2018

LukeHandle left a comment

LukeHandle Jun 9, 2018

LukeHandle Jun 9, 2018

LukeHandle Jun 9, 2018

LukeHandle Jun 9, 2018

LukeHandle Jun 9, 2018

LukeHandle Jun 9, 2018

LukeHandle Jun 9, 2018

LukeHandle commented Jun 11, 2018

tonyskapunk commented Jun 18, 2018

LukeHandle commented Jun 18, 2018

tonyskapunk commented Jun 18, 2018

Plugin Redis: Intial implementation #2

Plugin Redis: Intial implementation #2

Conversation

LukeHandle commented Jun 7, 2018 • edited Loading

tonyskapunk left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

LukeHandle left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

LukeHandle commented Jun 11, 2018

tonyskapunk commented Jun 18, 2018

LukeHandle commented Jun 18, 2018

tonyskapunk commented Jun 18, 2018

LukeHandle commented Jun 7, 2018 •

edited

Loading

tonyskapunk left a comment •

edited

Loading