Reporting other/custom/redis information #155

Closed
LukeHandle opened this issue May 10, 2018 · 12 comments

@LukeHandle
Contributor

Is there interest in adding in other reporting aspects - or ability for a custom reporting?

I'm particularly interested in Redis stats (eg. redis-cli info or just via nc), but the rotation and file organization that recap uses is great.

Maybe a custom report option allowing command "X" to run as part of the report to file name "Y"? Is this of interest, or entirely out of scope 😄
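As a rough illustration of the "command X to file name Y" idea, here is a minimal sketch. The helper name `run_custom_report`, the `REPORT_DIR` variable, and the scratch directory are hypothetical and not part of recap; only the `<name>_<YYYYMMDD-HHMMSS>.log` naming follows the file names recap produces.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run an arbitrary command and store its output in a
# timestamped report file. Names and paths here are illustrative only.

run_custom_report() {
  local name="$1"; shift          # report name, e.g. "redis"
  local dir="${REPORT_DIR:-/tmp/recap-demo}"
  local suffix
  suffix="$(date '+%Y%m%d-%H%M%S')"
  local out="${dir}/${name}_${suffix}.log"
  mkdir -p "${dir}"
  # Remaining arguments are the command to run; capture stdout and stderr.
  "$@" > "${out}" 2>&1
  # Print the path of the report so the caller can find it.
  printf '%s\n' "${out}"
}

# Example: a stand-in command instead of `redis-cli info`.
REPORT_DIR="$(mktemp -d)"
report_file="$(run_custom_report demo echo "hello from custom report")"
cat "${report_file}"
```

Swapping the `echo` for `redis-cli info` would give a Redis report, assuming redis-cli is installed and the server is reachable.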

@tonyskapunk
Contributor

Hi @LukeHandle, at the moment there is no support for custom reports, plugins, third-party scripts, etc., mainly because nobody has requested it.

The current structure of recap can't easily plug in a new report type. The work on version 1.0.0 attempted to separate its functionality in a more "modular" way, although that was not completely possible back then. The code has some notes around this:

This other issue is also related: #115

Let me dig into this and see how feasible it is. Ideally, plugins/scripts could be written and called through recap when enabled in the configuration, each producing its own report.

@tonyskapunk
Contributor

It seems this will be possible. I've started migrating some of the core functions out of the recap script; this way it will be more modular and we can import other functions to produce logs as well. Right now the first step is to create a clear separation of the code. You can take a look at the code in this branch: tonyskapunk/recap:issue155 (latest commit)

I'll keep working on this and updating.

@LukeHandle
Contributor Author

I've been messing around with some useful Redis commands (not sure if that function should be added to the core, or kept separate from the code in this project).

  • Also, I'm not sure about opinions on running these with the timeout command. It seems dirty, but redis-cli does not seem to offer a nice way to stop after x seconds.

run redis-cli info once
run timeout 5 redis-cli --stat 0.5 once
run timeout 3 redis-cli --latency x3 (a rough version of latency-history) (9 second length)

off by default:
redis-cli --intrinsic-latency 5

I'm happy to write a draft module if that helps?
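For reference, a minimal sketch of capping a command with `timeout` from GNU coreutils. The `run_capped` wrapper is a hypothetical name, and `sleep 10` stands in for a long-running call such as `redis-cli --stat`, so the example runs without Redis.

```shell
#!/usr/bin/env bash
# Sketch: cap a long-running command with coreutils `timeout`.
# `sleep 10` stands in for something like `redis-cli --stat`.

run_capped() {
  local seconds="$1"; shift
  timeout "${seconds}" "$@"
  local rc=$?
  if [ "${rc}" -eq 124 ]; then
    # 124 is the exit status `timeout` uses when the time limit was hit.
    echo "stopped after ${seconds}s: $*" >&2
  fi
  return "${rc}"
}

run_capped 1 sleep 10
capped_rc=$?
run_capped 5 true
quick_rc=$?
echo "capped=${capped_rc} quick=${quick_rc}"
```

Checking for status 124 lets a caller tell "stopped by the cap" apart from a genuine failure of the wrapped command.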

@tonyskapunk
Contributor

The use of timeout will be key in all the modules/plugins added (recap is capped at 5 minutes when running through cron).

Since it is not yet concrete how the non-core modules/plugins will be loaded (I'm thinking Redis will be the first one), let's wait on the coding for now. But I'd definitely love to have contributors for this project :)

Thanks!

@tonyskapunk
Contributor

Made some progress by fixing #115. It is important to avoid that dependency so the new reports can be dynamically archived, backed up, emailed, etc. I did some testing and documented it in that issue: #115 (comment)

@tonyskapunk
Contributor

Sorry, I forgot to post an update last week: I made good progress here. tonyskapunk/recap:issue155 has a new commit that adds the functionality to run plugins, and I also included an example plugin. The README has been updated and includes the requirements for writing a new plugin.

I tested the plugin and the creation of logs should work. I did not test the backup/snapshot features, but I'd expect them to work just fine per the previous commits. I still need to test recaplog to ensure it is archiving the old logs; I don't expect issues there either.

Once I've spent some time on testing I'll create the PR to get this going.

@tonyskapunk
Contributor

TL;DR

I had to make a small change to fix report name detection. After that the tests passed, so I created PR #158 to get this one rolling.

@LukeHandle could you please do some testing on your end to make sure this is working as expected?

The README should include the information needed around Plugins, specifically here

Test results

recap

Recap run

Execution

# recap
# ls -1 /var/log/recap/*20180601-105055*
/var/log/recap/fdisk_20180601-105055.log
/var/log/recap/kernel_cmd_20180601-105055.log
/var/log/recap/netstat_20180601-105055.log
/var/log/recap/ps_20180601-105055.log
/var/log/recap/pstree_20180601-105055.log
/var/log/recap/resources_20180601-105055.log

Logs

2018-06-01 10:50:55-05:00 [INFO] --- Starting recap[5175] ---
2018-06-01 10:50:55-05:00 [INFO] -- bash info: 4 4 19 1 release x86_64-unknown-linux-gnu
2018-06-01 10:50:55-05:00 [INFO] recap[5175]: Created lock file: /var/lock/recap.lock
2018-06-01 10:50:55-05:00 [INFO] Starting check for disk space
2018-06-01 10:50:55-05:00 [INFO] Ended check for disk space
2018-06-01 10:50:55-05:00 [INFO] -- Report suffix: 20180601-105055
2018-06-01 10:50:55-05:00 [INFO] Starting 'ps' report - ps_20180601-105055.log
2018-06-01 10:50:55-05:00 [INFO] Ended 'ps' report
2018-06-01 10:50:55-05:00 [INFO] Starting 'uptime' report - resources_20180601-105055.log
2018-06-01 10:50:55-05:00 [INFO] Ended 'uptime' report
2018-06-01 10:50:55-05:00 [INFO] Starting 'free' report - resources_20180601-105055.log
2018-06-01 10:50:55-05:00 [INFO] Ended 'free' report
2018-06-01 10:50:55-05:00 [INFO] Starting 'vmstat' report - resources_20180601-105055.log
2018-06-01 10:50:57-05:00 [INFO] Ended 'vmstat' report
2018-06-01 10:50:57-05:00 [INFO] Starting 'iostat' report - resources_20180601-105055.log
2018-06-01 10:50:59-05:00 [INFO] Ended 'iostat' report
2018-06-01 10:50:59-05:00 [INFO] Starting 'iotop' report - resources_20180601-105055.log
2018-06-01 10:51:01-05:00 [INFO] Ended 'iotop' report
2018-06-01 10:51:01-05:00 [INFO] Starting 'sar' report - resources_20180601-105055.log
2018-06-01 10:51:01-05:00 [INFO] Ended 'sar' report
2018-06-01 10:51:01-05:00 [INFO] Starting 'disk utilization' report - resources_20180601-105055.log
2018-06-01 10:51:01-05:00 [INFO] Ended 'disk utilization' report
2018-06-01 10:51:01-05:00 [INFO] Starting 'slab info' report - resources_20180601-105055.log
2018-06-01 10:51:01-05:00 [INFO] Ended 'slab info' report
2018-06-01 10:51:01-05:00 [INFO] Starting 'top 10 cpu' report - resources_20180601-105055.log
2018-06-01 10:51:05-05:00 [INFO] Ended 'top 10 cpu' report
2018-06-01 10:51:05-05:00 [INFO] Starting 'top 10 memory' report - resources_20180601-105055.log
2018-06-01 10:51:05-05:00 [INFO] Ended 'top 10 memory' report
2018-06-01 10:51:05-05:00 [INFO] Starting 'pstree' report - pstree_20180601-105055.log
2018-06-01 10:51:05-05:00 [INFO] Ended 'pstree' report
2018-06-01 10:51:05-05:00 [INFO] Starting 'network socket' report - netstat_20180601-105055.log
2018-06-01 10:51:05-05:00 [INFO] Ended 'network socket' report
2018-06-01 10:51:05-05:00 [INFO] Starting 'disk partition' report - fdisk_20180601-105055.log
2018-06-01 10:51:05-05:00 [INFO] Ended 'disk partition' report
2018-06-01 10:51:05-05:00 [INFO] Finding plugins in /usr/lib/recap/plugins-available
2018-06-01 10:51:05-05:00 [INFO] 1 plugins found: kernel_cmd
2018-06-01 10:51:05-05:00 [INFO] Finding plugins in /usr/lib/recap/plugins-enabled
2018-06-01 10:51:05-05:00 [INFO] 1 plugins found: kernel_cmd
2018-06-01 10:51:05-05:00 [INFO] Loading plugins from: /usr/lib/recap/plugins-enabled
2018-06-01 10:51:05-05:00 [INFO] Loading plugin: /usr/lib/recap/plugins-enabled/kernel_cmd
2018-06-01 10:51:05-05:00 [INFO] Starting 'kernel_cmd' report - kernel_cmd_20180601-105055.log
2018-06-01 10:51:05-05:00 [INFO] Ended 'kernel_cmd' report
2018-06-01 10:51:05-05:00 [INFO] recap[5175]: Caught signal - deleting /var/lock/recap.lock
2018-06-01 10:51:05-05:00 [INFO] Execution time: 10s
2018-06-01 10:51:05-05:00 [INFO] --- Ending recap[5175] ---
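
The plugin discovery and loading steps visible in the log above could be sketched roughly as follows. The directory names mirror the log, but the loader logic, the symlink-based enabling, and the convention of a function named after the plugin file are assumptions for illustration, not recap's actual code. Everything runs in a scratch directory.

```shell
#!/usr/bin/env bash
# Sketch of an "available vs enabled" plugin layout. The directory names mirror
# the log above, but the loader itself is an assumption, not recap's code.

base="$(mktemp -d)"
mkdir -p "${base}/plugins-available" "${base}/plugins-enabled"

# A hypothetical plugin: a file defining a function named after the file.
cat > "${base}/plugins-available/kernel_cmd" <<'EOF'
kernel_cmd() {
  echo "report from kernel_cmd"
}
EOF

# Enabling a plugin is modelled as a symlink into plugins-enabled.
ln -s "${base}/plugins-available/kernel_cmd" "${base}/plugins-enabled/kernel_cmd"

loaded=()
for plugin in "${base}/plugins-enabled"/*; do
  [ -f "${plugin}" ] || continue
  name="$(basename "${plugin}")"
  # shellcheck disable=SC1090
  source "${plugin}"
  # Only run the plugin if it defined the expected function.
  if declare -F "${name}" > /dev/null; then
    "${name}" > "${base}/${name}.log"
    loaded+=("${name}")
  fi
done

echo "${#loaded[@]} plugins loaded: ${loaded[*]}"
```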

Snapshots:

Execution

# recap -S
# ls -1 /var/log/recap/snapshots/
fdisk_20180601-105156.log_snapshot
kernel_cmd_20180601-105156.log_snapshot
netstat_20180601-105156.log_snapshot
ps_20180601-105156.log_snapshot
pstree_20180601-105156.log_snapshot
resources_20180601-105156.log_snapshot

Logs

2018-06-01 10:51:56-05:00 [INFO] --- Starting recap[5380] ---
2018-06-01 10:51:56-05:00 [INFO] -- bash info: 4 4 19 1 release x86_64-unknown-linux-gnu
2018-06-01 10:51:56-05:00 [INFO] recap[5380]: Created lock file: /var/lock/recap.lock
2018-06-01 10:51:56-05:00 [INFO] Starting check for disk space
2018-06-01 10:51:56-05:00 [INFO] Ended check for disk space
2018-06-01 10:51:56-05:00 [INFO] -- Taking snapshot, storing reports in /var/log/recap/snapshots
2018-06-01 10:51:56-05:00 [INFO] -- Report suffix: 20180601-105156
2018-06-01 10:51:56-05:00 [INFO] Starting 'ps' report - ps_20180601-105156.log_snapshot
2018-06-01 10:51:56-05:00 [INFO] Ended 'ps' report
2018-06-01 10:51:56-05:00 [INFO] Starting 'uptime' report - resources_20180601-105156.log_snapshot
2018-06-01 10:51:56-05:00 [INFO] Ended 'uptime' report
2018-06-01 10:51:56-05:00 [INFO] Starting 'free' report - resources_20180601-105156.log_snapshot
2018-06-01 10:51:56-05:00 [INFO] Ended 'free' report
2018-06-01 10:51:56-05:00 [INFO] Starting 'vmstat' report - resources_20180601-105156.log_snapshot
2018-06-01 10:51:58-05:00 [INFO] Ended 'vmstat' report
2018-06-01 10:51:58-05:00 [INFO] Starting 'iostat' report - resources_20180601-105156.log_snapshot
2018-06-01 10:52:00-05:00 [INFO] Ended 'iostat' report
2018-06-01 10:52:00-05:00 [INFO] Starting 'iotop' report - resources_20180601-105156.log_snapshot
2018-06-01 10:52:03-05:00 [INFO] Ended 'iotop' report
2018-06-01 10:52:03-05:00 [INFO] Starting 'sar' report - resources_20180601-105156.log_snapshot
2018-06-01 10:52:03-05:00 [INFO] Ended 'sar' report
2018-06-01 10:52:03-05:00 [INFO] Starting 'disk utilization' report - resources_20180601-105156.log_snapshot
2018-06-01 10:52:03-05:00 [INFO] Ended 'disk utilization' report
2018-06-01 10:52:03-05:00 [INFO] Starting 'slab info' report - resources_20180601-105156.log_snapshot
2018-06-01 10:52:03-05:00 [INFO] Ended 'slab info' report
2018-06-01 10:52:03-05:00 [INFO] Starting 'top 10 cpu' report - resources_20180601-105156.log_snapshot
2018-06-01 10:52:07-05:00 [INFO] Ended 'top 10 cpu' report
2018-06-01 10:52:07-05:00 [INFO] Starting 'top 10 memory' report - resources_20180601-105156.log_snapshot
2018-06-01 10:52:07-05:00 [INFO] Ended 'top 10 memory' report
2018-06-01 10:52:07-05:00 [INFO] Starting 'pstree' report - pstree_20180601-105156.log_snapshot
2018-06-01 10:52:07-05:00 [INFO] Ended 'pstree' report
2018-06-01 10:52:07-05:00 [INFO] Starting 'network socket' report - netstat_20180601-105156.log_snapshot
2018-06-01 10:52:07-05:00 [INFO] Ended 'network socket' report
2018-06-01 10:52:07-05:00 [INFO] Starting 'disk partition' report - fdisk_20180601-105156.log_snapshot
2018-06-01 10:52:07-05:00 [INFO] Ended 'disk partition' report
2018-06-01 10:52:07-05:00 [INFO] Finding plugins in /usr/lib/recap/plugins-available
2018-06-01 10:52:07-05:00 [INFO] 1 plugins found: kernel_cmd
2018-06-01 10:52:07-05:00 [INFO] Finding plugins in /usr/lib/recap/plugins-enabled
2018-06-01 10:52:07-05:00 [INFO] 1 plugins found: kernel_cmd
2018-06-01 10:52:07-05:00 [INFO] Loading plugins from: /usr/lib/recap/plugins-enabled
2018-06-01 10:52:07-05:00 [INFO] Loading plugin: /usr/lib/recap/plugins-enabled/kernel_cmd
2018-06-01 10:52:07-05:00 [INFO] Starting 'kernel_cmd' report - kernel_cmd_20180601-105156.log_snapshot
2018-06-01 10:52:07-05:00 [INFO] Ended 'kernel_cmd' report
2018-06-01 10:52:07-05:00 [INFO] recap[5380]: Caught signal - deleting /var/lock/recap.lock
2018-06-01 10:52:07-05:00 [INFO] Execution time: 11s
2018-06-01 10:52:07-05:00 [INFO] --- Ending recap[5380] ---

Backup

Execution

# recap -B
# ls -1 /var/log/recap/backups/
fdisk_20180601-105055.log
kernel_cmd_20180601-105055.log
netstat_20180601-105055.log
ps_20180601-105055.log
pstree_20180601-105055.log
resources_20180601-105055.log

Logs

2018-06-01 10:53:18-05:00 [INFO] --- Starting recap[5608] ---
2018-06-01 10:53:18-05:00 [INFO] -- bash info: 4 4 19 1 release x86_64-unknown-linux-gnu
2018-06-01 10:53:18-05:00 [INFO] recap[5608]: Created lock file: /var/lock/recap.lock
2018-06-01 10:53:18-05:00 [INFO] Starting check for disk space
2018-06-01 10:53:18-05:00 [INFO] Ended check for disk space
2018-06-01 10:53:18-05:00 [INFO] -- Taking backup, storing reports in /var/log/recap/backups
2018-06-01 10:53:18-05:00 [INFO] Starting backup of reports
2018-06-01 10:53:18-05:00 [INFO] Last run was on: 20180601-105055
2018-06-01 10:53:18-05:00 [INFO] Reports found: [ fdisk kernel_cmd netstat ps pstree resources ]
2018-06-01 10:53:18-05:00 [INFO] Ended backup of reports
2018-06-01 10:53:18-05:00 [INFO] recap[5608]: Caught signal - deleting /var/lock/recap.lock
2018-06-01 10:53:18-05:00 [INFO] Execution time: 1s
2018-06-01 10:53:18-05:00 [INFO] --- Ending recap[5608] ---
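
The "Reports found" line depends on deriving report names from file names. A minimal sketch of one way to do that is below; the `report_name` function and its stripping pattern are assumptions based on the `<report>_<YYYYMMDD-HHMMSS>.log` names listed above, not necessarily how recap does it.

```shell
#!/usr/bin/env bash
# Sketch: derive report names from files named <report>_<YYYYMMDD-HHMMSS>.log.
# The stripping pattern is an assumption based on the file names listed above.

report_name() {
  local file
  file="$(basename "$1")"
  # Remove the shortest trailing "_<digit>...log" match so that names
  # containing underscores (kernel_cmd) survive intact.
  printf '%s\n' "${file%_[0-9]*.log}"
}

for f in fdisk_20180601-105055.log kernel_cmd_20180601-105055.log \
         resources_20180601-105055.log; do
  report_name "/var/log/recap/${f}"
done
```

Stripping the timestamp suffix, rather than cutting at the first underscore, keeps a name like kernel_cmd in one piece.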

recaplog

Archiving (and compressing)

Archiving the previous day

## Mimicking yesterday's logs:
# for f in *20180601*.log; do cp ${f} ${f/20180601/20180531}; done

# recaplog
# ls -1 /var/log/recap/*.gz
/var/log/recap/fdisk_daily_20180531.log.tar.gz
/var/log/recap/kernel_cmd_daily_20180531.log.tar.gz
/var/log/recap/netstat_daily_20180531.log.tar.gz
/var/log/recap/ps_daily_20180531.log.tar.gz
/var/log/recap/pstree_daily_20180531.log.tar.gz
/var/log/recap/resources_daily_20180531.log.tar.gz
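
The copy loop above relies on bash pattern substitution, `${var/pattern/replacement}`. A self-contained sketch of the same trick, run in a scratch directory with made-up file names and with quoting added, might look like this:

```shell
#!/usr/bin/env bash
# Sketch: copy today's logs under yesterday's date using bash pattern
# substitution, ${var/pattern/replacement}. Runs in a scratch directory.

workdir="$(mktemp -d)"
cd "${workdir}" || exit 1
touch ps_20180601-105055.log kernel_cmd_20180601-105055.log

today=20180601
yesterday=20180531
for f in *"${today}"*.log; do
  # Replaces the first occurrence of ${today} in the file name.
  cp -- "${f}" "${f/${today}/${yesterday}}"
done

ls -1
```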

Logs

2018-06-01 11:01:08-05:00 [INFO] --- Starting recaplog[5826] ---
2018-06-01 11:01:08-05:00 [INFO] recaplog (5826): Created lock file: /var/lock/recaplog.lock
2018-06-01 11:01:08-05:00 [INFO] Compressing old log files
2018-06-01 11:01:08-05:00 [INFO] Finding reports from: 20180531
2018-06-01 11:01:08-05:00 [INFO] Reports found: [ fdisk kernel_cmd netstat ps pstree resources ]
2018-06-01 11:01:08-05:00 [INFO] Packing fdisk...
2018-06-01 11:01:08-05:00 [INFO] Moving 5 logs to: /var/log/recap/fdisk_daily_20180531
2018-06-01 11:01:09-05:00 [INFO] Compressing 5 logs into: /var/log/recap/fdisk_daily_20180531.log.tar.gz
2018-06-01 11:01:09-05:00 [INFO] Deleting 5 logs.
2018-06-01 11:01:09-05:00 [INFO] Packing kernel_cmd...
2018-06-01 11:01:09-05:00 [INFO] Moving 8 logs to: /var/log/recap/kernel_cmd_daily_20180531
2018-06-01 11:01:09-05:00 [INFO] Compressing 8 logs into: /var/log/recap/kernel_cmd_daily_20180531.log.tar.gz
2018-06-01 11:01:09-05:00 [INFO] Deleting 8 logs.
2018-06-01 11:01:09-05:00 [INFO] Packing netstat...
2018-06-01 11:01:09-05:00 [INFO] Moving 8 logs to: /var/log/recap/netstat_daily_20180531
2018-06-01 11:01:09-05:00 [INFO] Compressing 8 logs into: /var/log/recap/netstat_daily_20180531.log.tar.gz
2018-06-01 11:01:09-05:00 [INFO] Deleting 8 logs.
2018-06-01 11:01:09-05:00 [INFO] Packing ps...
2018-06-01 11:01:09-05:00 [INFO] Moving 8 logs to: /var/log/recap/ps_daily_20180531
2018-06-01 11:01:09-05:00 [INFO] Compressing 8 logs into: /var/log/recap/ps_daily_20180531.log.tar.gz
2018-06-01 11:01:09-05:00 [INFO] Deleting 8 logs.
2018-06-01 11:01:09-05:00 [INFO] Packing pstree...
2018-06-01 11:01:09-05:00 [INFO] Moving 5 logs to: /var/log/recap/pstree_daily_20180531
2018-06-01 11:01:09-05:00 [INFO] Compressing 5 logs into: /var/log/recap/pstree_daily_20180531.log.tar.gz
2018-06-01 11:01:09-05:00 [INFO] Deleting 5 logs.
2018-06-01 11:01:09-05:00 [INFO] Packing resources...
2018-06-01 11:01:09-05:00 [INFO] Moving 8 logs to: /var/log/recap/resources_daily_20180531
2018-06-01 11:01:09-05:00 [INFO] Compressing 8 logs into: /var/log/recap/resources_daily_20180531.log.tar.gz
2018-06-01 11:01:09-05:00 [INFO] Deleting 8 logs.
2018-06-01 11:01:09-05:00 [INFO] Deleting log files older than 15 days...
2018-06-01 11:01:09-05:00 [INFO] Deleting: 0 log files.
2018-06-01 11:01:09-05:00 [INFO] Deleting: 0 empty directories.
2018-06-01 11:01:09-05:00 [INFO] recaplog (5826): Caught signal - deleting /var/lock/recaplog.lock
2018-06-01 11:01:09-05:00 [INFO] --- Ending recaplog[5826] ---
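
The move, compress, and delete sequence in the log above can be approximated with plain `tar`. This is a sketch under assumptions: the variable names and the scratch directory are illustrative, and recaplog's real implementation may differ.

```shell
#!/usr/bin/env bash
# Sketch: pack one day's logs for a single report into a tar.gz and delete the
# originals, loosely following the steps in the log above.

logdir="$(mktemp -d)"
day=20180531
report=ps
touch "${logdir}/${report}_${day}-100000.log" "${logdir}/${report}_${day}-101000.log"

staging="${logdir}/${report}_daily_${day}"
archive="${staging}.log.tar.gz"

# Move the day's logs into a staging directory named after the archive.
mkdir -p "${staging}"
mv "${logdir}/${report}_${day}"-*.log "${staging}/"
count="$(find "${staging}" -type f -name '*.log' | wc -l)"
echo "Compressing ${count} logs into: ${archive}"

# -C keeps paths inside the archive relative to the log directory.
tar -czf "${archive}" -C "${logdir}" "$(basename "${staging}")"
rm -rf "${staging}"

# List the archive contents to confirm what was packed.
tar -tzf "${archive}"
```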

Deleting older logs

Removing logs older than 15 days

## Mimicking 17 days old logs
# age_17d=$( date -d 'now -17 day' '+%Y%m%d' ); echo ${age_17d}
20180515
# for f in *20180601*.log; do cp ${f} ${f/20180601/${age_17d}}; done
# for f in *${age_17d}*; do touch -m -d ${age_17d} $f; done

# recaplog
2018-06-01 11:09:21-05:00 [ERROR] Unable to archive unexisting reports.
# ls -1 /var/log/recap/*${age_17d}*
ls: cannot access '/var/log/recap/*20180515*': No such file or directory

The ERROR above is expected, since we have already archived the logs from the day before.

The absence of files for that date also confirms that recaplog is working.

Logs

2018-06-01 11:09:21-05:00 [INFO] --- Starting recaplog[6311] ---
2018-06-01 11:09:21-05:00 [INFO] recaplog (6311): Created lock file: /var/lock/recaplog.lock
2018-06-01 11:09:21-05:00 [INFO] Compressing old log files
2018-06-01 11:09:21-05:00 [INFO] Finding reports from: 20180531
2018-06-01 11:09:21-05:00 [INFO] Reports found: [  ]
2018-06-01 11:09:21-05:00 [ERROR] Unable to archive unexisting reports.
2018-06-01 11:09:21-05:00 [INFO] Deleting log files older than 15 days...
2018-06-01 11:09:21-05:00 [INFO] Deleting: 42 log files.
2018-06-01 11:09:21-05:00 [INFO] Deleting: 0 empty directories.
2018-06-01 11:09:21-05:00 [INFO] recaplog (6311): Caught signal - deleting /var/lock/recaplog.lock
2018-06-01 11:09:21-05:00 [INFO] --- Ending recaplog[6311] ---
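
The age-based cleanup exercised above can be reproduced in isolation with `find -mtime`. This sketch assumes GNU date and touch (for `-d`) and uses a scratch directory with made-up file names; it shows the general technique rather than recaplog's own code.

```shell
#!/usr/bin/env bash
# Sketch: remove log files whose modification time is more than 15 days old.
# Uses GNU date/touch/find; recaplog's real implementation may differ.

logdir="$(mktemp -d)"
old_stamp="$(date -d 'now -17 day' '+%Y%m%d')"
new_stamp="$(date '+%Y%m%d')"

touch "${logdir}/ps_${new_stamp}-000000.log"
touch "${logdir}/ps_${old_stamp}-000000.log"
# Back-date the modification time, as in the test above.
touch -m -d "${old_stamp}" "${logdir}/ps_${old_stamp}-000000.log"

max_age_days=15
# -mtime +N matches files last modified more than N whole days ago.
deleted="$(find "${logdir}" -type f -name '*.log' -mtime +"${max_age_days}" -print -delete | wc -l)"
echo "Deleting: ${deleted} log files."
ls -1 "${logdir}"
```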

@LukeHandle
Contributor Author

Hey @tonyskapunk, I'm currently looking at this. I've written a basic Redis plugin for testing, though I will likely raise a PR to discuss the methods being used. Some of the redis-cli commands rewrite their output, though, which might make things messy...

@LukeHandle
Contributor Author

I've opened tonyskapunk#2 over on your actual account as it doesn't make sense to PR against here until #158 is merged?

I can rebase against master instead if preferred as well?

===

Please let me know of any changes you would like me to make. I have tried to follow the standards you have used elsewhere, but may have slipped in places.

@tonyskapunk
Contributor

tonyskapunk commented Jun 8, 2018

We keep moving on this one. I tested PR tonyskapunk#2 and left some comments there. I think a good approach is to include that plugin in the PR for this issue; that's the best way I can think of.

I will remove one of the plugins in there, kernel_cmd, since it was meant to be an example. Maybe I'll create another directory with a documented example and use that instead.

  • Still need some people to review the code of PR Plugin support #158
  • Need to confirm that Copyright for plugins contributed should be under the contributor's name.

@tonyskapunk
Contributor

Sorry this is taking longer, but since there are so many changes I want to ensure others review them.

Since there is a good amount of changes in development, I'm going to release a new, and potentially the last, version of recap from the 1.x series. I'm thinking of including #154 and then rebasing #158 to release 2.x with it. That said, expect it to take longer to get this solved and #158 merged.

@tonyskapunk added this to the 2.0.0 milestone on Jun 22, 2018

@tonyskapunk
Contributor

The PR has been up for quite a long time and I don't want to hold this any longer. I'm happy with the testing so far, and the introduction of CI (through travis-ci) should help this project with automated validation so the person validating does not have to do everything on her own.

I'm closing this since it is now in development.

Thanks so much for your help @LukeHandle !!
