-
Notifications
You must be signed in to change notification settings - Fork 164
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
agentlog: add pprof debug http handler #3237
Conversation
TODO @christoph-zededa : yetus warnings done ... |
Previous comment from @milan-zededa : Maybe we should take this opportunity to document all the features of agentlog that one may use for debugging purposes? (e.g. under this folder) - #3229 (comment) If it is not too much, I can add your suggestions. Also I added https://github.com/lf-edge/eve/pull/3237/files#diff-e76641ce9bfa02962b0687e68a963a44dfa9d1a4f427cfafd9fbfa36d55986f4R225 |
4d7812c
to
4a89c72
Compare
conf/grub.cfg
Outdated
@@ -3,3 +3,4 @@ | |||
# set_global eve_flavor kvm | |||
# to force booting in Xen mode, uncomment: | |||
# set_global eve_flavor xen | |||
set_getty |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added by mistake. And yes, we need to change how we prepare the debug grub config. Otherwise it is really annoying.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed it
(I let this comment open in case I accidentally re-add it ...)
4a89c72
to
7740c64
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One heads up. Some stack trace code is invoke from the logrus log.Panic callback, and that can in turn not call back into logrus. Should be comments around that code, but I don't recall how it relates to dumping all of the goroutine stacks. Can you double-check that that code path is not affected by this restructuring?
pkg/dom0-ztools/rootfs/bin/eve
Outdated
pkill -USR2 /opt/zededa/bin/zedbox | ||
fi | ||
|
||
wget -O - "$URL" --post-data '' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't think we have wget included in EVE-OS.
We have curl e.g., in the debug container, but not in pillar etc.Will this be invoked from inside the debug container?
Note that a console (keyboard and screen) lands in a different place than a ssh to EVE-OS.,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I switched to wget
as we don't have curl
in the root container (terminal / serial console).
Now I also checked if we have wget
in the debug container and when using ssh - we do.
So it should all work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't think we had curl outside of the debug container either. I wonder where that and wget gets pulled in (we want to minimize the set of packages we use to minimize the CVEs we need to look for).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I changed it to nc
: 83f3bec
so, So, as far as I understand, the stacktrace is only printed via log as an ExitHandler which is called by |
7740c64
to
b92677c
Compare
b92677c
to
83f3bec
Compare
pprof show various insights into the process and can be accessed via http://localhost:6543/ to start it, send USR2 to zedbox or run `eve http-debug`; to stop it, do curl localhost:6543/stop -X POST or `eve http-debug stop` Signed-off-by: Christoph Ostarek <christoph@zededa.com>
this first starts the pprof interface on zedbox and then forwards the port to the edgeview client once the user is done the pprof interface is closed again Signed-off-by: Christoph Ostarek <christoph@zededa.com>
7dee404
to
8c67fea
Compare
in order to keep the image small and reduce the attack surface wget may be removed in the future, so instead use nc Signed-off-by: Christoph Ostarek <christoph@zededa.com>
8c67fea
to
5ea17eb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Makefile
Outdated
@@ -46,6 +46,8 @@ INSTALLER_IMG_FORMAT=raw | |||
VERIFICATION_IMG_FORMAT=raw | |||
# SSH port to use for running images live | |||
SSH_PORT=2222 | |||
# Debug port to access debugging information (e.g. pprof) | |||
DEBUG_PORT=6543 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't the normal pprof port '6060' is used? https://pkg.go.dev/net/http/pprof
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code does not exist anymore - is outdated.
pkg/dom0-ztools/rootfs/bin/eve
Outdated
iptables -t mangle -F | ||
echo "Listening on :6543 -- use 'eve http-debug stop' to stop" | ||
elif [ "$2" = "stop" ]; then | ||
wget -O - 127.1:6543/stop --post-data '' > /dev/null 2>&1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to remove the iptable setup after stop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code does not exist anymore - is outdated.
(both scripts can be found <a href="https://github.com/brendangregg/FlameGraph">here</a>) | ||
` | ||
|
||
mux.Handle("/", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
zedcloud side has plan to move away from mux router due to: https://github.com/gorilla#gorilla-toolkit, not sure about the EVE status on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure about the status in EVE; I guess we need to discuss this soon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@naiming-zededa @christoph-zededa this is not gorillamux; just the simple http.NewServeMux() which we also use in zedrouter/server.go
@@ -982,6 +983,22 @@ func dispAFile(f os.FileInfo) { | |||
fmt.Printf("%s, %v, %d, %s\n", f.Mode().String(), f.ModTime(), f.Size(), f.Name()) | |||
} | |||
|
|||
func runPprof() { | |||
cmd := exec.Command("/usr/bin/pkill", "-USR2", "/opt/zededa/bin/zedbox") | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you expect the client side to query the pprof stats quite soon? otherwise the tcp will timeout if no activity. Or in rare case the cmd run has error, we need to exit here. Would be more flexible to offer more options. For example, with 'pprof/on' to trigger it to start and send back status of success or not and print out the port being used, 'pprof/off' to shut it down. either use general tcp/localhost:port, or do a 'pprof/run' to do all 3.
also, add the 'help' line in the basics.go for the new command.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"do a 'pprof/run' to do all 3" <- that was my initial plan, we can just drop
5ea17eb
and it works that way.
pkg/edgeview/src/system.go
Outdated
@@ -1215,7 +1233,7 @@ func runTechSupport(cmds cmdOpt, isLocal bool) { | |||
closePipe(true) | |||
|
|||
printTitle("\n - system info -\n\n", colorRED, false) | |||
runSystem(cmds, "hw,model,pci,usb,lastreboot,newlog,volume,app,datastore,cipher,dmesg,configitem") | |||
runSystem(cmds, "hw,model,pci,usb,lastreboot,newlog,volume,app,datastore,cipher,dmesg,configitem,pprof") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should not be in techsupport, since without option input it will just return, and it will not generate any output into the techsupport file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed it
also, if you can update this document: https://wiki.lfedge.org/display/EVE/EdgeView+Commands to add the new command there. thanks. |
let the user of edgeview do it manually Signed-off-by: Christoph Ostarek <christoph@zededa.com>
5ea17eb
to
235d4f6
Compare
"syscall" | ||
"time" | ||
|
||
"github.com/lf-edge/eve/pkg/pillar/base" | ||
"github.com/lf-edge/eve/pkg/pillar/types" | ||
fileutils "github.com/lf-edge/eve/pkg/pillar/utils/file" | ||
"github.com/satori/go.uuid" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand the need for this change; the package is referenced by uuid in any case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was just done by gopls/goimports.
I've read a bit on golang/go#28428 and it seems it speeds up checking for imports.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
@naiming-zededa have your issues been resolved?
pprof show various insights into the process and can be accessed via http://localhost:6543/
to start it, send USR2 to zedbox or run
eve http-debug
; to stop it, do curl localhost:6543/stop -X POSTor
eve http-debug stop