Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

agentlog: add pprof debug http handler #3237

Merged
merged 4 commits into from
Jun 14, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 5 additions & 0 deletions docs/DEBUGGING.md
Original file line number Diff line number Diff line change
Expand Up @@ -67,6 +67,11 @@ cp -r /containers/services/pillar/lower /persist/services/pillar
# reboot and enjoy updates to the pillar container
```

## pprof access

pprof can be enabled with `eve http-debug`; now zedbox listens on port 6543 which can be forwarded f.e. with edge-view.
Information on how to use pprof can be found here: <https://pkg.go.dev/net/http/pprof>

## Keyboard/console access

For security reasons the USB ports are disabled by default. The only exception is during hardware onboarding when an override file might be needed from a USB stick to do the initial network configuration as specified in in [DEVICE-CONNECTIVITY](DEVICE-CONNECTIVITY.md). During that onboarding USB keyboard access is currently also allowed. However, the USB access does not start until the pillar container is started.
Expand Down
47 changes: 47 additions & 0 deletions pkg/dom0-ztools/rootfs/bin/eve
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,9 @@ Welcome to EVE!
persist attach <disk>
config mount <mountpoint>
config unmount
http-debug
dump-stacks
dump-memory
firewall drop
verbose on|off
version
Expand Down Expand Up @@ -83,6 +86,33 @@ unmount_partlabel() {
fi
}

http_debug_request() {
URL="$1"
running=0

if nc -z 127.1 6543; then
running=1
fi

if [ "$running" = "0" ]; then
pkill -USR2 /opt/zededa/bin/zedbox
fi

printf "POST %s HTTP/1.0\r\n\r\n" "$URL" | nc 127.1 6543

if [ "$running" = "0" ]; then
printf "POST /stop HTTP/1.0\r\n\r\n" | nc 127.1 6543
fi
}

dump_stacks() {
http_debug_request "/dump/stacks"
}

dump_mem() {
http_debug_request "/dump/memory"
}

case "$1" in
exec) # shellcheck disable=SC2086
ID=$(${CTR_CMD} t ls | awk '/^'${2:-pillar}' /{print $2;}' 2>/dev/null)
Expand Down Expand Up @@ -136,6 +166,23 @@ __EOT__
help
fi
;;
http-debug) if [ -z "$2" ] || [ "$2" = "start" ]; then
pkill -USR2 /opt/zededa/bin/zedbox
echo "Listening on :6543 -- use 'eve http-debug stop' to stop"
elif [ "$2" = "stop" ]; then
printf "POST /stop HTTP/1.0\r\n\r\n" | nc 127.1 6543
else
echo "Unknown command; use 'start' or 'stop'"
fi
;;
dump-stacks)
dump_stacks
echo "Your information can be found in logread"
;;
dump-memory)
dump_mem
echo "Your information can be found in logread"
;;
verbose) # first lets find our piping process
for PIPE in $(pgrep cat); do
[ "$(readlink /proc/"$PIPE"/fd/0)" = /run/diag.pipe ] && break
Expand Down
3 changes: 3 additions & 0 deletions pkg/edgeview/src/basics.go
Original file line number Diff line number Diff line change
Expand Up @@ -132,6 +132,7 @@ func initOpts() {
"techsupport",
"top",
"volume",
"pprof",
}

logdirectory = []string{
Expand Down Expand Up @@ -663,6 +664,8 @@ func printHelp(opt string) {
helpExample("cat/<path> -line <num>", "display only <num> of lines, like 'head' if <num> is positive, like 'tail' if the <num> is negative", false)
case "datastore":
helpOn("datastore", "display the device current datastore: EQDN, type, cipher information")
case "pprof":
helpOn("pprof", "pprof/on to turn on pprof; pprof/off to turn off again")
case "dmesg":
helpOn("dmesg", "display the device current dmesg information")
case "download":
Expand Down
35 changes: 35 additions & 0 deletions pkg/edgeview/src/system.go
Original file line number Diff line number Diff line change
Expand Up @@ -13,10 +13,12 @@ import (
"encoding/base64"
"encoding/json"
"encoding/pem"
"errors"
"fmt"
"io"
"net/http"
"os"
"os/exec"
"path/filepath"
"strconv"
"strings"
Expand Down Expand Up @@ -95,6 +97,8 @@ func runSystem(cmds cmdOpt, sysOpt string) {
getDmesg()
} else if strings.HasPrefix(opt, "tar/") {
getTarFile(opt)
} else if strings.HasPrefix(opt, "pprof") {
togglePprof(opt)
} else {
fmt.Printf("opt %s: not supported yet\n", opt)
}
Expand Down Expand Up @@ -982,6 +986,37 @@ func dispAFile(f os.FileInfo) {
fmt.Printf("%s, %v, %d, %s\n", f.Mode().String(), f.ModTime(), f.Size(), f.Name())
}

func togglePprof(opt string) {
toggle := strings.SplitN(opt, "pprof/", 2)
if len(toggle) != 2 {
fmt.Printf("pprof needs to be either pprof/on or pprof/off\n")
return
}

if toggle[1] == "on" {
runPprof()
}
if toggle[1] == "off" {
stopPprof()
}
}

func runPprof() {
cmd := exec.Command("/usr/bin/pkill", "-USR2", "/opt/zededa/bin/zedbox")

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you expect the client side to query the pprof stats quite soon? otherwise the tcp will timeout if no activity. Or in rare case the cmd run has error, we need to exit here. Would be more flexible to offer more options. For example, with 'pprof/on' to trigger it to start and send back status of success or not and print out the port being used, 'pprof/off' to shut it down. either use general tcp/localhost:port, or do a 'pprof/run' to do all 3.
also, add the 'help' line in the basics.go for the new command.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"do a 'pprof/run' to do all 3" <- that was my initial plan, we can just drop
5ea17eb
and it works that way.

err := cmd.Run()
if err != nil {
fmt.Printf("could not signal zedbox to run pprof")
}
}

func stopPprof() {
_, err := http.Post("http://localhost:6543/stop", "", nil)
if err != nil && !errors.Is(err, io.EOF) {
fmt.Printf("could not stop pprof: %+v\n", err)
}
}

func getTarFile(opt string) {
execCmd := strings.SplitN(opt, "tar/", 2)
if len(execCmd) != 2 {
Expand Down
183 changes: 140 additions & 43 deletions pkg/pillar/agentlog/agentlog.go
Original file line number Diff line number Diff line change
Expand Up @@ -5,20 +5,24 @@ package agentlog

import (
"fmt"
"io"
"net/http"
"net/http/pprof"
"os"
"os/signal"
"runtime"
dbg "runtime/debug"
"sort"
"strings"
"sync"
"sync/atomic"
"syscall"
"time"

"github.com/lf-edge/eve/pkg/pillar/base"
"github.com/lf-edge/eve/pkg/pillar/types"
fileutils "github.com/lf-edge/eve/pkg/pillar/utils/file"
"github.com/satori/go.uuid"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the need for this change; the package is referenced by uuid in any case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was just done by gopls/goimports.
I've read a bit on golang/go#28428 and it seems it speeds up checking for imports.

uuid "github.com/satori/go.uuid"
"github.com/sirupsen/logrus"
)

Expand Down Expand Up @@ -149,58 +153,151 @@ func (hook *SkipCallerHook) Levels() []logrus.Level {

// Wait on channel then handle the signals
func handleSignals(log *base.LogObject, agentName string, agentPid int, sigs chan os.Signal) {
agentDebugDir := fmt.Sprintf("%s/%s/", types.PersistDebugDir, agentName)
sigUsr1FileName := agentDebugDir + "/sigusr1"
sigUsr2FileName := agentDebugDir + "/sigusr2"

for {
select {
case sig := <-sigs:
log.Functionf("handleSignals: received %v\n", sig)
switch sig {
case syscall.SIGUSR1:
stacks := getStacks(true)
stackArray := strings.Split(stacks, "\n\n")

sigUsr1File, err := os.OpenFile(sigUsr1FileName,
os.O_WRONLY|os.O_CREATE|os.O_SYNC|os.O_TRUNC, 0755)
if err == nil {
for _, stack := range stackArray {
// This goes to /persist/agentdebug/<agentname>/sigusr1 file
sigUsr1File.WriteString(stack + "\n\n")
}
sigUsr1File.Close()
} else {
log.Errorf("handleSignals: Error opening file %s with: %s", sigUsr1FileName, err)
}

usr1LogObject := base.EnsureLogObject(log, base.SigUSR1StacksType,
"", uuid.UUID{}, string(base.SigUSR1StacksType))
if usr1LogObject != nil {
log.Warnf("SIGUSR1 triggered with %d stacks", len(stackArray))
for _, stack := range stackArray {
usr1LogObject.Warnf("%v", stack)
}
log.Warnf("SIGUSR1: end of stacks")
}
dumpStacks(log, agentName)
case syscall.SIGUSR2:
log.Warnf("SIGUSR2 triggered memory info:\n")
sigUsr2File, err := os.OpenFile(sigUsr2FileName,
os.O_WRONLY|os.O_CREATE|os.O_SYNC|os.O_TRUNC, 0755)
if err != nil {
log.Errorf("handleSignals: Error opening file %s with: %s", sigUsr2FileName, err)
} else {
// This goes to /persist/agentdebug/<agentname>/sigusr2 file
sigUsr2File.WriteString("SIGUSR2 triggered memory info:\n")
}
go listenDebug(log, agentName)
}
}
}
}

logMemUsage(log, sigUsr2File)
logMemAllocationSites(log, sigUsr2File)
if sigUsr2File != nil {
sigUsr2File.Close()
}
func dumpMemoryInfo(log *base.LogObject, fileName string) {
log.Warnf("SIGUSR2 triggered memory info:\n")
sigUsr2File, err := os.OpenFile(fileName,
os.O_WRONLY|os.O_CREATE|os.O_SYNC|os.O_TRUNC, 0755)
if err != nil {
log.Errorf("handleSignals: Error opening file %s with: %s", fileName, err)
} else {
// This goes to /persist/agentdebug/<agentname>/sigusr2 file
_, err := sigUsr2File.WriteString("SIGUSR2 triggered memory info:\n")
if err != nil {
log.Errorf("could not write to %s: %+v", fileName, err)
}
}

logMemUsage(log, sigUsr2File)
logMemAllocationSites(log, sigUsr2File)
if sigUsr2File != nil {
sigUsr2File.Close()
}
}

func dumpStacks(log *base.LogObject, fileName string) {
stacks := getStacks(true)
stackArray := strings.Split(stacks, "\n\n")

sigUsr1File, err := os.OpenFile(fileName,
os.O_WRONLY|os.O_CREATE|os.O_SYNC|os.O_TRUNC, 0755)
if err == nil {
for _, stack := range stackArray {
// This goes to /persist/agentdebug/<agentname>/sigusr1 file
_, err := sigUsr1File.WriteString(stack + "\n\n")
if err != nil {
log.Errorf("could not write to %s: %+v", fileName, err)
}
}
sigUsr1File.Close()
} else {
log.Errorf("handleSignals: Error opening file %s with: %s", fileName, err)
}

usr1LogObject := base.EnsureLogObject(log, base.SigUSR1StacksType,
"", uuid.UUID{}, string(base.SigUSR1StacksType))
if usr1LogObject != nil {
log.Warnf("SIGUSR1 triggered with %d stacks", len(stackArray))
for _, stack := range stackArray {
usr1LogObject.Warnf("%v", stack)
}
log.Warnf("SIGUSR1: end of stacks")
}
}

func writeOrLog(log *base.LogObject, w io.Writer, msg string) {
if _, err := w.Write([]byte(msg)); err != nil {
log.Errorf("Could not write to %+v: %+v", w, err)
}
}

var listenDebugRunning atomic.Bool

func listenDebug(log *base.LogObject, agentName string) {
if listenDebugRunning.Swap(true) {
return
}

agentDebugDir := fmt.Sprintf("%s/%s/", types.PersistDebugDir, agentName)
stacksDumpFileName := agentDebugDir + "/sigusr1"
memDumpFileName := agentDebugDir + "/sigusr2"

mux := http.NewServeMux()

server := &http.Server{
Addr: "localhost:6543",
Handler: mux,
ReadHeaderTimeout: 10 * time.Second,
}

info := `
This server exposes the net/http/pprof API.</br>
For examples on how to use it, see: <a href="https://pkg.go.dev/net/http/pprof">https://pkg.go.dev/net/http/pprof</a></br>
<a href="debug/pprof/">pprof methods</a></br></br>
To create a flamegraph, do: go tool pprof -raw -output=cpu.txt 'http://localhost:6543/debug/pprof/profile?seconds=5';</br>
stackcollapse-go.pl cpu.txt | flamegraph.pl --width 4096 > flame.svg</br>
(both scripts can be found <a href="https://github.com/brendangregg/FlameGraph">here</a>)
`

mux.Handle("/", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

zedcloud side has plan to move away from mux router due to: https://github.com/gorilla#gorilla-toolkit, not sure about the EVE status on this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure about the status in EVE; I guess we need to discuss this soon.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@naiming-zededa @christoph-zededa this is not gorillamux; just the simple http.NewServeMux() which we also use in zedrouter/server.go

w.Header().Set("Content-Type", "text/html")
writeOrLog(log, w, info)
}))
mux.Handle("/index.html", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.Header().Set("Content-Type", "text/html")
writeOrLog(log, w, info)
}))

mux.Handle("/debug/pprof/", http.HandlerFunc(pprof.Index))
mux.Handle("/debug/pprof/cmdline", http.HandlerFunc(pprof.Cmdline))
mux.Handle("/debug/pprof/profile", http.HandlerFunc(pprof.Profile))
mux.Handle("/debug/pprof/symbol", http.HandlerFunc(pprof.Symbol))
mux.Handle("/debug/pprof/trace", http.HandlerFunc(pprof.Trace))
mux.Handle("/stop", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
if r.Method == http.MethodPost {
server.Close()
listenDebugRunning.Swap(false)
} else {
http.Error(w, "Did you want to use POST method?", http.StatusMethodNotAllowed)
return
}
}))
mux.Handle("/dump/stacks", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
if r.Method == http.MethodPost {
dumpStacks(log, stacksDumpFileName)
response := fmt.Sprintf("Stacks can be found in logread or %s\n", stacksDumpFileName)
writeOrLog(log, w, response)
} else {
http.Error(w, "Did you want to use POST method?", http.StatusMethodNotAllowed)
return
}
}))
mux.Handle("/dump/memory", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
if r.Method == http.MethodPost {
dumpMemoryInfo(log, memDumpFileName)
response := fmt.Sprintf("Stacks can be found in logread or %s\n", memDumpFileName)
writeOrLog(log, w, response)
} else {
http.Error(w, "Did you want to use POST method?", http.StatusMethodNotAllowed)
return
}
}))

if err := server.ListenAndServe(); err != nil {
log.Errorf("Listening failed: %+v", err)
}
}

Expand Down