feat(inputs.procstat): Add ability to collect per-process socket statistics #15423

Merged: 7 commits into influxdata:master from procstat_issue_3436 on Jul 17, 2024


srebhan
Member

@srebhan srebhan commented May 29, 2024

Summary

This PR allows collecting per-process socket statistics when "sockets" is listed in the properties setting. Furthermore, the PR adds a socket_protocols option to restrict collection to a subset of socket types.

Checklist

  • No AI generated code was used in this PR

Related issues

resolves #3436

@telegraf-tiger telegraf-tiger bot added area/procstat feat Improvement on an existing feature such as adding a new setting/mode to an existing plugin plugin/input 1. Request for new input plugins 2. Issues/PRs that are related to input plugins labels May 29, 2024
@srebhan srebhan force-pushed the procstat_issue_3436 branch from b9f89e0 to 475d573 Compare May 29, 2024 22:31
@srebhan srebhan self-assigned this May 29, 2024
@phemmer
Contributor

phemmer commented May 31, 2024

Have a few issues from a quick test I just did. My test involved specifying a single PID of an openvpn client. A subset of the output is below which I'll refer to when describing the issues.

Config:

[inputs.procstat]
pid_file = "pid"
tag_with = ["pid","protocol"]
properties = ["sockets"]
procstat_socket,host=whistler,pid=485078,pidfile=pid,process_name=openvpn,protocol=unix src="",rx_queue=0i,tx_queue=0i,inode=28715i,peer=13955i,state="established",pid=485078i 1717164019000000000
procstat_socket,host=whistler,pid=485078,pidfile=pid,process_name=openvpn,protocol=unix peer=21103i,state="established",tx_queue=0i,inode=5046i,name="/run/dbus/system_bus_socke",pid=485078i,src="/run/dbus/system_bus_socket",rx_queue=0i 1717164019000000000
procstat_socket,host=whistler,pid=485078,pidfile=pid,process_name=openvpn,protocol=unix state="established",src="/run/systemd/journal/stdout",tx_queue=0i,inode=4597i,rx_queue=0i,name="/run/systemd/journal/stdou",peer=3932i,pid=485078i 1717164019000000000
procstat_socket,host=whistler,pid=485078,pidfile=pid,process_name=openvpn,protocol=unix state="established",rx_queue=0i,inode=4913i,pid=485078i,src="/run/user/1000/bus",tx_queue=0i,name="/run/user/1000/bu",peer=11525i 1717164019000000000
procstat_socket,host=whistler,pid=485078,pidfile=pid,process_name=openvpn,protocol=unix inode=37i,name="/run/systemd/private
                                                                                                                            ",state="listen",pid=485078i,src="/run/systemd/private",rx_queue=0i,tx_queue=4096i 1717164019000000000
procstat_socket,host=whistler,pid=485078,pidfile=pid,process_name=openvpn,protocol=unix state="established",src="@/tmp/.X11-unix/X0",name="/tmp/.X11-unix/X",pid=485078i,rx_queue=0i,tx_queue=0i,inode=104597i,peer=108836i 1717164019000000000
procstat_socket,host=whistler,pid=485078,pidfile=pid,process_name=openvpn,protocol=unix tx_queue=0i,name="/run/user/1000/at-spi/bus_",peer=93635i,pid=485078i,state="established",src="/run/user/1000/at-spi/bus_0",rx_queue=0i,inode=104582i 1717164019000000000
procstat_socket,host=whistler,pid=485078,pidfile=pid,process_name=openvpn,protocol=unix inode=6382879i,name="/run/rpcbind.sock
                                                                                                                              ",state="listen",pid=485078i,src="/run/rpcbind.sock",rx_queue=0i,tx_queue=4096i 1717164019000000000
procstat_socket,host=whistler,pid=485078,pidfile=pid,process_name=openvpn,protocol=unix pid=485078i,src="/run/lvm/lvmpolld.socket",rx_queue=0i,tx_queue=4096i,inode=55i,name="/run/lvm/lvmpolld.socket
                                                                                                                                                                                                      ",state="listen" 1717164019000000000
procstat_socket,host=whistler,pid=485078,pidfile=pid,process_name=openvpn,protocol=unix state="listen",pid=485078i,src="/run/systemd/oom/io.system.ManagedOOM",rx_queue=0i,tx_queue=4096i,inode=13501i,name="/run/systemd/oom/io.system.ManagedOOM
                                                                                                                                                                                                                                                  " 1717164019000000000
procstat_socket,host=whistler,pid=485078,pidfile=pid,process_name=openvpn,protocol=unix rx_queue=0i,tx_queue=1i,inode=23771296i,name="/var/run/NetworkManager/nm-openvpn-6a9df689-697d-4c84-91a7-c28ecbaf0f17
                                                                                                                                                                                                             ",state="listen",pid=485078i,src="/var/run/NetworkManager/nm-openvpn-6a9df689-697d-4c84-91a7-c28ecbaf0f17" 1717164019000000000
procstat_socket,host=whistler,pid=485078,pidfile=pid,process_name=openvpn,protocol=unix tx_queue=0i,state="established",src="@/tmp/.ICE-unix/4058",rx_queue=0i,name="/tmp/.ICE-unix/405",peer=62791i,pid=485078i,inode=67948i 1717164019000000000
procstat_socket,host=whistler,pid=485078,pidfile=pid,process_name=openvpn,protocol=unix pid=485078i,src="/run/pcscd/pcscd.comm",rx_queue=0i,tx_queue=0i,inode=37234i,name="/run/pcscd/pcscd.com",peer=28353i,state="established" 1717164019000000000
procstat_socket,host=whistler,pid=485078,pidfile=pid,process_name=openvpn,protocol=unix tx_queue=0i,inode=4818120i,peer=4820327i,name="/run/user/1000/gvfsd/socket-JL79tTj",state="established",pid=485078i,src="/run/user/1000/gvfsd/socket-JL79tTjR",rx_queue=0i 1717164019000000000
procstat_socket,host=whistler,pid=485078,pidfile=pid,process_name=openvpn,protocol=unix state="established",pid=485078i,src="/tmp/.ssh_control-marlins.x.net-22-phemmer.NlCQ1Rd7SBCvOzsn",peer=25975509i,rx_queue=0i,tx_queue=0i,inode=25989282i,name="/tmp/.ssh_control-marlins.x.net-22-phemmer.NlCQ1Rd7SBCvOzs" 1717164019000000000
procstat_socket,host=whistler,pid=485078,pidfile=pid,process_name=openvpn,protocol=unix name="/run/gssproxy.default.sock
                                                                                                                        ",state="listen",pid=485078i,src="/run/gssproxy.default.sock",rx_queue=0i,tx_queue=10i,inode=13183i 1717164019000000000
procstat_socket,host=whistler,pid=485078,pidfile=pid,process_name=openvpn,protocol=unix name="/run/gssproxy.sock
                                                                                                                ",state="listen",pid=485078i,src="/run/gssproxy.sock",rx_queue=0i,tx_queue=10i,inode=13184i 1717164019000000000
procstat_socket,host=whistler,pid=485078,pidfile=pid,process_name=openvpn,protocol=unix state="established",src="/tmp/.ssh_control-truenas-22-phemmer.PFFhVyMRFVZ2Vf68",tx_queue=0i,inode=61278763i,name="/tmp/.ssh_control-truenas-22-phemmer.PFFhVyMRFVZ2Vf6",pid=485078i,rx_queue=0i,peer=61286458i 1717164019000000000
procstat_socket,host=whistler,pid=485078,pidfile=pid,process_name=openvpn,protocol=unix inode=86220071i,name="/tmp/.ssh_control-ded7847.ded.x.net-22-phemmer.JROKi3NjWVIMQax6
                                                                                                                                                                                     ",state="listen",pid=485078i,src="/tmp/.ssh_control-ded7847.ded.x.net-22-phemmer.JROKi3NjWVIMQax6",rx_queue=0i,tx_queue=64i 1717164019000000000
procstat_socket,host=whistler,pid=485078,pidfile=pid,process_name=openvpn,protocol=unix src="/run/nut/usbhid-ups-pc",name="/run/nut/usbhid-ups-p",tx_queue=0i,inode=7344006i,peer=7348619i,state="established",pid=485078i,rx_queue=0i 1717164019000000000
procstat_socket,host=whistler,pid=485078,pidfile=pid,process_name=openvpn,protocol=unix peer=4812472i,pid=485078i,rx_queue=0i,tx_queue=0i,inode=4815963i,name="/run/user/1000/gvfsd/socket-aNHi5Bd",state="established",src="/run/user/1000/gvfsd/socket-aNHi5Bdw" 1717164019000000000
procstat_socket,host=whistler,pid=485078,pidfile=pid,process_name=openvpn,protocol=unix peer=5328413i,state="established",tx_queue=0i,name="/run/user/1000/gvfsd/socket-oWM7GFd",rx_queue=0i,inode=5338787i,pid=485078i,src="/run/user/1000/gvfsd/socket-oWM7GFd5" 1717164019000000000
procstat_socket,host=whistler,pid=485078,pidfile=pid,process_name=openvpn,protocol=unix rx_queue=0i,tx_queue=4096i,inode=1290i,name="/run/systemd/journal/io.systemd.journal
                                                                                                                                                                            ",state="listen",pid=485078i,src="/run/systemd/journal/io.systemd.journal" 1717164019000000000
procstat_socket,host=whistler,pid=485078,pidfile=pid,process_name=openvpn,protocol=unix peer=11509i,src="/run/user/1000/pipewire-0",tx_queue=0i,inode=18888i,name="/run/user/1000/pipewire-",state="established",pid=485078i,rx_queue=0i 1717164019000000000
procstat_socket,host=whistler,pid=485078,pidfile=pid,process_name=openvpn,protocol=unix pid=485078i,src="/run/systemd/userdb/io.systemd.DynamicUser",rx_queue=0i,tx_queue=4096i,inode=10270i,name="/run/systemd/userdb/io.systemd.DynamicUser
                                                                                                                                                                                                                                             ",state="listen" 1717164019000000000
procstat_socket,host=whistler,pid=485078,pidfile=pid,process_name=openvpn,protocol=unix inode=10271i,name="/run/systemd/io.system.ManagedOOM
                                                                                                                                            ",state="listen",pid=485078i,src="/run/systemd/io.system.ManagedOOM",rx_queue=0i,tx_queue=4096i 1717164019000000000
procstat_socket,host=whistler,pid=485078,pidfile=pid,process_name=openvpn,protocol=unix src="/run/user/1000/pulse/native",rx_queue=0i,inode=18884i,name="/run/user/1000/pulse/nativ",peer=18302i,state="established",pid=485078i,tx_queue=0i 1717164019000000000
procstat_socket,host=whistler,pid=485078,pidfile=pid,process_name=openvpn,protocol=unix inode=17685i,name="/run/avahi-daemon/socket
                                                                                                                                   ",state="listen",pid=485078i,src="/run/avahi-daemon/socket",rx_queue=0i,tx_queue=4096i 1717164019000000000
procstat_socket,host=whistler,pid=485078,pidfile=pid,process_name=openvpn,protocol=unix tx_queue=4096i,inode=17687i,name="/run/cups/cups.sock
                                                                                                                                             ",state="listen",pid=485078i,src="/run/cups/cups.sock",rx_queue=0i 1717164019000000000
procstat_socket,host=whistler,pid=485078,pidfile=pid,process_name=openvpn,protocol=unix state="listen",pid=485078i,src="/run/docker.sock",rx_queue=0i,tx_queue=4096i,inode=17689i,name="/run/docker.sock
                                                                                                                                                                                                        " 1717164019000000000
procstat_socket,host=whistler,pid=485078,pidfile=pid,process_name=openvpn,protocol=unix state="listen",pid=485078i,src="/run/libvirt/libvirt-sock",rx_queue=0i,tx_queue=1000i,inode=7832i,name="/run/libvirt/libvirt-sock
                                                                                                                                                                                                                         " 1717164019000000000
procstat_socket,host=whistler,pid=485078,pidfile=pid,process_name=openvpn,protocol=unix state="listen",pid=485078i,src="/run/libvirt/libvirt-admin-sock",rx_queue=0i,tx_queue=20i,inode=7834i,name="/run/libvirt/libvirt-admin-sock
                                                                                                                                                                                                                                   " 1717164019000000000
  • First issue is that I have no idea where it's getting all these sockets from. They're not coming from pid 485078. That process only has 7 open file descriptors:
openvpn 485078 nm-openvpn    0r   CHR                1,3      0t0        4 /dev/null
openvpn 485078 nm-openvpn    1u  unix 0x0000000035dcc4c2      0t0     1808 type=STREAM (CONNECTED)
openvpn 485078 nm-openvpn    2u  unix 0x0000000035dcc4c2      0t0     1808 type=STREAM (CONNECTED)
openvpn 485078 nm-openvpn    3u  unix 0x00000000da6cd4b8      0t0 23759776 type=DGRAM (CONNECTED)
openvpn 485078 nm-openvpn    4u  unix 0x0000000029d5cad3      0t0 23759777 /var/run/NetworkManager/nm-openvpn-ad223269-f348-4898-ba0a-b1ad0af71355 type=STREAM (LISTEN)
openvpn 485078 nm-openvpn    5u  unix 0x00000000e32a5ebe      0t0 23771182 /var/run/NetworkManager/nm-openvpn-ad223269-f348-4898-ba0a-b1ad0af71355 type=STREAM (CONNECTED)
openvpn 485078 nm-openvpn    6u  IPv4           23771183      0t0      UDP *:42111 
openvpn 485078 nm-openvpn    7u   CHR             10,200     0t52      667 /dev/net/tun
  • Another issue is that many of the name fields are split among 2 lines. I'm not exactly sure what's going on here, as it's not just a newline character.

  • The name field is often truncated. If you look in the input, you'll see name="/run/user/1000/pulse/nativ". There's supposed to be an e on the end. There are a few others in there too.

  • Pid is reported twice. Once as a tag, once as a field.

  • I see no difference between src and name (other than that name is truncated). src should probably be omitted for unix sockets.

  • Every metric is in the same series (has same set of tags) and will collide. Granted this can be fixed with a processor to change some of the fields into tags, but smells fishy having to rely on that, especially since there's a lot of overlap in responsibility between the processor and the tag_with config param (which is insufficient for this task). People might not realize that a processor exists for this purpose, or that it is needed to address the issue.

@srebhan
Member Author

srebhan commented Jun 4, 2024

@phemmer thanks for testing! Let me answer your comments:

First issue is that I have no idea where it's getting all these sockets from. They're not coming from pid 485078. That process only has 7 open file descriptors:

I'm reading /proc/<pid>/net/unix for the connections and then compared to netstat -alpen --unix | grep <pid> and it looks like both match. How did you get your output? Do we only want a subset of the sockets listed?

Another issue is that many of the name fields are split among 2 lines. I'm not exactly sure what's going on here, as it's not just a newline character.

The name field is often truncated. If you look in the input, you'll see name="/run/user/1000/pulse/nativ". There's supposed to be an e on the end. There are a few others in there too.

This is the raw "name" as delivered by the syscall. I can try to clean this up or simply only use the "src"!? I guess the truncated strings are due to the strange chars at the end...

Pid is reported twice. Once as a tag, once as a field.

Will fix.

I see no difference between src and name (other than that name is truncated). src should probably be omitted for unix sockets.

So probably we should use the src as name...

Every metric is in the same series (has same set of tags) and will collide. Granted this can be fixed with a processor to change some of the fields into tags, but smells fishy having to rely on that, especially since there's a lot of overlap in responsibility between the processor and the tag_with config param (which is insufficient for this task). People might not realize that a processor exists for this purpose, or that it is needed to address the issue.

I agree, what do you think should be "taggable"? Maybe source, destination and sockname including ports if any?

@srebhan srebhan added the waiting for response waiting for response from contributor label Jun 10, 2024
@phemmer
Contributor

phemmer commented Jun 10, 2024

I'm reading /proc/<pid>/net/unix for the connections and then compared to netstat -alpen --unix | grep <pid> and it looks like both match. How did you get your output? Do we only want a subset of the sockets listed?

/proc/$pid/net/unix isn't scoped to that PID. It's scoped to the PID's network namespace (the entire net directory is), so you will see data that is unrelated to the process.
You have to scan the process's open file descriptors and match on the inode number.
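The descriptor scan described above could be sketched as follows. This is a minimal illustration, not the code merged in this PR: it assumes the Linux convention that socket descriptors under /proc/<pid>/fd are symlinks whose target looks like socket:[<inode>], and the helper names parseSocketInode, socketInodes and ownedBy are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// parseSocketInode extracts the inode from a /proc/<pid>/fd symlink target
// of the form "socket:[12345]". It reports false for any other target.
// (Hypothetical helper, not taken from the plugin.)
func parseSocketInode(target string) (uint64, bool) {
	const prefix, suffix = "socket:[", "]"
	if !strings.HasPrefix(target, prefix) || !strings.HasSuffix(target, suffix) {
		return 0, false
	}
	inode, err := strconv.ParseUint(target[len(prefix):len(target)-len(suffix)], 10, 64)
	if err != nil {
		return 0, false
	}
	return inode, true
}

// socketInodes returns the set of socket inodes the process holds open.
func socketInodes(pid int) (map[uint64]bool, error) {
	fdDir := filepath.Join("/proc", strconv.Itoa(pid), "fd")
	entries, err := os.ReadDir(fdDir)
	if err != nil {
		return nil, err
	}
	inodes := make(map[uint64]bool)
	for _, e := range entries {
		target, err := os.Readlink(filepath.Join(fdDir, e.Name()))
		if err != nil {
			continue // descriptor was closed while scanning
		}
		if inode, ok := parseSocketInode(target); ok {
			inodes[inode] = true
		}
	}
	return inodes, nil
}

// ownedBy keeps only those candidate inodes (for example, parsed from the
// namespace-wide /proc/<pid>/net/unix table) that the process actually holds.
func ownedBy(candidates []uint64, owned map[uint64]bool) []uint64 {
	var out []uint64
	for _, c := range candidates {
		if owned[c] {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	inodes, err := socketInodes(os.Getpid())
	if err != nil {
		fmt.Println("cannot scan descriptors:", err)
		return
	}
	fmt.Println("socket inodes held by this process:", len(inodes))
}
```

With a filter like this, namespace-wide entries such as inode 37 or 55 from the output above would be dropped for a process that does not hold them.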

what do you think should be "taggable"? Maybe source, destination and sockname including ports if any?

For normal IP sockets, a connection is uniquely identified by source address, source port, destination address, and destination port. Unix sockets are a bit messier, as you can have multiple connections with the same source and destination. For those you have to include the inode as well.

Edit: Oh, I might also go with "name" for unix sockets, not "src". Leave "src" for IP sockets.
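To make the uniqueness argument concrete, here is a small sketch of an identity key per socket. The socketInfo type, its field names and the identity function are hypothetical and only illustrate which values would need to be part of the series key; they are not the plugin's data model.

```go
package main

import "fmt"

// socketInfo is a hypothetical record; the field names are illustrative only.
type socketInfo struct {
	Protocol string
	SrcAddr  string
	SrcPort  uint16
	DstAddr  string
	DstPort  uint16
	Name     string // path of a unix socket, may be empty
	Inode    uint64
}

// identity returns a key that distinguishes one socket from another:
// the address/port 4-tuple for IP sockets, name plus inode for unix sockets.
func identity(s socketInfo) string {
	if s.Protocol == "unix" {
		return fmt.Sprintf("unix|%s|%d", s.Name, s.Inode)
	}
	return fmt.Sprintf("%s|%s:%d|%s:%d", s.Protocol, s.SrcAddr, s.SrcPort, s.DstAddr, s.DstPort)
}

func main() {
	// Two connections to the same unix socket path differ only by inode.
	a := socketInfo{Protocol: "unix", Name: "/run/user/1000/bus", Inode: 4913}
	b := socketInfo{Protocol: "unix", Name: "/run/user/1000/bus", Inode: 4914}
	fmt.Println(identity(a) != identity(b))
}
```

Without the inode in the key, the two unix sockets in main would map to the same series and overwrite each other.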

@telegraf-tiger telegraf-tiger bot removed the waiting for response waiting for response from contributor label Jun 10, 2024
@srebhan srebhan force-pushed the procstat_issue_3436 branch from 1ec628a to a3f0ff8 Compare June 13, 2024 15:53
@srebhan
Member Author

srebhan commented Jun 13, 2024

@phemmer I think my latest commits address all your points!? Looking forward to your test results and comments!

@srebhan srebhan added the waiting for response waiting for response from contributor label Jun 17, 2024
@phemmer
Contributor

phemmer commented Jun 17, 2024

Seems to work fine. I've only done some light testing, but it appears to work and the functionality seems reasonable. Sockets without a name show up as name="", but omitting name could result in creation of a new series (depending on the storage engine (e.g. InfluxDB)). So having a present-but-empty value seems acceptable (might be nice to have a processor plugin that can drop fields based on value, but that's a separate issue). Also note that when name is empty without tagging inode, it's still possible to have collisions. But this might also be acceptable, as I don't know that tagging on inode is useful. Probably best to use an aggregator in this case to sum up these nameless entries.

@telegraf-tiger telegraf-tiger bot removed the waiting for response waiting for response from contributor label Jun 17, 2024
@srebhan
Member Author

srebhan commented Jun 17, 2024

I agree with your assessment. Would it be helpful to provide the inode in the name field/tag in case it is empty? E.g. name="inode-12345"?

@srebhan srebhan added the waiting for response waiting for response from contributor label Jun 20, 2024
@telegraf-tiger
Contributor

telegraf-tiger bot commented Jul 4, 2024

Hello! I am closing this issue due to inactivity. I hope you were able to resolve your problem. If not, please try posting this question in our Community Slack or Community Forums, or provide additional details in this issue and request that it be re-opened. Thank you!

@telegraf-tiger telegraf-tiger bot closed this Jul 4, 2024
@srebhan
Member Author

srebhan commented Jul 8, 2024

After talking to the team, we decided to go with my suggestion and use inode-<xyz> as the name if it would otherwise be empty.
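The fallback described here could look roughly like the sketch below. The function name socketName is hypothetical, and formatting the inode as a decimal number is an assumption based on the inode-12345 example earlier in the thread.

```go
package main

import (
	"fmt"
	"strconv"
)

// socketName returns the socket's own name, or "inode-<n>" when the socket
// is unnamed. (Hypothetical helper; decimal formatting is an assumption.)
func socketName(name string, inode uint64) string {
	if name == "" {
		return "inode-" + strconv.FormatUint(inode, 10)
	}
	return name
}

func main() {
	fmt.Println(socketName("", 28715))
	fmt.Println(socketName("/run/docker.sock", 17689))
}
```

Because the fallback embeds the inode, unnamed sockets of the same process no longer share an identical empty name value.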

@srebhan srebhan reopened this Jul 8, 2024
@telegraf-tiger telegraf-tiger bot removed the waiting for response waiting for response from contributor label Jul 8, 2024
@srebhan srebhan force-pushed the procstat_issue_3436 branch from ec84093 to 6c6b618 Compare July 17, 2024 11:00
@srebhan srebhan added the ready for final review This pull request has been reviewed and/or tested by multiple users and is ready for a final review. label Jul 17, 2024
@srebhan srebhan assigned powersj and DStrand1 and unassigned srebhan Jul 17, 2024
@powersj powersj merged commit fd8cbbf into influxdata:master Jul 17, 2024
27 checks passed
@github-actions github-actions bot added this to the v1.32.0 milestone Jul 17, 2024
Labels
area/procstat feat Improvement on an existing feature such as adding a new setting/mode to an existing plugin plugin/input 1. Request for new input plugins 2. Issues/PRs that are related to input plugins ready for final review This pull request has been reviewed and/or tested by multiple users and is ready for a final review.
Development

Successfully merging this pull request may close these issues.

Monitor application socket buffers
4 participants