[gssproxy] retry writing to /proc/net/rpc/use-gss-proxy #85

Merged: 1 commit, Oct 18, 2023

Alphix (Contributor) commented Oct 18, 2023

This improves the handling of cases where the auth_rpcgss module has not yet been loaded when gssproxy is started.

Alphix (Contributor, Author) commented Oct 18, 2023

With this patch applied (note the last line, logged 20 seconds after startup):

Oct 18 16:55:30 qtest1 systemd[1]: Starting gssproxy.service - GSSAPI Proxy Daemon...
Oct 18 16:55:30 qtest1 gssproxy[959]: [2023/10/18 14:55:30]: Debug Level changed to 3
Oct 18 16:55:30 qtest1 gssproxy[959]: k5tracer_thread started!
Oct 18 16:55:30 qtest1 gssproxy[959]: [2023/10/18 14:55:30]: Service: nfs-server, Keytab: /etc/krb5.keytab.d/nfs.keytab, Enctype: 18
Oct 18 16:55:30 qtest1 gssproxy[961]: [2023/10/18 14:55:30]: Kernel doesn't support GSS-Proxy (can't open /proc/net/rpc/use-gss-proxy: 2 (No such file or directory))
Oct 18 16:55:30 qtest1 systemd[1]: Started gssproxy.service - GSSAPI Proxy Daemon.
Oct 18 16:55:30 qtest1 gssproxy[961]: [2023/10/18 14:55:30]: Initialization complete.
Oct 18 16:55:50 qtest1 gssproxy[961]: [2023/10/18 14:55:50]: Kernel GSS-Proxy support enabled

This is an alternative to PR #84.
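The log above shows kernel support being enabled on a later attempt. Below is a minimal sketch of a single attempt, assuming that enabling support means opening /proc/net/rpc/use-gss-proxy and writing "1" to it. `try_enable_gss_proxy` and its `path` parameter are hypothetical (the parameter only lets the logic run against an ordinary file); this is not the code from src/gp_init.c. An `-ENOENT` result is the "auth_rpcgss not loaded yet" case that is worth retrying, and the guarded `close()` avoids ever closing `fd=-1`.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper for illustration; not the code in src/gp_init.c.
 * Tries once to enable kernel GSS-Proxy support by writing "1" to the
 * given proc file (normally /proc/net/rpc/use-gss-proxy).
 * Returns 0 on success or a negative errno value on failure.
 * -ENOENT means the file does not exist yet, e.g. because auth_rpcgss
 * has not been loaded: that is the case worth retrying later. */
static int try_enable_gss_proxy(const char *path)
{
    int fd;
    ssize_t n;
    int ret = 0;

    fd = open(path, O_WRONLY);
    if (fd < 0) {
        ret = -errno;
        goto done;
    }

    n = write(fd, "1", 1);
    if (n < 0)
        ret = -errno;
    else if (n != 1)
        ret = -EIO; /* short write without an errno */

done:
    /* fd is -1 when open() failed; never pass that to close() */
    if (fd >= 0)
        close(fd);
    return ret;
}
```

For example, `try_enable_gss_proxy("/dev/null")` returns 0, while a path under a nonexistent directory returns `-ENOENT`.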

Review comment on src/gp_init.c (outdated):

    /* failure, but the auth_rpcgss module might not be loaded yet */
    if (!gpctx->retry_proc_ev) {
        gpctx->retry_proc_ev = verto_add_timeout(gpctx->vctx,
                                                 VERTO_EV_FLAG_PERSIST,

Contributor:

This means there will be cases where this fires over and over forever if init_proc_nfsd_once() always fails... is this intentional?

Contributor Author:

Yes, it will fire every 10 seconds forever, but only if:
a) the user has an nfs-server configuration for gssproxy; and
b) /proc/net/rpc/use-gss-proxy doesn't exist (i.e. the auth_rpcgss module isn't loaded).

I do realize that the default gssproxy installation includes /etc/gssproxy/24-nfs-server.conf, though. I still think a wakeup every 10 s is acceptable?
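The review excerpt arms a persistent libverto timeout (`verto_add_timeout()` with `VERTO_EV_FLAG_PERSIST`). As a sketch, the per-tick decision can be separated from libverto so it can be tested on its own. `retry_tick`, `probe_fn` and `fake_probe` are hypothetical names, and treating only `-ENOENT` as retryable is an assumption, not necessarily what the merged patch does.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical probe type: returns 0 on success, negative errno on failure. */
typedef int (*probe_fn)(void);

/* Interval for the persistent timer, in milliseconds (10 s, as discussed). */
#define RETRY_INTERVAL_MS 10000

/* Logic for one tick of a persistent retry timer. Returns true if the
 * timer should stay armed and false if it should be removed (with
 * libverto, that would mean calling verto_del() on the event).
 * Assumption: only -ENOENT (proc file missing) is worth retrying. */
static bool retry_tick(probe_fn probe)
{
    int ret = probe();

    if (ret == 0)
        return false;      /* support enabled: stop the timer */
    return ret == -ENOENT; /* keep retrying only while the module is missing */
}

/* Fake probe for illustration: the proc file "appears" on the 4th call. */
static int fake_calls;
static int fake_probe(void)
{
    return ++fake_calls > 3 ? 0 : -ENOENT;
}
```

With `fake_probe`, the first three ticks keep the timer armed and the fourth removes it.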

Contributor:

I think we should have a very simple backoff.
Store the delay in another static variable, start with a one-second delay and double it each time the function is called until it reaches 1024 seconds (about 17 minutes), and then keep it there?
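A minimal sketch of the suggested doubling backoff. `next_backoff` and the `BACKOFF_*` constants are hypothetical; the current delay is passed in rather than kept in a static variable so the function is easy to test.

```c
/* Hypothetical limits taken from the suggestion above (seconds). */
#define BACKOFF_MIN_S 1u
#define BACKOFF_MAX_S 1024u

/* Given the delay used for the last attempt, return the delay for the
 * next one: start at BACKOFF_MIN_S, double on each call, then stay at
 * BACKOFF_MAX_S once it has been reached. */
static unsigned next_backoff(unsigned cur)
{
    if (cur < BACKOFF_MIN_S)
        return BACKOFF_MIN_S;
    if (cur >= BACKOFF_MAX_S / 2)
        return BACKOFF_MAX_S;
    return cur * 2;
}
```

Before the cap is reached the timer waits 1 + 2 + ... + 512 = 1023 s in total; after that, a newly loaded module can go unnoticed for up to 1024 s per attempt.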

Contributor Author:

I think that could be quite confusing. The admin installs gssproxy and starts it, then installs nfs-utils a couple of minutes later. It doesn't work straight away, but 15 minutes later it magically starts working?

libverto already has a 60 s wakeup interval even when gssproxy isn't doing anything; maybe use that as the upper limit?

root@qtest1:~# ps ax | grep gssproxy
    961 ?        Ssl    0:00 /usr/sbin/gssproxy -D
root@qtest1:~# strace -t -p 961    # PID 961 is gssproxy
strace: Process 961 attached
19:08:09 epoll_wait(8, [], 64, 59743)   = 0
19:09:09 epoll_wait(8, [], 64, 59743)   = 0
19:10:09 epoll_wait(8, [], 64, 59743)   = 0
19:11:09 epoll_wait(8, [], 64, 59743)   = 0
19:12:09 epoll_wait(8, [], 64, 59743)   = 0
19:13:08 epoll_wait(8, [], 64, 59743)   = 0
19:14:08 epoll_wait(8, [], 64, 59743)   = 0

That said, I really don't think a 10 s interval is a problem either: it's basically 1-2 system calls every 10 s, which won't even show up in top on the original Raspberry Pi :)
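For scale, a rough count using only the figures quoted in this thread: a fixed 10 s retry interval, 1-2 system calls per wakeup, and the roughly 60 s idle `epoll_wait` timeout (59743 ms) visible in the strace output.

```latex
\begin{aligned}
\frac{86\,400\ \text{s/day}}{10\ \text{s}} &= 8\,640\ \text{wakeups/day}
  \;\Rightarrow\; 8\,640 \text{ to } 17\,280\ \text{syscalls/day}
  \approx 0.1 \text{ to } 0.2\ \text{syscalls/s} \\
\frac{86\,400\ \text{s/day}}{60\ \text{s}} &= 1\,440\ \text{wakeups/day}
\end{aligned}
```

So a fixed 10 s interval means six times as many wakeups as the idle loop already produces, but still well under one system call per second.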

Alphix (Contributor, Author) commented Oct 18, 2023

I just pushed an updated version.

simo5 (Contributor) left a comment:

One nit.
I do not insist on the back off, but if we do not back off I think we need to find another way to limit wakeups. We can also simply give up after a certain number of attempts.
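A minimal sketch of the "give up after a certain number of attempts" option. The 30-attempt budget and the names `bounded_retry`, `bounded_retry_tick`, `always_missing` and `always_ok` are hypothetical, chosen only for illustration.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical budget: 30 attempts x 10 s = 5 minutes of retrying. */
#define MAX_ATTEMPTS 30u

typedef int (*probe_fn)(void); /* 0 on success, negative errno on failure */

struct bounded_retry {
    unsigned attempts; /* failed attempts so far */
};

/* One timer tick. Returns true to keep the timer armed, false to remove
 * it, either because the probe succeeded or because the budget is used up. */
static bool bounded_retry_tick(struct bounded_retry *st, probe_fn probe)
{
    if (probe() == 0)
        return false;
    st->attempts++;
    return st->attempts < MAX_ATTEMPTS;
}

/* Fake probes for illustration. */
static int always_missing(void) { return -ENOENT; }
static int always_ok(void) { return 0; }
```

The trade-off is that a module loaded after the budget runs out is never picked up without restarting gssproxy.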

This improves the handling of cases where the auth_rpcgss module has not yet
been loaded when gssproxy is started.

Signed-off-by: David Härdeman <david@hardeman.nu>
Alphix (Contributor, Author) commented Oct 18, 2023

> One nit. I do not insist on the back off, but if we do not back off I think we need to find another way to limit wakeups. We can also simply give up after a certain number of attempts.

I understand your concerns about wakeups, but simply giving up means the very gotcha this patch is meant to solve will still be there. The admin installs gssproxy, configures it and starts it; then it's the end of the working day. The next day the admin installs nfs-kernel-server (or whatever the package is called on $DISTRO_OF_CHOICE), and it doesn't work...

simo5 (Contributor) commented Oct 18, 2023

OK, let's compromise on an upper limit of 60 seconds.
After all, it is a misconfiguration: if you never want to use gssproxy for NFS, just remove the config and all the wakeups will go away.

simo5 (Contributor) commented Oct 18, 2023

So after some thinking, I decided that good is good enough: if I get reports that 10-second wakeups are a problem, and there is a legitimate reason for them beyond "I did not care to properly configure it", we'll deal with it.

simo5 merged commit fb8737b into gssapi:main on Oct 18, 2023 (3 checks passed).
simo5 (Contributor) commented Oct 18, 2023

Aaaand Coverity complains that we are closing fd=-1...
I will fix it in a quick follow-up PR.

Alphix (Contributor, Author) commented Oct 19, 2023

> So after some thinking I decided the good is good enough, if I get reports that 10 seconds wakeups are a problem and there is a legitimate reason for them beyond "I did not care to properly configure it", we'll deal with it.

Excellent, thank you for the quick feedback loop and handling of the PR.

Now, can we have a 0.9.2 release as well? 😃

Alphix deleted the loop-proc-nfsd branch on October 19, 2023 10:11

simo5 (Contributor) commented Oct 19, 2023

Somehow I knew this question was coming :)

Alphix (Contributor, Author) commented Oct 19, 2023

> Somehow I knew this question was coming :)

I owe you a beer :)
