Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean up unix socket #739

Merged
merged 2 commits into from
Aug 20, 2022
Merged

Clean up unix socket #739

merged 2 commits into from
Aug 20, 2022

Conversation

pauldthomson
Copy link
Contributor

On MacOS it seems subsequent attempts to bind to the abstract Unix
socket fail without first explicitly removing it. This issue seems to be
MacOS-specific as it doesn't occur on an Ubuntu VM (using lima).

Fixes #738

Signed-off-by: Paul Thomson thomsonp83@gmail.com

On MacOS it seems subsequent attempts to bind to the abstract Unix
socket fail without first explicitly removing it. This issue seems to be
MacOS-specific as it doesn't occur on an Ubuntu VM (using `lima`).

Fixes sigstore#738

Signed-off-by: Paul Thomson <thomsonp83@gmail.com>
@codecov-commenter
Copy link

codecov-commenter commented Aug 16, 2022

Codecov Report

Merging #739 (5cd969c) into main (29c2b20) will decrease coverage by 0.14%.
The diff coverage is 0.00%.

@@            Coverage Diff             @@
##             main     #739      +/-   ##
==========================================
- Coverage   54.33%   54.18%   -0.15%     
==========================================
  Files          36       36              
  Lines        2251     2257       +6     
==========================================
  Hits         1223     1223              
- Misses        936      942       +6     
  Partials       92       92              
Impacted Files Coverage Δ
cmd/app/grpc.go 45.16% <0.00%> (-3.12%) ⬇️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@pauldthomson
Copy link
Contributor Author

Not sure if the deployment is failing due to my change, if it looks that way let me know :)

@bobcallaway
Copy link
Member

Not sure if the deployment is failing due to my change, if it looks that way let me know :)

The error log message is:

2022-08-16T05:26:42.987551935Z stderr F {"severity":"fatal","ts":1660627602.9874976,"caller":"app/grpc.go:113","message":"unlinkat ./@fulcio-legacy-grpc-socket: read-only file system","stacktrace":"github.com/sigstore/fulcio/cmd/app.(*grpcServer).startUnixListener.func1\n\tgit.luolix.top/sigstore/fulcio/cmd/app/grpc.go:113"}

cmd/app/grpc.go Outdated
Comment on lines 112 to 115
if err := os.RemoveAll(LegacyUnixDomainSocket); err != nil {
log.Logger.Fatal(err)
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of this, try adding lis.SetUnlinkOnClose(true) on line 120 after the defer lis.Close().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 I don't think that will work. It seems the @ in the socket address means it's in the abstract namespace which doesn't belong to a file on the filesystem, so not sure why a readonly filesystem won't let me remove it (yet will let it be created?)
Also, digging in to what os.RemoveAll does, leads to this (https://github.com/golang/go/blob/e49e8764553bf510b5d9f6fb38aeec0850ec6672/src/os/file_unix.go#L301):

	e := ignoringEINTR(func() error {
		return syscall.Unlink(name)
	})

so it seems that it's trying to do an unlink already? And lastly, looking into the lis.SetUnlinkOnClose(true) call, it seems it won't unlink due to it being an abstract domain socket (i.e. starting with @):

func (ln *UnixListener) close() error {
	// The operating system doesn't clean up
	// the file that announcing created, so
	// we have to clean it up ourselves.
	// There's a race here--we can't know for
	// sure whether someone else has come along
	// and replaced our socket name already--
	// but this sequence (remove then close)
	// is at least compatible with the auto-remove
	// sequence in ListenUnix. It's only non-Go
	// programs that can mess us up.
	// Even if there are racy calls to Close, we want to unlink only for the first one.
	ln.unlinkOnce.Do(func() {
		if ln.path[0] != '@' && ln.unlink {
			syscall.Unlink(ln.path)
		}
	})
	return ln.fd.Close()
}

I'm not super well versed on the nuances of Unix Domain Sockets so will need to do more digging unless someone else has ideas...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, this:
https://github.com/golang/go/blob/e49e8764553bf510b5d9f6fb38aeec0850ec6672/src/os/file_unix.go#L307
probably explains why the os.RemoveAll fails with RO... Whereas I guess creating the socket in the abstract namespace only needs to create a file descriptor

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After further review, abstract domain sockets are really a linux-only feature. We should probably wrap your proposed change with a check for if runtime.GOOS != "linux"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, so the question for me still remains, how are these being handled on MacOS, as they still don't seem to be bound to a file on the filesystem. And I'd like to get to the bottom of why it's hanging about when it doesn't on Linux, but that might be getting a bit too far off course.

I guess I'm wondering now if:
a) there's any point in "fixing" this since it's the legacy GRPC server, like does it have a future anyway? and
b) if this is intended to be run in containers anyway is MacOS nuance something that we should worry about?

I'll just add the condition you've suggested for now, and at least then if the legacy server goes away so does the unnecessary mess :P

Copy link
Contributor Author

@pauldthomson pauldthomson Aug 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how are these being handled on MacOS

🤦 in the most obvious way, just a file in the local dir... will make the change now

MacOS doesn't have abstract unix domain sockets, and so creates a file
in the local dir which isn't removed when the server stops (due to the
handling of sockets beginning with '@')

This file needs to be removed on subsequent runs otherwise there's a
bind error

Signed-off-by: Paul Thomson <thomsonp83@gmail.com>
@bobcallaway bobcallaway merged commit ab74398 into sigstore:main Aug 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

MacOS can't re-bind to abstract Unix Domain Socket
4 participants