Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use socket-bounded file wrapper #58

Merged
merged 9 commits into from
Mar 20, 2021

Conversation

anton-ryzhov
Copy link
Contributor

Hello,

I faced with strange behavior when used this tool — random files/connections were closed after I inspecting session. After debug I found the root cause.

os.fdopen-created file objects only bounded to the fd value. And they do call self.close() on garbage collection. So if such instance is GCed after closing of original socket — it may close another file/connection that reused same fd.

Demo:

import os, socket

local_socket, remote_socket = socket.socketpair()
print('Sockets FDs:', local_socket.fileno(), remote_socket.fileno())

a_file_wrapper = os.fdopen(local_socket.fileno())

local_socket.send(b'x')    # Successful send

# Close original connection, open new
local_socket.close(); remote_socket.close()

new_local_socket, new_remote_socket = socket.socketpair()
print('Sockets FDs:', new_local_socket.fileno(), new_remote_socket.fileno())

new_local_socket.send(b'x')    # Successful send

# Deletion of the file wrapper…
del a_file_wrapper

# … closes unrelated connection
new_local_socket.send(b'x')    # OSError: [Errno 9] Bad file descriptor

socket.makefile() creates fileobject that is bounded to exact socket instance, so it never closes unrelated fds.

@anton-ryzhov anton-ryzhov changed the title Use bounded file wrapper Use socket-bounded file wrapper Mar 13, 2021
@ionelmc
Copy link
Owner

ionelmc commented Mar 15, 2021

I expect this to break the eventlet/gevent support as makefile would be the patched makefile that does all sorts of bad things.

@ionelmc
Copy link
Owner

ionelmc commented Mar 15, 2021

I guess your actual problem was a double-close bug... can you give more details?

@anton-ryzhov
Copy link
Contributor Author

Don't know how gevent could be affected by that change. Worth testing, yes.

I guess your actual problem was a double-close bug... can you give more details?

What is double closing bug?

Originally I saw that — ManholeConsole instance and connection file wrapper were leaked after an error (#59) and got over real connection socket. And then when that link was removed (another manhole connection, another exception stored in sys and replaced the old one) — it closed some random file/connection.

@ionelmc
Copy link
Owner

ionelmc commented Mar 15, 2021

Double-close bugs are situations where there would be two close() calls for the same fd.

Eg, that .detach() call was supposed to prevent such problems.

@anton-ryzhov
Copy link
Contributor Author

I saw these detaches but didn't get why do you need them.

Are these double close real issues? I thought it was safe to call it many times — only first does close and others just no-op.

@ionelmc
Copy link
Owner

ionelmc commented Mar 15, 2021

Double close bugs are very dangerous imo. Worst i've seen: https://bugs.python.org/issue18748

@anton-ryzhov
Copy link
Contributor Author

Hm, that's exactly what I saw there but the opposite way. In your bug.py sys.stdout = os.fdopen(self.fh.fileno(), 'w') wrapper is closed first, fd got reused and then del fh closes some other "random files/connections".

It's not a double close of the same file[object] (which is safe), but misused fdopen that created dangerous situation when 2 different GC- tracked objects refer same fd. It looks more like sort of "double free" error.

fdopen(closefd=False) should make that less dangerous, but still not completely safe.

Double close of same fileobj / socketobj are fine. Just confirmed that with strace:

print('CREATESOCK')
# write(1</dev/pts/2>, "CREATESOCK\n", 11) = 11
sock, _ = socket.socketpair()
# socketpair(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC, 0, [3<UNIX:[12057326->12057327]>, 4<UNIX:[12057327->12057326]>]) = 0
print('CLOSESOCK1')
# write(1</dev/pts/2>, "CLOSESOCK1\n", 11) = 11
sock.close()
# close(3<UNIX:[12057326->12057327]>)     = 0
print('CLOSESOCK2')
# write(1</dev/pts/2>, "CLOSESOCK2\n", 11) = 11
sock.close()
# <noop>
print('DONE')
# write(1</dev/pts/2>, "DONE\n", 5)       = 5

print('CREATEFILE')
# write(1</dev/pts/2>, "CREATEFILE\n", 11) = 11
file = open('/etc/hostname')
# openat(AT_FDCWD, "/etc/hostname", O_RDONLY|O_CLOEXEC) = 3</etc/hostname>
print('CLOSEFILE1')
# write(1</dev/pts/2>, "CLOSEFILE1\n", 11) = 11
file.close()
# close(3</etc/hostname>)                 = 0
print('CLOSEFILE2')
# write(1</dev/pts/2>, "CLOSEFILE2\n", 11) = 11
file.close()
# <noop>
print('DONE')
# write(1</dev/pts/2>, "DONE\n", 5)       = 5

@anton-ryzhov
Copy link
Contributor Author

So what do you think?

@ionelmc
Copy link
Owner

ionelmc commented Mar 17, 2021

Well, this is not dangerous: os.fdopen(client.detach()) because client.detach would make client.close() a no-op.

This is indeed dangerous: os.fdopen(client.fileno()) as both close the same fd.

You wouldn't happen to use python 2.7?

Copy link
Contributor Author

@anton-ryzhov anton-ryzhov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tested on 3.8

@@ -296,13 +294,14 @@ def handle_connection_repl(client):
if _MANHOLE.redirect_stderr:
patches.append(('w', ('stderr', '__stderr__')))
try:
client_fd = client.fileno()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But there was no client.detach here, only fileno.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oooooof, looks like a bug.

@@ -267,7 +265,7 @@ def exit():
raise ExitExecLoop()

client.settimeout(None)
fh = os.fdopen(client.detach() if hasattr(client, 'detach') else client.fileno())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here that might be safer (until it hasattr it)

@anton-ryzhov
Copy link
Contributor Author

I expect this to break the eventlet/gevent support as makefile would be the patched makefile that does all sorts of bad things.

I've tried with eventlet.monkey_patch(socket=True) and gevent.monkey.patch_all() — it works as usual. But I never used these libs, so maybe I missed some corner cases

@ionelmc
Copy link
Owner

ionelmc commented Mar 18, 2021

How about this change instead?

@@ -265,11 +265,15 @@ def handle_connection_exec(client):
 
     def exit():
         raise ExitExecLoop()
 
     client.settimeout(None)
-    fh = os.fdopen(client.detach() if hasattr(client, 'detach') else client.fileno())
+    if hasattr(client, 'detach'):
+        client_fd = client.detach()
+    else:
+        client_fd = _ORIGINAL_DUP(client.fileno())
+    fh = _ORIGINAL_FDOPEN(client_fd)
 
     with closing(client):
         with closing(fh):
             try:
                 payload = fh.readline()
@@ -294,11 +298,14 @@ def handle_connection_repl(client):
     old_interval = getinterval()
     patches = [('r', ('stdin', '__stdin__')), ('w', ('stdout', '__stdout__'))]
     if _MANHOLE.redirect_stderr:
         patches.append(('w', ('stderr', '__stderr__')))
     try:
-        client_fd = client.fileno()
+        if hasattr(client, 'detach'):
+            client_fd = client.detach()
+        else:
+            client_fd = _ORIGINAL_DUP(client.fileno())
         for mode, names in patches:
             for name in names:
                 backup.append((name, getattr(sys, name)))
                 setattr(sys, name, _ORIGINAL_FDOPEN(client_fd, mode, 1 if PY3 else 0))
         try:

@anton-ryzhov
Copy link
Contributor Author

What's the benefit of doing it this way?

As I understand socket.makefile effectively similar to os.fdopen(client.detach()), but more explicit and supported/recommended way for that.

Why do you prefer fdopen? Is there a case when makefile doesn't work?

@ionelmc
Copy link
Owner

ionelmc commented Mar 18, 2021

There's no easy way to get an unmonkeypatched makefile afaik. If we'd follow recommended ways of doing things we wouldn't have this project in the first place, just saying :-)

@ionelmc
Copy link
Owner

ionelmc commented Mar 18, 2021

So what I would like to know is if my proposed change fixes the problem in your project.

@anton-ryzhov
Copy link
Contributor Author

If we'd follow recommended ways of doing things we wouldn't have this project in the first place, just saying :-)

Haha, That's true. But it's awesome and very useful!

There's no easy way to get an unmonkeypatched makefile afaik.

Is it monkeypatched at all? You mean by eventlet / gevent? If yes, it is it a problem? I've tested it and it worked fine. Do you have a case when makefile doesn't work?

So what I would like to know is if my proposed change fixes the problem in your project.

Maybe, need to test

@ionelmc
Copy link
Owner

ionelmc commented Mar 19, 2021

@anton-ryzhov so #60 is mostly passing. I think that pypy failure can be ignored (might be just be hitting the jit the wrong way there). Can you test that it solves your problem? pip install https://github.com/ionelmc/python-manhole/archive/refs/heads/fix-double-close.zip

@anton-ryzhov
Copy link
Contributor Author

I've tested it. Well… I don't think it solves the problem without dup for every instance.

That's how it closes the connection now (strace log):

write(4<UNIX:[5205940->5208023,"/tmp/manhole-31407"]>, "\n######### ProcessID=31407, Thre"..., 1739) = 1739
write(4<UNIX:[5205940->5208023,"/tmp/manhole-31407"]>, "\n", 1) = 1
write(4<UNIX:[5205940->5208023,"/tmp/manhole-31407"]>, "Python 3.8.2 (default, May  2 20"..., 165) = 165
write(4<UNIX:[5205940->5208023,"/tmp/manhole-31407"]>, ">>> ", 4) = 4
read(4<UNIX:[5205940->5208023,"/tmp/manhole-31407"]>, "", 8192) = 0
write(4<UNIX:[5205940,"/tmp/manhole-31407"]>, "\n", 1) = -1 EPIPE (Broken pipe)
--- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=31407, si_uid=1000} ---
close(4<UNIX:[5205940,"/tmp/manhole-31407"]>) = 0
      # here fd could be reused and closed right below
close(4)                          = -1 EBADF (Bad file descriptor)
close(4)                          = -1 EBADF (Bad file descriptor)
close(4)                          = -1 EBADF (Bad file descriptor)
close(4)                          = -1 EBADF (Bad file descriptor)
write(4, "\n", 1)                 = -1 EBADF (Bad file descriptor)
close(4)                          = -1 EBADF (Bad file descriptor)
write(4, "\n", 1)                 = -1 EBADF (Bad file descriptor)
write(4, "\n", 1)                 = -1 EBADF (Bad file descriptor)
close(4)                          = -1 EBADF (Bad file descriptor)

One real close and 6 consequent redundant closes for std{in,out.err} and __std{in,out.err}__.

Another edge case — sys.stderr.close() aborts the connection.

Also fdopened object uses read/write syscalls when makefile uses recv/send. Not sure what that may cause, but it doesn't look semantically correct.

I still don't understand why you don't want to use makefile approach — with it it closes exactly once only after closing very last file wrapper. Looks very straightforward, reliable and safe.

@anton-ryzhov
Copy link
Contributor Author

^ I've pulled your fixed tests into my branch

@codecov
Copy link

codecov bot commented Mar 20, 2021

Codecov Report

Merging #58 (cdcf451) into master (16596e7) will increase coverage by 0.80%.
The diff coverage is 73.68%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #58      +/-   ##
==========================================
+ Coverage   81.91%   82.72%   +0.80%     
==========================================
  Files           6        6              
  Lines        1261     1279      +18     
  Branches      135      139       +4     
==========================================
+ Hits         1033     1058      +25     
+ Misses        187      182       -5     
+ Partials       41       39       -2     
Impacted Files Coverage Δ
tests/wsgi.py 38.46% <0.00%> (-38.47%) ⬇️
tests/test_manhole.py 91.23% <50.00%> (+0.25%) ⬆️
src/manhole/__init__.py 90.59% <100.00%> (+2.28%) ⬆️
tests/test_manhole_cli.py 100.00% <100.00%> (+2.56%) ⬆️
src/manhole/cli.py 36.13% <0.00%> (+1.68%) ⬆️
tests/helper.py 76.52% <0.00%> (+2.18%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 16596e7...cdcf451. Read the comment docs.

@ionelmc
Copy link
Owner

ionelmc commented Mar 20, 2021

Hmm indeed, the fdopen stuff on the stdout patching is still buggy in my changes.

To be honest I'm not really sure now why I avoided makefile in the first place. Perhaps I wanted to avoid some of the crazy makefile monkeypatching in some older version of gevent/eventlet...

@ionelmc ionelmc merged commit 9a18cf5 into ionelmc:master Mar 20, 2021
@ionelmc
Copy link
Owner

ionelmc commented Mar 20, 2021

Thanks for not giving up lol, I've checked now the gevent/eventlet sourcecode and it looks like nothing crazy is going on for makefile.

@anton-ryzhov
Copy link
Contributor Author

Cool, thank you

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants