Py_Initialize Hangs on Windows 10 #83526

Closed
dhamilton mannequin opened this issue Jan 15, 2020 · 12 comments
Labels
3.7 (EOL) end of life OS-windows stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@dhamilton
Mannequin

dhamilton mannequin commented Jan 15, 2020

BPO 39345
Nosy @pfmoore, @tjguk, @zware, @zooba
Files
  • dup-hang.c

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2020-01-15.18:55:56.242>
    labels = ['3.7', 'type-bug', 'library', 'OS-windows']
    title = 'Py_Initialize Hangs on Windows 10'
    updated_at = <Date 2021-06-09.02:44:42.595>
    user = 'https://bugs.python.org/dhamilton'

    bugs.python.org fields:

    activity = <Date 2021-06-09.02:44:42.595>
    actor = 'duaneg'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)', 'Windows']
    creation = <Date 2020-01-15.18:55:56.242>
    creator = 'dhamilton'
    dependencies = []
    files = ['50099']
    hgrepos = []
    issue_num = 39345
    keywords = []
    message_count = 4.0
    messages = ['360070', '375628', '375639', '395377']
    nosy_count = 7.0
    nosy_names = ['paul.moore', 'tim.golden', 'duaneg', 'zach.ware', 'steve.dower', 'dhamilton', 'ph.fieschi']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = None
    status = 'open'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue39345'
    versions = ['Python 3.7']

    @dhamilton
    Mannequin Author

    dhamilton mannequin commented Jan 15, 2020

    This is related to https://bugs.python.org/issue17797, which is closed.

    Using Python 3.7.4 on Windows 10.0.18362 with Visual Studio 2017, running as a C application. Py_Initialize() eventually calls is_valid_fd() on stdin. Both dup() and fstat() then appear to hang indefinitely when using the release MSVCRT DLLs; with the debug MSVCRT DLLs it works correctly. The call stack shows that Windows is waiting on an event. The patch recommended in bpo-17797 will not work here.

    is_valid_fd appears to want to read the 'input' using a file descriptor. Since both dup() and fstat() hang, I realized that isatty() can indicate whether the file descriptor is valid, and that it works for any of the predefined descriptors (stdin 0, stdout 1, stderr 2).

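    #if defined(MS_WINDOWS)
        struct stat buf;
        if (fd >= fileno(stdin) && fd <= fileno(stderr)) {
            /* Standard descriptors: avoid dup()/fstat(), which can hang here. */
            errno = 0;  /* clear any stale value before testing for EBADF */
            return (_isatty(fd) == 0 && errno == EBADF) ? 0 : 1;
        }
        /* Any other descriptor: fall back to fstat(). */
        if (fstat(fd, &buf) < 0 && (errno == EBADF || errno == ENOENT))
            return 0;
        return 1;
    #else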

    @dhamilton dhamilton mannequin added 3.7 (EOL) end of life stdlib Python modules in the Lib dir OS-windows type-bug An unexpected behavior, bug, or error labels Jan 15, 2020
    @zooba
    Member

    zooba commented Aug 18, 2020

    Are you able to capture a process dump at the hang? I haven't seen this anywhere else, and don't even know how to go about trying to reproduce it with this information - Py_Initialize is called by every single Python process, so there's something special about your situation that isn't obvious yet :)

    @phfieschi
    Mannequin

    phfieschi mannequin commented Aug 19, 2020

    Personally, I have the same problem of Py_Initialize() hanging indefinitely.

    Here is the context in which it happens: I am developing a Java application that uses the jep library (https://github.com/ninia/jep), which gives me a Python interpreter from Java, and I am developing and testing it on Windows. My Python version is 3.8.2 and I am on Windows 10, version 1903.
    When I test this library outside my app in a simple Java project, everything works fine, and the interpreter works.
    But when I try to use it in the app, it hangs indefinitely when I create the interpreter. When I dug into the code of the library, I found that the hang occurs in jep's native code, during the call to Py_Initialize(). I posted an issue on jep's GitHub, and they pointed me here. Seeing what dhamilton posted, I bet this is related to stdin and stdout.
    Java's stdout is normal and writes to the console. I tried resetting or redirecting Java's stdin and stdout, but it doesn't change anything.

    When I try the same thing on Linux (my application also runs on Linux), on Ubuntu 16, everything works fine and it doesn't hang. So this only happens on Windows.

    As for capturing a process dump, all I can get is a message displayed on the Java console when I close the app (because it hangs indefinitely):
    # A fatal error has been detected by the Java Runtime Environment:

    # EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x00000000302b9d8f, pid=11960, tid=0x0000000000003f98

    # JRE version: Java(TM) SE Runtime Environment (8.0_241-b07) (build 1.8.0_241-b07)
    # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.241-b07 mixed mode windows-amd64 compressed oops)
    # Problematic frame:
    # C 0x00000000302b9d8f

    # Failed to write core dump. Minidumps are not enabled by default on client versions of Windows

    # The crash happened outside the Java Virtual Machine in native code.

    All of this is just my personal case, and it's probably not the same for dhamilton. I hope it helped you.

    @duaneg
    Mannequin

    duaneg mannequin commented Jun 9, 2021

    I can reproduce this on Windows 10 with Python 3.9. See attached source. At least for us, it is hanging when one thread is doing a read on the file descriptor while a second calls Py_Initialize (or just dup directly).

    The Windows kernel call stack shows the dup call is waiting on a critical section, while the thread reading from stdin is waiting in ReadFile. I can get a full stack trace from WinDbg if it is helpful, but hopefully the attached code should be enough to reproduce the problem at will for anyone interested.

    If stdin is receiving input, or is closed, then the read call will complete and unblock dup in due course. However, if neither happens, dup will hang indefinitely.

    If we can fix this to work reliably in Python, that would be great. Otherwise, or in the meantime, we could just add a note to the documentation. We are going to try to work around it by using a different file descriptor instead of stdin. Other applications might be able to avoid I/O on stdin until after Python is initialised.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @yqs112358

    yqs112358 commented Mar 24, 2023

    This problem still exists in Python 3.10.9 when I use Py_NewInterpreter to create new sub-interpreters for my embedded Python. It is the same problem as with Py_Initialize, because both go through the same standard-stream setup.

    See more at https://stackoverflow.com/questions/71892914/python-c-api-py-newinterpreter-freezes-thread-when-creating-new-sys-stdin, which describes the same problem.

    Reproduction

    Only occurs on the Windows platform.

    1. Start a blocking read on stdin in one thread (for example, scanf("%d", &num)).
    2. Call Py_NewInterpreter in another thread.
    3. The thread calling Py_NewInterpreter freezes until I type a line into the console and press Enter.
    4. The program then continues to work properly (the sub-interpreter is created successfully).
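The steps above can be approximated from Python itself, without embedding. This is only a rough sketch under stated assumptions: `dup_completes` and `start_blocking_stdin_read` are hypothetical helper names, and the probe calls `os.dup()` directly instead of going through Py_NewInterpreter, on the assumption that `dup()` on fd 0 is the call that blocks.

```python
import os
import threading


def dup_completes(fd: int, timeout: float = 2.0) -> bool:
    """Return True if os.dup(fd) finishes within `timeout` seconds."""
    done = threading.Event()

    def worker() -> None:
        try:
            os.close(os.dup(fd))  # dup() is the call reported to block
        except OSError:
            pass  # an invalid fd fails fast, which is not a hang
        finally:
            done.set()

    # Daemon thread: a stuck dup() must not keep the process alive.
    threading.Thread(target=worker, daemon=True).start()
    return done.wait(timeout)


def start_blocking_stdin_read() -> threading.Thread:
    """Step 1: park a daemon thread in a blocking read on fd 0."""
    reader = threading.Thread(target=os.read, args=(0, 1), daemon=True)
    reader.start()
    return reader
```

Going by the reports in this thread, calling `start_blocking_stdin_read()` and then `dup_completes(0)` in a Windows console would be expected to return False until a line is entered, while on Linux the probe should return promptly. That expectation has not been verified here.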

    Clearly, stdin is waiting for input after Py_NewInterpreter is called. I tested in Visual Studio, and the call blocks at dup() when creating the new sys.stdin for the new sub-interpreter.

    Stack trace: (screenshot not preserved)

    Blocks here in pylifecycle.c, line 2124: (screenshot not preserved)

    @yqs112358

    yqs112358 commented Mar 24, 2023

    I got a temporary solution: start a new thread to execute Win32 API CancelIoEx on stdin before calling Py_NewInterpreter, which looks like this:

    std::thread([](){
    	Sleep(50);
    	CancelIoEx(GetStdHandle(STD_INPUT_HANDLE), NULL);
    }).detach();

    But this solution does not seem good, as we don't know exactly when dup() will block after Py_NewInterpreter starts executing.
    It also needs to call Win32 APIs, which is not suitable for projects like CPython.

    So, is there a better way, such as temporarily disabling stdin before dup() and then restoring it once the call has finished?

    @zooba
    Member

    zooba commented Mar 24, 2023

    I think @eryksun just pointed this same problem out in #102765, so we may need to factor it into this code path as well.

    Though I wonder if in this case we'd be better off with a way to preemptively set up stdio for Python, rather than trying to infer it? That could bypass this code entirely, and generally help embedders provide their own read/write functions.

    As a rule, we shouldn't require use of global resources in CPython when embedding. Right now, we're pretty bad about that, but it's the direction we'd want to move in. Allowing host apps to completely handle std streams would fit.

    @yqs112358

    I think @eryksun just pointed this same problem out in #102765, so we may need to factor it into this code path as well.

    Though I wonder if in this case we'd be better off with a way to preemptively set up stdio for Python, rather than trying to infer it? That could bypass this code entirely, and generally help embedders provide their own read/write functions.

    As a rule, we shouldn't require use of global resources in CPython when embedding. Right now, we're pretty bad about that, but it's the direction we'd want to move in. Allowing host apps to completely handle std streams would fit.

    Thanks for the reminder. Hopefully Python can make more improvements for embedding in the future. I know that a number of useful improvements will be made to sub-interpreters in Python 3.12, including splitting the GIL and so on. I'm sure these efforts will work.

    I'm trying to see if I can avoid blocking by modifying stdio beforehand. If I have any success, I'll post it here.

    @eryksun
    Contributor

    eryksun commented Mar 24, 2023

    The implementation of is_valid_fd() was made faster and generally safer in Python 3.11:

    cpython/Python/pylifecycle.c

    Lines 2202 to 2208 in c834a4c

    #elif defined(MS_WINDOWS)
    HANDLE hfile;
    _Py_BEGIN_SUPPRESS_IPH
    hfile = (HANDLE)_get_osfhandle(fd);
    _Py_END_SUPPRESS_IPH
    return (hfile != INVALID_HANDLE_VALUE
    && GetFileType(hfile) != FILE_TYPE_UNKNOWN);

    GetFileType() calls NtQueryVolumeInformationFile() to get the file's FileFsDeviceInformation. If the file object is a direct, local open, then the I/O manager implements this query without having to call the filesystem driver and synchronize on the file object. On the other hand, it does have to call the driver and synchronize on the file if it's opened on a redirected filesystem (i.e. FILE_DEVICE_NETWORK_FILE_SYSTEM). That's only a concern in general for a remote pipe or mailslot. The chances of using a remote pipe for standard I/O are slim to none.
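For readers who want to try the same check outside of CPython's C code, here is a minimal Python sketch of the idea. It is not the CPython implementation: the function name simply mirrors the C one, the `ctypes` plumbing is an assumption about how one might call `GetFileType()` from Python, and the `os.fstat()` branch is only a stand-in so the sketch also runs on non-Windows systems.

```python
import os
import sys

FILE_TYPE_UNKNOWN = 0x0000  # value from the Win32 headers


def is_valid_fd(fd: int) -> bool:
    """Check that fd refers to an open file without calling dup()."""
    if sys.platform == "win32":
        import ctypes
        import msvcrt

        try:
            handle = msvcrt.get_osfhandle(fd)  # raises OSError for a bad fd
        except OSError:
            return False
        kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
        kernel32.GetFileType.argtypes = [ctypes.c_void_p]
        kernel32.GetFileType.restype = ctypes.c_uint32
        return kernel32.GetFileType(handle) != FILE_TYPE_UNKNOWN
    # Non-Windows stand-in, only so the sketch runs everywhere.
    try:
        os.fstat(fd)
    except OSError:
        return False
    return True
```

The point of the design is that the Windows branch only maps the descriptor to its handle and queries the handle's type, so it avoids the `dup()` and `fstat()` calls that were reported to hang.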

    @eryksun
    Contributor

    eryksun commented Mar 24, 2023

    Note that there are still problems with lseek() calls. For example, the following hangs Python at startup until the console read in the parent process is completed (i.e. until enter is pressed).

    >>> import os, subprocess, threading
    >>> env = os.environ.copy()
    >>> env['PYTHONLEGACYWINDOWSSTDIO'] = '1'
    >>> th = threading.Thread(target=input)
    >>> th.start(); p = subprocess.Popen('python', env=env); th.join(); p.wait()

    It's easy to see why using an attached debugger:

    0:000> kc 8
    Call Site
    ntdll!NtQueryInformationFile
    KERNELBASE!SetFilePointerEx
    ucrtbase!common_lseek_nolock<__int64>
    ucrtbase!common_lseek<__int64>
    ucrtbase!lseeki64
    python311!portable_lseek
    python311!_io_FileIO_tell_impl
    python311!_io_FileIO_tell
    

    The child inherits its console input file from the parent. Since the parent is doing a synchronized read on this file, the NtQueryInformationFile() call in the child hangs until it can acquire the file lock.

    The I/O manager allows setting the file pointer of a synchronous pipe or console file, but it's meaningless. We could protect lseek() calls to either do nothing or fail if the file type isn't FILE_TYPE_DISK.
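A guard of that kind might look roughly like the sketch below, written in Python for illustration. `is_disk_file` and `guarded_tell` are hypothetical names, not CPython functions, and the `os.fstat()` branch is only a stand-in so the example runs on non-Windows systems. This version takes the "do nothing" option and returns None instead of failing.

```python
import os
import stat
import sys
from typing import Optional

FILE_TYPE_DISK = 0x0001  # value from the Win32 headers


def is_disk_file(fd: int) -> bool:
    """Return True only for a regular on-disk file."""
    if sys.platform == "win32":
        import ctypes
        import msvcrt

        kernel32 = ctypes.WinDLL("kernel32")
        kernel32.GetFileType.argtypes = [ctypes.c_void_p]
        kernel32.GetFileType.restype = ctypes.c_uint32
        handle = msvcrt.get_osfhandle(fd)
        return kernel32.GetFileType(handle) == FILE_TYPE_DISK
    # Non-Windows stand-in so the sketch runs elsewhere.
    return stat.S_ISREG(os.fstat(fd).st_mode)


def guarded_tell(fd: int) -> Optional[int]:
    """Return the file offset, or None for pipes, consoles and devices."""
    if not is_disk_file(fd):
        return None  # skip lseek(): the position is meaningless here
    return os.lseek(fd, 0, os.SEEK_CUR)
```

With this shape, a console or pipe descriptor never reaches `lseek()`, so the file-pointer query shown in the stack trace above would not be issued for it.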

    @zooba
    Member

    zooba commented Mar 27, 2023

    So it sounds like this hang is no longer an issue on 3.11? Can anyone suffering from the problem confirm?

    @hugovk
    Member

    hugovk commented Nov 13, 2023

    So it sounds like this hang is no longer an issue on 3.11? Can anyone suffering from the problem confirm?

    No complaints in seven months: let's assume it's now fine, and we can re-open again if needed.

    @hugovk hugovk closed this as completed Nov 13, 2023