-
-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Windows: run Process.open_files() in a separate thread #597
Conversation
I'm working on an updated version that won't rely on a global secondary thread. |
I can't say I like the thread-based solution but considering that this is a very difficult (apparently unfixable) and ancient issue I suppose it may be OK to stick with this solution, also because the alternative I had in mind was to get rid of open_files() support on Windows once and for all. |
PS: in order to raise AccesDenied form the C extension you'll have to include "../_psutil_common.h", then:
...and decorate open_files() method with wrap_exceptions() decorator in _pswindows.py. |
On case of timeout, we just skip the file handle. I've been debug printing the PID/handle on timeouts and this behavior mimics ProcessExplorer. Raising an AD exception doesn't sound right because there may be actual files, we just can't get the name of the synchronous named pipe for that corresponding handle. I agree 1 second is a long timeout, this can be changed. Right now I'm focused on trying to remove the use of global variables and a global thread so calls to get_open_files will be thread safe but I'm running into some heap corruption issues I haven't been able to track down. |
Thanks for looking into this. If this finally gets fixed it's gonna be a big step forward.
What do you mean? That ProcessExplorer uses this same technique (using a thread)? |
As far as I can tell there are 2 ways for Process Explorer to work. If it can inject a driver it will use the driver and this issue does not exist. If it cannot inject a driver, it will create a thread and terminate it if it hangs. The handle that causes NtQueryObject to hang is just ignored and not present in the output for Process Explorer. I've taken a slightly different approach which is just resetting the current execution point for the hanging thread back to the start so it's waiting for a signal. In theory should avoid all memory leaks since I'm essentially forcing the thread to goto the beginning of the function. I did some testing to verify that the stack contents aren't reallocated over and over. However, running make.bat test-memleaks shows that it's still leaking memory and I'm not sure where. I've also discovered get_open_files doesn't like file handles to network resources because the call to resolve \Device\HardDisk1\Partition0 to a drive letter doesn't understand the prefix for a file name that looks like: \Device\mup\sharename\path\to\file.txt |
This looks like a separate issue.
|
I've opened issue #600 to track the network handles issue. |
thanks |
Any news about this? I think I'm gonna release 3.0.0 this week and I'd like to get this in. I don't want to rush it though, just wanted to let you know my plans. |
I don't think this will be ready in time. I would recommend merging in #596 to solve files not being shown due to the access flags filter and the possibility of memory/handle leaks in error edge cases. Right now, I have an issue on Python 3 x64 where I'm getting a PUNICODE_STRING with a random null byte in the middle like: \Device\Harddisk1\Program Files\x00Microsoft Passing this to PyUnicode_FromWideChar causes it to segfault. I can't figure out why NtQueryObject with ObjectNameInformation is causing this to return an invalid string to me. |
Just an update: I should have this done by the end of the week. |
Cool, thanks. |
So I'm content with the code at this point. I'd like to move a lot of the defines/structs into the header file but it causes a bunch of compiler errors for struct redefinition. It looks like it's due to ntextapi.h and the requirement to support mingw. |
I thought I saw an issue referring to dropping MingW but I couldn't find it. I'll see what I can do. |
Done. I removed the conflicts that were affecting this PR so this should get you part way through #591. |
It seems there are still conflicts as the merge button is grey.
|
get_open_files spawns a worker thread to process calls to NtQueryObject(). If the call timesout (100ms), the thread is terminated and the file handle is skipped. The ConvertThreadToFiber call prevents the initial stack from leaking when the thread is terminated on Windows XP/2003. The code has been cleaned up and struct redefinitions in ntextapi have been removed and with conflicts using #define's.
a59e443
to
c813e6d
Compare
I rebased it to remove that commit. |
OK, I'll try/test this someday during this week and hopefully merge it. |
@giampaolo Any chance of this getting merged in soon? |
Sorry, I'm going to NYC on Sunday so this is gonna take a little while. |
* master: use 'with open' to make sure file is closed test_process_create_time: always test against the rounded time too don't test physical cpu count on systems that don't include it don't test num_ctx_switches on unsupported kernels fix compilation warning about possible misuse of XDECREF fix race condition in wait_for_file fix flake8 error fix permission errors when running from /root fix giampaolo#607: DUPLEX_UNKNOWN is not defined on old RedHat versions fix giampaolo#606: Process.parent() may swallow NoSuchProcess exceptions (#race-condition) attempt to fix xargs on OSX giampaolo#602: move pre-commit hook into a separate file fix giampaolo#602: add GIT pre-commit hook fix failing test on Windows C styling: if unification forgot to close the handle fix giampaolo#599 (Windows): process name() can now be determined for all PIDs add test windows: refactoring of the alternative process info implementation fix typo in documentation
…read * upstream/master: get rid of unreliable if_addrs() test on Linux better skip msg on windows
I have reason to believe this breaks compatibility on Windows XP 32-bit. I will continue to investigate further. |
Looks like the Windows XP failure is unrelated. It's a missing reference to inet_ntop [1] in net_if_addrs which isn't implemented in Windows until Vista [2]. This was introduced in PR #588 [1] psutil/psutil/_psutil_windows.c Line 2987 in 3916125
[2] https://msdn.microsoft.com/en-us/library/windows/desktop/cc805843%28v=vs.85%29.aspx |
There's an edge case on Windows XP where an infinite loop is possible. A call to NtQueryObject with a NULL buffer and a buffer size of 0 will return STATUS_INFO_LENGTH_MISMATCH but dwLength will be 0. This change will pre-allocate a buffer of size (MAX_PATH + 1) * sizeof(WCHAR) and skips the handle if the required dwLenght is 0.
Ran into some other issues, don't think this should be merged yet (Windows XP 32-bit running as SYSTEM). |
Turns out the threads won't die on WinXP under certain conditions. For Vista+, it will spawn a new thread. |
get_open_files in a separate thread
I've just merged this. Tested on Win 7 and it seems everything is fine. Thanks for working on this. It was a big one. |
@mrjefftang: also, I added this to doc:
Is it correct? Do you think we should add anything else (e.g. differences between XP and Vista)? |
I think that's correct. I would add a note that psutil spawns a thread on Vista+ only. |
What happens on XP? The thread isn't spawn so we still have the hanging issue? |
Timeout is 100ms. Windows XP uses an alternative method that was documented on MSDN which misses file handles to valid files. It's a trade off of hanging vs being accurate. Also despite all my testing, I could not get the thread cleanup to work on Windows XP which is why I went that route. We could spawn helper threads and call TerminateThread but the thread never dies off until python.exe quits. This lead to hundreds of threads being left around and in generally behaving badly. |
...so as far as I understand:
Is it correct? |
That is correct. |
Mmm... since we're killing the thread if it's hanging I suppose we may as well skip valid file handles.
|
There's a chance that given the 100ms timeout, we end up killing the thread when it didn't hang but the CPU was just slow. If a thread hangs, we kill it, spawn a new thread, and continue with the next file handle. The theory is that the thread only hangs on asynchronous pipes. |
That's exactly because I prefer to declare this is not fully reliable.
I suppose there's no way to pre-emptively check whether the handle is an async pipe and skip it, is that right? |
There is no way to check if a handle is async from userspace (hence the need for a kernel driver). We could probably just leave a disclaim that applies to all versions of Windows.
|
I know this is a somewhat old conversation, but... |
Where exactly does it leak? Can you provide a PR? |
I believe @jrfl is talking about this https://support.microsoft.com/en-us/kb/104641 |
@mrjefftang is correct. I should have been more clear. Sorry for that. |
There were a lot of changes since March 1. I might have been using CRT calls back then as well. Since the merge, I have not seen any evidence of a memory leak, have you seen otherwise @jrfl ? |
There were some leaks when mucking about looking through all processes for all open handles every 15 seconds or so. The python process grew to over a gig in about an 2 hours. That's one of the things that lead me to this discussion. It turned out not to be a problem for me as I had to abandon using this code and start using handle from sysinternals and parse its output. Handle was finding open things that psutils did not. (Sorry if this sounds like bitching. My windows internals skills have degraded as I haven't had to use them for years otherwise I'd attempt to help) |
Looking back at this it appears the leaking issue is a pretty severe one:
@mrjefftang any chance you could look into this again? |
This PR fixes issue #340 and supersedes PR #596.
This will spawn a new thread on every call to get_open_files() and time it out after 1 second.
Based off the skeleton reverse engineered code from Sysinternal's Handles here:
http://forum.sysinternals.com/discussion-howto-enumerate-handles_topic19403_post99587.html#99587
Tested on Python 2.7.2 and Python 3.4.2 on Windows 7 x64.
All tests pass and the only memory leak test failures are related to net_if_addrs as seen in #596 as well.