Add fast path to os.[l]stat() that returns incomplete information #99726
When `fast=True` is passed, only the file type bits of `st_mode`, `st_size` and `st_mtime[_ns]` are guaranteed to be set. Other fields may also be set for a given Python version on a given platform version, but may change without warning (for example, when the OS changes; Python will try to keep them stable). This first implementation uses a new Windows API that is significantly faster, provided the volume identifier is not required. Other optimizations may be added later.
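To make the guarantee concrete, here is a minimal sketch of the subset a `fast=True` call would have promised. `FastStat` and `to_fast_stat` are hypothetical names used only for illustration. Later in this thread the `fast` argument was dropped because the Windows API was extended. The split between type bits and permission bits uses the real `stat.S_IFMT` and `stat.S_IMODE` helpers.

```python
import os
import stat
from typing import NamedTuple


class FastStat(NamedTuple):
    """Hypothetical record of the fields a fast=True stat would guarantee."""

    file_type: int    # only the S_IFMT bits of st_mode, no permissions
    st_size: int
    st_mtime_ns: int


def to_fast_stat(st: os.stat_result) -> FastStat:
    """Reduce a full stat result to the guaranteed subset."""
    # S_IFMT keeps the file-type bits (directory, regular file, symlink...);
    # S_IMODE would return the permission bits, which are not guaranteed.
    return FastStat(stat.S_IFMT(st.st_mode), st.st_size, st.st_mtime_ns)


info = to_fast_stat(os.stat("."))
print(stat.S_ISDIR(info.file_type), info.st_size, info.st_mtime_ns)
```

Because `file_type` still holds `S_IFMT` bits, the usual predicates such as `stat.S_ISDIR` and `stat.S_ISREG` keep working on it.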
Updating
I'm familiar with NTAPI. If you can, please ask them to also expose FileStatInfo and FileStatBasicInfo for use with
The filesystem has nearly zero-cost access to the volume serial number. In the NT API, it's available in FileFsVolumeInformation as Another value that could be useful is
The basic stat info doesn't include the Based on your code, they're applying a file-type categorization to the There are several useful device characteristics, including FILE_REMOVABLE_MEDIA, FILE_READ_ONLY_DEVICE, FILE_REMOTE_DEVICE, FILE_DEVICE_IS_MOUNTED, FILE_PORTABLE_DEVICE, and FILE_CHARACTERISTIC_WEBDAV_DEVICE. WinAPI It would be nice to expose the device characteristics as
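If the device characteristics were exposed as a plain integer, callers could decode them with a flag enum. This is a minimal sketch. `DeviceCharacteristics` and `describe` are hypothetical names, and no such field exists on `os.stat_result`. Only four bit values are included, and they are assumed from the Windows DDK header `wdm.h`, so verify them before relying on this. `FILE_PORTABLE_DEVICE` and `FILE_CHARACTERISTIC_WEBDAV_DEVICE` are left out because their values are not assumed here.

```python
import enum


class DeviceCharacteristics(enum.IntFlag):
    """Subset of Windows device characteristic bits (values assumed from wdm.h)."""

    REMOVABLE_MEDIA = 0x00000001     # FILE_REMOVABLE_MEDIA
    READ_ONLY_DEVICE = 0x00000002    # FILE_READ_ONLY_DEVICE
    REMOTE_DEVICE = 0x00000010       # FILE_REMOTE_DEVICE
    DEVICE_IS_MOUNTED = 0x00000020   # FILE_DEVICE_IS_MOUNTED


def describe(characteristics: int) -> list[str]:
    """Names of the known bits set in a raw characteristics value."""
    flags = DeviceCharacteristics(characteristics)
    # Iterate the class (definition order), not the flag value, so this
    # also works on Python versions before 3.11.
    return [member.name for member in DeviceCharacteristics if member in flags]


print(describe(0x11))
```

Unknown bits are kept by `IntFlag` but are simply not named by `describe`.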
Regarding the Currently, the change time could be obtained in
Related idea: Given FileStatInfo is available and the non-basic FileStatByNameInfo has an
The statx call has a reasonably nice API: you pass along the set of attributes you're interested in. Maybe we could mirror that, and let the implementation work out which system call to use?
Now that's an idea. https://man7.org/linux/man-pages/man2/statx.2.html for reference. I've got two days of work with nobody else around (US Thanksgiving), so maybe I'll have a go at adding this API instead.
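Here is a rough sketch of what a statx-style request mask could look like from Python. Everything in it is hypothetical: `StatField` and `stat_fields` are illustrative names, not an existing `os` API. This version always calls `os.stat()` and then filters. A real implementation would use the mask to choose the cheapest system call, for example skipping the volume serial number lookup when `DEV` is not requested.

```python
import enum
import os
import stat


class StatField(enum.Flag):
    """Hypothetical request mask, loosely modelled on statx()'s STATX_* bits."""

    TYPE = enum.auto()
    SIZE = enum.auto()
    MTIME = enum.auto()
    DEV = enum.auto()


def stat_fields(path, fields: StatField) -> dict:
    """Return only the requested fields of path's metadata.

    Sketch only: always calls os.stat(). A real implementation could pick
    a cheaper system call when DEV (the volume serial number on Windows)
    is not requested.
    """
    st = os.stat(path)
    result = {}
    if StatField.TYPE in fields:
        result["type"] = stat.S_IFMT(st.st_mode)
    if StatField.SIZE in fields:
        result["size"] = st.st_size
    if StatField.MTIME in fields:
        result["mtime_ns"] = st.st_mtime_ns
    if StatField.DEV in fields:
        result["dev"] = st.st_dev
    return result


print(stat_fields(".", StatField.TYPE | StatField.SIZE))
```

The design choice is to let the caller state intent ("I don't need `st_dev`") rather than choose a speed mode, which leaves the implementation free to pick a system call per platform.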
…faster API when available (GH-102149)

This deprecates `st_ctime` fields on Windows, with the intent to change them to contain the correct value in 3.14. For now, they should keep returning the creation time as they always have.
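Code that used `st_ctime` as a creation time on Windows might migrate along these lines. This minimal sketch assumes the Python 3.12 behaviour, where `st_birthtime` became available on Windows; macOS and the BSDs already report it. `creation_time` is a hypothetical helper name. It returns `None` rather than falling back to `st_ctime`, because on POSIX `st_ctime` is the metadata-change time, and on Windows it is deprecated and intended to change meaning.

```python
import os


def creation_time(path):
    """Best-effort creation time in seconds, or None if not reported.

    Uses st_birthtime where the platform provides it (assumed: macOS, the
    BSDs and, from Python 3.12, Windows). Deliberately avoids st_ctime: on
    Windows it is deprecated for this purpose, and on POSIX it is the
    metadata-change time.
    """
    st = os.stat(path)
    return getattr(st, "st_birthtime", None)


created = creation_time(".")
print("creation time not reported" if created is None else created)
```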
Implemented, without returning incomplete information! The Windows team updated their API to include everything we needed, so it's just a transparent perf improvement now (as well as a minor correctness improvement).
A future update to Windows is bringing a new filesystem API for getting stat(-like) information more efficiently from a filename. Currently, we have to open the file, which is quite a slow operation. Being able to simply request metadata based on the path is a real improvement. My testing shows `os.stat()` and `os.lstat()` (in the case where no traversal is needed) taking less than 1/4 of their current time when using the new API. I'll link the change in a PR below.

However, the new API does not include the volume serial number, which is how we fill in the `st_dev` field. Adding an additional call to get the VSN takes all the time we were taking before, so there's no performance benefit.[^1]

So I'd like to propose adding a `fast=False` argument to `os.stat` and `os.lstat`. When left as `False`, you get the current behaviour. If you pass `True`, we only guarantee a smaller set of data, and warn that other fields may be absent on some platforms.

Looking through the fields, I have proposed that the file type bits of `st_mode` (not permissions), the `st_size` and `st_mtime[_ns]` fields are the only ones that are important to guarantee.[^2] All the rest can stay as they are, but we then have the option to drop them from the fast path in the future.[^3] It's no accident that these are the APIs we already offer as other `os.path` functions (apart from `samestat`, which will have to stay on the slow path and probably needs an even slower check in order to be x-plat reliable...).

I'm not sure who cares most about this, so I'm going to leave this open for a while.
Footnotes
[^1]: There is still discussion about changing this API before it releases. If that happens, the rest of this proposal is moot, unless we like the idea anyway.

[^2]: On Windows, we can further guarantee `st_file_attributes` and `st_reparse_tag`, as these are the raw values used to calculate the file type bits of `st_mode`.

[^3]: `stat` is already very fast on POSIX-ish filesystems, so it's unlikely to be an issue there, but if we wanted to specialise for network FS or similar then we'd be able to.
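This simplified sketch illustrates the footnote about `st_file_attributes` and `st_reparse_tag`: the file type bits can be derived from those raw Windows values alone. It is not CPython's exact logic, which also distinguishes `stat` from `lstat` and handles other reparse tags. `file_type_bits` is a hypothetical name. The `FILE_ATTRIBUTE_*` constants come from the `stat` module on all platforms. `stat.IO_REPARSE_TAG_SYMLINK` may only be defined on Windows builds, so the sketch falls back to its Windows SDK value, `0xA000000C`.

```python
import stat

# Fall back to the Windows SDK value if the stat module doesn't export it.
IO_REPARSE_TAG_SYMLINK = getattr(stat, "IO_REPARSE_TAG_SYMLINK", 0xA000000C)


def file_type_bits(file_attributes: int, reparse_tag: int) -> int:
    """Simplified mapping from Windows attributes to st_mode type bits."""
    is_reparse = file_attributes & stat.FILE_ATTRIBUTE_REPARSE_POINT
    if is_reparse and reparse_tag == IO_REPARSE_TAG_SYMLINK:
        return stat.S_IFLNK    # symbolic link (lstat-style view)
    if file_attributes & stat.FILE_ATTRIBUTE_DIRECTORY:
        return stat.S_IFDIR    # includes junctions in this simplified model
    return stat.S_IFREG


print(oct(file_type_bits(stat.FILE_ATTRIBUTE_DIRECTORY, 0)))
```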