Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle /: for ntpath.normpath #117383

Closed
Tracked by #117361
nineteendo opened this issue Mar 29, 2024 · 9 comments
Closed
Tracked by #117361

Handle /: for ntpath.normpath #117383

nineteendo opened this issue Mar 29, 2024 · 9 comments
Labels
3.11 only security fixes 3.12 bugs and security fixes 3.13 bugs and security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) OS-windows type-bug An unexpected behavior, bug, or error

Comments

@nineteendo
Copy link
Contributor

nineteendo commented Mar 29, 2024

Bug report

Bug description:

>>> import ntpath
>>> ntpath.normpath("/:foo")
'/:foo'

Expected: \\:foo, like the python implementation for ntpath.normpath.

CPython versions tested on:

CPython main branch

Operating systems tested on:

Windows

@nineteendo nineteendo added the type-bug An unexpected behavior, bug, or error label Mar 29, 2024
@nineteendo nineteendo mentioned this issue Mar 29, 2024
16 tasks
@eryksun
Copy link
Contributor

eryksun commented Mar 29, 2024

This works as expected with the fallback implementation on POSIX:

>>> os.name
'posix'
>>> ntpath.normpath('/:foo')
'\\:foo'

The problem is in the C implementation of nt._path_normpath() on Windows.

>>> nt._path_normpath('/:foo')
'/:foo'

It's implemented by _Py_normpath_and_size() in "Python/fileutils.c", which skips a DOS drive using the following snippet of code:

    // Skip past drive segment and update minP2
    else if (p1[0] && p1[1] == L':') {
        *p2++ = *p1++;
        *p2++ = *p1++;
        minP2 = p2;
        lastC = L':';
    }

It needs an additional check for a leading path separator, e.g. p1[0] && !IS_SEP(p1) && p1[1] == L':'.

@eryksun eryksun added OS-windows interpreter-core (Objects, Python, Grammar, and Parser dirs) 3.11 only security fixes 3.12 bugs and security fixes 3.13 bugs and security fixes labels Mar 29, 2024
@zooba
Copy link
Member

zooba commented Apr 2, 2024

It needs an additional check for a leading path separator, e.g. p1[0] && !IS_SEP(p1) && p1[1] == L':'.

Do we need the separator check? Or do we need to check that p1[0] is a letter?

In any case, it's an invalid path, so I'm not concerned about producing invalid results. Unless it's somehow exploitable, this is very non-urgent.

@nineteendo
Copy link
Contributor Author

Do we need the separator check? Or do we need to check that p1[0] is a letter?

A separator check: #101363 (comment)

WinAPI GetFullPathNameW() will accept any character in the 16-bit BMP as a drive 'letter', except for null, slash, and backslash. For example:

@eryksun
Copy link
Contributor

eryksun commented Apr 2, 2024

Do we need the separator check? Or do we need to check that p1[0] is a letter?

Sure, we could restrict it further. Then we have to decide on what counts as a letter. The mount manager only supports assigning DOS volume names with letters A-Z. However, parts of Windows path handling are generic about what characters are supported as drive letters. For example:

>>> os.system('subst ñ: C:\\Windows')
0
>>> os.chdir('ñ:/')
>>> os.getcwd()
'ñ:\\'
>>> os.path.exists('ñ:/explorer.exe')
True

For the above to work as it does, the Windows API has to handle "ñ:" as a drive instead of as a filename.

The newer PathCch* routines limit this to characters for which ntdll!iswalpha() returns true. That's 114 alphabetic characters in Latin 1, for which the ntdll implementation is hard coded. The UCRT implementation of iswalpha() is generalized to use Unicode character properties via WinAPI GetStringTypeW(). Thus to match the PathCch* routines, we have to ensure the ordinal value is less than 256, e.g. p1[0] < 256 && iswalpha(p1[0]).

I don't think that legacy isalpha() should be used. The result depends on the current locale, which can vary in the non-ASCII upper range (128, 256).

@zooba
Copy link
Member

zooba commented Apr 3, 2024

Okay, so we don't need a more strict check, we just need to decide whether our fallback implementation and our native implementation need to behave the same for invalid paths.

I'd be inclined to change the fallback implementation here, honestly. As the input is invalid, the result is also invalid, and changing the fallback is lowest risk.

@nineteendo
Copy link
Contributor Author

I'd be inclined to change the fallback implementation here, honestly.

In that case, either fix the bug, or leave it as it is. Breaking the working implementation would be regression, and is not what I had in mind when I reported this.

Also, adding a bug here is probably going to slow the function down:

cpython/Lib/ntpath.py

Lines 552 to 553 in 976bcb2

path = path.replace(altsep, sep)
drive, root, path = splitroot(path)

@zooba
Copy link
Member

zooba commented Apr 3, 2024

Unless someone shows a real working scenario that involves normalising this invalid path, then leave it as is.

I don't see the motivation to change anything here. I've suggested the best motivation I can imagine, and if that's not why you want this changed, then I need you to explain why.

@nineteendo
Copy link
Contributor Author

Sorry, I can't get behind introducing a bug in a perfectly working function to make it consistent with a function with a (minor) bug (maybe even slowing it down). But I believe that if the C implementation could be fixed, that would be the best outcome.

So decide whether we want to fix the C implementation or close this as "won't fix" (obviously easier).
@barneygale what's your opinion on this?

@nineteendo
Copy link
Contributor Author

OK, I'm closing this. If you change your mind this can be re-opened.

@nineteendo nineteendo closed this as not planned Won't fix, can't repro, duplicate, stale Apr 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.11 only security fixes 3.12 bugs and security fixes 3.13 bugs and security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) OS-windows type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

3 participants