Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ntpath.realpath() mishandles filenames that resemble drives #102475

Open
barneygale opened this issue Mar 6, 2023 · 10 comments
Open

ntpath.realpath() mishandles filenames that resemble drives #102475

barneygale opened this issue Mar 6, 2023 · 10 comments
Labels
3.10 only security fixes 3.11 only security fixes 3.12 bugs and security fixes OS-windows type-bug An unexpected behavior, bug, or error

Comments

@barneygale
Copy link
Contributor

The realpath() docs say:

If strict is False, the path is resolved as far as possible and any remainder is appended without checking whether it exists.

Note the word "appended". In fact, realpath() uses os.path.join() to join the path segments, and as we all know/love, os.path.join() supports resetting the drive or root, thus discarding prior parts. As a result:

>>> os.path.realpath('c:/a:b')
'a:b'  # should be 'c:/a:b'
@barneygale barneygale added type-bug An unexpected behavior, bug, or error OS-windows 3.11 only security fixes 3.10 only security fixes 3.12 bugs and security fixes labels Mar 6, 2023
barneygale added a commit to barneygale/cpython that referenced this issue Mar 6, 2023
@kostyafarber
Copy link
Contributor

Any ideas on how to go about fixing this one? Keen to work on it but it seems (with regards to the linked issue) there are other things related to those to sort out first?

@barneygale
Copy link
Contributor Author

barneygale commented Mar 19, 2023

Thanks for taking a look!

Essentially: ntpath.realpath() should not call ntpath.join() internally. Instead, it should use a more naive form of joining, like sep.join() or simply head + sep + tail. Care must be taken not to double up slashes. A small utility function may be warranted.

I suspect the main problematic case is here:

tail = join(name, tail) if tail else name

... but there are other usages of join() which will need some thought.

@kostyafarber
Copy link
Contributor

Thanks for the detailed explanation. I'll have a go at it.

@eryksun
Copy link
Contributor

eryksun commented Mar 19, 2023

Any ideas on how to go about fixing this one?

Define a custom join() function inside of ntpath._getfinalpathname_nonstrict(). It can be simplified since it only joins two paths, the second of which is a simple relative path. For example:

        if isinstance(path, bytes):        
            tail = b''
            sep = b'\\'
        else:
            tail = ''
            sep = '\\'

        def join(path, tail):
            if path[-1:] == sep or not tail:
                return path + tail
            return path + sep + tail

Use this function for all of the four join() calls in _getfinalpathname_nonstrict().

@eryksun
Copy link
Contributor

eryksun commented Mar 19, 2023

Use this function for all of the four join() calls in _getfinalpathname_nonstrict().

I said four join() calls instead of 3 because I was counting the following as well:

cpython/Lib/ntpath.py

Lines 695 to 696 in 4d1f033

if path and not name:
return path + tail

The working directory on a non-existing drive should be the root path. For example, the following result is wrong:

>>> ntpath.exists('Z:')
False
>>> ntpath.realpath('Z:spam')
'Z:spam'

@barneygale
Copy link
Contributor Author

Eryk, is that function necessary in all four cases? I wonder if we can guarantee that path does or does not end with a separator in any of them.

@kostyafarber
Copy link
Contributor

Any ideas on how to go about fixing this one?

Define a custom join() function inside of ntpath._getfinalpathname_nonstrict(). It can be simplified since it only joins two paths, the second of which is a simple relative path. For example:

        if isinstance(path, bytes):        

            tail = b''

            sep = b'\\'

        else:

            tail = ''

            sep = '\\'



        def join(path, tail):

            if path[-1:] == sep or not tail:

                return path + tail

            return path + sep + tail

Use this function for all of the four join() calls in _getfinalpathname_nonstrict().

You've done all the work for me!

@eryksun
Copy link
Contributor

eryksun commented Mar 19, 2023

The issue also needs tests in "Lib/test_ntpath.py".

is that function necessary in all four cases? I wonder if we can guarantee that path does or does not end with a separator in any of them.

The result from nt._getfinalpathname() or nt.readlink() may or may not end with a separator. For prepending to tail, since name comes from ntpath.split(), it should never contain a backslash. However, tail could be empty, so we would need tail = (name + sep + tail) if tail else name, as opposed to tail = join(name, tail). I'd rather keep the check for an empty tail in join().

@johns1c
Copy link

johns1c commented Sep 13, 2023

Is os.path.join just wrong?

Is there a proper definition of a "canonical" path for Windows?

If not should the Python community create a definition? It will have to takes account of namespaces, drive letters, mount points, case sensitivity etc .

I think that one rule should be that If p2 is the result os.path.realpath( p1 ) then if a thing exists or can be created using path p2 it will be the same as that created / accessed via path p1.

I also think that os.path.realpath( 'c:/aa.b' ) should give a similar result to os.path.realpath( 'c:/a.b' )

p.s. see also some of the comments on issue #100162

@zooba
Copy link
Member

zooba commented Sep 14, 2023

Is there a proper definition of a "canonical" path for Windows?

No, because it depends on what the file system driver (and any filter drivers) will accept/handle. And those are extensible, so we can't define it independently of knowing exactly what's on any given system (which we don't).

I think that one rule should be ...

We can't enforce this kind of rule unless the OS does. The best we can do is be a thin wrapper over what the OS does.

The problems being raised here are probably because we also try to maintain compatibility with the old days when we assumed FAT and NTFS paths were the only ones that existed. Should still be fixed, but I don't think there's a good general policy for us to adopt besides moving towards using more OS functionality and implementing less of it ourselves.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.10 only security fixes 3.11 only security fixes 3.12 bugs and security fixes OS-windows type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

5 participants