Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Export samefile, make it smarter for nonexistent inputs #9436

Closed
tkelman opened this issue Dec 21, 2014 · 19 comments · Fixed by #45275
Closed

Export samefile, make it smarter for nonexistent inputs #9436

tkelman opened this issue Dec 21, 2014 · 19 comments · Fixed by #45275
Labels
docs This change adds or pertains to documentation io Involving the I/O subsystem: libuv, read, write, etc.

Comments

@tkelman
Copy link
Contributor

tkelman commented Dec 21, 2014

Just realized at #9376 (comment) that anywhere we do file path comparisons is potentially very error-prone on case-insensitive filesystems. We have realpath and normpath which look similar but are implemented differently, and neither appear to be doing anything about case. Should we have a special string literal specifically for file paths that has comparison defined? Or a differently-named function to compare if file paths point to the same thing? normalize_string with casefold=true might be the sane thing to use here, but do we currently have an API for determining whether the file system is case sensitive? On Mac it's configurable so can't always be assumed one way or the other, right?

@tkelman tkelman added system:windows Affects only Windows system:mac Affects only macOS io Involving the I/O subsystem: libuv, read, write, etc. labels Dec 21, 2014
@nalimilan
Copy link
Member

Copying my comment from the other issue. As this seems really tricky, and can vary depending on the filesystem, it looks like it's best to actually try to open files (when checking whether a file already exists) or compare their inodes (when checking for equality), instead of trying to normalize paths:

I guess it could on Windows and OS X. But it looks like it's a rabbit hole: on OS X, you should also decompose the Unicode path. So the most reliable solution is to try to open both files and see whether they point to the same inode, which is what os.path.samefile does in Python. It would certainly be a useful addition to Julia.
See the recent discussion at https://lwn.net/Articles/626898/

@tkelman
Copy link
Contributor Author

tkelman commented Dec 23, 2014

I'm blind. We already have a Base.samefile that compares device and inode from stat, it's just not exported.

samefile(a::String, b::String) = samefile(stat(a),stat(b))

We should just be conscious of when to use it, try to keep it in mind, see if we're doing path comparisons anywhere without realizing this potential gotcha.

@tkelman tkelman closed this as completed Dec 23, 2014
@StefanKarpinski
Copy link
Member

Sorry, I was vaguely following this and kind of confused. Didn't realized that samefile was not exported – it should be.

@tkelman tkelman removed system:mac Affects only macOS system:windows Affects only Windows labels Dec 23, 2014
@tkelman tkelman changed the title File path comparison on case-insensitive filesystems Export samefile Dec 23, 2014
@tkelman tkelman reopened this Dec 23, 2014
@tkelman
Copy link
Contributor Author

tkelman commented Dec 25, 2014

As mentioned here #9376 (comment) samefile should maybe try to do something slightly smarter for file paths that don't exist yet.

@tkelman tkelman changed the title Export samefile Export samefile, make it smarter for nonexistent inputs Dec 25, 2014
@nalimilan
Copy link
Member

Indeed. But contrary to what @staticfloat said there, I think path normalization is too brittle, as rules vary even depending on the filesystem the file is on. Python's samefile fails when one of the paths does not point to an existing file. I think that's what Julia should do, as it's the only way to completely guarantee that if you create the files, they won't point to the same inode in the end (thus possibly overwriting one another).

So the only reliable way to find out if two non-existent files are (would be) the same is to create one of them, and remove it afterwards. That may be a bit too many operations happening behind the scenes.

A smarter solution would be to fail only when none of the two files exist. If one of them exists, and stat fails on the other, it means they are not the same, so samefile can return false. Then people would just have to create one of the files before instead of after checking. I guess in most situations one of the files already exists anyway.

@tkelman tkelman added the help wanted Indicates that a maintainer wants help on an issue or pull request label Dec 25, 2014
@StefanKarpinski
Copy link
Member

I think that samefile should fail when one of the arguments doesn't exist. It's not called samepath – it's called samefile – the arguments must be actual files, not paths. If you want to check for two paths being the same, that's a different and much harder name canonicalization problem.

@nalimilan
Copy link
Member

I'm fine with that too. People can just create the files and then call samefile.

@StefanKarpinski
Copy link
Member

If one is willing to take it that far, that's the ultimate test if two paths are the same on a given file system.

@tkelman
Copy link
Contributor Author

tkelman commented Dec 26, 2014

Well the samepath operation for files (you could also want to do this comparison on directories or other related objects) that don't exist yet is also quite valuable, though harder. Erroring when neither input exists is probably a good idea. I'm not so sure about erroring when one input exists but the other doesn't - that sounds like it would be enough information to conclusively say no, the two inputs are not the same file (or the same path), but maybe there's some counterexample?

@staticfloat
Copy link
Member

Agreed with @tkelman on all counts. Also, I hadn't really thought through the problems with nonexistant files once things like symlinks are considered, it's entirely possible that /a/b/c/d and /a/x/d are the same files, and trying to infer that without files actually existing sounds like entirely too much work. Let's just error out when neither file exists and call it good. (The only benefit to calling abspath on nonexistant files would be that things that are obviously the same, even if they don't exist, would still return true, such as /a and /a, or /a/../b and /b)

@vtjnash
Copy link
Member

vtjnash commented Dec 29, 2014

on linux, /a/../b is not necessarily the same as /b. since symlinks are resolved from left to right, if a is a symlink, then this says to look up b in the parent directory of whatever a points to.

on windows, .. is resolved before symlinks (e.g. abspath is called before readlink)

@StefanKarpinski
Copy link
Member

That's a fair point that if one exists while the other doesn't it's safe to just return false.

@staticfloat
Copy link
Member

@vtjnash that case was only supposed to be considered when neither file exists.

@vtjnash
Copy link
Member

vtjnash commented Dec 29, 2014

i think Stefan made a good point that it isn't called samepath. if neither file exists, there isn't a whole lot you can meaningfully compare between the filenames. there are filesystem entities on linux (such as those typically found mounted at /proc and /dev) where I'm not even sure that opening the exact same filename would return the same file object.

@staticfloat
Copy link
Member

Yep, let's just leave it that if EITHER file doesn't exist, (including the case where BOTH don't exist) we return false.

@nalimilan
Copy link
Member

I'd prefer an exception to be raised. Else people may not realize that if they create both files, they may end up being the same, and that can create terrible security bugs.

@vtjnash
Copy link
Member

vtjnash commented Dec 30, 2014

samefile is never valid as a security check (effectively, it's only
advisory)

@tkelman
Copy link
Contributor Author

tkelman commented May 13, 2015

this is somewhat improved by fec7ef1 (now returns false if either input does not exist), but samefile should probably still be documented, advertised and exported

@nalimilan nalimilan added the good first issue Indicates a good issue for first-time contributors to Julia label May 14, 2015
@nalimilan
Copy link
Member

Looks like a good candidate for the "intro issue" label (it was already "up for grabs").

@vtjnash vtjnash added docs This change adds or pertains to documentation and removed good first issue Indicates a good issue for first-time contributors to Julia help wanted Indicates that a maintainer wants help on an issue or pull request labels Jul 21, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs This change adds or pertains to documentation io Involving the I/O subsystem: libuv, read, write, etc.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants