os: add ReadDir method for lightweight directory reading #41467
Comments
I like this a lot (it's similar to Python's
|
This looks great. I felt that there may be some advantage to making One comment from #41188 regarding this suggestion was that It might be good to rename From my point of view, I think the I have a number of tools that will benefit significantly from this proposal. |
@benhoyt Good point regarding IsDir and symlinks. I think it needs to return false. Following a symlink will necessarily require another Stat to determine whether the target is a directory. This is probably worth documenting. |
Regardless of which way this ends up working (Lstat vs Stat), it should be mentioned explicitly in the comments/docs for the method, as what is "obvious" to one person will not be to another. |
I personally don't see a point in this. Implementations could simply just return an internal |
@benhoyt, added answers above but (1) I agree with you; (2) ReadEntries is the wrong long-term name, and in the long term Readdir will just be deprecated (but not removed) and fade away; (3) yes, Lstat; (4) yes, false. |
Russ, glad to see a new API proposed, thanks. Some Q's This entails a per-item Does Could we flush the Wouldn't the different behavior of There is no way to get the Windows fileId (analogous to inode) via Unix dirents include the inode -- needed for tree replication. Could For the latter two, I think you'd return a EDIT: Background... I'm building a desktop app for Windows, MacOS, and Linux -- and a server app for Linux -- that make heavy use of the filesystem. Go has thrown me a fair number of curve balls, and I expect more :-/ |
As far as I know, no, this cannot be optional. Perhaps it could be if a more optimized
If we're going with my idea of returning an internal In my honest opinion, this behavior is fine. In fact, it would be preferable if
Worst case scenario, the |
Any caller needing to guarantee a fresh stat should call
I've wanted to retrieve unique file identifiers for both Linux (inode) and Windows (fileId) to improve performance for several applications, so I'd appreciate a solution too. However, I think |
I agree with what @diamondburned and @mpx said but just to reply directly as well.
In practice, when is that type absent? I'm not too worried if, say, VFAT file systems are slower to access.
The doc says the Info is either from the time of the ReadDir or the time of the Info call. So a stat during ReadDir can be cached, but not a stat during an earlier Info.
Adding cache state & manipulation to the API is needlessly complex. Lstat still exists.
There is no magic wand here. Unix and Windows are different. They will always be different. The path separator alone causes trouble in cross-platform apps. The answer is to use APIs like filepath.Clean and filepath.Join as appropriate. If you don't do that, you have trouble. It's not our job to make trouble impossible, only avoidable. The docs on ReadDir are very clear about what you can rely on and what you cannot. If you rely on more than that, again, you will have trouble, same as hard-coding use of / or \ in your program.
Pretty much all apps should probably move to calling ReadDir or, if it matters, Lstat/Stat directly. What's confusing is having two different APIs, but we clearly need a new one, and can't remove the old one. So be it.
[Note that Go would spell it FileID not FileId (the file does not have a mind).] The concept seems too special-purpose for a general interface, and a bit difficult to use correctly. Also, if you are writing "tree replication", then you are already stat'ing all the files to get the other metadata, and you're already writing very OS-specific code to preserve all the OS-specific attributes. That same code can easily grab the info you need as far as inode number and file system identifier. It's not our job to union together all the possible APIs on all the possible systems. Our job is to find a simple API that is enough for the vast majority of Go programs, with an option for the rest to get at what they need (FileInfo.Sys in this case). |
What about If we expect all users of The downside of |
Ian, wouldn't the error you're suggesting be returned by Or are you suggesting that |
I'm suggesting that |
Using the same name with different returns would mean a single type couldn't be both a DirEntry and a FileInfo |
I don't think this should be the case, which is why I said "that should stay there." I think I also think that |
[ reposted after edits for clarity ] It will cause confusion and bugs if You could attain either best performance or cross-filesystem consistency if you could ask
Let's not sacrifice sanity for the sake of simplicity.
Actually, you don't need to
To clarify, there is no way to get the Windows fileID from BTW Windows accepts |
Sorry, but that's the design constraint here: it must be possible to either return info learned during ReadDir or info learned during Info. Otherwise you are overfitting to either Unix or non-Unix and penalizing the other. |
Thanks for pointing this out. That's unfortunate but good to know. |
I think the argument is about the second call to Info, not the first: de.Info() // maybe cached depending on os/fs
de.Info() // always cached in all cases for this and all subsequent calls |
The proposed We can fix that with a simple adjustment to the API:
The second . |
A new enum argument is too verbose for something that simple, in my opinion. Furthermore, needing both Why should I would also like to point out again that having |
After There is no caching by Note: the proposed EDIT: |
Now that I think about it, I think trying to guarantee a consistent behavior with this API is almost moot. There isn't a global file system lock to be acquired, so changes could still happen while |
I amended my previous comment to clarify. |
Allowing/requiring a
We should not require complex caching/reuse logic to be implemented for How common is Readdirnames in performance code? I suspect there will be less overall performance impact since most platforms provide enough details to support walking a tree within their |
I don't think This is a bit on a tangent, but is Go 2 allowed to break existing code? |
This would not be a Go 2 issue as such; it would be an os/v2 issue, so code that continues to use os would be unchanged (or swap os with io/ioutil if you like). But an os/v2 (or io/ioutil/v2) package is unplanned and unlikely. |
I wouldn't expect that it improve This proposal is justified by the fact that 90% of code doesn't use more than name and isDir, but 75% of code is using
See |
You mean, |
If this proposal is accepted we can worry about ioutil next. The most likely answer is to put the helper |
No change in consensus, so accepted. Update: this is what was accepted:
|
@rsc I don't know what you normally do for proposals like this, but is it worth updating the description at the top with the final proposal (including |
@benhoyt, I added a note and link to the top comment. Thanks. |
Change https://golang.org/cl/285592 mentions this issue: |
For #40700 For #41467 For #41190 Change-Id: Id94e7511c98c38a22b1f9a55af6e200c9df07fd3 Reviewed-on: https://go-review.googlesource.com/c/go/+/285592 Trust: Ian Lance Taylor <iant@golang.org> Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
os.File provides two ways to read a directory: Readdirnames returns a list of the names of the directory entries, and Readdir returns the names along with stat information.
On Plan 9 and Windows, Readdir can be implemented with only a directory read - the directory read operation provides the full stat information.
But many Go users use Unix systems.
On most Unix systems, the directory read does not provide full stat information. So the implementation of Readdir reads the names from the directory and then calls Lstat for each file. This is fairly expensive.
Much of the time, such as in the implementation of file system walking, the only information the caller of Readdir really needs is the name and whether the name denotes a directory. On most Unix systems, that single bit of information—is this name a directory?—is available from the plain directory read, without an additional stat. If the caller is only using that bit, the extra Lstat calls are unnecessary and slow. (Goimports, for example, has its own directory walker to avoid this cost.) In fact, a survey of existing Go code found that only about 10% of uses of ReadDir actually need more than names and is-directory bits.
It appears that a third way to read directories should be added, to let all this code be written more efficiently. Expanding on a suggestion by @mpx, I propose to add:
The FS proposal would then adopt this ReadDir and ignore Readdir entirely.
In #41188 I wrote:
I still believe that, but the survey convinced me that nearly all existing Readdir uses fall into this category, so it's not quite so bad to provide an optimized path for Unix systems. The DirEntry.Info method specification above allows both the eager info loading of Plan 9/Windows and the lazy loading needed on Unix. In contrast to #41188, the laziness is explicitly allowed from the beginning, and failures of the lazy loading can be reported in the error result.
Thoughts?
Update: A few clarifications to common questions:
Update 2: A few changes were made along the way to acceptance. See #41467 (comment) for the final version.