Skip to content
This repository has been archived by the owner on Sep 9, 2020. It is now read-only.

Ignore dot-prefixed files and subdirs when scanning packages #551

Merged
merged 5 commits into from
May 11, 2017

Conversation

ascandella
Copy link
Contributor

@ascandella ascandella commented May 10, 2017

As with the rest of the toolchain, according to @davecheney, we should ignore _ and . prefixed files as well as all directories in fillPackages which assumes it's working on a single directory.

Fixes #550

fillPackage is supposed to use the AST parser on individual files.
Unforntunately, `filepath.Glob("*.go")` returns directories as well as
files.

Fixes golang#550
@ascandella
Copy link
Contributor Author

I recognize that an extra filestat is not the greatest for every file in the user's project. If there's somewhere higher level we can filter these out, I'm all ears.

@@ -203,6 +203,12 @@ func fillPackage(p *build.Package) error {
if filepath.Base(file)[0] == '_' {
continue
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any file that starts with ., _ or called testdata should be ignored when scanning source directories. Rather than this check, extend the check on line 203 to check for `== '.'.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good! Is this documented somewhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, that still leaves us open to the case where somebody names a directory "foo.go" inside their project. Perhaps we should handle that case separately, or at least give them a more helpful error message?

@ascandella ascandella changed the title Don't try to parse directories named "*.go" Ignore dot-prefixed files when scanning source May 11, 2017
@dcheney-atlassian
Copy link

dcheney-atlassian commented May 11, 2017

@sectioneight a package declaration cannot contain ., it's not a valid character for an identifier, but a package's directory entry can contain a dot.

deadwood(~/src) % cat foo.go/foo.go 
package foo

const A = 1
deadwood(~/src) % cat bar.go 
package main

import (
        "fmt"

        "foo.go"
)

func main() {
        fmt.Println(foo.A)
}

Copy link
Member

@sdboyer sdboyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, this is a definite improvement - thank you! 🎉

NOTE: Current implementation is still problematic if somebody has a subdir in their project that ends in ".go"

I'm not immediately seeing the problem. That function you're modifying is only responsible for deciding which files to examine; the closure inside ListPackages() is what determines which directories to visit. Seems to me the change you've made should exclude a dir ending in .go from being tokenized incorrectly, but won't affect traversal into the dir.

Could you provide details on the case you see as problematic?

@@ -1280,6 +1280,11 @@ func TestListPackages(t *testing.T) {
},
},
},
"skip directories starting with '.'": {
fileRoot: j("dotgodir"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's also add e.g. a foo.go dir to accompany the .go dir, so that we're testing both the dot-led and the dir-ending-in-.go case.

@sdboyer
Copy link
Member

sdboyer commented May 11, 2017

ah perfect, i crossposted my review with the fix you just made 😄

@ascandella
Copy link
Contributor Author

Yes, I recognize this is a bit of an edge case, but it's one I ran into internally when trying to build an internal project with dep. Basically, we have some non-go stuff in a directory, and that directory contains a folder called ".go". This probably doesn't affect 99.99% of users but at least this way it ignores the "bad directories" instead of error-ing out with a cryptic message.

continue
}

// Skip any directories that happened to get caught by glob

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

delete this code, its not needed now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it still is - in the event of e.g. a directory named foo.go (as the latest commit introduces). no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It still is. My comment about "NOTE: this doesn't handle directories" no longer applies, as demonstrated by the attached test case.

@davecheney
Copy link
Contributor

davecheney commented May 11, 2017 via email

@ascandella
Copy link
Contributor Author

@davecheney are you saying that you'd like to modify this PR such that if we encounter a directory ("foo.go") we recurse into it?

@dcheney-atlassian
Copy link

@sectioneight sort of. I'm saying that the go tool does not treat directories that end in .go specially.

deadwood(~/src) % mkdir -p foo.go/bar.go/quxx.go
deadwood(~/src) % echo "package quxx" >> foo.go/bar.go/quxx.go/wargh.go
deadwood(~/src) % go list foo.go/...
foo.go
foo.go/bar.go/quxx.go

@dcheney-atlassian
Copy link

To be sure, it's a dumb idea, but not forbidden by the tool.

@ascandella
Copy link
Contributor Author

Ok, sounds like fillPackage is in need of a refactor. Should I abandon this PR in favor of another approach? It seems we'll need to change the delineation of fillPackage and ListPackages

@sdboyer
Copy link
Member

sdboyer commented May 11, 2017

guys guys, i addressed this in my earlier review comments 😄

skipping a dir named foo.go in fillPackages() has no bearing on whether we later walk into foo.go.

@ascandella
Copy link
Contributor Author

Aha, excellent point :)

@sdboyer
Copy link
Member

sdboyer commented May 11, 2017

Though, given the discussion here, it probably would be worth adding some real .go files under the foo.go fixture dir to provide a check that we're respecting that.

@ascandella
Copy link
Contributor Author

Any concerns about the os.Stat calls for every source file? This is my first contribution to the project so I'm not aware of what the benchmarks/requirements are as far as introducing stuff that could "make dep slower"

@ascandella
Copy link
Contributor Author

@sdboyer sounds good, will do

@sdboyer
Copy link
Member

sdboyer commented May 11, 2017

This is my first contribution to the project

🎉 🎉 🎊 🕺 🎉 😄

Any concerns about the os.Stat calls for every source file?

Yes, it is a bit concerning; this is a hotpath, and we've already got #419 open to swap in a faster alternative to filepath.Walk() that gets a lot of its speedups from avoiding unnecessary stat calls.

However, it's fine for the purposes of this PR, as I think the best fix for it here may be, in #419, to have the replacement walker be responsible for injecting the results of a platform-sensitive, possibly-faster readDir() into the walk func. That would carry file vs. dir info, obviating the need for the Stat you've introduced.

So, I'm fine with it, as it brings us closer to existing tool behavior, and we have a concrete path to removing the performance regression.

/cc @jstemmer - fyi, since you're looking at #419, here's a bit of an evolved requirement 😄

@sdboyer
Copy link
Member

sdboyer commented May 11, 2017

just a license header, now

@ascandella ascandella changed the title Ignore dot-prefixed files when scanning source Ignore dot-prefixed files and subdirs when scanning packages May 11, 2017
@ascandella
Copy link
Contributor Author

Ok, I think we're all good now, performance considerations aside. Let me know if that's something I can help with -- I'm interested in moving my company to dep as soon as is reasonable :)

@sdboyer
Copy link
Member

sdboyer commented May 11, 2017

looks good, thank you! and yayyyy first contribution! 🎉

@sdboyer sdboyer merged commit ccd9993 into golang:master May 11, 2017
@sdboyer
Copy link
Member

sdboyer commented May 11, 2017

@sectioneight also, with any luck, it'll be reasonable to begin exploring moving your company to dep in earnest on Monday, once we stabilize the manifest and lock. 😄

re: things you could help with; TBH, i've been kicking the performance can down the road for a while. #67 lays out my basic mental roadmap for it, and why I've been vaguely OK with doing so (basically, there's a ton of headroom and a clear path to improvement, albeit with some now-mostly-cleared blockers). #289 is the meta issue for performance, and #431 is now the big ticket item there.

Not in the meta-issue, though, is establishing some broader benchmarks that might help us to identify performance regressions. If you're interested in that sort of thing, that'd be cool - open up an issue and we can brainstorm about what that might look like 😄

On a different front, I haven't forgotten about the use case you raised last year in Masterminds/glide#372. I'm still finding it to be a problem for the same basic reason - how arbitrary changes to source URLs affect generated file portability, and how we cope with that - though it's been discussed in somewhat greater detail in #174. At this point, I'm inclined to say that we might be best off having a toggle that basically causes the system to reach out to a go get metadata-like service the first time (per run) it encounters every URL in order to get a rewritten one. If you wanted to work on hammering something like that out, it'd also be super helpful 🎉

@ascandella
Copy link
Contributor Author

Ya, one of the reasons I wanted my team to get involved early is to make sure we can come up with an amenable "rewrite" / airgap solution that works for us as well as the community. I'm going to pick off a few smaller tasks and then will see if I can look into perf.

@ascandella ascandella deleted the ignore-directories-fillpackage branch May 11, 2017 15:03
@ascandella
Copy link
Contributor Author

Also big +1 on your idea, it would be trivial to stand up a metadata service internally and distribute a file to engineers computers in a well-known location that points to said server, where we can have control over rewrites rather than ending up in massive TOML files

@ascandella
Copy link
Contributor Author

FYI I opened golang/go#20337 for clarity on what the correct behavior should be for scanning directories. As far as I can tell this is the first tool (that I've used, at least) that ignores imports starting with "." or "_", and the documentation is unclear since it only talks about files. Depending on the response there we may want to rejigger the code so that it doesn't ignore dot-dirs:

// We do skip dot-dirs, though, because it's such a ubiquitous standard
// that they not be visited by normal commands, and because things get
// really weird if we don't.
if strings.HasPrefix(fi.Name(), ".") {
return filepath.SkipDir
}

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

gps.ListPackages fails when project contains a directory named ".go"
5 participants