Skip to content
This repository has been archived by the owner on Sep 9, 2020. It is now read-only.

Adding new dependency logging #194

Merged
merged 1 commit into from
Jan 31, 2017
Merged

Adding new dependency logging #194

merged 1 commit into from
Jan 31, 2017

Conversation

xmattstrongx
Copy link
Contributor

@xmattstrongx xmattstrongx commented Jan 29, 2017

@googlebot
Copy link
Collaborator

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed, please reply here (e.g. I signed it!) and we'll verify. Thanks.


  • If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check your existing CLA data and verify that your email is set on your git commits.
  • If you signed the CLA as a corporation, please let us know the company's name.

@sdboyer sdboyer self-requested a review January 29, 2017 19:15
@xmattstrongx
Copy link
Contributor Author

I signed it!

@googlebot
Copy link
Collaborator

CLAs look good, thanks!

}()

select {
case m := <-message:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure it's necessary to print results from the main goroutine. Doing so ends up serializing all the network calls, because the loop can't move on to the next local dep to scan until the syncSourceFor call completes.

Maybe instead, make a closure at the beginning of getProjectData() that we can invoke in a goroutine every time we encounter a new project root.

Of course, we'll need to take some care there to deal with printing from multiple goroutines - if nothing else, it might be better to print only either before or after SyncSourceFor completes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok doing so has increased speed drastically. Though the way Ive written it not sure it needs to be a closure. Im still analyzing the code to make sure the new goroutine to call the closure always has a chance to complete.

go func() {
goStart := time.Now()
defer fmt.Fprintf(os.Stderr, "Go Duration: %s\n", time.Since(goStart))
err2 := sm.SyncSourceFor(gps.ProjectIdentifier{ProjectRoot: pr})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As noted in slack, I probably need to add an sm method that more accurately reports the existence of the repo - does it exist upstream? what about on disk?

@@ -265,7 +275,10 @@ func getProjectData(ctx *dep.Ctx, pkgT gps.PackageTree, cpr string, sm *gps.Sour
}

continue
} else {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

continue means no need for an else branch

@@ -237,6 +238,15 @@ func getProjectData(ctx *dep.Ctx, pkgT gps.PackageTree, cpr string, sm *gps.Sour
packages := make(map[string]bool)
notondisk := make(map[gps.ProjectRoot]bool)
ondisk := make(map[gps.ProjectRoot]gps.Version)

printNewDeP := func(pr gps.ProjectRoot, sm *gps.SourceMgr) {
message := "Cloning"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be better as "Cloned", past tense, because this will only print after the sm is done pulling the dep down.

Copy link
Member

@sdboyer sdboyer Jan 30, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, might be better to switch out the word entirely - "cloning" applies to git, but git isn't all we're dealing with here. Maybe "Fetched latest of"? I think that's generic enough. It also glosses over whether, in git terms, it's actually doing a clone, or just ensuring our local clone is already up to date.

@sdboyer
Copy link
Member

sdboyer commented Jan 30, 2017

Ok doing so has increased speed drastically. Though the way Ive written it not sure it needs to be a closure.

My reasoning behind making it a closure was reuse - there are a couple more places from which it needs to be called. Right now, only the part of the code dealing with direct dependencies is instrumented. Look down in the dft closure - that's the depth-first explorer which transitively visits all dependencies. Everything discovered through there needs to be treated, as well.

Im still analyzing the code to make sure the new goroutine to call the closure always has a chance to complete.

They're all guaranteed to complete and print, but not by anything directly visible in the code here. All of those SyncSourceFor() calls are blocked on actions that are necessarily also called elsewhere (either in Solve(), or in a blocking manner by the txn writer); they'll complete and print as soon as the other calls do.

This approach doesn't guarantee we'll see everything printed - for example, transitive deps of deps that aren't in GOPATH won't ever get visited, so a printing goroutine will never be run for them. But I'm not even sure that's really solvable, so this is still a substantial improvement over what we have today.

EDIT: note that i'm not saying it's a good practice to just rely on the underlying implementation for completion - just that it will happen. We could certainly add a mechanism to ensure these goroutines complete in the code here - just so long as it doesn't cause us to lose concurrency.

@@ -237,6 +238,15 @@ func getProjectData(ctx *dep.Ctx, pkgT gps.PackageTree, cpr string, sm *gps.Sour
packages := make(map[string]bool)
notondisk := make(map[gps.ProjectRoot]bool)
ondisk := make(map[gps.ProjectRoot]gps.Version)

printNewDeP := func(pr gps.ProjectRoot, sm *gps.SourceMgr) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: s/DeP/Dep/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch

@@ -266,6 +276,8 @@ func getProjectData(ctx *dep.Ctx, pkgT gps.PackageTree, cpr string, sm *gps.Sour

continue
}
go printNewDep(pr, sm)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why no waiting finish of this goroutine?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -363,6 +375,7 @@ func getProjectData(ctx *dep.Ctx, pkgT gps.PackageTree, cpr string, sm *gps.Sour
}
} else {
dependencies[pr] = []string{pkg}
go printNewDep(pr, sm)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@xmattstrongx
Copy link
Contributor Author

@sdboyer Is there anything else needed here for this issue to close?

@sdboyer
Copy link
Member

sdboyer commented Jan 30, 2017

@xmattstrongx i think some feedback from @peterbourgon would be useful, as it was his note that motivated putting this up in the first place. i'm adding him as a reviewer.

Copy link
Contributor

@peterbourgon peterbourgon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that it prints something for each dep. I'm confused by the "Fetched latest of dep" verbiage, both of those words signal something specific to me, and it's not what's actually going on. Init is just cloning each identified dep to GOPATH/pkg/dep/sources, right? What about just "Fetched dep" or "Cached dep"?

Also, I don't understand why each printNewDep call is invoked in a new goroutine rather than called synchronously, and the explanation you linked to @mattn doesn't help me understand. Can you explain?

@xmattstrongx
Copy link
Contributor Author

xmattstrongx commented Jan 31, 2017

@peterbourgon We decided to make each call happen in a new goroutine because the underlying call to gps.SourceMgr.SyncSourceFor was eating up about 1.5-1.8 seconds per dep.

As far as the output verbage @sdboyer and I went back and forth a couple times on what it should display. I do like the Cached dep

@sdboyer
Copy link
Member

sdboyer commented Jan 31, 2017

ditto what @xmattstrongx said re: separate goroutines.

What about just "Fetched dep" or "Cached dep"?

Ooh yeah, I liked "Cached dep" even more. I was trying to come up with something to avoid saying "cloned," as that's git/hg-specific, and also isn't the action it's performing if the dep is already present in the cache. So, definitely, this is better.

@peterbourgon
Copy link
Contributor

@xmattstrongx Thanks for the explanation. Is SyncSourceFor the bit that's actually doing the work of init?

@xmattstrongx
Copy link
Contributor Author

per a previous slack conversation with @sdboyer

this'll (SyncSourceFor) all be reused by the solver or exporter when it comes time to do their job, so there's no duplication of work happening

@peterbourgon
Copy link
Contributor

peterbourgon commented Jan 31, 2017

I'm sorry, that doesn't seem to answer my question. Is SyncSourceFor the bit that's actually caching the relevant dep?

@sdboyer
Copy link
Member

sdboyer commented Jan 31, 2017

@peterbourgon I guess that depends on what "the work of init" means? I'm not sure what you're asking by that.

SyncSourceFor just brings the local cached instance of a source up to date (or clones/checks it out initially). We're choosing to target it as the operation to report because it's intuitive and also the one that tends to be expensive, and so is the most likely to give user incremental and comprehensible feedback.

The same SyncSourceFor (or calls that have an equivalent result) are performed implicitly by both solving and exporting the dep tree to vendor. We're just promoting that action to happen out here, so that we can effectively signal when it's complete. We have to pull it out to here because, when it happens implicitly, there's no signaling mechanism to let us know when the op is complete.

@peterbourgon
Copy link
Contributor

peterbourgon commented Jan 31, 2017

OK. Calling the closure printNewDep belies its importance; it's doing more than just printing! I suggest syncNewDep or syncDep or something like this.

@xmattstrongx
Copy link
Contributor Author

xmattstrongx commented Jan 31, 2017

Was violating single responsibility. The new name makes sense to me. Gonna make the change to the function name to syncDep and output to read

Cached dep

@sdboyer
Copy link
Member

sdboyer commented Jan 31, 2017

OK, I think we're good to merge this now. Thanks!

@sdboyer sdboyer merged commit cd7b109 into golang:master Jan 31, 2017
krisnova pushed a commit to krisnova/dep that referenced this pull request Apr 21, 2017
The current rules for github usernames only allow for alphanumeric
characters or single hyphens and they cannot begin or end with a hyphen.
In the past however github username rules were less strict and we need
to support these (issue golang#194).

Using the Google BigQuery public github dataset, I've checked the
usernames of all public commits against the currently defined regex in
deduce.go. From these results I've concluded that usernames with
multiple consecutive hyphens and usernames that end with a hyphen were
allowed in the past and do exist. Fortunately these are the only
exceptions I've found, there were no usernames that started with a
hyphen or contained any other special characters.

In addition, this change now also allows one-letter usernames.
@sdboyer sdboyer mentioned this pull request Sep 6, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants