Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed plan diffing to handle non-unique service names. #10965

Merged
merged 6 commits into from
Oct 12, 2021

Conversation

apollo13
Copy link
Contributor

@apollo13 apollo13 commented Jul 29, 2021

This is a draft for #10964 -- What it does not handle now anymore is a change in port label. Still brainstorming on how to handle this properly :)

Edit: Fixes #10964 -- trying to get the issue ui notice it as linked PR

@apollo13
Copy link
Contributor Author

@shoenig If you have the capacity I'd love to get that into 1.1.3

@apollo13 apollo13 changed the title First draft at better service diffs. Fixed plan diffing to handle non-unique service names. Jul 29, 2021
@shoenig
Copy link
Member

shoenig commented Jul 30, 2021

I think really the correct thing to do here would be to leverage .Hash as they key for Service. It'll take a little plumbing because Hash takes in paramters for allocID, taskName, and canary. You can see from other calls of Hash what we feed in for group level services. ... Or actually I think we can get away with passing in zero values for allocID and taskName, because in diff.go we aren't concerned with services outside of the scope of the group or task we are diffing.

LMK if that doesn't quite make sense 😅

@apollo13
Copy link
Contributor Author

apollo13 commented Jul 31, 2021 via email

@apollo13
Copy link
Contributor Author

@shoenig Not trying to be annoying, but any chance you could give this another review?

@lgfa29
Copy link
Contributor

lgfa29 commented Sep 8, 2021

Hi @apollo13, sorry for the delay on getting this review.

This was a nice catch, but the more I thought about the more tricky I realized it was 😅

The heart of the problem is that services don't have a way to be uniquely identified. Every field can be changed, so it's impossible to keep track of which service is which when computing a diff.

Users usually have an intuition of this because they have this implicit notion of what is being modified as they edit the file, but when looking at just a before and after snapshot it's impossible to tell what modification were actually made. There is a similar problem in the frontend world, where the library needs to keep track of objects in a page. Unless the developer tells the library which object is which, it's impossible to figure out exactly. Here is a nice visual explanation: https://twitter.com/dan_abramov/status/1415279090446204929.

Your fix was great, but I wondered if it would be possible to get even better diffs if we considered even another facet of services: their relative position within the services list.

So I came up with #11148. I played around with it a bit, and also wrote some extra tests (I also moved your test to a new function because wow that TestTaskGroupDiff is long 😄) and it seems to work. I also think the final code is simpler and easier to understand, but I would love to hear your thoughts.

If that code works, we can cherry-pick it here and have your PR (finally) merged 🙂

@apollo13
Copy link
Contributor Author

apollo13 commented Sep 8, 2021

Hi @apollo13, sorry for the delay on getting this review.

Hi @lgfa29 -- no worries, thanks for getting to it.

This was a nice catch, but the more I thought about the more tricky I realized it was 😅

Oh yes, now imagine me when I ran into this rabbit hole :)

The heart of the problem is that services don't have a way to be uniquely identified. Every field can be changed, so it's impossible to keep track of which service is which when computing a diff.

That is a perfect summary, I couldn't have put it any better. Given that I think we should a) find a simple solution and b) focus on common cases. I think users can accept a sub optimal diff output in edge cases as long as it is at least technically correct (which the current one is not -- and I wish I knew that when I started using nomad since I always though I could just have one service with a name).

Your fix was great, but I wondered if it would be possible to get even better diffs if we considered even another facet of services: their relative position within the services list.

Talking about user intuition that is certainly a good idea.

If that code works, we can cherry-pick it here and have your PR (finally) merged 🙂

Well you do have the tests to show that your approach works. Where it falls down though is if I change the following:

diff --git a/nomad/structs/diff_test.go b/nomad/structs/diff_test.go
index c1c04af76..3ff0af206 100644
--- a/nomad/structs/diff_test.go
+++ b/nomad/structs/diff_test.go
@@ -7370,11 +7370,11 @@ func TestServicesDiff(t *testing.T) {
                        New: []*Service{
                                {
                                        Name:      "webapp",
-                                       PortLabel: "http",
+                                       PortLabel: "https",
                                },
                                {
                                        Name:      "webapp",
-                                       PortLabel: "https",
+                                       PortLabel: "http",
                                },
                        },
                        Expected: []*ObjectDiff{

(That is when adding a new service with the same name before the existing one -- may be because the user sorts port labels alphabetically or whatever).

Thinking about user intuition I am actually not sure what the best approach would be. Yours is probably as reasonable as mine and mine uses "service + port label" as rather strong identifier that the service did not change (At least that is what I was trying to put in code). I'd argue that from a user perspective one could consider service name + port label as kind of primary key and tags as metadata (at least that is how I saw it).

Thinking about it all I like my approach better wrt to the outcome and yours better with regard to simplicity. I have one idea which could make yours even better: Make the scoring actually a score. With that I mean that we give the position match a score of one and name and port label each a score of 2 and we accept the index with the highest score. I'll try to put that into code -- what do you think?

@apollo13
Copy link
Contributor Author

apollo13 commented Sep 8, 2021

I came up with the following diff for your PR:

diff --git a/nomad/structs/diff.go b/nomad/structs/diff.go
index 72ccadb0b..e9f9fd453 100644
--- a/nomad/structs/diff.go
+++ b/nomad/structs/diff.go
@@ -707,6 +707,9 @@ func serviceDiffs(old, new []*Service, contextual bool) []*ObjectDiff {
 // findServiceMatch returns the index of the service in the input services list
 // that matches the provided input service.
 func findServiceMatch(service *Service, serviceIndex int, services []*Service, matches []int) int {
+	oldScore := -1
+	newIndex := -1
+
 	for i, s := range services {
 		// Skip service if it's already matched.
 		if matches[i] >= 0 {
@@ -739,27 +742,28 @@ func findServiceMatch(service *Service, serviceIndex int, services []*Service, m
 		// None of these values are enough on their own, but they are also too
 		// strong when considered all together.
 		//
-		// So we consider two services to match if at least 2 of these 3 values
-		// are equal.
+		// So we try to score services by their main candidates with a slight preference
+		// towards name + portlabel over index position.
 		score := 0
 		if i == serviceIndex {
 			score += 1
 		}
 
 		if service.Name == s.Name {
-			score += 1
+			score += 2
 		}
 
 		if service.PortLabel == s.PortLabel {
-			score += 1
+			score += 2
 		}
 
-		if score >= 2 {
-			return i
+		if score > 0 && score > oldScore {
+			oldScore = score
+			newIndex = i
 		}
 	}
 
-	return -1
+	return newIndex
 }
 
 // serviceCheckDiff returns the diff of two service check objects. If contextual
diff --git a/nomad/structs/diff_test.go b/nomad/structs/diff_test.go
index c1c04af76..ad7d4ca6c 100644
--- a/nomad/structs/diff_test.go
+++ b/nomad/structs/diff_test.go
@@ -7370,11 +7370,11 @@ func TestServicesDiff(t *testing.T) {
 			New: []*Service{
 				{
 					Name:      "webapp",
-					PortLabel: "http",
+					PortLabel: "https",
 				},
 				{
 					Name:      "webapp",
-					PortLabel: "https",
+					PortLabel: "http",
 				},
 			},
 			Expected: []*ObjectDiff{

What do you think?

@apollo13
Copy link
Contributor Author

apollo13 commented Sep 8, 2021

Oh, with your changes we have some failures in TestTaskDiff. I'll see if we need to adjust the tests there or if we did fundamentally break it :d

@apollo13
Copy link
Contributor Author

apollo13 commented Sep 8, 2021

Okay, so I have fixed up the failures in the existing tests and pushed your (and my new changes) into this branch. WDYT?

@lgfa29
Copy link
Contributor

lgfa29 commented Sep 8, 2021

Looks good! Thanks for the extra testing and code.

I just pushed a few minor improvements to try and make some of the values in findServiceMatch more clear. I also noticed that I accidentally removed the line that sorted diffs before returning and that the last for loop was not actually required, since we were already looping over the old services list.

@lgfa29 lgfa29 requested a review from jrasell September 8, 2021 19:03
@apollo13
Copy link
Contributor Author

apollo13 commented Sep 8, 2021

Thank you for the cleanups, I've left an inline comment regarding the testfailure. Sorting of the diffs also make sense and is better than my naive reordering to fix the tests (I'll blame my unfamiliarity with the nomad codebase).

Copy link
Contributor

@lgfa29 lgfa29 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I going to go ahead and merge this as we have tested that this results in a better output than the current code.

Thank you @apollo13 🎉

@lgfa29 lgfa29 merged commit 75cd30c into hashicorp:main Oct 12, 2021
@github-actions
Copy link

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 14, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Nomad plan confused by multiple services with the same name.
3 participants