
[question] [enhancement] Canary tagging in Nomad 0.6.0 #2920

Closed
mafonso opened this issue Jul 27, 2017 · 13 comments
@mafonso

mafonso commented Jul 27, 2017

Canaries are now a first-class thing in the new update stanza, and while within Nomad it is possible to check which allocations are at which version, this does not propagate to the service registration in Consul.

How can the canaries be filtered apart in the Consul catalog? I was hoping they would carry a "canary" tag or something similar, but I could not find anything in the service catalog that allows me to filter out the canary.

If there is a recommended way of doing this, maybe it could be mentioned in the docs. If there is not, here is a suggestion:

Nomad could add a pre-defined "canary" tag to services in the canary state and remove it after promotion. Alternatively, that tag could be configured with a new canary_tag keyword in the update stanza.

This would help tremendously in marking the canary instances in the gateways, in order to adjust their weight in the pool or to only allow access from a non-public endpoint.
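Sketched in job-file form, the suggestion might look like this (canary_tag is purely hypothetical; no such parameter existed in Nomad 0.6.0 — this only spells out the proposed behaviour):

```hcl
job "webapp" {
  update {
    max_parallel = 1
    canary       = 1
    # Hypothetical parameter, not real Nomad syntax: the tag Nomad would
    # attach to canary allocations' Consul registrations and strip again
    # once the deployment is promoted.
    canary_tag = "canary"
  }
}
```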

@burdandrei
Contributor

burdandrei commented Aug 2, 2017

we're just adding a tag with v:${VERSION} to distinguish between them,
'cause your internal application version has much more information than the Nomad task version ;)
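As a concrete sketch of that workaround (service name and version are made up), with the version string rendered into the job file before submission:

```hcl
service {
  name = "myapp"
  port = "http"
  # "v:${VERSION}" is substituted by the CI/templating step that renders
  # the job file, e.g. producing "v:1.4.2" for this deploy.
  tags = ["v:1.4.2"]
}
```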

@mafonso
Author

mafonso commented Aug 2, 2017

We were already adding a version tag in our Nomad templates. But the use case I'm describing is being able to tell which instances are canaries regardless of the service version.

If you deploy a new Nomad job version using a canary update strategy but with the same service artefact (let's say you just changed job parameters, like resources, ports, etc.), a version tag will be ambiguous, and yet those changes can dictate the success or failure of the deployment. I agree this is a stretched corner case, but it's just for the sake of example.

Having a well-known tag, like canary, or a custom static one, during the deployment stage (per Nomad's definition: "the state between two job versions") would make it much easier to filter canaries from the catalog in a generic way. This tag would obviously be removed after the promote.

@bittrance

According to https://www.nomadproject.io/docs/operating-a-job/update-strategies/blue-green-and-canary-deployments.html the canary parameter is meant to support both canary deploys (inserting a single upgraded member into a cluster) and blue/green deployments (directing a select part of your load to a separate cluster). I don't see how to achieve the second through Consul, given that the canaries show up under any services declared by the tasks, effectively becoming indistinguishable members of the cluster.

Being able to point your staging frontend to a DNS name like canary.backend.service.consul exported by the Consul DNS interface would be really awesome, but it does not give us proper blue/green deployments (without writing scripts), because for that we would need a mechanism to instruct the baseline/production load balancer to direct traffic only to members that don't have the canary tag, something that e.g. Fabio or the Consul DNS interface does not support.

One idea would be to have a CanaryServiceName config within the service declaration that would be used during the canary transition period instead of the normal service name. You could also have an OnCanaries yes/no directive that decides whether canaries join this service; you could then set up canary and non-canary service pairs. Another idea would be a TagBlacklist directive in the update stanza which would make sure the canaries do not get a specific tag (e.g. production), so that we can use that tag to identify the baseline members.
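Spelled out as a sketch, with hypothetical parameter names taken from the proposals above (none of these exist in Nomad):

```hcl
# Proposal 1: register canaries under a different service name during
# the transition, reachable as canary-backend.service.consul.
service {
  name                = "backend"
  canary_service_name = "canary-backend" # hypothetical parameter
}

# Proposal 2: guarantee canaries never carry a given tag, so the
# production load balancer can select only baseline members.
update {
  canary        = 1
  tag_blacklist = ["production"] # hypothetical parameter
}
```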

@aantono

aantono commented Jan 25, 2018

For what it's worth, even a rudimentary ability to add a canary tag to Consul service registrations while they are in canary/blue-green deployment mode would go a LONG way. It then becomes possible to add prepared queries to Consul to include or exclude those services for various scenarios and have different DNS routes, or to do service lookups against the Consul API from proxies like Traefik, Fabio, Linkerd, etc.

Looking at the Nomad code, it also seems fairly straightforward to add the extra tag when the deployment starts and remove it upon promote.
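For example, assuming canaries carried a canary tag, a prepared query could exclude them from ordinary lookups (Consul's prepared-query API supports excluding a tag via a `!` prefix; the service and query names here are made up):

```json
{
  "Name": "myapp-stable",
  "Service": {
    "Service": "myapp",
    "Tags": ["!canary"]
  }
}
```

Registered via `POST /v1/query`, this would make `myapp-stable.query.consul` resolve only to non-canary instances.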

@shantanugadgil
Contributor

Just a thought: could canary deployments during an upgrade get a connection-draining feature at the service level rather than at the host level?

This would provide a safe manner of upgrade which is transparent to the user, and possibly help the user achieve the right thing.

@skyrocknroll

Hi @dadgar, given how straightforward it looks to add and remove the canary tag, can we add this feature in the 0.8 release? I can raise a pull request if it is OK to add this feature.

@dadgar
Contributor

dadgar commented Apr 5, 2018

Just to update folks, we will be tackling this in the near term. It will hopefully be in a 0.8.X release and, at the latest (worst case), in 0.9.

@hsmade
Contributor

hsmade commented Apr 17, 2018

While you're thinking about this, please also consider changing the alloc index. If I add one canary to my job now, it gets index 0, so there are two instances with index 0. This seriously screws up metrics. Another thought: since we're not replacing existing instances but adding new ones, the overall load per instance of the job drops (as there are more instances). If you are trying to do performance tuning, this is quite annoying.

@dadgar
Contributor

dadgar commented May 9, 2018

Fixed by #4200 and #4259. Will be part of 0.8.4

@vvitayau

vvitayau commented Oct 23, 2018

I've confirmed this is still an open issue for my hashi-ui.nomad job that "Requires Promotion".

Running

  • Consul v1.4.0-rc1 (1757fbc0a)
  • Nomad v0.8.6 (ab54ebc+CHANGES)
  • Traefik v1.7.3

Using

        canary_tags = [
          "traefik.enable=true",
          "traefik.frontend.rule=Host:canary-hashiui.localhost",
        ]

        tags = [
          "traefik.enable=true",
          "traefik.frontend.rule=Host:hashiui.localhost",
        ]

where my http://consul.localhost/ui/dc1/services/hashi-ui has all three allocations tagged with both traefik.frontend.rule values. IMHO only the running canary service should have the canary tags and nothing else; nor should the original two allocations have any new canary tag added.

Now http://canary-hashiui.localhost/ works, because Host:canary-hashiui.localhost is the final route rule defined for all three instances, and http://hashiui.localhost/ no longer works.

Latest Deployment
ID          = 2a2de0cb
Status      = running
Description = Deployment is running but requires promotion

Deployed
Task Group  Auto Revert  Promoted  Desired  Canaries  Placed  Healthy  Unhealthy
server      true         false     2        1         1       1        0

Allocations
ID        Node ID   Task Group  Version  Desired  Status    Created     Modified
5ce80ae4  c24de578  server      7        run      running   30s ago     11s ago
ad129abe  c24de578  server      6        stop     complete  2m46s ago   25s ago
f4898f13  c24de578  server      4        stop     complete  24m58s ago  18m42s ago
7c134bfb  c24de578  server      3        stop     complete  26m5s ago   24m53s ago
7e6a67c9  c24de578  server      5        run      running   2h18m ago   18m31s ago
e862eae4  c24de578  server      5        run      running   2h18m ago   18m31s ago

@vvitayau

vvitayau commented Oct 23, 2018

I would have reopened #3340, but it was closed since it was marked as a duplicate of this ticket.

Maybe there would have to be a canary_name as well as canary_tags; this way I could redefine the canary's service name to canary-hashi, and hashi-ui.service.consul would not be improperly altered.

@vvitayau

nvm ... ignore my comments, I figured it out.

      service {
        name = "canary-hashi-ui"
        port = "http"
        canary_tags = [
          "traefik.enable=true",
          "traefik.frontend.rule=Host:canary-hashiui.localhost",
        ]
        tags = []
      }

      service {
        name = "hashi-ui"
        port = "http"
        tags = [
          "traefik.enable=true",
          "traefik.frontend.rule=Host:hashiui.localhost",
        ]
      }

@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 27, 2022