Add latest Tag to Docker Release #10498

Open
wants to merge 5 commits into
base: master

Conversation

smashedr
Contributor

@smashedr smashedr commented Sep 4, 2024

One other change I would like to see is adding a latest tag to the Docker image that always points to the most recently published version.

This will allow easily updating self-hosted deployments without having to make code changes every time.

I also updated how the date output is set, replacing the soon-to-be-deprecated set-output command with the new format. Reference: https://github.blog/changelog/2022-10-11-github-actions-deprecating-save-state-and-set-output-commands/
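
For readers unfamiliar with the change referenced above: the deprecated form printed a `::set-output name=...::value` workflow command to stdout, while the replacement appends a `name=value` line to the file whose path GitHub Actions exposes in the `GITHUB_OUTPUT` environment variable. The workflow step itself is not shown in this thread and is most likely a shell one-liner; the sketch below only illustrates the mechanism in Python. The helper name `set_step_output` is hypothetical, and the output name `date` is taken from `steps.date.outputs.date` in the diff.

```python
import os


def set_step_output(name: str, value: str) -> None:
    """Append a `name=value` line to the file named by GITHUB_OUTPUT.

    Hypothetical helper for illustration; it assumes it runs inside a
    GitHub Actions step, where the runner sets GITHUB_OUTPUT.
    """
    with open(os.environ["GITHUB_OUTPUT"], "a", encoding="utf-8") as fh:
        fh.write(f"{name}={value}\n")


# Inside a step with `id: date`, this would make the value available
# to later steps as `steps.date.outputs.date`:
# set_step_output("date", "2024-09-01")
```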

Contributor

github-actions bot commented Sep 4, 2024

Messages
📖 ✨ Thanks for your contribution to Shields, @smashedr!
📖

Thanks for contributing to our documentation. We ❤️ our documentarians!

Generated by 🚫 dangerJS against 42ae39a

@@ -68,6 +68,6 @@ jobs:
context: .
push: true
platforms: linux/amd64,linux/arm64
- tags: ghcr.io/badges/shields:server-${{ steps.date.outputs.date }}
+ tags: ghcr.io/badges/shields:server-${{ steps.date.outputs.date }},shieldsio/shields:latest
Member


Suggested change
- tags: ghcr.io/badges/shields:server-${{ steps.date.outputs.date }},shieldsio/shields:latest
+ tags: ghcr.io/badges/shields:server-${{ steps.date.outputs.date }},ghcr.io/badges/shields:latest

@chris48s
Member

chris48s commented Sep 4, 2024

I think having a rolling tag that points at the latest snapshot makes sense.

I reckon using latest for that is also probably sensible, but I wonder if you have a view on that @calebcartwright ? Currently we don't have a latest tag at all and next is pushed every time we commit to master.

Note: Before we merge this, update https://github.com/badges/shields/blob/master/doc/self-hosting.md#docker again

@chris48s chris48s added the operations (Hosting, monitoring, and reliability for the production badge servers) and self-hosting (Discussion, problems, features, and documentation related to self-hosting Shields) labels Sep 4, 2024
@calebcartwright
Member

> I think having a rolling tag that points at the latest snapshot makes sense.
>
> I reckon using latest for that is also probably sensible, but I wonder if you have a view on that @calebcartwright ? Currently we don't have a latest tag at all and next is pushed every time we commit to master.
>
> Note: Before we merge this, update https://github.com/badges/shields/blob/master/doc/self-hosting.md#docker again

as far as i know, the latest tag is still widely viewed as bad practice and it's one i'm not fond of as i've had teams bitten by it (albeit outside the Shields ecosystem).

we made the decision to not include it intentionally when we started producing our own, and i'm not keen on reversing that decision, at least not absent exceptionally compelling reasons

@chris48s
Member

chris48s commented Sep 5, 2024

To be clear:

Are you saying you think we shouldn't use the name latest (because it is the default if no tag is specified) and we should call a rolling tag pointing to the most recent snapshot something else (so it is more descriptive and users have to explicitly opt in to following that)?

Or are you completely against having a rolling tag pointing to the latest snapshot with any name?

@calebcartwright
Member

TL;DR

Yes I'm adamantly opposed to adding any latest tag

Yes I'm also generally opposed to adding any new rolling tag (though I'm not pushing for us to drop the existing rolling next tag)

I think our starting position needs to be opposition to adding any new tags, with the paradigm for any new proposals to change that position needing to sufficiently make the case as to why we should make an exception to that, as opposed to starting from the position of having to defend why we don't want a new tag


If we're going to entertain this then I'd really like to get a better understanding of the problem we're trying to solve. To expand on my prior comment with some receipts, deploying images using the latest tag is a genuinely bad practice

https://kubernetes.io/docs/concepts/containers/images/#image-pull-policy

> Note:
>
> You should avoid using the :latest tag when deploying containers in production as it is harder to track which version of the image is running and more difficult to roll back properly.
>
> Instead, specify a meaningful tag such as v1.42.0 and/or a digest.

That's a short and sweet summary, but the underlying reasons/blast radius from deploying rolling tags (latest being especially egregious) are a lot more extensive and I can expand on it with a few very real examples if needed. Suffice to say there's good reasons why this gets called out explicitly in places like the official kubernetes docs.

The docker images we produce are for the Shields server, so it's something that gets deployed as opposed to being used as a base image (where rolling tags are less problematic) by developers using them to construct their own deployable images.

Accordingly, or at least from my perspective, if as a project we start producing a latest tag the side effects could very realistically include:

  • we're now promoting/supporting a known anti-pattern
  • there's more tags for consumers to have to process/consider
  • there's an increased chance of our maintainer team having to spend time helping users troubleshoot issues that stem from the flaws/challenges with the latest tag

Conversely, I don't see what substantive gains would be achieved from doing this

shields.io needs to deploy basically every update we get, but i believe it's still the case that level of cadence would be extremely rare for self-hosters, many (perhaps most if not all?) of whom can just deploy and forget about it as their surface of badge consumption is much narrower than the entire corpus of shields.io users (e.g. how many self-hosting users absolutely urgently need #10489, #10497, etc.)

i run a self-hosted shields instance within a highly regulated corporate environment that has resultant policies which set a relatively low ceiling on the age of running images which in turn results in me having to redeploy more frequently than needed from a shields-updates/features perspective. i'd posit my own self-hosted deployment cadence is much more frequent than the overwhelming majority of self-hosters.

i have no issues tweaking the variable i use to grab the new monthly tag. it's not something i find burdensome whatsoever, i wouldn't switch to any rolling tag to spare myself that highly marginal effort i go through every < 45 days, and if i was adequately motivated i'm sure i could script that away.
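
As a rough idea of what "scripting that away" could look like, here is a minimal sketch that picks the newest monthly tag from a list of tag names. It assumes tags follow the `server-YYYY-MM-DD` shape seen elsewhere in this thread (e.g. `server-2024-06-01`); fetching the tag list from the registry is left out, and the function name `newest_server_tag` is hypothetical.

```python
import re
from datetime import date

# Assumed tag shape, based on the examples in this thread.
SERVER_TAG = re.compile(r"^server-(\d{4})-(\d{2})-(\d{2})$")


def newest_server_tag(tags):
    """Return the most recent server-YYYY-MM-DD tag, or None if there is none."""
    dated = []
    for tag in tags:
        match = SERVER_TAG.match(tag)
        if not match:
            continue  # skip rolling tags such as "next"
        try:
            released = date(*(int(part) for part in match.groups()))
        except ValueError:
            continue  # skip impossible dates such as server-2024-13-40
        dated.append((released, tag))
    return max(dated)[1] if dated else None
```

A deploy script could write the returned tag into whatever variable pins the image, which keeps the deployment on an explicit tag while removing the manual lookup.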

we provide the next tag, at least in part, so that users who still really want a rolling tag anyway can do so, but as i recall we opted to intentionally name it something different than latest so that they also have to be very explicit on the pull side to do so, functionally opting in to the risks of deploying rolling tags (whereas latest would be implicit due to client behavior)
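
The "implicit due to client behavior" point can be illustrated with a small sketch: Docker clients treat an image reference without a tag or digest as `:latest`, so nobody can follow `next` by accident, but anyone can follow `latest` by omission. The parsing below is deliberately simplified and the function names are hypothetical; it is not a full implementation of the image reference grammar.

```python
def split_reference(ref):
    """Split an image reference into (name, tag, digest); tag/digest may be None."""
    name, _, digest = ref.partition("@")
    # A colon only marks a tag if it appears after the last "/",
    # otherwise it belongs to a registry port such as localhost:5000.
    last_segment = name.rsplit("/", 1)[-1]
    if ":" in last_segment:
        name, tag = name.rsplit(":", 1)
    else:
        tag = None
    return name, tag, digest or None


def effective_tag(ref):
    """Return the tag a client would use, applying the implicit 'latest' default."""
    _, tag, _ = split_reference(ref)
    return tag if tag is not None else "latest"


def follows_rolling_tag(ref, rolling=frozenset({"latest", "next"})):
    """True if the reference is not pinned by digest and resolves to a rolling tag."""
    _, _, digest = split_reference(ref)
    return digest is None and effective_tag(ref) in rolling
```

With this, `ghcr.io/badges/shields` and `ghcr.io/badges/shields:next` both count as rolling, while a `server-YYYY-MM-DD` tag or a digest-pinned reference does not.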

finally, our monthly server tags aren't anything special relative to next other than being more nicely bundled up with a changelog for more human friendly consumption. they don't undergo any different levels of testing nor rigor, it's just a "hey if you're a self-hoster considering an upgrade, here's a more easily digestible view of a time-based batch of commits".

As such, if there's a self-hosting use case that wants to completely automate the process of upgrading their self-hosted instance by removing the human element of specifying a calver tag, why can't they use the existing, rolling next tag? what value are they gaining by having a rolling latest or rolling foobar tag above and beyond the next tag?

e.g. If there's a self-hoster that deployed a latest tag which at the time was aligned to the server-2024-06-01 tag and we subsequently had some security vulnerability and fix that went out in the server-2024-11-01 tag, does their original deployment with the latest tag offer them anything above and beyond the next tag as far as making it easier for them to determine whether they have the fix in their local environment or need to redeploy? I don't really think so, I think they'll still be in the position of having to compare image digests or having to explicitly force a clean deploy (either on new infra or ensuring the local image cache is nuked in the running environments)
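
To make the "compare image digests" step concrete, here is a minimal sketch. It assumes you already have the `RepoDigests` list reported by `docker image inspect` for the locally cached image (entries look like `repository@sha256:...`) and the digest of the release that contains the fix, both obtained the same way. The function names and the digests in the usage note are hypothetical placeholders.

```python
def digest_for(repo_digests, repository):
    """Return the digest recorded for `repository`, or None if absent."""
    for entry in repo_digests:
        repo, sep, digest = entry.partition("@")
        if sep and repo == repository:
            return digest
    return None


def matches_release(repo_digests, repository, release_digest):
    """True if the local image is the same image as the given release digest."""
    return digest_for(repo_digests, repository) == release_digest
```

For example, `matches_release(["ghcr.io/badges/shields@sha256:aaa"], "ghcr.io/badges/shields", "sha256:bbb")` is false, which would mean the cached image is not the fixed release, regardless of which rolling tag name it was pulled under.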

@chris48s
Member

chris48s commented Sep 5, 2024

Having shot us in the foot with next I'm on board with "don't deploy from rolling tags". Lesson learned from that one.

#9391 (comment)
https://github.com/badges/shields-ops/issues/34
https://github.com/badges/shields-ops/pull/35

Certainly not in an environment where you've got more than one instance in a cluster

> The docker images we produce are for the Shields server, so it's something that gets deployed as opposed to being used as a base image (where rolling tags are less problematic) by developers using them to construct their own deployable images.

I think this is quite a good point. First off in that situation (base image) you only pull from the upstream tag at build time and then you're locked once your image is built. But also, in a situation where the tag you're tracking is node:20-alpine or python:3.10-bullseye or something there is an expectation that tracking that tag is going to give you bug/security fixes only and isn't going to make a BC-break. There's value in tracking the latest changes to that tag on every build. For something like this where we use CalVer rather than SemVer and any monthly release could in principle include a BC-break (despite our best efforts), I can see why it is more useful to nudge people to have a look at the changelog once a month, or however often they upgrade.

In any case, this is not an issue I have really strong views on. Hence tagging you in - I figured you have probably thought about this one more than I have. I think I'll leave it with @smashedr to say why this would be good/useful if they are not convinced. Otherwise we can just close it.


tangent

> as i recall we opted to intentionally name it something different than latest

When this issue popped up, it actually got me thinking about why we use next rather than latest. I went back to the thread #3558 and I think that is just what Paul happened to call it when he set it up tbh, unless the conversation happened elsewhere (discord?). Anyway, I couldn't find any reason for it that hasn't disappeared into the ether.

@smashedr
Contributor Author

smashedr commented Sep 5, 2024

In a production environment, you should always pin versions.

But for a simple self-deployment, using a latest tag makes it very easy to update to the latest version. All I would have to do to update is click 1 button:

firefox-20240905-120430053

And as far as knowing what version is running, all I have to do is click on the image, and it's clear as day:

firefox-20240905-120401409

But when pinning versions, I have to get the latest version tag, open my IDE, commit the changes, then redeploy the application: https://github.com/smashedr/shields.io

The latest tag is just a convenience for people who are not in production environments and don't want to pin versions.

@calebcartwright
Member

calebcartwright commented Sep 6, 2024

@smashedr thanks for the quick response. i'll reply to a couple items inline below, but i do want to center on what i view as the main question that still remains unanswered (or at least unclear in my mind):

for your particular use case, what gaps/challenges do you have with the next tag that would prevent you from using that as your rolling tag? is there something that could be added/changed with the next tag that would make it viable for your use case?


> But for a simple self-deployment, using a latest tag makes it very easy to update to the latest version. All I would have to do to update is click 1 button:

I get the simplicity that comes with a model that utilizes rolling tags, though I'm unsure what your personal distinction is between "simple self-deployment" and "production".

Regardless though, we have to think about the artifacts (which includes tags) that we produce, how those could potentially be consumed by any of our users, and the potential effects that could have on us. We can reasonably say that we don't know if/how the app may behave on e.g. powerpc or any other non-x86 instruction set architecture, and that we don't have the time/expertise/funds to be able to test, validate, and support, but we do already spend precious cycles supporting self-hosting users and extending our main image with more rolling tags (especially latest) has too big a risk of exacerbating that strain.

To be clear, I'm not suggesting that any and every usage of a rolling tag will result in a catastrophic explosion, and it may not be an issue for your particular circumstances. But we can't focus solely on that, as it's not an image tag we'd be producing for your use case alone.

> And as far as knowing what version is running, all I have to do is click on the image, and it's clear as day:

Good shout, i forgot we put that env var in there. Though worth noting that just makes troubleshooting a bit easier in that it simplifies the process of identifying one of the problematic situations (e.g. different versions running on different nodes) that can and do arise with deploying rolling tags

> But when pinning versions, I have to get the latest version tag, open my IDE, commit the changes, then redeploy the application: https://github.com/smashedr/shields.io
>
> The latest tag is just a convenience for people who are not in production environments and don't want to pin versions.

This is where I come back to trying to differentiate between one user, one use case (the perspective you're understandably focused on) versus all users, any use case (the perspective from which we have to evaluate this)

what's the scenario where someone is:

a. not in production, by extension presumably open to accepting some risk
b. needs/wants the convenience of a rolling tag
c. but cannot use the existing rolling next tag

@smashedr
Contributor Author

smashedr commented Sep 6, 2024

I am not asking anyone else to switch to using the rolling tag latest. Everyone who prefers to pin a version, can continue to use the server-xxxx tag. I just want it to be provided for those who know how and want to use it for their workflow.

All this does is add the ability for additional workflows to be used, vs the one you want people to use: pinning versions. It is up to the end users to decide what they want in their environment.

There is a big difference between next and latest: next is published on every commit to master, whereas latest would always point to the latest server-xxxx release. Additionally, next only builds linux/amd64, so it can't be used for linux/arm64 deployments.
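
One way to check a claim like this for any tag is to look at the platforms listed in its manifest list (image index), for example the JSON printed by `docker manifest inspect <image>:<tag>`. The sketch below only parses such JSON; it assumes the usual `manifests[].platform.os` / `architecture` layout, and the function name `listed_platforms` is hypothetical. A single-platform image may have no `manifests` key at all, in which case the result is empty and the platform has to be read from the image config instead.

```python
import json


def listed_platforms(manifest_json):
    """Return the set of 'os/arch' strings listed in a manifest list / image index."""
    doc = json.loads(manifest_json)
    found = set()
    for entry in doc.get("manifests", []):
        platform = entry.get("platform", {})
        os_name = platform.get("os")
        arch = platform.get("architecture")
        # Attestation entries are commonly listed as unknown/unknown; skip them.
        if os_name and arch and "unknown" not in (os_name, arch):
            found.add(f"{os_name}/{arch}")
    return found
```

A tag usable on an arm64 swarm would need `"linux/arm64"` to be in the returned set.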

As far as the a,b,c scenario you mentioned. I am that person. A, I fully understand how it works, and it makes my workflow much, much easier to manage. B, See A, it makes the workflow much, much easier to manage for me. C, I can not use next because my swarm is arm64, and I do not want to be on the bleeding edge. When I click the update button on my stack, I would prefer it to update to the latest server-xxxx release, which is what a latest tag would do.

So again, all I am asking is to add the ability for people who want to use a latest tag to be able to.

@calebcartwright
Member

re: #10498 (comment)

For the sake of trying to expedite the conversation, I'll reiterate that I know what you're asking for. I understand what the PR is proposing, and I understand when and how we're publishing our tags, so we can leave that there 👍

Your comment did include the answer to my question though which is helpful:

You need to deploy on arm, you'd prefer to use a rolling tag, and there's not a rolling arm-compatible tag today.

That's the objective. You've proposed one way that would achieve that, but that's not the only way we could provide a rolling tag that'll run on arm.

I'm going to go ahead and say that latest is not going to happen, and I don't think it's worth anyone's time to belabor that argument; the tag being named latest vs rolling vs monthly vs foobar wouldn't make a difference one way or the other for your use case.

The main point I want to drill into now though to see what alternatives might be viable is this 👇

> and I do not want to be on the bleeding edge.

I'm curious what "bleeding edge" means to you, specifically in this context. What do you think are the meaningful differences between our next tag and some server-xxxx tag? Do you think one is any more stable or has gone through more extensive validation than the other (and if so how does that fit in with your prior characterizations about non-production)? Would you dispute that for large chunks of some months those two tags have the exact same contents?

@smashedr
Contributor Author

smashedr commented Sep 6, 2024

The name of the tag is irrelevant, I just want to see a rolling tag that I can use to make updating my self-hosted server much easier.

As far as the other tags, I don't fully understand how this project decides to make a server-xxxx release/tag, but I do know it is not just a mirror of the master branch and changes less frequently.

@chris48s
Member

chris48s commented Sep 6, 2024

> I don't fully understand how this project decides to make a server-xxxx release/tag

It is pretty arbitrary. That is why I refer to it as a snapshot. There is a scheduled job that runs once a month
https://github.com/badges/shields/blob/master/.github/workflows/draft-release.yml
That opens a PR with a draft changelog based on the commits since the last tag. Then one of us manually checks/cleans up the changelog, merges the PR, and then that merge triggers making the git tag, building/pushing the docker image, etc.

It is fairly arbitrary and hasn't gone through any additional QA.
We might also choose to do an ad-hoc server- release if we were fixing a major security issue, for example. That would allow us to point to a "fixed version" you can upgrade to in a security advisory.

There are some slight qualifiers to this though. I might delay a release if there's a fix I'd particularly like to get merged first (e.g: #8467 (comment) ). I do also try to make sure we don't cut a snapshot at a time when there is some known problem. Occasionally we do change something that causes an issue. For example, we merge a PR that seems reasonable at review time, get it into production and realise it introduces a problem we hadn't considered (example: #10125 ). This is not super frequent, but it has happened. If the automated PR saying "time to do a release" popped up when something like that had happened and I knew master was in a buggy state, I would hold off on cutting the release until that issue had been resolved. The other thing is I will not usually cut a monthly snapshot if there are changes on master we are not yet running in production. So I try to make sure everything that goes into a snapshot has served some level of prod traffic without immediately falling over.

In that sense, I would say there is a slightly higher level of "stability" associated with the monthly releases than just tracking next/master.

@calebcartwright
Member

calebcartwright commented Sep 6, 2024

> It is pretty arbitrary. That is why I refer to it as a snapshot.

> In that sense, I would say there is a slightly higher level of "stability"

I'd partially agree but maybe with some nuance. I think snapshot is a fair description and feel it echoes my main original point I wanted to ensure was clear about it being arbitrary: next and server-xxxx are both point-in-time publishes from the master branch, with the exact same set of tests executed, and there is some indeterminate amount of time each month when they can be identical; if you happen to deploy server-xxxx during the first week of the month (or if we've just not merged any PRs recently) then server-xxxx often is exactly master and exactly next

Where I'd still draw a little nuance is that I wouldn't want anyone to infer a label of "stability" on any of the server tags. We don't use them ourselves, and it's very conceivable that a server-xxxx tag, even the most recently published one, could have issues not present on Shields.io, as we're not going to go back and yank a bad server-xxxx tag and we're not necessarily promising to publish an updated server tag ad-hoc to address an issue (and a minor issue with badge Foo may seem not worth it to us, but could be pretty impactful to someone running a self-hosted instance that uses badge Foo heavily).

For me the snapshot/server-xxxx tagged images have the potential to have a more human intentionally selected set of diffs, but it's also true that it could have an issue deemed significant to some users


We've given @smashedr some conflicting thoughts on next steps, so I think we as a maintainer team need to discuss a couple things to get on the same page and figure out how to proceed here. The only hill I'll die on is that I don't want any tag named latest, but we need to find agreement on whether we're going to add a new rolling tag, and if so (a) what to name it, (b) when it gets published, and perhaps (c) any stance we want to take in docs around recommended usage and support
