-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
build: nightlies off master and release-23.1 report incorrect version information #100532
Comments
git describe
is not included any more in the crdb build tag
I think we should revert edc4b47 and then consider an alternate approach later after the 23.1 release. |
I have tried to investigate how come #97214 was made that way and how this major problem wasn't caught there. At the time we already knew we need to have the SHA in the build tag and the recommendation was to add (not replace) a field containing a "release series" with a more stable version number. |
The commit SHA is still in there. Can we not simply update the |
yes we can perhaps do that (we can even reduce to just the subset of the commit SHA that uniquely describes the commit, like git-describe does). we also need to ensure the call to override the version in many |
I agree the revert is only the fall-back solution if we cannot do something more clever. |
dev-inf cannot singlehandedly make this change (we don't have a way to validate all the necessary invariants are met). I am assigning @renatolabs to this. Unless just updating the
Can you say more about this? |
We have in many packages a call like this: defer build.TestingOverrideVersion("v999.0.0")() because the tests perform comparisons against the version string and we want these tests to have a stable output. |
The build tag should have the revision attached to it if the build is not built with Line 70 in cd933d3
So I think what is actually being requested here is to not tag these nightlies as |
When I run |
Cross-builds ( |
ok thank you for explaining. i'm going to run a cross build to satisfy myself that all is well and then we can close this. |
Wait, I’m still getting it: we don’t want all builds sha identified ie we want a build that is actually of 23.1.4 to identify itself as 23.1.4, not via |
David, look waht this release stamping gives us:
I think this build tag is OK to me. What do you think? |
So to wrap up here is my understanding:
@rickystewart is that understanding correct? |
I think we need something stronger than this, specifically a mechanism that ensures that for a given content of |
Even if we do this -- figure out a mechanism by which we can ensure we only pass release=true for a specific SHA -- I still think identifying publicly -- in status, in telemetry, in crash reports, etc -- using this version in file is a step backwards: we've gone from a safe-by-default mechanism, where the identifier for the binary, used in all the places where it is critically important that we correctly identify it accurately, was derived from the source of truth (git, i.e the version of the code itself), to unsafe-by-default: now if we build two binaries, from two different SHAs, but happen pass release=true both times, we end up with two different binaries that incorrectly claim to be the same thing. |
My recollection is that identified the need to shift to using a checked-in file so that tests which needed to determine the predecessor version would change what version they thought they were running at when they went to look up the predecessor for that version? right? Why not just check in "predecessor_version.txt" for that purpose then, versus changing how a binary determines its own version? |
Just want to point out: the old approach was "safe" but I also think it was incorrect and confusing, as I described in #75287 |
Confusing, sure, when your tags aren't fresh. But "incorrect"? I'd say it was less capable of being incorrect in that it would never claim to be a SHA that it isn't -- it might say it is SHA foo that is 10000 commits after v1.2, which, like, is less informative than it saying it is v8.9, but that's still more correct -- it accurately and more importantly uniquely identified the running code -- than a scheme that can describe two completely different versions of code as being the same version. Is It confusing to call v8.9 v1.2+1000 ? yes. but still much more correct than to call something that isn't actually 8.9 "8.9". |
I'm not disputing that it's less incorrect. Just saying, if we have any alternative, let's not go back to something which is confusing and incorrect (bearing in mind that the current situation is also incorrect). |
I think my short-term proposal would be, rather than a bool to SHA-suffix the file content or not, which seems like it has a pretty bad failure mode if we ever mess it up, instead we add back an empty string package var that can then set using ldflags to the git describe output, called something like If We would pass the ldflag to set Basically, I think this is similar to the suggestion above to just use IsRelease, but with the slight tweak that when you use it, you provide a unique value directly, vs just hoping the content of version.txt is unique at the time. |
@knz Yes, you have summarized the situation accurately. The previous
This will not happen in the normal course of use. The Of course, anyone can just build their own binary that says it's whatever SHA, but anyone could do that at any point. |
I'm trying to confirm what is being proposed -- is it to combine @renatolabs and @dt 's ideas to do something like:
|
You have to explicitly set the build environment to If you build two different SHA's with the same In any case, the actual SHA is still there in the same place it has always been. With regard to solving the nightly build tag problem, I propose all of the following.
From that point forward people writing new code will have to decide whether |
Thinking a bit more here: If we go the |
@rickystewart How about this slight adjustment to your suggestion: Have |
A few points.
|
One thing I'll point out: there seems to be a strong desire to not to base the binary's reported version of the unique tag in the git, but in basically all debugging and support interactions, we use the binary's reported version to go look at git and what it says the code as of that tag was. When a customer says "I'm running 22.2.1 and see the job retrying when it does X", we go look at git to see what the code as of tag 22.2.1 did, if a given fix is in its parents, etc. The git tag is our source of truth for what "22.2.1" was. But, if we're dead-set on not forcing the binary's reported version to match the canonical scm tag, then I'd say we should at least ensure the publication pipeline ensures we cannot publish the same reported version twice: use the version.txt-derived identifier as the "Build ID", fail to upload artifacts if it already exists, etc, so that we ensure we never have two published binaries floating around that claim to be the same thing the way we currently do. |
I drafted something in #100694 that I think hits both goals: (a) use the version file instead of |
We do have nightly builds. The "Make and Publish Build" family of build configs produces them. (Sometimes we forget things.) On the proposals:The things I'd like to see in whichever solution we choose (Ricky's, Ricky's + DTs mod, Renato's) are:
This needs to be done for Make and Publish Build as well. Given that need, how about we make the value either more generic (
If the value is more specific, we could identify where in the release process a given binary was generated. |
I started working on implementing @rickystewart's suggestion above to have a different build.typ value other than But when I got to hooking that up in make-and-publish-build-artifacts.sh, I'm now realizing something else: even once we implement it, we've introduced a subtle but worth discussing change, perhaps a breaking change, in how nightlies describe themselves. Previously, for the past few years, a nightly or custom build from a release branch, say from release-22.2 on some SHA after As it now stands, assuming we fix the IsRelease thing, that same build, which is half way between Maybe this is better -- not that we follow semvar, but if we did, it would say this is the correct way to identify a pre-release of 22.2.7 which anything taken from release-22.2 after 22.2.6 is tagged would be -- but it is certainly a change in behavior, and perhaps one on which we should consult with the TSE, L2 and field teams? In particular, it is a little scary we're changing it in-place: previously a nightly identified as "v22.2.6-blahblah" was greater than 22.2.6, but now it'll be the exact opposite -- if you see |
One option: we could very narrowly, i.e. only for release builds for 23.1, revive the old behavior for now. That is, we could re-open #100581, so that release builds and only release-builds would use use have the $build_name, which is currently picked with the old sam-state-based logic, passed into the release binary to be the value of the version identifier. I believe doing so this would mean there is no user-visible change between reported version behaviors in 23.1 release builds compared to 22.2 release builds, meaning 23.1 could then be unblocked to proceed while we figure out the longer-term solution (introduce more release build types, figure out how to present them and their derived version identifiers to users). I don't think using #100581 as in interim measure to unblock 23.1 would bring back too much of the negative effects of SCM-backed version identifiers either -- IIRC most of the acute drawbacks to SCM-picked version identifiers were their effect on dev builds and specifically tests, not release builds, but tests should be unaffected by #100581 given its narrow applicability to only release builds. |
What would be the advantage of doing this? So we don't have to notify TSE/L2/field that the build tags for nightly builds of I think is probably sufficient to clarify that "You should not be reading any special meaning into whatever characters we add at the end to disambiguate between alphas of the same number (but just so you know, after 23.1 it means...)", and ASAP before our next major release is as good a time as any to do it. Of course my hunch about that may be wrong, so reaching out to all of these groups is still a good idea.
AFAIK the only thing that is actually a blocking issue is that the alphas don't have disambiguated build tags. We can (I think) easily address this specifically without much churn. I think it is highly preferable to fix forward rather than go back and do a bunch of monotonous reverts and backports just to add some regressive logic back into the build system. |
I can write a PR to implement the fix, but I don't want to collide with @dt who apparently has been working on this. |
My point is that we're doing the opposite: even if you say to ignore everything on the end, we're changing the meaning of the stuff at the front. We push Is that okay? Maybe it is! You can argue that "22.2.6 plus stuff" is by definition a pre-release of 22.2.7, so maybe this is fine, but as far as I know, we never asked CC SREs, TSEs, L2s, etc about this (if there was an RFC or doc, I didn't see it?).
FWIW, my suggestion above was one commit changing ~20 lines specifically for release builds and not reverting anything else, just to keep status quo for 23.1 release builds while we vet the proposed behavior change. |
I don't know if anyone has already created a vehicle to having this conversation? If not, I've created [DRAFT] Build Tag for Nightly, Custom and Production Release Builds:
Once we 👍🏼 on correctness / completeness we can loop in the groups mentioned. |
I've renamed the title of the bug to reflect the issue which is that the nightly builds report incorrect, non-unique build tags. The PR I will file to close this is targeted toward that one problem specifically. We will follow up with other work that will update the build tags further to add more precision to the edge cases we've described. |
When we build nightly builds, we build them in "release" configuration. Because we do this, the build tag reported by these binaries from `cockroach version` has been identical to that of an actual release binary, which is very confusing. Here we update the script to build these nightlies (`make-and-publish-build-artifacts.sh`) to inject an appropriate, identifiable build tag. It is probably wrong that we are building these nightlies as if they were "releases". We'll follow up with work to fix this and refine the build tags further. Closes cockroachdb#100532. Epic: None Release note (build change): Update reported `Build Tag` for nightly (non-release) builds
When we build nightly builds, we build them in "release" configuration. Because we do this, the build tag reported by these binaries from `cockroach version` has been identical to that of an actual release binary, which is very confusing. Here we update the script to build these nightlies (`make-and-publish-build-artifacts.sh`) to inject an appropriate, identifiable build tag. It is probably wrong that we are building these nightlies as if they were "releases". We'll follow up with work to fix this and refine the build tags further. Closes cockroachdb#100532. Epic: None Release note (build change): Update reported `Build Tag` for nightly (non-release) builds
101937: catalog: utilize scans for a large number of descriptors r=fqazi a=fqazi Previously, the catalog descriptor would fetch descriptors via point lookups using Get when scanning large batches of descriptors. This was further extended to also look up ZoneConfigs and comments in a similar way. Recently, we we started seeing regression on the Django test suite involving the pg_catalog tables, which tend to do read large number of descriptors, likely linked to extra overhead linked to both comments and zone configs in 23.1. To address this, this patch ill now start using scans for runs of descriptor IDs for batch scans which reduces the overall cost of fetching a large number of descriptors hiding this cost. Fixes: #100871 Release note: None 101943: build,release: provide way to inject build tag override, use in MAPB r=rail a=rickystewart When we build nightly builds, we build them in "release" configuration. Because we do this, the build tag reported by these binaries from `cockroach version` has been identical to that of an actual release binary, which is very confusing. Here we update the script to build these nightlies (`make-and-publish-build-artifacts.sh`) to inject an appropriate, identifiable build tag. It is probably wrong that we are building these nightlies as if they were "releases". We'll follow up with work to fix this and refine the build tags further. Closes #100532. Epic: None Release note (build change): Update reported `Build Tag` for nightly (non-release) builds 101944: sem/tree: remove TODO r=mgartner a=mgartner This commit removes a TODO that was addressed by #96045. Epic: None Release note: None 101952: ui: fix linegraph render for large clusters r=zachlite a=zachlite We were feeding `Math.min` and `Math.max` very large arrays (length equal to ~10e7). These functions take variadic arguments, and the runtime can't handle that many arguments. As a result we blow the stack, causing uncaught range errors. Instead, we'll use lodash min and max which are not variadic. Resolves #101377 Epic: None Release note: None Reproduction and Fix demonstrated on the command line: <img width="722" alt="Screenshot 2023-04-20 at 4 56 41 PM" src="https://user-images.githubusercontent.com/5423191/233486016-fd740a98-9a0d-4fef-a6d8-67e9cb4e318e.png"> Co-authored-by: Faizan Qazi <faizan@cockroachlabs.com> Co-authored-by: Ricky Stewart <rickybstewart@gmail.com> Co-authored-by: Marcus Gartner <marcus@cockroachlabs.com> Co-authored-by: Zach Lite <zach@cockroachlabs.com>
When we build nightly builds, we build them in "release" configuration. Because we do this, the build tag reported by these binaries from `cockroach version` has been identical to that of an actual release binary, which is very confusing. Here we update the script to build these nightlies (`make-and-publish-build-artifacts.sh`) to inject an appropriate, identifiable build tag. It is probably wrong that we are building these nightlies as if they were "releases". We'll follow up with work to fix this and refine the build tags further. Closes cockroachdb#100532. Epic: None Release note (build change): Update reported `Build Tag` for nightly (non-release) builds
When we build nightly builds, we build them in "release" configuration. Because we do this, the build tag reported by these binaries from `cockroach version` has been identical to that of an actual release binary, which is very confusing. Here we update the script to build these nightlies (`make-and-publish-build-artifacts.sh`) to inject an appropriate, identifiable build tag. It is probably wrong that we are building these nightlies as if they were "releases". We'll follow up with work to fix this and refine the build tags further. Closes cockroachdb#100532. Epic: None Release note (build change): Update reported `Build Tag` for nightly (non-release) builds
Describe the problem
On
master
andrelease-23.1
, the version, or buildinfo tag, as used in output ofcockroach version
, reported telemetry, debug zips, version information presented on to operators in cockroach status and the db console, etc -- specifically the build tag -- was switched to be backed by the content of a checked in file.This checked in file is not edited in every commit, and thus two commits, with different content, different bugs and fixes, etc can both describe themselves as the same version. The downstream effects of this on operators needing to know if they need to deploy a fix, or support and field debugging issues, could be significant: it becomes impossible to ascertain which version of the code is actually running when troubleshooting a problem:
cockroach version
will not change.git commit --amend
), the output ofcockroach version
will also not change.To Reproduce
Run
cockroach version
and observe.For example, a 23.1 nightly, built from 5d5abe7, is accurately described as
v23.1.0-alpha.8-219-g5d5abe704b0
(5d5a, which is 219 commits on to of alpha.8), describes itself asalpha.9
:But a nightly build from 63b683e, described as
v23.1.0-alpha.8-223-g63b683eab3e
which is not the same code as the one above -- as seen by the fact it has more commits and a different SHA -- also describes itself as alpha.9! if you ask it what version it is, or in reported telemetry/panics, or look at status to figure out if you need to roll out a fix, etc:This will cause considerable confusion in production.
Expected behavior
The build tag should include character that uniquely identify the commit SHA.
Jira issue: CRDB-26479
The text was updated successfully, but these errors were encountered: