-
Notifications
You must be signed in to change notification settings - Fork 271
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Retrospective] Release Version 2.17.0 #4909
Comments
I think our best bet for the future is to cut the release branch on core as the "freeze" trigger. Maybe core should freeze a couple of days before plugins? Then backports to the release branch on core would only be for critical bug fixes. This would be in comparison to the past ~24 hours, where a flood of features get merged to main, backported to 2.x, and backported to release. With the update to release+1 across main and 2.x, it would only disrupt merging of a small number of critical bug fixes (versus the current flood of last-minute features). |
(Of course we should also stop trying to squeeze features in the last 24-48 hours before a release, but try telling that to people whose paychecks depend on shipping features.) |
Coming from #4908 (comment), I think that it would be nicer to have a trigger for the manifest update job to publish artifacts for branches that have been cut. Once a branch has been cut, the expectation is that the branch is ready to build and artifacts can be published. We then increment the version on 2.x to the next minor version and bwc gets an update as well. Once 2.x is onto the next minor version and has the current minor version as bwc, gradle checks on main that trigger the bwc test expect to have build artifacts ready for the current minor version. But since the manifest update runs on a cron job, it can take some time for this to happen and PRs on main start failing. This happened adter the 2.16 branch was cut as well as on 2.17. A better way would be to trigger a manifest update along with the branch cut so that the artifacts are available immediately. |
Delay in merging version increment PRs causing build failures and delay in RC generation. |
ISM and ISM Dashboards had failing tests in 2.16, that again got flagged in 2.17 release testing. The tests were failing due to logic in ISM tests that would cleanup test suites by deleting ISM system indices. Originally planned in 2.16 (now 2.17), the security plugin introduced a change to identify if a request matches system indices by checking a central System Index Registry from core. The index patterns in the registry are indices registered with the SystemIndexPlugin.getSystemIndexDescriptors extension point. Before 2.17, the security plugin relied on an ISM tests were flagged in 2.17 again because the CI Checks in ISM and ISM Dashboards repos are insufficient. The existing ISM tests with security are only operating on a subset of integ tests. Opened a PR here to address the ISM plugin issues. ISM Dashboards has cypress tests that are only testing w/o security for PR checks. I opened a PR to address the failing checks for ISM Dashboards, but there should be a follow-up change to add a PR check to run cypress tests with security. There was also a change made in the FTR repo where there is similar test cleanup logic that removes system indices. |
@cwperks The real problem is not ISM not making required changes on time to support breaking change pushed by security but security making breaking changes without any campaign. We have seen other instances in case of CA certs where changes were pushed by security which led to failures in other plugins. Adding cypress test in PR is still a reactive approach where a plugin spends time investigating, figures out that issue is not related to the plugin, follows up with security and gets the issue fixed. I don't see any value in this approach although I agree on the overall AI of executing tests on PR workflow runs. |
@vikasvb90 The ISM tests are an issue that needs to be addressed. If the ism indices are supposed to be treated like regular indices then there is no need to register them with the SystemIndexPlugin.getSystemIndexDescriptors() extension point. An index is a system index if its important to the integrity of the cluster and would cause a cluster to enter a corrupted state if deleted. For example, the security index ( All plugins should run integ tests with and without security to avoid catching test failures only at release time. For the demo cert renewal, the demo certs were initially updated to modify the SAN to include IPv6 loopback address. @DarshitChanpura updated the certs and opened PRs on all repos that maintained copies of the certs, but ISM was missed in 2.13: opensearch-project/security#4061 In a future release, I'd like to run a campaign across plugins to remove copies of demo certs and instead to refer to the certs centrally. See example of how SQL plugin pulls them from the security repo here: https://github.com/opensearch-project/sql/blob/4303a2ab755d53903094dd94a5100572677a27a1/integ-test/build.gradle#L107-L111 |
@cwperks I am not questioning the intention of the change. I do agree that change needs to be there but they need to be called out proactively as a campaign or as any other mechanism. By the time plugin finds this out and calls out, it is already too late. |
@vikasvb90 One of the biggest pro-active measures to catch these early-on is to add a CI check that runs tests with security enabled, since we run security-enabled tests for all RCs. This would enable plugin owners to debug any failures, and/or reach out to security team to get these addressed as soon as possible. |
@DarshitChanpura That's not being pro-active. If you rely on foreign plugins to come and tell you that something is buggy in security then that is being reactive. And it ends up wasting a lot of dev cycles. Tests in other plugins should just be treated as last line of defense. |
Related release issue?
#4908
How to use this issue?
Please add comments to this issue, they can be small or large in scope. Honest feedback is important to improve our processes, suggestions are also welcomed but not required.
What will happen to this issue post release?
There will be a discussion(s) about how the release went and how the next release can be improved. Then this issue will be updated with the notes of that discussion along side action items.
The text was updated successfully, but these errors were encountered: