
feat(RELEASE-1283): process-file-updates resources #712

Merged 1 commit into konflux-ci:stable on Nov 26, 2024

Conversation

@scoheb (Collaborator) commented Nov 25, 2024

  • new pipeline and task for internal process-file-updates

Signed-off-by: Scott Hebert <scoheb@gmail.com>
@scoheb requested a review from a team as a code owner on November 25, 2024 at 21:56
@scoheb (Collaborator, Author) commented Nov 26, 2024

Force merging since the RHOAI team needs to start testing their add-on in the stage environment.

@scoheb merged commit 5ecc457 into konflux-ci:stable on Nov 26, 2024
5 checks passed
@mmalina (Contributor) commented Nov 26, 2024

This is not good. You're not supposed to merge anything to stable like this. Stable is supposed to be in sync with production. This will break promotion to production tomorrow, because we just do a fast-forward (git push without -f): https://github.com/konflux-ci/release-service-catalog/blob/development/.github/scripts/promote_branch.sh#L157
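
For readers unfamiliar with the promotion mechanics, here is a minimal sketch of why a plain (non-forced) push is rejected once the target branch has diverged. The branch names and push direction are placeholders, not taken from promote_branch.sh:

```sh
# Illustrative sketch only: the real branch names and promotion direction live
# in promote_branch.sh; 'source' and 'target' below are placeholders.
git fetch origin
# Promotion does a plain push (no --force), which git accepts only as a
# fast-forward, i.e. only while the current tip of 'target' is an ancestor
# of the tip of 'source'.
git push origin origin/source:refs/heads/target
# A commit merged directly into 'target' outside this flow is not reachable
# from 'source', so git rejects the push with a "non-fast-forward" error.
```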

@scoheb (Collaborator, Author) commented Nov 26, 2024

> This is not good. You're not supposed to merge anything to stable like this. Stable is supposed to be in sync with production. This will break promotion to production tomorrow, because we just do a fast-forward (git push without -f): https://github.com/konflux-ci/release-service-catalog/blob/development/.github/scripts/promote_branch.sh#L157

My bad. I did the same type of thing when going to the internal branch.

Is there some other way that we can be more tolerant of hotfixes or patches that are needed prior to the scheduled promotion? There is no way with internal-services to allow users to use sanctioned content from the trifecta of branches like we do in the catalog.

@johnbieren (Collaborator)

How do we get to a point where we can make users wait for more than one day to get a new feature they requested? I've brought this up in two meetings now and it was a frequent topic at our F2F. We are just merging things to unblock a user and creating technical debt. I keep hearing "oh this is just for high priority product X" and there is a new product X every week. If they want bleeding edge, put them on the staging SRE cluster and use the development branch. Why do we even have the staging SRE cluster if we aren't using it to enable users to use things before they are going to production?

@scoheb (Collaborator, Author) commented Nov 26, 2024

> Why do we even have the staging SRE cluster if we aren't using it to enable users to use things before they are going to production?

For:

  • e2e tests
  • manual integration testing

Right now, we are limited in how we can allow access to the staging SRE cluster from a prod Konflux cluster. To achieve this, you'd have to:

  1. set up a new deployment on the staging SRE cluster that connects to the prod Konflux cluster, and
  2. add that managed workspace to the allow list, at which point
  3. the internal services controller would start acting on every single IR coming from the Konflux prod cluster... meaning we would start signing everything twice. Not feasible or desirable.

If you have any ideas, I am all ears.

> How do we get to a point where we can make users wait for more than one day to get a new feature they requested?

Why one day? You'd have to wait maybe a week if the window has passed.

This is not some willy-nilly, untested new feature we are trying to get into prod. It was battle-tested on the stage p01 Konflux cluster and has soaked in internal/staging for at least a week.

But if you want to generalize, we can go that route: nothing gets to prod outside the promotion window.

@johnbieren (Collaborator)

> Why do we even have the staging SRE cluster if we aren't using it to enable users to use things before they are going to production?
>
> For:
>
>   • e2e tests
>   • manual integration testing
>
> Right now, we are limited in how we can allow access to the staging SRE cluster from a prod Konflux cluster. To achieve this, you'd have to:
>
>   1. set up a new deployment on the staging SRE cluster that connects to the prod Konflux cluster, and
>   2. add that managed workspace to the allow list, at which point
>   3. the internal services controller would start acting on every single IR coming from the Konflux prod cluster... meaning we would start signing everything twice. Not feasible or desirable.

? The e2e tests run in staging. They use internal requests. What do you mean we would need internal services to act on IRs from the prod cluster? I am saying put them on stage.

> If you have any ideas, I am all ears.

> How do we get to a point where we can make users wait for more than one day to get a new feature they requested?
>
> Why one day? You'd have to wait maybe a week if the window has passed.

Today is Tuesday. We promote on Wednesdays.

> This is not some willy-nilly, untested new feature we are trying to get into prod. It was battle-tested on the stage p01 Konflux cluster and has soaked in internal/staging for at least a week.

So we can battle-test this code on the stage cluster, but we can't put this product team on the stage cluster for a short amount of time? I don't understand that.

> But if you want to generalize, we can go that route: nothing gets to prod outside the promotion window.

I've never been part of a product that force-merges to production without review at 9pm. Sure, maybe it doesn't break anything. What if you made a typo? What if you pulled in an extra commit by accident? It could break just e2e. It could break users. Who knows? But then we will have an extra support load the next day if anything went wrong, because users will complain. And making a mistake can certainly happen, which is the whole point of getting a review in the first place. As is, this breaks our promotion tomorrow, as Martin mentioned. So this is a perfect example of why this type of thing shouldn't happen.

@mmalina (Contributor) commented Nov 26, 2024

> ? The e2e tests run in staging. They use internal requests. What do you mean we would need internal services to act on IRs from the prod cluster? I am saying put them on stage.

I think you're missing one thing: that doesn't work. If you want to use the stage app sre cluster for IRs in prod Konflux, that just won't work; the stage app sre cluster doesn't listen for IRs in prod Konflux.

But I have a different question (I might have asked before): is it really not possible to use git resolvers in the app sre clusters? That would solve our problem; you could then use the development branch of the catalog, including all internal resources.

@johnbieren (Collaborator)

> ? The e2e tests run in staging. They use internal requests. What do you mean we would need internal services to act on IRs from the prod cluster? I am saying put them on stage.
>
> I think you're missing one thing: that doesn't work. If you want to use the stage app sre cluster for IRs in prod Konflux, that just won't work; the stage app sre cluster doesn't listen for IRs in prod Konflux.

No. I am saying nothing about prod Konflux. I am saying put them on stage. And stage Konflux has IRs, or else our e2e tests would never pass.

> But I have a different question (I might have asked before): is it really not possible to use git resolvers in the app sre clusters? That would solve our problem; you could then use the development branch of the catalog, including all internal resources.

@scoheb (Collaborator, Author) commented Nov 26, 2024

> I am saying put them on stage.

So that means having the product team onboard to a new cluster, merge a whole bunch of onboarding PRs, re-add all their secrets that were created manually, and re-create ALL their builds (since snapshots are pushed to a different user-workloads location on stage clusters)? Only to then redo all of it one week later once the stable internal release catalog is promoted. Not a pretty user experience.

> Today is Tuesday. We promote on Wednesdays.

And I was generalizing. This could have gone down on a Thursday, for example.

> I've never been part of a product that force-merges to production without review at 9pm. Sure, maybe it doesn't break anything. What if you made a typo? What if you pulled in an extra commit by accident? It could break just e2e. It could break users. Who knows? But then we will have an extra support load the next day if anything went wrong, because users will complain. And making a mistake can certainly happen, which is the whole point of getting a review in the first place. As is, this breaks our promotion tomorrow, as Martin mentioned. So this is a perfect example of why this type of thing shouldn't happen.

And I apologized first thing in my comment.

My comments were possibly misconstrued.

I was trying to get at what Martin was saying: how can we then use the development branch of the catalog, including all internal resources?

Seems like you use git resolvers now: https://console-openshift-console.apps.rosa.appsrep09ue1.03r5.p3.openshiftapps.com/k8s/ns/openshift-pipelines/configmaps/resolvers-feature-flags/yaml
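
As a quick sanity check (a sketch that assumes cluster access and that the namespace in the console URL above is where the resolver flags live), the flag can be read directly:

```sh
# Read the Tekton resolver feature flags; adjust the namespace if your
# install differs. Prints "true" when the git resolver is enabled.
oc -n openshift-pipelines get configmap resolvers-feature-flags \
  -o jsonpath='{.data.enable-git-resolver}{"\n"}'
```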

@johnbieren (Collaborator)

> I am saying put them on stage.
>
> So that means having the product team onboard to a new cluster, merge a whole bunch of onboarding PRs, re-add all their secrets that were created manually, and re-create ALL their builds (since snapshots are pushed to a different user-workloads location on stage clusters)? Only to then redo all of it one week later once the stable internal release catalog is promoted. Not a pretty user experience.

Well, this use case has to be figured out. As far as I have been told, it is my responsibility as tech lead to ensure that only good code goes into our repos, code that does not introduce tech debt and that considers future requirements. Force-merging things into production goes directly against this, IMO.

> Today is Tuesday. We promote on Wednesdays.
>
> And I was generalizing. This could have gone down on a Thursday, for example.
>
> I've never been part of a product that force-merges to production without review at 9pm. Sure, maybe it doesn't break anything. What if you made a typo? What if you pulled in an extra commit by accident? It could break just e2e. It could break users. Who knows? But then we will have an extra support load the next day if anything went wrong, because users will complain. And making a mistake can certainly happen, which is the whole point of getting a review in the first place. As is, this breaks our promotion tomorrow, as Martin mentioned. So this is a perfect example of why this type of thing shouldn't happen.
>
> And I apologized first thing in my comment.
>
> My comments were possibly misconstrued.
>
> I was trying to get at what Martin was saying: how can we then use the development branch of the catalog, including all internal resources?
>
> Seems like you use git resolvers now: https://console-openshift-console.apps.rosa.appsrep09ue1.03r5.p3.openshiftapps.com/k8s/ns/openshift-pipelines/configmaps/resolvers-feature-flags/yaml

@mmalina (Contributor) commented Nov 27, 2024

> Seems like you use git resolvers now: https://console-openshift-console.apps.rosa.appsrep09ue1.03r5.p3.openshiftapps.com/k8s/ns/openshift-pipelines/configmaps/resolvers-feature-flags/yaml

OK, that looks promising. If we could use git resolvers, then I think that would solve this problem: that way you could use the development branch in prod Konflux and the prod app sre cluster. @johnbieren should we explore that?

@johnbieren (Collaborator)

> Seems like you use git resolvers now: https://console-openshift-console.apps.rosa.appsrep09ue1.03r5.p3.openshiftapps.com/k8s/ns/openshift-pipelines/configmaps/resolvers-feature-flags/yaml
>
> OK, that looks promising. If we could use git resolvers, then I think that would solve this problem: that way you could use the development branch in prod Konflux and the prod app sre cluster. @johnbieren should we explore that?

We should test using git resolvers. It would be surprising to me if we are allowed to, considering the fight from SRE against us using a floating branch to define our Tekton resources instead of a specific commit.

@mmalina can you clarify? I am not sure I am following your proposal. Are you saying to add the taskGitRepo and taskGitRevision stuff to the internal-requests + internal-pipelines + internal-tasks and call them that way?

@mmalina (Contributor) commented Nov 27, 2024

> @mmalina can you clarify? I am not sure I am following your proposal. Are you saying to add the taskGitRepo and taskGitRevision stuff to the internal-requests + internal-pipelines + internal-tasks and call them that way?

Yes. This would require a change in the internal-services controller: it would then use a git resolver to reference the pipeline when creating the PLR, same as the release-controller. And of course you would also need them as pipeline params so they can be used for the task references as well.
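
To make the idea concrete, here is a hedged sketch of the kind of PipelineRun the internal-services controller could create if it adopted the git resolver. The repository URL, revision, pathInRepo value, and the taskGitRepo/taskGitRevision parameter names are illustrative assumptions, not what the controller emits today:

```sh
# Sketch only: all values below are illustrative, not the controller's
# actual output.
kubectl create -f - <<'EOF'
apiVersion: tekton.dev/v1
kind: PipelineRun
metadata:
  generateName: process-file-updates-
spec:
  pipelineRef:
    resolver: git           # resolve the pipeline straight from the catalog repo
    params:
      - name: url
        value: https://github.com/konflux-ci/release-service-catalog.git
      - name: revision
        value: development   # whatever branch the InternalRequest asked for
      - name: pathInRepo
        value: pipelines/internal/process-file-updates/process-file-updates.yaml
  params:
    # Passed through so the pipeline's own taskRefs can pull tasks from the
    # same repo and revision via git resolvers as well.
    - name: taskGitRepo
      value: https://github.com/konflux-ci/release-service-catalog.git
    - name: taskGitRevision
      value: development
EOF
```

That would let an InternalRequest pin any branch of the catalog, which is what would allow a team to ride the development branch ahead of the promotion window.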

@johnbieren (Collaborator)

> @mmalina can you clarify? I am not sure I am following your proposal. Are you saying to add the taskGitRepo and taskGitRevision stuff to the internal-requests + internal-pipelines + internal-tasks and call them that way?
>
> Yes. This would require a change in the internal-services controller: it would then use a git resolver to reference the pipeline when creating the PLR, same as the release-controller. And of course you would also need them as pipeline params so they can be used for the task references as well.

I am cool with this approach. I don't see any way to enable the use case Scott wants. Do you want to create the ticket, or should I? I think it can all be one ticket. It is a lot of things, but they are all small and easy things.

@johnbieren (Collaborator)

This depends on git resolvers truly working in the app sre clusters, of course.

@mmalina (Contributor) commented Dec 3, 2024

> @mmalina can you clarify? I am not sure I am following your proposal. Are you saying to add the taskGitRepo and taskGitRevision stuff to the internal-requests + internal-pipelines + internal-tasks and call them that way?
>
> Yes. This would require a change in the internal-services controller: it would then use a git resolver to reference the pipeline when creating the PLR, same as the release-controller. And of course you would also need them as pipeline params so they can be used for the task references as well.
>
> I am cool with this approach. I don't see any way to enable the use case Scott wants. Do you want to create the ticket, or should I? I think it can all be one ticket. It is a lot of things, but they are all small and easy things.

I created a Jira to explore using the git resolver in the app sre clusters: https://issues.redhat.com/browse/RELEASE-1311

If it works, we can follow up and create more Jiras for the actual changes.

@johnbieren (Collaborator)

> @mmalina can you clarify? I am not sure I am following your proposal. Are you saying to add the taskGitRepo and taskGitRevision stuff to the internal-requests + internal-pipelines + internal-tasks and call them that way?
>
> Yes. This would require a change in the internal-services controller: it would then use a git resolver to reference the pipeline when creating the PLR, same as the release-controller. And of course you would also need them as pipeline params so they can be used for the task references as well.
>
> I am cool with this approach. I don't see any way to enable the use case Scott wants. Do you want to create the ticket, or should I? I think it can all be one ticket. It is a lot of things, but they are all small and easy things.
>
> I created a Jira to explore using the git resolver in the app sre clusters: https://issues.redhat.com/browse/RELEASE-1311
>
> If it works, we can follow up and create more Jiras for the actual changes.

#721 is actually going to test using git resolvers in the internal clusters, as the simple-signing-pipeline uses git resolvers and I am putting that on the internal clusters with that PR.
