rework gangplank and pipelines #2860

Closed
dustymabe opened this issue May 16, 2022 · 22 comments
Labels
jira for syncing to jira

Comments

@dustymabe
Member

Currently the only "productized" use of gangplank that I know of is in our multi-arch builds for Fedora CoreOS. Occasionally we have issues that need investigating, and we currently have no one on the team who really knows the code base well. Since we're only using one of the most basic features of gangplank, and we don't have maintainers or even semi-experts on the code base, I suggest we replace our use of gangplank with podman --remote directly.

I've done some basic testing and I think we can get everything we want from podman --remote. Thought I would test the waters here to see what other people think of the direction. WDYT?

@miabbott
Member

@sosiouxme I believe you had hopes for gangplank for orchestrating multi-arch builds

@cgwalters
Member

The way I think of Gangplank is ~three things:

  1. Basically implementing podman --remote again
  2. Better supporting splitting up build components into separate kube pods
  3. Trying to move the codebase more towards Go (I filed that separately as "Make programming language default to Go" #2821)

I agree with you that for what the FCOS pipeline is doing now, directly using podman --remote seems much better.

Of the above, I think 2. is still relevant. However, there's always been a huge tension between that and, e.g., Prow, which also wants to be the thing that spawns pods. @cheesesashimi has done some work today on using Prow to shard test pods, for example. The Jenkins pipelines also spawn pods.

It may be that this aspect of gangplank should remain.

@travier
Member

travier commented May 16, 2022

Now that we are on OCP 4.x for all clusters, we might also want to evaluate using Pipelines / Tekton for our CI:

@mike-nguyen
Member

> Now that we are on OCP 4.x for all clusters, we might also want to evaluate using Pipelines / Tekton for our CI:
>
> * https://docs.openshift.com/container-platform/4.10/cicd/pipelines/understanding-openshift-pipelines.html
>
> * https://tekton.dev/

Our multi-arch pipelines are still on OCP 3.x.

@ravanelli
Member

A huge amount of effort has gone into the gangplank implementation; it feels sad to drop it entirely.

I tried to summarize where we are today here. Nonetheless, I agree with @dustymabe: we still need more improvements before we can rely on it, especially in cases where builds report success even though errors occurred (kola tests and others), and where log output is not fully returned. For the FCOS case, I don't see any problem with replacing it with podman remote if we can't fix these issues right away.
That said, if we want to keep using gangplank, more work needs to be done, and more of us need to learn the code.

@cverna
Member

cverna commented May 17, 2022

I think a good question to answer is: is this going to be the best use of our time? Based on the previous replies, I would lean toward no. While gangplank is a neat idea, I am not 100% sold that it fits our team's mission. We can only focus on a limited set of problems, and moving away from gangplank gives us an opportunity to refocus elsewhere.

@sosiouxme
Contributor

sosiouxme commented May 17, 2022

The appeal of gangplank for ART is to enable more customization at the job spec level (something declarative, more flexible than job parameters, but not as low-level as COSA), for at least three use cases:

  1. regular old production builds (with or without bootimages)
  2. embargoed builds (send the output to private storage)
  3. hotfixes (CoreOS builds with tweaks targeted to a single customer)

Exploration stalled again because we're easily distracted, but we still have a business need to do all these things. If not gangplank, then what? I don't know anything about the podman remote approach you mention. Of course, as much as possible we want to be using well-supported tooling. If ART became the only user of gangplank, that's probably not tenable.

@cgwalters
Member

> hotfixes (CoreOS builds with tweaks targeted to a single customer)

Doesn't OCP CoreOS layering generally obsolete this, particularly once we support, e.g., kernel overrides?

@dustymabe
Member Author

We are no longer running gangplank tests in CI (#2895). In my opinion, we can move forward with exploring an alternative for our specific needs and eventually remove gangplank once that is proven out.

Please speak up if you disagree.

@dustymabe added the "jira" label (for syncing to jira) on Jun 6, 2022
jlebon added a commit to jlebon/coreos-assembler that referenced this issue Jun 22, 2022
This directory included Dockerfiles and manifests for running cosa as an
OpenShift build via gangplank. We're not using that and are planning to
phase out gangplank, so let's remove these files.

Part of coreos#2860.
cgwalters pushed a commit that referenced this issue Jun 22, 2022
@darkmuggle
Contributor

It became clear to me that Gangplank never met the vision, and there was resistance to the idea anyway. It was a fun project, but ultimately, if it is not maintained and not useful, removing it seems to be the sensible path.

@cgwalters changed the title from "potentially remove gangplank" to "rework gangplank and pipelines" on Jun 30, 2022
@cgwalters
Member

I don't think of this as removing gangplank. As I said above, for example, I think moving the build system to Go makes total sense, and as part of #2919 I was reading through the gangplank code. Conceptually, that sub-thread is "move parts of the gangplank code into the cosa toplevel", not "get rid of it"!

I also definitely like the opinionated flow for generating kube pods. I think we should try to elevate that part of gangplank to the toplevel too. A tricky part has been intersecting that with Prow... what we ended up with today for this (raw cosa, not using gangplank) is honestly very hacky, and it would be much more elegant with a gangplank-like approach.

This issue started out as "let's use podman --remote directly", which covers some of what gangplank does, but not all of it. So... I took the liberty of retitling this issue to reflect that.

@cgwalters
Member

strawman proposal:

This would make it easier for us to basically incrementally merge the two things (cosa and gangplank) that are today somewhat artificially distinct. (I also think that writing new code doing container stuff in Python is the wrong move, and this helps counter that)

@dustymabe
Member Author

dustymabe commented Jul 7, 2022

> It became clear to me that Gangplank never met the vision, and there was resistance to the idea anyway. It was a fun project, but ultimately, if it is not maintained and not useful, removing it seems to be the sensible path.

I don't think the problem was that Gangplank didn't "meet the vision". The problem is more that (at least for FCOS) we were never able to gain access to non-x86_64 kube/OpenShift clusters, so we could never take advantage of the features of gangplank. We have been using the gangplank "podman remote" mode for multi-arch builds, but that mode is less valuable than the other modes of gangplank, and we can use podman remote directly.

@cgwalters
Member

Yeah, I understand. But OTOH, while apparently Fedora Infra only has x86_64, RHEL/RHCOS does have OCP clusters with all target architectures. I hope that the Fedora Infra thing is going to be fixed (as we've discussed outside of this issue) - there can't be any blockers to using OCP on aarch64 in e.g. AWS for Fedora right?

Anyway... I think we're agreeing, or at least not disagreeing. It makes sense to use podman --remote directly. But... I am not so sure it makes sense to write nontrivial wrappers around that in Python rather than in Go, specifically making use of the Go integrations with the container ecosystem that we already have in the gangplank code.

@travier
Member

travier commented Jul 11, 2022

We won't be able to have OCP clusters for all architectures for RHCOS in the future AFAIK, only individual nodes.

@cgwalters
Member

> We won't be able to have OCP clusters for all architectures for RHCOS in the future AFAIK, only individual nodes.

I can't imagine that's true; I mean, simply to even test what we're shipping we need to run clusters...

@cgwalters
Member

OK, new tweaked proposal: after #2979 lands and fedora-coreos-pipeline switches over to it from gangplank, we:

@dustymabe
Member Author

I'm OK with leaving the code around, but I'd really like not to get alerts (e.g. from dependabot) about things that need updating for it if we're not using it. If we can get away from that, then I'm happy.

@cgwalters
Member

Yep; by moving the vendor directory to the toplevel, we will drop out everything that's not used by cosa/mantle. (And then, hopefully shortly after, make mantle into a Go library too so we get down to one vendor directory)

@dustymabe
Member Author

> Yep; by moving the vendor directory to the toplevel, we will drop out everything that's not used by cosa/mantle.

I think what you mean by that is:

  • move the mantle/vendor directory to the toplevel (merging with the now existing toplevel vendor dir)
  • delete the gangplank/vendor directory and remove steps to build gangplank

Is that right?

@dustymabe
Member Author

Some work on this front to remove our use of gangplank in the FCOS pipeline:

Once that second one merges we'll no longer be using gangplank for FCOS and I don't think there are any other current uses.

cgwalters added a commit to cgwalters/coreos-assembler that referenced this issue Jul 20, 2022
Following the plan in coreos#2860 (comment)

This will help us move the Go code to the toplevel; we're going from
*three* vendor/ directories to two.  Then when we merge the mantle/
bits in we'll go down to the sanity of one vendor/ tree.

To emphasize: There's a lot of good ideas in gangplank, we're going
to be pulling *some* of this code for sure, so keep the code around
to make that explicitly easier.
cgwalters added a commit to cgwalters/coreos-assembler that referenced this issue Jul 21, 2022
dustymabe pushed a commit that referenced this issue Jul 21, 2022
@dustymabe
Member Author

With #2979 and coreos/fedora-coreos-pipeline#567 our pipelines have now been converted to using podman --remote directly for multi-arch builds.

As of #3001 gangplank is no longer used or built, but the code is still in the code base for future reference/use.

We got rid of the vendor directory so all vendored deps have also been dropped.

I'm thinking this can be closed now.

jlebon added a commit to jlebon/coreos-ci-lib that referenced this issue Oct 17, 2022
We've moved away from Gangplank for now in favour of `podman remote`:
coreos/coreos-assembler#2860
9 participants