Support hybrid operators #670

Closed
hasbro17 opened this issue Oct 30, 2018 · 18 comments
Labels
  • design
  • kind/feature Categorizes issue or PR as related to a new feature.
  • language/ansible Issue is related to an Ansible operator project
  • language/helm Issue is related to a Helm operator project
  • lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness.
  • needs discussion
Milestone

Comments

@hasbro17
Contributor

hasbro17 commented Oct 30, 2018

EDIT(@estroz): this issue originally addressed a bug where users could create apis, controllers, and k8s deepcopy code in non-Go projects. Since significant discussion related to hybrid operators is present here, I changed the title and am keeping this issue open to further track hybrid operator discussion. #672 fixed the original bug.


Original issue text:

The following SDK commands are only relevant to a Go type project and should not run in another project type like Ansible (or Helm in the future).

  • operator-sdk add api ...
  • operator-sdk add controller ...
  • operator-sdk generate k8s
@hasbro17 hasbro17 added the kind/bug Categorizes issue or PR as related to a bug. label Oct 30, 2018
@estroz estroz self-assigned this Oct 30, 2018
@hasbro17 hasbro17 added needs discussion language/ansible Issue is related to an Ansible operator project labels Oct 30, 2018
@hasbro17
Contributor Author

We need to consider the use case of transitioning from a pure Ansible type project to a hybrid or Go type project before we restrict these commands from running in an ansible type project.

@shawn-hurley
Member

I was thinking that the steps for the transition are:

  1. Be in the GOPATH.
  2. Drop cmd/manager/main.go on disk (this will need to be the correct main.go, though maybe there is enough overlap between Helm and Ansible here that we can make this really easy with only one file?).
  3. Add/change the Dockerfile to incorporate the binary.
  4. Add the dependencies for Go.

All of these are steps I could document today, and we would have a transition path, before or after operator-sdk add api is called. Maybe it makes sense to require these steps before you can call those commands? I am torn here because I could see myself creating the API first and then doing the main file bits, but on the other hand sometimes taking away options just leads to fewer bugs.

I think either way the above commands must be run in the GOPATH; is that correct?

@joelanford
Member

  1. Drop cmd/manager/main.go on disk (this will need to be the correct main.go, though maybe there is enough overlap between Helm and Ansible here that we can make this really easy with only one file?).

@shawn-hurley I think the only differences would be loading the watches.yaml file and calling Add() with the correct controller. Even if those aren't the only differences, I would think we could get there.

Also, do we need to consider transitions between or hybrid combinations of arbitrary types (e.g. Ansible to Helm, or hybrid Ansible/Helm)?

I think this discussion raises the question of what the strategy for the CLI design should be. The current design holds up well when there's only one operator type, but now that we'll have three, each adding new subcommands and/or implementing existing subcommands differently, some issues arise:

  • As a user, how do I know which subcommands make sense for my project?
  • If we support hybrid operators, what should the SDK do when more than one operator type is present in the project and a subcommand is executed that has different implementations for different operator types? Maybe we detect that multiple types exist and fail, telling the user that --type is required for hybrid operators?

Is it possible to invert the precedence of operator type and subcommand in the CLI so that the operator type drives what subcommands are available and makes the subcommand execution explicit?

@estroz
Member

estroz commented Nov 1, 2018

A combination of good transition/hybrid documentation and strict checks/plenty of warnings on CLI usage should be enough.

Instead of thinking of all the ways a user can combine operator types, we should be opinionated about how they should be combined, and to what extent combination is allowable, if at all. For example, what parts of an Ansible project can be customized with Go, and what cannot?

If we want to allow hybrid projects, we need to reconsider how OperatorType is used since type checks are currently exclusive. I suggest we implement a bitfield.
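The bitfield idea could look something like the sketch below. The type and constant names are illustrative, not the SDK's actual OperatorType identifiers:

```go
package main

import "fmt"

// OperatorType is a bitfield so a project can be more than one type at
// once, replacing the current mutually exclusive type checks.
// Names here are hypothetical, not the SDK's real identifiers.
type OperatorType uint8

const (
	TypeGo OperatorType = 1 << iota
	TypeAnsible
	TypeHelm
)

// Has reports whether t includes the given type bit.
func (t OperatorType) Has(want OperatorType) bool { return t&want != 0 }

// IsHybrid reports whether more than one type bit is set
// (t is not zero and not a power of two).
func (t OperatorType) IsHybrid() bool { return t != 0 && t&(t-1) != 0 }

func main() {
	project := TypeAnsible | TypeGo // a hybrid Ansible/Go project
	fmt.Println(project.Has(TypeGo), project.Has(TypeHelm), project.IsHybrid())
	// prints: true false true
}
```

A command scoped to Go projects could then gate on project.Has(TypeGo) while still allowing other bits to be set, which is exactly the non-exclusive check hybrid projects need.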

@estroz estroz changed the title SDK commands for Go operators should only run in Go type projects Support hybrid operators Nov 7, 2018
@estroz estroz added kind/feature Categorizes issue or PR as related to a new feature. design and removed kind/bug Categorizes issue or PR as related to a bug. labels Nov 7, 2018
@hasbro17
Contributor Author

ref: #860
With #887 and #897 the operator-sdk now has a migrate command to transition Helm and Ansible operator projects to hybrid projects.

@joelanford and @mhrivnak I don't think we have docs or extra sections in the Ansible and Helm user guides showing an example of migrating to a hybrid project. We have the CLI reference, but I think an example would be beneficial.

Is there anything else that we would need with regards to hybrid operators before we close out this issue? We can follow up on the docs in separate issues.

@mhrivnak
Member

On the ansible side (haven't looked closely at the helm side, but might be similar) we have a working migrate command that might be useful. But we have not put any thought yet into specifically how we'll enable users to add their own logic. We don't have a good story yet for: "ok, I have a main.go file. What now?"

What we do have lays the groundwork for improving the experience, and it was important to do in concert with the other refactoring around where and how we're building the ansible/helm operator binary and base image. But it might not yet be useful enough to make noise about, which is why I haven't rushed to write docs about it.

The next step would be to spend a little time designing what we want to expose in main.go that would be useful to a user who wants to add some Go code to their Ansible or Helm operator. How that relates to this GitHub issue depends on how you prefer to track it. I'm happy if you want to close this as done and follow up with a separate design effort, or if you'd rather keep this open until it's fully baked.

@hasbro17
Contributor Author

Agreed. We should spend some time on the overall user story around hybrid operators.
I'll keep this issue open until we have something more concrete on that front.

@devnulled

Hi all. I'm an experienced programmer and also experienced with Kubernetes, but operators and golang are both new to me.

I'm in the process of developing a hybrid operator. I'm doing this by generating a Helm operator and then using the migrate command as I want the operator to depend on Helm templates for installs/uninstalls but write my own logic for anything outside of that.

I can keep a running log of the things I run into during this process and then provide some feedback in this ticket, if you'd like. I think I'm mostly just running into small things here and there that you probably wouldn't even notice, or that seem obvious to work around, given your experience developing the operator-sdk and Go itself, if that makes sense?

I'll probably end up submitting a patch for one of the small problems I'm currently running into, just so I can keep moving forward.

@joelanford
Member

@devnulled This is great timing actually. Yesterday, I opened #1186 for discussion about how to prepare the Helm and Ansible operator code for a v1.0.0 release of the SDK. The primary discussion point for that will be how we support hybrid operators.

We will likely be unexporting some of our existing packages/types/functions so that we have more flexibility to implement new features without dealing with backwards compatibility guarantees. However, we want to make sure we leave enough of the operator internals exposed to make hybrid operators useful.

I'd definitely be interested in understanding more about your hybrid operator and the issues you've been running into. Definitely don't hesitate to comment here with your issues. Even if they seem minor to you, it's likely others have run into similar things. The more we know about how people are using hybrid operators, the better we'll be able to support them.

@joelanford joelanford added the language/helm Issue is related to a Helm operator project label Mar 8, 2019
@devnulled

I think whatever problems I was having were somehow environment related, or maybe my dep cache was corrupted; I'm not sure. I eventually ended up clearing out dep, updating the operator-sdk again, regenerating the project, and then converting it to a hybrid one again, and it worked just fine. I wanted to make sure I wasn't doing something wrong before commenting here again, and I'm glad I did. So the issues I had were self-inflicted.

I can give you some feedback on how I conceptually plan to use hybrid operators, if that's helpful. I know you are looking more for feedback on what the API itself should be, but I'm just not that far along with building one yet.

Here's my own use case for why I prefer (at least conceptually) to create hybrid operators based on Helm operators:

As I see it, Helm seems to be the de facto package manager for Kubernetes at this point. Along those lines, I'd rather depend on community-developed Helm charts for software when possible, and be able to contribute back to them as well (which I've done a couple of times), than build the YAML by hand in a standard operator. I like that Helm handles all of the templating of YAML and has its own tools for debugging charts/YAML.

When you build an operator by hand with the Operator SDK, you end up with a whole bunch of statically compiled code that essentially represents configuration. In my opinion, that is a pretty hard thing to debug, and something that probably changes quite a bit until a given piece of software has matured. Also, when the Kubernetes API is updated, it seems like it would be a big pain to have to update all that static code. I'm especially not a fan of having what are essentially configuration templates live as generated, statically compiled code.

Depending on Helm templates also leaves you the option to deploy a service manually or via scripts if needed, rather than having to debug an operator in real time during some sort of fire in production. You can also depend on other Helm charts; an example would be Kafka depending on ZooKeeper. As I mentioned before, you have the facilities of Helm available to debug the charts/YAML, which lets your operator focus on functionality that is specific to your service.

So to me, an operator that uses a Helm chart to manage config/versioning/installs of services, and is left mostly to bake functionality on top of that to monitor/fix/tune/maintain and scale said service up/down, is a big win.

I'm in the process of building an operator to automatically manage a database platform and deal with the problems of running one in the cloud: monitoring for outages or performance problems and taking action.

The main use case I'll be tackling first is automating the work required to fix pods in StatefulSets when a node gets terminated and replaced with another one (the pod won't get rescheduled by Kubernetes because of how StatefulSets work). That means having to manually kill a pod in a StatefulSet if a node dies, do something with its PVC, start up a new pod, run any process needed to move data again, deal with mounting local storage, rejoin the cluster, etc.

@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci-robot openshift-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 21, 2019
@lilic
Member

lilic commented Jun 21, 2019

/remove-lifecycle stale

Still relevant.

@openshift-ci-robot openshift-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 21, 2019
@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci-robot openshift-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 19, 2019
@openshift-bot

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci-robot openshift-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Oct 19, 2019
@hasbro17
Contributor Author

/lifecycle frozen

@openshift-ci-robot openshift-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. labels Oct 20, 2019
@estroz estroz removed their assignment Jan 29, 2020
@estroz estroz added this to the v1.0.0 milestone Mar 2, 2020
@huizengaJoe

@devnulled your comment
#670 (comment)
really resonates. I know it's an old post, but I'm wondering if you pursued your hybrid approach and might have some more insight.

@devnulled

@huizengaJoe I'm no longer at the company where I was working on that solution, and it's been some time since I left. It's something I will likely end up digging back into within the next year or so, I think.

@varshaprasad96
Member

Hybrid operators are currently supported by the SDK. The current hybrid model scaffolds Go and Helm APIs together: https://github.com/operator-framework/helm-operator-plugins/tree/main. Closing this issue for now. If any follow-up is needed, we can pick it up later.
