Feature Request: Plugins Phase 2 #1378

Closed
estroz opened this issue Feb 20, 2020 · 11 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness.
Comments

@estroz
Contributor

estroz commented Feb 20, 2020

Overview

Phase 1 of the proposed scaffold plugin system is being implemented (#1290) at the time of writing. Progress on phase 2, which adds support for chaining plugins together (more below), will begin shortly afterwards, starting with a proposal doc. Before writing that doc, which will contain implementation details for chaining, I'd like to start a general discussion of how chaining will occur.

Phase 2 requirements, summarized from previous discussions
  1. mdbook-like structure
    1. A Plugin receives a JSON blob "universe" from upstream plugins containing file "nodes", each of which contains file metadata (a rough sketch of such a model follows this list).
    2. The plugin can add or delete nodes (nodes are not mutable), then returns the modified universe so downstream plugins can do the same.
  2. A debug function dumps the universe with configurable verbosity and some intelligence as to which node failed (the latter is a nice-to-have).
  3. Pre-generate a project file (for complex plugin config use cases).
  4. Enhanced "layout" PROJECT file key.
    1. Either an ordered list of plugin {name}/{version} entries, or an ordered map of {name}/{version} to plugin-specific config.
    2. Order matters for certain plugins, e.g. a bazel plugin would need to run after all Go files are generated.
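
A rough sketch of what such a universe model could look like (all names here are hypothetical, not a final API): a JSON-serializable set of immutable file nodes keyed by path, which a plugin may only add to or delete from.

// Hypothetical sketch of the requirement-1 universe model; names are illustrative only.
package chaining

import "fmt"

// Node carries the metadata of a single scaffolded file.
type Node struct {
	Path     string `json:"path"`
	Contents string `json:"contents"`
	// Further metadata (permissions, if-exists behavior, ...) could live here.
}

// Universe is the JSON blob passed from plugin to plugin.
type Universe struct {
	Nodes map[string]Node `json:"nodes"`
}

// Add inserts a new node; existing nodes are never mutated in place.
func (u *Universe) Add(n Node) error {
	if _, exists := u.Nodes[n.Path]; exists {
		return fmt.Errorf("node %q already exists; delete it first", n.Path)
	}
	u.Nodes[n.Path] = n
	return nil
}

// Delete removes a node so a downstream plugin can re-add a replacement.
func (u *Universe) Delete(path string) {
	delete(u.Nodes, path)
}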

Initial thoughts

Hypothetical CLI setup: a global --plugins flag that takes an ordered list of plugin names. The subcommand invoked with --plugins executes "downstream" plugins from that list that match those the kubebuilder binary knows about.

Given that we want to pass some initial universe between plugins, that universe should be initialized by the plugin invoked via the CLI (the "base" plugin). For example, kubebuilder init --plugins addon will invoke an Init plugin, which passes its generated state to the addon plugin; the addon plugin modifies the scaffolded files, which are then written to disk.
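
A rough sketch of that flow, reusing the hypothetical Universe and Node types from the sketch above (the ChainablePlugin interface and the chaining loop are assumptions, not existing kubebuilder API):

package chaining

import "fmt"

type ChainablePlugin interface {
	Name() string
	// Run receives the universe from upstream plugins and returns the
	// (possibly extended or reduced) universe for downstream plugins.
	Run(u Universe) (Universe, error)
}

// runChain lets the base plugin build the initial universe, then hands it to
// each downstream plugin in the order given on the CLI via --plugins.
func runChain(base ChainablePlugin, downstream []ChainablePlugin) (Universe, error) {
	u, err := base.Run(Universe{Nodes: map[string]Node{}})
	if err != nil {
		return Universe{}, fmt.Errorf("base plugin %s: %w", base.Name(), err)
	}
	for _, p := range downstream {
		if u, err = p.Run(u); err != nil {
			return Universe{}, fmt.Errorf("plugin %s: %w", p.Name(), err)
		}
	}
	// The CLI, not the individual plugins, would finally write u to disk.
	return u, nil
}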

Open questions

Assuming this flow is what we want, a few questions pop up:

  1. For the initial state: at what point do we pass a base plugin's state to downstream plugins?
    1. Approach 1: aggregate files being scaffolded and generated by the base plugin in a universe and pass that universe to a downstream plugin
      • Pro: fully encapsulates all plugin logic within Run().
      • Pro: all files have been scaffolded and post-scaffolding has been run by this point so we don't have to thread a bunch of plugin execution logic through scaffolds, just filepath.Walk the whole directory.
      • Con: requires reading the entire directory after scaffolding and post-scaffolding code runs, since external generators (controller-gen or go mod init) will have modified on-disk state.
    2. Approach 2: pass a universe subset to plugins on each scaffold event (each call to Scaffold.Execute())
      • Pro: plumbing already exists to do this; we'd need to change method signatures of existing plugins, some types etc. In this case, the universe is a model.Universe and a node is a model.File.
      • Pro: the scaffolder controls what is being written to disk, with all the features that the current plumbing provides.
      • Pro: writing a debug function may be easier than in approach 1, since we have more fine-grained control with per-scaffold event plugin execution.
      • Con: model.Universe will have to be modified to guarantee node immutability.
      • Con: A downstream plugin won't receive the full universe, just portions on each scaffold event, and will never "see" post-scaffolded files.
      • Con: each downstream plugin will be run on each scaffolding event, which may cause issues (in the naive implementation) if Run() is not idempotent. A non-naive implementation may require aggregation of scaffold events.
  2. How can we encourage, if not enforce, modification of a universe over just writing files to disk in Run()?
    • Perhaps we could modify Run() to return a list of files to add to or delete from a universe rather than having Run() perform those writes itself.
  3. Do we want PROJECT to be present in the universe passed between plugins?
    • Probably not, since we gate access via Read() vs. Load().

For 1., the first approach makes the most sense to me given its pros, but reading the full directory after Run() completes feels like a heavy cost to incur for one plugin execution.
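
A rough sketch of approach 1, continuing the hypothetical chaining package above: after the base plugin (and any external generators such as controller-gen or go mod init) have written to disk, walk the project directory and aggregate everything into the universe handed to downstream plugins.

package chaining

import (
	"os"
	"path/filepath"
)

// universeFromDir reads the whole project directory back into a Universe.
// This is the "heavy cost" mentioned above: every scaffolded and generated
// file is re-read from disk once per CLI invocation.
func universeFromDir(root string) (Universe, error) {
	u := Universe{Nodes: map[string]Node{}}
	err := filepath.Walk(root, func(path string, info os.FileInfo, err error) error {
		if err != nil || info.IsDir() {
			return err
		}
		b, err := os.ReadFile(path)
		if err != nil {
			return err
		}
		rel, err := filepath.Rel(root, path)
		if err != nil {
			return err
		}
		return u.Add(Node{Path: rel, Contents: string(b)})
	})
	return u, err
}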

More context

Related issues: #1249
Related PR's: #1250, #1290

/kind feature

@estroz estroz added the kind/feature Categorizes issue or PR as related to a new feature. label Feb 20, 2020

@Adirio
Contributor

Adirio commented Feb 20, 2020

A few comments on model.Universe. I'm working on a couple of PRs that end up modifying it a bit. Basically, the file models will be stored in a map with string keys (their paths) so that they can be accessed without having to iterate over all of them:

// Universe describes the entire state of file generation
type Universe struct {
	// Config stores the project configuration
	Config *config.Config `json:"config,omitempty"`

	// Boilerplate is the copyright comment added at the top of scaffolded files
	Boilerplate string `json:"boilerplate,omitempty"`

	// Resource contains the information of the API that is being scaffolded
	Resource *resource.Resource `json:"resource,omitempty"`

	// Files contains the model of the files that are being scaffolded
	Files map[string]*file.File `json:"files,omitempty"`
}

One of the main goals behind those PRs is to have file update logic inside Scaffold.Execute. That is, updating a file will also be done by including it in the list of files passed to an Execute call. I don't know if you were considering file updates, but right now they are out-of-execute file modifications, like controller-gen or go mod init. If they get included in Execute instead of writing the changes directly to disk, there will be one fewer source of out-of-execute file modifications. Could we also follow this approach for post-scaffolding scripts? Instead of actually running the script to generate the file on disk, we could run it and capture the output as a []byte or string inside kubebuilder so that it can be inserted into a model.Universe. This would prevent the whole-directory read that you are mentioning.
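
A minimal sketch of that idea (the generator name, target path, and map type are placeholders, not the real scaffold machinery): run the post-scaffolding command inside kubebuilder, capture its stdout as bytes, and insert the result into the universe instead of letting the command write to disk itself.

package scaffolds

import (
	"fmt"
	"os/exec"
)

// captureIntoUniverse runs cmdName with args and stores its output under
// path in a files-by-path map, standing in for the proposed
// map[string]*file.File in model.Universe.
func captureIntoUniverse(files map[string][]byte, path, cmdName string, args ...string) error {
	out, err := exec.Command(cmdName, args...).Output()
	if err != nil {
		return fmt.Errorf("running %s: %w", cmdName, err)
	}
	files[path] = out
	return nil
}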

About passing the configuration file info: I do think it should be passed, in order to centralize access to this file. This would also allow plugins to update it; kubebuilder could diff the configuration passed to and returned by each plugin and decide whether to accept the change. For example, a plugin should probably not be able to update the repo, domain, version, or plugins fields, but adding a resource may be allowed for some plugins, and thus we could have some sort of authorization mechanism for plugins.
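
A sketch of that gating idea, with made-up field names rather than the real config types: compare the config a plugin received with the one it returned, reject changes to protected fields, and allow additive changes such as new resources.

package configgate

import (
	"fmt"
	"reflect"
)

// ProjectConfig is a stand-in for the real project configuration.
type ProjectConfig struct {
	Repo      string
	Domain    string
	Version   string
	Plugins   []string
	Resources []string
}

// acceptChanges rejects edits to protected fields but lets the resource
// list grow; a real mechanism could be per-plugin and more fine-grained.
func acceptChanges(before, after ProjectConfig) error {
	if before.Repo != after.Repo ||
		before.Domain != after.Domain ||
		before.Version != after.Version ||
		!reflect.DeepEqual(before.Plugins, after.Plugins) {
		return fmt.Errorf("plugin attempted to modify a protected config field")
	}
	if len(after.Resources) < len(before.Resources) {
		return fmt.Errorf("plugin attempted to remove a resource")
	}
	return nil
}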

@estroz
Contributor Author

estroz commented Feb 20, 2020

@Adirio great to hear that you're working on integrating file Update logic with Scaffold.Execute. I was doing something similar to test out some theories for (1ii).

Concerning out-of-execute file generation/updates: if we can figure out a general method of piping bytes from a binary into the Go runtime, sure. AFAIK that isn't possible for 2 reasons:

  1. We don't know what external (arbitrary) generators are being run, so we can't control how plugins not implemented by kubebuilder write bytes. We could possibly run generators in a temp dir and read files back, although this approach becomes problematic considering point 2 below.
  2. Generators often require some on-disk state to exist first (e.g. controller-gen requires an APIs dir to exist), so we'd have to write everything in the universe to disk first, run the generators, then collect the output; a rough sketch of that flow follows this list.
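
A hedged sketch of that temp-dir flow (the map type and helper are placeholders): materialize the current universe in a temporary directory, run the external generator there, then read the directory back into memory.

package scaffolds

import (
	"os"
	"os/exec"
	"path/filepath"
)

// runGeneratorInTempDir writes the in-memory files to a temp dir (point 2),
// runs the generator against it, and reads the results back (point 1).
func runGeneratorInTempDir(files map[string][]byte, cmdName string, args ...string) (map[string][]byte, error) {
	dir, err := os.MkdirTemp("", "kb-plugin-")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)

	// Generators need on-disk state, so write the universe out first.
	for path, contents := range files {
		full := filepath.Join(dir, path)
		if err := os.MkdirAll(filepath.Dir(full), 0o755); err != nil {
			return nil, err
		}
		if err := os.WriteFile(full, contents, 0o644); err != nil {
			return nil, err
		}
	}

	// Run the external generator against the temporary copy.
	cmd := exec.Command(cmdName, args...)
	cmd.Dir = dir
	if err := cmd.Run(); err != nil {
		return nil, err
	}

	// Read the (possibly modified and extended) files back into memory.
	updated := map[string][]byte{}
	err = filepath.Walk(dir, func(path string, info os.FileInfo, err error) error {
		if err != nil || info.IsDir() {
			return err
		}
		b, err := os.ReadFile(path)
		if err != nil {
			return err
		}
		rel, err := filepath.Rel(dir, path)
		if err != nil {
			return err
		}
		updated[rel] = b
		return nil
	})
	return updated, err
}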

Having kubebuilder control what exists in PROJECT after a plugin potentially modifies its structure makes sense. This exposes the config to arbitrary modification (outside of what kubebuilder allows, of course), permitting a plugin to have its own section of the config.

@Adirio
Contributor

Adirio commented Feb 20, 2020

@Adirio great to hear that you're working on integrating file Update logic with Scaffold.Execute. I was doing something similar to test out some theories for (1ii).

@estroz feel free to take a look at #1375

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 20, 2020
@estroz
Contributor Author

estroz commented May 20, 2020

/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 20, 2020
@hasbro17

So to refresh, the main requirements of plugins phase 2 are a true plugin architecture that supports:

  1. Plugins as separate binaries that can be discovered by the Kubebuilder CLI binary via user-specified plugin filepaths.
  2. Plugins that can be written in any language.

The existing experimental plugin interface outlines these aims as future work to replace the in-process demo plugin with an external-process plugin.

I wanted to propose exploring the use of HashiCorp's go-plugin system, which allows you to write plugins that communicate over gRPC. This would fulfill our above requirements, plus it's been tested with other CLI tools like Terraform, Vault, etc.
The examples demonstrate some basic plugins for bidirectional communication and how to write a plugin in Python.

Go has an official plugin package, but as far as I am aware it does not have cross-language support and seems to be intended more for in-tree plugins than for external users to write their own plugin code.

@DirectXMan12
Contributor

DirectXMan12 commented Sep 10, 2020

Go's plugin package is a bit of a mess, and basically not useful unless the plugins are built alongside the main process.

The only thing I'm hesitant about for the hashicorp go-plugin library is that it seems like it's built for long-running plugins and not short-lived plugins, but if it works well for that case, that seems fine. The alternative is just passing JSON/proto back and forth across stdin/stdout, and logging on stderr.
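
A minimal sketch of that stdio alternative (the request/response shapes are made up for illustration): the CLI execs the external plugin, writes the universe as JSON to its stdin, reads the modified universe as JSON from its stdout, and leaves stderr attached for logging.

package extplugin

import (
	"bytes"
	"encoding/json"
	"os"
	"os/exec"
)

// PluginRequest and PluginResponse are illustrative wire formats only.
type PluginRequest struct {
	Command  string            `json:"command"`
	Universe map[string]string `json:"universe"`
}

type PluginResponse struct {
	Universe map[string]string `json:"universe"`
	Error    string            `json:"error,omitempty"`
}

// callExternalPlugin round-trips one JSON request through the plugin binary.
func callExternalPlugin(path string, req PluginRequest) (PluginResponse, error) {
	var resp PluginResponse

	in, err := json.Marshal(req)
	if err != nil {
		return resp, err
	}

	var out bytes.Buffer
	cmd := exec.Command(path)
	cmd.Stdin = bytes.NewReader(in)
	cmd.Stdout = &out
	cmd.Stderr = os.Stderr // plugin logs stay on stderr

	if err := cmd.Run(); err != nil {
		return resp, err
	}
	err = json.Unmarshal(out.Bytes(), &resp)
	return resp, err
}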

@rashmigottipati
Contributor

Based on our investigation into plugin libraries such as Go's built-in plugin package and HashiCorp's go-plugin library, we have come to the conclusion that it is more suitable to write our own custom typed plugin library.

The built-in plugin library seems to be a non-starter, as it is geared toward in-tree rather than out-of-tree plugins and it doesn't offer cross-language support. HashiCorp's go-plugin system is more suitable than the built-in library because it enables cross-language/platform support; however, it is better suited to long-running plugins as opposed to short-lived ones, and its use of protobuf could be overkill.
For the stated reasons, we would like to write our own plugin system that passes JSON blobs back and forth across stdin/stdout.

The plugin specification will include type metadata to allow the potential for using other plugin libraries in the future if the need arises. From an implementation standpoint, we could create a new implementation of our plugin interface based on the type, and do a type switch on the type of plugin the data is being passed to. For example, if it is a native Go plugin, we pass the universe directly, and if it is a binary wrapper, we pass it the serialized stream of JSON bytes.
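
An illustrative sketch of that dispatch (the interface and type names are assumptions, not the final design): native Go plugins get the universe directly, while external binary plugins get serialized JSON over stdio.

package dispatch

import (
	"encoding/json"
	"fmt"
)

// Universe stands in for the real in-memory model.
type Universe map[string][]byte

// NativePlugin runs in-process and receives the universe directly.
type NativePlugin interface {
	Run(u Universe) (Universe, error)
}

// BinaryPlugin wraps an external executable that speaks JSON over stdio.
type BinaryPlugin interface {
	RunJSON(in []byte) (out []byte, err error)
}

// runPlugin dispatches on the plugin's type, as described above.
func runPlugin(p interface{}, u Universe) (Universe, error) {
	switch pl := p.(type) {
	case NativePlugin:
		return pl.Run(u)
	case BinaryPlugin:
		in, err := json.Marshal(u)
		if err != nil {
			return nil, err
		}
		out, err := pl.RunJSON(in)
		if err != nil {
			return nil, err
		}
		var updated Universe
		err = json.Unmarshal(out, &updated)
		return updated, err
	default:
		return nil, fmt.Errorf("unsupported plugin type %T", pl)
	}
}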

@camilamacedo86 camilamacedo86 added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed triage/accepted Indicates an issue or PR is ready to be actively worked on. labels Nov 3, 2020
@camilamacedo86 camilamacedo86 added this to the next milestone Nov 3, 2020
@camilamacedo86
Member

camilamacedo86 commented Jun 1, 2021

Just to share: RFE #2198 seems like one more motivation for this use case.

@camilamacedo86
Member

This is partially done, and we have a meta issue, #2600, that is required as a follow-up.
Therefore, I am closing this one.
