
Exploration: WASM in Contour #4276

Closed
5 tasks
sunjayBhatia opened this issue Jan 18, 2022 · 12 comments
Labels: kind/feature, lifecycle/needs-triage, lifecycle/stale

Comments

@sunjayBhatia
Member

sunjayBhatia commented Jan 18, 2022

Users have often requested the ability to provide custom logic for handling traffic (particularly HTTP requests and responses) as part of their ingress configuration. Lua scripting is a frequent request, as it is a common pattern in proxies like nginx and Envoy supports it. We have so far been reluctant to support these features because of the validation and support burden that may fall on the Contour maintainers. WASM support in Envoy gives us another way to fulfill similar use cases, with clearer boundaries that can guard against misuse and misconfiguration.

This issue tracks the explorations and work that may be required if Contour is to support user-configured WASM modules. Based on what we find here, we can come up with a design for an implementation or a detailed document explaining why we may not pursue this feature.

In particular, we would like to ensure we provide a good user experience, handle failures gracefully, and provide a solution that is operationally smooth.

There are a few things that we believe are prerequisites:

Some implementation questions to solve or investigate:

  • How are WASM modules delivered to Envoy?
    • Volume-mounted files? Something else?
  • Do we allow precompiled modules?
  • Are there any particular WASM VM options that should be configurable? User code specific options?
  • If a user provides a valid (compiles and is delivered to Envoy) WASM module that has runtime failures, how does the user get notified?
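On the delivery question: Envoy's WASM filter can load module code either from a local file or from a remote HTTP source, and for remote sources Envoy expects a SHA-256 digest of the module. Whichever route Contour picks, validating user input before it reaches Envoy could catch some mistakes that would otherwise surface as NACKs. The Go sketch below is one way to do that; `WasmSource`, `Validate`, and `CheckModule` are hypothetical names, not Contour API. It checks the 8-byte header every binary WASM module starts with (the `\0asm` magic followed by version 1).

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// wasmHeader is the preamble of every binary WASM module:
// the magic bytes "\0asm" followed by version 1, little-endian.
var wasmHeader = []byte{0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00}

// WasmSource is a hypothetical user-facing description of where a module lives.
// Exactly one of LocalPath or HTTPURI should be set.
type WasmSource struct {
	LocalPath string // path as seen by Envoy, e.g. on a mounted volume
	HTTPURI   string // remote location Envoy would fetch the module from
	SHA256    string // hex digest; required here for remote sources
}

// Validate checks the description without touching the filesystem or network.
func (s WasmSource) Validate() error {
	switch {
	case s.LocalPath != "" && s.HTTPURI != "":
		return errors.New("set exactly one of LocalPath or HTTPURI")
	case s.LocalPath != "":
		if !strings.HasPrefix(s.LocalPath, "/") {
			return fmt.Errorf("local path %q must be absolute", s.LocalPath)
		}
		return nil
	case s.HTTPURI != "":
		u, err := url.Parse(s.HTTPURI)
		if err != nil || (u.Scheme != "http" && u.Scheme != "https") || u.Host == "" {
			return fmt.Errorf("invalid module URI %q", s.HTTPURI)
		}
		if len(s.SHA256) != 64 {
			return errors.New("remote modules need a hex-encoded SHA-256 digest")
		}
		if _, err := hex.DecodeString(s.SHA256); err != nil {
			return fmt.Errorf("sha256 is not valid hex: %w", err)
		}
		return nil
	default:
		return errors.New("no module source given")
	}
}

// CheckModule verifies that data looks like a binary WASM module and returns
// its hex SHA-256 digest, which could be compared against WasmSource.SHA256.
func CheckModule(data []byte) (string, error) {
	if !bytes.HasPrefix(data, wasmHeader) {
		return "", errors.New("not a binary WASM module (bad magic or version)")
	}
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	src := WasmSource{HTTPURI: "https://example.com/filter.wasm", SHA256: strings.Repeat("a", 64)}
	fmt.Println(src.Validate()) // <nil>
	_, err := CheckModule([]byte("not wasm"))
	fmt.Println(err != nil) // true
}
```

This only catches structural mistakes; a module that passes these checks can still fail to instantiate or misbehave at runtime, which is what the last question above is about.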

Links:

sunjayBhatia added the kind/feature and lifecycle/needs-triage labels Jan 18, 2022
@sunjayBhatia
Member Author

cc @skriss, in case you have anything to change or add since you started investigating.

@youngnick
Member

I like the outline. I added a link to #1176, since that's the NACK issue.

@xaleeks

xaleeks commented Apr 4, 2022

Some other questions that might be worth considering:

  • The performance impact if every request now passes through a WASM extension, e.g. one doing regex matching or manipulating a header or body
  • How to define the execution priority of WASM extensions provided by end users relative to existing filters we implement, such as external auth and rate limiting
  • Whether we can use this to implement some commonly requested features, like JWT authentication, which is not covered by the ongoing work on OIDC auth support

@github-actions

The Contour project currently lacks enough contributors to adequately respond to all Issues.

This bot triages Issues according to the following rules:

  • After 60d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, the Issue is closed

You can:

  • Mark this Issue as fresh by commenting
  • Close this Issue
  • Offer to help out with triage

Please send feedback to the #contour channel in the Kubernetes Slack

github-actions bot added the lifecycle/stale label Dec 10, 2022
@PKizzle
Copy link

PKizzle commented Dec 10, 2022

I think this issue is still not solved, and custom request handling is one of the bigger missing features, in my opinion.

github-actions bot removed the lifecycle/stale label Dec 11, 2022
@izturn
Member

izturn commented Jan 3, 2023

I'd like to implement this

@sunjayBhatia
Member Author

Before we move toward implementing this, we'll need a pretty thorough design doc addressing the questions above, and probably an accompanying spike we can play around with as an example.

If you would like to work on this let us know!

@wilsonwu
Member

A summary of my thoughts on WASM support:

  1. How are WASM modules delivered to Envoy?
    a. Use a local file path or HTTP to load Wasm plugins into Envoy; Contour only needs to tell Envoy the path or address.
    b. In Istio's implementation, Wasm modules are handled through containerization: an init container in the same pod as Envoy provides the Wasm package, which Envoy then finds and loads. This approach is more resource-intensive than the first, as it requires building images and allocating resources. However, it allows Wasm modules to be managed with image-management techniques, including security and version control. (Probably not recommended, as it adds another layer of complexity.)

  2. Do we allow precompiled modules?
    For Contour, if there are risks, we could leave this disabled at first and enable it later, which is relatively easy.
    The reason: enabling precompilation can increase the occurrence of NACKs.

  3. Are there any particular WASM VM options that should be configurable? User code specific options?
    Suggest V8 as the engine for now.
    For root_id and more specific configs, suggest providing some common default configs first.

  4. If a user provides a valid (compiles and is delivered to Envoy) WASM module that has runtime failures, how does the user get notified?
    From the Contour side: the same as normal NACK or ACK handling.
      Export metrics from Contour.
      Log from Contour.
    On the Envoy side, at runtime:
      Envoy metrics are better.
      Use Envoy logs for notification for now.

  5. performance impact if every request is now suddenly undergoing WASM extension like regex or manipulating a header or body
    According to a conversation with a colleague who is an Istio expert, Wasm can lose about 50% of performance compared with native Envoy functions.

  6. how to define priority of execution of WASM extensions provided by end users versus existing filter chains that we implement like external auth and rate limiting
    Following Istio's design, add a phase for Wasm filters (which we can contribute).

  7. can we use this to implement some commonly asked features like JWT authentication which is not covered by ongoing work on OIDC auth support
    Yes, we can do it; it is not a blocker.

The questions below relate to #1176:

  1. For NACK
    Istio does not handle NACKs in a special way.
    Consider implementing a DAG cache to reduce the impact of incorrect configuration on Envoy.

  2. How to handle the initial conditions.
    There is no way to avoid an incorrect initial configuration, so based on Contour's existing logic it doesn't seem necessary to handle this case.

  3. Distribute the correct DAG to Envoy as configuration through a cache or a file.
    If Contour restarts and finds that the configuration is incorrect, it needs to deserialize the DAG from the file and distribute it to Envoy as the correct configuration.
    The DAG in the cache and the DAG in the file must be kept consistent.

@skriss
Member

skriss commented Feb 14, 2023

@wilsonwu I'm also interested to hear your thoughts on #5038 as a possible related item here.

@tsaarni
Member

tsaarni commented Feb 14, 2023

Great write-up!

There is also some interesting discussion about various extension mechanisms in envoyproxy/envoy#15152, which led to the addition of the Golang filter.

One question that came to mind regarding invalid configuration: do we see a difference between, e.g., Lua filters that an individual user injects in a namespace and a WASM or Go extension configured by the administrator as part of the global config? Is NACK handling equally relevant in the latter case, or mostly just in the former?

BTW: one interesting use case for the extension could be integrating a Web Application Firewall (WAF) such as coraza or curiefense.


github-actions bot added the lifecycle/stale label Apr 16, 2023

8 participants