Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Custom Webhooks #19307

Open
wants to merge 10 commits into
base: main
Choose a base branch
from
Open

WIP: Custom Webhooks #19307

wants to merge 10 commits into from

Conversation

jolheiser
Copy link
Member

@jolheiser jolheiser commented Apr 2, 2022

This is a draft of custom webhooks.

It will close #14693 as well, as it includes both HTTP and binary-based webhooks.
This should also resolve #1089, at least as closely as I can think of without a true plugin system.

A configuration looks like this:

Executable

id: executable          # This is used to map the webhook type to this webhook
label: Executable       # The display name of the webhook
docs: https://google.com # The URL to the webhook's documentation (if any)
exec: ["./executable"]  # Command args to run
form:
  - id: username        # Required
    label: Username     # Required, the label for the input
    type: text          # Required, the type of input (text, number, bool, secret)
    required: true      # Optional, defaults to false
  - id: favorite-number
    label: Favorite Number
    type: number
    default: 12        # Optional, a default value
  - id: pineapple
    label: Pineapple on Pizza
    type: bool
  - id: passphrase
    label: Passphrase
    type: secret

Executables are executed relative to the path the custom webhook lives in, which is currently CUSTOM_PATH/webhooks

HTTP

id: http
label: HTTP
docs: https://duckduckgo.com
http: "https://dev.jolheiser.com" # The URL to send the payload to
# Same form config as above

Each custom webhook requires 2-3 files, namely an image.png, a config.yml, and possibly an executable if required.

On startup, Gitea scans the custom webhook directory (again, currently set to CUSTOM_PATH/webhooks) and adds them to a map of webhooks.

The payload sent to either type of webhook (via stdin for executables) is

{
  "form": {...},
  "payload": {...}
}

For executables, any headers are set in the environment as the header name, ie X-Gitea-Signature


Notes:

  • What to do if a custom webhook is ever removed?
    • Do I set all matching ones to inactive, or remove them? I lean towards the former.
  • I want to get this implementation as solid as possible prior to moving forward, so as to not create a fragile custom webhook ecosystem for developers.

@jolheiser jolheiser added type/feature Completely new functionality. Can only be merged if feature freeze is not active. pr/wip This PR is not ready for review type/changelog Adds the changelog for a new Gitea version labels Apr 2, 2022
@jolheiser jolheiser added this to the 1.17.0 milestone Apr 2, 2022
@wxiaoguang

This comment was marked as outdated.

@GiteaBot GiteaBot added the lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. label Apr 3, 2022
@jolheiser
Copy link
Member Author

jolheiser commented Apr 3, 2022

And even as now, users can write their own webhook delegators to receive the calls from Gitea by Gitea Webhook and re-deliver them.

Thats essentially what the custom http hooks do.
The config allows custom fields, but past that the payload is essentially a gitea webhook.

While I am in favor of writing bridge software, it seems to have not caught on for one reason or another. Instead we keep adding more and more hard-coded webhooks.

Running external commands was mostly a shim for people who didn't want to run extra services for a bridge.

@wxiaoguang
Copy link
Contributor

wxiaoguang commented Apr 3, 2022

I have no objection to introduce a local command mechanism (although I think it's not the solution for cloud native, if there is a chance, we could use some other methods ....).

(update: I misread some context, I only worry about the local command. Indeed HTTP is cool and fine!)

If the local command is a must, then it could be kept as simple as possible. For example, maybe Gitea shouldn't do the form things. Gitea can provide a textarea to let users input their customized yaml config, and use a yaml-schema-validator to validate the config with the pre-defined yaml schema in the webhook definition.

@jolheiser
Copy link
Member Author

This is both local command and http.

As for form, that was how I had imagined letting webhook makers define a limited UI to be passed to their webhook.

Regarding pasting configs, while I suppose that's possible, I was thinking this would more admin controlled rather than user.

Perhaps details should be discussed in Discord or the forum? I'm happy to figure out a good solution, as our webhook code continues to get more complex and messy as we add more hook types and even different hook scopes.

Signed-off-by: jolheiser <john.olheiser@gmail.com>
Signed-off-by: jolheiser <john.olheiser@gmail.com>
Signed-off-by: jolheiser <john.olheiser@gmail.com>
Signed-off-by: jolheiser <john.olheiser@gmail.com>
Signed-off-by: jolheiser <john.olheiser@gmail.com>
@codecov-commenter
Copy link

codecov-commenter commented May 13, 2022

Codecov Report

Merging #19307 (cdec7a6) into main (59b30f0) will decrease coverage by 0.13%.
The diff coverage is 35.66%.

❗ Current head cdec7a6 differs from pull request most recent head b4eda5e. Consider uploading reports for the commit b4eda5e to get more accurate results

@@            Coverage Diff             @@
##             main   #19307      +/-   ##
==========================================
- Coverage   47.39%   47.26%   -0.14%     
==========================================
  Files         956      961       +5     
  Lines      133115   133894     +779     
==========================================
+ Hits        63093    63282     +189     
- Misses      62393    62947     +554     
- Partials     7629     7665      +36     
Impacted Files Coverage Δ
cmd/hook.go 7.11% <0.00%> (ø)
cmd/serv.go 2.33% <0.00%> (ø)
cmd/web.go 0.00% <0.00%> (ø)
cmd/web_acme.go 0.00% <0.00%> (ø)
models/error.go 35.87% <ø> (-0.90%) ⬇️
models/org.go 62.99% <0.00%> (ø)
models/statistic.go 0.00% <0.00%> (ø)
models/webhook/webhook.go 70.00% <ø> (ø)
modules/context/permission.go 26.08% <0.00%> (ø)
modules/doctor/dbconsistency.go 4.54% <0.00%> (-0.52%) ⬇️
... and 130 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cafa2dc...b4eda5e. Read the comment docs.

@lunny
Copy link
Member

lunny commented May 13, 2022

How about to store them into database?

Signed-off-by: jolheiser <john.olheiser@gmail.com>
Signed-off-by: jolheiser <john.olheiser@gmail.com>
@jolheiser
Copy link
Member Author

How about to store them into database?

Store what in the database?

@lunny
Copy link
Member

lunny commented May 14, 2022

How about to store them into database?

Store what in the database?

The custom webhooks yaml content.

@jolheiser
Copy link
Member Author

How about to store them into database?

Store what in the database?

The custom webhooks yaml content.

Why? A custom webhook definition still requires an image and potentially an executable.

Unless we want to get rid of the executable idea, but imo there's benefit to not needing to spin up another server for a webhook translation.

@lunny
Copy link
Member

lunny commented May 14, 2022

How about to store them into database?

Store what in the database?

The custom webhooks yaml content.

Why? A custom webhook definition still requires an image and potentially an executable.

Unless we want to get rid of the executable idea, but imo there's benefit to not needing to spin up another server for a webhook translation.

Images could be svg or an embed one into data or uploaded into some places. And I also think we could use an embed javascript engine(i.e. https://github.com/dop251/goja ) to execute http request but an executable, because I think it's an overkill to run a command to fire a webhook. Or could you give some real cases to have to use a local command?

@luwol03
Copy link

luwol03 commented May 14, 2022

Why not having the web hook specifications in the repo under .gitea? And isn't this like some kind of CI feature?

@lunny
Copy link
Member

lunny commented May 14, 2022

Why not having the web hook specifications in the repo under .gitea? And isn't this like some kind of CI feature?

That would make this exclusively repo level webhook. Also yes, webhooks are more or less a CI feature.

For running a command, mostly the idea was ease of use. Instead of having to call out to a translation service. Embedding a JS engine limits it to only JS. Running a command makes custom webhooks accessible to even the smallest Gitea instances, where you drop a dir in custom and call it a day.

I'm not against making this purely an HTTP translation service, however that basically boils it down to the existing gitea webhook + some custom UI.

If that's the case, I almost think a register API would work, that way there's no DB involved at all.

I think we can implement HTTP only in this PR and discuss local command one in other PRs.

Why I prefer store them into database, because I think puting the custom webhooks in files may make multiple Gitea instances work difficult.

@jolheiser
Copy link
Member Author

jolheiser commented May 14, 2022

So we're putting a yaml config in the db.
The image will be stored in assets somewhere and link back to a webhook via some ID.

Pasting yaml and an image is arguably less friendly than being able to unzip (or git pull) to a directory, but that might depend on the person. Having things on disk is similar to how template customizations work.

Is this going to be a new page on the admin dashboard, then?

@lunny
Copy link
Member

lunny commented May 15, 2022

arguably

Since we have more and more configurations in ini file, I think we should move configurations out of ini file as possible.

I think yes. We could move some webhook settings from ini to database(#18058) and have a configuration UI there including which webhooks should be enabled. There is a button to create a custom webhook type, then we can go to a new page. We can upload an image and have a yaml edit box, there are example and some template variables explanation there and yml syntax highlight?

@jolheiser
Copy link
Member Author

Sounds like a lot of work for the user to gain a few UI boxes imo

Am HTTP-only version means there's very little difference between this and the existing generic gitea webhook. The main advantage is being able to define a limited UI, but if users need to click through pages and mannually copy/paste yaml to gain that UI, is there much advantage?

@lunny lunny modified the milestones: 1.17.0, 1.18.0 Jun 4, 2022
@oliverpool
Copy link
Contributor

Regarding the payload customization part, mentioned in

#19307 (comment)

past that the payload is essentially a gitea webhook.

While I am in favor of writing bridge software, it seems to have not caught on for one reason or another. Instead we keep adding more and more hard-coded webhooks.

and in #19307 (comment)

For running a command, mostly the idea was ease of use. Instead of having to call out to a translation service.
Embedding a JS engine limits it to only JS.
Running a command makes custom webhooks accessible to even the smallest Gitea instances, where you drop a dir in custom and call it a day.

Did you consider a system like bloblang ?
That way the (custom) webhook can easily adjust the payload (adding, removing fields / conditionals...)

root = {}
root.message = if this.commits.length()>1 {
    this.commits.length().string() + "commits"
} else {
    "1 commit"
}

2022-07-07-081914

The URL, method and headers (authorization) have to be configured elsewhere.

There are (at least) 2 security aspects to study before choosing to go this way (with this processor or any other document mapping tool):

  • can it be used as a denial-of-service attack? (by passing far to complex queries or infinite loop)
  • what functions must be enabled/disabled (for instance env, file) - already possible by using the go library

@oliverpool
Copy link
Contributor

Regarding the security aspects, they are only relevant if the processor is accessible by the end-user. If this is limited to the instance owner, it shouldn't be an issue.

From a quick look, the current custom webhooks could be replaced by such a system (it also supports unit-testing ;)

@oliverpool
Copy link
Contributor

I've been playing a bit with this. Here is how it would look like for matrix.Create (html_link, html_to_markdown, ref_end_name and path_escape_segments are custom functions/methods created in newBloblangExecutor):

// Create implements PayloadConvertor Create method
func (m *MatrixPayloadUnsafe) Create(p *api.CreatePayload) (api.Payloader, error) {
	exe, err := newBloblangExecutor(`
let repoLink = html_link(this.repository.html_url, this.repository.full_name)
let refName = this.ref.ref_end_name()

let refURL = this.repository.html_url + match {
	this.ref.has_prefix("refs/heads/") => "/src/branch/"
	this.ref.has_prefix("refs/tags/") => "/src/tag/"
	_ => "/src/commit/"
} + $refName.path_escape_segments()

let refLink = html_link($refURL, $refName)
let text = "[%s:%s] %s created by %s".format($repoLink, $refLink, this.ref_type, this.sender.login)

root.body = $text.html_to_markdown()
root.msgtype = "TODO"
root.format = "org.matrix.custom.html"
root.formatted_body = $text
`)
	if err != nil {
		return nil, err
	}
	return exe.Query(p)
}

Let me know if you think this is worth exploring further. I can make a draft PR to share the whole code.

@jolheiser
Copy link
Member Author

@oliverpool What would be the advantage over JS in this case?

@oliverpool
Copy link
Contributor

I tried to make a comparison here #1089 (comment) (so as to not pollute this pull-request)

Signed-off-by: jolheiser <john.olheiser@gmail.com>
@oliverpool
Copy link
Contributor

@jolheiser looks very good!

I did some local tests and I think jsonnet is a promising choice 👍

I played a bit with your latest commit and added a (basic) test:
https://github.com/jolheiser/gitea/compare/custom-webhook...oliverpool:gitea:custom-webhook?expand=1

Following this experiment, I had the following thoughts:

  • instead of naming this "custom webhook", maybe this could be named "file-based-webhook" (to better reflect, that the configuration is file-based)
  • instead of having an id in config.yml, why not directly take the directory name as id ?
  • the CustomID is currently stored as a dedicated column. Do you think storing it inside the Meta would make sense? (so that the GetCustomPayload only needs `Payloader, EventType, Meta')
  • I didn't see any use of CustomMeta.Secret yet, how do you plan to use it?
  • code.gitea.io/gitea/modules/webhook.Path is a string. Maybe it could be replaced with a fs.FS to be more flexible (os.ReadFile becomes fs.ReadFile(webhook.FS, filename)). This would especially help in testing
  • instead of hardcoding the http URL in the config, it could also be returned by some url.jsonnet (to allow for per-hook customization - like in telegram and matrix -- actually matrix as a url per hooktask 🤯 ):
local form = import 'form.libsonnet';

"http://" + form.host + "/api/post-message"

I am very excited about this, thank for your hard work!

@jolheiser
Copy link
Member Author

jolheiser commented Jul 27, 2022

Hey, thanks for giving my code a test! I hadn't quite finished up or tested my changes. Glad to see I'm on the right track, though!

instead of naming this "custom webhook", maybe this could be named "file-based-webhook" (to better reflect, that the configuration is file-based)

I'm not tied to "custom" as the name, but I think "file-based" might not accurately convey that they are "programmable" in a sense. Perhaps calling them jsonnet webhooks? (Who knows, maybe in the future jsonnet dies out and we need a different transformation lang?)

instead of having an id in config.yml, why not directly take the directory name as id ?

I thought about this, but overall I like that id in a config is more explicit, since that will be what ties everything together. For what it's worth, I think the recommendation should be that people try to namespace their custom webhooks to avoid collisions anyways.

the CustomID is currently stored as a dedicated column. Do you think storing it inside the Meta would make sense? (so that the GetCustomPayload only needs `Payloader, EventType, Meta')

The only reason I had it as a dedicated column was as an admittedly premature optimization in case any queries/migrations/etc needed to happen in the future for specific webhooks.

I didn't see any use of CustomMeta.Secret yet, how do you plan to use it?

Ah, I will need to look a bit further. Secret is used to encode the HMAC signature of the payload, however I don't know that it needs to be in Meta as it should already be added to the webhook table.

code.gitea.io/gitea/modules/webhook.Path is a string. Maybe it could be replaced with a fs.FS to be more flexible (os.ReadFile becomes fs.ReadFile(webhook.FS, filename)). This would especially help in testing

This makes sense, and I think in general the custom webhooks likely need a bit of refactoring anyways. I don't love having a global map, so I may swap them for an interface. That way they can live in-mem, on disk, or however someone else wants to add support in the future.

instead of hardcoding the http URL in the config, it could also be returned by some url.jsonnet (to allow for per-hook customization - like in telegram and matrix -- actually matrix as a url per hooktask 🤯 ):

Ah, yes, this will be removed. It was there because previously this would have been running against a bridge service, but now it is simply transformative.

@lunny lunny modified the milestones: 1.18.0, 1.19.0 Oct 17, 2022
lafriks pushed a commit that referenced this pull request Nov 3, 2022
_This is a different approach to #20267, I took the liberty of adapting
some parts, see below_

## Context

In some cases, a weebhook endpoint requires some kind of authentication.
The usual way is by sending a static `Authorization` header, with a
given token. For instance:

- Matrix expects a `Bearer <token>` (already implemented, by storing the
header cleartext in the metadata - which is buggy on retry #19872)
- TeamCity #18667
- Gitea instances #20267
- SourceHut https://man.sr.ht/graphql.md#authentication-strategies (this
is my actual personal need :)

## Proposed solution

Add a dedicated encrypt column to the webhook table (instead of storing
it as meta as proposed in #20267), so that it gets available for all
present and future hook types (especially the custom ones #19307).

This would also solve the buggy matrix retry #19872.

As a first step, I would recommend focusing on the backend logic and
improve the frontend at a later stage. For now the UI is a simple
`Authorization` field (which could be later customized with `Bearer` and
`Basic` switches):


![2022-08-23-142911](https://user-images.githubusercontent.com/3864879/186162483-5b721504-eef5-4932-812e-eb96a68494cc.png)

The header name is hard-coded, since I couldn't fine any usecase
justifying otherwise.

## Questions

- What do you think of this approach? @justusbunsi @Gusted @silverwind 
- ~~How are the migrations generated? Do I have to manually create a new
file, or is there a command for that?~~
- ~~I started adding it to the API: should I complete it or should I
drop it? (I don't know how much the API is actually used)~~

## Done as well:

- add a migration for the existing matrix webhooks and remove the
`Authorization` logic there


_Closes #19872_

Co-authored-by: Lunny Xiao <xiaolunwen@gmail.com>
Co-authored-by: Gusted <williamzijl7@hotmail.com>
Co-authored-by: delvh <dev.lh@web.de>
@lunny lunny modified the milestones: 1.19.0, 1.20.0 Jan 31, 2023
@delvh delvh removed this from the 1.20.0 milestone Jun 4, 2023
@delvh delvh removed the type/changelog Adds the changelog for a new Gitea version label Oct 7, 2023
6543 pushed a commit that referenced this pull request Mar 7, 2024
Refactor the webhook logic, to have the type-dependent processing happen
only in one place.

---

## Current webhook flow

1. An event happens
2. It is pre-processed (depending on the webhook type) and its body is
added to a task queue
3. When the task is processed, some more logic (depending on the webhook
type as well) is applied to make an HTTP request

This means that webhook-type dependant logic is needed in step 2 and 3.
This is cumbersome and brittle to maintain.

Updated webhook flow with this PR:
1. An event happens
2. It is stored as-is and added to a task queue
3. When the task is processed, the event is processed (depending on the
webhook type) to make an HTTP request

So the only webhook-type dependent logic happens in one place (step 3)
which should be much more robust.

## Consequences of the refactor

- the raw event must be stored in the hooktask (until now, the
pre-processed body was stored)
- to ensure that previous hooktasks are correctly sent, a
`payload_version` is added (version 1: the body has already been
pre-process / version 2: the body is the raw event)

So future webhook additions will only have to deal with creating an
http.Request based on the raw event (no need to adjust the code in
multiple places, like currently).

Moreover since this processing happens when fetching from the task
queue, it ensures that the queuing of new events (upon a `git push` for
instance) does not get slowed down by a slow webhook.

As a concrete example, the PR #19307 for custom webhooks, should be
substantially smaller:
- no need to change `services/webhook/deliver.go` 
- minimal change in `services/webhook/webhook.go` (add the new webhook
to the map)
- no need to change all the individual webhook files (since with this
refactor the `*webhook_model.Webhook` is provided as argument)
@lunny
Copy link
Member

lunny commented Apr 4, 2024

Since #29145 merged, this PR can merge and do some refactors.

@lunny lunny mentioned this pull request May 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. pr/wip This PR is not ready for review type/feature Completely new functionality. Can only be merged if feature freeze is not active.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Webhook External Command Custom Webhook Templates
9 participants