Add support for ephemeral agent (single-use). #2176
base: main
Conversation
Moving the agent tainting to inside the runner so that the agent is tainted right after it has been assigned a workflow. This ensures the agent is tainted just before the system is affected by the workflow.
When an agent is disabled, the rpc client will now receive an error and the call will be retried via the outer loop. This allows the client retry logic to step in and retry the rpc call after a delay.
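For context, here is a minimal sketch of that retry behaviour, assuming a stand-in fetchNext function and a fixed illustrative delay rather than the actual woodpecker-ci client API and backoff:

package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// fetchNext stands in for the gRPC Next call; a disabled (no_schedule)
// agent now gets an error back instead of silently receiving nothing.
func fetchNext(ctx context.Context) (string, error) {
	return "", errors.New("agent is disabled (no_schedule)")
}

func main() {
	ctx := context.Background()
	retryDelay := 2 * time.Second // illustrative only; the real client chooses its own backoff

	for attempt := 1; attempt <= 3; attempt++ {
		workflow, err := fetchNext(ctx)
		if err != nil {
			// the outer loop backs off instead of hammering the server
			fmt.Printf("attempt %d: %v, retrying in %s\n", attempt, err, retryDelay)
			time.Sleep(retryDelay)
			continue
		}
		fmt.Println("got workflow:", workflow)
		break
	}
}

With the error surfaced, disabled agents back off instead of hammering the Next rpc method.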
Codecov Report

@@           Coverage Diff            @@
##             main    #2176   +/-   ##
========================================
  Coverage        ?   40.72%
========================================
  Files           ?      182
  Lines           ?    10899
  Branches        ?        0
========================================
  Hits            ?     4439
  Misses          ?     6121
  Partials        ?      339
========================================

☔ View full report in Codecov by Sentry.
Conflicts:
- cmd/agent/agent.go
Deployment of preview was successful: https://woodpecker-ci-woodpecker-pr-2176.surge.sh
// if ephemeral, taint the agent before running any workload.
if r.ephemeral {
	err = r.client.TaintAgent(runnerCtx)
	if err != nil {
		return fmt.Errorf("tainting agent: %w", err)
	}
}
Isn't it possible to move this to cmd/agent/agent.go (l. 232) too? Or does this have some side effect?
Mainly so that the Runner struct would not need the ephemeral field anymore.
It is possible to move it out, but the ordering is important here.
The agent needs to be tainted after receiving the workflow, just before it runs it.
- If it is tainted before, then it will never receive one.
- If it is tainted after a workflow, and that workflow causes the agent to restart in some way, it will never be tainted.
I'm open to alternative ways we can achieve this. It could be handled server-side instead, but I was trying to limit the scope of the changes as much as possible.
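To make the ordering concrete, here is a minimal sketch; the runner and client types and their method signatures are illustrative stand-ins, not the actual woodpecker-ci code:

package main

import (
	"context"
	"fmt"
)

// Illustrative stand-ins for the real runner/client types; the actual
// woodpecker-ci signatures differ.
type client struct{}

func (c *client) Next(ctx context.Context) (string, error) { return "build-42", nil }
func (c *client) TaintAgent(ctx context.Context) error     { return nil }

type runner struct {
	client    *client
	ephemeral bool
}

// run shows the ordering being discussed: the taint happens after a
// workflow has been assigned but before it is executed.
func (r *runner) run(ctx context.Context) error {
	workflow, err := r.client.Next(ctx) // 1. agent is still schedulable here
	if err != nil {
		return err
	}

	if r.ephemeral {
		// 2. disable the agent before the workflow can touch the host,
		//    so a malicious workflow cannot keep pulling new work
		if err := r.client.TaintAgent(ctx); err != nil {
			return fmt.Errorf("tainting agent: %w", err)
		}
	}

	// 3. only now run the (potentially privileged) workflow
	fmt.Println("running", workflow)
	return nil
}

func main() {
	r := &runner{client: &client{}, ephemeral: true}
	if err := r.run(context.Background()); err != nil {
		fmt.Println("error:", err)
	}
}

If step 2 were skipped or moved after step 3, a compromised workflow could keep pulling new workflows, which is exactly the window this ordering closes.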
If it is tainted after a workflow, and that workflow causes the agent to restart in some way, it will never be tainted.
How can a workflow restart the agent? Since the number of parallel workflows is 1, it shouldn't take new ones, so I don't see an issue with tainting it after the workflow. It's possible that I overlooked something, though.
This is only probable with a malicious workflow. If a workflow used privileged steps or was running on a local-backend agent, it could potentially take over the agent before it is "tainted" and pull whatever workflows it wants, until it gets some information or runs some other nefarious action.
My goal with this work is to enable me to run privileged steps on my agents by making them ephemeral. This is the first step in doing that.
Did you have a look at the woodpecker autoscaler? It's an external tool which also watches agents and disables them based on a specific condition. So I guess a similar tool could solve your use case without needing to adjust the core.
I have looked at this tool; it is what I want in terms of the creation/destruction of agents (I'll need to add AWS+OpenStack support for my needs). But this PR's use case is specifically that I cannot trust an agent after it has started a workflow, and I need it to be able to limit itself from running any more workflows.
Just came here to say: I'm looking for this single-use agent feature, and here's the context for my use case. Specifically, what I'm looking for is:
I intend to run this in a local proxmox so that I can apply appropriate isolation (vlan + firewall to prevent access to other networks). I'm open to other approaches that would enable me to run privileged workflows on my local hardware, with the same kind of isolation mentioned above, and without any state on the agent leaking between workflow runs.
Seems, you are talking about
It might be
I'm interested in this feature too. My use case is the same as the one described by @wez. I'm migrating GitLab -> Woodpecker. I'm happy to help with the PR!
Well, please resolve conflicts :)
Though waiting for #3895 might be a good idea ... EDIT: once my pull request is merged I'll finish this one here ... :)
This adds support for ephemeral/single-use agents. Agents given the --ephemeral flag will run at most one pipeline. After running a single pipeline, the agent will taint itself (which for now just disables the agent by marking it as no_schedule).

The purpose of this feature is to allow potentially privileged pipelines to run on an agent, isolated from other pipelines, and then, after the pipeline has run, to effectively throw away the agent. It is intended that these agents are provisioned by a separate system, external to woodpecker. An external system would dynamically provision agents (based on the number of waiting pipelines and their labels), run those agents with --ephemeral, wait for them to be tainted, then destroy them.

This also includes a fix for no_schedule causing disabled agents to hammer the Next rpc method, which could be a problem in a large cluster with many disabled agents (effectively a self-inflicted ddos on the woodpecker server).
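A rough sketch of that external provisioning cycle follows; the provisioner interface, the agentInfo fields and the polling approach are assumptions for illustration, not an actual Woodpecker or cloud API:

package main

import (
	"fmt"
	"time"
)

// agentInfo and provisioner are hypothetical; a real implementation would
// talk to the Woodpecker server API and a cloud provider (AWS, OpenStack,
// Proxmox, ...).
type agentInfo struct {
	ID         int64
	NoSchedule bool // set once the ephemeral agent has tainted itself
}

type provisioner interface {
	CreateAgentVM() (int64, error) // boot a VM running the agent with --ephemeral
	LookupAgent(id int64) (agentInfo, error)
	DestroyAgentVM(id int64) error
}

// runOnce provisions one single-use agent, waits for it to taint itself
// after its single pipeline, then destroys it.
func runOnce(p provisioner, pollEvery time.Duration) error {
	id, err := p.CreateAgentVM()
	if err != nil {
		return err
	}
	for {
		a, err := p.LookupAgent(id)
		if err != nil {
			return err
		}
		if a.NoSchedule {
			// the agent ran its one pipeline and disabled itself
			return p.DestroyAgentVM(id)
		}
		time.Sleep(pollEvery)
	}
}

// fakeProvisioner lets the sketch run end to end without real infrastructure.
type fakeProvisioner struct{ polls int }

func (f *fakeProvisioner) CreateAgentVM() (int64, error) { return 1, nil }
func (f *fakeProvisioner) DestroyAgentVM(id int64) error {
	fmt.Println("destroyed agent", id)
	return nil
}
func (f *fakeProvisioner) LookupAgent(id int64) (agentInfo, error) {
	f.polls++
	// pretend the agent tainted itself after its single pipeline on the second poll
	return agentInfo{ID: id, NoSchedule: f.polls > 1}, nil
}

func main() {
	if err := runOnce(&fakeProvisioner{}, 100*time.Millisecond); err != nil {
		fmt.Println("error:", err)
	}
}

The key point is that the external system only needs to watch for the agent flipping to no_schedule; everything else stays outside the woodpecker core.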