add --ephemeral mode for persistent worker #813

Open: 0xB10C wants to merge 1 commit into master

0xB10C commented Nov 26, 2024

When `--ephemeral` is passed to `cirrus worker run`, the worker accepts a single task and exits the process once that task has completed.

This is useful, for example, inside ephemeral VMs that should be shut down after each task. The user is responsible for any cleanup after the worker has finished.

resolves #809

0xB10C commented Nov 26, 2024

This still needs a bit of work on the backend side, I think. Ideally, the backend would know that a worker is ephemeral: the worker should tell the backend in some proto message, possibly `WorkerInfo`. The backend should then clean up ephemeral workers faster than normal persistent workers.

When I run ephemeral workers with unique names, my worker pool grows quickly and fills up with old, mostly inactive workers. When I reuse the same name, scheduling tasks after the VM comes back online sometimes doesn't work (the backend probably thinks it has already assigned a task).

Would love your input on this @fkorotkov.

fkorotkov (Contributor) commented

Seems we'll need an option for the worker to unregister itself, similar to the `cirrus worker pause` command. Let us look into this next week after Thanksgiving.

0xB10C commented Dec 2, 2024

> Seems we'll need an option for the worker to unregister itself, similar to the `cirrus worker pause` command. Let us look into this next week after Thanksgiving.

Sounds good! I remember trying to use `SetDisabled`, but that turned out to only pause the worker, and unpausing didn't fix the scheduling problems I had.

0xB10C commented Dec 5, 2024

I've observed some of my `--ephemeral` runners not getting a task and logging this error:

level=error msg="failed to poll upstream https://grpc.cirrus-ci.com:443: rpc error: code = AlreadyExists desc = entity already exists: app: \"s~cirrus-ci-production\"\npath <\n Element {\n type: \"PersistentWorkerTaskAssignment\"\n id: 0x1226f1681c0000\n }\n>\n"

Successfully merging this pull request may close these issues.

persistent worker: exit after a worker ran a single task