Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update timer logic #568

Open
pmengelbert opened this issue Dec 29, 2022 · 0 comments
Open

Update timer logic #568

pmengelbert opened this issue Dec 29, 2022 · 0 comments
Labels
enhancement New feature or request

Comments

@pmengelbert
Copy link
Contributor

Describe the solution you'd like
Currently, the collector/scanner are triggered immediately upon startup. We should be able to configure this behavior: either it triggers immediately or triggers after the delay so it can run on a timer.

The timer mechanism is presently somewhat brittle and intolerant to faults. After the first job completes, subsequent jobs are re-enqueued after a delay, calculated based on the user-provided value. The delay begins when the job completes.

Unclear is what should happen when a job fails; the current logic is complex and somewhat confusing. We essentially have our jobs running "in series", meaning if one fails, another will not be triggered.

The logic will be greatly simplified by allowing the user to specify a schedule rather than a delay. For example, "run every day at 12am" instead of "run 86400 seconds after the last job completes". It's easier to reason with and easier to implement. If there is already an ImageJob with a Running status, report it via the logs and prevent the scheduling of a new job; i.e. skip this one and wait for the next one.

Anything else you would like to add:
This change will also simplify several of the end-to-end tests. We have several tests which have to either wait for collector jobs to complete, or remove them manually. If the immediate triggering of those jobs can be disabled, it will make the tests a lot less flakey.

Environment:

  • Eraser version: v1.0.0-beta.2
@pmengelbert pmengelbert added the enhancement New feature or request label Dec 29, 2022
pmengelbert added a commit to pmengelbert/eraser that referenced this issue Dec 29, 2022
This code should be reverted once the timing logic has been sorted out.
See issue eraser-dev#568 for details.

Signed-off-by: Peter Engelbert <pmengelbert@gmail.com>
pmengelbert added a commit to pmengelbert/eraser that referenced this issue Dec 29, 2022
This code should be reverted once the timing logic has been sorted out.
See issue eraser-dev#568 for details.

Signed-off-by: Peter Engelbert <pmengelbert@gmail.com>
pmengelbert added a commit to pmengelbert/eraser that referenced this issue Dec 30, 2022
This code should be reverted once the timing logic has been sorted out.
See issue eraser-dev#568 for details.

Signed-off-by: Peter Engelbert <pmengelbert@gmail.com>
pmengelbert added a commit to pmengelbert/eraser that referenced this issue Dec 30, 2022
This code should be reverted once the timing logic has been sorted out.
See issue eraser-dev#568 for details.

Signed-off-by: Peter Engelbert <pmengelbert@gmail.com>
pmengelbert added a commit to pmengelbert/eraser that referenced this issue Dec 30, 2022
This code should be reverted once the timing logic has been sorted out.
See issue eraser-dev#568 for details.

Signed-off-by: Peter Engelbert <pmengelbert@gmail.com>
pmengelbert added a commit to pmengelbert/eraser that referenced this issue Dec 30, 2022
This code should be reverted once the timing logic has been sorted out.
See issue eraser-dev#568 for details.

Signed-off-by: Peter Engelbert <pmengelbert@gmail.com>
pmengelbert added a commit that referenced this issue Jan 5, 2023
* Use PodTemplates to support ImageJobs

This PR was caused by limits on the size of k8s API requests. By
default, `kubectl apply` will add a field, `metadata.annotations.kubectl."kubernetes.io/last-applied-configuration"`,
which is a JSON representation of the object. This effectively increases
the size of the request.

We ran into size limits when adding a `v1` of the eraser API. Because we
want to support a multi-version API, the request to create the CRD for
imagejobs was essentially doubled by the addition of the new version.

The component of the CRD that took up the most space was the
`ImageJob.Spec` field, a `PodTemplateSpec` type. The generated YAML for
this type is extremely long, with a lot of documentation. Being a core
kubernetes resource, a `PodTemplateSpec` is very long and complex.

This commit leverages the kubernetes resource `PodTemplate` in order to
remove the `ImageJob.Spec` field entirely. Instead of using the
`ImageJob` to specify configuration of the `eraser` or `collector` pods,
that responsibility is offloaded to a `PodTemplate`.

There is a 1:1 ratio between imagejobs and podtemplates. A given
imagejob is expected to have a corresponding podtemplate with the same
name, in the same namespace as the controller-manager pod. If no such
podtemplate exists, an error is returned and the reconciliation loop
spins until the podtemplate is found.

The OwnerReference of the podtemplate is set to its corresponding
imagejob. When the imagejob is deleted, the podtemplate will be cleaned
up as well.

Signed-off-by: Peter Engelbert <pmengelbert@gmail.com>

* Address failure in imagelist_rm_images e2e test

Signed-off-by: Peter Engelbert <pmengelbert@gmail.com>

* Test without automated collector

This code should be reverted once the timing logic has been sorted out.
See issue #568 for details.

Signed-off-by: Peter Engelbert <pmengelbert@gmail.com>

* Fix error in initial trigger logic

Also, remove unnecessary goroutine.

Signed-off-by: Peter Engelbert <pmengelbert@gmail.com>

* Address PR comments

Signed-off-by: Peter Engelbert <pmengelbert@gmail.com>

* Address more PR comments

- rename `repeatImmediate` to `scheduleImmediate`
- remove kubebuilder comment for k8s `jobs` which we no longer use

Signed-off-by: Peter Engelbert <pmengelbert@gmail.com>

* Re-generate manifests

Signed-off-by: Peter Engelbert <pmengelbert@gmail.com>

* Fix typo in e2e test

Mistakenly had the old flag, `--repeat-immediate` instead of the new,
`--schedule-immediate`

Signed-off-by: Peter Engelbert <pmengelbert@gmail.com>

Signed-off-by: Peter Engelbert <pmengelbert@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant