Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ImageListPullJob to simplify ImagePullJob #1222

Merged
merged 1 commit into from
May 19, 2023

Conversation

diannaowa
Copy link
Contributor

@diannaowa diannaowa commented Mar 14, 2023

Define ImageListPullJob

ImageListPullJob:

// ImageListPullJob is the Schema for the imagelistpulljobs API
type ImageListPullJob struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   ImageListPullJobSpec   `json:"spec,omitempty"`
	Status ImageListPullJobStatus `json:"status,omitempty"`
}
......
// with same fields as ImagePullJobSpec except the Images
// ImageListPullJobSpec defines the desired state of ImageListPullJob
type ImageListPullJobSpec struct {
	// Images is the image list to be pulled by the job
	Images []string `json:"images"`
	......

	// PullPolicy is an optional field to set parameters of the pulling task. If not specified,
	// the system will use the default values.
	// +optional
	PullPolicy *PullPolicy `json:"pullPolicy,omitempty"`

	// CompletionPolicy indicates the completion policy of the job.
	// Default is Always CompletionPolicyType.
	CompletionPolicy CompletionPolicy `json:"completionPolicy"`
  ......
}

Define ImageListPullJobStatus

// ImageListPullJobStatus defines the observed state of ImageListPullJob
type ImageListPullJobStatus struct {
......
	// The desired number of ImagePullJobs, this is typically equal to the number of len(spec.Images).
	Desired int32 `json:"desired"`

	// The number of actively running ImagePullJobs(status.ACTIVE>0).
	// +optional
	Active int32 `json:"active"`

	// The number of ImagePullJobs which status.CompletionTime!=nil
	// +optional
	Completed int32 `json:"completed"`

	//The number of ImagePullJobs which status.Failed==0.
	// +optional
	Succeeded int32 `json:"succeeded"` 

	// with format Succeeded/Completed
	// +optional
	Status string `json:"status"`
}

About the deletion of ImageListPullJob

ImageListPullJob will be deleted when this condition is met.

​(CompletionPolicy.Type=Always && CompletionPolicy.TTLSecondsAfterFinished >0)

Webhook

Added webhook for ImageListPullJob, validating-webhook is used to verify the validity of some values (such as spec.Images field), and mutating-webhook is used to set some default values. The webhook behavior here is basically similar to that of ImagePullJob.

Controller: Watch & Generate ImagePullJob

Added webhook for ImageListPullJob, validating-webhook is used to verify the validity of some values (such as spec.Images field), and mutating-webhook is used to set some default values. The webhook behavior here is basically similar to that of ImagePullJob.

Implementation

ImageListPullJob Event Handler: Watch the add, delete, update events

ImagePullJob Event Handler: Watch the ImagePullJob that has changed the status.

Create a new ImagePullJob according to spec.Images or delete redundant ImagePullJob (when the job is completed)

  1. When the current ImageListPullJob.StatusCompletionTime!=nil, when CompletionPolicy.Type=Always && CompletionPolicy.TTLSecondsAfterFinished, try to delete the current ImageListPullJob

  2. Get the ImagePullJobs currently owned by ImageListPullJob

  3. Calculate the status of the current ImageListPullJob, the ImagePullJob that needs to be created, and the ImagePullJob that needs to be deleted(the spec.image not in ImageListPullJob.Spec.Images)

  4. Synchronize ImagePullJob(create new ImagePullJob and delete the ImagePullJob which spec.image not in ImageListPullJob.Spec.Images)

  5. Update ImageListPullJob to the latest status

The example is as follows

apiVersion: apps.kruise.io/v1alpha1
kind: ImageListPullJob
metadata:
  name: job-list-with-always
spec:
  images:
  - nginx
  - busybox:1.35
  - httpd:2.4.38
  completionPolicy:
    type: Always
  1. Create a new ImageListPullJob based on the above Yaml

  2. The controller will create ImagePullJob with the Spec.Images

MacBook-Pro kruise % kubectl get imagepulljob
NAME                         TOTAL   ACTIVE   SUCCEED   FAILED   AGE   MESSAGE
job-list-with-always-2t424   2       1        0         0        2s    job is running, progress 0.0%
job-list-with-always-8vxc7   2       1        0         0        2s    job is running, progress 0.0%
job-list-with-always-mhxzx   2       1        0         0        2s    job is running, progress 0.0%
  1. The status of ImageListPullJob
MacBook-Pro kruise % kubectl get imagelistpulljob
NAME                   TOTAL   ACTIVE   STATUS   AGE
job-list-with-always   3       3        0/0      58s

TOTAL: the number of ImagePullJob

ACTIVE: Indicates that there are 3 Active ImagePullJobs

STATUS: The format is Succeeded/Completed, which means {the number of successful ImagePullJobs}/{the number of completed ImagePullJobs}

Note:

Successful ImagePullJob definition: ImagePullJob.Status.Desired==ImagePullJob.Status.Succeeded (and ImagePullJob.Status.CompletionTime!=nil)

Ⅱ. Does this pull request fix one issue?

fixes #1211

Ⅲ. Describe how to verify it

Ⅳ. Special notes for reviews

@codecov-commenter
Copy link

Codecov Report

Merging #1222 (ac91f6a) into master (54079d1) will increase coverage by 0.07%.
The diff coverage is 46.45%.

📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

@@            Coverage Diff             @@
##           master    #1222      +/-   ##
==========================================
+ Coverage   50.04%   50.11%   +0.07%     
==========================================
  Files         143      146       +3     
  Lines       19898    20131     +233     
==========================================
+ Hits         9958    10089     +131     
- Misses       8844     8928      +84     
- Partials     1096     1114      +18     
Flag Coverage Δ
unittests 50.11% <46.45%> (+0.07%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...imagelistpulljob/imagelistpulljob_event_handler.go 0.00% <0.00%> (ø)
...er/imagelistpulljob/imagelistpulljob_controller.go 49.38% <49.38%> (ø)
...troller/imagelistpulljob/imagelistpulljob_utils.go 83.33% <83.33%> (ø)

... and 1 file with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@zmberg
Copy link
Member

zmberg commented Mar 14, 2023

@diannaowa Are you convenient to discuss it together at the zoom community meeting at 7:30 p.m. on 3.23?

@diannaowa
Copy link
Contributor Author

Are you convenient to discuss it together at the zoom community meeting at 7:30 p.m. on 3.23?

Thanks for your reply. If I have time next Thursday, I will attend the meeting.

@zmberg
Copy link
Member

zmberg commented Mar 15, 2023

Are you convenient to discuss it together at the zoom community meeting at 7:30 p.m. on 3.23?

Thanks for your reply. If I have time next Thursday, I will attend the meeting.

This ability is still a little complicated, so I still hope to communicate in meetings.

@diannaowa
Copy link
Contributor Author

Are you convenient to discuss it together at the zoom community meeting at 7:30 p.m. on 3.23?

Thanks for your reply. If I have time next Thursday, I will attend the meeting.

This ability is still a little complicated, so I still hope to communicate in meetings.

Thanks, I try to make time for meetings. I have added the topic to the document of the biweekly meeting.

@diannaowa
Copy link
Contributor Author

diannaowa commented Mar 23, 2023

As discussed on the biweekly, I updated the design of ImageListPullJobStatus:

// ImageListPullJobStatus defines the observed state of ImageListPullJob
type ImageListPullJobStatus struct {
	// Represents time when the job was acknowledged by the job controller.
	// It is not guaranteed to be set in happens-before order across separate operations.
	// It is represented in RFC3339 form and is in UTC.
	// +optional
	StartTime *metav1.Time `json:"startTime,omitempty"`

	// Represents time when the all the image pull job was completed. It is not guaranteed to
	// be set in happens-before order across separate operations.
	// It is represented in RFC3339 form and is in UTC.
	// +optional
	CompletionTime *metav1.Time `json:"completionTime,omitempty"`

	// The desired number of ImagePullJobs, this is typically equal to the number of len(spec.Images).
	Desired int32 `json:"desired"`

	// The number of actively running ImagePullJobs(status.ACTIVE>0).
	// +optional
	Active int32 `json:"active"`

	// The number of ImagePullJobs which status.CompletionTime!=nil
	// +optional
	Completed int32 `json:"completed"`

	// The number of image pull job which status.Failed==0.
	// +optional
	Succeeded int32 `json:"succeeded"`

	// Succeeded/Completed
	// +optional
	Status string `json:"status"`

	// The status of ImagePullJob which has the failed nodes(status.Failed>0) .
	// +optional
	FailedImagePullJobStatuses []FailedImagePullJobStatus `json:"failedImagePullJobs,omitempty"`
}

// FailedImagePullJobStatus the state of ImagePullJob which has the failed nodes(status.Failed>0)
type FailedImagePullJobStatus struct {
	// The name of ImagePullJob which has the failed nodes(status.Failed>0)
	// +optional
	Name string `json:"name,omitempty"`

	// The number of pulling tasks which reached phase Failed of this ImagePullJob.
	// +optional
	Failed int32 `json:"failed"`

	// The nodes that failed to pull the image of this ImagePullJob.
	// +optional
	FailedNodes []string `json:"failedNodes,omitempty"`
}

Any other idea on this would be helpful. Thanks.
ping @zmberg

@veophi
Copy link
Member

veophi commented Mar 24, 2023

How about

failedNodes:
- name: 32.234.2.1
   images:
   - busybox:1.32
   - centos:latest

Moreover, we should consider to limit the number of failed nodes in status.

@diannaowa
Copy link
Contributor Author

How about

failedNodes:
- name: 32.234.2.1
   images:
   - busybox:1.32
   - centos:latest

Moreover, we should consider to limit the number of failed nodes in status.

If the design about FailedImagePullJobStatuses is ok, because in my design, each image will generate an ImagePullJob object, then the status will be displayed as follows:

failedImagePullJobStatuses:
  - name:  imagepulljob-xxx-xxxx
    failed: 27
    failedNodes:
     - name: 32.234.2.1
       images:
        - busybox:1.32
     - name: 32.234.2.2
       images:
        - busybox:1.32
     - name: 32.234.2.3
       images:
        - busybox:1.32

How about:

failedImagePullJobStatuses:
  - name:  imagepulljob-xxx-xxxx
    image: busybox:1.32
    failed: 27
    failedNodes:
    - name: 32.234.2.1
    - name: 1.1.1.1
    - name: 2.2.2.2

the failedNodes will be same as ImagePullJob.Status.FailedNodes

About the limit on the maximum number of failed nodes:
This number is affected by the node selector in ImagePulJob.Spec.I think we need to limit the number of FailedNodes in ImagePullJob.

@veophi
Copy link
Member

veophi commented Mar 24, 2023

@diannaowa good idea!

@diannaowa
Copy link
Contributor Author

diannaowa commented Mar 25, 2023

@veophi To make ImageListPullJob more flexible.
How about

imageListPullJobSpec:
  pullSecrets: 
  selector: 
  podSelector: 
  sandboxConfig: 
  completionPolicy:
  imagePullJobTemplate:
  - image: nginx:1.1.1
    pullSecrets: 
    selector: 
    podSelector: 
    sandboxConfig: 
    completionPolicy:
    pullPolicy:
    parallelism:
  - image: httpd:2.34
    pullSecrets: 
    selector: 
    podSelector: 
    sandboxConfig: 
    CompletionPolicy:
    pullPolicy:
    parallelism: 
  - image: busybox
    pullSecrets: null
    selector: null
    podSelector: null
    sandboxConfig: null
    CompletionPolicy:
    pullPolicy:
    parallelism: 

The imagePullJobTemplate.pullSecrets will be override imageListPullJobSpec.pullSecrets.

@diannaowa
Copy link
Contributor Author

diannaowa commented Mar 27, 2023

For more efficient discussion, I added the Proposal DOC. And changed the PR to Draft status (but I don’t know why the PR in the Draft status will also trigger the execution of CI). @veophi we can discussion based on the proposal doc. Proposal

@diannaowa diannaowa force-pushed the feat_1211 branch 2 times, most recently from 5fe4e82 to aa2db64 Compare April 18, 2023 06:14
@diannaowa
Copy link
Contributor Author

@diannaowa diannaowa force-pushed the feat_1211 branch 2 times, most recently from 8b832f8 to 3ec1630 Compare April 29, 2023 02:23
@furykerry
Copy link
Member

/lgtm

@kruise-bot kruise-bot added the lgtm label May 8, 2023
Signed-off-by: liuzhenwei <dui_zhang@163.com>

calculate status for imagelistpulljob

Signed-off-by: liuzhenwei <dui_zhang@163.com>

make generate manifests

Signed-off-by: liuzhenwei <dui_zhang@163.com>

add imagelistpulljob.status.status

Signed-off-by: liuzhenwei <dui_zhang@163.com>

make generate manifests

Signed-off-by: liuzhenwei <dui_zhang@163.com>

regist webhook handler

delete image pull job which is not existed in ImageListPullJob.Spec.Images

Signed-off-by: liuzhenwei <dui_zhang@163.com>

support the same behavior as image pull job for TTLSecondsAfterFinished and CompletionTime fields

Signed-off-by: liuzhenwei <dui_zhang@163.com>

resourceVersionExpectations

Signed-off-by: liuzhenwei <dui_zhang@163.com>

add ut

Signed-off-by: liuzhenwei <dui_zhang@163.com>

verify the maximum number of images cannot > 255

Signed-off-by: liuzhenwei <dui_zhang@163.com>

make generate manifests

Signed-off-by: liuzhenwei <dui_zhang@163.com>

add failled image pull job status

Signed-off-by: liuzhenwei <dui_zhang@163.com>

simplify imageListPullJobStatus and spec

Signed-off-by: liuzhenwei <dui_zhang@163.com>

fix mdlint

Signed-off-by: liuzhenwei <dui_zhang@163.com>

define ImagePullJobTemplate & fix imageliststatus when completionPolicy.Type is Never

Signed-off-by: liuzhenwei <dui_zhang@163.com>

fix,some print info

Signed-off-by: liuzhenwei <dui_zhang@163.com>

trigger ci

Signed-off-by: liuzhenwei <dui_zhang@163.com>

fix some issues of code

Signed-off-by: liuzhenwei <dui_zhang@163.com>

fix some logic of Expectations

Signed-off-by: liuzhenwei <dui_zhang@163.com>

Check for duplicate values of spec.images

Signed-off-by: liuzhenwei <dui_zhang@163.com>

move proposal doc to other PR

Signed-off-by: liuzhenwei <dui_zhang@163.com>

trigger ci&& modify comment

Signed-off-by: liuzhenwei <dui_zhang@163.com>

add e2e

Signed-off-by: liuzhenwei <dui_zhang@163.com>

remove phase field from status and and remove the unnecessary deepcopy

add ut for computeImagePullJobActions and fix some bugs

Signed-off-by: liuzhenwei <dui_zhang@163.com>
@diannaowa
Copy link
Contributor Author

@zmberg PTAL

@zmberg
Copy link
Member

zmberg commented May 11, 2023

/lgtm

@furykerry
Copy link
Member

/lgtm

1 similar comment
@veophi
Copy link
Member

veophi commented May 18, 2023

/lgtm

@zmberg
Copy link
Member

zmberg commented May 19, 2023

/approve

@kruise-bot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: zmberg

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kruise-bot kruise-bot merged commit db83b70 into openkruise:master May 19, 2023
diannaowa added a commit to diannaowa/kruise that referenced this pull request Jun 2, 2023
calculate status for imagelistpulljob



make generate manifests



add imagelistpulljob.status.status



make generate manifests



regist webhook handler

delete image pull job which is not existed in ImageListPullJob.Spec.Images



support the same behavior as image pull job for TTLSecondsAfterFinished and CompletionTime fields



resourceVersionExpectations



add ut



verify the maximum number of images cannot > 255



make generate manifests



add failled image pull job status



simplify imageListPullJobStatus and spec



fix mdlint



define ImagePullJobTemplate & fix imageliststatus when completionPolicy.Type is Never



fix,some print info



trigger ci



fix some issues of code



fix some logic of Expectations



Check for duplicate values of spec.images



move proposal doc to other PR



trigger ci&& modify comment



add e2e



remove phase field from status and and remove the unnecessary deepcopy

add ut for computeImagePullJobActions and fix some bugs

Signed-off-by: liuzhenwei <dui_zhang@163.com>
diannaowa added a commit to diannaowa/kruise that referenced this pull request Jun 2, 2023
calculate status for imagelistpulljob



make generate manifests



add imagelistpulljob.status.status



make generate manifests



regist webhook handler

delete image pull job which is not existed in ImageListPullJob.Spec.Images



support the same behavior as image pull job for TTLSecondsAfterFinished and CompletionTime fields



resourceVersionExpectations



add ut



verify the maximum number of images cannot > 255



make generate manifests



add failled image pull job status



simplify imageListPullJobStatus and spec



fix mdlint



define ImagePullJobTemplate & fix imageliststatus when completionPolicy.Type is Never



fix,some print info



trigger ci



fix some issues of code



fix some logic of Expectations



Check for duplicate values of spec.images



move proposal doc to other PR



trigger ci&& modify comment



add e2e



remove phase field from status and and remove the unnecessary deepcopy

add ut for computeImagePullJobActions and fix some bugs

Signed-off-by: liuzhenwei <dui_zhang@163.com>
lilongfeng0902 pushed a commit to lilongfeng0902/kruise that referenced this pull request Sep 12, 2023
calculate status for imagelistpulljob



make generate manifests



add imagelistpulljob.status.status



make generate manifests



regist webhook handler

delete image pull job which is not existed in ImageListPullJob.Spec.Images



support the same behavior as image pull job for TTLSecondsAfterFinished and CompletionTime fields



resourceVersionExpectations



add ut



verify the maximum number of images cannot > 255



make generate manifests



add failled image pull job status



simplify imageListPullJobStatus and spec



fix mdlint



define ImagePullJobTemplate & fix imageliststatus when completionPolicy.Type is Never



fix,some print info



trigger ci



fix some issues of code



fix some logic of Expectations



Check for duplicate values of spec.images



move proposal doc to other PR



trigger ci&& modify comment



add e2e



remove phase field from status and and remove the unnecessary deepcopy

add ut for computeImagePullJobActions and fix some bugs

Signed-off-by: liuzhenwei <dui_zhang@163.com>
ppbits pushed a commit to ppbits/kruise that referenced this pull request Apr 4, 2024
calculate status for imagelistpulljob



make generate manifests



add imagelistpulljob.status.status



make generate manifests



regist webhook handler

delete image pull job which is not existed in ImageListPullJob.Spec.Images



support the same behavior as image pull job for TTLSecondsAfterFinished and CompletionTime fields



resourceVersionExpectations



add ut



verify the maximum number of images cannot > 255



make generate manifests



add failled image pull job status



simplify imageListPullJobStatus and spec



fix mdlint



define ImagePullJobTemplate & fix imageliststatus when completionPolicy.Type is Never



fix,some print info



trigger ci



fix some issues of code



fix some logic of Expectations



Check for duplicate values of spec.images



move proposal doc to other PR



trigger ci&& modify comment



add e2e



remove phase field from status and and remove the unnecessary deepcopy

add ut for computeImagePullJobActions and fix some bugs
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[feature request] Add ImageListPullJob to simplify ImagePullJob
6 participants