Alternate, Simplified Async Pull #137
Conversation
Note that I still need to update logging to use the chosen logger rather than …
Logging is now updated to use klog. Verbose logging of the AsyncPuller internals can be enabled by increasing log verbosity, though I was having trouble trying to pass this through:
go test -count=1 -v ./pkg/remoteimageasync/...
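As an illustration of the verbosity gating mentioned above, here is a minimal, self-contained sketch (not the PR's actual code) of klog usage where AsyncPuller internals only appear at higher -v levels; the image value and log messages are placeholders:

```go
package main

import (
	"flag"

	"k8s.io/klog/v2"
)

func main() {
	// klog registers the -v flag; running with -v=4 enables the chattier line below.
	klog.InitFlags(nil)
	flag.Parse()
	defer klog.Flush()

	image := "docker.io/library/busybox:latest" // placeholder image

	klog.V(2).Infof("async pull requested for image %s", image)          // routine detail
	klog.V(4).Infof("waiting on session channel for image %s", image)    // verbose internals
}
```

The same pattern applies inside tests; the difficulty noted above is passing the -v flag through to a binary run under `go test`.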
Node plugin pod seems to be crashing due to:
container-image-csi-driver-warm-metal-csi-driver-nodeplugizd67g csi-plugin I0208 11:19:14.556750 1 provider.go:82] Docker config file not found: couldn't find valid .dockercfg after checking in [ /root /]
container-image-csi-driver-warm-metal-csi-driver-nodeplugizd67g csi-plugin panic: runtime error: invalid memory address or nil pointer dereference
container-image-csi-driver-warm-metal-csi-driver-nodeplugizd67g csi-plugin [signal SIGSEGV: segmentation violation code=0x1 addr=0x30 pc=0x19f7b2c]
container-image-csi-driver-warm-metal-csi-driver-nodeplugizd67g csi-plugin
container-image-csi-driver-warm-metal-csi-driver-nodeplugizd67g csi-plugin goroutine 44 [running]:
container-image-csi-driver-warm-metal-csi-driver-nodeplugizd67g csi-plugin github.com/warm-metal/container-image-csi-driver/pkg/remoteimage.puller.Pull({{0x0, 0x0}, {0x22a1800, 0x400017c8c0}, {0x2299df8, 0x40007317a0}}, {0x22b3d90, 0x4000a25770})
container-image-csi-driver-warm-metal-csi-driver-nodeplugizd67g csi-plugin /go/src/container-image-csi-driver/pkg/remoteimage/pull.go:47 +0x13c
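The trace points at a nil value being dereferenced inside puller.Pull. Purely as an illustration of the failure mode (not the fix applied in this PR), a defensive guard like the following would surface an error instead of a SIGSEGV; the puller fields and the imageService interface are hypothetical stand-ins:

```go
package main

import (
	"context"
	"fmt"
)

// imageService models the dependency whose nil value would otherwise be dereferenced.
type imageService interface {
	PullImage(ctx context.Context, image string) error
}

// puller is a hypothetical stand-in for the real remoteimage.puller type.
type puller struct {
	svc   imageService
	image string
}

func (p puller) Pull(ctx context.Context) error {
	// Guarding here turns the crash into an ordinary, loggable error.
	if p.svc == nil {
		return fmt.Errorf("puller for %q not fully initialized: image service is nil", p.image)
	}
	return p.svc.PullImage(ctx, p.image)
}

func main() {
	p := puller{image: "docker.io/library/busybox:latest"} // svc left nil, as in the panic above
	fmt.Println(p.Pull(context.Background()))
}
```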
@@ -51,8 +51,8 @@ var (
	enableCache = flag.Bool("enable-daemon-image-credential-cache", true,
		"Whether to save contents of imagepullsecrets of the daemon ServiceAccount in memory. "+
			"If set to false, secrets will be fetched from the API server on every image pull.")
	asyncImagePullMount = flag.Bool("async-pull-mount", false,
		"Whether to pull images asynchronously (helps prevent timeout for larger images)")
	asyncImagePullTimeout = flag.Duration("async-pull-timeout", 0,
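To make the semantics of the new duration flag concrete, here is a hedged sketch of how a flag like --async-pull-timeout could bound a pull with its own context deadline; pullImage and its simulated delay are placeholders, not the driver's real API:

```go
package main

import (
	"context"
	"flag"
	"fmt"
	"time"
)

var asyncImagePullTimeout = flag.Duration("async-pull-timeout", 10*time.Minute,
	"Maximum duration an asynchronous image pull may run before it is cancelled")

// pullImage is a placeholder for the real pull call.
func pullImage(ctx context.Context, image string) error {
	select {
	case <-time.After(2 * time.Second): // simulate a pull
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

func main() {
	flag.Parse()

	// The async pull is bounded by the flag value rather than by the caller's (kubelet's) deadline.
	ctx, cancel := context.WithTimeout(context.Background(), *asyncImagePullTimeout)
	defer cancel()

	if err := pullImage(ctx, "docker.io/library/busybox:latest"); err != nil {
		fmt.Println("pull failed:", err)
		return
	}
	fmt.Println("pull finished")
}
```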
👍
Force-pushed from b6b702d to fbd31d5.
…act that pull was not occurring.
…et.yaml to load multiple jobs in parallel
…c pull logic functions properly via fully integrated kind test
…er result type checks with method calls since i felt this was becoming too rigid... especially when considering logging
@mugdha-adhav could you kick off integration tests again? Thanks.
…not updated to remove completionChan depth and the build targets still succeeded. should have failed integration tests if executed.
"github.com/stretchr/testify/assert" | ||
) | ||
|
||
type chanTestStruct struct { |
Can we use the actual session object here?
}

// demonstrates session channel structure's pass-by-reference is appropriate
func TestChannelStructContent(t *testing.T) {
Same comment as above. If these tests are not directly or indirectly related by code to synchronizer, I think we should put it in a separate file.
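For readers skimming the diff, here is a self-contained sketch of the property such a test checks: copying a struct that contains a channel still refers to the same underlying channel. The result field shown here is an assumption, a stand-in for the actual session object discussed above:

```go
package remoteimageasync_test

import (
	"testing"

	"github.com/stretchr/testify/assert"
)

// chanTestStruct is a hypothetical stand-in for the session type discussed above.
type chanTestStruct struct {
	result chan string
}

func TestChannelStructContent(t *testing.T) {
	original := chanTestStruct{result: make(chan string, 1)}
	copied := original // copying the struct copies the channel header, not the buffer

	copied.result <- "done"

	// The value sent through the copy is visible through the original:
	// both structs reference the same underlying channel.
	assert.Equal(t, "done", <-original.result)
}
```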
Overall LGTM provided we …
Summary of discussion with @vadasambar: I agree that it would be good to build out additional regression tests at higher layers of the onion, in this case at the call to …

I believe this additional testing does not exercise additional code paths, beyond one case where there is interplay between the caller timeout (kubelet) and the actual pull time when the pull policy is Always. In that case, an image pull that takes too long, exceeding the 2m caller timeout, but ALSO completes BEFORE the next call to continue (i.e. during the backoff waiting period), will start a new pull on the next call. The image in this scenario will never succeed. Because it would never have succeeded previously either, and would simply time out and be cancelled, I consider this the same behavior.

The above scenario exercises the pre-existing pull-always logic in a new context where many more scenarios CAN now succeed. Anything that cannot succeed after the change would also not have succeeded before it. I am happy to grow the regression tests in a future change, but I don't think adding these new layers should block this feature.
@vadasambar @mugdha-adhav could you review? Thanks.
@mugdha-adhav created a meeting to wrap up tomorrow. Could you kick off integration tests again? It looks like it might need another run, but it's not clear from my end; the last commit was just comments, and the previous commit had passing integration tests.
I didn't remove the test. I removed the "rendered" version, which is rendered from https://github.com/imuni4fun/container-image-csi-driver/blob/4c6d7160fa72408225477aad4b061765753095e8/test/integration/metrics-manifests/test.yaml#L1. The rendered test changes each time and should be ignored. I added it to …
{{- if .Values.enableAsyncPullMount }}
- --async-pull-mount=true
{{- if .Values.enableAsyncPull }}
- --async-pull-timeout=10m
Could we pass this timeout value from helm values?
After discussing with @imuni4fun, we decided that we want to keep the enableAsyncPull value and add a default timeout value to be passed in from values.yaml, something like --async-pull-timeout={{ .Values.asyncPullTimeout }}.
…ller getter method Puller.ImageWithTag()
@mugdha-adhav pushed the changes called out in review, plus those from @woehrl01 above. Also one small replacement of docker.Named.String() with Puller.ImageWithTag(). The ephemeral-volume-set test passed in a local kind cluster. Ready for the last integration tests and merge.
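For context on that swap, a hypothetical sketch of what a getter like Puller.ImageWithTag() might look like; the field names here are assumptions, and the real Puller stores a parsed image reference rather than plain strings:

```go
package main

import "fmt"

// Puller is a hypothetical stand-in; the real type holds a parsed image reference.
type Puller struct {
	repo string // e.g. "docker.io/library/busybox"
	tag  string // e.g. "latest"
}

// ImageWithTag returns the "repo:tag" form used for logging and metrics,
// in place of calling String() on the underlying docker reference directly.
func (p Puller) ImageWithTag() string {
	return fmt.Sprintf("%s:%s", p.repo, p.tag)
}

func main() {
	p := Puller{repo: "docker.io/library/busybox", tag: "latest"}
	fmt.Println(p.ImageWithTag()) // docker.io/library/busybox:latest
}
```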
This PR implements an alternative, simplified approach to synchronous and asynchronous image pulling. It leverages the previous, synchronous code and behavior as much as possible.
Effectively, only the actual pull activity is affected. Before v0.7.0, the pull activity was entirely synchronous, and a timeout meant the pull was terminated. Now the same flow is preserved, except that after the existing logic is evaluated and a decision to pull is made, a check is done to see if the pull is already occurring because it was previously tried and timed out.

Many factors complicate the simple timeout-based cancellation that this asynchronous pull feature addresses. Proper modularity allows for more robust and simpler tests of individual logic. The changes to existing code are very minimal. The new behavior is implemented in a modular way in new files (a new package), and it is more easily tested because more of the system can be mocked.
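To illustrate the pattern described above, here is a minimal sketch, assuming a hypothetical AsyncPuller that deduplicates pulls per image so that a caller which timed out can re-attach to the still-running pull on its next attempt. This is not the PR's exact implementation, just the shape of the idea:

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// pullSession represents one in-flight pull; err is valid once done is closed.
type pullSession struct {
	done chan struct{}
	err  error
}

// AsyncPuller deduplicates pulls per image: the first caller starts the pull,
// later callers (e.g. kubelet retries after a timeout) re-attach to the same session.
type AsyncPuller struct {
	mu       sync.Mutex
	sessions map[string]*pullSession
}

func NewAsyncPuller() *AsyncPuller {
	return &AsyncPuller{sessions: make(map[string]*pullSession)}
}

// Pull waits for the image's in-flight pull, starting one if none exists.
// The pull runs under its own pullTimeout, independent of the caller's ctx.
func (a *AsyncPuller) Pull(ctx context.Context, image string, doPull func(context.Context) error, pullTimeout time.Duration) error {
	a.mu.Lock()
	s, inFlight := a.sessions[image]
	if !inFlight {
		s = &pullSession{done: make(chan struct{})}
		a.sessions[image] = s
		go func() {
			pullCtx, cancel := context.WithTimeout(context.Background(), pullTimeout)
			defer cancel()
			s.err = doPull(pullCtx)
			close(s.done) // publishes s.err to all waiters
			a.mu.Lock()
			delete(a.sessions, image)
			a.mu.Unlock()
		}()
	}
	a.mu.Unlock()

	select {
	case <-s.done:
		return s.err // pull finished (or failed) while this caller waited
	case <-ctx.Done():
		return ctx.Err() // caller timed out; the pull keeps running and a retry re-attaches
	}
}

func main() {
	puller := NewAsyncPuller()

	slowPull := func(ctx context.Context) error { // simulated long image pull
		select {
		case <-time.After(3 * time.Second):
			return nil
		case <-ctx.Done():
			return ctx.Err()
		}
	}

	// First attempt: the caller gives up after 1s, but the pull keeps running.
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	fmt.Println("first attempt:", puller.Pull(ctx, "docker.io/library/busybox:latest", slowPull, 10*time.Minute))

	// Retry: re-attaches to the in-flight pull and sees it complete.
	fmt.Println("retry:", puller.Pull(context.Background(), "docker.io/library/busybox:latest", slowPull, 10*time.Minute))
}
```

Run standalone, the first attempt returns a context deadline error while the pull continues in the background, and the retry re-attaches and sees it succeed, which is the behavior discussed in the review thread above.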
For another comparison which does not have to back out as many new changes, see https://github.com/warm-metal/container-image-csi-driver/compare/v0.7.0...imuni4fun:container-image-csi-driver:async-pull-simplified-diff?expand=1
That comparison is missing a few minor tweaks but it illustrates the minimalism of the change to existing code.
Created issue #147 per request in community standup. It shares a little more background on this PR.