Update README #206
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: xing-yang The full list of commands accepted by this bot can be found here. The pull request process is described here
README.md
Outdated
The move of the Kubernetes Volume Snapshot feature to beta also means:
* A revamp of volume snapshot APIs.
* The CSI external-snapshotter sidecar is split into two controllers, a common snapshot controller and a CSI external-snapshotter sidecar.
I would remove the word "common" and just call it snapshot controller
Removed
README.md
Outdated
@@ -30,23 +37,30 @@ Snapshotter talks to CSI over socket (/run/csi/socket by default, configurable b
* If the `VolumeSnapshotRef` is set to nil, skip this content since it is not bound to any snapshot object.
* Otherwise, the controller verifies whether the content object is correctly bound to a snapshot object. In case the `VolumeSnapshotRef.UID` is set but it does not match its snapshot object or snapshot no longer exists, the content object and its associated snapshot will be deleted.

## Usage
## Deployment
Do we also want to provide some recommended RBACs for users (not sidecar, not controller) to operate on the CRs?
Does that include admins? Non-admin users can only see VolumeSnapshot, Admins can see both.
Correct, I mean provide different recommended rbacs for different personas
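For readers following this thread, the binding rules quoted in the README hunk above can be sketched in Go roughly as follows. This is only an illustration of the two bullets, not the controller's actual code: the types `snapshotRef` and `snapshotContent`, the `action` values, and the function `checkBinding` are hypothetical names, and treating an unset UID as "keep" is an assumption, since the README text does not cover that case.

```go
package main

import "fmt"

// snapshotRef and snapshotContent are hypothetical stand-ins for the real
// API types; only the fields needed for the binding check are modeled.
type snapshotRef struct {
	Name string
	UID  string
}

type snapshotContent struct {
	Name              string
	VolumeSnapshotRef *snapshotRef
}

// action describes what the sketch decides to do with a content object.
type action string

const (
	actionSkip   action = "skip"   // content is not bound to any snapshot
	actionKeep   action = "keep"   // binding looks correct (or UID not set yet)
	actionDelete action = "delete" // UID mismatch or snapshot no longer exists
)

// checkBinding applies the two README rules. snapshotUIDs maps a snapshot
// name to its current UID and stands in for a lookup against the API server.
func checkBinding(c snapshotContent, snapshotUIDs map[string]string) action {
	if c.VolumeSnapshotRef == nil {
		return actionSkip
	}
	ref := c.VolumeSnapshotRef
	if ref.UID == "" {
		// Assumption: an unset UID is left for the controller to populate.
		return actionKeep
	}
	uid, exists := snapshotUIDs[ref.Name]
	if !exists || uid != ref.UID {
		return actionDelete
	}
	return actionKeep
}

func main() {
	snapshots := map[string]string{"snap-a": "uid-1"}
	bound := snapshotContent{Name: "content-a", VolumeSnapshotRef: &snapshotRef{Name: "snap-a", UID: "uid-1"}}
	stale := snapshotContent{Name: "content-b", VolumeSnapshotRef: &snapshotRef{Name: "snap-a", UID: "uid-2"}}
	fmt.Println(checkBinding(bound, snapshots), checkBinding(stale, snapshots))
}
```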
README.md
Outdated
External snapshotter follows [controller](https://github.com/kubernetes/community/blob/master/contributors/devel/controllers.md) pattern and uses informers to watch for `VolumeSnapshot` and `VolumeSnapshotContent` create/update/delete events. It filters out these objects with `Snapshotter==<CSI driver name>` specified in the associated VolumeSnapshotClass object and then processes these events in workqueues with exponential backoff.
The snapshot controller is deployed by the Kubernetes distributions and is responsible for watching the VolumeSnapshot CRD objects.
"and manages the creation and deletion lifecycle"?
README.md
Outdated
* Discovers the supported snapshotter name by `GetDriverName` call.
## Controller Workflow
How much of this is relevant to a vendor looking to integrate these components, and how much can go into a "design.md" (or even KEP) file?
Most of this section is actually already in the blog post. I added it here because in the old README, there's some design overview on how the controller is implemented. I can just remove this and refer to the blog which is already linked. I can also submit a KEP later if that's preferred.
I think we should capture detailed design and workflows somewhere other than a blog post. I think KEP is a natural place for this
README.md
Outdated
The change from Alpha to Beta snapshot APIs is not backward compatible.

If you have already deployed Alpha snapshot APIs and controller and want to upgrade to Beta, you need to do the following:
You can also set the deletionPolicy to Retain, delete the alpha objects, and then import them as pre-existing snapshots using the beta APIs.
Sure. I can add that. I actually had that in my first draft of the README but removed it when I submitted the PR, because I thought the workflow that @yuxiangqian is working on might address this better.
README.md
Outdated
@@ -1,52 +1,75 @@
# CSI Snapshotter
# CSI Snapshot Feature
Can you also mirror the README format that the other sidecars have?
With Beta, I'll have to talk about the snapshot controller and the CSI snapshotter. So it won't be exactly the same as other sidecars.
README.md
Outdated
The CSI external-snapshotter is part of Kubernetes implementation of [Container Storage Interface (CSI)](https://github.com/container-storage-interface/spec).
The CSI snapshot feature is part of Kubernetes implementation of [Container Storage Interface (CSI)](https://github.com/container-storage-interface/spec).
I see a few main audiences for this readme:
1. Users wanting to use the snapshot feature.
2. CSI driver authors wanting to integrate snapshot support for their driver.
3. Kubernetes distributions/deployers that want to support snapshots.
4. Developers making changes to the components.
For 1), I would point them to the K8s docs.
And then clearly state the audience for the rest of this README, which I think should be 2) and 3).
For 4), we should have another doc like "DEVELOPING.md".
Sure. I'll take a look at the READMEs of the other sidecars. I think the old README mirrored the one in external-attacher.
README.md
Outdated
* Discovers the supported snapshotter name by `GetDriverName` call.
| Compatible with CSI Version | Container Image | Min K8s Version |
Do we also need a column for what versions of the snapshot CRDs it supports?
sure
README.md
Outdated
### Running with Leader Election
### Important changes in the snapshot beta APIs
Clarify that these are changes from v1alpha1 to v1beta1.
should this go into changelog instead?
Sure, I'll add this to the changelog, but I still want to highlight some of these in this doc. Let me rephrase a little bit.
README.md
Outdated
#### Other recognized arguments
* `--kubeconfig <path>`: Path to Kubernetes client configuration that the snapshot controller uses to connect to Kubernetes API server. When omitted, default token provided by Kubernetes will be used. This option is useful only when the snapshot controller does not run as a Kubernetes pod, e.g. for debugging.

* `--resync-period <duration>`: Internal resync interval when the snapshot controller re-evaluates all existing `VolumeSnapshot` instances and tries to fulfill them, i.e. create / delete corresponding snapshots. It does not affect re-tries of failed calls! It should be used only when there is a bug in Kubernetes watch logic.
is there a default?
Yes, 60 sec
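To make the default concrete, here is a minimal Go sketch of how the two arguments discussed above could be declared with the standard `flag` package. The flag names `--kubeconfig` and `--resync-period` and the 60 second default come from this thread; the `options` struct, the `parseOptions` helper, and the help strings are hypothetical, and the real controller may wire its flags differently.

```go
package main

import (
	"flag"
	"fmt"
	"time"
)

// options is a hypothetical container for the parsed arguments.
type options struct {
	kubeconfig   string
	resyncPeriod time.Duration
}

// parseOptions parses the given arguments with an isolated FlagSet so the
// sketch does not touch the process-wide flag state.
func parseOptions(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("snapshot-controller", flag.ContinueOnError)
	fs.StringVar(&o.kubeconfig, "kubeconfig", "",
		"Path to a kubeconfig file; only needed when running outside a pod.")
	fs.DurationVar(&o.resyncPeriod, "resync-period", 60*time.Second,
		"Interval at which all VolumeSnapshot objects are re-evaluated.")
	err := fs.Parse(args)
	return o, err
}

func main() {
	opts, err := parseOptions([]string{"--resync-period=2m"})
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("resync period:", opts.resyncPeriod)
}
```

Stating the default in the flag's help text, as well as in the README, keeps the two from drifting apart.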
README.md
Outdated
* `--resync-period <duration>`: Internal resync interval when the snapshot controller re-evaluates all existing `VolumeSnapshot` instances and tries to fulfill them, i.e. create / delete corresponding snapshots. It does not affect re-tries of failed calls! It should be used only when there is a bug in Kubernetes watch logic.

* `--create-snapshotcontent-retrycount`: Number of retries when we create a snapshot content object for a snapshot. Default is 5.
does that mean we give up entirely after 5? I thought we changed it to retry forever until user cancels the operation
This retry count is used for creating VolumeSnapshotContent API objects. We didn't talk about retrying this one forever.
The logic we are trying to change is for creating a physical snapshot on the storage system itself. Grant started a PR which followed what Jan has done for the external-provisioner. Due to the beta splitting controller work, that work is delayed. That involves big controller logic changes which I do not want to add on top of the splitting controller changes in this release. We can revisit that after we cut this release.
If API server is temporarily down, then we will give up on trying to bind or create a snapshot? I'm not sure we want that? Also keep in mind especially now that this is beta, we need to support the argument according to the deprecation policy.
I think we copied it from the PV controller. Looks like PV controller retries creating PV object 5 times by default, but createProvisionedPVRetryCount is a const defined inside the PV controller. Should we get rid of these command line options and use a const in the snapshot controller instead?
https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/volume/persistentvolume/pv_controller.go#L131
https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/volume/persistentvolume/pv_controller.go#L1489
Yeah I think that would be better. It gives more flexibility to change the behavior later.
I think the PV controller scenario is slightly different from snapshots. For snapshots, I think we have stronger idempotency/naming guarantees, so it can retry the calls with the same id, which helps avoid the leaking issue, though not completely. So I'm not sure how necessary the retries in the snapshot controller should be. cc @jsafrane
Actually, does the scenario apply at all? Now we create the Content object before calling CSI. This avoids the leaking scenario right?
Yes, you are right.
Removed these from README. Will submit a PR that removes them from cmd line options and uses a const internally instead.
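As a rough illustration of the direction agreed above, replacing the command-line option with an internal constant could look like the Go sketch below. The constant name `createSnapshotContentRetryCount`, the helper `createWithRetry`, and the sentinel `errTransient` are hypothetical; the value 5 is the default mentioned in the README hunk, and counting it as total attempts rather than retries after the first call is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// createSnapshotContentRetryCount is a hypothetical internal constant that
// replaces the command-line option. It is treated here as the total number
// of attempts, which is an assumption.
const createSnapshotContentRetryCount = 5

// errTransient is a placeholder error used only by this sketch.
var errTransient = errors.New("transient API error")

// createWithRetry calls create until it succeeds or the attempt budget is
// exhausted. It returns the number of attempts made and the last error.
func createWithRetry(create func() error) (int, error) {
	var err error
	for attempt := 1; attempt <= createSnapshotContentRetryCount; attempt++ {
		if err = create(); err == nil {
			return attempt, nil
		}
	}
	return createSnapshotContentRetryCount,
		fmt.Errorf("giving up after %d attempts: %w", createSnapshotContentRetryCount, err)
}

func main() {
	calls := 0
	attempts, err := createWithRetry(func() error {
		calls++
		if calls < 3 {
			return errTransient // fail the first two attempts
		}
		return nil
	})
	fmt.Println(attempts, err)
}
```

Keeping the count internal means the behavior can change later without going through the deprecation policy for a beta command-line argument, which is the flexibility mentioned earlier in this thread.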
README.md
Outdated
* `--leader-election-namespace <namespace>`: The namespace where the leader election resource exists. Defaults to the pod namespace if not set.

* `--timeout <duration>`: Timeout of all calls to CSI driver. It should be set to value that accommodates majority of `CreateSnapshot`, `DeleteSnapshot, and `ListSnapshots` calls. 1 minute is used by default.
closing quote missing after DeleteSnapshot
Fixed
README.md
Outdated
The change from Alpha to Beta snapshot APIs is not backward compatible.

If you have already deployed Alpha snapshot APIs and controller and want to upgrade to Beta, you need to do the following:
* Delete snapshots created using Alpha snapshot CRDs and controller.
what does "controller" refer to in the alpha sense?
"controller" here refers to the alpha version external-snapshotter sidecar controller.
## Testing

Running Unit Tests:

```bash
go test -timeout 30s github.com/kubernetes-csi/external-snapshotter/pkg/controller
go test -timeout 30s github.com/kubernetes-csi/external-snapshotter/pkg/common-controller
```
does "make test" accomplish the same?
"make test" will do more than just running unit tests. It will also do go mod, go fmt, etc.
/hold
README.md
Outdated
* Uninstall v1alpha1 snapshot CRDs, external-snapshotter sidecar controller, and CSI driver.
* Install v1beta1 snapshot CRDs, snapshot controller, CSI external-snapshotter sidecar and CSI driver.

If you want to keep the physical snapshot resources created using the v1alpha1 version, here are suggested steps.
@msau42 @yuxiangqian please review this section to make sure it is okay.
@bswartz Here is the upgrade section. PTAL.
README.md
Outdated
Note: There is a possibility for data loss if something goes wrong during this process. Verify steps 1 and 2 work for a sample snapshot before moving forward. There was a bug in the pre-provisioned snapshots logic that was fixed in release [v1.2.1](https://github.com/kubernetes-csi/external-snapshotter/blob/release-1.2/CHANGELOG-1.2.md#notable-changes-1). Make sure you are using v1.2.1 or later before upgrading to v1beta1 if you want to try the following steps.
1. Change deletionPolicy from Delete to Retain in the volume snapshot content.
2. Delete volume snapshots created using v1alpha1 snapshot CRDs and external-snapshotter sidecar controller. Verify that volume snapshot content and the physical snapshot resource remain.
3. Make sure above steps work for a sample snapshot before moving forward. This is to make sure that deletion of a volume snapshot object will not trigger the deletion of the volume snapshot content object and its associated physical snapshot resource on the storage system when the deletionPolicy is Retain. It is very important to verify this step to make sure there is no data loss.
Maybe put this note as 1), since you want them to do this before they actually try the steps
README.md
Outdated
4. Uninstall v1alpha1 snapshot CRDs, external-snapshotter sidecar controller, and CSI driver. The v1alpha1 volume snapshot content and its associated physical snapshot resource will not be in the Kubernetes database any more.
5. Install v1beta1 snapshot CRDs, snapshot controller, CSI external-snapshotter sidecar, and CSI driver.
6. Create a v1beta1 volume snapshot content API object for the physical snapshot resource.
* Note: Do not set the UID field of VolumeSnapshotRef! If you copied a volume snapshot content yaml file from your v1alpha1 setup, make sure to DELETE the UID field of VolumeSnapshotRef. This field will be populated by the controller automatically after a v1beta1 volume snapshot API object is created and bound to the volume snapshot content.
are there any other things like annotations that they should also remove?
Will remove this section as I have not tested it myself.
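For anyone who still wants to experiment with the note about the UID field, a small Go sketch of stripping it from a copied manifest is shown below. It is untested against a real cluster and purely illustrative: the helper `stripSnapshotRefUID` is hypothetical, it assumes the manifest has been converted to JSON, and it assumes the field lives at `spec.volumeSnapshotRef.uid`. Whether other fields or annotations also need to be removed is exactly the open question raised above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// stripSnapshotRefUID removes spec.volumeSnapshotRef.uid from a JSON
// manifest and returns the re-encoded document. The field path is an
// assumption about the manifest layout; all other fields are left as-is.
func stripSnapshotRefUID(manifest []byte) ([]byte, error) {
	var obj map[string]interface{}
	if err := json.Unmarshal(manifest, &obj); err != nil {
		return nil, err
	}
	if spec, ok := obj["spec"].(map[string]interface{}); ok {
		if ref, ok := spec["volumeSnapshotRef"].(map[string]interface{}); ok {
			delete(ref, "uid") // no-op when the field is already absent
		}
	}
	return json.Marshal(obj)
}

func main() {
	in := []byte(`{"spec":{"volumeSnapshotRef":{"name":"snap-1","namespace":"default","uid":"abc"}}}`)
	out, err := stripSnapshotRefUID(in)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(out))
}
```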
Sure. I'll open an issue to track.
/hold cancel
/lgtm
What type of PR is this?
/kind documentation
What this PR does / why we need it:
Which issue(s) this PR fixes:
Fixes #154
Special notes for your reviewer:
Does this PR introduce a user-facing change?: