Implementing the storage version API, Part I #84016

Closed

@caesarxuchao (Member) commented Oct 16, 2019

/sig-apimachinery
/assign @deads2k @yliaog @lavalamp
/kind feature
KEP

This PR implements:

  1. the storage version API.
  2. Let the kube-aggregator host the storage version API.
  3. kube-apiserver updates the storage version during bootstrap.
  4. A filter that blocks certain requests during kube-apiserver bootstrap until the kube-apiserver has updated the storage versions.
  5. An integration test that shows the filter works, and that the apiserver works properly after the storage version has been updated.
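
For readers following the description, item 4 can be pictured as an HTTP filter in front of the API handlers. The following is only a minimal sketch under stated assumptions, not the code in this PR: `completionTracker`, `withStorageVersionPrecondition`, `newDemoHandler`, and `statusFor` are hypothetical names, "certain requests" is assumed to mean mutating requests, and the 503 response with `Retry-After` is an assumed choice.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
)

// completionTracker is a hypothetical stand-in for the PR's storage version
// manager. It only records whether the bootstrap update has finished.
type completionTracker struct {
	mu   sync.RWMutex
	done bool
}

func (t *completionTracker) MarkCompleted() {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.done = true
}

func (t *completionTracker) Completed() bool {
	t.mu.RLock()
	defer t.mu.RUnlock()
	return t.done
}

// withStorageVersionPrecondition rejects mutating requests with 503 until the
// tracker reports completion. Read-only requests always pass through.
// Assumption: "certain requests" is taken to mean writes.
func withStorageVersionPrecondition(next http.Handler, t *completionTracker) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		readOnly := r.Method == http.MethodGet || r.Method == http.MethodHead || r.Method == http.MethodOptions
		if !readOnly && !t.Completed() {
			w.Header().Set("Retry-After", "1")
			http.Error(w, "storage versions are not updated yet", http.StatusServiceUnavailable)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// newDemoHandler wraps a handler that always answers 200 OK with the filter.
func newDemoHandler(t *completionTracker) http.Handler {
	ok := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	return withStorageVersionPrecondition(ok, t)
}

// statusFor sends one request through h and returns the response status code.
func statusFor(h http.Handler, method string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(method, "/apis/example.com/v1/widgets", nil))
	return rec.Code
}

func main() {
	tracker := &completionTracker{}
	h := newDemoHandler(tracker)

	fmt.Println(statusFor(h, http.MethodPost)) // write is rejected during bootstrap
	fmt.Println(statusFor(h, http.MethodGet))  // read passes through
	tracker.MarkCompleted()
	fmt.Println(statusFor(h, http.MethodPost)) // write is accepted afterwards
}
```

Keeping the readiness check behind a small `Completed()` method means the filter does not need to know how the storage versions are updated, only whether the update has finished.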

This PR is already huge, so I'll do the remaining tasks in follow-up PRs. Some major follow-ups are:

  1. the filter should block write requests to "apiservices" and "crd".
  2. the garbage collection of obsolete storage version objects.
  3. a test showing that the API works fine in HA cluster.
  4. the custom resource handler needs to update the storage version of the CR before accepting write requests.

@k8s-ci-robot
Contributor

@caesarxuchao: Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot
Contributor

@caesarxuchao: The label(s) kind/api cannot be applied. These labels are supported: api-review, community/discussion, community/maintenance, community/question, cuj/build-train-deploy, cuj/multi-user, platform/aws, platform/azure, platform/gcp, platform/minikube, platform/other

In response to this:

/sig-apimachinery
/assign
/kind feature
/kind api
WIP


@k8s-ci-robot added the do-not-merge/work-in-progress, kind/feature, size/XL, do-not-merge/release-note-label-needed, cncf-cla: yes, needs-sig, needs-priority, area/apiserver, kind/api-change, and sig/api-machinery labels and removed the needs-sig label on Oct 16, 2019
@caesarxuchao caesarxuchao force-pushed the HA-storage-version-api branch 4 times, most recently from 52ada48 to e16e33e Compare November 6, 2019 02:13
@k8s-ci-robot added the size/XXL label and removed the size/XL label on Nov 6, 2019
@k8s-ci-robot added the needs-rebase and sig/auth labels on Dec 21, 2019
@k8s-ci-robot added the area/test, sig/testing, and sig/cluster-lifecycle labels on Feb 11, 2020
@cblecker cblecker removed their request for review February 12, 2020 05:07
@k8s-ci-robot
Contributor

@caesarxuchao: The following test failed, say /retest to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| pull-kubernetes-e2e-gce-alpha-features | d8edfa0 | link | /test pull-kubernetes-e2e-gce-alpha-features |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.


@caesarxuchao
Member Author

The pull-kubernetes-e2e-gce-alpha-features failed because the alpha API (storageversion) was not enabled.

@caesarxuchao
Member Author

To fix pull-kubernetes-e2e-gce-alpha-features, we need to change its job yaml. It only runs some alpha e2e tests.

OTOH, the pull-kubernetes-local-e2e ran all e2e conformance tests. It passed with the storage version feature enabled. So I don't plan to fix pull-kubernetes-e2e-gce-alpha-features.

}
}

func (s *DefaultManager) addGroupResource(group, resource string) {
Contributor

prefer schema.GroupResource

return
}
klog.V(2).Infof("successfully updated storage version for %v", r.Resource.Name)
s.RemoveResourceInfo(r)
Contributor

I suspect we will want to know the full list of the ones we handled. Instead of removing it, can you mark it as completed?


mu sync.RWMutex
resources map[*ResourceInfo]struct{}
groupResources map[string]map[string]struct{}
Contributor

@deads2k deads2k Feb 26, 2020

How about groupResources map[schema.GroupResource]resourceStatus

type resourceStatus struct{
    done bool
    lastErr error
    resourceInfo ResourceInfo
}

for our future inspection.

Contributor

This would also eliminate the parallel maps
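
To make the suggestion concrete, here is a minimal sketch of a manager that keeps one map keyed by group and resource, with a status value per entry. It is an illustration under assumptions, not the PR's code: `GroupResource` is a local stand-in for `schema.GroupResource` so the example has no external dependencies, `ResourceInfo` is trimmed to two fields, and `manager`, `add`, `record`, `completed`, `pending`, and `errDemo` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"sync"
)

// GroupResource is a local stand-in for schema.GroupResource.
type GroupResource struct {
	Group    string
	Resource string
}

// ResourceInfo is a trimmed, hypothetical version of the PR's ResourceInfo.
type ResourceInfo struct {
	EncodingVersion   string
	DecodableVersions []string
}

// resourceStatus follows the shape proposed in the review comment.
type resourceStatus struct {
	done         bool
	lastErr      error
	resourceInfo ResourceInfo
}

// manager tracks every resource in a single map, so there are no parallel
// maps to keep in sync.
type manager struct {
	mu             sync.RWMutex
	groupResources map[GroupResource]resourceStatus
}

func newManager() *manager {
	return &manager{groupResources: map[GroupResource]resourceStatus{}}
}

func (m *manager) add(gr GroupResource, info ResourceInfo) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.groupResources[gr] = resourceStatus{resourceInfo: info}
}

// record marks a resource as done on success and keeps the last error
// otherwise. Entries are never removed, so they stay available for inspection.
func (m *manager) record(gr GroupResource, err error) {
	m.mu.Lock()
	defer m.mu.Unlock()
	st, ok := m.groupResources[gr]
	if !ok {
		return
	}
	st.done = err == nil
	st.lastErr = err
	m.groupResources[gr] = st
}

// completed reports whether every tracked resource has been updated.
func (m *manager) completed() bool {
	m.mu.RLock()
	defer m.mu.RUnlock()
	for _, st := range m.groupResources {
		if !st.done {
			return false
		}
	}
	return true
}

// pending lists the unfinished resources as "group.resource", sorted.
func (m *manager) pending() []string {
	m.mu.RLock()
	defer m.mu.RUnlock()
	var names []string
	for gr, st := range m.groupResources {
		if !st.done {
			names = append(names, gr.Group+"."+gr.Resource)
		}
	}
	sort.Strings(names)
	return names
}

// errDemo is a hypothetical failure used only by this example.
var errDemo = errors.New("demo failure")

func main() {
	m := newManager()
	deployments := GroupResource{Group: "apps", Resource: "deployments"}
	jobs := GroupResource{Group: "batch", Resource: "jobs"}
	m.add(deployments, ResourceInfo{EncodingVersion: "apps/v1"})
	m.add(jobs, ResourceInfo{EncodingVersion: "batch/v1"})

	m.record(deployments, nil)
	m.record(jobs, errDemo)
	fmt.Println(m.completed(), m.pending())

	m.record(jobs, nil)
	fmt.Println(m.completed(), m.pending())
}
```

Because the struct key is comparable, it can be used directly as a map key, which avoids the nested `map[string]map[string]struct{}` lookup.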


// UpdateStorageVersions updates the StorageVersions. If the updates are
// successful, following calls to Completed() returns true.
func (s *DefaultManager) UpdateStorageVersions(loopbackClientConfig *rest.Config, serverID string) {
Contributor

Prefer the name kubeAPIServerClientConfig. On aggregated API servers, loopback means a different thing.

for r := range resources {
r.DecodableVersions = decodableVersions(r.EquivalentResourceMapper, r.Group, r.Resource.Name)
if err := updateStorageVersionFor(sc, serverID, r.Group+"."+r.Resource.Name, r.EncodingVersion, r.DecodableVersions); err != nil {
klog.Fatalf("failed to update storage version for %v", r.Resource.Name)
Contributor

not Fatal. perhaps utilruntime.HandleError
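
A rough sketch of what a non-fatal loop could look like: each failure is reported and the loop moves on to the next resource instead of exiting the process. This is an assumption-laden illustration rather than the PR's code. The `handleError` parameter is a hypothetical hook standing in for `utilruntime.HandleError`, and `updateAll`, `flakyUpdate`, and `errUnavailable` are made-up names.

```go
package main

import (
	"errors"
	"fmt"
)

// updateAll tries every resource and reports failures through handleError
// instead of terminating the process. It returns the resources that failed so
// the caller can retry them or mark them as incomplete.
func updateAll(resources []string, update func(string) error, handleError func(error)) []string {
	var failed []string
	for _, r := range resources {
		if err := update(r); err != nil {
			// Wrap the underlying error so the report says why the update failed.
			handleError(fmt.Errorf("failed to update storage version for %v: %w", r, err))
			failed = append(failed, r)
		}
	}
	return failed
}

// errUnavailable is a hypothetical failure used only by this example.
var errUnavailable = errors.New("storage unavailable")

// flakyUpdate is a hypothetical update function that fails for one resource.
func flakyUpdate(resource string) error {
	if resource == "batch.jobs" {
		return errUnavailable
	}
	return nil
}

func main() {
	var reported []error
	failed := updateAll(
		[]string{"apps.deployments", "batch.jobs", "apps.statefulsets"},
		flakyUpdate,
		func(err error) { reported = append(reported, err) },
	)
	fmt.Println(failed)
	for _, err := range reported {
		fmt.Println(err)
	}
}
```

Returning the failed resources pairs naturally with a per-resource status, since the caller can leave those entries unfinished rather than crash the server.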

if err != nil && !apierrors.IsNotFound(err) {
return err
}
if err != nil && apierrors.IsNotFound(err) {
Contributor

err != nil isn't needed
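
The point can be shown with a small standalone sketch. A not-found predicate that returns false for a nil error makes the extra `err != nil` guard redundant. Everything here is a stand-in: `errNotFound` and `isNotFound` play the role of an API NotFound error and `apierrors.IsNotFound`, and the "create" versus "update" outcomes of `nextAction` are assumed purely for illustration, since the body of the original branch is not shown in this thread.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound and errTimeout are hypothetical errors for this example.
var (
	errNotFound = errors.New("not found")
	errTimeout  = errors.New("timeout")
)

// isNotFound stands in for apierrors.IsNotFound. Like that function, it
// returns false when err is nil.
func isNotFound(err error) bool {
	return errors.Is(err, errNotFound)
}

// nextAction decides what to do with the result of a hypothetical Get call.
func nextAction(err error) (string, error) {
	if err != nil && !isNotFound(err) {
		// Unexpected errors are returned to the caller.
		return "", err
	}
	if isNotFound(err) { // no "err != nil &&" needed: nil is never not-found
		return "create", nil
	}
	return "update", nil
}

func main() {
	for _, err := range []error{nil, errNotFound, fmt.Errorf("get: %w", errNotFound), errTimeout} {
		action, actionErr := nextAction(err)
		fmt.Printf("input=%v action=%q err=%v\n", err, action, actionErr)
	}
}
```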

return err
}

func localUpdateStorageVersion(sv *v1alpha1.StorageVersion, apiserverID, encodingVersion string, decodableVersions []string) {
Contributor

please refactor to build a new object to return that can be assigned instead of mutating the input.
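
One way to read this request is a function that leaves its input alone and returns an updated copy. The sketch below assumes simplified, hypothetical types: `StorageVersion`, `ServerStorageVersion`, and the `Servers` field are illustrative and are not the `v1alpha1` definitions from this PR, and `withServerEntry` is a made-up name.

```go
package main

import "fmt"

// ServerStorageVersion is a simplified, hypothetical per-server record.
type ServerStorageVersion struct {
	APIServerID       string
	EncodingVersion   string
	DecodableVersions []string
}

// StorageVersion is a simplified, hypothetical stand-in for the API object.
type StorageVersion struct {
	Name    string
	Servers []ServerStorageVersion
}

// withServerEntry returns a new StorageVersion in which the entry for
// apiserverID is replaced, or appended if it did not exist. The input value
// and its slices are not modified.
func withServerEntry(sv StorageVersion, apiserverID, encodingVersion string, decodableVersions []string) StorageVersion {
	entry := ServerStorageVersion{
		APIServerID:       apiserverID,
		EncodingVersion:   encodingVersion,
		DecodableVersions: append([]string(nil), decodableVersions...),
	}
	out := StorageVersion{Name: sv.Name}
	replaced := false
	for _, s := range sv.Servers {
		if s.APIServerID == apiserverID {
			out.Servers = append(out.Servers, entry)
			replaced = true
			continue
		}
		// Copy the slice so the result shares no backing array with the input.
		s.DecodableVersions = append([]string(nil), s.DecodableVersions...)
		out.Servers = append(out.Servers, s)
	}
	if !replaced {
		out.Servers = append(out.Servers, entry)
	}
	return out
}

func main() {
	old := StorageVersion{
		Name: "apps.deployments",
		Servers: []ServerStorageVersion{{
			APIServerID:       "server-a",
			EncodingVersion:   "apps/v1beta1",
			DecodableVersions: []string{"apps/v1beta1"},
		}},
	}
	updated := withServerEntry(old, "server-a", "apps/v1", []string{"apps/v1", "apps/v1beta1"})

	fmt.Println(old.Servers[0].EncodingVersion)     // input is unchanged
	fmt.Println(updated.Servers[0].EncodingVersion) // result carries the new version
}
```

The caller then assigns the result explicitly, which makes it visible at the call site that the object changes.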

metav1.ObjectMeta `json:"metadata,omitempty" protobuf:"bytes,1,opt,name=metadata"`

// Spec is omitted because there is no spec field.

Contributor

stub in an empty spec please, so the proto looks normal and the object doesn't look too weird.

@deads2k
Contributor

deads2k commented Feb 26, 2020

@caesarxuchao do you have a separate PR with just the API?

}

s.StorageVersion.AddResourceInfo(resources...)
Contributor

Does this happen before we start listening? If so, add a comment explaining that this approach is safe for that reason.

apiGroupInfo := apiservicerest.NewRESTStorage(c.GenericConfig.MergedResourceConfig, c.GenericConfig.RESTOptionsGetter)
if err := s.GenericAPIServer.InstallAPIGroup(&apiGroupInfo); err != nil {
rp := apiserverinternalrest.StorageProvider{}
apiGroupInfo, err := rp.NewRESTStorage(c.GenericConfig.MergedResourceConfig, c.GenericConfig.RESTOptionsGetter)
Contributor

I expected to see this in the kube-apiserver, not in the aggregator.

Member Author

The reason I let the aggregator host it is because the storage version API contains information about all apiservers (kube-apiserver, apiextension apiserver, aggregator apiserver, and custom apiservers). It's analogous to APIService, which is also hosted by the aggregator. WDYT?

Contributor

> The reason I let the aggregator host it is because the storage version API contains information about all apiservers (kube-apiserver, apiextension apiserver, aggregator apiserver, and custom apiservers). It's analogous to APIService, which is also hosted by the aggregator. WDYT?

I think this is closer to the idea of webhook admission configuration. It's provided by the kube-apiserver for consumption and usage by other servers, but aggregation is focused on its single mission.

@liggitt removed the api-review label on Mar 10, 2020
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale and needs-rebase labels on May 27, 2020
@k8s-ci-robot
Contributor

@caesarxuchao: PR needs rebase.


@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Jul 10, 2020
@caesarxuchao
Member Author

Inherited by #92064 and #92459.

Labels
area/apiserver, area/dependency, area/test, cncf-cla: yes, do-not-merge/release-note-label-needed, kind/api-change, kind/feature, lifecycle/rotten, needs-priority, needs-rebase, sig/api-machinery, sig/auth, sig/cluster-lifecycle, sig/testing, size/XXL
5 participants