From 5e3740c077885fb1c31d5ca73e395abf6cf45dee Mon Sep 17 00:00:00 2001 From: Adrian Ludwin Date: Mon, 26 Oct 2020 20:04:35 -0400 Subject: [PATCH] HNC v0.6 docs: conditions and configuration Update the docs for the latest in condition schemas; make some slight tweaks to the resource config section and explain how to bypass the overwrite protection. Tested: staged at https://github.com/adrianludwin/multi-tenancy/tree/doc-updates/incubator/hnc/docs/user-guide. --- incubator/hnc/docs/user-guide/concepts.md | 83 ++++++--- incubator/hnc/docs/user-guide/how-to.md | 207 ++++++++++++++-------- 2 files changed, 187 insertions(+), 103 deletions(-) diff --git a/incubator/hnc/docs/user-guide/concepts.md b/incubator/hnc/docs/user-guide/concepts.md index f149edf2e..b2541c2e9 100644 --- a/incubator/hnc/docs/user-guide/concepts.md +++ b/incubator/hnc/docs/user-guide/concepts.md @@ -171,8 +171,8 @@ make this feasible. > other at exactly the same time; the admission controller would allow this > (since neither is yet the parent of the other), leading to a cycle. > Alternatively, an admin might simply accidentally disable the admission -> controllers. In such cases, HNC will put a critical condition on the -> namespaces until the cycle is resolved._ +> controllers. In such cases, HNC will put an `ActivitiesHalted` +> [condition](#admin-conditions) on the namespaces until the cycle is resolved._ In the command line, you may set a namespace’s parent using the `kubectl-hns` plugin as follows: `kubectl hns set --parent `. You can also @@ -266,13 +266,15 @@ updated or deleted as quickly as possible. Similarly, if you change the parent of a namespace, any objects that no longer exist in the namespace’s ancestry will be deleted, and any new objects from that ancestry will be added. -Every propagated object in HNC is given the `hnc.x-k8s.io/inheritedFrom` label. +Every propagated object in HNC is given the `hnc.x-k8s.io/inherited-from` label. 
The value of this label indicates the namespace that contains the original object. The HNC admission controller will prevent you from adding or removing this label, but if you manage to add it, HNC will likely promptly delete the object (believing that the source object has been deleted), while if you manage to delete it, HNC will simply overwrite the object anyway. +> _Note: in HNC v0.5, the `inherited-from` label was called `inheritedFrom`._ + ### Tree labels and non-propagated policies @@ -392,27 +394,42 @@ admin of B privileges to N, then ask that admin to make N a child of B. -### Conditions +### Conditions and events As mentioned above, a **_condition_** is some kind of problem affecting a -namespace or a propagated object. Conditions are reported as part of the status -of the `HierarchicalConfiguration` object in each namespace, are summarized -across the entire cluster in the status of the `HNCConfiguration` object, and -are exposed via the `hnc/namespace_conditions` metric. - -Every condition contains a machine-readable code, a human-readable message, and -an optional list of objects that are affected by the condition. For example: - -* The `CritCycle` condition is used if you somehow bypass the validating webhook - and create a cycle. -* The `CannotPropagate` condition indicates that an object in this namespace - cannot be propagated to other namespaces. This condition is displayed in the - source namespace. - -Any condition that begins with the `Crit` prefix is a **_critical condition_**, -and indicates that there’s a serious problem with the namespace that prevents -normal HNC operation. Namespaces with critical conditions have the following -properties: +namespace or cluster. Namespaces without any problems have all conditions +removed. Generally speaking, HNC's validating admission webhooks should prevent +most conditions from ever occurring, but there are some exceptions and corner cases.
+Conditions generally require human intervention to resolve, except as described +below. + +Namespace conditions are reported as part of the status of the +`HierarchyConfiguration` object in each namespace and are exposed via the +`hnc/namespace_conditions` metric. Cluster conditions are reported as part of +the status of the `HNCConfiguration` cluster-wide object; cluster conditions can +be caused by problems with the cluster-wide configuration, and are also +used to summarize the _namespace_ conditions across the cluster. + +HNC conditions follow a subset of the [standard Kubernetes condition +schema](https://pkg.go.dev/k8s.io/apimachinery/pkg/apis/meta/v1#Condition) with +the following fields: + +* **Type:** one of `ActivitiesHalted` or `BadConfiguration`. The former + indicates that there's a serious problem that prevents normal HNC operations + (see more details below); the latter informs cluster admins of an invalid + configuration. +* **Reason:** a machine-readable code such as `InCycle` or `ParentMissing` that + explains why the condition is present. +* **Message:** a human-readable message with more information. + +Other standard condition fields, such as `LastTransitionTime` and `Status`, are +unused. + +> _Note: HNC v0.5 used a non-standard condition schema with only one +> machine-readable code. All codes that started with the `Crit` prefix +> correspond to the `ActivitiesHalted` condition type in HNC v0.6._ + +Namespaces with an `ActivitiesHalted` condition have the following properties: * Object propagation is disabled. That is, new objects will not be copied in, and obsolete objects will not be removed. @@ -422,9 +439,21 @@ properties: When the condition is resolved, object propagation resumes. When the HNC restarts, there can be a short period during which spurious -critical conditions may appear on namespaces as HNC restores its internal view -of the cluster’s hierarchy.
These are harmless and generally resolve themselves -within 10-30 seconds for reasonably sized hierarchies. - -In all other cases, conditions require human intervention to resolve. +conditions may appear on namespaces as HNC restores its internal view of the +cluster’s hierarchy. These are harmless and generally resolve themselves within +10-30 seconds for reasonably sized hierarchies. In all other cases, conditions +require human intervention to resolve. + +In addition to problems with the namespaces themselves, HNC may encounter +problems propagating (copying) objects out of source namespaces, or copying them +into destination namespaces. In such cases, HNC will generate a standard +[`Event`](https://v1-18.docs.kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#event-v1-core) +for that object, with the `.source.component` field set to `hnc.x-k8s.io`. You +can query such events either directly or via `kubectl hns describe NAMESPACE`. +The event will include machine-readable and human-readable information about the +problem, which will generally require human intervention to resolve. + +> _Note: HNC v0.5 reported issues with objects as part of the non-standard +> condition schema. These have been removed and replaced by standard Events in +> HNC v0.6 since events are more standard, scalable and loggable._ diff --git a/incubator/hnc/docs/user-guide/how-to.md b/incubator/hnc/docs/user-guide/how-to.md index 0bc138f9e..8f80a428e 100644 --- a/incubator/hnc/docs/user-guide/how-to.md +++ b/incubator/hnc/docs/user-guide/how-to.md @@ -18,7 +18,7 @@ This document describes common tasks you might want to accomplish using HNC.
* [Uninstall HNC from a cluster](#admin-uninstall) * [Backing up and restoring HNC data](#admin-backup-restore) * [Administer who has access to HNC properties](#admin-access) - * [Modify the object types propagated by HNC](#admin-types) + * [Modify the resources propagated by HNC](#admin-resources) * [Gather metrics](#admin-metrics) @@ -118,9 +118,34 @@ status: {} ### Inspect namespace hierarchies -This section is under construction (as of May 2020). TL;DR: `kubectl hns tree ` and `kubectl hns describe `. +To get an overview of the hierarchy of your entire cluster, use one of the +following variants of the `tree` command: -TODO: explain conditions (eg get HNC to try to propagate a `cluster-admin` rolebinding). +```bash +kubectl hns tree --all-namespaces +kubectl hns tree -A +``` + +You can also limit this display to a single subtree via: + +```bash +kubectl hns tree ROOT_NAMESPACE +``` + +In addition to showing you the structure of your hierarchy, it will also give +you high-level information on any problems with the hierarchies, known as +[conditions](concepts.md#admin-conditions). + +For detailed information on any one namespace, including: +* Its children +* Its conditions +* Any HNC problems with objects in the namespace + +Use the more detailed `describe` command: + +```bash +kubectl hns describe NAMESPACE +``` @@ -137,9 +162,9 @@ because this would result in the objects in the descendants being silently overwritten. HNC will also prevent you from changing the parent of a namespace if this would result in objects being overwritten. -**WARNING: this guard against creating ancestor objects was only introduced in -HNC v0.5.3. Earlier versions of HNC have inconsistent behaviour; see #1076 for -details.** +> **WARNING:** this guard against creating ancestor objects was only introduced in +> HNC v0.5.3. Earlier versions of HNC have inconsistent behaviour; see #1076 for +> details.
However, if you bypass these admission controllers - for example, by updating objects while HNC is being upgraded - HNC _will_ overwrite conflicting objects @@ -148,7 +173,7 @@ create a policy in an ancestor namespace, you can be confident that it will be uniformly applied to all descendant namespaces. HNC can also propagate objects other than RBAC objects, but only cluster -administrators can modify this. See [here](#admin-types) for instructions. +administrators can modify this. See [here](#admin-resources) for instructions. Occasionally, objects might fail to be propagated to descendant namespaces for a variety of reasons - e.g., HNC itself might not have sufficient RBAC @@ -174,12 +199,6 @@ In order to delete a subnamespace, you must first have permissions to delete its anchor in its parent namespace. Ask your cluster administrator to give you this permission if you do not have it. -**WARNING: the protections described in this section only work on clusters with -Kubernetes 1.15 and higher installed. See [issue -#688](https://github.com/kubernetes-sigs/multi-tenancy/issues/688) for details. -In Kubernetes 1.14 and earlier, HNC is unable to stop you from deleting -namespaces.** - Subnamespaces are _always_ manipulated via their anchors. For example, you cannot delete a subnamespace by deleting it directly: @@ -218,13 +237,13 @@ implicitly deleted, or any of their ancestors. The `allowCascadingDeletion` field is a bit like `rm -rf` in a Linux shell. -> **WARNING: this option is very dangerous, so you should only set it on the lowest -possible level of the hierarchy.** +> **WARNING:** this option is very dangerous, so you should only set it on the +> lowest possible level of the hierarchy. -> **WARNING: any subnamespaces of the namespace you are deleting will also be -deleted, and so will any subnamespaces of those namespaces, and so on. 
However, -any _full_ namespaces that are descendants of a subnamespace will not be -deleted.** +> **WARNING:** any subnamespaces of the namespace you are deleting will also be +> deleted, and so will any subnamespaces of those namespaces, and so on. +> However, any _full_ namespaces that are descendants of a subnamespace will not +> be deleted. > _Note: In HNC v0.5.x and earlier, HNC uses v1alpha1 API and this field is > called `allowCascadingDelete`._ @@ -348,7 +367,6 @@ export HNC_IMG_TAG=test-img # # NB: in HNC v0.5, you need `controller-gen` installed via kubebuilder.io for # this to work; this is not required in HNC v0.6. - make deploy ``` @@ -365,10 +383,10 @@ kubectl delete validatingwebhookconfiguration.admissionregistration.k8s.io hnc-v You may also completely delete HNC, including its CRDs and namespaces. However, **this is a destructive process that results in some data loss.** In particular, -you will lose any cluster-wide configuration in your `HNCConfig` object, as well -as any hierarchical relationships between different namespaces, _excluding_ -subnamespaces (subnamespace relationships are saved as annotations on the -namespaces themselves, and so can be recreated when HNC is reinstalled). +you will lose any cluster-wide configuration in your `HNCConfiguration` object, +as well as any hierarchical relationships between different namespaces, +_excluding_ subnamespaces (subnamespace relationships are saved as annotations +on the namespaces themselves, and so can be recreated when HNC is reinstalled). To avoid data loss, consider [backing up](#admin-backup-restore) your HNC objects so they can later be restored. @@ -439,11 +457,12 @@ recreate recreate the anchors manually by typing `kubectl hns create -n HNC has three significant objects whose access administrators should carefully control: -* The `HNCConfig` object. This is a single non-namespaced object (called `config`) - that defines the behaviour of the entire cluster. 
It should only be modifiable - by cluster administrators. In addition, since it may contain information about - any namespace in the cluster, it should only be readable by users trusted with - this information. +* The `HNCConfiguration` object. This is a single non-namespaced object (named + `config`) that defines the behaviour of the entire cluster. It should only be + modifiable by cluster administrators. In addition, since it may contain + information about any namespace in the cluster, it should only be readable by + users trusted with this information. This object is automatically created by + HNC when it's installed. * The `HierarchyConfiguration` objects. There’s either zero or one of these in each namespace, with the name `hierarchy` if it exists. Any user with `update` access to this object is known as an [administrator](concepts.md#admin) of @@ -470,50 +489,63 @@ would require them to set the `allowCascadingDeletion` property of the child namespace. + ### Modify the object types propagated by HNC -Starting from HNC v0.6, HNC supports the following propagation modes for each -resource: -* `Propagate`: propagates objects from ancestors to descendants and deletes - obsolete descendants. -* `Remove`: deletes all existing propagated copies. -* `Ignore`: stops modifying this resource. New or changed objects will not be - propagated, and obsolete objects will not be deleted. The `inherited-from` - label is not removed. Any unknown mode is treated as `Ignore`. +HNC is configured via the [`HNCConfiguration`](#admin-access) object. You can +inspect this object directly via `kubectl get -oyaml hncconfiguration config`, +or with the HNS plugin via `kubectl hns config describe`. + +The most important type of configuration is the way each object type +("resource") is synchronized across namespace hierarchies. 
This is known as the +"synchronization mode," and has the following options: + +* **Propagate:** propagates objects from ancestors to descendants and deletes + obsolete descendants. This is the default if a resource is listed in the + config but no mode is explicitly set. +* **Remove:** deletes all existing propagated copies, but does not touch source + objects. +* **Ignore:** stops modifying this resource. New or changed objects will not be + propagated, and obsolete objects will not be deleted. The + `hnc.x-k8s.io/inherited-from` label is not removed. Any unknown mode is + treated as `Ignore`. This is the default if a resource is not listed at all in + the config, except for RBAC roles and role bindings (see below). HNC enforces `roles` and `rolebindings` RBAC resources to have `Propagate` mode. Thus they are omitted in the `HNCConfiguration` spec and only show up in the status. You can also set any Kubernetes resource to any of the propagation modes -discussed above. To do so, you need cluster privileges. +discussed above. To do so, you need permission to update the `HNCConfiguration` +object. -Note: Before HNC v0.6, the propagation modes were in lower case (`propagate`, -`remove`, `ignore`). The modes were set on types by `apiVersion` and `kind` instead -of `group` and `resource`. The `Role` and `RoleBinding` RBAC kinds were also -enforced but they were still left in the `HNCConfiguration` spec. +> _Note: Before HNC v0.6, the propagation modes were in lower case (`propagate`, +> `remove`, `ignore`). The modes were set on types by `apiVersion` and `kind` +> instead of `group` and `resource`. The `Role` and `RoleBinding` RBAC kinds +> were also enforced but they were still left in the `HNCConfiguration` spec._ -**WARNING: If you start propagating a new object type, HNC _cannot_ check -whether there are conflicting objects in descendant namespaces, and will -overwrite them. 
This will be fixed in HNC v0.6 (see #1102).** +You can view the current set of resources being propagated, along with +statistics, by running `kubectl hns config describe`, or alternatively `kubectl +get -oyaml hncconfiguration config`. This object is automatically created for +you when HNC is first installed. -To configure an object type using the kubectl plugin: +To configure a resource using the kubectl plugin: ``` -# Starting from HNC v0.6: +# HNC v0.6: # "--group" can be omitted if the resource is a core K8s resource kubectl hns config set-resource [resource] --group [group] --mode [Propagate|Remove|Ignore] -# Before HNC v0.6: +# HNC v0.5: kubectl hns config set-type --apiVersion [apiVersion] --kind [kind] [propagate|remove|ignore] ``` For example: ``` -# Starting from HNC v0.6: +# HNC v0.6: kubectl hns config set-resource secrets --mode Propagate -# Before HNC v0.6: +# HNC v0.5: kubectl hns config set-type --apiVersion v1 --kind Secret propagate ``` @@ -521,30 +553,25 @@ To verify that this worked: ``` kubectl hns config describe -# Output starting from HNC v0.6: + +# Output from HNC v0.6: Synchronized types: * Propagating: roles (rbac.authorization.k8s.io/v1) * Propagating: rolebindings (rbac.authorization.k8s.io/v1) * Propagating: secrets (v1) # <<<< This should be added -# Output before HNC v0.6: +# Output from HNC v0.5: Synchronized types: * Propagating: Role (rbac.authorization.k8s.io/v1) * Propagating: RoleBinding (rbac.authorization.k8s.io/v1) * Propagating: Secret (v1) # <<<< This should be added ``` -To configure an object type without using the kubectl plugin, edit the existing -`HNCConfiguration` object (HNC will autocreate it for you when it’s installed): +You can also modify the config directly to include custom configurations via +`kubectl edit hncconfiguration config`: -``` -kubectl edit hncconfiguration config -``` - -Modify the config to include custom configurations: - -``` -# Starting from HNC v0.6: +```yaml +# HNC v0.6: apiVersion:
hnc.x-k8s.io/v1alpha2 kind: HNCConfiguration metadata: @@ -555,8 +582,8 @@ spec: ... - resource: secrets <<< This should be added mode: Propagate <<< - -# Before HNC v0.6: + +# HNC v0.5: apiVersion: hnc.x-k8s.io/v1alpha1 kind: HNCConfiguration metadata: @@ -570,6 +597,27 @@ spec: mode: propagate <<< ``` +Adding a new resource in the `Propagate` mode is potentially dangerous, since +there could be existing objects of that resource type that would be overwritten +by objects of the same name from ancestor namespaces. In HNC v0.5, it is up to +the cluster administrator to avoid making any mistakes, but in HNC v0.6, the HNS +plugin will not allow you to add a new resource in the `Propagate` mode. +Instead, to do so safely: + +* Add the new resource in the `Remove` mode. This will remove any propagated + copies (of which there should be none) but will force HNC to start + synchronizing all known source objects. +* Wait until `kubectl hns config describe` looks like it's identified the + correct number of objects of the newly added resource in its status. +* Change the propagation mode from `Remove` to `Propagate`. HNC will then check + to see if any objects will be overwritten, and will not allow you to change + the propagation mode until all such conflicts are resolved. + +Alternatively, if you're certain you want to start propagating objects +immediately, you can use the `--force` flag with `kubectl hns config +set-resource` to add a resource directly in the `Propagate` mode. You can also +edit the `config` object directly, which will bypass this protection. + ### Gather metrics @@ -583,7 +631,7 @@ metrics to ensure that HNC stays healthy. 
|Metric |Description | |:---------------------------------------------------- |:-------------| -| `hnc/namespace_conditions` | The number of namespaces affected by [conditions](concepts.md#admin-conditions), tagged by the condition code and whether or not the conditions are critical or not | +| `hnc/namespace_conditions` | The number of namespaces affected by [conditions](concepts.md#admin-conditions), tagged with information about the condition | | `hnc/reconcilers/hierconfig/total` | The total number of HierarchyConfiguration (HC) reconciliations happened | | `hnc/reconcilers/hierconfig/concurrent_peak` | The peak concurrent HC reconciliations happened in the past 60s, which is also the minimum Stackdriver reporting period and the one we're using | | `hnc/reconcilers/hierconfig/hierconfig_writes_total` | The number of HC writes happened during HC reconciliations | @@ -603,38 +651,45 @@ Explorer](https://cloud.google.com/monitoring/charts/metrics-explorer) by searching the metrics keywords (e.g. `namespace_conditions`). In order to monitor metrics via Stackdriver: +1. Save some key information as environment variables. You may adjust these + values to suit your needs; there's nothing magical about them. + ```bash + GSA_NAME=hnc-metric-writer + PROJECT_ID=my-gcp-project + + ``` 1. Enable Workload Identity (WI) on either a [new](https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity#enable_workload_identity_on_a_new_cluster) or [existing](https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity#enable_workload_identity_on_an_existing_cluster) cluster. -2. Install HNC as described [above](#admin-install). -3. [Create a Google service account (GSA)](https://cloud.google.com/docs/authentication/production#creating_a_service_account): +1. Install HNC as described [above](#admin-install). +1.
[Create a Google service account (GSA)](https://cloud.google.com/docs/authentication/production#creating_a_service_account): ```bash - gcloud iam service-accounts create [GSA_NAME] + gcloud iam service-accounts create ${GSA_NAME} ``` -4. Grant “[Monitoring Metric Writer](https://cloud.google.com/monitoring/access-control#mon_roles_desc)” +1. Grant “[Monitoring Metric Writer](https://cloud.google.com/monitoring/access-control#mon_roles_desc)” role to the GSA: ```bash - gcloud projects add-iam-policy-binding [PROJECT_ID] --member \ - "serviceAccount:[GSA_NAME]@[PROJECT_ID].iam.gserviceaccount.com" \ + gcloud projects add-iam-policy-binding ${PROJECT_ID} --member \ + "serviceAccount:${GSA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com" \ --role "roles/monitoring.metricWriter" ``` -5. Create an [Cloud IAM policy binding](https://cloud.google.com/sdk/gcloud/reference/iam/service-accounts/add-iam-policy-binding) +1. Create a [Cloud IAM policy binding](https://cloud.google.com/sdk/gcloud/reference/iam/service-accounts/add-iam-policy-binding) between `hnc-system/default` KSA and the newly created GSA: ``` gcloud iam service-accounts add-iam-policy-binding \ --role roles/iam.workloadIdentityUser \ - --member "serviceAccount:[PROJECT_ID].svc.id.goog[hnc-system/default]" \ - [GSA_NAME]@[PROJECT_ID].iam.gserviceaccount.com + --member "serviceAccount:${PROJECT_ID}.svc.id.goog[hnc-system/default]" \ + ${GSA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com ``` -6. Add the `iam.gke.io/gcp-service-account=[GSA_NAME]@[PROJECT_ID]` annotation to +1.
Add the `iam.gke.io/gcp-service-account=${GSA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com` annotation to the KSA, using the email address of the Google service account: ``` kubectl annotate serviceaccount \ --namespace hnc-system \ default \ - iam.gke.io/gcp-service-account=[GSA_NAME]@[PROJECT_ID].iam.gserviceaccount.com + iam.gke.io/gcp-service-account=${GSA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com ``` If everything is working properly, you should start to see metrics in the