docs: lint all md files, add docs #45

Merged
2 changes: 1 addition & 1 deletion Makefile
@@ -6,7 +6,7 @@ IMG ?= "datainfrahq/druid-operator"
# Local Image URL to be pushed to kind registry
IMG_KIND ?= "localhost:5001/druid-operator"
# NAMESPACE for druid operator e2e
NAMESPACE_DRUID_OPERATOR ?= "druid-operator"
NAMESPACE_DRUID_OPERATOR ?= "druid-operator-system"
# NAMESPACE for zk operator e2e
NAMESPACE_ZK_OPERATOR ?= "zk-operator"
# NAMESPACE for zk operator e2e
26 changes: 14 additions & 12 deletions README.md
@@ -6,17 +6,22 @@
Kubernetes Operator For Apache Druid
</h2>

**This is the official [druid-operator](https://github.com/druid-io/druid-operator) project, now maintained by the people listed in [MAINTAINERS.md](./MAINTAINERS.md).
[druid-operator](https://github.com/druid-io/druid-operator) is deprecated. Refer to this [issue](https://github.com/druid-io/druid-operator/issues/329) and [PR](https://github.com/druid-io/druid-operator/pull/336). Feel free to open issues and PRs! Collaborators are welcome!**

<div align="center">

![Build Status](https://github.com/datainfrahq/druid-operator/actions/workflows/docker-image.yml/badge.svg) ![Docker pull](https://img.shields.io/docker/pulls/datainfrahq/druid-operator.svg) [![Latest Version](https://img.shields.io/github/tag/datainfrahq/druid-operator)](https://github.com/datainfrahq/druid-operator/releases) [![Slack](https://img.shields.io/badge/slack-brightgreen.svg?logo=slack&label=Community&style=flat&color=%2373DC8C&)](https://kubernetes.slack.com/archives/C04F4M6HT2L)

</div>

</div>

Druid Operator provisions and manages [Apache Druid](https://druid.apache.org/) clusters on Kubernetes.
Druid Operator is designed to provision and manage [Apache Druid](https://druid.apache.org/) in distributed mode only.
It is built in Golang using [kubebuilder](https://github.com/kubernetes-sigs/kubebuilder).
Druid Operator is available on [operatorhub.io](https://operatorhub.io/operator/druid-operator).
Refer to the [Documentation](./docs/README.md) for getting started.

Feel free to join the Kubernetes Slack workspace and the [druid-operator](https://kubernetes.slack.com/archives/C04F4M6HT2L) channel.

### Talks and Blogs on Druid Operator

@@ -37,14 +42,11 @@

### Notifications

- Users may experience HPA issues with druid-operator with release 0.0.5, as described in the [issue](https://github.com/druid-io/druid-operator/issues/160).
- The latest release 0.0.6 has fixes for the above issue.
- The operator has moved from HPA apiVersion autoscaling/v2beta1 to autoscaling/v2 API users will need to update there HPA Specs according v2beta2 api in order to work with the latest druid-operator release.
- Users may experience pvc deletion [issue](https://github.com/druid-io/druid-operator/issues/186) in release 0.0.6, this issue has been fixed in patch release 0.0.6.1.
- The project moved to <b>Kubebuilder v3</b> which requires a [manual change](docs/kubebuilder_v3_migration.md) in the operator.
- Users are encouraged to use operator version 0.0.9+.
- The operator has moved from HPA apiVersion autoscaling/v2beta1 to autoscaling/v2; users will need to update their HPA specs according to the v2 API in order to work with the latest druid-operator release (a sketch of the v2 shape follows this list).
- druid-operator has moved Ingress apiVersion networking/v1beta1 to networking/v1. Users will need to update their Ingress spec in the druid CR according to networking/v1 syntax. If a schema-validated CRD is in use, the CRD will also need to be updated.
- druid-operator has moved PodDisruptionBudget apiVersion policy/v1beta1 to policy/v1. Users will need to update their Kubernetes clusters to 1.21+ to use druid-operator tag 0.0.9+.
- The v1.0.0 release for druid-operator is compatible with k8s version 1.25. HPA API is kept to version v2beta2.
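
For readers doing that HPA spec update, here is a hedged before/after sketch; all names and values are illustrative and not taken from this repo:

```yaml
# Sketch: migrating an HPA from autoscaling/v2beta1 to autoscaling/v2.
apiVersion: autoscaling/v2        # was: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: broker-hpa                # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: druid-cluster-brokers   # hypothetical target
  minReplicas: 2
  maxReplicas: 8
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:                   # v2 nests the target explicitly;
          type: Utilization       # v2beta1 used targetAverageUtilization: 60
          averageUtilization: 60
```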

### Kubernetes version compatibility

@@ -57,7 +59,7 @@

### Contributors

<a href="https://github.com/datainfrahq/druid-operator/graphs/contributors"><img src="https://contrib.rocks/image?repo=datainfrahq/druid-operator"/></a>

### Note
Apache®, [Apache Druid, Druid®](https://druid.apache.org/) are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. This project, druid-operator, is not an Apache Software Foundation project.
7 changes: 5 additions & 2 deletions chart/values.yaml
@@ -4,8 +4,8 @@

env:
  DENY_LIST: "default,kube-system" # Comma-separated list of namespaces to ignore
  RECONCILE_WAIT: "10s" # Reconciliation delay
  WATCH_NAMESPACE: "" # Namespace to watch, or empty string to watch all namespaces. To watch multiple namespaces, separate them with commas, e.g. WATCH_NAMESPACE: "ns1,ns2,ns3"
  #MAX_CONCURRENT_RECONCILES: "" # MaxConcurrentReconciles is the maximum number of concurrent Reconciles which can be run.

replicaCount: 1
@@ -46,6 +46,9 @@ podAnnotations: {}

podSecurityContext:
  runAsNonRoot: true
  fsGroup: 65532
  runAsUser: 65532
  runAsGroup: 65532

Comment on lines 48 to 52
**@AdheipSingh** (Contributor) commented on Jul 12, 2023:

Why do we need this? Can we know the exact reason?

**@cyril-corbon** (Collaborator, Author) replied:

It was to match the user / group defined in the Dockerfile of the operator here: https://github.com/cyril-corbon/druid-operator/blob/master/Dockerfile#L31
I also note this: "set user / group / fsuser or the operator crash in 1.26", but I cannot reproduce it anymore.

**@cyril-corbon** (Collaborator, Author) commented on Jul 12, 2023:

IMHO it's a good idea to set these values by default. Both containers of the pod have the same uid/gid, so it should not be an issue to set this at the pod level.

```console
$ docker history gcr.io/kubebuilder/kube-rbac-proxy:v0.13.1
IMAGE          CREATED        CREATED BY                                      SIZE      COMMENT
60cb20b77a0f   9 months ago   /bin/sh -c #(nop)  ENTRYPOINT ["/usr/local/b…   0B
<missing>      9 months ago   /bin/sh -c #(nop)  USER 65532:65532             0B
```

I can remove it if you want or if you think it's not useful 😄

**@AdheipSingh** (Contributor) replied:

Makes sense. It's only related to the kube-rbac-proxy container, right? Not the druid-operator?

**@cyril-corbon** (Collaborator, Author) replied:

It's related to both containers. It's not an issue as they share the same uid / gid.

securityContext:
  allowPrivilegeEscalation: false
4 changes: 2 additions & 2 deletions docs/dev_doc.md
@@ -1,7 +1,7 @@
## Dev Dependencies

- Golang 1.19+
- Kubebuilder 2.3.1+
- Golang 1.20+
- Kubebuilder v3

## Running Operator Locally

6 changes: 3 additions & 3 deletions docs/druid_cr.md
@@ -5,7 +5,6 @@
- For full details on spec refer to ```pkg/apis/druid/v1alpha1/druid_types.go```
- The operator supports both deployments and statefulsets for druid nodes. ```kind``` can be set in a druid NodeSpec to ```Deployment``` / ```StatefulSet```.
- ```NOTE: The default behavior shall provision all the nodes as statefulsets.```

- The following are cluster scoped and common to all the druid nodes.

```yaml
@@ -46,13 +45,13 @@ spec:
common.runtime.properties: |
```

- The following are specific to a node.

```yaml
nodes:
# String value, can be anything to define a node name.
brokers:
# nodeType can be broker, historical, middleManager, indexer, router, coordinator and overlord.
# Required Key
nodeType: "broker"
# Optionally specify for broker nodes
@@ -67,4 +66,5 @@ spec:
# Runtime Properties for the node
# Required Key
runtime.properties: |
...
```
34 changes: 24 additions & 10 deletions docs/features.md
@@ -14,58 +14,72 @@


## Deny List in Operator

- There may be use cases where we want the operator to watch all namespaces but exclude a few, for reasons of security, testing flexibility, etc.
- The druid operator supports such cases. In ```deploy/operator.yaml```, the user can enable the ```DENY_LIST``` env variable and pass the namespaces to be excluded.
- Each namespace is to be separated using a comma (a combined env example follows the next section).

## Reconcile Time in Operator

- As per the operator pattern, the druid operator reconciles every 10s (the default reconcile time) to make sure the desired state (the druid CR) is in sync with the current state.
- In case the user wants to adjust the reconcile time, it can be done by enabling the ```RECONCILE_WAIT``` env variable in ```deploy/operator.yaml``` and passing a value suffixed with ```s``` (example: 30s). The default time is 10s.
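
A minimal sketch of both operator settings as Deployment environment variables; the variable names match the chart's ```values.yaml``` shown earlier, while the list-form ```env``` block and the values themselves are illustrative assumptions:

```yaml
# Illustrative env block for the operator Deployment (deploy/operator.yaml).
env:
  - name: DENY_LIST          # namespaces the operator must ignore
    value: "default,kube-system"
  - name: RECONCILE_WAIT     # reconcile interval; default is 10s
    value: "30s"
```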

## Finalizer in Druid CR

- Druid Operator supports provisioning of sts as well as deployments. When an sts is created, a pvc is created along with it. When the druid CR is deleted, the sts controller does not delete the pvc's associated with the sts.
- In case the user cares about the pvc data and wishes to retain it, the user can enable ```DisablePVCDeletionFinalizer: true``` in the druid CR.
- The default behavior triggers finalizers and pre-delete hooks that first clean up the sts and then the pvc's referenced by the sts.
- In other words, by default, after deletion of the CR any pvc's provisioned by the sts are deleted. (This flag and the cluster-scoped flags in the next three sections are combined in a sketch after the Force Delete section.)

## Deletion of Orphan PVC's

- Assume ingestion is kicked off on druid and the sts MiddleManager nodes are scaled up to a certain number of replicas; when the ingestion is completed, the middlemanagers are scaled down to avoid costs, etc.
- On scale-down, the sts just terminates the pods it owns, not the pvc's. The pvc's are left orphaned and are of little or no use.
- In such cases druid-operator supports deletion of the pvc's orphaned by the sts.
- To enable this feature, users need to add the flag ```deleteOrphanPvc: true``` in the druid cluster spec.

## Rolling Deploy

- Operator supports ```rollingDeploy```: when set to ```true``` in the clusterSpec, the operator does incremental updates in the order described [here](http://druid.io/docs/latest/operations/rolling-updates.html).
- In a rolling deploy each node is updated one by one, and in case any of the nodes goes into a pending/crashing state during the update, the operator halts the update and does not update the other nodes. This requires manual intervention.
- By default, updates and cluster creation happen in parallel.
- Regardless of whether rolling deploy is enabled, cluster creation always happens in parallel.

## Force Delete of Sts Pods

- During an upgrade, if the sts is set to ordered ready, the sts controller will not recover from a crashloopbackoff state. The issue is referenced [here](https://github.com/kubernetes/kubernetes/issues/67250), and here's a reference [doc](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#forced-rollback).
- The operator solves this using the ```forceDeleteStsPodOnError``` key: it deletes the sts pod if it is in a crashloopbackoff state. Example scenario: during an upgrade, the user rolls out a faulty configuration causing the historical pod to go into a crashing state; the user then rolls out a valid configuration, but the new configuration will not be applied unless the pods are deleted manually. To solve this, the operator deletes the pod automatically without user intervention.
- ```NOTE: Users must be aware of this feature; there might be cases where crashloopbackoff is caused by a probe failure, faulty image, etc., and the operator will keep deleting the pod on each reconcile loop. Default behavior is true.```
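
As a convenience, here is a hedged sketch of a Druid CR combining the cluster-scoped flags from this and the three preceding sections; the cluster name is hypothetical, and the exact field casing is an assumption to verify against ```druid_types.go```:

```yaml
apiVersion: druid.apache.org/v1alpha1
kind: Druid
metadata:
  name: my-cluster                      # hypothetical cluster name
spec:
  disablePVCDeletionFinalizer: false    # keep default pvc cleanup on CR deletion
  deleteOrphanPvc: true                 # reclaim pvc's orphaned by sts scale-down
  rollingDeploy: true                   # update node types incrementally
  forceDeleteStsPodOnError: true        # recreate crashlooping sts pods on upgrade
```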

## Scaling of Druid Nodes

- Operator supports the ```autoscaling/v2``` HPA spec in the nodeSpec for druid nodes. When an HPA is deployed, the HPA controller maintains the replica count/state for the particular statefulset referenced. Refer to ```examples.md``` for HPA configuration.
- ```NOTE: Preferred to scale only brokers using HPA.```
- For scaling middlemanagers (MM), it is recommended not to use HPA. Refer to these discussions, which have addressed the issues in detail:

1. <https://github.com/apache/druid/issues/8801#issuecomment-664020630>
2. <https://github.com/apache/druid/issues/8801#issuecomment-664648399>
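
For illustration, an HPA spec embedded in a broker nodeSpec might look roughly like the sketch below; the ```hpAutoscaler``` key and the generated StatefulSet name are assumptions here, so treat ```examples.md``` as the authority:

```yaml
nodes:
  brokers:
    nodeType: "broker"
    druid.port: 8088
    replicas: 2
    hpAutoscaler:                        # assumed key; see examples.md
      minReplicas: 2
      maxReplicas: 8
      scaleTargetRef:
        apiVersion: apps/v1
        kind: StatefulSet
        name: druid-my-cluster-brokers   # hypothetical generated sts name
      metrics:
        - type: Resource
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 60
```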

## Volume Expansion of Druid Nodes Running As StatefulSets

```NOTE: This feature has been tested only on cloud environments and storage classes that support volume expansion. It uses the cascade=orphan strategy to make sure only the StatefulSet is deleted and recreated and the pods are not deleted.```

- Druid nodes, specifically historicals, run as statefulsets. Each statefulset replica has a pvc attached.
- The nodeSpec in the druid CR has a key ```volumeClaimTemplates``` where users can define the pvc's storage class as well as size.
- In case a user wants to increase the size in the node, the statefulset cannot be directly updated.
- Behind the scenes, the druid operator performs a seamless update of the statefulset and patches the pvc's with the desired size defined in the druid CR.
- The druid operator performs a cascade deletion of the sts and patches the pvc. Cascade deletion has no effect on the running pods; queries are served and no downtime is experienced.
- When this feature is enabled, the druid operator will check whether volume expansion is supported by the storage class mentioned in the druid CR, and only then perform the expansion.
- Shrinkage of pvc's isn't supported; **desiredSize cannot be less than currentSize, nor can the counts**.
- To enable this feature, ```scalePvcSts``` needs to be set to ```true```.
- By default, this feature is disabled.
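
A hedged sketch of a historical nodeSpec using this feature; the storage class name is hypothetical and must support volume expansion:

```yaml
spec:
  scalePvcSts: true                          # opt in to pvc expansion
  nodes:
    historicals:
      nodeType: "historical"
      druid.port: 8088
      replicas: 2
      volumeClaimTemplates:
        - metadata:
            name: data-volume
          spec:
            accessModes: ["ReadWriteOnce"]
            storageClassName: "expandable-ssd"   # hypothetical, must allow expansion
            resources:
              requests:
                storage: 60Gi                    # desiredSize; must not shrink
```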

## Add Additional Containers in Druid Nodes

- The Druid operator supports additional containers to run along with the druid services. This helps support co-located, co-managed helper processes for the primary druid application.
- This can be used for init containers, sidecars, proxies, etc.
- To enable this feature, users just need to add a new container to the container list, as sketched below.
- This is scoped at the cluster level only, which means the additional container will be common to all the druid nodes.
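
As an illustration only; the ```additionalContainer``` key, the sidecar name, and the image below are assumptions, so check the exact field shape against ```druid_types.go```:

```yaml
spec:
  additionalContainer:
    - containerName: "config-watcher"    # hypothetical sidecar
      image: "busybox:1.36"
      command: ["sh", "-c", "while true; do sleep 3600; done"]
```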
@@ -181,4 +195,4 @@ All the probe definitions are documented below:
timeoutSeconds: 10
```

</details>
22 changes: 16 additions & 6 deletions docs/getting_started.md
@@ -1,21 +1,27 @@
## Install the operator

```bash
# This will deploy kind to test the stack locally
make kind
# This will deploy the operator into the druid-operator-system namespace
make deploy
# Check the deployed druid-operator in the druid-operator-system namespace
kubectl describe deployment -n druid-operator-system druid-operator-controller-manager
```

The operator can be deployed with namespaced scope or cluster scope. By default, the operator is namespace-scoped.
For the operator to be cluster-scoped, make the following changes:

- Edit the `config/default/kustomization.yaml` so the `patchesStrategicMerge:` section looks like this:

```yaml
patchesStrategicMerge:
- manager_auth_proxy_patch.yaml
- manager_config_patch.yaml
```

- Edit the `config/default/manager_config_patch.yaml` to look like this:

```yaml
apiVersion: apps/v1
kind: Deployment
@@ -33,28 +39,33 @@ spec:
```

## Install the operator using Helm chart

- Install cluster scope operator into the `druid-operator-system` namespace:

```bash
# Install Druid operator using Helm
helm -n druid-operator-system install --create-namespace cluster-druid-operator ./chart
helm -n druid-operator-system upgrade -i --create-namespace cluster-druid-operator ./chart

# ... or generate manifest.yaml to install using other means:
helm -n druid-operator-system template --create-namespace cluster-druid-operator ./chart > manifest.yaml
```

- Install namespaced operator into the `druid-operator-system` namespace:

```bash
# Install Druid operator using Helm
helm -n druid-operator-system install --create-namespace --set env.WATCH_NAMESPACE="mynamespace" namespaced-druid-operator ./chart
kubectl create ns mynamespace
helm -n druid-operator-system upgrade -i --create-namespace --set env.WATCH_NAMESPACE="mynamespace" namespaced-druid-operator ./chart

# you can use myvalues.yaml instead of --set
helm -n druid-operator-system install --create-namespace -f myvalues.yaml namespaced-druid-operator ./chart
helm -n druid-operator-system upgrade -i --create-namespace -f myvalues.yaml namespaced-druid-operator ./chart

# ... or generate manifest.yaml to install using other means:
helm -n druid-operator-system template --set env.WATCH_NAMESPACE="" namespaced-druid-operator ./chart --create-namespace > manifest.yaml
```

- Update settings, upgrade or rollback:

```bash
# To upgrade chart or apply changes in myvalues.yaml
helm -n druid-operator-system upgrade -f myvalues.yaml namespaced-druid-operator ./chart
@@ -64,6 +75,7 @@ helm -n druid-operator-system rollback cluster-druid-operator
```

- Uninstall operator

```bash
# To avoid destroying existing clusters, helm will not uninstall its CRD. For
# complete cleanup annotation needs to be removed first:
@@ -89,8 +101,6 @@ Note that above tiny-cluster only works on a single node kubernetes cluster (e.g.

## Debugging Problems

- For kubernetes version 1.11 make sure to disable ```type: object``` in the CRD root spec.

```bash
# get druid-operator pod name
kubectl get po | grep druid-operator