Deploying Netskope Cloud Exchange using Helm

The Netskope Cloud Exchange (CE) provides customers with powerful integration tools to leverage investments across their security posture.

Cloud Exchange consumes valuable Netskope telemetry and external threat intelligence and risk scores, enabling improved policy implementation, automated service ticket creation, and exportation of log events from the Netskope Security Cloud.

To learn more about Netskope Cloud Exchange, please refer to the Netskope Cloud Exchange introduction page.

Table of contents

  • Prerequisites
  • Deploying the Netskope CE Helm Chart
  • Deleting the Netskope CE Helm Chart
  • Configurations
  • Override the Default Values
  • Updating Deployment
  • Accessing Netskope CE
  • Using Persistent Volumes
  • Package Sizing Matrix (Horizontally-scaled Approach)
  • Deploy with Vertically-scaled Approach
  • Package Sizing Matrix (Vertically-scaled Approach)
  • Comparison of Vertical-scaling v/s Horizontal-scaling Compute Requirements
  • Testing Matrix
  • Migrating CE v4.2.0 to CE v5.0.0
  • Restoring MongoDB Data
  • Troubleshooting

Prerequisites

The following prerequisites are required to deploy Netskope Cloud Exchange using Helm.

  • A Kubernetes cluster (EKS, OpenShift, etc.) on which to deploy Netskope CE.
  • kubectl must be installed on your machine.
  • helm must be installed on your machine.
  • The target namespace should be created before deploying the Helm chart (a quick sketch follows this list).
  • Persistent Volume (PV) provisioner support in the underlying infrastructure. Note: at least two PVs with the ReadWriteMany access mode must be present, and only shared volumes are supported, because the core and worker pods require shared volumes to support horizontal scaling in CE v5.0.0.
  • Refer to the Package Sizing Matrix section before proceeding with the deployment.
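
A quick sketch (the namespace name netskope-ce is only a placeholder) for creating the namespace and checking the available persistent volumes:

kubectl create namespace netskope-ce
kubectl get pv    # verify at least two PVs with the ReadWriteMany access mode are available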

Deploying the Netskope CE Helm Chart

FYI: A release is an instance of a chart running in a Kubernetes cluster. One chart can often be installed many times into the same cluster, and each time it is installed, a new release is created. The release name should contain only lowercase letters, numbers, and hyphens.

Before installing the product Helm chart, we have to deploy the Kubernetes operators for MongoDB and RabbitMQ.

Note: If you are deploying the Helm chart on OpenShift, you must grant privileged access to some service accounts before deploying the Helm chart: mongodb-database, mongodb-kubernetes-operator, netskope-ce-rabbitmqcluster-server, rabbitmq-operator-rabbitmq-cluster-operator, rabbitmq-operator-rabbitmq-messaging-topology-operator, and the service account that you are providing (if you are not providing a service account, grant privileged access to the default netskope-ce-serviceaccount that the chart creates). Skip this step if you are not on OpenShift. To grant privileged access to the above service accounts, run the command below.

oc adm policy add-scc-to-user privileged system:serviceaccount:<namespace-name>:<service-account-name>
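
The same command must be run for each of the service accounts listed above; as a hedged convenience sketch (assuming the default netskope-ce-serviceaccount is used), a simple shell loop could be:

# Grant the privileged SCC to each service account used by the chart and the operators
for sa in mongodb-database mongodb-kubernetes-operator netskope-ce-rabbitmqcluster-server rabbitmq-operator-rabbitmq-cluster-operator rabbitmq-operator-rabbitmq-messaging-topology-operator netskope-ce-serviceaccount; do
  oc adm policy add-scc-to-user privileged system:serviceaccount:<namespace-name>:$sa
done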

To install MongoDB Community Operator:

helm repo add mongodb https://mongodb.github.io/helm-charts 
helm install community-operator mongodb/community-operator -n <namespace-name>

To install RabbitMQ Cluster Kubernetes Operator:

helm repo add bitnami https://charts.bitnami.com/bitnami
helm install rabbitmq-operator bitnami/rabbitmq-cluster-operator -n <namespace-name> --set msgTopologyOperator.replicaCount=0

To install the chart:

$ helm install <release-name> . -n <namespace-name>

For example, installing the chart with release name my-release:

helm install my-release . -n <namespace-name>

The above command deploys Netskope Cloud Exchange. The Configurations section lists the parameters that can be configured during installation.

Tip: List all releases using helm list
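
As a hedged follow-up (release and namespace names are placeholders), you can confirm the release deployed cleanly and that its pods are running before moving on:

helm status <release-name> -n <namespace-name>
kubectl get pods -n <namespace-name>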


Deleting the Netskope CE Helm Chart

To uninstall/delete the deployment:

helm delete <release-name> -n <namespace-name>

For example, uninstalling the chart with release name my-release:

helm uninstall my-release -n <namespace-name>

The command removes all the Kubernetes components associated with the chart and deletes the release.


Configurations

Common Configurations (these configurations are applied to all pods)

| Name | Description | Default Value | Required |
|------|-------------|---------------|----------|
| commonLabels | Add labels to all the deployed resources (sub-charts are not considered). Evaluated as a template | {} | No |
| commonAnnotations | Common annotations to add to all resources. Evaluated as a template | {} | No |
| namespace | Namespace in which all resources run | "" | No |
| serviceAccount.create | Enable creation of a ServiceAccount for all pods | true | No |
| serviceAccount.name | Name of the created ServiceAccount | "" | No |
| serviceAccount.annotations | Additional ServiceAccount annotations | {} | No |
| serviceAccount.automountServiceAccountToken | Allow auto-mounting of the ServiceAccount token on the created ServiceAccount | false | No |
| privateImageRegistry.imagePullSecrets | If your image registry is private, you must pass imagePullSecrets; secrets must be manually created in the namespace | [] | No |
| updateStrategy | Strategy to use to replace existing pods | type: RollingUpdate | No |

MongoDB Configurations

| Name | Description | Default Value | Required |
|------|-------------|---------------|----------|
| mongodb.labels | Additional labels to be added to the MongoDB StatefulSet | {} | No |
| mongodb.annotations | Additional annotations to be added to the MongoDB StatefulSet | {} | No |
| mongodb.image | Docker image of the MongoDB StatefulSet | index.docker.io/mongo:5.0.21 | No |
| mongodb.initContainers.volumePermissionContainer.create | Creates an init container used to change the mount volume permissions and ownership | false | No |
| mongodb.initContainers.image | Init container image | busybox:latest | No |
| mongodb.resources | Resource requests and limits for MongoDB (see the note below this table) | limits: memory: 1Gi, cpu: 2000m<br>requests: memory: 500Mi, cpu: 1000m | No |
| mongodb.replicaCount | Number of MongoDB replicas | 3 | No |
| mongodb.securityContext.privileged | Privileged containers can allow almost completely unrestricted host access | false | No |
| mongodb.securityContext.allowPrivilegeEscalation | Enable privilege escalation; it should be true if privileged is set to true | false | No |
| mongodb.persistence.size | PVC storage request for the MongoDB data volume | 3Gi | No |
| mongodb.persistence.storageClassName | PVC storage class for the MongoDB data volume | manual | No |
| mongodb.persistence.annotations | PVC annotations | {} | No |
| mongodb.auth.replicaSetKey | Key used for authentication in the replica set | "" | Yes |
| mongodb.auth.rootUser | MongoDB root username | "" | Yes |
| mongodb.auth.rootPassword | MongoDB root password | "" | Yes |
| mongodb.auth.cteAdminUser | MongoDB cteAdmin username | "" | Yes |
| mongodb.auth.cteAdminPassword | MongoDB cteAdmin password | "" | Yes |
| mongodb.secrets.root.create | Enable creation of the MongoDB root secret | true | No |
| mongodb.secrets.root.name | Name of the MongoDB root secret | "netskope-ce-mongodb-root-secret" | No |
| mongodb.secrets.cte.create | Enable creation of the MongoDB cte secret | true | No |
| mongodb.secrets.cte.name | Name of the MongoDB cte secret | "netskope-ce-mongodb-cre-secret" | No |

Note: The default resource values above target a low data volume (Extra Small Netskope CE package type). You may need to change these values per your use case and data volume (based on the associated Netskope CE package type); when doing so, ensure the underlying cluster nodes have sufficient cumulative compute power for the change to work seamlessly. For more details on the Netskope CE package types, refer to the Package Sizing Matrix section.

RabbitMQ Configurations

| Name | Description | Default Value | Required |
|------|-------------|---------------|----------|
| rabbitmq.labels | Additional labels to be added to the RabbitMQ StatefulSet | {} | No |
| rabbitmq.annotations | Additional annotations to be added to the RabbitMQ StatefulSet | {} | No |
| rabbitmq.initContainers.image | Docker image of init containers | "busybox:latest" | No |
| rabbitmq.initContainers.volumePermissionContainer.create | Creates an init container used to change the mount volume permissions and ownership | false | No |
| rabbitmq.image | Docker image of the RabbitMQ StatefulSet | index.docker.io/rabbitmq:3.12.6-management | No |
| rabbitmq.replicaCount | Number of RabbitMQ replicas | 3 | No |
| rabbitmq.resources | Resource requests and limits for RabbitMQ (see the note below this table) | limits: memory: 3Gi, cpu: 1000m<br>requests: memory: 1500Mi, cpu: 500m | No |
| rabbitmq.securityContext.privileged | Privileged containers can allow almost completely unrestricted host access | false | No |
| rabbitmq.securityContext.allowPrivilegeEscalation | Enable privilege escalation; it should be true if privileged is set to true | false | No |
| rabbitmq.persistence.size | PVC storage request for the RabbitMQ data volume | 3Gi | No |
| rabbitmq.persistence.storageClassName | PVC storage class for the RabbitMQ data volume | manual | No |
| rabbitmq.persistence.annotations | PVC annotations | {} | No |
| rabbitmq.auth.rabbitmqDefaultUser | RabbitMQ default user | "" | Yes |
| rabbitmq.auth.rabbitmqPassword | RabbitMQ password | "" | Yes |
| rabbitmq.secrets.create | Enable creation of the RabbitMQ secret | true | No |
| rabbitmq.secrets.name | Name of the RabbitMQ secret | "netskope-ce-rabbitmq-secret" | No |

Note: The default resource values above target a low data volume (Extra Small Netskope CE package type). You may need to change these values per your use case and data volume (based on the associated Netskope CE package type); when doing so, ensure the underlying cluster nodes have sufficient cumulative compute power for the change to work seamlessly. For more details on the Netskope CE package types, refer to the Package Sizing Matrix section.

Core Configurations

| Name | Description | Default Value | Required |
|------|-------------|---------------|----------|
| core.labels | Additional labels to be added to the Core deployment | {} | No |
| core.annotations | Additional annotations to be added to the Core deployment | {} | No |
| core.initContainers.volumePermissionContainer.create | Creates an init container used to change the mount volume permissions and ownership | false | No |
| core.rbac.create | Whether to create and use RBAC resources, binding the ServiceAccount to a role | true | No |
| core.rbac.rules | Custom rules to create, following the role specification | [] | No |
| core.image | Docker image of Core | netskopetechnicalalliances/cloudexchange:core5-latest | No |
| core.replicaCount.core | Number of Core replicas | 1 | No |
| core.replicaCount.worker | Number of Worker replicas | 2 | No |
| core.proxy.enable | Enable proxy in Core | false | No |
| core.proxy.url | Proxy URL | "" | If core.proxy.enable: true |
| core.resources.core | Resource requests and limits for Core (see the note below this table) | limits: memory: 2Gi, cpu: 1000m<br>requests: memory: 1Gi, cpu: 500m | No |
| core.resources.worker | Resource requests and limits for Worker (see the note below this table) | limits: memory: 3500Mi, cpu: 2000m<br>requests: memory: 1500Mi, cpu: 1000m | No |
| core.securityContext.privileged | Privileged containers can allow almost completely unrestricted host access | false | No |
| core.securityContext.allowPrivilegeEscalation | Enable privilege escalation; it should be true if privileged is set to true | false | No |
| core.persistence.size | PVC storage request for the Core data volume | 3Gi | No |
| core.persistence.storageClassName | PVC storage class for the Core data volume | manual | No |
| core.persistence.annotations | PVC annotations | {} | No |
| core.caCertificate | Enable the private CA certificate | false | No |
| core.auth.analyticsToken | Analytics token | "" | Yes |
| core.auth.jwtToken | JWT token | "" | Yes |
| core.secrets.core.create | Enable creation of the Core secret | true | No |
| core.secrets.core.name | Name of the Core secret | "netskope-ce-core-secret" | No |
| core.secrets.caCertificate.create | Enable creation of the CA certificate secret | true | No |
| core.secrets.caCertificate.name | Name of the CA certificate secret | "netskope-ce-ca-certificate-secret" | No |
| core.workerConcurrency | Worker concurrency for horizontal scaling | 3 | No |
| core.rabbitmqAvailableStorage | RabbitMQ available storage | 40 | No |

Note: The default resource values above target a low data volume (Extra Small Netskope CE package type). You may need to change these values per your use case and data volume (based on the associated Netskope CE package type); when doing so, ensure the underlying cluster nodes have sufficient cumulative compute power for the change to work seamlessly. For more details on the Netskope CE package types, refer to the Package Sizing Matrix section.

Note: If the core.caCertificate attribute is enabled (default: false), the CA certificate must be present in the ca-certificates directory with the file name ca.pem.
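
A minimal sketch, assuming your CA certificate already exists at a placeholder path, for placing the file and enabling the option at install time:

mkdir -p ca-certificates
cp /path/to/your-ca.pem ca-certificates/ca.pem
helm install <release-name> . -n <namespace-name> --set core.caCertificate=true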

UI Configurations

| Name | Description | Default Value | Required |
|------|-------------|---------------|----------|
| ui.labels | Additional labels to be added to the UI deployment | {} | No |
| ui.annotations | Additional annotations to be added to the UI deployment | {} | No |
| ui.rbac.create | Whether to create and use RBAC resources, binding the ServiceAccount to a role | true | No |
| ui.rbac.rules | Custom rules to create, following the role specification | [] | No |
| ui.image | Docker image of UI | netskopetechnicalalliances/cloudexchange:ui5-latest | No |
| ui.replicaCount | Number of UI replicas | 2 | No |
| ui.ssl | Enable SSL certificates | false | No |
| ui.resources | Resource requests and limits for UI (see the note below this table) | limits: memory: 250Mi, cpu: 750m<br>requests: memory: 125Mi, cpu: 325m | No |
| ui.securityContext.privileged | Privileged containers can allow almost completely unrestricted host access | false | No |
| ui.securityContext.allowPrivilegeEscalation | Enable privilege escalation; it should be true if privileged is set to true | false | No |
| ui.secrets.create | Enable creation of the UI secret | true | No |
| ui.secrets.name | Name of the UI secret | "netskope-ce-ui-secret" | No |

Note: The default resource values above target a low data volume (Extra Small Netskope CE package type). You may need to change these values per your use case and data volume (based on the associated Netskope CE package type); when doing so, ensure the underlying cluster nodes have sufficient cumulative compute power for the change to work seamlessly. For more details on the Netskope CE package types, refer to the Package Sizing Matrix section.

Note: If you enable ui.ssl (default: false), your SSL certificate and certificate private key (named cte_cert.key and cte_cert_key.key respectively) must be present in the certificates directory at the root of the chart.
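
A minimal sketch, assuming your certificate and private key already exist at placeholder paths, for placing the files and enabling SSL at install time:

mkdir -p certificates
cp /path/to/your-certificate.crt certificates/cte_cert.key
cp /path/to/your-private.key certificates/cte_cert_key.key
helm install <release-name> . -n <namespace-name> --set ui.ssl=true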


Override the Default Values

Specify each parameter using the --set key=value[,key=value] argument to helm install. For example,

$ helm install my-release . --set mongodb.auth.rootPassword=secretpassword 

The above command sets the MongoDB root account password to secretpassword.

NOTE: Once this chart is deployed, it is not possible to change the application's access credentials, such as usernames or passwords, using Helm. To change these application credentials after deployment, delete any persistent volumes (PVs) used by the chart and re-deploy it, or use the application's built-in administrative tools if available. Alternatively, instead of using --set, you can create a values-override.yaml file that specifies the parameter values and provide it while installing the chart. For example,

$ helm install my-release -f sample-values-override.yaml . -n <namespace-name>

Tip: You can refer to the default values.yaml to get a list of all the Helm chart parameters that can be overridden in the override file (refer to sample-values-override-hs.yaml, or sample-values-override-openshift-hs.yaml for the OpenShift deployment).
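
As an illustrative sketch only (all values are placeholders, not recommendations), a minimal override file could set just the required credentials and tokens; any key not listed keeps its default from values.yaml:

# values-override.yaml (illustrative placeholders only)
mongodb:
  auth:
    replicaSetKey: <replica-set-key>
    rootUser: <root-user>
    rootPassword: <root-password>
    cteAdminUser: <cte-admin-user>
    cteAdminPassword: <cte-admin-password>
rabbitmq:
  auth:
    rabbitmqDefaultUser: <rabbitmq-user>
    rabbitmqPassword: <rabbitmq-password>
core:
  auth:
    analyticsToken: <analytics-token>
    jwtToken: <jwt-token>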


Updating Deployment

To override, update the sample-values-override.yaml file with the required values and execute the below command.

$ helm upgrade my-release -f sample-values-override.yaml . -n <namespace-name>

Note: Additional values may need to be overridden by the end user based on their use case. For that, please add the respective configurations to the sample override file before running the above command. The sample override file is provided only as a basic reference.


Accessing Netskope CE

To access Netskope CE using port forward, run the below command.

kubectl port-forward service/<ui-service-name> 8080:80 -n <namespace-name>

Tip: To get the UI service name, run kubectl get svc -n <namespace-name>.

Now, open any browser and enter the URL below in the address bar.

https://localhost:8080/login


Using Persistent Volumes

There are different types of persistent volumes that can be attached to a Kubernetes deployment. In the below section, we demonstrate the use case of persistent volumes by referencing Amazon Elastic Kubernetes Service (EKS) and Amazon Elastic File System (EFS). Please refer to the prerequisites section before jumping into the detailed steps below.

Prerequisites

An AWS IAM Role with required permissions should be [created and configured](https://docs.aws.amazon.com/eks/latest/userguide/efs-csi.html#efs-create-iam-resources) in your Kubernetes cluster Service Account deployment YAML file to allow the Amazon EFS driver to interact with your file system.

Step 1

Create Amazon EFS in AWS and get its file system ID.

Step 2

Install AWS EFS CSI Driver in the Kubernetes cluster (in this case Amazon EKS cluster). To install the driver, follow the below documentation.

https://github.com/kubernetes-sigs/aws-efs-csi-driver

Note: Based on the current latest version of the AWS EFS CSI Driver (v1.7.0), the Kubernetes version should be >=v1.17. The compatible versions of Kubernetes and the AWS EFS CSI Driver can be identified from the above link at any point in time.

Step 3

Create a StorageClass using a Kubernetes YAML manifest as shown below. In that StorageClass, directoryPerms should be 700, and gid and uid should be 1001.

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: <sc-name>
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap
  fileSystemId: <fs-id>
  directoryPerms: "700"
  gid: "1001"
  uid: "1001"

Step 4

Pass the StorageClass name (sc-name) in the values-override.yaml file, as shown below, in the MongoDB, RabbitMQ, and Core sections.

mongodb:
  persistence:
    storageClassName: sc-name
rabbitmq:
  persistence:
    storageClassName: sc-name
core:
  persistence:
    storageClassName: sc-name

Step 5

Install the Helm chart by following the steps mentioned in the Deploying the Netskope CE Helm Chart section above.


Package Sizing Matrix (Horizontally-scaled Approach)

This section depicts the required CPU and memory for containers based on the Netskope Cloud Exchange package types, depending on the use case, for a horizontally-scaled approach.

| Package Type | # of Core Containers | Core Container Resources | # of Worker Containers | Worker Container Resources | # of UI Containers | UI Container Resources | # of MongoDB Containers | MongoDB Container Resources | # of RabbitMQ Containers | RabbitMQ Container Resources | Worker Concurrency | RabbitMQ Available Storage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Extra Small | 1 | limits: memory: 2Gi, cpu: 1000m<br>requests: memory: 1Gi, cpu: 500m | 2 | limits: memory: 3500Mi, cpu: 2000m<br>requests: memory: 1500Mi, cpu: 1000m | 2 | limits: memory: 250Mi, cpu: 750m<br>requests: memory: 125Mi, cpu: 325m | 3 | limits: memory: 1Gi, cpu: 2000m<br>requests: memory: 500Mi, cpu: 1000m | 3 | limits: memory: 3Gi, cpu: 1000m<br>requests: memory: 1500Mi, cpu: 500m | 3 | 42949672960 |
| Small | 1 | limits: memory: 2Gi, cpu: 1000m<br>requests: memory: 1Gi, cpu: 500m | 3 | limits: memory: 3500Mi, cpu: 2000m<br>requests: memory: 1500Mi, cpu: 1000m | 2 | limits: memory: 250Mi, cpu: 750m<br>requests: memory: 125Mi, cpu: 325m | 3 | limits: memory: 1Gi, cpu: 2000m<br>requests: memory: 500Mi, cpu: 1000m | 3 | limits: memory: 3Gi, cpu: 1000m<br>requests: memory: 1500Mi, cpu: 500m | 3 | 42949672960 |
| Medium | 1 | limits: memory: 2Gi, cpu: 1000m<br>requests: memory: 1Gi, cpu: 500m | 4 | limits: memory: 3500Mi, cpu: 2000m<br>requests: memory: 1500Mi, cpu: 1000m | 2 | limits: memory: 250Mi, cpu: 750m<br>requests: memory: 125Mi, cpu: 325m | 3 | limits: memory: 1Gi, cpu: 2000m<br>requests: memory: 500Mi, cpu: 1000m | 3 | limits: memory: 3Gi, cpu: 1000m<br>requests: memory: 1500Mi, cpu: 500m | 3 | 85899345920 |
| Large | 1 | limits: memory: 2Gi, cpu: 1000m<br>requests: memory: 1Gi, cpu: 500m | 5 | limits: memory: 3500Mi, cpu: 2000m<br>requests: memory: 1500Mi, cpu: 1000m | 2 | limits: memory: 250Mi, cpu: 750m<br>requests: memory: 125Mi, cpu: 325m | 3 | limits: memory: 1Gi, cpu: 2000m<br>requests: memory: 500Mi, cpu: 1000m | 3 | limits: memory: 3Gi, cpu: 1000m<br>requests: memory: 1500Mi, cpu: 500m | 3 | 128849018880 |
| Extra Large | 1 | limits: memory: 2Gi, cpu: 1000m<br>requests: memory: 1Gi, cpu: 500m | 8 | limits: memory: 3500Mi, cpu: 2000m<br>requests: memory: 1500Mi, cpu: 1000m | 2 | limits: memory: 250Mi, cpu: 750m<br>requests: memory: 125Mi, cpu: 325m | 3 | limits: memory: 1Gi, cpu: 2000m<br>requests: memory: 500Mi, cpu: 1000m | 3 | limits: memory: 3Gi, cpu: 1000m<br>requests: memory: 1500Mi, cpu: 500m | 3 | 171798691840 |

Please take a look at the sample-values-override-hs file that shows how to override the default values. To deploy the Helm Chart with the override file, refer to the section Override the Default Values.


Deploy with Vertically-scaled Approach

To deploy Netskope CE using a vertically-scaled approach, where the number of core containers is set to 1 and the number of worker containers is set to 0, follow the configurations in the below section Package Sizing Matrix (Vertically-scaled Approach).

Set the number of core containers

  • Specify the core.replicaCount.core value as 1 in the configuration
  • Ensure that the deployment only includes a single core container to handle the workload

Disable the worker containers

  • Set the core.replicaCount.worker value as 0 in the configuration
  • This ensures that no worker containers are deployed in the environment

By applying these specific configurations, you can deploy Netskope CE with a vertically-scaled approach, optimizing the deployment for a single core container without any worker containers.
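
As an illustrative override sketch for the vertically-scaled approach (the concurrency and resource values shown follow the Extra Small row of the matrix below, and the nesting assumes the core.* parameters listed in the Core Configurations table):

core:
  replicaCount:
    core: 1
    worker: 0
  workerConcurrency: 6
  resources:
    core:
      limits:
        memory: 4Gi
        cpu: 4000m
      requests:
        memory: 2Gi
        cpu: 2000m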

Tip: You can refer to the default values.yaml to get a list of all the Helm chart parameters that can be overridden in the override file (refer to sample-values-override-vs.yaml, or sample-values-override-openshift-vs.yaml for the OpenShift deployment).


Package Sizing Matrix (Vertically-scaled Approach)

This section depicts the required CPU and memory for containers based on the Netskope Cloud Exchange package types, depending on the use case, for a vertically-scaled approach.

| Package Type | # of Core Containers | Core Container Resources | # of Worker Containers | # of UI Containers | UI Container Resources | # of MongoDB Containers | MongoDB Container Resources | # of RabbitMQ Containers | RabbitMQ Container Resources | Worker Concurrency | RabbitMQ Available Storage |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Extra Small | 1 | limits: memory: 4Gi, cpu: 4000m<br>requests: memory: 2Gi, cpu: 2000m | 0 | 2 | limits: memory: 250Mi, cpu: 750m<br>requests: memory: 125Mi, cpu: 325m | 3 | limits: memory: 1Gi, cpu: 2000m<br>requests: memory: 500Mi, cpu: 1000m | 3 | limits: memory: 3Gi, cpu: 1000m<br>requests: memory: 1500Mi, cpu: 500m | 6 | 42949672960 |
| Small | 1 | limits: memory: 8Gi, cpu: 6000m<br>requests: memory: 2Gi, cpu: 3000m | 0 | 2 | limits: memory: 250Mi, cpu: 750m<br>requests: memory: 125Mi, cpu: 325m | 3 | limits: memory: 1Gi, cpu: 2000m<br>requests: memory: 500Mi, cpu: 1000m | 3 | limits: memory: 3Gi, cpu: 1000m<br>requests: memory: 1500Mi, cpu: 500m | 9 | 42949672960 |
| Medium | 1 | limits: memory: 16Gi, cpu: 8000m<br>requests: memory: 4Gi, cpu: 4000m | 0 | 2 | limits: memory: 250Mi, cpu: 750m<br>requests: memory: 125Mi, cpu: 325m | 3 | limits: memory: 1Gi, cpu: 2000m<br>requests: memory: 500Mi, cpu: 1000m | 3 | limits: memory: 3Gi, cpu: 1000m<br>requests: memory: 1500Mi, cpu: 500m | 12 | 85899345920 |
| Large | 1 | limits: memory: 24Gi, cpu: 12000m<br>requests: memory: 8Gi, cpu: 4000m | 0 | 2 | limits: memory: 250Mi, cpu: 750m<br>requests: memory: 125Mi, cpu: 325m | 3 | limits: memory: 1Gi, cpu: 2000m<br>requests: memory: 500Mi, cpu: 1000m | 3 | limits: memory: 3Gi, cpu: 1000m<br>requests: memory: 1500Mi, cpu: 500m | 18 | 128849018880 |
| Extra Large | 1 | limits: memory: 48Gi, cpu: 24000m<br>requests: memory: 8Gi, cpu: 4000m | 0 | 2 | limits: memory: 250Mi, cpu: 750m<br>requests: memory: 125Mi, cpu: 325m | 3 | limits: memory: 1Gi, cpu: 2000m<br>requests: memory: 500Mi, cpu: 1000m | 3 | limits: memory: 3Gi, cpu: 1000m<br>requests: memory: 1500Mi, cpu: 500m | 36 | 171798691840 |

Comparison of Vertical-scaling v/s Horizontal-scaling Compute Requirements

This section depicts the comparison of the underlying (node) compute power required in both approaches to CE deployment (vertically-scaled v/s horizontally-scaled). Here, we have considered the High Availability (HA) aspects while designing the cluster and the Helm configurations.

| Package Type | Vertical-scaling Benchmarking | Horizontal-scaling Benchmarking |
|---|---|---|
| Extra Small | 3 nodes, 8 cores and 16 GB memory each (total 24 cores and 48 GB memory) | 6 nodes, 4 cores and 8 GB memory each (total 24 cores and 48 GB memory) |
| Small | 3 nodes, 8 cores and 16 GB memory each (total 24 cores and 48 GB memory) | 7 nodes, 4 cores and 8 GB memory each (total 28 cores and 56 GB memory) |
| Medium | 3 nodes, 16 cores and 32 GB memory each (total 48 cores and 96 GB memory) | 8 nodes, 4 cores and 8 GB memory each (total 32 cores and 64 GB memory) |
| Large | 3 nodes, 32 cores and 64 GB memory each (total 96 cores and 192 GB memory) | 9 nodes, 4 cores and 8 GB memory each (total 36 cores and 72 GB memory) |
| Extra Large | 3 nodes, 32 cores and 64 GB memory each (total 96 cores and 192 GB memory) | 10 nodes, 4 cores and 8 GB memory each (total 40 cores and 80 GB memory) |

Testing Matrix

This section depicts the container orchestration platforms and CE version on which we have tested the current Helm chart.

| Vendor | Container Orchestration Platform | Host OS | Component Versions | Component Configurations |
|---|---|---|---|---|
| Kubernetes | Kubernetes | RHEL | RHEL 8 | 8 nodes, 4-core CPU and 8 GB memory each, Medium package, NFS as a Persistent Volume |
| Kubernetes | Kubernetes | CentOS | CentOS 7 | 8 nodes, 4-core CPU and 8 GB memory each, Medium package, NFS as a Persistent Volume |
| Kubernetes | Kubernetes | Ubuntu | Ubuntu 20.04 | 8 nodes, 4-core CPU and 8 GB memory each, Medium package, NFS as a Persistent Volume |
| OpenShift | OpenShift | RHEL | OpenShift 4.11.39 | 8 nodes, 4-core CPU and 8 GB memory each, Medium package, NFS as a Persistent Volume |

| Container Name | Image Tag | Version |
|---|---|---|
| Core | netskopetechnicalalliances/cloudexchange:core5-latest | 5.0.0 |
| UI | netskopetechnicalalliances/cloudexchange:ui5-latest | 5.0.0 |
| MongoDB | index.docker.io/mongo:5.0.21 | 5.0.21 |
| RabbitMQ | index.docker.io/rabbitmq:3.12.6-management | 3.12.6-management |

Migrating CE v4.2.0 to CE v5.0.0

Notes

  • Please be aware that during the migration process, there will be no data loss for the custom plugins that have been uploaded. Therefore, there is no need to re-upload those custom plugins, and you do not need to back up and retain any critical custom plugins before the migration.

  • Modify Configuration: Make the necessary configuration changes, i.e., get the latest Helm chart for CE v5.0.0.

  • Update to CE v5.0.0: Follow the appropriate update procedure to migrate to CE v5.0.0.

  • Verify Migration: Conduct thorough testing to ensure the migration was successful and all functionalities are intact.

  • Complete Post-Migration Tasks: Inform stakeholders about the completion of the migration and perform any specific post-migration steps based on the end user's use case.

By following these steps, you can migrate to CE v5.0.0 while minimizing disruptions and ensuring a smooth transition for your system.
  • Download the Helm chart for CE v5.0.0 and make any necessary modifications to the chart's values or configuration files if required. Once the changes have been done, deploy the updated Helm chart to implement the desired changes in your CE environment.

    helm upgrade <release-name> . -n <namespace> -f <values-override-file>
    
  • Retrieve the MongoDB StatefulSet pod names.

    kubectl get pods -n <namespace> 
    
  • Using the MongoDB StatefulSet pod names retrieved in the previous step, delete the MongoDB StatefulSet pods in reverse order (wait until a deleted pod restarts with the new image and becomes healthy before deleting the next pod) using the following command.

    kubectl delete pod <mongodb-statefulset-pod-name> -n <namespace>
    
  • After the successful migration to v5.0.0, ensure that the MongoDB and RabbitMQ StatefulSets have completed the rolling update, resulting in the creation of new containers for the Core and Worker components (see the verification sketch below). Perform a series of sanity tests to verify the functionality and stability of the migrated system. Additionally, check the CE version from the CE user interface to confirm the successful migration and ensure that the updated version v5.0.0 is reflected.
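
    As a hedged verification sketch (the StatefulSet names are placeholders), confirm the rolling updates have completed and inspect which image tags the pods are running:

    kubectl rollout status statefulset/<mongodb-statefulset-name> -n <namespace>
    kubectl rollout status statefulset/<rabbitmq-statefulset-name> -n <namespace>
    kubectl get pods -n <namespace> -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].image}{"\n"}{end}'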

Restoring MongoDB Data

To restore existing MongoDB data in your newly deployed Netskope CE stack or an upgraded Netskope CE stack, follow the below steps.

Prerequisites

  • Prior to the restore, it is essential to stop data ingestion to maintain data integrity.
  • As a part of the restore process, it is necessary to stop the Core and Worker containers to ensure a smooth restore.
  • Before proceeding with the restore process, it is crucial to verify that you have successfully created a comprehensive MongoDB dump containing all the necessary data. This backup ensures that you have a reliable and complete snapshot of your MongoDB database to restore from in the event of data loss or corruption.

Notes

  • Please be aware that during the restore process, there will be data loss for the custom plugins that were uploaded. This is due to the removal of the persistent volume claim for the Core and Worker containers. Therefore, it will be necessary to re-upload those custom plugins after completing the restore to ensure their availability in the updated system. It is advisable to back up and retain any critical custom plugins prior to the restore process to mitigate any potential loss of data.
  • Plan for Downtime: Allocate a maintenance window and inform stakeholders to minimize disruptions.
  • Notify Stakeholders: Communicate the restore process schedule, expected downtime, and potential impact to stakeholders.
  • Stop Data Ingestion: Gracefully halt the data ingestion process to prevent data loss or inconsistencies.
  • Stop Core and Worker Deployments: Properly shut down the core and worker deployments.
  • Verify Data Restore: Conduct thorough testing to ensure the restore was successful and all functionalities are intact.
  • Complete Post-Restore Tasks: Inform stakeholders about the completion of the restore and perform specific post restore steps if any based on the end-users use-cases.
By following these steps, one can minimize disruptions and ensure a smooth restore for their system.
  • Retrieve the Core and Worker deployment names.
    kubectl get deployment -n <namespace> 
    

Note: As a prerequisite, it is essential to halt data ingestion before proceeding with the deletion of the Core and Worker deployments.

  • Using the deployment names retrieved in the previous step, delete the Core and Worker deployments using the following command.
    kubectl delete deployment <core-deployment-name> <worker-deployment-name> -n <namespace>
    
  • Retrieve the custom plugin PVC name.
    kubectl get pvc -n <namespace>
    
  • After confirming the successful deletion of the Core and Worker containers, proceed to delete the Persistent Volume Claim (PVC) using the appropriate command or method.
    kubectl delete pvc <custom-plugin-pvc-name> -n <namespace>
    
  • To deploy a temporary container and make the MongoDB backup available in Kubernetes, follow these steps:
    • Prepare the Container Specification:
      • Create a Kubernetes Deployment or Pod definition file (e.g., deployment.yaml) with the necessary specifications.
      • Specify the container image you want to use, including all required software and dependencies.
      • A sample pod spec is provided below; change the appropriate values by replacing the placeholders in "<>".
        apiVersion: v1
        kind: Pod
        metadata:
          name: restore-mongodb
          namespace: <namespace-name>
        spec: 
          containers:
            - env:
                - name: MONGO_CONNECTION_STRING
                  value: <mongo-db-connection-string>
              image: <image>
              command:
                - sh
              args:
                - -ec
                - |
                  sleep 3000;
              imagePullPolicy: IfNotPresent
              name: restore-mongodb
              volumeMounts:
                - name: restore-mongodb
                  mountPath: /data
          volumes:
            - name: restore-mongodb
              persistentVolumeClaim:
                claimName: restore-mongodb-pvc
        
      • To apply the pod spec file and deploy the pod in Kubernetes, use the following command:
        kubectl apply -f <spec-file.yaml>
        
    • Open a shell in the newly created pod using the below command.
      kubectl exec -it <pod-name> -- /bin/bash
      
      Replace <pod-name> with the actual name of the pod you want to access. This command will open an interactive shell session within the specified pod, allowing you to execute commands and access the container's filesystem.
    • To restore MongoDB data from within the newly created pod, execute the following command after ensuring that the MongoDB dump is available within the container:
      mongorestore --uri=$MONGO_CONNECTION_STRING --gzip  --archive=<dump-file> --drop;
      
  • Now that the data has been successfully restored, redeploy the actual Helm chart with the required values so that it automatically detects the changes and brings the Kubernetes resources to their desired state while keeping the restored data persistent.
    helm upgrade <release-name> . -n <namespace> -f <values-override-file>
    
  • After the successful restore of MongoDB data, perform a series of sanity tests to verify the functionality and stability of the restored system.

Troubleshooting

RabbitMQ Split Brain (Network Partitions)

In our setup, we utilize a RabbitMQ cluster consisting of three nodes to achieve high availability. One node is designated as the master, while the other two nodes serve as replicas. Clustering is employed to accomplish various objectives, such as enhancing data safety through replication, improving availability for client operations, and increasing overall system throughput. Optimal configurations may vary depending on specific goals and requirements.

Problem Statement

In a RabbitMQ cluster configuration with multiple nodes, network issues or connectivity problems can lead to a situation where one of the cluster members becomes isolated and operates independently from the rest of the cluster. This causes the cluster to split into separate entities, with each side considering the other side as crashed. This scenario, known as split-brain, creates inconsistencies as queues, bindings, and exchanges can be created or deleted separately on each side of the split, leading to data inconsistency and potential data loss.

Detecting Split Brain

A node determines that its peer is down if it is unable to contact it for a period of time, 60 seconds by default. If two nodes come back into contact, each having thought the other was down, the nodes will determine that a partition has occurred. This will be written to the RabbitMQ log in a format similar to the one below:

2020-05-18 06:55:37.324 [error] <0.341.0> Mnesia(rabbit@warp10): ** ERROR ** mnesia_event got {inconsistent_database, running_partitioned_network, rabbit@hostname2}
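
You can also inspect the cluster state directly; a hedged example using the standard RabbitMQ CLI from one of the cluster pods (names are placeholders):

kubectl exec -it <rabbitmq-node-pod-name> -n <namespace> -- rabbitmqctl cluster_status

The Network Partitions section of the output lists any partitions the node has detected.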

Recovering from a Split Brain

To recover from a split-brain, first choose one partition which you trust the most. This partition will become the authority for the state of the system (schema, messages) to use; any changes which have occurred on other partitions will be lost.

Stop all nodes in the other partitions, then start them all up again. When they rejoin the cluster they will restore state from the trusted partition. Follow the below steps to rejoin to the cluster.

Note: In our case, the RabbitMQ node name looks like rabbit@netskope-ce-rabbitmqcluster-server-0.netskope-ce-rabbitmqcluster-nodes.<namespace-name>

  • Open a shell in the RabbitMQ node (pod) that is partitioned from the rest of the cluster.
    kubectl exec -it <rabbitmq-node-pod-name> -n <namespace> -- /bin/bash
    
  • Run the below commands in that RabbitMQ node.
    rabbitmqctl stop_app
    # => Stopping node rabbit@rabbit2 ...done.
    
    rabbitmqctl reset
    # => Resetting node rabbit@rabbit2 ...
    
    rabbitmqctl join_cluster rabbit@rabbit1
    # => Clustering node rabbit@rabbit2 with [rabbit@rabbit1] ...done.
    
    rabbitmqctl start_app
    # => Starting node rabbit@rabbit2 ...done.
    

For more information, refer to RabbitMQ Clustering and Network Partitions.
