From 90660ccfb20267fb6476f2afbeebd20ae9365ab1 Mon Sep 17 00:00:00 2001 From: Karl Cardenas Date: Wed, 11 Sep 2024 15:19:26 -0700 Subject: [PATCH 1/5] docs: DOC-1369 --- .../release-notes/known-issues.md | 77 ++++++++++--------- .../troubleshooting/automation.md | 35 +++++++++ .../troubleshooting/troubleshooting.md | 2 + 3 files changed, 76 insertions(+), 38 deletions(-) create mode 100644 docs/docs-content/troubleshooting/automation.md diff --git a/docs/docs-content/release-notes/known-issues.md b/docs/docs-content/release-notes/known-issues.md index 4615ab72bf..83b134353b 100644 --- a/docs/docs-content/release-notes/known-issues.md +++ b/docs/docs-content/release-notes/known-issues.md @@ -14,44 +14,45 @@ to review and stay informed about the status of known issues in Palette. As issu The following table lists all known issues that are currently active and affecting users. -| Description | Workaround | Publish Date | Product Component | -| ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | 
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------- | ---------------------------- | -| An issue with Edge hosts using [Trusted Boot](../clusters/edge/trusted-boot/trusted-boot.md) and encrypted drives occurs when TRIM is not enabled. As a result, Solid-State Drive and Nonvolatile Memory Express drives experience degraded performance and potentially cause cluster failures. This [issue](https://github.com/kairos-io/kairos/issues/2693) stems from [Kairos](https://kairos.io/) not passing through the `--allow-discards` flag to the `systemd-cryptsetup attach` command. | Check out the [Degreated Performance on Disk Drives](../troubleshooting/edge.md#scenario---degreated-performance-on-disk-drives) troubleshooting guide for guidance on workaround. | September 4, 2024 | Edge | -| The AWS CSI pack has a [Pod Disruption Budget](https://kubernetes.io/docs/tasks/run-application/configure-pdb/) (PDB) that allows for a maximum of one unavailable pod. This behavior causes an issue for single-node clusters as well as clusters with a single control plane node and a single worker node where the control plane lacks worker capability. [Operating System (OS) patch](../clusters/cluster-management/os-patching.md) updates may attempt to evict the CSI controller without success, resulting in the node remaining in the un-schedulable state. | If OS patching is enabled, allow the control plane nodes to have worker capability. 
For single-node clusters, turn off the OS patching feature. | September 4, 2024 | Cluster, Packs | -| On AWS IaaS Microk8s clusters, OS patching can get stuck and fail. | Refer to the [Troubleshooting](../troubleshooting/nodes.md#os-patch-fails-on-aws-with-microk8s-127) section for debug steps. | August 17, 2024 | Palette | -| When upgrading a self-hosted Palette instance from 4.3 to 4.4 the MongoDB pod may be stuck with the following error: `ReadConcernMajorityNotAvailableYet: Read concern majority reads are currently not possible.` | Delete the PVC, PV and the pod manually. All resources will be recreated with the correct configuration. | August 17, 2024 | Self-Hosted Palette | -| For existing clusters that have added a new machine and all new clusters, pods may be stuck in the draining process and require manual intervention to drain the pod. | Manually delete the pod if it is stuck in the draining process. | August 17, 2024 | Palette | -| Clusters with the Virtual Machine Orchestrator (VMO) pack may experience VMs getting stuck in a continuous migration loop, as indicated by a `Migrating` or `Migration` VM status. | Review the [Virtual Machine Orchestrator (VMO) Troubleshooting](../troubleshooting/vmo-issues.md) section for workarounds. | August 1, 2024 | Virtual Machine Orchestrator | -| Palette CLI users who authenticated with the `login` command and specified a Palette console endpoint that does not contain the tenant name are encountering issues with expired JWT tokens. | Re-authenticate using your tenant URL, for example, `https://my-org.console.spectrocloud.com.` If the issue persists after re-authenticating, remove the `~/.palette/palette.yaml` file that is auto-generated by the Palette CLI. Re-authenticate with the `login` command if other commands require it. | July 25, 2024 | CLI | -| Adding new cloud providers, such as Nutanix, is currently unavailable. 
Private Cloud Gateway (PCG) deployments in new Nutanix environments fail to complete the installation. As a result, adding a new Nutanix environment to launch new host clusters is unavailable. This does not impact existing Nutanix deployments with a PCG deployed. | No workarounds are available. | July 20, 2024 | Clusters, Self-Hosted, PCG | -| Single-node Private Cloud Gateway (PCG) clusters are experiencing an issue upgrading to 4.4.11. The vSphere CSI controller pod fails to start because there are no matching affinity rules. | Check out the [vSphere Controller Pod Fails to Start in Single Node PCG Cluster](../troubleshooting/pcg.md#scenario---vsphere-controller-pod-fails-to-start-in-single-node-pcg-cluster) guide for workaround steps. | July 20, 2024 | PCG | -| When provisioning an Edge cluster, it's possible that some Operating System (OS) user credentials will be lost once the cluster is active. This is because the cloud-init stages from different sources merge during the deployment process, and sometimes, the same stages without distinct names overwrite each other. | Give each of your cloud-init stages in the OS pack and in the Edge installer **user-data** file a unique name. For more information about cloud-init stages and examples of cloud-init stages with names, refer to [Cloud-init Stages](../clusters/edge/edge-configuration/cloud-init.md). | July 17, 2024 | Edge | -| When you use a content bundle to provision a new cluster without using the local Harbor registry, it's possible for the images to be pulled from external networks instead of from the content bundle, consuming network bandwidth. If your Edge host has no connection to external networks or if it cannot locate the image on a remote registry, some pods may enter the `ImagePullBackOff` state at first, but eventually the pods will be created using images from the content bundle. 
| For connected clusters, you can make sure that the remote images are not reachable by the Edge host, which will stop the Palette agent from downloading the image and consuming bandwidth, and eventually the cluster will be created using images from the content bundle. For airgap clusters, the `ImagePullBackOff` error will eventually resolve on its own and there is no action to take. | July 11, 2024 | Edge | -| When you add a new VMware vSphere Edge host to an Edge cluster, the IP address may fail to be assigned to the Edge host after a reboot. | Review the [Edge Troubleshooting](../troubleshooting/edge.md) section for workarounds. | July 9, 2024 | Edge | -| When you install Palette Edge using an Edge Installer ISO with a RHEL 8 operating system on a Virtual Machine (VM) with insufficient video memory, the QR code in the registration screen does not display correctly. | Increase the video memory of your VM to 8 MB or higher. The steps to do this vary depending on the platform you use to deploy your VM. In vSphere, you can right click on the VM, click **Edit Settings** and adjust the video card memory in the **Video card** tab. | July 9, 2024 | Edge | -| Custom Certificate Authority (CA) is not supported for accessing Azure AKS clusters. Using a custom CA prevents the `spectro-proxy` pack from working correctly with Azure AKS clusters. | No workaround is available. | July 9, 2024 | Packs, Clusters | -| Manifests attached to an Infrastructure Pack, such as OS, Kubernetes, Network, or Storage, are not applied to the Edge cluster. This issue does not impact the infrastructure pack's YAML definition, which is applied to the cluster. | Specify custom configurations through an add-on pack or a custom manifest pack applied after the infrastructure packs. | Jul 9, 2024 | Edge, Packs | -| Clusters using Cilium and deployed to VMware environments with the VXLAN tunnel protocol may encounter an I/O timeout error. 
This issue is caused by the VXMNET3 adapter, which is dropping network traffic and resulting in VXLAN traffic being dropped. You can learn more about this issue in the [Cilium's GitHub issue #21801](https://github.com/cilium/cilium/issues/21801). | Review the section for workarounds. | June 27, 2024 | Packs, Clusters, Edge | -| [Sonobuoy](../clusters/cluster-management/compliance-scan.md#conformance-testing) scans fail to generate reports on airgapped Palette Edge clusters. | No workaround is available. | June 24, 2024 | Edge | -| Clusters configured with OpenID Connect (OIDC) at the Kubernetes layer encounter issues when authenticating with the [non-admin Kubeconfig file](../clusters/cluster-management/kubeconfig.md#cluster-admin). Kubeconfig files using OIDC to authenticate will not work if the SSL certificate is set at the OIDC provider level. | Use the admin Kubeconfig file to authenticate with the cluster, as it does not use OIDC to authenticate. | June 21, 2024 | Clusters | -| During the platform upgrade from Palette 4.3 to 4.4, Virtual Clusters may encounter a scenario where the pod `palette-controller-manager` is not upgraded to the newer version of Palette. The virtual cluster will continue to be operational, and this does not impact its functionality. | Refer to the [Controller Manager Pod Not Upgraded](../troubleshooting/palette-dev-engine.md#scenario---controller-manager-pod-not-upgraded) troubleshooting guide. | June 15, 2024 | Virtual Clusters | -| Edge hosts with FIPS-compliant RHEL Operating System (OS) distribution may encounter the error where the `systemd-resolved.service` service enters the **failed** state. This prevents the nameserver from being configured, which will result in cluster deployment failure. | Refer to [TroubleShooting](../troubleshooting/edge.md#scenario---systemd-resolvedservice-enters-failed-state) for a workaround. 
| June 15, 2024 | Edge | -| The GKE cluster's Kubernetes pods are failing to start because the Kubernetes patch version is unavailable. This is encountered during pod restarts or node scaling operations. | Deploy a new cluster and use a GKE cluster profile that does not contain a Kubernetes pack layer with a patch version. Migrate the workloads from the existing cluster to the new cluster. This is a breaking change introduced in Palette 4.4.0 | June 15, 2024 | Packs, Clusters | -| does not support multi-node control plane clusters. The upgrade strategy, `InPlaceUpgrade`, is the only option available for use. | No workaround is available. | June 15, 2024 | Packs | -| Clusters using as the Kubernetes distribution, the control plane node fails to upgrade when using the `InPlaceUpgrade` strategy for sequential upgrades, such as upgrading from version 1.25.x to version 1.26.x and then to version 1.27.x. | Refer to the [Control Plane Node Fails to Upgrade in Sequential MicroK8s Upgrades](../troubleshooting/pack-issues.md) troubleshooting guide for resolution steps. | June 15, 2024 | Packs | -| Azure IaaS clusters are having issues with deployed load balancers and ingress deployments when using Kubernetes versions 1.29.0 and 1.29.4. Incoming connections time out as a result due to a lack of network path inside the cluster. Azure AKS clusters are not impacted. | Use a Kubernetes version lower than 1.29.0 | June 12, 2024 | Clusters | -| OIDC integration with Virtual Clusters is not functional. All other operations related to Virtual Clusters are operational. | No workaround is available. | Jun 11, 2024 | Virtual Clusters | -| Deploying self-hosted Palette or VerteX to a vSphere environment fails if vCenter has standalone hosts directly under a Datacenter. Persistent Volume (PV) provisioning fails due to an upstream issue with the vSphere Container Storage Interface (CSI) for all versions before v3.2.0. Palette and VerteX use the vSphere CSI version 3.1.2 internally. 
The issue may also occur in workload clusters deployed on vSphere using the same vSphere CSI for storage volume provisioning. | If you encounter the following error message when deploying self-hosted Palette or VerteX: `'ProvisioningFailed failed to provision volume with StorageClass "spectro-storage-class". Error: failed to fetch hosts from entity ComputeResource:domain-xyz` then use the following workaround. Remove standalone hosts directly under the Datacenter from vCenter and allow the volume provisioning to complete. After the volume is provisioned, you can add the standalone hosts back. You can also use a service account that does not have access to the standalone hosts as the user that deployed Palette. | May 21, 2024 | Self-Hosted | -| Conducting cluster node scaling operations on a cluster undergoing a backup can lead to issues and potential unresponsiveness. | To avoid this, ensure no backup operations are in progress before scaling nodes or performing other cluster operations that change the cluster state | April 14, 2024 | Clusters | -| Palette automatically creates an AWS security group for worker nodes using the format `-node`. If a security group with the same name already exists in the VPC, the cluster creation process fails. | To avoid this, ensure that no security group with the same name exists in the VPC before creating a cluster. | April 14, 2024 | Clusters | -| K3s version 1.27.7 has been marked as _Deprecated_. This version has a known issue that causes clusters to crash. | Upgrade to a newer version of K3s to avoid the issue, such as versions 1.26.12, 1.28.5, and 1.27.11. You can learn more about the issue in the [K3s GitHub issue](https://github.com/k3s-io/k3s/issues/9047) page. | April 14, 2024 | Packs, Clusters | -| When deploying a multi-node AWS EKS cluster with the Container Network Interface (CNI) , the cluster deployment fails. | A workaround is to use the AWS VPC CNI in the interim while the issue is resolved. 
| April 14, 2024 | Packs, Clusters | -| If a Kubernetes cluster deployed onto VMware is deleted, and later re-created with the same name, the cluster creation process fails. The issue is caused by existing resources remaining inside the PCG, or the System PCG, that are not cleaned up during the cluster deletion process. | Refer to the [VMware Resources Remain After Cluster Deletion](../troubleshooting/pcg.md#scenario---vmware-resources-remain-after-cluster-deletion) troubleshooting guide for resolution steps. | April 14, 2024 | Clusters | -| In a VMware environment, self-hosted Palette instances do not receive a unique cluster ID when deployed, which can cause issues during a node repave event, such as a Kubernetes version upgrade. Specifically, Persistent Volumes (PVs) and Persistent Volume Claims (PVCs) will experience start problems due to the lack of a unique cluster ID. | To resolve this issue, refer to the [Volume Attachment Errors Volume in VMware Environment](../troubleshooting/palette-upgrade.md#volume-attachment-errors-volume-in-vmware-environment) troubleshooting guide. | April 14, 2024 | Self-Hosted | -| Day-2 operations related to infrastructure changes, such as modifying the node size and count, when using MicroK8s are not taking effect. | No workaround is available. | April 14, 2024 | Packs, Clusters | -| If a cluster that uses the Rook-Ceph pack experiences network issues, it's possible for the file mount to become and remain unavailable even after the network is restored. | This a known issue disclosed in the [Rook GitHub repository](https://github.com/rook/rook/issues/13818). To resolve this issue, refer to pack documentation. | April 14, 2024 | Packs, Edge | -| Edge clusters on Edge hosts with ARM64 processors may experience instability issues that cause cluster failures. | ARM64 support is limited to a specific set of Edge devices. Currently, Nvidia Jetson devices are supported. 
| April 14, 2024 | Edge | -| During the cluster provisioning process of new edge clusters, the Palette webhook pods may not always deploy successfully, causing the cluster to be stuck in the provisioning phase. This issue does not impact deployed clusters. | Review the [Palette Webhook Pods Fail to Start](../troubleshooting/edge.md#scenario---palette-webhook-pods-fail-to-start) troubleshooting guide for resolution steps. | April 14, 2024 | Edge | +| Description | Workaround | Publish Date | Product Component | +| ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------ | ---------------------------- | +| Third-party binaries downloaded and used by the Palette CLI may become stale and incompatible with the CLI. 
| Refer to the [Incompatible Stale Palette CLI Binaries](../troubleshooting/automation.md#scenario---incompatible-stale-palette-cli-binaries) troubleshooting guide for workaround guidance. | September 11, 2024 | CLI | +| An issue with Edge hosts using [Trusted Boot](../clusters/edge/trusted-boot/trusted-boot.md) and encrypted drives occurs when TRIM is not enabled. As a result, Solid-State Drive and Nonvolatile Memory Express drives experience degraded performance and potentially cause cluster failures. This [issue](https://github.com/kairos-io/kairos/issues/2693) stems from [Kairos](https://kairos.io/) not passing through the `--allow-discards` flag to the `systemd-cryptsetup attach` command. | Check out the [Degreated Performance on Disk Drives](../troubleshooting/edge.md#scenario---degreated-performance-on-disk-drives) troubleshooting guide for guidance on workaround. | September 4, 2024 | Edge | +| The AWS CSI pack has a [Pod Disruption Budget](https://kubernetes.io/docs/tasks/run-application/configure-pdb/) (PDB) that allows for a maximum of one unavailable pod. This behavior causes an issue for single-node clusters as well as clusters with a single control plane node and a single worker node where the control plane lacks worker capability. [Operating System (OS) patch](../clusters/cluster-management/os-patching.md) updates may attempt to evict the CSI controller without success, resulting in the node remaining in the un-schedulable state. | If OS patching is enabled, allow the control plane nodes to have worker capability. For single-node clusters, turn off the OS patching feature. | September 4, 2024 | Cluster, Packs | +| On AWS IaaS Microk8s clusters, OS patching can get stuck and fail. | Refer to the [Troubleshooting](../troubleshooting/nodes.md#os-patch-fails-on-aws-with-microk8s-127) section for debug steps. 
| August 17, 2024 | Palette | +| When upgrading a self-hosted Palette instance from 4.3 to 4.4 the MongoDB pod may be stuck with the following error: `ReadConcernMajorityNotAvailableYet: Read concern majority reads are currently not possible.` | Delete the PVC, PV and the pod manually. All resources will be recreated with the correct configuration. | August 17, 2024 | Self-Hosted Palette | +| For existing clusters that have added a new machine and all new clusters, pods may be stuck in the draining process and require manual intervention to drain the pod. | Manually delete the pod if it is stuck in the draining process. | August 17, 2024 | Palette | +| Clusters with the Virtual Machine Orchestrator (VMO) pack may experience VMs getting stuck in a continuous migration loop, as indicated by a `Migrating` or `Migration` VM status. | Review the [Virtual Machine Orchestrator (VMO) Troubleshooting](../troubleshooting/vmo-issues.md) section for workarounds. | August 1, 2024 | Virtual Machine Orchestrator | +| Palette CLI users who authenticated with the `login` command and specified a Palette console endpoint that does not contain the tenant name are encountering issues with expired JWT tokens. | Re-authenticate using your tenant URL, for example, `https://my-org.console.spectrocloud.com.` If the issue persists after re-authenticating, remove the `~/.palette/palette.yaml` file that is auto-generated by the Palette CLI. Re-authenticate with the `login` command if other commands require it. | July 25, 2024 | CLI | +| Adding new cloud providers, such as Nutanix, is currently unavailable. Private Cloud Gateway (PCG) deployments in new Nutanix environments fail to complete the installation. As a result, adding a new Nutanix environment to launch new host clusters is unavailable. This does not impact existing Nutanix deployments with a PCG deployed. | No workarounds are available. 
| July 20, 2024 | Clusters, Self-Hosted, PCG | +| Single-node Private Cloud Gateway (PCG) clusters are experiencing an issue upgrading to 4.4.11. The vSphere CSI controller pod fails to start because there are no matching affinity rules. | Check out the [vSphere Controller Pod Fails to Start in Single Node PCG Cluster](../troubleshooting/pcg.md#scenario---vsphere-controller-pod-fails-to-start-in-single-node-pcg-cluster) guide for workaround steps. | July 20, 2024 | PCG | +| When provisioning an Edge cluster, it's possible that some Operating System (OS) user credentials will be lost once the cluster is active. This is because the cloud-init stages from different sources merge during the deployment process, and sometimes, the same stages without distinct names overwrite each other. | Give each of your cloud-init stages in the OS pack and in the Edge installer **user-data** file a unique name. For more information about cloud-init stages and examples of cloud-init stages with names, refer to [Cloud-init Stages](../clusters/edge/edge-configuration/cloud-init.md). | July 17, 2024 | Edge | +| When you use a content bundle to provision a new cluster without using the local Harbor registry, it's possible for the images to be pulled from external networks instead of from the content bundle, consuming network bandwidth. If your Edge host has no connection to external networks or if it cannot locate the image on a remote registry, some pods may enter the `ImagePullBackOff` state at first, but eventually the pods will be created using images from the content bundle. | For connected clusters, you can make sure that the remote images are not reachable by the Edge host, which will stop the Palette agent from downloading the image and consuming bandwidth, and eventually the cluster will be created using images from the content bundle. For airgap clusters, the `ImagePullBackOff` error will eventually resolve on its own and there is no action to take. 
| July 11, 2024 | Edge | +| When you add a new VMware vSphere Edge host to an Edge cluster, the IP address may fail to be assigned to the Edge host after a reboot. | Review the [Edge Troubleshooting](../troubleshooting/edge.md) section for workarounds. | July 9, 2024 | Edge | +| When you install Palette Edge using an Edge Installer ISO with a RHEL 8 operating system on a Virtual Machine (VM) with insufficient video memory, the QR code in the registration screen does not display correctly. | Increase the video memory of your VM to 8 MB or higher. The steps to do this vary depending on the platform you use to deploy your VM. In vSphere, you can right click on the VM, click **Edit Settings** and adjust the video card memory in the **Video card** tab. | July 9, 2024 | Edge | +| Custom Certificate Authority (CA) is not supported for accessing Azure AKS clusters. Using a custom CA prevents the `spectro-proxy` pack from working correctly with Azure AKS clusters. | No workaround is available. | July 9, 2024 | Packs, Clusters | +| Manifests attached to an Infrastructure Pack, such as OS, Kubernetes, Network, or Storage, are not applied to the Edge cluster. This issue does not impact the infrastructure pack's YAML definition, which is applied to the cluster. | Specify custom configurations through an add-on pack or a custom manifest pack applied after the infrastructure packs. | Jul 9, 2024 | Edge, Packs | +| Clusters using Cilium and deployed to VMware environments with the VXLAN tunnel protocol may encounter an I/O timeout error. This issue is caused by the VXMNET3 adapter, which is dropping network traffic and resulting in VXLAN traffic being dropped. You can learn more about this issue in the [Cilium's GitHub issue #21801](https://github.com/cilium/cilium/issues/21801). | Review the section for workarounds. 
| June 27, 2024 | Packs, Clusters, Edge | +| [Sonobuoy](../clusters/cluster-management/compliance-scan.md#conformance-testing) scans fail to generate reports on airgapped Palette Edge clusters. | No workaround is available. | June 24, 2024 | Edge | +| Clusters configured with OpenID Connect (OIDC) at the Kubernetes layer encounter issues when authenticating with the [non-admin Kubeconfig file](../clusters/cluster-management/kubeconfig.md#cluster-admin). Kubeconfig files using OIDC to authenticate will not work if the SSL certificate is set at the OIDC provider level. | Use the admin Kubeconfig file to authenticate with the cluster, as it does not use OIDC to authenticate. | June 21, 2024 | Clusters | +| During the platform upgrade from Palette 4.3 to 4.4, Virtual Clusters may encounter a scenario where the pod `palette-controller-manager` is not upgraded to the newer version of Palette. The virtual cluster will continue to be operational, and this does not impact its functionality. | Refer to the [Controller Manager Pod Not Upgraded](../troubleshooting/palette-dev-engine.md#scenario---controller-manager-pod-not-upgraded) troubleshooting guide. | June 15, 2024 | Virtual Clusters | +| Edge hosts with FIPS-compliant RHEL Operating System (OS) distribution may encounter the error where the `systemd-resolved.service` service enters the **failed** state. This prevents the nameserver from being configured, which will result in cluster deployment failure. | Refer to [TroubleShooting](../troubleshooting/edge.md#scenario---systemd-resolvedservice-enters-failed-state) for a workaround. | June 15, 2024 | Edge | +| The GKE cluster's Kubernetes pods are failing to start because the Kubernetes patch version is unavailable. This is encountered during pod restarts or node scaling operations. | Deploy a new cluster and use a GKE cluster profile that does not contain a Kubernetes pack layer with a patch version. Migrate the workloads from the existing cluster to the new cluster. 
This is a breaking change introduced in Palette 4.4.0 | June 15, 2024 | Packs, Clusters | +| does not support multi-node control plane clusters. The upgrade strategy, `InPlaceUpgrade`, is the only option available for use. | No workaround is available. | June 15, 2024 | Packs | +| Clusters using as the Kubernetes distribution, the control plane node fails to upgrade when using the `InPlaceUpgrade` strategy for sequential upgrades, such as upgrading from version 1.25.x to version 1.26.x and then to version 1.27.x. | Refer to the [Control Plane Node Fails to Upgrade in Sequential MicroK8s Upgrades](../troubleshooting/pack-issues.md) troubleshooting guide for resolution steps. | June 15, 2024 | Packs | +| Azure IaaS clusters are having issues with deployed load balancers and ingress deployments when using Kubernetes versions 1.29.0 and 1.29.4. Incoming connections time out as a result due to a lack of network path inside the cluster. Azure AKS clusters are not impacted. | Use a Kubernetes version lower than 1.29.0 | June 12, 2024 | Clusters | +| OIDC integration with Virtual Clusters is not functional. All other operations related to Virtual Clusters are operational. | No workaround is available. | Jun 11, 2024 | Virtual Clusters | +| Deploying self-hosted Palette or VerteX to a vSphere environment fails if vCenter has standalone hosts directly under a Datacenter. Persistent Volume (PV) provisioning fails due to an upstream issue with the vSphere Container Storage Interface (CSI) for all versions before v3.2.0. Palette and VerteX use the vSphere CSI version 3.1.2 internally. The issue may also occur in workload clusters deployed on vSphere using the same vSphere CSI for storage volume provisioning. | If you encounter the following error message when deploying self-hosted Palette or VerteX: `'ProvisioningFailed failed to provision volume with StorageClass "spectro-storage-class". 
Error: failed to fetch hosts from entity ComputeResource:domain-xyz` then use the following workaround. Remove standalone hosts directly under the Datacenter from vCenter and allow the volume provisioning to complete. After the volume is provisioned, you can add the standalone hosts back. You can also use a service account that does not have access to the standalone hosts as the user that deployed Palette. | May 21, 2024 | Self-Hosted | +| Conducting cluster node scaling operations on a cluster undergoing a backup can lead to issues and potential unresponsiveness. | To avoid this, ensure no backup operations are in progress before scaling nodes or performing other cluster operations that change the cluster state | April 14, 2024 | Clusters | +| Palette automatically creates an AWS security group for worker nodes using the format `-node`. If a security group with the same name already exists in the VPC, the cluster creation process fails. | To avoid this, ensure that no security group with the same name exists in the VPC before creating a cluster. | April 14, 2024 | Clusters | +| K3s version 1.27.7 has been marked as _Deprecated_. This version has a known issue that causes clusters to crash. | Upgrade to a newer version of K3s to avoid the issue, such as versions 1.26.12, 1.28.5, and 1.27.11. You can learn more about the issue in the [K3s GitHub issue](https://github.com/k3s-io/k3s/issues/9047) page. | April 14, 2024 | Packs, Clusters | +| When deploying a multi-node AWS EKS cluster with the Container Network Interface (CNI) , the cluster deployment fails. | A workaround is to use the AWS VPC CNI in the interim while the issue is resolved. | April 14, 2024 | Packs, Clusters | +| If a Kubernetes cluster deployed onto VMware is deleted, and later re-created with the same name, the cluster creation process fails. The issue is caused by existing resources remaining inside the PCG, or the System PCG, that are not cleaned up during the cluster deletion process. 
| Refer to the [VMware Resources Remain After Cluster Deletion](../troubleshooting/pcg.md#scenario---vmware-resources-remain-after-cluster-deletion) troubleshooting guide for resolution steps. | April 14, 2024 | Clusters | +| In a VMware environment, self-hosted Palette instances do not receive a unique cluster ID when deployed, which can cause issues during a node repave event, such as a Kubernetes version upgrade. Specifically, Persistent Volumes (PVs) and Persistent Volume Claims (PVCs) will experience start problems due to the lack of a unique cluster ID. | To resolve this issue, refer to the [Volume Attachment Errors Volume in VMware Environment](../troubleshooting/palette-upgrade.md#volume-attachment-errors-volume-in-vmware-environment) troubleshooting guide. | April 14, 2024 | Self-Hosted | +| Day-2 operations related to infrastructure changes, such as modifying the node size and count, do not take effect when using MicroK8s. | No workaround is available. | April 14, 2024 | Packs, Clusters | +| If a cluster that uses the Rook-Ceph pack experiences network issues, it's possible for the file mount to become and remain unavailable even after the network is restored. | This is a known issue disclosed in the [Rook GitHub repository](https://github.com/rook/rook/issues/13818). To resolve this issue, refer to the pack documentation. | April 14, 2024 | Packs, Edge | +| Edge clusters on Edge hosts with ARM64 processors may experience instability issues that cause cluster failures. | ARM64 support is limited to a specific set of Edge devices. Currently, Nvidia Jetson devices are supported. | April 14, 2024 | Edge | +| During the cluster provisioning process of new Edge clusters, the Palette webhook pods may not always deploy successfully, causing the cluster to be stuck in the provisioning phase. This issue does not impact deployed clusters. 
| Review the [Palette Webhook Pods Fail to Start](../troubleshooting/edge.md#scenario---palette-webhook-pods-fail-to-start) troubleshooting guide for resolution steps. | April 14, 2024 | Edge | ## Resolved Known Issues diff --git a/docs/docs-content/troubleshooting/automation.md b/docs/docs-content/troubleshooting/automation.md new file mode 100644 index 0000000000..1a87209b5d --- /dev/null +++ b/docs/docs-content/troubleshooting/automation.md @@ -0,0 +1,35 @@ +--- +sidebar_label: "Automation" +title: "Automation" +description: "Troubleshooting steps for Palette and VerteX related automation tools such as the SDK, CLI, and API." +icon: "" +hide_table_of_contents: false +sidebar_position: 5 +tags: ["troubleshooting", "automation", "sdk", "cli", "api"] +--- + +The following sections will help you troubleshoot issues with Palette and VerteX related automation tools such as the +API, CLI, Terraform, and SDK. + +## Scenario - Incompatible Stale Palette CLI Binaries + +Palette CLI may encounter issues when attempting to use third-party binaries that are incompatible with the CLI such as +`docker`, `kind`, and `validatorctl`. By default, the Palette CLI will download the third-party binaries from the +internet and store them in the `$HOME/.palette/bin` directory, the first time you issue a command that requires them. +The Palette CLI does not these binaries, which can lead to compatibility issues with current versions of the CLI. + +Use the following steps to resolve issues with incompatible stale Palette CLI binaries. + +### Debug Steps + +1. Log in to the machine where the Palette CLI is installed. + +2. Remove the `~/.palette/bin` directory. + + ```shell + rm -rf ~/.palette/bin + ``` + +3. Re-issue the command that requires the binary. The CLI will download the latest version of the binary and store it in + the `$HOME/.palette/bin` directory. If you used the `--workspace` flag then the third-party binaries will be stored + in the specified workspace directory. 
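Reviewer note on the debug steps in `automation.md`: the removal step could also be written as a small shell helper that lists what is cached before deleting it. This is only a sketch. The function name `clear_palette_bin` is hypothetical and is not part of the Palette CLI; it relies only on standard `ls` and `rm`, and it assumes the default `$HOME/.palette/bin` location described in the scenario unless another directory is passed in.

```shell
# Hypothetical helper, not shipped with the Palette CLI.
# Lists and then removes the directory that holds cached third-party binaries.
clear_palette_bin() {
  # Default location taken from the scenario text; override with the first argument.
  local bin_dir="${1:-$HOME/.palette/bin}"

  if [ -d "$bin_dir" ]; then
    # Show what is about to be removed so stale versions can be noted.
    ls -l "$bin_dir"
    rm -rf -- "$bin_dir"
    echo "Removed $bin_dir"
  else
    echo "Nothing to remove at $bin_dir"
  fi
}
```

Calling `clear_palette_bin` with no argument targets the default directory. If the `--workspace` flag was used, pass whichever directory actually holds the third-party binaries as the first argument, then re-issue the Palette CLI command so the binaries are downloaded again.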
diff --git a/docs/docs-content/troubleshooting/troubleshooting.md b/docs/docs-content/troubleshooting/troubleshooting.md index bb056c4603..7d5dfe803e 100644 --- a/docs/docs-content/troubleshooting/troubleshooting.md +++ b/docs/docs-content/troubleshooting/troubleshooting.md @@ -11,6 +11,8 @@ tags: ["troubleshooting"] Use the following troubleshooting resources to help you address issues that may arise. You can also reach out to our support team by opening up a ticket through our [support page](http://support.spectrocloud.io/). +- [Automation](automation.md) + - [Cluster Deployment](cluster-deployment.md) - [Edge](edge.md) From 1a64853098f1b3c385875cc84b8f5e8f08aff938 Mon Sep 17 00:00:00 2001 From: Karl Cardenas Date: Wed, 11 Sep 2024 15:35:34 -0700 Subject: [PATCH 2/5] docs: fixed typo --- plugins/packs-integrations.js | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/plugins/packs-integrations.js b/plugins/packs-integrations.js index ff8b251697..9db8cb795f 100644 --- a/plugins/packs-integrations.js +++ b/plugins/packs-integrations.js @@ -473,7 +473,7 @@ async function pluginPacksAndIntegrationsData(context, options) { logger.error("An error occurred while reading the JSON file:", e); } } - logger.info(`The number of packs identified are ${Object.keys(apiPackResponse.packMDMap).length}`); + logger.info(`The number of packs identified is: ${Object.keys(apiPackResponse.packMDMap).length}`); return { packsPaletteData: apiPackResponse.packMDMap, packsPaletteDetailsData: apiPackResponse.apiPacksData, From 3999eb011ca96e7a3bbca28de8f6c5797ef5ad0a Mon Sep 17 00:00:00 2001 From: Karl Cardenas <29551334+karl-cardenas-coding@users.noreply.github.com> Date: Wed, 11 Sep 2024 15:45:18 -0700 Subject: [PATCH 3/5] docs: Apply suggestions from code review Co-authored-by: Lenny Chen <55669665+lennessyy@users.noreply.github.com> --- docs/docs-content/troubleshooting/automation.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git 
a/docs/docs-content/troubleshooting/automation.md b/docs/docs-content/troubleshooting/automation.md index 1a87209b5d..de8d4be7a1 100644 --- a/docs/docs-content/troubleshooting/automation.md +++ b/docs/docs-content/troubleshooting/automation.md @@ -16,7 +16,7 @@ API, CLI, Terraform, and SDK. Palette CLI may encounter issues when attempting to use third-party binaries that are incompatible with the CLI such as `docker`, `kind`, and `validatorctl`. By default, the Palette CLI will download the third-party binaries from the internet and store them in the `$HOME/.palette/bin` directory, the first time you issue a command that requires them. -The Palette CLI does not these binaries, which can lead to compatibility issues with current versions of the CLI. +The Palette CLI does not upgrade these binaries, which can lead to compatibility issues with current versions of the CLI. Use the following steps to resolve issues with incompatible stale Palette CLI binaries. From 3aa9833212e6b9eab3c2e59931fadb7d5d4fbdfa Mon Sep 17 00:00:00 2001 From: karl-cardenas-coding Date: Wed, 11 Sep 2024 22:48:15 +0000 Subject: [PATCH 4/5] ci: auto-formatting prettier issues --- docs/docs-content/troubleshooting/automation.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/docs/docs-content/troubleshooting/automation.md b/docs/docs-content/troubleshooting/automation.md index de8d4be7a1..0eb3cb64ff 100644 --- a/docs/docs-content/troubleshooting/automation.md +++ b/docs/docs-content/troubleshooting/automation.md @@ -16,7 +16,8 @@ API, CLI, Terraform, and SDK. Palette CLI may encounter issues when attempting to use third-party binaries that are incompatible with the CLI such as `docker`, `kind`, and `validatorctl`. By default, the Palette CLI will download the third-party binaries from the internet and store them in the `$HOME/.palette/bin` directory, the first time you issue a command that requires them. 
-The Palette CLI does not upgrade these binaries, which can lead to compatibility issues with current versions of the CLI. +The Palette CLI does not upgrade these binaries, which can lead to compatibility issues with current versions of the +CLI. Use the following steps to resolve issues with incompatible stale Palette CLI binaries. From 5b0b3191ff3a7d9c5e0515ee299238243c5402dc Mon Sep 17 00:00:00 2001 From: Karl Cardenas Date: Wed, 11 Sep 2024 15:50:01 -0700 Subject: [PATCH 5/5] ci: fix vale --- vale.ini | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/vale.ini b/vale.ini index fe6f834f83..8ac2714115 100644 --- a/vale.ini +++ b/vale.ini @@ -5,10 +5,10 @@ Vocab = spectrocloud-vocab MinAlertLevel = suggestion Packages = Google, write-good, alex, https://github.com/spectrocloud/spectro-vale-pkg/releases/latest/download/spectrocloud-docs-internal.zip - +IgnoredScopes = code, tt, img, url, a, [*.md] BasedOnStyles = Vale, Google, write-good, alex, spectrocloud-docs-internal -IgnoredScopes = code, tt, img, url, a, + ; BlockIgnores = ; The following line ignores all import statements in markdown files, backticks