docs: fix readme for anonymization (#559)
* docs: fixed markdown for Anonymization

* docs: added details for events which are not being masked
* docs: removed timeframe, added issue link for event anonymization
* docs: changed title to Further details
* docs: fixed broken markdown for config management and remote caching section
* docs: added "note" for events before further details section

---------

Signed-off-by: Jatin Mehrotra <jatin.mehrotra@classmethod.jp>
Co-authored-by: Jatin Mehrotra <jatin.mehrotra@classmethod.jp>
Co-authored-by: Alex Jones <alexsimonjones@gmail.com>
3 people committed Jul 19, 2023
1 parent 781ecb7 commit 70bec05
Showing 1 changed file, README.md, with 51 additions and 0 deletions.
With this option, the data is anonymized before being sent to the AI Backend.


<details>
<summary> Anonymization </summary>

1. Error reported during analysis:
```bash
Error: HorizontalPodAutoscaler uses StatefulSet/fake-deployment as ScaleTargetRef which does not exist.
```

2. Analysis response from the AI backend, with the resource name masked:
```bash
The Kubernetes system is trying to scale a StatefulSet named tGLcCRcHa1Ce5Rs using the HorizontalPodAutoscaler, but it cannot find the StatefulSet. The solution is to verify that the StatefulSet name is spelled correctly and exists in the same namespace as the HorizontalPodAutoscaler.
```

3. Response shown to the user, with the original name restored:
```bash
The Kubernetes system is trying to scale a StatefulSet named fake-deployment using the HorizontalPodAutoscaler, but it cannot find the StatefulSet. The solution is to verify that the StatefulSet name is spelled correctly and exists in the same namespace as the HorizontalPodAutoscaler.
```

Note: **Anonymization does not currently apply to events.**

### Further Details

*In a few analyzers, such as Pod, the event messages fed to the AI backend are not known beforehand, so they are not masked for the **time being**.*

- The following analyzers currently have their data **masked**:

  - StatefulSet
  - Service
  - PodDisruptionBudget
  - Node
  - NetworkPolicy
  - Ingress
  - HPA
  - Deployment
  - CronJob

- The following analyzers do **not** currently have their data masked:

  - ReplicaSet
  - PersistentVolumeClaim
  - Pod
  - **Events**

**Note:**

- k8sgpt does not mask the analyzers above because, with the exception of the **Events** analyzer, they do not send any identifying information.
- Masking for the **Events** analyzer is scheduled for the near future, as tracked in this [issue](https://github.com/k8sgpt-ai/k8sgpt/issues/560). _Further research is needed to understand event patterns so that sensitive parts, such as the pod name and namespace, can be masked._

- The following fields are **not** masked:

  - Describe
  - ObjectStatus
  - Replicas
  - ContainerStatus
  - **Event Message**
  - ReplicaStatus
  - Count (Pod)

**Note:**

- It is quite possible that the payload of an event message contains something like "super-secret-project-pod-X crashed", which is not currently redacted _(scheduled for the near future, as tracked in this [issue](https://github.com/k8sgpt-ai/k8sgpt/issues/560))_.
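To make the difficulty concrete, here is a hypothetical pattern-based redaction sketch in Go. It is not something k8sgpt does today; it assumes pod names end in a hyphenated hash-like suffix, and that assumption is exactly what the linked issue says needs further research (a pod named simply `db` would slip through).

```go
package main

import (
	"fmt"
	"regexp"
)

// podNamePattern guesses that a pod name looks like <name>-<5-char suffix>,
// e.g. "my-app-7d9f4". Real cluster naming is far less uniform, which is
// why event masking needs more research before it can be done reliably.
var podNamePattern = regexp.MustCompile(`\b[a-z0-9][a-z0-9-]*-[a-z0-9]{5}\b`)

// redactPodNames replaces anything matching the guessed pod-name shape
// with a fixed placeholder before the event text would leave the cluster.
func redactPodNames(event string) string {
	return podNamePattern.ReplaceAllString(event, "<redacted-pod>")
}

func main() {
	event := "Back-off restarting failed container in pod super-secret-project-pod-7d9f4"
	fmt.Println(redactPodNames(event))
}
```

A pattern this broad also risks false positives on ordinary hyphenated words, which is the other half of the research problem.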

### Proceed with care

- The k8sgpt team recommends using an entirely different backend **(a local model) in critical production environments**. By using a local model, you can rest assured that everything stays within your DMZ and nothing is leaked.
- If there is any uncertainty about sending data to a public LLM (OpenAI, Azure AI) and doing so poses a risk to business-critical operations, avoid using a public LLM; base that decision on your own assessment and the risks that apply in your jurisdiction.


</details>

<details>
<summary> Configuration management</summary>

`k8sgpt` stores config data in the `$XDG_CONFIG_HOME/k8sgpt/k8sgpt.yaml` file. The data is stored in plain text, including your OpenAI key.

Config file locations:
In these scenarios K8sGPT supports AWS S3 Integration.
_As a prerequisite `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` are required as environmental variables._

_Adding a remote cache_

Note: this will create the bucket if it does not exist.
```
k8sgpt cache add --region <aws region> --bucket <name>
```
