Logs for blobfuse2 diverge from its source documentation #1333
Comments
@fabio-s-franco if you have installed the AKS managed blob CSI driver (https://learn.microsoft.com/en-us/azure/aks/azure-blob-csi?tabs=NFS#enable-csi-driver-on-a-new-or-existing-aks-cluster), the log file is …
@andyzhangx That's precisely what I am saying is not the case:
kubectl exec -it -n kube-system csi-blob-node-xxxxx -c blob -- bash
root@aks-syspool-xxxxxxxx-vmss000000:/# ls -la /var/log
total 20
drwxr-xr-x 1 root root 4096 Jan 19 02:42 .
drwxr-xr-x 1 root root 4096 Jul 3 2023 ..
drwxr-xr-x 2 root root 4096 Jan 19 02:42 apt
-rw-r--r-- 1 root root 3036 Jan 19 02:42 dpkg.log
That's all there is.
@fabio-s-franco the logs are on the node; run the following command: …
@andyzhangx Thank you for looking into it. I have three observations after following your suggestion:
1 - Using the helper container to shell into the k8s node still does not find the file at the specified location.
2 - The documentation is ambiguous about what "node agent" means. Every command in the example involves entering the shell of a container within "csi-node"; at no point does it indicate otherwise, other than mentioning "agent node", which can be interpreted as the csi-node daemonset pod for the node where the container is being mounted.
3 - Setting the confusion aside, there is very little detail about using the CSI driver with a private endpoint when HNS is enabled, or about the fact that it needs to connect via the ADLS API, which requires a separate private endpoint (the dfs sub-resource, not the blob one).
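For anyone hitting the same ambiguity: the logs live on the AKS node itself, not inside the blob container. A minimal sketch of getting a shell on the node with `kubectl debug` — the node name is a placeholder, and any small image reachable from the cluster will do:

```shell
# Hedged sketch: open a shell on the AKS node itself via an ephemeral debug pod.
# The debug pod mounts the node's root filesystem at /host.
NODE=aks-syspool-xxxxxxxx-vmss000000   # placeholder: your node's name
kubectl debug node/"$NODE" -it --image=mcr.microsoft.com/cbl-mariner/busybox:2.0 -- chroot /host sh

# Then, on the node's host filesystem, look for the blobfuse2 logs:
ls -la /var/log/blobfuse2.log /var/log/blobfuse-driver.log
```

This is the "shell into the AKS node, not the csi-blob-node container" distinction the points above are getting at.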
Can you provide the CSI driver daemonset pod logs on the node? And what's the …
@andyzhangx This is a sample of the logs now from a pod on a node that provisions the blob mount:
Though I have already overcome the DFS connectivity issue; the problem is more about the ability to properly investigate the cause, for which I have more detailed logs in the issue on the blobfuse2 repo. The contents of /var/log are:
The closest thing is /var/log/blobfuse-driver.log, which, as you can see, is empty.
@fabio-s-franco from the CSI driver logs on that node, I don't see any mount activity logs; is that the right node?
@andyzhangx You are right, the node where the container was being mounted changed. I can now find these files:
But that accounts for only a small part of the problem (log-wise):
1 - The documentation does not detail the nuances of HNS-enabled mounts via private endpoint. Even with the parameter networkEndpointType=privateEndpoint set, it does not work when HNS is enabled (which requires creating an additional private endpoint, for the dfs sub-resource). Both the kubelet and control plane user-assigned identities had the Contributor role over the entire subscription (to make sure that was not the problem). And in my case, the private endpoints are created in Terraform because they are needed for other purposes, so I don't really want the driver to create them for me.
2 - The debugging documentation is not clear on some points (like shelling into the AKS node, instead of the blob container within the csi-blob-node pod). I was only able to get it to work after understanding blobfuse2 and running commands there directly, verifying its logs until I could mount it (after seeing it tries to use the .dfs endpoint when …
And the matching logs on daemonset:
An example I posted on the blobfuse2 issue:
===== RESPONSE ERROR (ServiceCode=AuthorizationFailure) =====
Description=403 This request is not authorized to perform this operation., Details: (none)
GET https://*.dfs.core.windows.net/mycont?maxResults=2&recursive=false&resource=filesystem&timeout=901 # <<<<<<< I finally was able to resolve it when I saw this
Authorization: REDACTED
User-Agent: [Azure-Storage-Fuse/2.1.2 (Debian GNU/Linux 12 (bookworm)) Azure-Storage/0.1 (go1.20.5; linux)]
X-Ms-Client-Request-Id: [REDACTED]
X-Ms-Date: [Fri, 05 Apr 2024 11:40:17 GMT]
X-Ms-Version: [2018-11-09]
--------------------------------------------------------------------------------
RESPONSE Status: 403 This request is not authorized to perform this operation.
Content-Length: [194]
Content-Type: [application/json;charset=utf-8]
Date: [Fri, 05 Apr 2024 11:40:17 GMT]
Server: [Windows-Azure-HDFS/1.0 Microsoft-HTTPAPI/2.0]
X-Ms-Error-Code: [AuthorizationFailure]
X-Ms-Request-Id: [REDACTED]
X-Ms-Version: [2018-11-09]
]
So, to sum it up: this process could have been a lot friendlier. And I believe that can mostly be achieved via documentation with accompanying examples and an improved debugging experience, part of which derives from the wiki debug steps that could use a bit more clarification. I think that is it. My problem is already resolved, so I am here just to provide some feedback to help the next warrior.
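For completeness, the daemonset logs referenced above can be pulled without shelling in at all. A sketch, assuming the driver's pods carry the `app=csi-blob-node` label (as used in the project's csi-debug guide) and a placeholder node name:

```shell
# Find the csi-blob-node pod scheduled on the node that performed the mount,
# then dump its "blob" container logs. The node name is a placeholder.
NODE=aks-syspool-xxxxxxxx-vmss000000
POD=$(kubectl get pods -n kube-system -l app=csi-blob-node \
  --field-selector spec.nodeName="$NODE" \
  -o jsonpath='{.items[0].metadata.name}')
kubectl logs -n kube-system "$POD" -c blob
```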
@fabio-s-franco I see you have set the following mount options in the volume, so …
@andyzhangx Hi, no, these logs are a bit misleading. When calling the command with … It's correct, …
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned". In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
What happened:
Had an issue (Azure/azure-storage-fuse#1376) with an HNS-enabled, blobfuse2-based mount that is only accessible via private endpoint (public network access disabled). It was quite difficult to work out the issue because there are no log files at the location documented by the blobfuse2 project, /var/log/blobfuse2.log. In fact, the only way I could find any logs at all was to shell into the relevant node agent (csi-blob-node daemonset) and execute a mount command by hand. Other than that, no blobfuse2.log files are available from within the blob container inside the pod.
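A sketch of what "execute a mount command by hand" can look like in practice. Assumptions (not from the original report): it runs inside the blob container of the csi-blob-node pod, the account/container names are placeholders, key-based auth is used, and blobfuse2's YAML config schema is used to force debug logging to a file:

```shell
# Hedged sketch: run blobfuse2 manually with debug logging written to a file.
# All names below are placeholders; auth mode and key handling differ per setup.
cat > /tmp/bf2.yaml <<'EOF'
logging:
  type: base
  level: log_debug
  file-path: /var/log/blobfuse2.log
components:
  - libfuse
  - azstorage
azstorage:
  type: adls                 # HNS-enabled account -> ADLS (dfs) endpoint
  account-name: mystorageaccount
  container: mycont
  mode: key
  account-key: <redacted>
EOF
mkdir -p /mnt/bf2-test
blobfuse2 mount /mnt/bf2-test --config-file=/tmp/bf2.yaml
tail -f /var/log/blobfuse2.log
```

Running the mount this way, rather than through the driver, is what finally surfaced a persisted log with the real error.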
What you expected to happen:
Find log files to debug issues at the documented location, /var/log/blobfuse2.log.
How to reproduce it:
Enable the managed driver via azure-cli or any other means in an AKS cluster. The version installed on my cluster has blobfuse2 2.1.2.
Anything else we need to know?:
The original issue was caused by the fact that blobfuse2 does not try to connect to the standard (non-HNS) endpoint FQDN. Therefore a private endpoint naively set up with blob as the target sub-resource instead of dfs will not be reachable: blobfuse2 attempts to connect to *.dfs.core.windows.net instead of *.blob.core.windows.net, and the corresponding private DNS zone will not exist when the private endpoint was initially set up to accommodate standard, non-HNS mounts. It is not clear, and very difficult to debug, that for the same storage account you can mount either HNS-enabled or HNS-disabled containers, but the former requires a private endpoint set up specifically for that purpose, which resolves to a different FQDN than the latter.
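To make the FQDN difference concrete: a single storage account is reachable under two different names, one per API, and each needs its own private endpoint and private DNS zone. A quick check, with a placeholder account name:

```shell
# The same storage account is reachable under two different FQDNs:
ACCOUNT=mystorageaccount                       # placeholder account name
BLOB_FQDN="${ACCOUNT}.blob.core.windows.net"   # non-HNS (blob API) traffic
DFS_FQDN="${ACCOUNT}.dfs.core.windows.net"     # HNS (ADLS Gen2) traffic
echo "$BLOB_FQDN"
echo "$DFS_FQDN"

# From inside the VNet, the dfs name should resolve to the private endpoint's IP;
# if it resolves publicly (or not at all), the dfs private endpoint/DNS zone is missing.
command -v nslookup >/dev/null && nslookup "$DFS_FQDN" || true
```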
On top of that, it does not seem possible to configure the endpoint to point to another FQDN for that case. Even setting the AZURE_STORAGE_BLOB_ENDPOINT environment variable directly does not influence the FQDN blobfuse2 uses to connect. So, without any logs, it becomes nearly impossible to determine the root cause.
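The report notes the missing dfs private endpoint was ultimately created out-of-band (Terraform, in that setup). An equivalent azure-cli sketch, with placeholder resource names, might look roughly like this:

```shell
# Hedged sketch: create a second private endpoint targeting the "dfs" sub-resource.
# All resource names are placeholders; a matching privatelink.dfs.core.windows.net
# private DNS zone (with a DNS zone group or manual A record) is also required.
STORAGE_ID=$(az storage account show -g my-rg -n mystorageaccount --query id -o tsv)
az network private-endpoint create \
  --name pe-mystorageaccount-dfs \
  --resource-group my-rg \
  --vnet-name my-vnet \
  --subnet my-subnet \
  --private-connection-resource-id "$STORAGE_ID" \
  --group-id dfs \
  --connection-name mystorageaccount-dfs-conn
```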
What is documented at https://github.com/kubernetes-sigs/blob-csi-driver/blob/master/docs/csi-debug.md will only output what is sent directly to stdout. For example:
Error: failed to initialize new pipeline [failed to authenticate credentials for azstorage]
This error does not help determine the problem, which can have several different underlying root causes (I alone have experienced three different ones for the same error message). I believe better error output is an issue for blobfuse2 to handle, but nonetheless it doesn't help that no log is persisted within the node agent. Moreover, the systemctl (and journalctl) commands do not work in the node agent pod, as blobfuse2 is not managed by systemd. So the documentation is either inaccurate or not applicable to the managed installation of the blob CSI driver / the AKS setup I have.
Environment:
- CSI driver version: 1.22.5
- Kubernetes version (kubectl version): Server Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.9", GitCommit:"d33c44091f0e760b0053f06023e87a1c99dfd302", GitTreeState:"clean", BuildDate:"2024-01-31T01:58:06Z", GoVersion:"go1.20.12", Compiler:"gc", Platform:"linux/amd64"}
- Kernel (uname -a): 5.15.0-1058-azure #66-Ubuntu SMP Fri Feb 16 00:40:24 UTC 2024 x86_64 GNU/Linux
- azure-cli version:
- blobfuse2 version: 2.1.2