Failed to retrieve instance data from ec2 metadata #2031

Closed
brizaldi opened this issue May 8, 2024 · 12 comments

@brizaldi

brizaldi commented May 8, 2024

We're currently running the CIS Amazon Linux 2 image on Kubernetes version 1.29 and getting this error:

ebs-csi-node

I0508 04:49:41.864583       1 ec2.go:40] "Retrieving EC2 instance identity metadata" regionFromSession=""
I0508 04:49:41.864780       1 metadata.go:52] "failed to retrieve instance data from ec2 metadata; retrieving instance data from kubernetes api" err="could not get EC2 instance identity metadata: operation error ec2imds: GetInstanceIdentityDocument, request canceled, context deadline exceeded"
E0508 04:50:11.868297       1 main.go:154] "Could not determine region from any metadata service. The region can be manually supplied via the AWS_REGION environment variable." err="error getting instance data from ec2 metadata or kubernetes api"
panic: error getting instance data from ec2 metadata or kubernetes api

goroutine 1 [running]:
main.main()
    /go/src/github.com/kubernetes-sigs/aws-ebs-csi-driver/cmd/main.go:155 +0xf27
@AndrewSirenko
Contributor

Hi @brizaldi, the EBS CSI Driver's node service requires some source of instance/node metadata to function. By default, we attempt to use the EC2 Instance Metadata Service (IMDS), but fall back to querying the Kubernetes API. These errors indicate that neither source is reachable.

You will need to give the EBS CSI node pods access to either IMDS (for example, by raising the hop limit; see our FAQ) or the Kubernetes API server (by finding and removing whatever is blocking communication between the pod and the Kubernetes API) for the driver to function.


Please ignore the "The region can be manually supplied via the AWS_REGION environment variable." part of the error message. While the EBS CSI Driver controller pod can function with just the region being passed in, the node pod cannot.
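
For anyone checking the Kubernetes API side of this, here is a minimal sketch (Python, standard library only) of a reachability test that could be run from inside a pod. It relies on the `KUBERNETES_SERVICE_HOST` and `KUBERNETES_SERVICE_PORT` environment variables that Kubernetes injects into pods. The function names are hypothetical, and a successful TCP connection only shows that the network path is open, not that the driver's API calls will be authorized.

```python
import os
import socket


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable networks, and timeouts.
        return False


def kubernetes_api_endpoint(env=os.environ):
    """Read the in-cluster API server address injected into pods, if present."""
    host = env.get("KUBERNETES_SERVICE_HOST")
    port = env.get("KUBERNETES_SERVICE_PORT")
    if not host or not port:
        return None
    return host, int(port)
```

Inside a pod, `tcp_reachable(*kubernetes_api_endpoint())` returning `False` would suggest that something between the pod and the API server (firewall rules, CNI, security groups) is dropping traffic.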

@brizaldi
Author

brizaldi commented May 9, 2024

I've already tried setting the hop limit to 2 and to 3, but I still get the same error.

Here's what I've set up in Terraform:

module "eks" {
  source = "terraform-aws-modules/eks/aws"

  eks_managed_node_groups = {
    default = {
      metadata_options = {
        http_endpoint               = "enabled"
        http_put_response_hop_limit = 2
        http_tokens                 = "required"
      }
      ...
    }
  }
}

By the way, do you know which port is used for communication between the pod and the Kubernetes API? I suspect the cause might be the CIS benchmark AMI I'm using, which may block the relevant ports. When I tried the standard Amazon Linux AMI, no error occurred.

@ConnorJC3
Contributor

do you know which port is used for communication between the pod and the Kubernetes API?

IMDS is reached by contacting the special IP 169.254.169.254 (where AWS makes IMDS available to EC2 instances) on TCP port 80 (the standard HTTP port).
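
To test that path by hand, a minimal sketch of an IMDSv2 request from Python (standard library only) could look like the following. It first requests a session token with a PUT, then fetches the instance identity document, which is the document named in the driver's error message. The function name, the 60-second token TTL, and the timeout are illustrative assumptions; this is not the driver's own code.

```python
import json
import urllib.request

IMDS_BASE_URL = "http://169.254.169.254"


def fetch_identity_document(base_url: str = IMDS_BASE_URL, timeout: float = 2.0) -> dict:
    """Fetch the instance identity document using an IMDSv2 session token."""
    # Never route link-local metadata traffic through an HTTP proxy.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))

    # Step 1: request a short-lived session token (IMDSv2 requires a PUT).
    token_request = urllib.request.Request(
        f"{base_url}/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    with opener.open(token_request, timeout=timeout) as response:
        token = response.read().decode()

    # Step 2: present the token when reading the identity document.
    document_request = urllib.request.Request(
        f"{base_url}/latest/dynamic/instance-identity/document",
        headers={"X-aws-ec2-metadata-token": token},
    )
    with opener.open(document_request, timeout=timeout) as response:
        return json.loads(response.read())
```

If the call succeeds on the node but times out inside a pod, the hop limit or host-level packet filtering is the likely place to look.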

@dghubble

I imagine the right outcome here is for the aws-ebs-csi-driver to add support for IMDSv2, especially now that AWS is pushing so hard for it and defaulting to disabling IMDSv1.

@ConnorJC3
Contributor

ConnorJC3 commented May 25, 2024

@dghubble The EBS CSI driver does support IMDSv2 and will use it if available. However, the default IMDSv2 configuration prevents containers from accessing it.

You can give the EBS CSI Driver access by running it in host networking mode, or you can give all containers access (note: generally considered a bad security practice) by increasing IMDSv2's hop limit.
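
As a rough illustration of why the hop limit matters, the sketch below models the token response as a packet whose IP TTL starts at the configured hop limit and loses one for each forwarding hop. This is a simplified model, not AWS's implementation, and the assumption that a pod in its own network namespace adds exactly one forwarding hop depends on the CNI in use.

```python
def token_response_delivered(hop_limit: int, forwarding_hops: int) -> bool:
    """Simplified model of IMDSv2's PUT response hop limit.

    The response starts with TTL == hop_limit; each forwarding hop
    decrements it, and the packet is dropped once the TTL reaches 0.
    """
    ttl = hop_limit
    for _ in range(forwarding_hops):
        ttl -= 1
        if ttl <= 0:
            return False
    return True


# A process on the host (or a host-network pod) sees no forwarding hop.
print(token_response_delivered(hop_limit=1, forwarding_hops=0))
# A pod behind one forwarding hop, with the hop limit left at 1.
print(token_response_delivered(hop_limit=1, forwarding_hops=1))
# The same pod after raising the hop limit to 2.
print(token_response_delivered(hop_limit=2, forwarding_hops=1))
```

Under this model, host networking works with a hop limit of 1, while a regular pod needs at least 2. A hop limit that is already 2 with requests still timing out points at something else on the node dropping the traffic.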

@saku3071

Facing the same issue. Any update on a fix?

@ConnorJC3
Contributor

The fix is to configure your cluster so that the EBS CSI Driver node pods have access to either IMDS or the Kubernetes API. Access to one of the two is a hard requirement for use of the EBS CSI Driver.
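
The "one of the two" requirement can be sketched as a simple ordered fallback. The Python below is only an illustration of that pattern, with hypothetical names; the driver itself is written in Go and its actual logic may differ.

```python
from typing import Callable, Sequence, Tuple

# A named metadata source: (label, function returning instance metadata).
Source = Tuple[str, Callable[[], dict]]


def first_available_metadata(sources: Sequence[Source]) -> Tuple[str, dict]:
    """Try each source in order and return the first one that answers."""
    errors = []
    for name, fetch in sources:
        try:
            return name, fetch()
        except Exception as exc:  # any failure means "try the next source"
            errors.append(f"{name}: {exc}")
    # Nothing answered: surface every failure so both paths can be debugged.
    raise RuntimeError("no metadata source reachable: " + "; ".join(errors))
```

With sources ordered as IMDS first and the Kubernetes API second, a failure is raised only when both are unreachable, which matches the two-stage error seen in the logs in this thread.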

@saku3071

saku3071 commented Jun 18, 2024

@ConnorJC3 These are AWS-managed add-ons. How do we

  • configure individual pods to have access to either IMDS or the Kubernetes API?

From the node level, reaching the Kubernetes API works fine.

We hit this issue when moving from a RHEL 7 AMI to a RHEL 9 AMI, and only for the EBS add-on and CoreDNS. If we revert to the older RHEL 7 AMI, everything works again.

  • Are any extra iptables rules needed here specifically?

EBS addon logs:

I0618 15:14:03.963302 1 main.go:135] Version: v2.10.1
I0618 15:14:03.963388 1 main.go:136] Running node-driver-registrar in mode=
I0618 15:14:03.963401 1 main.go:157] Attempting to open a gRPC connection with: "/csi/csi.sock"
I0618 15:14:03.967563 1 main.go:164] Calling CSI driver to discover driver name
I0618 15:14:03.980756 1 main.go:173] CSI driver name: "ebs.csi.aws.com"
I0618 15:14:03.980829 1 node_register.go:55] Starting Registration Server at: /registration/ebs.csi.aws.com-reg.sock
I0618 15:14:03.982694 1 node_register.go:64] Registration Server started at: /registration/ebs.csi.aws.com-reg.sock
I0618 15:14:03.982909 1 node_register.go:88] Skipping HTTP server because endpoint is set to: ""
I0618 15:14:04.539319 1 main.go:90] Received GetInfo call: &InfoRequest{}
I0618 15:14:04.572753 1 main.go:101] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:true,Error:,}
I0618 15:14:06.731893 1 main.go:133] "Calling CSI driver to discover driver name"
I0618 15:14:06.737665 1 main.go:141] "CSI driver name" driver="ebs.csi.aws.com"
I0618 15:14:06.737702 1 main.go:170] "ServeMux listening" address="0.0.0.0:9808"
I0618 15:20:36.766742 1 ec2.go:40] "Retrieving EC2 instance identity metadata" regionFromSession=""
I0618 15:20:36.766905 1 metadata.go:52] "failed to retrieve instance data from ec2 metadata; retrieving instance data from kubernetes api" err="could not get EC2 instance identity metadata: operation error ec2imds: GetInstanceIdentityDocument, request canceled, context deadline exceeded"
E0618 15:21:06.770791 1 main.go:154] "Could not determine region from any metadata service. The region can be manually supplied via the AWS_REGION environment variable." err="error getting instance data from ec2 metadata or kubernetes api"
panic: error getting instance data from ec2 metadata or kubernetes api

goroutine 1 [running]:
main.main()
/go/src/github.com/kubernetes-sigs/aws-ebs-csi-driver/cmd/main.go:155 +0xfb9


coredns logs:

[INFO] plugin/reload: Running configuration SHA512 = 8a7d59126e7f114ab49c6d2613be93d8ef7d408af8ee61a710210843dc409f03133727e38f64469d9bb180f396c84ebf48a42bde3b3769730865ca9df5eb281c
CoreDNS-1.9.3
linux/amd64, go1.20.4, c9dedfbf
[ERROR] plugin/errors: 2 3893335740271709553.8833067999452524873. HINFO: read udp 10.162.210.216:35664->10.162.128.2:53: i/o timeout
[ERROR] plugin/errors: 2 3893335740271709553.8833067999452524873. HINFO: read udp 10.162.210.216:60397->10.162.128.2:53: i/o timeout
[ERROR] plugin/errors: 2 3893335740271709553.8833067999452524873. HINFO: read udp 10.162.210.216:54032->10.162.128.2:53: i/o timeout
[ERROR] plugin/errors: 2 3893335740271709553.8833067999452524873. HINFO: read udp 10.162.210.216:42794->10.162.128.2:53: i/o timeout
[ERROR] plugin/errors: 2 3893335740271709553.8833067999452524873. HINFO: read udp 10.162.210.216:41396->10.162.128.2:53: i/o timeout
[INFO] SIGTERM: Shutting down servers then terminating
[INFO] plugin/health: Going into lameduck mode for 5s
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[WARNING] plugin/kubernetes: starting server with unsynced Kubernetes API
.:53
[INFO] plugin/reload: Running configuration SHA512 = 8a7d59126e7f114ab49c6d2613be93d8ef7d408af8ee61a710210843dc409f03133727e38f64469d9bb180f396c84ebf48a42bde3b3769730865ca9df5eb281c
CoreDNS-1.9.3
linux/amd64, go1.20.4, c9dedfbf
[ERROR] plugin/errors: 2 3368869207192152190.3618717134880642323. HINFO: read udp 10.162.210.216:38294->10.162.128.2:53: i/o timeout
[ERROR] plugin/errors: 2 3368869207192152190.3618717134880642323. HINFO: read udp 10.162.210.216:59287->10.162.128.2:53: i/o timeout
[ERROR] plugin/errors: 2 3368869207192152190.3618717134880642323. HINFO: read udp 10.162.210.216:46973->10.162.128.2:53: i/o timeout
[ERROR] plugin/errors: 2 3368869207192152190.3618717134880642323. HINFO: read udp 10.162.210.216:34990->10.162.128.2:53: i/o timeout
[ERROR] plugin/errors: 2 3368869207192152190.3618717134880642323. HINFO: read udp 10.162.210.216:49502->10.162.128.2:53: i/o timeout
[ERROR] plugin/errors: 2 3368869207192152190.3618717134880642323. HINFO: read udp 10.162.210.216:54621->10.162.128.2:53: i/o timeout
[ERROR] plugin/errors: 2 3368869207192152190.3618717134880642323. HINFO: read udp 10.162.210.216:54628->10.162.128.2:53: i/o timeout
[ERROR] plugin/errors: 2 3368869207192152190.3618717134880642323. HINFO: read udp 10.162.210.216:50492->10.162.128.2:53: i/o timeout
[WARNING] plugin/kubernetes: Kubernetes API connection failure: Get "https://172.20.0.1:443/version": dial tcp 172.20.0.1:443: i/o timeout
[ERROR] plugin/errors: 2 3368869207192152190.3618717134880642323. HINFO: read udp 10.162.210.216:59475->10.162.128.2:53: i/o timeout
[ERROR] plugin/errors: 2 3368869207192152190.3618717134880642323. HINFO: read udp 10.162.210.216:55621->10.162.128.2:53: i/o timeout
[WARNING] plugin/kubernetes: Kubernetes API connection failure: Get "https://172.20.0.1:443/version": dial tcp 172.20.0.1:443: i/o timeout
[INFO] SIGTERM: Shutting down servers then terminating
[INFO] plugin/health: Going into lameduck mode for 5s

@ConnorJC3
Contributor

Your logs likely indicate a networking issue. I would check whether your pod networking (CNI plugin) is working.
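
One quick way to see this in the CoreDNS logs is to count which destinations are timing out. The sketch below is a hypothetical helper, with a regular expression written only against the two line shapes that appear in the logs in this thread (`read udp A->B: i/o timeout` and `dial tcp B: i/o timeout`).

```python
import re
from collections import Counter

# Captures the destination ip:port of a timed-out UDP read or TCP dial.
TIMEOUT_PATTERN = re.compile(
    r"(?:read udp \S+->|dial tcp )(?P<dest>\d+\.\d+\.\d+\.\d+:\d+): i/o timeout"
)


def timeout_destinations(log_text: str) -> Counter:
    """Count i/o timeouts per destination address found in the log text."""
    return Counter(m.group("dest") for m in TIMEOUT_PATTERN.finditer(log_text))
```

On the logs posted above, this would report timeouts for both the DNS queries to 10.162.128.2:53 and the Kubernetes API service at 172.20.0.1:443. Unrelated destinations failing from the same pod is consistent with a pod networking problem rather than an issue specific to the driver.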

@brizaldi
Author

@here

Since I need the cluster to be ready soon, I switched to using the Bottlerocket image, which also has the CIS Bottlerocket Benchmark Level 1 out of the box.

I will let you guys decide whether to close this issue or keep it open for discussion. Thanks.

@ConnorJC3
Contributor

/close

Because this does not appear to be a bug in the driver itself, and is rather an issue with the CIS image, I'm going to close this issue out. Please reopen this issue or create a new issue if further support is needed.

@k8s-ci-robot
Contributor

@ConnorJC3: Closing this issue.

