First run during deployment produces 404 error #122
Your configuration for the DS (DaemonSet) would be helpful.
If you turn on verbose logging you will get a lot of information that may be helpful for understanding whether the pod is being indexed properly prior to the initial call.
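For context, a minimal sketch of what enabling verbose logging on the kube2iam DaemonSet container might look like; the flags besides --verbose are illustrative of a typical weave-based setup, not taken from this thread:

```sh
# Illustrative kube2iam invocation (the container args in the DaemonSet spec);
# the account ID is a placeholder. --verbose makes pod-indexing events visible.
kube2iam \
  --base-role-arn=arn:aws:iam::123456789012:role/ \
  --iptables=true \
  --host-ip="$HOST_IP" \
  --host-interface=weave \
  --verbose
```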
Hello @SharpEdgeMarshall, I'm experiencing strange behaviors and hangs with boto3 and kube2iam. Can you share a little more about the code you are trying to deploy in your container(s)?

@jrnt30: When you say 'latest version' do you mean 0.9.0? I don't see any code changes there that indicate changes in caching: https://github.com/jtblin/kube2iam/releases/tag/0.9.0. Could you clarify? Thank you.
Closing as "not repro"; let me know if you experience this again. 0.10.0 should help with performance as well.
Observing the same problem: the first start of every deployment that begins by downloading something from S3 via the aws-cli always fails; after the first pod restart they come up fine. Common to all of them is a 404 in the kube2iam logs and an error in the app:
Kube2iam: 0.10.0
With verbose log:
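Side note for anyone trying to reproduce this: the 404 can be observed directly from inside an affected pod by querying the metadata role endpoint before kube2iam has cached the pod. A minimal check, assuming the standard kube2iam iptables redirect is in place:

```sh
# Prints the HTTP status of the metadata role endpoint. kube2iam answers
# 404 until its watch has indexed the pod; 200 once the role is resolvable.
curl -s -o /dev/null -w '%{http_code}\n' \
  http://169.254.169.254/latest/meta-data/iam/security-credentials/
```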
This does seem to be the same type of timing issue that we have seen before. You probably pieced this together, but in your 2nd example the Kube2IAM watch has not yet received notification that the Pod has become running (which is when we get the IP and cache the role association). There are a few events that Kube2IAM is registering in your example, but the credentials request arrives before the Pod's IP has been indexed, which is why it 404s. I don't know how, in its current form, we could give people both of the things they need: Pods that start immediately, and credentials that are available on the very first request.
A much bigger solution that has floated around would be to augment the "Pod based" IAM approach with a "Service Account" based IAM approach. This could potentially solve part of the problem, because we can see the Pod's service account before it ever gets an IP, so we could start the STS handshake and the caching of credentials even if we hadn't gotten the Pod's IP yet. I don't want to fork the repo, because I feel like @jtblin deserves credit for the good work he has put into this and the very crafty idea, but if there aren't any updates on #132 I will probably just start working on and publishing something myself.
Still the same situation here, even after updating k8s and kube2iam to 1.9.8 and 0.10.0 respectively.
We're in the exact same situation ... jobs which attempt/fail to pull settings from S3 as their first action.
@jrnt30 I'm interested in which direction you ended up going, if any. We ended up mitigating this issue by increasing the metadata client's timeout and retry settings on our end.
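For anyone taking a similar route with boto-based clients (an assumption; the comment above doesn't name the exact setting), botocore honors two environment variables that control how patiently it polls the credentials endpoint. A minimal sketch, with illustrative values:

```sh
# Botocore/boto3 read these at process start. Raising them gives kube2iam
# time to index the pod before the SDK gives up on the metadata endpoint.
export AWS_METADATA_SERVICE_TIMEOUT=5        # seconds per attempt (default: 1)
export AWS_METADATA_SERVICE_NUM_ATTEMPTS=20  # attempts before failing (default: 1)
```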
we wrote a shell script that runs as part of our run command:

```sh
attempt=0
exitCode=1
response=0
max_attempts=${MAX_ATTEMPTS:-30}
jitter=${JITTER:-5}

while [[ $attempt -lt $max_attempts ]] ; do
  # Jitter requests to break resource contention
  timeout=$((1 + RANDOM % jitter))
  # 169.254.169.254/latest/meta-data is reserved by AWS to access the Metadata
  # API to retrieve IAM credentials.
  # https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-metadata.html
  # This curl command is hitting the Metadata API, which routes through
  # kube2iam via iptables: https://github.com/jtblin/kube2iam#iptables
  # If the request returns 200 OK, then it's implied that we can communicate
  # with kube2iam.
  response=$(curl --write-out '%{http_code}' --silent --output /dev/null --connect-timeout 2 'http://169.254.169.254/latest/meta-data/iam/security-credentials/')
  if [[ $response == 200 ]] ; then
    exitCode=0
    break
  fi
  echo "Attempt #${attempt}. Failure connecting to Kube2iam! Retrying in ${timeout}s." 1>&2
  sleep $timeout
  attempt=$(( attempt + 1 ))
done

# Abort the run command if credentials never became available.
if [[ $exitCode -ne 0 ]] ; then
  echo "Giving up on Kube2iam after ${max_attempts} attempts." 1>&2
  exit $exitCode
fi
```

Once kube2iam returns 200 OK, the rest of the run command proceeds. Note that this has implications for local development, since presumably you won't have kube2iam running on developers' laptops when they run these containers (see the guard sketched below). But the real solution is to switch to kiam, which does not have this kube2iam race condition.
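A possible refinement for that local-development caveat (our assumption, not part of the original script): skip the wait when the container isn't running in Kubernetes, keyed off the KUBERNETES_SERVICE_HOST variable that the kubelet injects into every pod.

```sh
# KUBERNETES_SERVICE_HOST is set automatically inside Kubernetes pods,
# so on a developer laptop the wait loop is skipped entirely.
if [[ -n "${KUBERNETES_SERVICE_HOST:-}" ]] ; then
  : # ... run the retry loop from the comment above here ...
fi
```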
@dblackdblack That's another route we were considering. Thanks for the example script! If kiam doesn't have this issue, I will have to take a serious look at switching.
With kiam you have to run some dedicated machines (more than one, in order to avoid a single point of failure) just to run the kiam server component, so it's not a drop-in replacement for kube2iam; it requires more infrastructure.
When you say you added this to your "run command", do you mean the final script that gets executed for your Docker container? Just want to make sure I attempt to use this the right way. Thanks!
All of our docker images inherit from a single common base image. That base image contains the wait script above, which our run commands invoke before anything else.
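A sketch of how a base image like that might be wired up; the script name and the ENTRYPOINT arrangement here are assumptions for illustration, not the commenter's actual layout:

```sh
#!/bin/sh
# /usr/local/bin/wait-for-kube2iam.sh (hypothetical name), baked into the
# base image and set as the ENTRYPOINT, with each image's real command as CMD:
#   ENTRYPOINT ["/usr/local/bin/wait-for-kube2iam.sh"]

# ... the retry loop from the earlier comment goes here ...

# Hand off to the container's real command once credentials are reachable.
exec "$@"
```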
Create image from amazon/aws-cli with a run script that will ensure the AWS credentials are available before executing the command. Script taken from: jtblin/kube2iam#122 (comment). Related issue: jtblin/kube2iam#136
Kube2iam version: 0.8.1
Kubernetes version: 1.8.4
CNI: weave 2.0.5
Only during deployment do we experience an error retrieving AWS credentials with boto; the pod crashes and is restarted, and then works perfectly.
This is the log line from kube2iam during the error:
App error log: