I have been having a problem where some runs fail intermittently, maybe 30% of the time, due to a tainted resource. It took me a while to get a test case with the right logging inserted to prove what was wrong. The bug exists in both master and the f-formalize-interps branch. I have a branch with my fix that I'll PR to master. If this is the sort of solution we want to merge, it may be a quick fix for the f-formalize-interps branch as well.
Anyway, here's the problem. We use a VPC, so we have to connect to each instance's private_ip for provisioning. Unfortunately, this bug may crop up in other places, so it needs a bigger fix than just adding a private_ip toggle for AWS.
This sometimes copies the file to the wrong host, because the element() interpolation in the connection block returns the wrong value. Then, when remote-exec tries to run that script on the correct host, it fails because the script isn't there.
Here's a case that shows the problem, which I just pulled from the f-formalize-interps branch with added logging. You can see what I added here: Banno@bb9feea
^^ instance.1 provisions to the correct node.
^^ instance.2 is where we see the bug: it provisions to the wrong node.
Below is the rest of the run.
2015/01/14 11:39:49 ==> interpolationFuncElement: args => []interface {}{"10.10.1.75B780FFEC-B661-4EB8-9236-A01737AD98B610.10.1.49", "1"}
module.env.aws_instance.zookeeper_aws_instance.1: Provisioning with 'remote-exec'...
2015/01/14 11:39:49 terraform-provisioner-remote-exec: 2015/01/14 11:39:49 reconnecting to TCP connection for SSH
module.env.aws_instance.zookeeper_aws_instance.1 (remote-exec): Connecting to remote host via SSH...
module.env.aws_instance.zookeeper_aws_instance.1 (remote-exec): Host: 10.10.1.49
module.env.aws_instance.zookeeper_aws_instance.1 (remote-exec): User: ubuntu
module.env.aws_instance.zookeeper_aws_instance.1 (remote-exec): Password: false
module.env.aws_instance.zookeeper_aws_instance.1 (remote-exec): Private key: true
2015/01/14 11:39:49 terraform-provisioner-file: 2015/01/14 11:39:49 opening new ssh session
<snip>
2015/01/14 11:39:50 ==> interpolationFuncElement: args => []interface {}{"10.10.1.75B780FFEC-B661-4EB8-9236-A01737AD98B610.10.1.49", "2"}
module.env.aws_instance.zookeeper_aws_instance.2: Provisioning with 'remote-exec'...
module.env.aws_instance.zookeeper_aws_instance.2 (remote-exec): Connecting to remote host via SSH...
module.env.aws_instance.zookeeper_aws_instance.2 (remote-exec): Host: 10.10.1.75
module.env.aws_instance.zookeeper_aws_instance.2 (remote-exec): User: ubuntu
module.env.aws_instance.zookeeper_aws_instance.2 (remote-exec): Password: false
module.env.aws_instance.zookeeper_aws_instance.2 (remote-exec): Private key: true
<snip>
module.env.aws_instance.zookeeper_aws_instance.1 (remote-exec): Connected! Executing scripts...
<snip>
module.env.aws_instance.zookeeper_aws_instance.2 (remote-exec): Connected! Executing scripts...
<snip>
2015/01/14 11:39:51 ==> interpolationFuncElement: args => []interface {}{"10.10.1.102B780FFEC-B661-4EB8-9236-A01737AD98B610.10.1.75B780FFEC-B661-4EB8-9236-A01737AD98B610.10.1.49", "0"}
module.env.aws_instance.zookeeper_aws_instance.0: Provisioning with 'file'...
module.env.aws_instance.zookeeper_aws_instance.0 (file): private_ip attribute: "10.10.1.102"
module.env.aws_instance.zookeeper_aws_instance.0 (file): connection:host attribute: "10.10.1.102"
2015/01/14 11:39:51 terraform-provisioner-remote-exec: 2015/01/14 11:39:51 SCP session complete, closing stdin pipe.
2015/01/14 11:39:51 terraform-provisioner-remote-exec: 2015/01/14 11:39:51 Waiting for SSH session to complete.
module.env.aws_instance.zookeeper_aws_instance.1 (remote-exec): sudo: unable to resolve host ip-10-10-1-49
module.env.aws_instance.zookeeper_aws_instance.1 (remote-exec): sudo: /tmp/foo.sh: command not found
2015/01/14 11:39:51 terraform-provisioner-remote-exec: 2015/01/14 11:39:51 remote command exited with '1': /tmp/script.sh
2015/01/14 11:39:51 [ERROR] Error walking 'aws_instance.zookeeper_aws_instance.1': 1 error(s) occurred:
* Script exited with non-zero exit status: 1
module.env.aws_instance.zookeeper_aws_instance.1: Creation complete
<snip>
module.env.aws_instance.zookeeper_aws_instance.2: Creation complete
<snip>
2015/01/14 11:39:56 ==> interpolationFuncElement: args => []interface {}{"10.10.1.102B780FFEC-B661-4EB8-9236-A01737AD98B610.10.1.49", "0"}
module.env.aws_instance.zookeeper_aws_instance.0: Provisioning with 'remote-exec'...
module.env.aws_instance.zookeeper_aws_instance.0 (remote-exec): Connecting to remote host via SSH...
<snip>
module.env.aws_instance.zookeeper_aws_instance.0 (remote-exec): Host: 10.10.1.102
module.env.aws_instance.zookeeper_aws_instance.0 (remote-exec): User: ubuntu
module.env.aws_instance.zookeeper_aws_instance.0 (remote-exec): Password: false
module.env.aws_instance.zookeeper_aws_instance.0 (remote-exec): Private key: true
<snip>
2015/01/14 11:39:58 [INFO] Apply walk complete
2015/01/14 11:39:58 [INFO] Writing backup state to: terraform.tfstate.backup
<snip>
module.env.aws_instance.zookeeper_aws_instance.0: Creation complete
2015/01/14 11:39:58 waiting for all plugin processes to complete...
Error applying plan:
1 error(s) occurred:
* Script exited with non-zero exit status: 1
Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure.
<snip>
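As you can see, when resources are applied in parallel and use element() interpolation, there is a chance that not all of them have been fully initialized yet, so "aws_instance.zookeeper_aws_instance.*.private_ip" doesn't return all of the instances like you'd expect. The missing entries shift the indexes: at 11:39:49, instance.1 calls element() on a list of only two addresses (10.10.1.75 and 10.10.1.49; instance.0's 10.10.1.102 isn't there yet), so index 1 returns 10.10.1.49, whereas the complete list logged at 11:39:51 has 10.10.1.75 at index 1. Index 2 on the two-element list wraps around to 10.10.1.75, which is the host instance.2 then connects to.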
You can see the fix I wrote on master in #794. I think it solves this for all cases, but there may be an edge case. All the fix does is preserve the correct length of the list by filling in the uninitialized instances with a placeholder.
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.
ghost locked and limited conversation to collaborators on May 4, 2020.