0.10.0: "terraform init" works, but "terraform plan" fails with "netrpc: connect: no such file or directory" #15756

Closed
patrickdappollonio opened this issue Aug 7, 2017 · 25 comments · Fixed by #15788

@patrickdappollonio

patrickdappollonio commented Aug 7, 2017

Hello,

I have a custom provider called myprovider which I recently upgraded to terraform 0.10.0 using Go dep tool. My toml file looks like:

[[constraint]]
  name = "github.com/hashicorp/terraform"
  version = "0.10.0"

Then I built the binary and delivered it as a zip file to a VM, along with the Terraform binary available on the Terraform downloads page. I tried running an HCL file and it prompted me to run terraform init, which I did. Then, when I ran terraform plan, I got the following:

$ ./terraform plan
Error asking for user input: 1 error(s) occurred:

* provider.myprovider: dial unix /tmp/plugin263716127|netrpc: connect: no such file or directory

If I only define the provider with no resources, it works fine by saying "No changes. Infrastructure is up-to-date."

More info below:

Terraform Version

$ ./terraform version
Terraform v0.10.0

Terraform Configuration Files

This works (it reports that there are no changes and that infrastructure is up-to-date):

provider "myprovider" {
  region = "demo"
}

Whereas this fails with the error mentioned above:

provider "myprovider" {
  region = "demo"
}

resource "myprovider_instance" "default" {}

Do note that running terraform apply on 0.9.11 will fail since it doesn't have some required fields defined in the HCL code.

Debug Output

Link here

Expected Behavior

It should detect and run the provider and say that the validation for required attributes failed.

Actual Behavior

It fails with the message dial unix /tmp/plugin363438245|netrpc: connect: no such file or directory

Steps to Reproduce

Please list the steps required to reproduce the issue, for example:

  1. Compile a custom terraform provider from Go code
  2. Download terraform from the releases page and place it in the same folder as the compiled terraform provider
  3. Zip and scp or similar both the provider and terraform to other machine (or the same machine).
  4. Create a terraform file with just the provider definition and no resources, and run terraform plan. See it prompts for running terraform init.
  5. Run terraform init and check that it executes successfully
  6. Run terraform plan again.
  7. Notice it works okay.
  8. Modify the HCL code to add an empty resource from your custom provider.
  9. Run terraform plan again, notice it fails with the error message.

Important Factoids

Virtual Machine, running Ubuntu:

Distributor ID: Ubuntu
Description:    Ubuntu 16.04.2 LTS
Release:        16.04
Codename:       xenial

The interesting part is, it only fails when adding a resource, but with no resource and just the provider declaration it works perfectly.

@apparentlymart
Contributor

Hi @patrickdappollonio! Sorry this isn't working.

Based on what you describe here -- especially that it fails only when there's a resource block in the configuration -- it sounds like the plugin binary is failing to start up for some reason. Could you try running the command /home/patrick/Development/linux/terraform-provider-myprovider directly and share any output it produces? If it is working as expected, this should produce an error message about it being a plugin binary and not a program to be run directly, but I suspect it may produce some other sort of error that's preventing the program from starting up at all.
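For readers wondering how a plugin binary can tell that it was run directly, here is a minimal sketch of the general idea: the host sets an agreed environment variable before launching the plugin, and the plugin refuses to serve if it is missing. The variable name and value below are hypothetical and chosen only for illustration; this is not the code Terraform or go-plugin actually ships.

```go
package main

import (
	"fmt"
	"os"
)

// Hypothetical cookie name and value, used only for this sketch.
// A real host program defines its own pair.
const (
	cookieKey   = "EXAMPLE_PLUGIN_MAGIC_COOKIE"
	cookieValue = "example-secret-value"
)

// launchedByHost reports whether the environment carries the cookie that
// the host is expected to set before starting the plugin process.
// The lookup function is injected so the check can be exercised without
// touching the real environment.
func launchedByHost(getenv func(string) string) bool {
	return getenv(cookieKey) == cookieValue
}

func main() {
	if !launchedByHost(os.Getenv) {
		// Run by hand: explain and stop instead of trying to serve.
		fmt.Fprintln(os.Stderr, "This binary is a plugin. These are not meant to be executed directly.")
		return
	}
	fmt.Println("host detected; a real plugin would start serving here")
}
```

Seeing the "This binary is a plugin" message therefore only shows that the binary starts and reaches this check; it says nothing about whether the later handshake with the host succeeds.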

That it works okay with 0.9 is also very interesting. Is that using exactly the same provider binary as you're using with 0.10, or is it a separate build of the plugin that could potentially behave differently?

@apparentlymart apparentlymart added the bug, cli, and waiting-response labels Aug 7, 2017
@patrickdappollonio
Author

patrickdappollonio commented Aug 7, 2017

$ ./terraform-provider-myprovider
This binary is a plugin. These are not meant to be executed directly.
Please execute the program that consumes these plugins, which will
load any plugins automatically

$ file terraform-provider-myprovider
terraform-provider-myprovider: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, stripped

I wonder if it fails because of the validation part of the schema. myprovider defines a required name parameter which isn't there.

I have different builds for 0.9 and 0.10, since I don't want the vendored, frozen copy of terraform in 0.9 to misuse the same code in 0.10. And just to be fair, the provider used to use glide in 0.9, but I switched to dep as a package manager in 0.10. Instead of just updating, I actually removed the vendor folder, along with glide.yaml and glide.lock, and recreated everything from scratch using the toml shown above.

Let me know if you have any more questions!

@apparentlymart
Contributor

Thanks for this extra information, @patrickdappollonio!

Just to constrain the possibilities here, could you try copying the plugin executable you're using with 0.9 to a location where your 0.10 install will find it, and try this again? Plugin binaries built for 0.9 should be compatible with 0.10, since there were no changes to the actual API here, just the discovery/installation mechanism.

This test will hopefully allow us to narrow down whether the inconsistency lives on the side of Terraform core or on the side of the plugin itself (which includes some code from the core repository, included as a library, which may have changed.)

@patrickdappollonio
Author

patrickdappollonio commented Aug 8, 2017

Using terraform binary 0.10.0:

$ ./terraform -v
Terraform v0.10.0

$ ./terraform-provider-myprovider
This binary is a plugin. These are not meant to be executed directly.
Please execute the program that consumes these plugins, which will
load any plugins automatically

So, with myprovider using the vendored dependency in 0.9.11 and no resource:

$ cat demo.tf
provider "myprovider" {
  region = "demo"
}

$ ./terraform init
Initializing provider plugins...

Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.

$ ./terraform plan 
Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.

No changes. Infrastructure is up-to-date.

This means that Terraform did not detect any differences between your
configuration and real physical resources that exist. As a result, Terraform
doesn't need to do anything.

Now with a resource:

$ cat demo.tf
provider "myprovider" {
  region = "demo"
}

resource "myprovider_instance" "default" {}

$ ./terraform init

Initializing provider plugins...

Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.

$ ./terraform plan
5 error(s) occurred:

* myprovider_instance.default: "name": required field is not set

So it does work with an old binary built with the repo vendored at 0.9.11. I would expect terraform plan to fail in the same way when both are at 0.10.0, saying that the validation didn't pass, as in the second case.

I'm here if you guys need any extra information.

@patrickdappollonio patrickdappollonio changed the title from Terraform 0.10.0: "terraform init" works, but "terraform plan" fails with "netrpc: connect: no such file or directory" to 0.10.0: "terraform init" works, but "terraform plan" fails with "netrpc: connect: no such file or directory" Aug 8, 2017
@patrickdappollonio
Author

Just as an extra comment, I generate my binaries with the following command:

GOOS=linux go build -a -tags netgo -ldflags '-s -w' -o $(FILENAME)

@nbering

nbering commented Aug 8, 2017

I got this error when I built my plugin from the latest version of https://github.com/hashicorp/go-plugin. Using hashicorp/go-plugin@f72692a as frozen in the vendor file for https://github.com/terraform-providers/terraform-provider-azurerm fixed my problem.

@patrickdappollonio
Author

patrickdappollonio commented Aug 9, 2017

I can confirm @nbering's solution works. I pinned the package to f72692 and now I properly get my error messages back rather than failing with the socket message:

$ ./terraform plan
5 error(s) occurred:

* myprovider_instance.default: "name": required field is not set
<others omitted...>

For the sake of completeness, here are both of my gopkg.lock files generated by Go's dep tool: a) when only pinning terraform to 0.10.0 (filename up-to-date-gopkg.toml) and b) when pinning both terraform to 0.10.0 and go-plugin to f72692a (filename pinned-go-plugin-gopkg.toml).

And here's a diff of both Gopkg.lock and Gopkg.toml files. Do note that pinning go-plugin removes google.golang.org/grpc and github.com/golang/protobuf, which may explain the communication issue.

@patrickdappollonio
Author

Tracking the issue further down the road, it definitely happens when go-plugin made the switch to google.golang.org/grpc rather than the built-in custom solution it used before.

Commit b7d6477501c13292d71fd3b8e688269e51b028ba is the last commit that uses the old RPC model rather than google.golang.org/grpc. I compiled my binary against that SHA and it worked with no problems. The commit that completes go-plugin's switch to google.golang.org/grpc, 5ee1a665d1862b865bd2203cdce4f08e39841815 (which updates the readme to say that any program that can use grpc can create plugins), breaks my provider.

Pinning to b7d64775 seems to do the trick.

@apparentlymart
Contributor

Ahh right, yes... sorry that development work in go-plugin has been going on in parallel with the Terraform 0.10 work but we didn't run into it precisely because we'd been intentionally vendoring the "old" version.

Sorry this ended up falling on you both to debug. At the very least we should find somewhere visible to document this until we get Terraform itself ready to use the new grpc-based plugin protocol (which will be 0.11 at the absolute earliest, since it'd be a breaking change for existing plugin binaries) since I imagine all plugin developers will run into this if they're using the usual workflow of just taking what's on master when seeding a vendor dir.

@patrickdappollonio
Author

patrickdappollonio commented Aug 9, 2017

Well technically, we're vendoring terraform and not go-plugin. The Go plugin side is a dependency of terraform and I would assume that tools should be able to pick a copy of go-plugin on the terraform vendor/ folder, but the versioned copy is from February and I guess the dep tool doesn't work well with govendor, since it didn't pick it up.

@patrickdappollonio
Author

There's some work going around in dep with people reporting the same issue: Import should create overrides for transitive dependencies #845. Linking it here for whoever falls into the same issue.

@apparentlymart
Contributor

Hi all!

After some internal discussion with the people who were working on go-plugin, I've learned that the intent was to preserve compatibility but there was an oversight in how the "client" (Terraform core) parses the handshake sent by the "server" (the provider itself) which caused the handshake to be misinterpreted.

The good news is that we think this can be addressed just by upgrading the go-plugin library vendored into Terraform core, since the new "client" code will accept both the new and old handshake formats. We're gonna do some more testing here to validate this plan but hopefully we should be able to remove this gotcha for future releases, allowing plugins to be built against the master branch of go-plugin again.

Sorry again for this oversight, and thanks to @patrickdappollonio and @nbering for digging in and figuring out where the issue was coming from.

@apparentlymart apparentlymart removed the waiting-response label Aug 9, 2017
apparentlymart added a commit that referenced this issue Aug 11, 2017
This puts us on a version that has grpc protocol support. Although we're
not actually using that yet, the plugin handshake has changed slightly to
allow plugins to declare whether they use the old or new protocols, and
so this upgrade allows us to support plugins that were built against
newer versions of go-plugin that include this extra field in the
handshake.

This fixes #15756.
@apparentlymart
Contributor

I've upgraded go-plugin in master, so after the next release we should be able to properly support plugins that are themselves built against grpc-capable go-plugin versions.

Thanks again for the debugging effort here, and sorry we didn't catch this before the 0.10.0 release.

hh pushed a commit to crosscloudci/cross-cloud that referenced this issue Aug 13, 2017
To fix Bug with Plugins affected by hashicorp/terraform#15756
Upstream commit we need to fix Bug hashicorp/terraform@ee5fc3b
@nbering

nbering commented Aug 14, 2017

Thanks! Some of that experimental work in go-plugin may have been triggered by some experimental work I was doing in an attempt to port the plugin interface to other languages. That has stalled for the moment due to time constraints, but I still have hopes of writing a NodeJS plugin interface with that added protocol buffers support. Hopefully pulling in those upstream changes will make it easier when I pick up that experiment again.

I was also a little surprised that govendor didn't pick up on the go-plugin version when fetching the terraform dependency.

@apparentlymart
Contributor

Hi @nbering! Multi-language plugins are indeed the intended use-case here, though to be honest it's not Terraform driving that right now, but rather other things that also use go-plugin. We do hope to switch to grpc for Terraform eventually, but I just want to be clear that this is not a short-term goal since Terraform is currently doing some rather grpc-unfriendly things that we'll need to address first.

@nbering

nbering commented Aug 14, 2017

@apparentlymart Ya, no problem. I honestly took on that project from a certain level of naivety. I was willing to read the code to figure out the protocol, but I didn't realize that the plugin API was like an iceberg. There are levels of the API that mimic the TCP protocol in software (yamux), but in the end the real killer up-front was that the gob protocol used by Go's RPC library was not even a little bit portable between languages. Not without some PhD-level work, anyway. I'm going to continue work on the protocol buffers implementation as I can, since it's a very interesting academic exercise, if nothing else, but there was definitely a lot more to it than I was initially expecting. If anyone is interested, I am willing to share the discoveries I made with my first foray, and after a little more cleanup I'll open up my initial experiments, but it's definitely far from a working prototype. Specifically, Yamux remains the biggest sticking point. At this point I'm basically trying to implement Yamux in node without a test suite to work against. I'm also learning Go at the same time, so it's pretty exhausting work.
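To give a feel for what a non-Go client would have to reproduce, here is a minimal sketch that round-trips a value through Go's encoding/gob package. The Request type and the helper names are hypothetical and are not part of any Terraform or go-plugin API. A gob stream is self-describing: it carries a description of the Go type ahead of the value, in gob's own binary format.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// Request is a hypothetical RPC argument type used only for this sketch.
type Request struct {
	Name  string
	Count int
}

// encode serializes r with gob. The resulting stream contains a type
// description for Request followed by the encoded value.
func encode(r Request) ([]byte, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(r); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decode reads one Request back from a gob stream.
func decode(b []byte) (Request, error) {
	var r Request
	err := gob.NewDecoder(bytes.NewReader(b)).Decode(&r)
	return r, err
}

func main() {
	b, err := encode(Request{Name: "default", Count: 1})
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	// Printing the raw bytes shows the type metadata mixed in with the data.
	fmt.Printf("%d bytes: %q\n", len(b), b)

	r, err := decode(b)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Printf("%+v\n", r)
}
```

Going from Go to Go this takes a few lines; a client in another language would need its own implementation of the gob wire format, including the type descriptions, before it could exchange a single message.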

Again, not pushing for any timelines or anything like that... it was just something I was playing with and asked a few questions on go-plugin and was really pleasantly surprised when @mitchellh came in with some commits that moved my project forward a bit.

@jritsema
Contributor

jritsema commented Aug 28, 2017

I'm running into this same issue as well. @patrickdappollonio are you saying there's not an easy way to resolve this with dep yet? I've been using glide and it doesn't look like there's a good solution for pinning to the older version of go-plugin. Any tips? Thanks!

@patrickdappollonio
Author

patrickdappollonio commented Aug 28, 2017

@jritsema it is definitely possible to solve it with dep. I never said that it wasn't, or that it isn't easy: the package management tools make this as easy to solve as editing the package definition. Use this in your Gopkg.toml and then run dep ensure:

[[override]]
  name = "github.com/hashicorp/go-plugin"
  revision = "b7d6477501c13292d71fd3b8e688269e51b028ba"

For Glide it's pretty much the same thing, followed by a glide up or glide install (your choice):

- package: github.com/hashicorp/go-plugin
  version: b7d6477501c13292d71fd3b8e688269e51b028ba

@jritsema
Contributor

Thanks @patrickdappollonio. I tried that and got the following error building my plugin...

vendor/github.com/hashicorp/terraform/plugin/client.go:26: unknown field 'Logger' in struct literal of type plugin.ClientConfig

Any idea which version of terraform is compatible?

@patrickdappollonio
Author

patrickdappollonio commented Aug 28, 2017

I'm running both 0.9.11 and 0.10.0 (both pinned with glide or go dep) with that go-plugin version with no issues whatsoever. Can you share your glide.yaml or Gopkg.toml file?

@jritsema
Contributor

I tried both tf 0.10.0 and 0.10.2

[[constraint]]
  name = "github.com/hashicorp/terraform"
  version = "0.10.0"

[[constraint]]
  name = "github.com/parnurzeal/gorequest"
  version = "0.2.15"

[[constraint]]
  name = "github.com/turnerlabs/harbor-auth-client"
  version = "1.1.0"

[[override]]
  name = "github.com/hashicorp/go-plugin"
  revision = "b7d6477501c13292d71fd3b8e688269e51b028ba"

@jritsema
Contributor

That was it! Saw in Gopkg.lock that it was grabbing 0.10.2. Sorry, first time using dep... thanks so much for your help @patrickdappollonio!

@patrickdappollonio
Author

patrickdappollonio commented Aug 28, 2017

Perfect! And sorry, I misclicked the "delete comment" by mistake and there's no "are you sure?" message coming when you click it so sadly it's gone now.

For people coming in because of the same issue, the problem was that dep handles pinning in a slightly different way: you need to type version = "=0.10.0" (note the equal sign inside the quotes) to pin to a specific version. If you don't, it means 0.10.0 and above (which in this case, at the time of writing this comment, resolves to 0.10.2).
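The difference between the two spellings can be sketched as follows. This is a simplified model of only the behavior described in this comment: a bare version is treated as a lower bound and a leading "=" as an exact match. The function names are hypothetical, and dep's real resolver applies additional semver rules that are not modeled here.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parse splits "MAJOR.MINOR.PATCH" into three integers.
func parse(v string) ([3]int, error) {
	var out [3]int
	parts := strings.Split(v, ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("want MAJOR.MINOR.PATCH, got %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, err
		}
		out[i] = n
	}
	return out, nil
}

// compare returns -1, 0, or 1 depending on how a orders against b.
func compare(a, b [3]int) int {
	for i := 0; i < 3; i++ {
		if a[i] < b[i] {
			return -1
		}
		if a[i] > b[i] {
			return 1
		}
	}
	return 0
}

// satisfies reports whether candidate is allowed by constraint under the
// simplified rules of this sketch: "=X" matches only X, bare "X" matches
// X and anything newer.
func satisfies(constraint, candidate string) (bool, error) {
	exact := strings.HasPrefix(constraint, "=")
	want, err := parse(strings.TrimPrefix(constraint, "="))
	if err != nil {
		return false, err
	}
	got, err := parse(candidate)
	if err != nil {
		return false, err
	}
	if exact {
		return compare(got, want) == 0, nil
	}
	return compare(got, want) >= 0, nil
}

func main() {
	for _, c := range []string{"0.10.0", "=0.10.0"} {
		ok, err := satisfies(c, "0.10.2")
		fmt.Println(c, "allows 0.10.2:", ok, err)
	}
}
```

With the bare constraint, 0.10.2 is an acceptable candidate, which matches what showed up in Gopkg.lock; with the "=" form only 0.10.0 itself is accepted.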

@dikhan

dikhan commented Dec 26, 2017

Thanks a lot for sharing this issue @patrickdappollonio ...was having some trouble to pin down the problem and this ticket was the salvation 🥇

dikhan added a commit to dikhan/terraform-provider-openapi that referenced this issue Dec 28, 2017
- Pinned down a specific version of terraform plugins and terraform
for compatibility reasons. Check the following link for more info:
	hashicorp/terraform#15756
devstar0826 added a commit to devstar0826/terraform-provider-openapi that referenced this issue Oct 24, 2019
- Pinned down a specific version of terraform plugins and terraform
for compatibility reasons. Check the following link for more info:
	hashicorp/terraform#15756
@ghost

ghost commented Apr 5, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost ghost locked and limited conversation to collaborators Apr 5, 2020