
implements gzip interpolation func #3858

Closed
wants to merge 1 commit

Conversation

glerchundi
Contributor

fixes #3407

Signed-off-by: Gorka Lerchundi Osa glertxundi@gmail.com

@glerchundi
Contributor Author

In case you feel it can be merged, here is the PR. You can take a look at these use cases:

For etcd2:
https://github.com/glerchundi/infrastructure-etcd2/blob/master/etcd.tf#L69-L78
For kubernetes:
https://github.com/glerchundi/infrastructure-kubernetes/blob/master/kubernetes.tf#L182-L190

Ignore the old-fashioned base64enc (in favor of base64encode); we're still using a hand-built Terraform binary.

@apparentlymart
Contributor

Interesting...

I think this would be the first time we explicitly encouraged having raw binary data interpolated in Terraform. Technically it has always been possible to read binary data using file(...) but in practice I think most use-cases for it were to load text-based files.

If it is always used inside base64 to create something text-friendly again then I guess it could be fine, but in cases where base64 isn't used I wonder how well/badly this would interact with the JSON serialization of Terraform states, since JSON doesn't explicitly support non-text strings. Did you try some cases without base64?

@glerchundi
Contributor Author

If it is always used inside base64 to create something text-friendly again then I guess it could be fine, but in cases where base64 isn't used I wonder how well/badly this would interact with the JSON serialization of Terraform states, since JSON doesn't explicitly support non-text strings. Did you try some cases without base64?

I cannot think of a use case where this would be used without base64-encoding it (or encoding it in some other form); at least I haven't run into one.

Thanks for the quick response.

@glerchundi
Contributor Author

@apparentlymart, I found that sooner or later #3846 & #3909 are going to be merged. If this is the direction Terraform is going to take regarding local resource management, I can imagine another use case in which this would be useful. Imagine you want to save files in a storage service (S3, GCS or whatever) and you want them gzipped. Something like:

current:

resource "aws_s3_bucket_object" "object" {
    bucket = "your_bucket_name"
    key = "new_object_key"
    source = "path/to/file"
}

gzipped:

resource "aws_s3_bucket_object" "object" {
    bucket = "your_bucket_name"
    key = "new_object_key"
    source = "${gzip(file("path/to/file"))}"
}

WDYT?

@apparentlymart
Contributor

@glerchundi I definitely agree that this is useful... I'd love to be able to upload gzipped stuff to S3 as you suggested!

My concern is specifically with how well Terraform's internals will deal with having raw binary data in property values: Go strings are UTF-8 by convention but in practice can be any bytes, but JSON can't serialize raw bytes and so the encoding/json library will replace non-UTF-8-valid data with the unicode replacement character. This is just one case I know of where we implicitly assume UTF-8 string data in Terraform; there may be other cases I'm not considering.

This will probably work fine for attributes that have a StateFunc that returns a hash of the data (which I believe aws_s3_bucket_object.source does), but it may cause strange consequences for any attribute whose value is stored literally in the state.

I think it would be good to test some edge cases here to make sure that Terraform won't silently corrupt binary data. We could test some of this by just adding a couple extra unit tests that return raw binary data without base64-encoding it, but I'm not sure how best to test what happens when binary data gets written out into the JSON state file... that probably requires something like an acceptance test using the null_resource, e.g. putting gzip data in the triggers map to see whether Terraform corrupts it at some point during the workflow.
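For illustration, a minimal sketch of the kind of configuration such an acceptance test could exercise (the resource name and testdata/input.bin path are just placeholders), putting raw gzip output into a null_resource triggers map so a plan/apply/refresh cycle would reveal any corruption:

resource "null_resource" "gzip_roundtrip" {
  triggers {
    # raw, non-base64 gzip output; if the JSON state serialization mangles
    # non-UTF-8 bytes, a follow-up plan should show a spurious diff here
    payload = "${gzip(file("testdata/input.bin"))}"
  }
}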

@glerchundi
Contributor Author

@apparentlymart thanks for this detailed explanation, I really appreciate it.

This will probably work fine for attributes that have a StateFunc that returns a hash of the data (which I believe aws_s3_bucket_object.source does), but it may cause strange consequences for any attribute whose value is stored literally in the state.

This seems reasonable. I think it would be possible to diverge paths depending on how you want to save this in the state, based on the value type. But anyway, that is your concern and your internal & core headache ;P.

I think it would be good to test some edge cases here to make sure that Terraform won't silently corrupt binary data. We could test some of this by just adding a couple extra unit tests that return raw binary data without base64-encoding it, but I'm not sure how best to test what happens when binary data gets written out into the JSON state file... that probably requires something like an acceptance test using the null_resource, e.g. putting gzip data in the triggers map to see whether Terraform corrupts it at some point during the workflow.

Yep, absolutely needed! If I find a free slot I can work on this.

And again, thanks for pointing out every detail; it was really interesting.

@oli-g

oli-g commented Dec 3, 2015

+1. I really need something like this.

@seanb4t

seanb4t commented Jan 5, 2016

+1 This would be extremely helpful.

@glerchundi
Contributor Author

@apparentlymart I rebased this; any chance of getting it merged into the master branch? It seems like it's not just me ;)

If not, please tell us something so that we can start thinking about other approaches.

Thanks.

@cihangir

+1 I really need this right now :)

@moosilauke18

+1

@paultyng
Contributor

Running into the same issue with large cloud-inits. CoreOS doesn't support multipart, but it does support base64+gzip file content. This would be a big win.

@Seraf

Seraf commented Jun 23, 2016

Hello,

I also need it for CoreOS!

@opsline-ryan

I also really need this to use Terraform with CoreOS and CloudFormation. This PR looks good to merge. If you need help with anything related to adding this feature, please let me know.

@sachincab

Thanks for looking into this, a much-needed feature!
I hope it will be available soon.

@rochacon

Any updates on this? This would be really helpful when using some rather large user data scripts (I just hit this problem).

@glerchundi would you mind rebasing the branch with latest master so the conflict is resolved?

rochacon added a commit to rochacon/terraform that referenced this pull request Aug 24, 2016
rochacon added a commit to rochacon/terraform that referenced this pull request Sep 5, 2016
rochacon added a commit to rochacon/terraform that referenced this pull request Sep 5, 2016
tomas-edwardsson pushed a commit to wowair/terraform that referenced this pull request Nov 10, 2016
@cemo

cemo commented May 22, 2017

@glerchundi does it make sense for you too?

@rochacon

I've been using this patch in a custom-built Terraform binary and haven't seen this gzip data corruption issue myself, but it is worrying; thanks for bringing it up.

My use case does not allow the base64 wrapping, though, since the CoreOS cloud-init implementation does not support it (not sure if this has changed recently).

For reference, this is the way I've been using it in my resources:

resource "aws_launch_configuration" "controller" {
  # ...
  user_data = "${gzip(data.template_file.controller_user_data.rendered)}"
  # ...
}

@apparentlymart
Contributor

If passing base64 directly to user_data is not workable, I expect we could offer an alternative property user_data_base64 that the provider decodes internally before sending the data to the EC2 API. This is admittedly annoying for anyone already using user_data, since switching will require some manual fixup to prevent Terraform from replacing the instance, but after the initial transition I think it would work as expected, despite not being the most ideal solution.
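For illustration, a sketch of how that might look (user_data_base64 is the hypothetical attribute proposed above, not an existing schema field at this point):

resource "aws_launch_configuration" "controller" {
  # ...
  # hypothetical attribute: the provider would base64-decode this before
  # sending it to the EC2 API, so EC2 receives the raw gzip bytes
  user_data_base64 = "${base64encode(gzip(data.template_file.controller_user_data.rendered))}"
  # ...
}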

@cemo

cemo commented May 22, 2017

CoreOS supports writing files with gzip+base64 encoding. Our simplified use case is like this:

data "template_file" "cloud-config" {
  vars {
    ca_pem = "${base64encode(var.tls-ca-self-signed-cert-pem)}"
  }
}

But we have so many certificates that we had to gzip them, since we exceeded the 16K limit. If the gzip PR is merged, we will use something like base64encode(gzip(content)).

https://coreos.com/os/docs/latest/cloud-config.html
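With the function from this PR, the simplified block above could become something like the following sketch (same placeholder variable names as above):

data "template_file" "cloud-config" {
  vars {
    # gzip first, then base64, matching CoreOS's "gzip+base64" encoding
    # and keeping the rendered result under the 16K limit
    ca_pem = "${base64encode(gzip(var.tls-ca-self-signed-cert-pem))}"
  }
}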

@cemo

cemo commented May 23, 2017

@apparentlymart, @glerchundi I have rebased and added the necessary implementation here, mostly derived from @glerchundi's work. But the expected result in my test is different from @glerchundi's. He used base64 as well, but I am not sure why the result differs. Would you please review this, guys?

@cemo

cemo commented May 23, 2017

I think it is related to the Go version at the time the first PR was submitted. Since there are gzip-related changes in 1.8, the output might differ.

@glerchundi
Contributor Author

glerchundi commented May 23, 2017

@cemo definitely, it's related to gzip, specifically to changes in the DEFLATE implementation. We already faced this issue in other projects: same input, different results depending on the runtime you were using:

https://golang.org/doc/go1.8#compress_flate

@apparentlymart OK, seems reasonable. That being said, and knowing there are precedents in place (base64sha{256,512}), I updated the PR to do gzip+base64 at the same time. This would unblock the issues with CoreOS and gzip'ed+base64'ed data. If gzip alone (or any other function that requires raw data as output) is needed, it can be tackled in another PR; let's resolve this one first.

UPDATE: @cemo I missed that you had already pointed me to the Go reference...

@glerchundi force-pushed the master branch 3 times, most recently from 4aebc83 to f273344 on May 23, 2017 06:09
@glerchundi
Contributor Author

glerchundi commented May 23, 2017

Just to add some info to this thread: coreos-cloudinit already supports encoding: "gzip+base64", but the whole config format is being deprecated and replaced by a lighter, simplified, JSON-based alternative called Ignition, which follows the standard data URI scheme when it comes to embedding data. I haven't tried it yet, but I think the same behaviour can be achieved with something like data:application/x-gzip;base64,<gzipped-and-then-base64ed-data>.

So it seems this PR applies.

/cc @grubernaut @apparentlymart
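For illustration, a sketch of building such a data URI in a template. This assumes the combined gzip+base64 helper discussed above (referred to here as base64gzip) and uses template_file purely as a placeholder for wherever the URI ends up being consumed:

data "template_file" "ignition_file_source" {
  # $${payload} is an escaped template variable, filled in from vars below
  template = "data:application/x-gzip;base64,$${payload}"

  vars {
    payload = "${base64gzip(file("path/to/file"))}"
  }
}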

@cemo

cemo commented May 26, 2017

@apparentlymart I think you can merge this now, since all your concerns have been addressed. This PR also closes #3407.

@shantanugadgil

Hi,

This seems to be one-way only, right? No way to get back the original using any other function?
Do you think we would need a "gunzipbase64decode" as well?

I happened to stumble upon this while searching for my own requirement:
"base64encodebzip2"/"bunzipbase64decode"
I need to interpolate and use bzip2+base64-encoded data (small strings).
* It could work the same way for template files as well, I guess.

Thanks and Regards,
Shantanu

@josephholsten
Contributor

@cemo where's the PR with your implementation? Seems like we could close this PR if yours gets merged.

@cemo

cemo commented Jun 11, 2017

@josephholsten I had implemented it but did not create a PR, since @glerchundi was the original author. He has also updated this PR to address @apparentlymart's concerns. Now it is safe to merge.

@apparentlymart self-requested a review on June 12, 2017 15:39
@apparentlymart
Contributor

Hi everyone! Sorry for the silence here. This is on my list to review soon, just getting delayed by some legwork going on for the 0.10 release. I will look at this as soon as I can.

@wonderhoss

This would be incredibly useful.
Our use case is templating cloud-config write_files blocks with base64-encoded certificate data read from Vault.
Since these go onto S3, we encrypt them using KMS (for which there is a Terraform resource), but need to gzip the payload first (due to the KMS plaintext size limit).

It would be great if we could simply do

data "aws_kms_ciphertext" "somesecret" {
  key_id = "${aws_kms_key.my_key.key_id}"
  plaintext = "${gzip(${vault_generic_secret.cert.data["cert"]})}"
}

Contributor

@josephholsten left a comment


Seems straightforward, needs user doc

@apparentlymart
Contributor

I rebased this, added some docs, and merged it as a303817.

Thanks for working on this, @glerchundi, and for your patience as we worked through the design constraints. Sorry this one took so long!

At present this function will likely be of limited use due to the need for other resources to be updated to expect base64 data as an option, but this is a good first step.

We'd be happy to review PRs for adding base64 variants of attributes that are often used with the file interpolation function where there's a strong use-case for using binary data, such as content on aws_s3_bucket_object having a companion content_base64 attribute. I expect we'd be more cautious about situations where the use of binary data seems marginal; if someone wants to propose such a change, I'd ask that they do their best to describe the use-case for binary data and why plain text isn't a suitable alternative, since plain text is generally better where possible for having readable diffs, etc.
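For reference, a sketch of how the merged function could pair with such a companion attribute. Here, content_base64 is the proposed attribute from the paragraph above (not something that exists yet), and the merged function is assumed to be exposed as base64gzip:

resource "aws_s3_bucket_object" "object" {
  bucket = "your_bucket_name"
  key    = "new_object_key"

  # proposed companion attribute: the provider would base64-decode this
  # before uploading, so the stored object contains the raw gzip bytes
  content_base64 = "${base64gzip(file("path/to/file"))}"
}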

@glerchundi
Contributor Author

Don't worry @apparentlymart, and thanks; this is going to be a huge win for us.

@ghost

ghost commented Apr 7, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost locked and limited conversation to collaborators on Apr 7, 2020

Successfully merging this pull request may close these issues.

feature request: new built-in function gzip(string)