Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Source bundler: Unified source code fetching for Terraform CLI and Terraform Cloud #42

Merged
merged 7 commits into from
Jun 7, 2023

Conversation

apparentlymart
Copy link
Contributor

So far Terraform Cloud has used an entirely separate mechanism for fetching the source code for a root module than it does for installing any child module packages. This is largely a consequence of how Terraform Cloud started its life as little more than an automation wrapper around Terraform CLI, mimicking how someone would use Terraform CLI locally: a local user typically starts by fetching the source code using something like git clone, and then runs terraform init to deal with everything else.

As we work to present a more unified experience between these two entry points, it's been increasingly frustrating to have this slightly different treatment in these two cases, since it tends to lead to confusing edge-cases such as how .terraformignore files only get handled for the package containing the root module in Terraform Cloud and not in any other case. It also means that Terraform Cloud needs to work pretty hard to preserve the installed child modules between the plan phase and the apply phase since re-running terraform init won't necessarily reproduce identical source code, if e.g. new commits have been pushed to the remote Git branch in the meantime.

As a foundation for future Terraform features (not yet planned or announced) that more tightly integrate Terraform CLI and Terraform Cloud, here we implement a carefully-selected subset of Terraform CLI's module installation behavior in a way that can be called from both Terraform CLI and Terraform Cloud, producing an equivalent result in both contexts.

We've added this to go-slug because it's already the shared library between Terraform CLI and Terraform Cloud that deals with the problem of packaging up source code. This new functionality introduces the idea of a "source bundle", which is in some sense a "meta-slug": a bunch of source code packages frozen and bundled together into a single directory/archive for later use.

The two new exported packages are explicitly marked as experimental in their documentation because we anticipate needing to make at least some small tweaks to the API as we start to make use of this in some early real-world experiments. We have written a small amount of "realistic-ish" code using this API already to help design it, but need to reserve the possibility of making some breaking changes if we learn that there's something missing. This whole codebase is still currently pre-1.0 anyway, but these extra disclaimers of compatibility mean that the main existing scope of this codebase could potentially reach 1.0 before this new functionality does.


Terraform's existing module installer is based on go-getter, which is in retrospect not a great foundation for this sort of problem because it's far too flexible and therefore gives poor user feedback when problems occur and allows many different ways of writing essentially the same source address that make it hard to optimize installation by reusing existing cached packages.

However, we intend to use this only for new features that users would explicitly opt into, and so it's not necessary to fully replicate everything Terraform CLI would allow today: the existing module installer in terraform init won't be going anywhere for the foreseeable future, so those who need its more esoteric capabilities can continue using Terraform as they to today.

With that in mind, what we have here is essentially a subset of Terraform CLI's module installation capabilities:

  • Installation from Git, whether from a specific commit, a named tag, or another mutable ref.
  • Installation from gzipped tar archives fetched over HTTP.
  • Indirect installation through Terraform module registries as long as the underlying source location is one of the above. (That includes both the public Terraform Registry and Terraform Cloud's private module registry.)

For everything that this implementation and Terraform CLI have in common, the goal is to use a compatible source address syntax. That means that if someone writes a module for the subset that we support in source bundler then it should also work with Terraform CLI's traditional module installer, but not necessarily vice-versa.

The two initial capabilities described above happen to align exactly with what Terraform Cloud itself can currently fetch for root modules, so it's a natural place to draw the line to start -- we don't need to teach Terraform Cloud about any new installation sources -- and can be expanded later if needed as long as we're adding source types that both Terraform CLI and Terraform Cloud can reasonably support (where Terraform Cloud's multi-tenant nature imposes some additional constraints compared to CLI).


Part of the reason to place this new use-case in this existing repository is that this adds support for .terraformignore for all source packages -- both for root modules and for downstream modules. The definition of the .terraformignore file format belongs to this codebase and is now shared between the single-directory "slug" implementation and the source bundle implementation.

That does mean that sourcebundle-based features will behave marginally differently to Terraform CLI's existing module installer if installing from source packages that have .terraformignore files in the root: those files will not be available under sourcebundle-based usage but will be available when installed with Terraform CLI.

This is another example of the source bundle behavior being effectively a subset of the Terraform CLI behavior: it is possible to write a source package that will work in both contexts either by not including .terraformignore at all or by making sure that the modules inside don't try to access any of the files that are excluded by .terraformignore. Those who just continue using Terraform CLI as-is will be unaffected even if they do have a .terraformignore file in their source package.

The handling of .terraformignore files was previously just embedded
directly in the root package of this repository as a bunch of unexported
symbols.

In future commits we'll introduce another package to this codebase that
will also need to deal with .terraformignore files, but in a slightly
different context. In preparation, here we factor out that handling into
a separate internal package and adapt the root package to call into that
package instead of using its own inline implementation.

This carefully preserves the existing caller's ability to ignore various
errors, while now allowing new callers to handle errors differently.
This package models the source address types we'll support for modules
when working with the forthcoming "source bundle" mechanism, which is
essentially a "meta-slug" capturing the contents of many different source
packages all at once.

For the resulting source bundle to be useful we need to be able to talk
about the source addresses it was built from, so that callers can request
the bundle path equivalent to a particular remote source address path.

This address syntax is intentionally a subset of the go-getter-based
syntax used in today's Terraform CLI module installer, because we want
to have a set of commonly-used address types that are treated equivalently
in both implementations. However, we won't be actually using go-getter
here because its high level of flexibility (and quirkiness) is a poor fit
for safely constructing persistable source code bundles.
We typically decide between using HCL native syntax vs. JSON HCL syntax
based on filename suffixes, and so it's helpful to have a robust way to
retrieve just the filename portion of an arbitrary source address (as long
as it's referring to a file) which takes into account edge-cases such as
the query string portion of remote source addresses.
@apparentlymart apparentlymart added the enhancement New feature or request label Jun 5, 2023
@apparentlymart apparentlymart self-assigned this Jun 5, 2023
Copy link
Contributor

@alisdair alisdair left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In as far as I can comprehend it, this looks good to me! It's hard to review in isolation, so I'm sure it'll make more sense once I see a consumer of the new packages.

I'm surprised to see relatively little unit testing in the sourceaddrs package, especially the tricky-feeling subpath functions, but the important pieces seem covered by larger integration-type tests. Do you intend to expand the test coverage later?

My only feedback is a typo in one of the commit messages:

LocalSource and RemoteSource implement both interfaces, while
RegistrySource only implements Source and the new RegistrySourceRemote
only implements FinalSource.

I think this should read "…the new RegistrySourceFinal only implements FinalSource".

@apparentlymart
Copy link
Contributor Author

You are correct about the commit message wording indeed. I'll fix that.

Also you are right that I erred more towards integration-style testing rather than unit testing here because we're still "getting the feel for" the design here and so I didn't want to be blocked by constantly rewriting lots of tests. But I do agree that adding more unit tests would be an important prerequisite for this graduating from having an explicit "this is experimental" warning on it, if not before.

Terraform module source addresses have a historical design oddity where
both local and remote source addresses are self-contained but registry
source addresses must always be combined with a version constraint and
then resolved through an extra step of selecting the latest available
version that matches that version constraint.

This means that we cannot use the same type to represent both a package
to be installed and a package that has already been resolved. To fill that
gap here we introduce a variant address type sourceaddrs.FinalSource which
represents addresses of packages that have already been installed, rather
that addresses of packages that are requested to be installed.

LocalSource and RemoteSource implement both interfaces, while
RegistrySource only implements Source and the new RegistrySourceFinal
only implements FinalSource. This then allows other subsystems that are
built in terms of these address types to be explicit about whether they
are expecting an address of something to be installed or an address of
something that has already been installed, and thus the Go compiler can
help ensure that we handle both cases fully rather than forgetting about
the need for the extra version selection information for already-installed
registry module source addresses.
Since this is entirely new functionality that wasn't previously in this
codebase's scope, we'll need to gather some real experience using this new
API before we can be confident that it's suitable to commit to.

At the time of this commit this entire codebase is pre-1.0 and so subject
to breaking changes anyway, but these extra notes are here just in case
the root package functionality gets stabilized before these new packages
are ready to be stabilized. We'll remove these warnings at a later date
once we've got enough experience with these new packages in internal
projects to confidently commit to a public-facing API.
Currently this function can only possibly return RemoteSource values
anyway and so the distinction between Source and FinalSource is moot, but
semantically it's more correct to say that we're returning a finalized
source address here and so using the more correct type will make this
compose better with other functionality which works generically with
final source addresses, and will also allow us to potentially make this
return registry addresses sometimes if we decide that gives better results
for the assumed use-case of producing diagnostic messages.
@apparentlymart apparentlymart merged commit 6a7ca7a into main Jun 7, 2023
@apparentlymart apparentlymart deleted the f-source-bundler branch June 7, 2023 19:26
package sourceaddrs

import (
"github.com/apparentlymart/go-versions/versions"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Post-hoc comment, but this is the first time I am seeing this - why not hashicorp/go-version? We use that regularly throughout the rest of TFC's codebases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use this library in Terraform because go-version tends to generate useless error messages and historically it didn't handle prereleases correctly, although I think that latter point may at least have been fixed in the meantime. But consistency with what Terraform itself does is the main thing here, so better to use the library it uses.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants