Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

states: New packages for state, state files, and state managers #18223

Merged
merged 4 commits into from
Jun 12, 2018

Conversation

apparentlymart
Copy link
Contributor

@apparentlymart apparentlymart commented Jun 9, 2018

As part of the ongoing effort to implement a consistent value type system across all of Terraform, we need to revamp our state format to properly represent structured data. While we could've made a more minimal change here to just swap out the representations of resource attributes and output values, we're also going to take this opportunity (since we're making a breaking change anyway) to lay the groundwork for some other changes expected in forthcoming releases.

Following the lead of how we reorganized configuration loading in an earlier commit, here we have some new packages that are intended to eventually replace the functionality of some other components that are currently mixed across various other packages. The intent here is to have a similar three-layer architecture as for configuration:

Concern Package for Config Package for State
Models hashicorp/terraform/configs + hashicorp/hcl2/hcl hashicorp/terraform/states
Manager hashicorp/terraform/configs/configload hashicorp/terraform/states/statemgr
File Format(s) hashicorp/hcl2/hclparse hashicorp/terraform/states/statefile

The separation adopted here for both config and state is intended to organize around caller cohorts: the types in the "models" package will be used from many different files in Terraform, while the "manager" APIs deal with logic for loading and saving the models and are used in relatively fewer places (mainly just in the backends, and other "early" contexts). The low-level file format handling is in turn used primarily by the managers.

Since this PR includes three separate packages, the following sections will give an overview of the high-level goals and any other notable aspects of each package.

states: model types

Prior to this change, the model types for states lived inside the terraform package. As part of our ongoing effort to split apart that package into logical smaller parts, the new equivalents of those types are now in the top-level package directory states, alongside configs.

Some of the state-related types in the terraform package had misleading names due to historical terminology drift, and so this new package is also an opportunity to re-align that terminology while introducing into the state a first-class representation of resources themselves, where the old models represented only resource instances. The approximate correspondences between old and new are:

terraform states
terraform.State states.State
terraform.ModuleState states.Module
not modelled states.Resource
terraform.ResourceState states.ResourceInstance
terraform.InstanceState states.ResourceInstanceObject
terraform.OutputState states.OutputValue

An overview of the general shape of the new states.State data structure can be seen in the expected value for one of the tests:

want := &State{
Modules: map[string]*Module{
"": {
Addr: addrs.RootModuleInstance,
LocalValues: map[string]cty.Value{
"foo": cty.StringVal("foo value"),
},
OutputValues: map[string]*OutputValue{
"bar": {
Value: cty.StringVal("bar value"),
Sensitive: false,
},
"secret": {
Value: cty.StringVal("secret value"),
Sensitive: true,
},
},
Resources: map[string]*Resource{
"test_thing.baz": {
Addr: addrs.Resource{
Mode: addrs.ManagedResourceMode,
Type: "test_thing",
Name: "baz",
},
EachMode: EachList,
Instances: map[addrs.InstanceKey]*ResourceInstance{
addrs.IntKey(0): {
Current: &ResourceInstanceObject{
SchemaVersion: 1,
Status: ObjectReady,
AttrsJSON: []byte(`{"woozles":"confuzles"}`),
},
Deposed: map[DeposedKey]*ResourceInstanceObject{},
},
},
ProviderConfig: addrs.ProviderConfig{
Type: "test",
}.Absolute(addrs.RootModuleInstance),
},
},
},
"module.child": {
Addr: addrs.RootModuleInstance.Child("child", addrs.NoKey),
LocalValues: map[string]cty.Value{},
OutputValues: map[string]*OutputValue{
"pizza": {
Value: cty.StringVal("hawaiian"),
Sensitive: false,
},
},
Resources: map[string]*Resource{},
},
},
}

It is inevitable that we will make further changes to these models as we work to integrate them into existing codepaths. This PR is really just a first pass to sketch out the overall structure, with the intent of fleshing out and adjusting the details as we continue down this road. With this in mind, there is relatively low test coverage of the new methods in order to avoid friction during ongoing iteration.

Explicitly Modelling Resources

As noted above, the existing state models lack an object representing the state of an entire resource, instead skipping straight to resource instances. Since resources and resource instances are becoming usable as object values in their own right in the next release, it is useful to track some resource-level metadata in order to explicitly represent resources that have count = 0, and to lay the groundwork for later supporting instances with string keys in the planned for_each feature.

Alongside the map of instances, resource states track the following metadata:

  • Whether accessing the resource in an expression should return a single object, a list, or a map.
  • Which provider configuration most recently managed the resource. (This was previously retained at the instance level, but it's a resource-level setting in configuration and so this change aligns the two to ensure that instances of a resource can't get out of sync with one another.)

Instance Generations and Object Statuses

Terraform uses the ideas of "tainted" and "deposed" to manage different sorts of partial operations, but the prior models didn't make these very explicit and we lacked some nouns for talking about these ideas and how they relate to one another.

These new models introduce some new vocabulary in an attempt to model these more explicitly:

  • Each resource instance has multiple generations, each of which tracks a separate "instance object". The current generation (formerly "primary") is the remote object that is currently represented by the instance, while potentially many deposed generations represent objects that were represented by the instance at some point but have yet to be destroyed after an interrupted "create before destroy" operation.
  • Each instance object has a status, which can currently either be "ready" or "tainted". "Ready" indicates that the object is either in a known-good state or in a known and recoverable bad state. "Tainted" indicates that the object is in an unknown or known and unrecoverable bad state, where the only recourse is to destroy the object and create a new replacement.

An important detail to note here is that there is an additional level of heirarchy that our previous mental models previously often missed: each instance has potentially multiple objects -- zero or one "current", zero or more "deposed" -- and each of those objects has its own status.

The prior models tracked deposed objects by their index within a slice, which meant that they lacked an identity that would be stable across runs (if one of several deposed instances is destroyed, for example). The new model assigns each deposed instance a pseudo-random eight-character hex unique identifier that remains stable as other deposed instances are added and removed, and lays groundwork for being able to consistently correlate deposed instances between the state, plan, and UI output.

Instance Object Attributes

In the old state model, instance object attributes are tracked as a map[string]string, with complex data structures flattened into dot-separated string keys using flatmap. The flatmap encoding is not rich enough to represent the new type system in full, and so a new representation is required.

To ensure robust handling of object attributes set in configuration, we've introduced to Terraform Core the idea of resource type schemas, allowing Terraform to do initial processing and validation of configuration before passing the result onto providers for further processing.

Terraform state has the additional constraint that existing persisted states must be accepted by newer versions of Terraform and newer providers. While Terraform Core can use the resource type schemas alone to decode configuration, the persisted state may contain state data that conforms to the schema from an older provider version, and so it must be "migrated" to conform to the latest version before use. Migration is performed via arbitrary logic in the provider, and so Core alone cannot fully decode object attributes during state loading.

The new state models therefore follow the lead of the config models again by performing partial decoding, retrieving all of the fixed structure of the state but leaving the instance object attributes in a raw form so that upgrading and interpretation can be delayed until a later stage, when provider plugins are available.

Although Terraform Core currently has shims in place to interpret a flatmap encoding into a full cty.Value given the expected value type, the expected type for an earlier version is not available to Terraform Core. To deal with this, states.ResourceInstanceObjectcan represent object attributes either as a JSON representation of the object value yet to be decoded or as a flatmap. When encountering an object using an outdated resource type schema, Terraform Core must first pass these raw values to the provider for migration before they can be finally resolved into a cty.Value conforming to the current schema.

In practice, most of Terraform Core doesn't care about object attributes and so this raw encoding is not a problem. The exception is the code that deals with references like aws_instance.foo in the configuration language, which must be able to place the instance object's final value into the expression scope. In practice, this codepath is already dealing with the flatmap encoding today and so we know that it has sufficient information available to interpret the raw attribute data on the condition that all objects are migrated to the latest schema prior to evaluating any expressions.

Concurrency

The old state models all had unexported mutex fields inside them with the intent that callers would lock these objects before working with them. In practice we get by with less-granular locking of the entire state structure managed within the graph walker, and so those embedded mutexes are removed here.

The new position on concurrency is hopefully more straightforward: the states models are explicitly not safe for concurrent writes or reads concurrent with writes, and it's always the caller's responsibility to handle mutual exclusion where required using some suitable means. In practice, that will be the state mutex already present in our graph walker for the forseeable future.

Where I expected the requirements would not be obvious, I made a point of calling out any concurrency-related caveats in the documentation comments for each function.

statefile: state file formats

The statefile package is responsible for serializing the persistent subset of a states.State so that it can be stored as an opaque blob in various blob storage systems.

This functionality was previously served by directly mapping the in-memory models to JSON using encoding/json and struct tags. That approach was convenient, but it forces the structs to be a compromise between what is convenient for efficient in-memory use and what is convenient in a file format, and makes it hard for us to evolve either over time without affecting the other.

The file format handling is therefore now a separate concern from the in-memory models, and hand-written code translates between the in-memory format and the serialized format. Internally encoding/json is still used, but it is used with unexported types tailored to the need of the file formats.

terraform statefile
s = terraform.ReadState(r) s = statefile.Read(r)
terraform.WriteState(s, w) statefile.Write(s, w)
s.MarshalEqual(o) statefile.StatesMarshalEqual(s, o)

We also need to accept older generations of our state file format for reading. The statefile package handles this automatically using a combination of existing code adapted from the terraform package (for versions up to 3) and newly-written code to migrate version 3 to the new version 4.

State format version 4

The new state format is designed to store the subset of state information that needs to persist between runs in a compact way that is will hopefully be convenient for other software to interpret once we are ready to promise compatibility with it. (That compatibility promise will not come with the first release, to allow for any adjustments to the format that might become necessary based on real-world feedback.)

The main changes are:

  • Only the root module outputs are persisted, with child module outputs discarded. All outputs can be easily re-evaluated from configuration when needed, and so it is not strictly necessary to store any of them for Terraform's main workflow, but having the root outputs stored is convenient for commands like terraform output and for the terraform_remote_state data source which both lack the context necessary to re-evaluate these.
  • Modules are not explicitly represented in the structure, and are instead represented only as an address value within each resource object. The address values are now strings in Terraform's standard module address format, which will allow for the future addition of instance keys when support for count and for_each is added to modules in a later release.
  • Since the multiple object generations associated with an instance are also just a matter of addressing at this level, these too are all flattened into a single sorted array with the additional object property deposed indicating the pseudo-randomly-allocated deposed generation key for non-current objects.

Although state version 4 primarily uses real JSON data structures to represent attribute values, it also supports flatmap encoding as a fallback behavior in order to properly model instance objects with outdated schema versions that have not yet been migrated.

The general shape of the new format can be seen in some of the test fixtures in this package, including the following simple one:

{
"version": 4,
"serial": 0,
"lineage": "f2968801-fa14-41ab-a044-224f3a4adf04",
"terraform_version": "0.12.0",
"outputs": {
"numbers": {
"type": "string",
"value": "0,1"
}
},
"resources": [
{
"mode": "managed",
"type": "null_resource",
"name": "bar",
"provider": "provider.null",
"instances": [
{
"schema_version": 0,
"attributes_flat": {
"id": "5388490630832483079",
"triggers.%": "1",
"triggers.whaaat": "0,1"
},
"depends_on": ["null_resource.foo"]
}
]
},
{
"mode": "managed",
"type": "null_resource",
"name": "foo",
"provider": "provider.null",
"each": "list",
"instances": [
{
"index_key": 0,
"schema_version": 0,
"attributes_flat": {
"id": "8212585058302700791",
"triggers.%": "1",
"triggers.what": "0"
}
},
{
"index_key": 1,
"schema_version": 0,
"attributes_flat": {
"id": "1523897709610803586",
"triggers.%": "1",
"triggers.what": "0"
}
}
]
}
]
}

statemgr: state managers

The state manager concept is left mostly untouched here compared to what was previously in the top-level package state (not the new states). Some naming is adjusted to hopefully read better in calling code:

state states/statemgr
state.State statemgr.Full
state.StateReader statemgr.Reader
state.StateWriter statemgr.Writer
state.StateRefresher statemgr.Refresher
state.StatePersister statemgr.Persister
state.Locker statemgr.Locker

The contents of each of these interfaces is intentionally left unchanged so that the state versions can eventually become type aliases, with the exception of StateReader and StateWriter now dealing with *states.State values instead of *terraform.State.

Since the idea of a state and the idea of a state file are now separated into two packages, state managers have the new role of translating between those two "worlds". In particular, this completes our work started earlier to move the management of Lineage and Serial to be handled only within state managers, with these now not even being visible to callers. In order to retain the existing testing utilities that verify correct handling of Lineage and Serial I introduce a new optional interface to allow the test utility to access these; I don't expect any "real" caller to need them, but they are available if e.g. we want to include these values in logs.

In an attempt to clarify the relationships between the Reader/Writer pair and the Refresher/Persister pair, I introduced new interfaces statemgr.Transient and statemgr.Persistent that are present mainly just to hold documentation that attempts to introduce the concepts of transient snapshots and persistent snapshots, including how they relate to each other and to the mutable state objects they are built from. However, this also presented the opportunity to recast state.InMemState as an implementation just of statemgr.Transient, making it clear within the type system that it is not actually capable of persistence, with statemgr.NewFakeFull returning an explicitly-fake implementation of statemgr.Full that is documented as being only for tests.

The type statemgr.Filesystem is a light adaptation of state.LocalState to implement statemgr.Full instead of state.State. Its functionality is unchanged, which means it still incorrectly implements the interface by handling persistence inside Writer rather than Persister. We can address that in a later project.

There are various other state manager implementations in state and its subdirectories. These are not yet adapted to the new interfaces and will be dealt with in later changes, since this changeset is large enough as it is.

@apparentlymart apparentlymart self-assigned this Jun 9, 2018
@apparentlymart apparentlymart requested review from jbardin, mildwonkey and a team and removed request for jbardin and mildwonkey June 9, 2018 01:04
@jbardin
Copy link
Member

jbardin commented Jun 11, 2018

We're going to need to revisit the concurrency on the state in this round. The reason for the internal state locks was that the sidecar lock used during evaluation is not sufficient to protect the state in all places where it's accessed (this also applies to the Diff). Removal of those lock values from the evaluation context was something I had planned to so, because at this point they were only a false sense of security.

I have yet to review the internals in detail, but the unit tests show 42 new data races for me with the new states. IIRC the primary culprit for breaking the synchronization contracts was the interpolator, and the new races seem to confirm that now that the entry points are entirely confined to the hclsyntax package, so it may be feasible to lock around it here.

My main concern here is that the pattern of having the lock separate from the structure leaves too many chances for unsynchronized access. Having a fully synchronized data structure, only accessed via mutator methods, forces all usage of the data to be synchronized.

@apparentlymart
Copy link
Contributor Author

I've generally leaned towards "caller handles locking" for stuff like this because synchronization needs can vary by caller, and I feel like this follows the lead set by Go itself in leaving the caller to handle synchronization of access to maps and slices.

If we find that more granular locking is needed then I'd still lean towards having the subsystem that causes that need deal with the additional complexity that implies, and have states.State (and its child nodes) remain just a simple data structure.

If we wanted to go down the road of having locking be handled internally in the state structures, I think we'd need to make all of the struct attributes unexported and force all read accesses through methods as well. I don't think that's necessarily bad, but it's a pretty big shift from how we deal with things today, and I expect we'd end up with a lot more methods in the state package to deal with multi-step operations that need to happen "atomically".

I'd like to propose a compromise: rather than introducing all this additional complexity into the state models themselves, perhaps we can have an access layer that wraps a *state.State and contains top-level methods for all of the things we need to do during a graph walk operation -- I expect it'd map approximately to one method per state-related EvalNode implementation, along with one or two for interpolation. That layer would then ensure that all of the accesses to the state (read or write) are concurrency-safe, while callers with more esoteric needs remain free to just access the state structure directly in a non-concurrent context.

What do you think?

@jbardin
Copy link
Member

jbardin commented Jun 11, 2018

this follows the lead set by Go itself in leaving the caller to handle synchronization of access to maps and slices.

That's really only for primitives, and there is plenty of precedent for complex data types that are safe for concurrent access via methods, e.g. net.Conn, http.Client, context.Context, etc. The fact that maps themselves are a common point for race conditions is illustrative that it's all too easy to forget or lose track of synchronization.

@mildwonkey
Copy link
Contributor

... I think we'd need to make all of the struct attributes unexported and force all read accesses through methods ...

Apologies if this isn't the most helpful comment, but the above both feels and smells right to me. I know that I'm saying this from the perspective of someone who doesn't know the internals, but my instinct says state is too critical to leave that responsibility for locking to a caller.

@apparentlymart
Copy link
Contributor Author

apparentlymart commented Jun 11, 2018

My interpretation of the distinction has been as more between data objects vs. "actor" objects, as opposed to primitive vs. non-primitive. For example, while http.Client (an "actor") is concurrency safe, http.Request doesn't have any mutexes around it because it's just data.

However, I take the point that our state object sits somewhere between the two extremes here: the higher-level portions of it are generally acted on rather than accessed, like http.Client, while the leaf objects are used as just data, more like http.Request.

With that in mind, I've pushed here a new type SyncState that aims to be the higher-level "actor" around the data structure, with each operation using mutexes to ensure it can complete atomically, and with all of the "get" operations returning deep copies to avoid callers racing with shared objects.

I think this is a nice compromise because it gives us the safe API we need to implement the operations during the graph walk (though we'll probably need to add some additional methods along the way once we start integrating) but still retains the convenience of working directly with the data structures in some cases, at the expense of some additional copying.

For example, during an eval tree for a resource instance I expect EvalReadState would call ResourceInstanceObject to take a snapshot of the current object, and then the other eval steps will do what they're gonna do, and then EvalWriteState will call SetResourceInstanceCurrent (or SetResourceInstanceDeposed) to safely write a copy of a new or modified instance object into the state.

This new wrapper can also take care of some housekeeping details that require atomicity, such as removing a module from the state automatically if an operation leaves it empty. I decided to reframe the "move current object into deposed" operation as one of these higher-level operations too, since it's making a sequence of coordinated changes with multiple objects.

states/import.go Outdated
// expressions and is compared with configuration when producing a diff.
Value cty.Value

// Internal corresponds to the field of the same name on RemoteObject,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

field name out of sync

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh yes, thanks! I didn't end up wiring up any other stuff related to import handling yet, and so I wasn't really paying close attention to this type. I'll fix this docstring up for now, but I expect we'll want to fuss with this some more later once the interactions between this and the import operation become clearer.


const (
// ObjectReady is an object status for an object that is ready to use.
ObjectReady ObjectStatus = 'R'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While having unset mean "not-ready" is logical if this were a boolean value, I feel that having 3 states describes by 2 values is a bit awkward. Maybe we should have a ObjectPending = 'P'?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was not intending "pending" to be represented at all yet, because we don't have any support for it elsewhere. I expected we'd add it only when we're ready to use it, since I don't think we know 100% how the behavior of that will work yet, and it won't require any change to the shape of these data structures.

Did I imply in the docs somewhere that there was supposed to be a third value in this enum right now?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, nothing here was incorrect. IIRC when we discussed handling an interrupted apply, I agreed that "pending" could just as well be represented as "!ready", so I was incorrectly applying that idea here.

If you intended to add other states as needed, then ignore my first comment ;)

Copy link
Member

@jbardin jbardin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great overall, just a couple comments inline

Our previous state models in the "terraform" package had a few limitations
that are addressed here:

- Instance attributes were stored as map[string]string with dot-separated
  keys representing traversals through a data structure. Now that we have
  a full type system, it's preferable to store it as a real data
  structure.

- The existing state structures skipped over the "resource" concept and
  went straight to resource instance, requiring heuristics to decide
  whether a particular resource should appear as a single object or as
  a list of objects when used in configuration expressions.

- Related to the previous point, the state models also used incorrect
  terminology where "ResourceState" was really a resource instance state
  and "InstanceState" was really the state of a particular remote object
  associated with an instance. These new models use the correct names for
  each of these, introducing the idea of a "ResourceInstanceObject" as
  the local record of a remote object associated with an instance.

This is a first pass at fleshing out a new model for state. Undoubtedly
there will be further iterations of this as we work on integrating these
new models into the "terraform" package.

These new model types no longer serve double-duty as a description of the
JSON state file format, since they are for in-memory use only. A
subsequent commit will introduce a separate package that deals with
persisting state to files and reloading those files later.
Whereas the parent directory "states" contains the models that represent
state in memory, this package's responsibility is in serializing a subset
of that data to a JSON-based file format and then reloading that data
back into memory later.

For reading, this package supports state file formats going back to
version 1, using lightly-adapted versions of the migration code previously
used in the "terraform" package. State data is upgraded to the latest
version step by step and then transformed into the in-memory state
representation, which is distinct from any of the file format structs in
this package to enable these to evolve separately.

For writing, only the latest version (4) is supported, which is a new
format that is a slightly-flattened version of the new in-memory state
models introduced in the prior commit. This format retains the outputs
from only the root module and it flattens out the module and instance
parts of the hierarchy by including the identifiers for these inside
the child object. The loader then reconstructs the multi-layer structure
we use for more convenient access in memory.

For now, the only testing in this package is of round-tripping different
versions of state through a read and a write, ensuring the output is
as desired. This exercises all of the reading, upgrading, and writing
functions but should be augmented in later commits to improve coverage
and introduce more focused tests for specific parts of the functionality.
This idea of a "state manager" was previously modelled via the
confusingly-named state.State interface, which we've been calling a "state
manager" only in some local variable names in situations where there were
also *terraform.State variables.

As part of reworking our state models to make room for the new type
system, we also need to change what was previously the state.StateReader
interface. Since we've found the previous organization confusing anyway,
here we just copy all of those interfaces over into statemgr where we can
make the relationship to states.State hopefully a little clearer.

This is not yet a complete move of the functionality from "state", since
we're not yet ready to break existing callers. In a future commit we'll
turn the interfaces in the old "state" package into aliases of the
interfaces in this package, and update all the implementers of what will
by then be statemgr.Reader to use *states.State instead of
*terraform.State.

This also includes an adaptation of what was previously state.LocalState
into statemgr.FileSystem, using the new state serialization functionality
from package statefile instead of the old terraform.ReadState and
terraform.WriteState.
This is a wrapper around State that is able to perform higher-level
manipulations (at the granularity of the entire state) in a
concurrency-safe manner, using the lower-level APIs exposed by State and
all of the types it contains.

The granularity of a SyncState operation roughly matches the granularity
off a state-related EvalNode in the "terraform" package, performing a
sequence of more primitive operations while guaranteeing atomicity of the
entire change.

As a compromise for convenience of usage, it's still possible to access
the individual state data objects via this API, but they are always copied
before returning to ensure that two distinct callers cannot have data
races. Callers should access the most granular object possible for their
operation.
@ghost
Copy link

ghost commented Apr 3, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost ghost locked and limited conversation to collaborators Apr 3, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants