Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remaster: a reckoning of weird ideas (+ add gRPC runtime service) #186

Merged
merged 37 commits into from
Jun 12, 2022

Conversation

vito
Copy link
Owner

@vito vito commented Jun 12, 2022

This is a doozie of a PR. I'll summarize each change into sections. This is a case where one thing led to another, to another, to another, and I'd rather just preserve the history as-is than butcher it too much with rebasing.

The north star here was to support a gRPC service interface for remote runtime usage for Bass Loop. This led me down a Protobuf adventure that involved a few intermediate iterations where I went back and forth (e.g. making bass.lock protobuf based, then not, then back again).

This path was a bit of a reckoning: nearly every tentative decision I made in the past where I thought "this may blow up in my face someday" subsequently did. I'm happy with the end result: now that I had a real use case driving evidence for each of these changes, I felt much more comfortable with the trade-offs.

no more duck-typing JSON

Previous versions of Bass relied on "duck typing" JSON structures to re-hydrate them into complex Bass values - chiefly thunks, thunk paths, host paths, and other types of paths. As in, "if a JSON object looks like a thunk, i.e. has all the necessary thunk fields, it's a thunk!" This approach worked pretty well but it was obviously open to ambiguity and false positives - if someone, for whatever reason, wanted to emit a regular JSON object in the same encoded shape as a Bass value, they were out of luck.

This is now removed. Instead, thunks, thunk paths, and other complex types encode using Protobufs. Complex values now have a well-defined schema. JSON values are no longer magically converted into rich Bass types based on their shape. They may still encode to JSON (using Protobuf's JSON encoding), but can only decode into a Bass value when the other end knows exactly what type they want. This is how bass --export works, for example: it takes a JSON object and tries to unmarshal it either into a Thunk (image export) or a ThunkPath (artifact export), rather than reading a Bass value and switching on its type.

Huzzah!

.bass is no longer implicitly added to the command path during (load)

When initially adding support for (use) and (load) I made the decision to use this syntax for loading the foo.bass module, a sibling to the current module:

(use (*dir*/foo))

The motivation was that (use (*dir*/foo.bass)) felt a little less elegant. Comparing to Ruby for example, you say require "foo" not require "foo.rb". Incidentally, you do say load "foo.rb", but I see use as closer to require than load in that it is the go-to function for loading dependent modules.

The weird thing was if you took that same (*dir*/foo) and passed it to (run) instead, Bass wouldn't append .bass to the path. This forced a divide: use .bass for libraries, and don't use it for scripts. This matches my expectations - I prefer scripts to not have a file extension so they can be refactored into other languages without breaking all the call sites.

However it meant that the thunk (*dir*/foo) could not know what file it was referring to until it was used. This is problematic in the case of (fs/foo) where fs is a Go fs.FS backed filesystem and we need to encode it into a message, which is necessary for calculating the thunk's name (a base64-encoded sha256 checksum of the thunk).

So: the magic is gone. When loading a Bass module, you just give the exact filename.

This is a slight departure from Ruby, but really (use) was never anything like require. In Ruby, require takes the given string and traverses your library load path until it finds what you're looking for. In Bass, thunks are always fully qualified. In that sense I think it's much clearer to see exact filenames.

Take this for example:

; load dependencies
(use (.git (linux/alpine/git))
     (*dir*/bass/bass.bass)
     (git:github/vito/bass-loop/ref/main/bass/github.bass))

The second module definitely has a bit of stutter, but some of that is owed to the fact that this is the helper module for Bass itself. Moving briskly past that to the last module, it becomes much clearer that we're simply specifying a file within the repository rather than using some sort of advanced package loading scheme. It also looks much different from the first module, which takes an image reference as an argument ((linux/alpine/git)).

All in all I'm happy with this change as it makes module loading much easier to understand without context, and means (*foo*/bar) always says what it means.

bass.lock is now protobuf-encoded

Building on the introduction of Protobufs, files written using the memo system are now Protobuf-encoded, rather than JSON-encoded. This is a backwards-incompatible change; existing lock files will need to be re-created. I don't want or expect this sort of breaking change to happen again in the future, for obvious reasons (it breaks loading modules!). Now that we're using Protobuf there are well-defined ways for maintaining backwards compatibility - not that it's a silver bullet.

the Bass runtime is no longer a runtime

This is just a internal refactor with no external impact. I had some vague ideas which provide some context on this change, but in the end I just changed the Bass runtime to bass.Session and removed Load from the bass.Runtime interface.

scopes now encode and decode from JSON in deterministic order

A small quality-of-life change; rather than following lexicographical key order (Go default behavior), JSON encoding/decoding now matches Scope semantics in respecting key definition order.

scopes no longer underscorize <-> hyphenate keys

Previously, {:foo-bar 1} would encode to {"foo_bar": 1}, and vice versa. This was always an obvious YOLO. The motivation was to make using Concourse resource-like interfaces feel more "native" because Lisp uses hyphens as a symbol separator. This subsequently blew up in my face on multiple occasions, and is now removed. So if you've got any places pattern-matching JSON keys using hyphens, you'll need to change those to underscores.

concatenating thunks/values uses lists rather than duck typing {:str [vals]}

Previously, if you wanted to pass a CLI flag like --foo=./thunk/path you had to wrap it in a structure like {:str ["--foo=" path]} - exported by stdlib as (str-thunk). Now you just pass the list value instead, and lists are given special treatment in args/env because those have to resolve to strings anyway.

add gRPC runtime client/server implementation

Previously, Bass Loop used its own built-in runtimes to communicate to SSH-forwarded services. This meant the runtime code was running server-side, which had a few downsides:

  1. Because the Buildkit runtime injects a precompiled shim executable into every thunk, this involved a ~2.5MB network transfer for each call.
  2. Users could only use runtimes that Bass Loop knew about ahead of time.
  3. Runtime code ran separately from the runner's machine, meaning it couldn't e.g. use the OS X keychain for Docker Hub authentication.

With gRPC, Bass Loop can just be a dumb server that talks to a fully remote runtime, solving all of these problems!


All in all, the above changes bring Bass much closer to a stable release - I had a feeling a lot of these problems would surface someday, and they finally did!

vito added 30 commits June 10, 2022 22:19
scopes preserve order, so it's natural to expect their JSON form to as
well, and this allows tests to be stable.
this was never a great fit; half of interface was not relevant to it,
and half of it was not relevant to real runtimes

I'd like to scope the Runtime interface down to "real" runtimes now and
have it just use protobuf types directly. at that point it really makes
no sense for the Bass runtime to be a runtime.
as preparation for protobufs, rename String to Repr since protoc
generates a String() method already

(i like this change regardless as it's more explicit)
* fs paths not yet supported; need a bigger change here
* tests not updated
at a crossroads

gonna look into having thunk just embed a proto thunk again
probably just gonna roll back the memo/runtime interface changes at this point though
using protobuf here was unnecessary; would rather just keep it limited
to when we actually need to send something over the wire.
it's much more useful to have a canonical fmt.Stringer implementation,
especially for scenarios where we just have 'any' and can't assume
Repr() is there
...and backfill tests for JSON/Protobuf marshal/unmarshaling, greatly
expanding Thunk coverage in particular
should behave more predictably than reflection
this is just too surprising. it was originally motivated by having
params sent to Concourse resources feel more native, e.g. :private-key
instead of :private_key. it was a funky idea I went forward with to see
how it went, and to be honest I don't think it went that well.

there were a few cases where I *didn't* want it to happen, and the
translation was destructive in those cases. for example, using the
base64-encoded sha256 of a thunk as a key in bass.lock would become
corrupted.

it's better to just have this match exactly what the user says. if
there's some ergononomic reason to do this conversion it should be
applied in a much more localized area (e.g. a Concourse lib) rather than
the entire language.
thunks saved into memos should have a reliable encoding scheme for rich
types contained in stdin/args/etc. - protobuf solves this better than
duck-typing JSON. removing this gets us one closer to removing Descope.
this was always a little bit of a wild west, so it's a little comforting
seeing it go away.

thunks and thunk paths now marshal to JSON using protobuf encoding, but
they - along with other rich path types - will no longer be
automagically translated from their JSON encoding. this always risked
ambiguity, with it being unclear whether a {"file":"./foo"} form should
actually turn into a bass.FilePath or a general *bass.Scope.

now that thunks and memos encode using protobufs, this ambiguity is
gone. emitting a thunk path as JSON to stdout and piping to `bass
--export` still works just fine because --export knows it's expecting a
bass.ThunkPath.
vito added 4 commits June 11, 2022 16:30
continuing on the cleanup journey, this has always been a bit of an
oddity. now FSPaths have reference equality semantics while being
content-addressed for purposes of caching and mounting. equality is not
really useful for FSPaths, and that was one of the reasons to keep ID
around.
it's always been a little weird that (run) and (load) will look for
different file paths; it meant you couldn't re-use a script as a
library.

it also meant you couldn't marshal/serialize a thunk ahead-of-time that
used a fs.FS-backed command path because you couldn't know whether it
was going to use .bass or not.

now paths passed to (load) are always regular paths without any magic
suffix, which IMO is easier to understand anyway.

the original goal here was to mimic other languages like Ruby where you
do `require "foo"` without the extension - however Bass load paths are
fundamentally different in that Ruby will take that string and traverse
a load path, while Bass thunks are more like fully qualified paths.
@vito vito added enhancement New feature or request breaking Removes or replaces previously supported functionality labels Jun 12, 2022
vito added 3 commits June 12, 2022 14:21
this goes away with the removal of duck typing
this is a fundamental requirement, as we don't want secret rotation to
change the 'identity' of a thunk, and we don't want to send secrets over
the wire in the first place.

the next iteration on this may change secret names from simple strings
to some sort of namespacing scheme so that user-provided runtimes can
fetch them from a local secret store.
this was another weird duck-typing pattern. now you can just use a list
to concatenate values in command args/env.
@vito vito merged commit c7cb9a5 into main Jun 12, 2022
@vito vito deleted the remastered branch June 12, 2022 19:24
@vito vito changed the title remaster: a reckoning of weird ideas while adding gRPC runtimes remaster: a reckoning of weird ideas (+ add gRPC runtime service) Jun 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking Removes or replaces previously supported functionality enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant