Skip to content
This repository has been archived by the owner on Sep 26, 2021. It is now read-only.

Make libmachine consumable by outside world #1729

Merged
merged 1 commit into from
Sep 23, 2015

Conversation

nathanleclaire
Copy link
Contributor

Alright, I'm hesitant to turn this in incomplete (which it is so far-- probably about 60-70% done), but better early than never, and I'd prefer to iterate on something crude and incomplete rather than toil away and learn that others have some fundamental disagreements on my approach.

Sorry for the size of the patch, but you'll notice (in the current state at least) there are about 2x as many deletions as additions, and a chunk of it is now-or-never cleanup work to make the API more sane.

I'll annotate the code a little to guide since the patch is so large.

Design Principles:

  • Never, ever write to disk inside the host module. That's the persist module's job. To that end, it would be good to get another Store implementation to prove viability outside of Filestore, and automatically test it. To that extent, ideally Store needs some more re-working so it can keep track of artifacts (secrets such as certs and mundane artifacts like ISOs) instead of having everybody write them to the filesystem willy-nilly.
  • Provider in its current form is completely gone as its role conflicted heavily with host.Host. Later, I want to bring back Provider as a notion of something which "provides" host creation and management and provides the things needed to bootstrap control of them, such as API tokens.

Something like the following code is my rough roadmap for where I'm working towards with the libmachine module. I'm not really sure how to best handle the awkward DriverOptions bit. interface{} seems wrong, but we don't have strongly typed Providers to give us Hosts (like alluded to above).

package main

// Sample Virtualbox create independent of Machine CLI.

import (
    "fmt"

    "github.com/docker/machine/libmachine"
    "github.com/docker/machine/libmachine/engine"
    "github.com/docker/machine/libmachine/host"
    "github.com/docker/machine/drivers/virtualbox"
)

func main() {
    // returns the familiar store at $HOME/.docker/machine
    store, _ := libmachine.GetDefaultStore()

    h := &host.Host{
        Name: "mycoolhost",
        HostOptions: &host.HostOptions{
                EngineOptions: &engine.EngineOptions{
                    StorageDriver: "overlay",
                },
        },
        Driver: &virtualbox.Driver{
            CPU: 2,
            Memory: 2048,
        },
    }

    // This method is a very high-level method which includes host creation,
    // saving the newly created host in the store, and provisioning.
    h, _ = libmachine.Create(store, h)

    out, _ := h.RunSSHCommand("df -h")

    fmt.Printf("Results of your disk space query:\n%s\n", out)

    fmt.Println("Powering down machine now...")
    _ := h.Stop()
}

Thoughts:

  • I'd like to rename host module to something a bit less confusing, to avoid collision with the instantiated Host struct. To that extent, I really am wondering if we should just move Host to be Machine, as it is libmachine after all.
  • You'll notice that store is now in its own module, as are most of the libmachine-related functions. The core libmachine namespace will be reserved for very high-level functions meant for calling from the outside world, and/or commands.
  • libmachine depends on other things such as github.com/docker/machine/log, do those go in libmachine as well? How does that all work? Logging etc. since I'm sure consumers of the API won't want us logging to terminal all over the place implicitly. Guess we should pass in an io.Writer as a "config" property of... something?
  • The API should probably be versioned, how do we do so? Sounds like another case for a libmachine client struct that you should be instantiating before doing anything.

cc @ehazlett @ibuildthecloud @hairyhenderson @SvenDowideit @tianon @bfirsh 🎵 MACHINE PARTY 🎉 🎈 🎊

@nathanleclaire nathanleclaire force-pushed the git_r_done_libmachine branch 2 times, most recently from 644edfe to 2584280 Compare August 19, 2015 05:49
@@ -12,22 +12,8 @@ func cmdActive(c *cli.Context) {
log.Fatal("Error: Too many arguments given.")
}

certInfo := getCertPathInfo(c)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pattern is / was all over the place, and it's super repetitive, so it's been replaced by a call to getStore which is a helper function that performs this logic.

@ehazlett
Copy link
Contributor

Huge +1 to this! Great work! I think this is a great start. 👍 for the deletes!

This. Is. Awesome (to be said in a Spartan voice). :)

@nathanleclaire nathanleclaire force-pushed the git_r_done_libmachine branch 4 times, most recently from 8fc990d to 7dbd979 Compare August 21, 2015 05:29
@nathanleclaire
Copy link
Contributor Author

So, the tests are hella broken, BUT! I finally have a working POC standalone usage of libmachine which you can go run and it Just Works. And the generated store can be used interchangeably with the Docker Machine CLI (in this branch at least- I'm not 100% positive about master).

@nathanleclaire
Copy link
Contributor Author

My general thinking is, now that something relatively working-ish is put together, we can iterate on the design of the API and talk about next steps. There's still a lot of work left to go, mostly converting existing drivers over to the new method, and discussing how to handle the change, but this script works out of the box now!

package main

import (
        "fmt"
        "log"

        "github.com/docker/machine/libmachine"
        "github.com/docker/machine/libmachine/drivers/virtualbox"
)

func main() {
        libmachine.SetDebug(true)

        // returns the familiar store at $HOME/.docker/machine
        store := libmachine.GetDefaultStore()

        // over-ride this for now (don't want to muck with my default store)
        store.Path = "/tmp/automatic"

        hostName := "myfunhost"

        // Set some options on the provider...
        driver := virtualbox.NewDriver(hostName, "/tmp/automatic")
        driver.CPU = 2
        driver.Memory = 2048

        h, err := store.NewHost(driver)
        if err != nil {
                log.Fatal(err)
        }

        h.HostOptions.EngineOptions.StorageDriver = "overlay"

        if err := libmachine.Create(store, h); err != nil {
                log.Fatal(err)
        }

        out, err := h.RunSSHCommand("df -h")
        if err != nil {
                log.Fatal(err)
        }

        fmt.Printf("Results of your disk space query:\n%s\n", out)

        fmt.Println("Powering down machine now...")
        if err := h.Stop(); err != nil {
                log.Fatal(err)
        }
}

@nathanleclaire
Copy link
Contributor Author

My thinking around people wanting to create REST APIs and such with this is, you can unmarshal the HostOptions and Driver options onto the structs created via the factory methods before doing the create, and all the boilerplate set up for those structs will still be in place, but the user-defined options will over-ride all the other defaults.

@ehazlett
Copy link
Contributor

Looks good! Could we get a list of remaining items so we could help knock those out?

@nathanleclaire
Copy link
Contributor Author

Looks good! Could we get a list of remaining items so we could help knock those out?

Definitely! I intend to squash down to a single commit as well soon (it will help with rebase hell down the line).

Here's my list so far:

Must haves

  • Fix unit tests (I intend to do this before handing off this PR and going on vacation)
  • Fix integration tests (It's been a while since I've run them unfortunately and there may be things which I haven't foreseen that have broken.)
  • Integrate all drivers to the new model of doing things. This will be a decent chunk of rote work and I also intend to put together a guide, even if it is brief, on how to do it. Mostly it is writing a new NewDriver method and adding them to the switch block in drivers/driverfactory/maker.go.

UPDATE: I have updated the drivers partially to the new NewDriver function style but naturally we need to test them, also that function is duplicated in most of them so I wonder if there's a better way to handle it. Many of them will need to have their signatures corrected to return just *Driver, too.

  • Speaking of NewDriver, we need to introduce a uniform way to handle driver defaults. That is to say, we have defaults in the flags today which we can no longer reliably expect the CLI to handle for machines created with libmachine. So, we need to introduce into each driver, a set of per-module level default consts for flags that get re-used for the flag defaults and struct defaults, i.e. defaultCPU, defaultMemory for VirtualBox. I'll take a look at using that one as an example to show what I mean.

    UPDATE: If you take a look at virtualbox.go in this PR you should be able to get a feel for what I am talking about.

  • There are places where we are reading from / setting environment variables in libmachine and they are changing behavior. I've debugged what I can so far of this but I know there are still instances lurking around the corner waiting to bite us. READING AND WRITING TO ENVIRONMENT VARIABLES SHOULD NEVER EVER HAPPEN INSIDE THE libmachine PACKAGE!!!!! We have to fix every single instance of this or it will cause really sneaky bugs. There's still quite a bit of work to do here as calls to functions like utils.GetMachineDir, which rely on environment variables, are littered all over the place.

UPDATE: I've factored out a lot of the offending code into mcndir module within commands. However, there are still little bits here and there, especially in the tests, that need cleaning up.

  • There may be a bug persisting the ArtifactPath (it gets over-written as an empty string on some subsequent commands after Create), but I haven't debugged it yet. So, that needs to be fixed.
  • Documentation for the API, and preferably some more examples as well. In general I think we should probably just have the documentation for the API on Github and this will allow us to move quickly on updating it. When it becomes fairly stable, we can talk about how / where we want to put it on the official docs site.
  • Create a real libmachine.Client struct. Possibly, we can make it embed store.Store, so that you can just call mcnClient.NewHost, mcnClient.Load, etc. directly.
  • I haven't figured out a good way to pass the Host-level SwarmOptions information to the drivers yet, so that's just been punted for now :( Maybe they will need to get passed in to NewDriver methods, and BaseDriver will just embed SwarmOptions as well.

Nice to haves

  • I think that we need a per-driver log in addition to the "global" level log. This will be a pretty good size chunk of search-replace work to integrate into all drivers though.
  • Roundup easy / obvious PRs to the core and merge them before merging this in so contributors don't have to go through rebase hell.
  • Add more examples of usage to examples module.
  • There are some methods in drivers.go that seem kind of spurious to me (they should be moved to a different module), but maybe we can delay that to another time.
  • Before subjecting other people to the API, I highly suggest that we rename store.Get to store.Load ;P (Save and Load makes more sense to me than Save and Get)
  • Again, maybe a bikesheddy point, but I really want to rename Host to Machine. It's just way more consistent - "I use libmachine to create 'hosts'" vs. "I use libmachine to create 'machines'". Or at least rename host module to something like hostmanage.
  • I know I sound like a broken record on this point, but the longer we put off a real "Provider" model the worse off we will be. Others downstream will want to have a notion of a "Provider" that they can use to launch resources in and continuing down the path of managing API tokens etc. directly in the 'machine' drivers like we have so far just seems very wrong to me.
  • Move swarm container provisioning to use libcompose? I think it would allow us a lot more flexibility in terms of re-running provisioning (right now Swarm is the only thing that chokes when you re-run a provision on a finished machine, because you can't docker run the containers again if they're already there).
  • I really don't like our current ssh wrappers and want to clean it up to use variadic args (like exec.Cmd) and to return STDOUT and STDERR separately instead of mashing them together like we do today.
  • Game plan for moving libmachine into its own repo, and the drivers we will be maintaining into their own repos. It would allow us to move faster on both projects and keep the API barrier nicely separated. If Machine CLI and API stay in the same repo it is a huge risk that the boundary will bleed over between them again and totally ruin the hard work to get a clean separation between them.
  • The "Artifact" model as it is now is a little hacky (but much better than what we were doing before) and needs some love / design discussion. The basic dilemma is that drivers need to be able to write to the local filesystem somewhere to store, say, SSH keys BUT I wanted to ensure in libmachine that this was kept as separate from storing host metadata (Store) as possible. So, it's a bit hacky today as each driver has a LocalArtifactPath (for per-machine artifacts like hard disks and SSH key files) and GlobalArtifactPath (almost exclusively for ISOs in the current implementation) method. There may well be a better way.
  • Really would like to switch the "action" strings over to a proper type and expose the concurrent action stuff to the outside world, it's pretty great and I bet people will want to use it.

Side note: I've been thinking about how to integrate this with plugins and wondering, maybe we should just have one plugin driver which wraps the real-deal binaries? Interesting food for thought at least.

@nathanleclaire
Copy link
Contributor Author

There seems to be a bug persisting CaPrivateKeyPath in some cases as well (I think reading from the environment when it shouldn't be).

EDIT: server.pem and server-key.pem seem to be going to the wrong place as well.

@nathanleclaire
Copy link
Contributor Author

No one's said anything at : #1730 so I'm just going to go ahead and remove the useless coverage check for Travis here.

@nathanleclaire nathanleclaire force-pushed the git_r_done_libmachine branch 2 times, most recently from a8d63f3 to 83d3f60 Compare August 23, 2015 07:03
@nathanleclaire
Copy link
Contributor Author

Still a long way to go but all the unit tests are passing again :)

@nathanleclaire nathanleclaire force-pushed the git_r_done_libmachine branch 3 times, most recently from dac03dd to 36de55f Compare August 23, 2015 09:10
@nathanleclaire
Copy link
Contributor Author

I feel confident about merging this very soon -- currently I am running through the integration tests to check up, but all new development work on this branch is completed, the unit tests are passing, and I'm really keen to move this upstream ASAP and get started PRing the plugin work.

I do want to make a followup PR to add some documentation and more examples, though. Additionally, there seems to be at least one issue with cert placement when using the standalone program.

@nathanleclaire
Copy link
Contributor Author

screen shot 2015-09-17 at 6 38 35 pm

These integration tests are failing with Virtualbox for this branch, going to try with DigitalOcean and see which ones are consistently failing, I'm not sure which failures might be inherited from master as well.

@nathanleclaire
Copy link
Contributor Author

The env_shell is one of my own doing, at least. I had added that in to account for the unit test(s) for config and env I had taken out. I'd say this PR is "ready to merge pending fixed integration tests" status.

@nathanleclaire
Copy link
Contributor Author

OK, everything looks G2G now on integration tests:

image

@nathanleclaire
Copy link
Contributor Author

So I'm ready to merge this whenever comfortable, but first there are a few things I want to merge upstream and rebase to.

@nathanleclaire
Copy link
Contributor Author

It should be noted, though, that while this is a backwards-compatible change, it is not necessarily forwards-compatible (due to a config migration) -- so, it is a little dangerous to use older versions of Machine after using this on hosts you have created.

@hairyhenderson
Copy link
Contributor

it is a little dangerous to use older versions of Machine after using this on hosts you have created.

probably out of scope for this PR, but I wonder if we should prompt before migration so users are duly warned...

@nathanleclaire
Copy link
Contributor Author

probably out of scope for this PR, but I wonder if we should prompt before migration so users are duly warned...

Or at least make a backup copy of their old config before attempting migration, etc.? I'm more concerned about users (not neccesarily intentionally) switching back and forth between old versions of Machine (currently it will FUBAR your config to do so if it has been migrated) than issues with the migration itself.

@nathanleclaire nathanleclaire force-pushed the git_r_done_libmachine branch 2 times, most recently from c84605e to fe0ae54 Compare September 23, 2015 01:17
- Clear out some cruft tightly coupling libmachine to filestore

- Comment out drivers other than virtualbox for now

- Change way too many things

- Mostly, break out the code to be more modular.

- Destroy all traces of "provider" in its current form.  It will be
brought back as something more sensible, instead of something which
overlaps in function with both Host and Store.

- Fix mis-managed config passthru

- Remove a few instances of state stored in env vars

- This should be explicitly communicated in Go-land, not through the
shell.

- Rename "store" module to "persist"

- This is done mostly to avoid confusion about the fact that a concrete
instance of a "Store" interface is oftentimes referred to as "store" in
the code.

- Rip out repetitive antipattern for getting store

- This replaces the previous repetive idiom for getting the cert info, and
consequently the store, with a much less repetitive idiom.

- Also, some redundant methods in commands.go for accessing hosts have
either been simplified or removed entirely.

- First steps towards fixing up tests

- Test progress continues

- Replace unit tests with integration tests

- MAKE ALL UNIT TESTS PASS YAY

- Add helper test files

- Don't write to disk in libmachine/host

- Heh.. coverage check strikes again

- Fix remove code

- Move cert code around

- Continued progress: simplify Driver

- Fixups and make creation work with new model

- Move drivers module inside of libmachine

- Move ssh module inside of libmachine

- Move state module to libmachine

- Move utils module to libmachine

- Move version module to libmachine

- Move log module to libmachine

- Modify some constructor methods around

- Change Travis build dep structure

- Boring gofmt fix

- Add version module

- Move NewHost to store

- Update some boring cert path infos to make API easier to use

- Fix up some issues around the new model

- Clean up some cert path stuff

- Don't use shady functions to get store path :D

- Continue artifact work

- Fix silly machines dir bug

- Continue fixing silly path issues

- Change up output of vbm a bit

- Continue work to make example go

- Change output a little more

- Last changes needed to make create finish properly

- Fix config.go to use libmachine

- Cut down code duplication and make both methods work with libmachine

- Add pluggable logging implementation

- Return error when machine already in desired state

- Update example to show log method

- Fix file:// bug

- Fix Swarm defaults

- Remove unused TLS settings from Engine and Swarm options

- Remove spurious error

- Correct bug detecting if migration was performed

- Fix compilation errors from tests

- Fix most of remaining test issues

- Fix final silly bug in tests

- Remove extraneous debug code

- Add -race to test command

- Appease the gofmt

- Appease the generate coverage

- Making executive decision to remove Travis coverage check

In the early days I thought this would be a good idea because it would
encourage people to write tests in case they added a new module.  Well,
in fact it has just turned into a giant nuisance and made refactoring
work like this even more difficult.

- Move Get to Load
- Move HostListItem code to CLI

Signed-off-by: Nathan LeClaire <nathan.leclaire@gmail.com>
@nathanleclaire
Copy link
Contributor Author

I've added a config.json.bak, as well as made the store save the JSON in "pretty" format for easier editing on disk just in case we need to instruct a small subset of users to edit the JSON manually.

@nathanleclaire
Copy link
Contributor Author

I know of at least one issue using this lib independently but I will make a followup issue for it this PR LGTM

nathanleclaire added a commit that referenced this pull request Sep 23, 2015
Make libmachine consumable by outside world
@nathanleclaire nathanleclaire merged commit bae2d33 into docker:master Sep 23, 2015
@zchee
Copy link
Contributor

zchee commented Sep 23, 2015

@nathanleclaire Awesome 🎉

@nathanleclaire
Copy link
Contributor Author

👍

@hairyhenderson
Copy link
Contributor

I've added a config.json.bak, as well as made the store save the JSON in "pretty" format for easier editing on disk just in case we need to instruct a small subset of users to edit the JSON manually.

👍 awesome!

also, yay for merge! :D

@briceburg
Copy link

@nathanleclaire curios here -- does the merging of libmachine mean we'll be able to interface with machines from the docker command (without having to eval first)? Something like docker --machine ocean-1 ps ?

Thanks!

@nathanleclaire
Copy link
Contributor Author

@briceburg Not exactly -- libmachine is intended to be more like an SDK for Docker Machine, written in the Go language. There's some rough edges in the current implementation, though.

@dweomer
Copy link
Contributor

dweomer commented Jan 29, 2016

+1 on the branch name

@coveralls
Copy link

Coverage Status

Coverage increased (+35.4%) to 47.826% when pulling b5927f1 on nathanleclaire:git_r_done_libmachine into f2bb2e0 on docker:master.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants