Skip to content

Latest commit

 

History

History
443 lines (308 loc) · 19.4 KB

README.rdoc

File metadata and controls

443 lines (308 loc) · 19.4 KB

Galaxy

Galaxy is a lightweight software deployment and management tool. We use it at Ning to manage the Java cores and Apache httpd instances that make up the Ning platform (www.ning.com).

Architecture

Galaxy has four components:

  • Agent

  • Console

  • Command-line client

  • Gepo

Agent

The Galaxy agent, galaxy-agent, is a Ruby process responsible for managing deployments, starting and stopping processes, and registering with the console. One galaxy-agent process runs on each host (physical machine, Solaris zone, Virtual Machine) managed by Galaxy.

By default, it runs as the xncore user.

Each agent can manage a single software deployment at a time. For this reason, Galaxy is commonly used with OS virtualization (e.g., Solaris zones), wherein new OS instances, and thus Galaxy agent instances, can be instantiated with low overhead.

Console

The Galaxy console, galaxy-console, is a Ruby process responsible for receiving announcements from agents, logging state changes, and providing an agent list to the command-line tool.

Typically, each environment (e.g. production, QA, development) has its own console.

Command-line client

The galaxy command-line client is the primary user interface to the Galaxy system. It retrieves a list of agents from the console, and contacts individual agents to do its work. Cores can be assigned (deployed), cleared (undeployed), updated, started, and stopped from the command-line tool.

The user must specify which console to use, either through a command-line argument (-c) or an environment variable (GALAXY_CONSOLE).

Gepo

Gepo is the Galaxy software depot. It is a web-accessible repository that contains applications binaries and configuration properties.

Typically, the configuration files are stored in a SCM repository (subversion, git, …), and the binary packages are stored in a single flat directory on the file system.

Gepo is not part of the Galaxy code base but rather an external webserver that needs to be setup.

Gepo architecture tips

The Gepo SCM server should have two top-level directories:

  • config/: The root of a hierarchical directory structure defining a “config path” that contains configuration files

  • binaries/: A flat directory containing application binary packages of the form <type>-<version>-<build>.tar.gz.

Concepts

Binaries

Binaries are stored in the binary branch in Gepo.

Config path

Core configuration properties are stored in the config branch in Gepo, usually broken down per environment (each environment should have its own Gonsole):

Subdirectories under the config branch are known as “config path”. Each host managed by Galaxy can be assigned to one config path, which specifies which binaries and properties to download.

Here is an example of a config path, which has three components:

<site>/<ver>/<type>

where:

  • <site> specifies the site (datacenter)

  • <ver> specifies the version tag associated with the release

  • <type> specifies the core type abbreviation

The config path may include optional sub-types:

<site>/<ver>/<type>/<sub-type>/...

Configuration properties are built by scanning each node of the config path for properties files and merging them to create one file. Properties defined at deeper levels override properties of the same name defined at higher levels.

For example, consider the following files:

uk_datacenter/xncore.properties:

greatest.band="Led Zeppelin"
greatest.drummer="John Bonham"
greatest.guitarist="Jimmy Page"

uk_datacenter/6.5.15/xncore.properties:

greatest.band="Primus"
greatest.bassist="Les Claypool"

uk_datacenter/6.5.15/apache/xncore.properties:

greatest.band="Dave Matthews Band"
greatest.drummer="Carter Beauford"

The configuration file generated from merging these files would contain:

greatest.guitarist="Jimmy Page"
greatest.bassist="Les Claypool"
greatest.band="Dave Matthews Band"
greatest.drummer="Carter Beauford"

Note that this property inheritance/overriding only works on files with names ending in “.properties”. With other files, such as jvm.config or log4j.xml, the file at the deepest level is used as-is.

Deployments

When a host is assigned a config path, updated to a different config path, updated to the same config path (to force a properties refresh), or cleared, the Galaxy agent begins a new “deployment”.

A deployment is a set of files and directories used to download, manage, and run software releases.

A deployment consists of:

  • A sequence number. All Galaxy agents begin with deployment number 1 and increment for each new deployment. This sequence number is stored in the “deployment” file in the agent’s data directory (typically ~xncore/data).

  • A data file describing the deployment core type, build number, and version. The data file is stored in the agent’s data directory (typically ~xncore/data); the file name is the same as the deployment sequence number (e.g., ~xncore/data/1).

  • A deployment directory in the agent’s deploy directory (typically ~xncore/deploy) that contains the contents of the binary archive downloaded from Gepo.

  • A symbolic link called “current” that points to the current deployment directory.

An example might help illustrate these points. Consider the following directory listings (under ~xncore) for a host that has been assigned to 5 different config paths over its lifetime:

xncore@prod1:~: pwd
/home/xncore
xncore@prod1:~: ls -l . data deploy
.:
total 14
drwxr-xr-x 2 xncore xncore 8 Oct 24 04:08 data
drwxr-xr-x 7 xncore xncore 8 Oct 24 04:08 deploy

data:
total 12
-rw-rw-rw- 1 xncore xncore 162 Oct 3 18:16 1
-rw-rw-rw- 1 xncore xncore 162 Oct 12 23:45 2
-rw-rw-rw- 1 xncore xncore 162 Oct 13 00:08 3
-rw-rw-rw- 1 xncore xncore 162 Oct 17 04:58 4
-rw-rw-rw- 1 xncore xncore 164 Oct 24 04:08 5
-rw-rw-rw- 1 xncore xncore 1 Oct 24 04:08 deployment

deploy:
total 16
drwxr-xr-x 8 xncore xncore 9 Oct 12 13:26 1
drwxr-xr-x 7 xncore xncore 7 Oct 11 19:02 2
drwxr-xr-x 7 xncore xncore 8 Oct 17 04:58 3
drwxr-xr-x 7 xncore xncore 8 Oct 24 04:08 4
drwxr-xr-x 7 xncore xncore 9 Oct 24 04:09 5
lrwxrwxrwx 1 xncore xncore 27 Oct 24 04:08 current -> /home/xncore/deploy/5

xncore@prod1:~:cat data/deployment ; echo # the deployment file does not contain a linefeed
5

xncore@prod1:~: cat data/5
--- !ruby/object:OpenStruct
table:
:build: 6.5.10-9378
:core_base: /home/xncore/deploy/5
:config_path: /uk_datacenter/6.5.10/apache
:core_type: resolver

xncore@prod1:~: ls -l deploy/current/
total 767612
drwxr-xr-x 2 xncore xncore 5 Oct 23 22:08 bin
drwxr-xr-x 2 xncore xncore 6 Oct 24 04:08 etc
-rw-rw-rw- 1 xncore xncore 392679750 Dec 1 09:06 launcher.log
-rw-rw-rw- 1 xncore xncore 5 Oct 24 04:09 launcher.pid
drwxr-xr-x 4 xncore xncore 4 Oct 23 22:07 lib
drwxr-xr-x 15 xncore xncore 15 Oct 16 16:56 static-resources
drwxr-xr-x 3 xncore xncore 5 Oct 23 22:07 test

Old deployment directories are not removed when new deployments are created. Thus, a core which has been assigned, cleared, or updated several times will contain several old versions of binaries and logs. It is safe for users to remove all but the current deployment directory if needed.

Launcher

Launcher is a script responsible for starting and stopping cores. It is distributed with each core type, and may vary as needed depending on the type of core it maintains.

Galaxy calls launcher to start and stop cores, and to check core status.

Note that you need to write your own launcher scripts as they are deployments specific. See the section “How to Galaxify an application” below.

Launcher can be invoked directly from the command line, allowing cores to be controlled without the galaxy command-line tool. This is useful for debugging:

xncore@prod1# su - xncore
xncore@prod1% cd deploy/current
xncore@prod1% bin/launcher start

xncore@prod1# su - xncore
xncore@prod1% cd deploy/current
xncore@prod1% bin/launcher stop

How-To

Dependencies

Mongrel gem.

Getting started

You can run a Gonsole and an agent locally. In two different terminals, run:

rake run:gonsole
rake run:gagent

You can now use the command line client to interact with theses:

galaxy > ruby -Ilib bin/galaxy show-agent -c localhost
localhost            online   3.0.0
galaxy > ruby -Ilib bin/galaxy show-console -c localhost
druby://localhost:4440  http://Pierre-Alexandre-Meyers-MacBook-Pro.local:4442   localhost   -   10

Getting help

Running the galaxy command with -help will display on-line help.

xncore@prod1:~: galaxy -help
galaxy [options] <command> [args]

Options:
    -h, --help                       Display a help message and exit
    -c, --console CONSOLE            Galaxy console host (overrides GALAXY_CONSOLE)
    -C, --config FILE                Configuration file (overrides GALAXY_CONFIG)
    -p, --parallel-count THREADS     Maximum number of threads to use, default 25
    -r, --relaxed-versioning         Allow updates to the currently assigned version
    -V                               Display the Galaxy version number and exit
    -y, --yes                        Avoid confirmation prompts by automatically confirming all actions

Filters:
    -i, --host HOST                  Select a specific agent by hostname
    -I, --ip IP                      Select a specific agent by IP address
    -m, --machine MACHINE            Select agents by physical machine
    -M, --cohabitants HOST           Select agents that share a physical machine with the specified host
    -s, --set SET                    Select 'e{mpty}', 't{aken}' or 'a{ll}' hosts
    -S, --state STATE                Select 'r{unning}' or 's{topped}' hosts
    -A, --agent-state STATE          Select 'online' or 'offline' agents
    -e, --env ENV                    Select agents in the given environment
    -t, --type TYPE                  Select agents with a given software type
    -v, --version VERSION            Select agents with a given software version

Notes:
    - Filters are evaluated as: set | host | (env & version & type)
    - The HOST, MACHINE, and TYPE arguments are regular expressions (not globs)
    - The default filter selects all hosts

Commands:
    assign
    cleanup
    clear
    perform
    reap
    restart
    rollback
    show
    show-agent
    show-console
    show-core
    ssh
    start
    stop
    update
    update-config

Installation

Galaxy requires Ruby 1.8.7 or Ruby 1.9 and Ruby Gems. See github.com/ning/galaxy/downloads.

Setup

The galaxy command-line client is used to interact with the deployments via the Gonsole. For convenience, the GALAXY_CONSOLE environment variable can be set prior to running the galaxy command, e.g.:

export GALAXY_CONSOLE=gonsole.prod.company.com

To deploy one core to one machine, you need at least an Agent and a Gonsole running. The Agent needs to run on the same machine where the code is being deployed. The command-line client and the Gonsole can be anywhere (same machine, or not).

Command-line client

Core status is displayed using the ‘show’ command.

xncore@prod1:~: galaxy show -help
galaxy [options] <command> [args]

Usage for 'show':
    show

    Show software deployments on the selected hosts

    Examples:

    - Show all hosts:
        galaxy show

    - Show unassigned hosts:
        galaxy -s empty show

    - Show assigned hosts:
        galaxy -s taken show

    - Show a specific host:
        galaxy -i foo.bar.com show

    - Show all widgets:
        galaxy -t widget show
options

defines a search filter, and is some combination of:

  • -i <host> => Selects the single host specified by the supplied host name

  • -s <site> => Selects the hosts in the specified site <site>, corresponding to the first node of the config path (e.g., “uk_datacenter”)

  • -v <version> => Selects the hosts running the specified version <version>, corresponding to the second node of the config path (e.g., “6.6”)

  • -t <type> => Selects the hosts running the specified core type <type>, corresponding to the third node (and any subsequent nodes) of the config path (e.g., “apache”, “apache/ssl”, …).

  • -s <set> => Selects the hosts in the specified set, which is one of:

  • a => Selects all hosts running a Galaxy agent

  • t => Selects all hosts that have been assigned a role

  • e => Selects all hosts that have NOT been assigned a role

Other search filters are available; see the Galaxy command-line help for more information.

By default, core sub-types (e.g., “apache/ssl”, “apache/passenger”, etc.) are NOT included in searches for the core super-type (e.g., “apache”). To retrieve cores with a sub-type, append a .* to the core type. Note that some shells require the wildcard to be escaped; bash does not (as long as the current working directory does not contain files matching the glob pattern).

How to Galaxify an application

Two files are required to Galaxify an application:

  • A build.properties file uploaded to the desired config path in the Gepo config branch. This file must contain at least one property, type, which defines the _type> portion of the tarball name described below. It may also contain a version property, which defines the <version> portion of the tarball name, which is an arbitrary string that may include a build number, patch date, or other useful information.

A typical build.properties file looks like:

type=apache
build=2.2

Note that if the version property is not specified, it will be inherited from a build.properties file located higher up in the Gepo config path.

  • An appropriately named binary archive uploaded to the binaries branch in Gepo. The file name should be: <type>-<version>.tar.gz.

The archive must contain at least one file, bin/xndeploy, that must return a 0 exit status. This script can do whatever is needed to deploy the application being installed. Note that most xndeploy scripts require the Galaxy library to be included in the archive, under lib/galaxy.

If Galaxy is to start and stop processes, a bin/launcher script is also required. This script takes one of “start”, “stop”, “restart”, or “status”, and returns with exit status 0 (stopped), 1 (running), or 2 (unknown).

The archive can contain any other files the application requires, commonly placed under the lib directory. The archive will be extracted to the deployment directory (~xncore/deploy/<sequence number>) and linked to ~xncore/deploy/current for convenience.

Running an agent

Running the galaxy-agent command with no arguments will display on-line help.

xncore@prod1:~: galaxy-agent -help
Usage: galaxy-agent <command> [options]
  Commands, use just one of these
    -s, --start                      Start the agent
    -k, --stop                       Stop the agent
  Options for Start
    -C, --config FILE                Configuration file (overrides GALAXY_CONFIG)
    -i, --host HOST[:PORT]           Hostname this agent manages (default localhost)
    -m, --machine MACHINE            Physical machine where the agent lives (overrides -M)
    -M, --machine-file FILE          Filename containing the physical machine name
    -c ['http://'|'druby://']HOST[:PORT]
        --console                    Hostname where the console is listening
    -r, --repository URL             Base URL for the repository
    -b, --binaries URL               Base URL for the binary archive
    -d, --deploy-to DIR              Directory where to make deployments
    -x, --data DIR                   Directory for the agent's database
    -f, --fore, --foreground         Run agent in the foreground
    -a, --announce-interval INTERVAL How frequently (in seconds) the agent should announce
  General Options
    -z, --event_listener URL         Which listener to use
    -l, --log LOG                    STDOUT | STDERR | SYSLOG | /path/to/file.log
    -L, --log-level LEVEL            DEBUG | INFO | WARN | ERROR. Default=INFO
    -u, --user USER                  User to run as
    -g, --agent-log FILE             File agent should rediect stdout and stderr to
    -t, --test                       Test, displays as -v without doing anything
    -v, --verbose                    Verbose output
    -V, --version                    Print the galaxy version and exit
    -h, --help                       Show this help

For example:

galaxy-agent -s -i prod1.company.com -c gonsole.prod.company.com -r http://gepo.company.com/config/trunk/prod -b http://gepo.company.com/binaries -d ~xncore/deploy -x ~xncore/data

To see the agent debug log on the terminal, append the following to the end of the above command:

-f -l STDOUT -L DEBUG -v

How to create a Gonsole

Same steps as the agent:

xncore@prod1:~: galaxy-console -help
Usage: galaxy-console <command> [options]
  Commands, use just one of these
    -s, --start                      Start the console
    -P, --start-proxy                Start the proxy console
    -k, --stop                       Stop the console
  Options for Start
    -C, --config FILE                Configuration file (overrides GALAXY_CONFIG)
    -i, --host HOST                  Hostname this console runs on
    -a HOST[:PORT]                   Port for Http post announcements
        --announcement-url
    -p, --ping-interval INTERVAL     How many seconds an agent can be silent before being marked dead
    -f, --fore, --foreground         Run console in the foreground
    -Q, --console-proxied-url URL    Gonsole to proxy
  General Options
    -z, --event_listener URL         Which listener to use
    -l, --log LOG                    STDOUT | STDERR | SYSLOG | /path/to/file.log
    -L, --log-level LEVEL            DEBUG | INFO | WARN | ERROR. Default=INFO
    -g, --console-log FILE           File agent should rediect stdout and stderr to
    -u, --user USER                  User to run as
    -t, --test                       Test, displays as -v without doing anything
    -v, --verbose                    Verbose output
    -V, --version                    Print the galaxy version and exit
    -h, --help                       Show this help

Configuring Galaxy

The preferred method of configuring Galaxy is to write an /etc/galaxy.conf file.

See github.com/ning/galaxy/blob/master/lib/galaxy/config.rb for up-to-date configuration options.

Build

mvn clean package

The Rakefile has a few rules to build gems, RPM and Solaris packages. See:

rake -T

for more information.

License (see LICENSE-2.0.txt file for full license)

Copyright 2010 Ning

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.