Logging User Storage Statistics

Storage stats are tracked on a per-user basis, as rollups for slices of time. They are stored in the same Riak cluster as other MOSS data, in the moss.storage bucket.

For information about querying storage stats, please read Querying Storage Stats.

High Level

The 10k ft. view of RCS storage logging is:

  1. Storage is calculated for all users either:
    1. on a regular schedule
    2. or when manually triggered with the riak-cs-storage script
  2. Each user's sum is stored in an object named for the timeslice in which the aggregation happened. Sums are broken down by bucket.

Log retrieval is then simply a matter of requesting from Riak all of the slice objects for a user in the desired time period.

Prerequisite: Code Paths for MapReduce

The storage calculation system uses MapReduce to sum the files in a bucket. This means you must tell all of your Riak nodes where to find Riak CS's compiled .beam files before calculating storage.

To expose RCS's modules to Riak, edit Riak's app.config file, and add this to the riak_kv section:

{add_paths, ["/Path/To/RiakCS/lib/riak_moss-X.Y/ebin"]}

where /Path/To/RiakCS is the path to your RCS installation, and X.Y is the version of RiakCS you have installed.
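
For context, here is a sketch of how that entry fits into the riak_kv section of app.config (the surrounding settings are elided and will vary by installation):

{riak_kv, [
    %% ... other riak_kv settings ...
    {add_paths, ["/Path/To/RiakCS/lib/riak_moss-X.Y/ebin"]}
]}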

If restarting the node to make this change is not acceptable, the path can be added at runtime by connecting to the Riak node's console and running:

(riak@127.0.0.1)1> code:add_path("/Path/To/RiakCS/lib/riak_moss-X.Y/ebin").
true

If the result is true, adding the path was successful; otherwise, an error message should describe what went wrong. When using this technique, it is recommended that you also set add_paths in Riak's app.config, so that the path is re-added if the node is restarted.

To test that you have configured a Riak node correctly, connect to its console (using riak attach), then run:

(riak@127.0.0.1)1> code:which(riak_moss_storage).
"/Path/To/RiakCS/riak_moss-X.Y/ebin/riak_moss_storage.beam"

If the path that you added to Riak's app.config is returned, your node is configured correctly. If instead the atom non_existing is returned, Riak was unable to find the RCS code.

Note: currently, only three RCS modules are needed for the storage calculation: riak_moss_storage, riak_moss_manifest, and riak_moss_manifest_resolution. If necessary, it is sufficient to expose only these three modules to the Riak cluster.
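
To check all three at once from the Riak console, a quick sketch (this just repeats the code:which/1 test above for each module):

(riak@127.0.0.1)1> [code:which(M) || M <- [riak_moss_storage,
                                           riak_moss_manifest,
                                           riak_moss_manifest_resolution]].

If any element of the result is the atom non_existing, that module is not visible to the node.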

Scheduling and Manual Triggering

Triggering the storage calculation is a matter of setting up a regular schedule or manually starting the process via the riak-cs-storage script.

Regular Schedules

If you would like to have an RCS node calculate the storage used by every user at the same time (or times) each day, specify a schedule in that node's app.config file.

In the riak_moss section of the file, add an entry for storage_schedule like:

{storage_schedule, "0600"}

The time is given as a string of the form HHMM, representing the hour and minute GMT to start the calculation process. In this example, the node would start the storage calculation at 6am GMT every day.

To set up multiple times, specify a list in the schedule. For example, to schedule the calculation to happen at both 6am and 6pm, use:

{storage_schedule, ["0600", "1800"]}

Important: When using multiple times in a storage schedule, they must be scheduled for different archive periods (see details for storage_archive_period in the Archival section below). Extra scheduled times in the same archive period are skipped. This is intended to allow more than one Riak CS node to calculate storage stats concurrently, as they will take notice of users already calculated by other nodes and skip them (see details in the Manual Triggering section about overriding this behavior).
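
Note that with the default one-day archive period, "0600" and "1800" fall into the same archive period, so the second run would be skipped. Here is a sketch of a riak_moss configuration that actually records two samples per day by halving the archive period (see the Archival section for details on the variable):

{storage_schedule, ["0600", "1800"]},
{storage_archive_period, 43200}  %% twelve hours, in seconds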

By default, no schedule is specified, so the calculation is never done automatically.

Manual Triggering

If you would rather trigger storage calculations manually, simply use the batch command in the riak-cs-storage script:

$ riak-cs-storage batch
Batch storage calculation started.

If there is already a calculation in progress, or if starting the calculation fails for some other reason, the script will print an error message saying so.

By default, a manually-triggered calculation run will skip users that have already been calculated in the current archive period (see the Archival section below for details about storage_archive_period). If you would rather calculate an additional sample for every user in this period, add the --recalc (or -r for short) option to the command line:

$ riak-cs-storage batch -r  # force recalculation of every user

Further Control

In-process batch calculations can also be paused or canceled using the riak-cs-storage script.

To pause an in-process batch, use:

$ riak-cs-storage pause
The calculation was paused.

To resume a paused batch, use:

$ riak-cs-storage resume
The calculation was resumed.

To cancel an in-process batch (whether paused or active), use:

$ riak-cs-storage cancel
The calculation was canceled.

You can also retrieve the current state of the daemon by using the status command. The first line will indicate whether the daemon is idle, active, or paused, and it will be followed by further details based on progress. For example:

$ riak-cs-storage status
A storage calculation is in progress
  Schedule: none defined
  Last run started at: 20120316T204135Z
  Current run started at: 20120316T204203Z
  Next run scheduled for: unknown/never
  Elapsed time of current run: 3
  Users completed in current run: 1
  Users left in current run: 4

Results

When the node finishes calculating every user's storage, it will print a message to the log noting how long the entire process took:

08:33:19.282 [info] Finished storage calculation in 1 seconds.

Internals

If you're hacking on this system, you may be interested in how the calculation process works and how its results are archived.

Process

The calculation process is coordinated by the riak_moss_storage_d gen_fsm. This is a long-lived FSM, and it handles both the scheduling (if a schedule is defined) and the running of the process.

When a storage calculation starts, the first step is to grab a list of known users. This is simply a list of the keys in the moss.users bucket.

The list of each user's buckets is found by reading that user's record.
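
Both steps correspond to plain Riak operations; here is a sketch using the riakc Erlang client (assuming Pid is a connected riakc_pb_socket process; illustrative only, not the actual RCS code):

%% List every known user, then read one user's record.
{ok, UserKeys} = riakc_pb_socket:list_keys(Pid, <<"moss.users">>),
{ok, UserObj} = riakc_pb_socket:get(Pid, <<"moss.users">>, hd(UserKeys)).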

For each bucket that a user owns, a MapReduce query is run. The query's input is the bucket name itself (<<"BucketName">>), so the list of keys stays on the server. The query then has two phases: a map that produces a tuple of the form {1, ByteSize(File)} for each active file (and nothing for inactive ones), and a reduce that sums those tuples element-wise. The result is one tuple whose first element is the number of files in the bucket and whose second is the total number of bytes stored in that bucket.
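
As a conceptual Erlang sketch of those two phases (not the actual RCS code; manifest_active/1 and manifest_size/1 are hypothetical helpers standing in for RCS's manifest handling):

%% Map phase: emit one {Count, Bytes} tuple per active file.
map_object_size(Object, _KeyData, _Arg) ->
    case manifest_active(Object) of             %% hypothetical helper
        true  -> [{1, manifest_size(Object)}];  %% hypothetical helper
        false -> []
    end.

%% Reduce phase: sum element-wise into {FileCount, TotalBytes}.
reduce_sum(Tuples, _Arg) ->
    [lists:foldl(fun({C, B}, {AccC, AccB}) -> {AccC + C, AccB + B} end,
                 {0, 0}, Tuples)].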

Only one bucket is calculated at a time, to avoid placing too high a load on the Riak cluster, and only one user is calculated at a time, to avoid building too large a temporary list on the RCS node.

Once the sum for each of the user's buckets is calculated, a record is written to the <<"moss.storage">> Riak bucket.

Archival

Records written to the <<"moss.storage">> bucket are very similar to records written to the <<"moss.access">> bucket, used when Logging Access Stats. The value is a JSON object with one field per bucket. The key is a combination of the user's key_id and the timestamp of the timeslice for which the calculation was run.
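
For illustration only (the exact key and value layouts are internal to RCS and may differ), a slice record might look roughly like:

Key:   the user's key_id combined with the slice timestamp,
       e.g. "KEYID20120612T060000Z" (hypothetical)
Value: {"bucket-one": {...}, "bucket-two": {...}}  (one field per bucket)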

The period for storage archival is separate from the period for access archival. The storage archival period is configured by the riak_moss application environment variable storage_archive_period. The default is 86400 seconds (one day): storage calculations are expected to run much less often than access-log archivals, and a longer period means fewer possible keys to look up at reporting time.