sep-(a)CMA-ES escapes the solution basin on fsphere in high dimension with gradient injection #97
Then it must already be a bug in "pure" separable CMA-ES, as it is supposed to converge. I also suggest always plotting the y-axis in log-scale. For diagnosis, the standard output with step-size(s), eigenvalues, etc. is advisable. EDIT: I didn't consider that this is in 1M-D, in which case the behavior should be considered expected if the initial step-size is large.
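For reference, a minimal matplotlib sketch of the kind of diagnostic plot suggested here (log-scaled y-axis, with step-size and condition number overlaid); the arrays are dummy placeholders for the optimizer's per-iteration logs:

```python
import numpy as np
import matplotlib.pyplot as plt

# dummy per-iteration logs standing in for the real optimizer output;
# replace with the recorded f-values, step-sizes and condition numbers
iters = np.arange(200)
fvalues = np.logspace(2, -8, 200)
sigmas = np.logspace(0, -5, 200)
cond = np.logspace(0, 2, 200)

plt.semilogy(iters, fvalues, label="f-value")
plt.semilogy(iters, sigmas, label="step-size sigma")
plt.semilogy(iters, cond, label="condition number")
plt.xlabel("iterations")
plt.legend()
plt.show()
```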
Yes, sigma moves down by ~1e-5 and the condition number up by ~1e-7 in both cases. I can't see any difference before and after step 60, when GI is activated. Since this is in 1M-D, I will try to reproduce it in lower dimension, which would also simplify plotting of the eigen and parameter values. Sep was tested on BBOB and on neural nets of up to ~400-D without showing this behavior. I will report here as it goes...
Modification of the initial step-size and a longer run should help.
Playing with the initial step-size certainly makes it possible to trigger the problem:
immediately diverges, while
I'm investigating whether this could be due to a bug. I am worried that users might try in high dimensions and reject the whole thing when a change in sigma would do.
So, a few more experiments in lower dimensions show that on sufficiently long runs, sigma decreases and the search takes the right direction again. I am therefore thinking it could make sense to have a trick that tries a larger-than-usual change in sigma when the search goes in the wrong direction for a certain number of steps?
I see, then "initial elitism" might be a good try, and you are right, additionally decreasing sigma by a factor of 2 until the first success is observed is probably quite effective. One general problem is that cumulative step-size adaptation has a horizon that depends on the dimension (though it is possible to modify this dependency down to sqrt dimension by setting cs accordingly). The parameters or even the method might not be perfectly suited for 1M-D problems. Two-point adaptation is a promising alternative candidate, where it should also be simpler to adjust the parameters. To get the full picture: what is the population size? |
lambda = 45
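As a minimal sketch of the "decrease sigma by a factor of 2 until the first success" idea suggested above (plain NumPy with a hypothetical offspring sampler, not the library's API):

```python
import numpy as np

def fsphere(x):
    return float(np.dot(x, x))

def sigma_warmup(f, x0, sigma0, sample_offspring, max_halvings=20):
    """Halve sigma until at least one offspring improves on f(x0)."""
    f0 = f(x0)
    sigma = sigma0
    for _ in range(max_halvings):
        if any(f(y) < f0 for y in sample_offspring(x0, sigma)):
            return sigma        # first success observed: keep this step-size
        sigma *= 0.5            # no success: try half the step-size
    return sigma

# toy usage with lambda=45 isotropic offspring around x0
rng = np.random.default_rng(0)
sampler = lambda x, s: [x + s * rng.standard_normal(x.size) for _ in range(45)]
x0 = np.full(1000, 5.0)
print(sigma_warmup(fsphere, x0, sigma0=100.0, sample_offspring=sampler))
```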
The current implementation of initial elitism only kicks in after a run ends. In high dimension, an earlier trigger could be useful.
To me it seems generally useful even without restarts. Actually, it seems particularly useful for a user-supplied initial point to be "elitist" within a single run.
It remains to be understood why the gradient approach stagnates in the end.
Yes, I may need to run it longer. I will post results from a set of experiments here.
So, I am able to report on three small fronts. First, elitist x0 (scheme 2 below) does not always appear to work well: in some reproducible cases, the search immediately stalls at the f(x0) value unless lambda is increased. Could it be that the reinjection somehow prevents the covariance matrix from shaping itself so that a search direction appears? Second, I have tried scheme 3 on top of scheme 1, with mixed results, which may indicate either a bug in gradient injection and/or that scheme 3 needs more careful control. Due to the linearity of GitHub comments, I will split the results into separate comments.
1- gradient injection scheme stagnation
2- elitist x0: While the scheme sometimes works, it appears to stagnate indefinitely in certain cases. Typically, taking the diverging case from a few comments above and applying the new scheme:
which yields:
The scheme even appears to block convergence on an otherwise converging example:
Shall we think of a 'set_high' function (much like 'set_noisy') that would attempt to set a slightly improved set of parameters for high dimensions? If this appears feasible, I could open a dedicated ticket and prepare the required range of simulations for selecting the parameter values...
1- gradient injection scheme stagnation: Try different initial step-sizes. My prediction is that the y-location of the spike corresponds to the step-size. This would mean that the reason is that the step-size cannot converge to zero as fast as the gradient can point the algorithm to the optimum. Not a surprise, because the gradient effectively makes 5000 evaluations in one iteration. The best resolution is probably to apply TPA instead of CSA and to use a constant damping parameter instead of a dimension-dependent setting.
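For orientation, a minimal NumPy sketch of the kind of TPA step-size update meant here, as I understand it (two mirrored candidates placed along the previous mean shift are ranked within the population, and their rank difference drives sigma); names and constants are illustrative, not the libcmaes implementation:

```python
import numpy as np

def tpa_sigma_update(sigma, s, f_plus, f_minus, f_offspring,
                     c_sigma=0.3, d_sigma=10.0):
    """One TPA step-size update (sketch only, not libcmaes code).

    f_plus / f_minus are the f-values of the two mirrored candidates placed
    along +/- the previous mean shift; f_offspring are the f-values of the
    regular offspring. c_sigma=0.3 and the constant damping d_sigma mirror
    values discussed in this thread, not authoritative defaults.
    """
    f_all = np.sort(np.concatenate(([f_plus, f_minus], f_offspring)))
    rank_plus = int(np.searchsorted(f_all, f_plus))
    rank_minus = int(np.searchsorted(f_all, f_minus))
    lam = len(f_all)
    # > 0 when continuing the previous mean shift is the better direction
    alpha = (rank_minus - rank_plus) / (lam - 1)
    s = (1.0 - c_sigma) * s + c_sigma * alpha       # exponentially smoothed indicator
    sigma = sigma * np.exp(s / d_sigma)             # damped multiplicative change
    return sigma, s

# toy usage: the '+' point is better, so sigma should grow slightly
new_sigma, new_s = tpa_sigma_update(sigma=1.0, s=0.0,
                                    f_plus=0.5, f_minus=2.0,
                                    f_offspring=np.linspace(1.0, 3.0, 43))
print(new_sigma, new_s)
```

With a constant d_sigma, the step-size can change by a constant factor per iteration, independently of the dimension, which is the point of the suggestion above.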
Understood. Just for the record, I am using the function's derivative directly, to avoid that per-iteration evaluation cost.
OK, I'll move that ticket higher up my list.
3- sigma (crude) step-control: I don't see anything wrong with the test cases. It is just the same "problem" as before: the expected convergence rate (steepness of the f-graph) of the default strategy is roughly one order of magnitude per 10*dimension evaluations. In the second example, the step-size is IMHO far too large in the end, because the gradient quickly shifted the x-value much closer to the optimum, which renders the previously optimal step-size too large. Same as the spike effect seen in the other experiments.
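Taking the quoted rate at face value (one decade of f-improvement per roughly $10\,N$ evaluations):

$$
N = 5\,000:\ \sim 5\times 10^{4}\ \text{evaluations per decade},\qquad
N = 10^{6}:\ \sim 10^{7}\ \text{evaluations per decade},
$$

so closing, say, ten decades without gradient information costs on the order of $5\times 10^{5}$ resp. $10^{8}$ evaluations with the default strategy.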
instead of
we could use
as an alternative setting. It reduces the time horizon to sqrt(dimension), but I don't think it solves the problem of facilitating quicker step-size decrements.
2- elitist x0: I can't reproduce the "blocking" behavior.
It seems to be just about as slow as I would expect (I didn't check the rate precisely).
OK, thanks for pointing this out; my appreciation of convergence rates is still very blurry. At first I thought that elitist-x0 could be a default option, but then I observed that it was slowing down (or hampering) convergence on the same run with sigma=0.1:
compared to:
Initially I thought that increasing lambda in this particular case would limit the impact of the reinjected x0, but it does not appear to do so. I still have difficulty grasping this one. Note that elitist-x0 reinjects a single instance of the initial candidate.
FYI, this whole thing started because I had noted, from our chat, to double-check whether there was a bug in GI in high dimension, as I could not see any difference with and without GI when solving neural nets, for instance. In practice, my opinion is that if the gradient is available for a high-dimensional problem and can be injected, even only every x steps, it would make sense to mitigate the problem above. I can try to see what comes out of injecting the gradient less often, which would also be a computational gain in practice. There are other simple things I am willing to try, such as reinjecting the best solution so far (instead of waiting until the end of the run).
Right, exactly as predicted (the kink at the end of the right flank is supposed to correspond to the step-size). The most useful mitigation is to enable a quick decrease of the step-size. I think reinjection of the best will only help if mu=1, and probably not even then so much. Of course, in any case the graphs for the current best might look better (depending on what exactly is plotted), but this is rather of a cosmetic nature.
Interesting. It does appear to work better in general on the initial problem. In practice it seems to shift the low spike in y to the first couple of steps of the algorithm. With initial sigma0=1 this early spike is so acute that it triggers the stopping criteria (e.g. going < 1e-8). Same experiments as before: EDIT: fixed the log-scale
Yes, I expected that as well. The bumps in the curves might simply be replaced by flat (reinjected) values.
Very interesting and somewhat surprising; it looks almost as if the problem is virtually solved, given that we see about 1e4 evaluations in 1M-D. We could probably still improve quite a bit with TPA (I will try to find some time to cross-check this with my implementation, though in smaller dimension). EDIT: it was 5K-D, not 1M-D, so no conclusion can be drawn for the latter case.
Hum, if you are referring to the change of the csigma 'horizon', the last graph unfortunately reports on 5K-D. I've tried 1M-D with the updated csigma, and while the spike is closer to the optimum, the rebound appears to jump much higher, with sigma (green) and condition number (red): I will focus on the TPA scheme as a next step. Also, I ran into this yesterday; not that I believe it is of direct help here, but FTR: EDIT: graphs; sigma seems to move faster now.
Below is a run with TPA (see #88 for updates); it is to be compared with the set of runs on fsphere in 5K-D from two days ago. It is reproducible from branch 'tpa_88' with:
The convergence speed seems rather independent of the dimension. Do I interpret this correctly? Generally: could you add an x-axis annotation, so that it is clear whether it shows evaluations or iterations?
I suggest two experiments: (1) setting c_sigma=1 (instead of 0.3), to check whether this is the source of the oscillations, and (2) setting d_sigma=10 (instead of sqrt(dimension)), and (2a) maybe both at the same time.
Yes, I've committed a fix. These are all evaluations, except in the two initial plots that the ticket was created from and the multi-plots with various values of sigma. Sorry about that, I am not using the same tools everywhere. EDIT: fix
My understanding is that it is much better with TPA in both cases. For clarity, below are the two plots in 5K-D with sep-CMA-ES. EDIT: reproduce TPA in 5K-D from branch 'tpa_88' with:
Agreed. The oscillations are worrying; they might indicate a bug.
My code is young, so there could be a bug in the current TPA. I was not worried about the oscillations because they looked just like the mitigation of the original problem... I will continue to investigate. EDIT: also, I am already making sure that the gradient injection does not mix with the points used for TPA.
Cross-check whether they are related to the dimension, i.e. run in 20-D. Then see the experiment suggested above with c_sigma=1. I cannot remember ever having seen them, and I don't see how they could be related to a certain normalization.
FYI, I get something very similar to yours if I set d_sigma to 1.
Right to the point. I did fix the initialization of d_sigma; the bug was that it used the CMA value of d_sigma. I can now reproduce your code's results (branch 'tpa_88'). I investigated several small changes in the scaling of the z points and of d_sigma, but did not find anything very exciting, only that changing d_sigma sometimes strongly affects the convergence speed. I am wondering whether it would make sense to have a dedicated default value of d_sigma when gradient injection is active. Also, I believe I will be able to close this ticket. For now the default with CMA is to activate the updated value of csigma above 1000-D. Finally, I can run tests in high dimension for TPA to see how best to set d_sigma in that case. I will report about this in ticket #88.
I'm bringing the new csigma value (active above 1000-D) to the 'dev' branch (readying it for the next release). I will close once TPA makes it to the 'dev' branch as well.
I did a few experiments and
seems a valid choice for TPA, which (a) reflects the experiments in the original paper up to 200-D about as well as sqrt(dim), and (b) is a more sensible choice with popsize=dim or with gradient injections, because it reflects the possible convergence rate in this case.
For the record, I don't see anything wrong with making
the general default setting. AFAIK there is no empirical or theoretical evidence that ~ N^-1/2 is a better or worse choice than ~ N^-1, other than what we have found here.
Thanks. FYI, I have a set of experiments still running to determine the best d_sigma in high dimensions. I will post the plots in the TPA ticket, for cma/sepcma with/without gradient injection. This will also allow us to make sure we get similar plots.
OK, well noted. I noticed the other day that this is the value already used in VD-CMA.
Good point. The underlying reasoning is that in default CMA we have c_mu ~ N^-2, while in VD-CMA and sep-CMA we have c_mu ~ N^-1, which suggests also reducing the time horizon for the cumulation (cc and cs) to keep a gap. I use
in the sepCMA case in the Python code. Unfortunately, I can't give a reasoning for the different
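Putting the scaling argument above in symbols (orders of magnitude only, constants omitted):

$$
c_\mu^{\mathrm{CMA}} \sim N^{-2},\qquad c_\mu^{\mathrm{sep/VD}} \sim N^{-1}
\quad\Longrightarrow\quad
c_c,\,c_\sigma:\ \sim N^{-1}\ \to\ \sim N^{-1/2},
$$

i.e. the cumulation time horizons shrink from about $N$ to about $\sqrt{N}$ iterations so that they remain shorter than the covariance-matrix learning time scale.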
I'm trying to wrap this ticket up, and my understanding is that this issue is solved by TPA. Taking the initial problem of the sphere in 1M-D with sep-aCMA-ES and gradient injection, and given the change to
The oscillations in the latter (TPA) are observed here in the 1M-D case but not in the 5K-D case (I've checked again just now). Not sure whether this qualifies as an issue. EDIT: added mention of gradient injection
Solved with TPA, closing for now.
First, comparing sep-aCMA-ES with and without gradient injection (GI), GI allows fast convergence:
The difference in initial value is due to not showing iteration 0 (will be ticketed and fixed).
However, a closer look shows that while nearing the optimum, sep-aCMA-ES with GI escapes the solution basin and stagnates around 5000 or so in 1M-D. Below is a log-scale plot of the objective function value for the same run as above:
The same behavior is observed with sep-CMA-ES.
To reproduce:
This might be a bug; in any case, this is weird behavior.
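Since the reproduction command above did not survive the formatting, here is a deliberately simplified, self-contained NumPy sketch of the gradient-injection idea itself on fsphere; it is an illustration only, not libcmaes and not sep-aCMA-ES, and the clipping rule and all constants are assumptions:

```python
import numpy as np

def fsphere(x):
    return float(np.dot(x, x))

def grad_fsphere(x):
    return 2.0 * x

def toy_es_with_gradient_injection(dim=100, lam=45, sigma=1.0, iters=200,
                                   inject_every=1, seed=1):
    """Toy diagonal ES with gradient injection on fsphere (illustration only).

    One candidate per `inject_every` iterations is replaced by a step along
    the negative gradient, length-clipped to ~sqrt(dim) sigma-units.  There
    is no covariance or step-size adaptation here; this is not sep-aCMA-ES.
    """
    rng = np.random.default_rng(seed)
    mean = np.full(dim, 5.0)              # arbitrary starting point x0
    mu = lam // 2
    weights = np.log(mu + 0.5) - np.log(np.arange(1, mu + 1))
    weights /= weights.sum()
    for t in range(iters):
        x = mean + sigma * rng.standard_normal((lam, dim))
        if t % inject_every == 0:
            step = -grad_fsphere(mean)
            # clip the injected step so it stays comparable to sampled steps
            max_len = np.sqrt(dim) * sigma
            norm = np.linalg.norm(step)
            if norm > max_len:
                step *= max_len / norm
            x[0] = mean + step            # replace one offspring by the injection
        order = np.argsort([fsphere(xi) for xi in x])
        mean = weights @ x[order[:mu]]    # weighted recombination of the mu best
    return fsphere(mean)

print(toy_es_with_gradient_injection())
```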