refactor output stream and / or python plotting script, with conversion format for other plotting scripts outside the lib #110

beniz · 2015-01-30T14:17:03Z

A few issues:

the regular and surrogate default output streams differ, so at the moment they require different plotting scripts;
there are existing plotting scripts with nice visual goodies, and it would make sense to format the output for them.

EDIT: relevant comment, #106 (comment)

nikohansen · 2015-01-30T17:42:52Z

I volunteer to move the plotting functionality from the cma.py code into a standalone Python module.

beniz · 2015-01-30T20:03:01Z

Thanks!

This may also be the right moment to decide on a standard format for CMA output. In particular, what is the reason for using one file per subfigure, as you described in #106 (comment)?

At the moment the multiple-file design does not fit well with the libcmaes output model. The lib allows for custom and distinct progress and output functions, with defaults provided in any case. The output function writes to a single output stream, and in terms of design and performance I'd favor keeping it this way. But there may be other elements to consider as well.

nikohansen · 2015-01-30T22:25:35Z

I am open to any considerations about the best format. The many-files format is, I agree, a little ugly, but it makes reading in vectors of unknown length simple, in particular when future extensions cannot be ruled out (for example, I recently added an output file for the eigenvalues of the correlation matrix). Otherwise one needs a syntax or special format to discover the dimension and possibly identify groups.

I am a little stuck with the described format, because all of my code complies with it (5 implementations of CMA-ES and 3 implementations of plotting the data). I am not likely to change all eight implementations unless for a very good reason. It should be simple, though, to write a transformer from one file to many files and/or the other way around.
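To illustrate why a one-vector-per-row text file makes unknown vector lengths easy to handle, here is a minimal Python sketch. It assumes five leading metadata columns per row (the "first-five-columns-are-meta-data" rule mentioned later in this thread) and that header lines start with '%' or '#'. The function name and the sample data are made up for illustration; this is not code from cma.py.

```python
def parse_legacy_rows(text, n_meta=5):
    """Split each data row into (metadata, vector).

    The vector length is inferred from the row itself, so the
    dimension never has to be declared in the file.
    """
    meta, vectors = [], []
    for line in text.splitlines():
        line = line.strip()
        # Assumption: header/comment lines start with '%' or '#'.
        if not line or line[0] in "%#":
            continue
        values = [float(tok) for tok in line.split()]
        if len(values) < n_meta:
            raise ValueError("row shorter than %d metadata columns" % n_meta)
        meta.append(values[:n_meta])
        vectors.append(values[n_meta:])
    return meta, vectors


# Made-up sample: five metadata columns, then a vector of unknown length.
sample = """% made-up sample data
1 10 0.5 0 0 1.0 2.0 3.0
2 20 0.4 0 0 1.1 1.9 2.8
"""
meta, vectors = parse_legacy_rows(sample)
dimension = len(vectors[0])  # discovered from the data, not declared
```

Adding a new output file with a different vector length needs no change to such a reader, which is the extensibility argument made above.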

beniz · 2015-01-31T10:41:33Z

Understood. It is possible to describe the format in a generic higher-level language and have it parsed the way we want, to one or many files and back into memory. One widespread tool for defining structured formats across most platforms and languages is 'protocol buffers', https://code.google.com/p/protobuf/ (and Python tutorial https://developers.google.com/protocol-buffers/docs/pythontutorial)

Format descriptions are independent of language and platform, and provide objects to be filled in memory and written to file. The format is extensible and accommodates optional as well as new variables and structures without breaking compatibility.

I am very familiar with protocol buffers and can propose a format description that would match the legacy one (yours) while retaining the ability to choose between one or more files, as well as to re-acquire the data without having to bother about the number of columns, etc.

In short, the description for the first few columns of your outcmaesfit.dat file could be something like:

message CMAFitLine {
  required int32 iteration = 1;
  required double feval = 2;
  required double sigma = 3;
}

message CMAFit {
  repeated CMAFitLine fitline = 1;
}

One drawback is that in serialized (and possibly compressed) form the data file would not be human readable anymore, though some implementations do support writing raw data. The serialized form would be a plus in high dimensions however.

nikohansen · 2015-01-31T10:58:26Z

How do you describe a field with variable length / number of data?

nikohansen · 2015-01-31T11:08:35Z

I am generally not quite in favor of writing encoded/compressed data. The use case to have a quick look at the data file is just too common.

beniz · 2015-01-31T11:09:54Z

The use case to have a quick look at the data file is just too common.

Totally agreed, this is one thing I'd need to check per implementation. Also, not saying we must go down this road, just a proposal at this stage.

beniz · 2015-01-31T11:10:54Z

How do you describe a field with variable length / number of data?

The repeated keyword does this, as in the very short example above. The size is then dynamically obtained from the in-memory object obtained from parsing the file.

beniz · 2015-02-04T13:28:37Z

Below is a first proposal for an extendable output format based on protocol buffers. The following decisions & assumptions apply:

ability to read / write with protocol buffer code (i.e. one line of code) in serialized form
ability to write in non-serialized form, but losing the ability to read it back without custom code
usage of a column-based storage, i.e. every array (keyword repeated) stores values over time, as opposed to the row-based representation in the previous tiny 'sketch'. This is open for discussion of course; I simply deemed it more practical for plotting, because a full vector is obtained at once instead of having to read everything back line by line
removed the void and 0 in your legacy format since I understand they are used to mark the beginning of the vector entries. However they can easily be added back when writing the output in human readable form
put sigma in header even if not present in outcmaesxmean.dat, just thought it'd make better sense
ability to extend format with custom output (see example with accuracy, e.g. optimizing for a machine learning application)
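The difference between the row-based sketch and the column-based proposal can be shown in plain Python. The snippet below is only an analogy using dicts and lists, not protocol buffer generated code; the field names follow the earlier CMAFitLine sketch, and the numeric values are invented.

```python
def rows_to_columns(rows):
    """Turn a list of per-iteration records (dicts with identical keys)
    into one list per field, i.e. a column-based layout."""
    columns = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    return columns


# Row-based: one record per iteration, as in the CMAFitLine sketch.
rows = [
    {"iteration": 1, "feval": 12.5, "sigma": 0.50},
    {"iteration": 2, "feval": 7.25, "sigma": 0.45},
    {"iteration": 3, "feval": 3.0, "sigma": 0.40},
]

# Column-based: each field holds its whole history.
columns = rows_to_columns(rows)
sigma_history = columns["sigma"]  # full time series in one lookup
```

With the column layout, a plotting routine can take `columns["sigma"]` directly, whereas the row layout requires a pass over all records first.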

Here is the format proposal:

message Header
{
 repeated int32 iteration = 1;
 repeated int32 evaluation = 2;
 repeated double sigma = 3;
 optional int32 seed = 4;
 optional string date = 5;
}

message CMAFit
{
 optional Header head = 1;
 repeated double axis_ratio = 3;
 repeated double bestever = 4;
 repeated double best = 5;
 repeated double median = 6;
 repeated double worst = 7;
 repeated double more_data = 8; // XXX: or use extensions
}

message CMAXRecentBest
{
 optional Header head = 1;
 repeated double fitness = 3;
 repeated double xbest = 4;
}

message CMAXMean
{
 optional Header head = 1;
 repeated XMean xmean = 2;
}

message CMAAXLen
{
 optional Header head = 1;
 repeated double max_axis_length = 3;
 repeated double min_axis_length = 4;
 repeated SqrtEigenVals all_axes_length = 5;
}

message CMAStdDev
{
 optional Header head = 1;
 repeated Stds stds = 3;
}

message XMean
{
 repeated double x = 1;
}
message SqrtEigenVals
{
 repeated double sqrteigenval = 1;
}

message Stds
{
 repeated double std = 1;
}

message LegacyCMAOutput
{
 required CMAFit fit = 1;
 required CMAXRecentBest recentbest = 2;
 required CMAXMean xmean = 3;
 required CMAAXLen axlen = 4;
 required CMAStdDev std = 5;
}

message UniqueCMAOutput
{
 required Header head = 1;
 required CMAFit fit = 2;
 required CMAXRecentBest recentbest = 3;
 required CMAXMean xmean = 4;
 required CMAAXLen axlen = 5;
 required CMAStdDev std = 6;
 extensions 100 to 150; // for custom output additions
}

and an example of a custom extension:

import "out.proto";

extend UniqueCMAOutput
{
 repeated double accuracy = 100;
}

Besides discussion, corrections and improvements, a next step could be for me to open a new independent git repository with support for the output format, i.e. the protocol buffer definitions together with Python and C++ code for using the format in a typical CMA implementation.

nikohansen · 2015-02-05T12:35:10Z

I spotted two possible additions:

The Header could optionally have the dimension
CMAXMean could optionally have the fitness of the mean

I wouldn't put sigma in the header; it fits best into CMAStdDev and second best into CMAAXLen. The main reason why the legacy has sigma and axis_ratio in CMAFit is that they do not depend on dimension. That is, there is a single file holding all possibly relevant data that do not depend on dimension and are therefore easy to manage even with very large dimension.

Having max_axis_length and min_axis_length instead of axis_ratio is maybe better. The legacy has (a) chosen data which are often plotted without further processing, that is, I can do the plotting in two Python lines or so and (b) adheres to the (weird) first-five-columns-are-meta-data rule also for CMAFit. That's why we see axis_ratio.

I guess my concern about human readability remains.

beniz · 2015-02-05T14:37:14Z

I guess my concern about human readability remains.

A human readable output could be worked out for both the single and multiple files formats. In this case, of course one of the only remaining advantages of a structured format such as the one above is the clarity within the code.

Taking a look at the future, a few points that could be taken into consideration in the present discussion:

ability to store the search state of the optimizer in serialized form (e.g. in-memory)
ability to exchange search states across machines for distributed computations
ability to reuse search states in other applications, including optimizers.

There is probably no full use for this in the very near future, but I am considering ways in which it could become a building block for later use.

beniz · 2015-02-13T15:35:48Z

The extendable output format comes with some difficulties, one of which is keeping the ability to serialize it to disk incrementally, i.e. without keeping the full history object in memory. I believe this is a good thing to have in the mid-term, but right now there's enough to do without introducing such a big piece of code immediately. Plus I'd like to release the series of bug fixes as a new release.

Therefore, I am implementing a first step toward fulfilling this issue as follows:

in-lib capability of a 'full' output function that writes to a single file all data required by the legacy format (i.e. worst candidate, best ever, ...);
a python script containing a conversion function from a single file to the multiple files of the legacy format.

This should work alongside the minimal Python workflow of #116.
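A one-file-to-many-files conversion of the kind described above could look roughly like the sketch below. The column layout is hypothetical (the real column order of the 'full' output is not given in this thread), and `split_full_output` is an illustrative name, not the function in the actual conversion script. Only the two file names come from the legacy format discussed here.

```python
def split_full_output(lines, layout):
    """Distribute the columns of each row of a single 'full' output
    over several per-file row lists.

    layout maps an output file name to the column indices it receives.
    """
    outputs = {name: [] for name in layout}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        for name, indices in layout.items():
            outputs[name].append(" ".join(fields[i] for i in indices))
    return outputs


# Hypothetical layout: columns 0-2 are iteration, evaluations and sigma,
# column 3 is a fitness value, columns 4-5 are the mean vector (dim 2).
LAYOUT = {
    "outcmaesfit.dat": [0, 1, 2, 3],
    "outcmaesxmean.dat": [0, 1, 2, 4, 5],
}
full = ["1 10 0.5 12.5 0.1 0.2", "2 20 0.4 7.25 0.15 0.18"]
files = split_full_output(full, LAYOUT)
```

Writing each entry of `files` to disk is then a plain loop over the dictionary; keeping the split in memory makes the mapping easy to check before any file is touched.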

INFO: in 'legacy_106' branch (106 is a mistake, should have called it 116 or 110...)

nikohansen · 2015-02-14T10:54:44Z

one of which is to keep the ability to serialize it to disk incrementally

I assume this doesn't prevent us from monitoring a run online, right? In practice, this is what I always do, unless the objective function is extremely cheap (which is virtually never the case).

In general, if the output format will not allow incremental writing, I have doubts that it will ever meet the performance objectives you have for the library.

beniz · 2015-02-14T11:11:19Z

Yes this is correct, the output to multiple files will not be incremental for now, precluding the online monitoring with the legacy plotting functions. This is until the full generic format gets implemented with incremental serialization to disk.

The reason why I switched to an easier immediate solution yesterday is that for incremental serialization to function properly, the format above needs a full 'line-based' refactoring, which will complicate the plotting code as well. I still believe this is the way to go in the future, but not immediately as I need to focus on more important tasks, such as the profile likelihood in eigenspace.
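For reference, line-based incremental writing in its simplest form appends one record per iteration and flushes immediately, so nothing beyond the current iteration is held in memory. The sketch below is a generic Python illustration with a made-up `LineLogger` class and an in-memory buffer standing in for a file; it is not libcmaes code.

```python
import io


class LineLogger:
    """Append one whitespace-separated line per iteration to a stream."""

    def __init__(self, stream):
        self.stream = stream

    def log(self, iteration, evaluations, sigma, values):
        fields = [iteration, evaluations, sigma] + list(values)
        self.stream.write(" ".join(repr(f) for f in fields) + "\n")
        # Flush so that a concurrent reader sees the line right away.
        self.stream.flush()


buffer = io.StringIO()  # stands in for a file opened in append mode
logger = LineLogger(buffer)
logger.log(1, 10, 0.5, [1.0, 2.0])
logger.log(2, 20, 0.4, [1.1, 1.9])
```

The price of this layout is the one noted above: a reader has to reassemble each column from all lines before plotting.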

My immediate target is the simple workflow in Python along with (not online) legacy plotting capability so that results can be more easily compared across implementations.

Let me know if you believe this is not a good intermediate decision.

nikohansen · 2015-02-14T11:17:45Z

But, if I understand correctly then, that prevents online monitoring altogether, e.g. of a remote job?

beniz · 2015-02-14T11:21:56Z

Depends if I can get the one to multiple file conversion script to work on a partially filled output. Should be able to though...
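Making a converter tolerate a partially filled output mostly means ignoring a last line that was cut off mid-write. A minimal sketch of that idea, with an illustrative function name and invented sample data:

```python
def complete_rows(text, n_columns=None):
    """Return the parsed rows of a possibly still-growing file,
    dropping a trailing line that was cut off mid-write."""
    lines = text.split("\n")
    # Without a final newline, the last element is an unfinished line.
    if lines and lines[-1] != "":
        lines = lines[:-1]
    rows = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if n_columns is not None and len(fields) != n_columns:
            continue  # skip rows with an unexpected number of columns
        rows.append([float(f) for f in fields])
    return rows


# The third line is incomplete: the writer was interrupted mid-number.
partial = "1 10 0.5 1.0\n2 20 0.4 1.1\n3 30 0."
rows = complete_rows(partial, n_columns=4)
```

Re-running such a reader periodically on the growing file would give a crude form of online monitoring, at the cost of re-parsing the file each time.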

beniz · 2015-02-16T16:10:27Z

Added a legacy format conversion tool to branch 'legacy_106', as python/cma_legacyplt.py. It has a convert function, and can also be used from the command line:

python cma_legacyplt.py ros_full.dat

where ros_full.dat is obtained with:

./tests/test_functions -fname rosenbrock -dim 20 -full_fplot -fplot ros_full.dat

At the moment, I am able to plot from the converted files with the plotcmaesdat script for Octave, but not from the Python code, i.e. with

import cma
cma.plot()

which yields the following error:

WARNING (module=cma, class=CMADataLogger, method=load):  reading from file "outcmaesaxlencorr.dat" failed
WARNING (module=cma, class=CMADataLogger, method=load):  no data for outcmaesaxlencorr.dat
/home/beniz/research/siminole/dev/libcmaes/python/cma.py:6403: RuntimeWarning: invalid value encountered in less
  dfit[dfit < 1e-98] = np.NaN
/home/beniz/research/siminole/dev/libcmaes/python/cma.py:6430: RuntimeWarning: invalid value encountered in less
  sgn[np.abs(dat.f[:, 5]) < 1e-98] = 0
/home/beniz/research/siminole/dev/libcmaes/python/cma.py:6431: RuntimeWarning: invalid value encountered in less
  idx = np.where(sgn < 0)[0]
/home/beniz/research/siminole/dev/libcmaes/python/cma.py:6437: RuntimeWarning: invalid value encountered in less
  start_idx = 1 + np.where((dsgn < 0) * (sgn[1:] < 0))[0]
/home/beniz/research/siminole/dev/libcmaes/python/cma.py:6438: RuntimeWarning: invalid value encountered in greater
  stop_idx = 1 + np.where(dsgn > 0)[0]
Traceback (most recent call last):
  File "test_legacy.py", line 2, in <module>
    cma.plot()
  File "/home/beniz/research/siminole/dev/libcmaes/python/cma.py", line 6795, in plot
    x_opt, fontsize)
  File "/home/beniz/research/siminole/dev/libcmaes/python/cma.py", line 6153, in plot
    self.plot_divers(iabscissa, foffset)
  File "/home/beniz/research/siminole/dev/libcmaes/python/cma.py", line 6491, in plot_divers
    text(dat.f[idx, iabscissa][-1], dfit[idx][-1],
IndexError: index out of bounds

nikohansen · 2015-02-16T16:59:47Z

For some reason I have no ./tests/test_functions (anymore).

beniz · 2015-02-16T17:09:28Z

The only logical explanation would be that you are missing gflags and therefore test_functions doesn't get built.

nikohansen · 2015-02-16T17:33:15Z

Right, how can I reproduce the problem then? If you can provide ros_full.dat, I should be fine.

beniz · 2015-02-16T17:43:52Z

Now, you can do it from Python with p.set_full_fplot(True), where p is a CMAParametersXX object (XX=NB, PB, ...). This is on branch legacy_106.

nikohansen · 2015-02-16T18:15:38Z

OK, the reason for the failure is that the first f-value is nan. I will prepare a fix and also fix the runtime warnings.

nikohansen · 2015-02-16T18:29:43Z

The median and largest f-value are both 0.0 in iteration 0, which I would consider to be a semi-bug.

nikohansen · 2015-02-16T18:41:37Z

On the contrary, the axis ratio in iteration 0 could rather be 1.0 instead of nan (this was the reason for one of the runtime warnings).
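A defensive pattern against this kind of failure is to drop non-finite f-values before indexing into the data. The sketch below uses only the standard library and invented values; it illustrates the idea and is not the fix that went into the plotting code.

```python
import math


def finite_rows(iterations, fvalues):
    """Keep only the (iteration, f-value) pairs whose f-value is finite."""
    return [(it, f) for it, f in zip(iterations, fvalues)
            if math.isfinite(f)]


iterations = [0, 1, 2, 3]
fvalues = [float("nan"), 12.5, 7.25, 3.0]  # NaN before the first evaluation
pairs = finite_rows(iterations, fvalues)

# Guard against an empty selection instead of indexing [-1] blindly.
last_iteration, last_f = pairs[-1] if pairs else (None, None)
```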

nikohansen · 2015-02-16T23:54:40Z

Fix for plotting with Python is available here.

beniz · 2015-02-17T09:52:38Z

The two commits above do fix the initial values for initial median, worst and condition number.

I've tested the new plotting script, and it works just fine, though the plot still disappears too quickly to be seen. Here is what I do:

import cmaplt
cmaplt.plot()

In order to see the plot, I remove the pyplot.ion() call at https://github.com/CMA-ES/plotting-cma-data/blob/master/src/cmaplt.py#L1071

nikohansen · 2015-02-17T15:16:09Z

I can't reproduce this and don't quite understand why this is the case :-( Does cmaplt.pyplot.show() or cmaplt.pyplot.gcf() have any effect? For the latter, do you see a figure coming up and what is the figure number in the window? Does cmaplt.pyplot.ioff(); cmaplt.pyplot.show() work (after cmaplt.plot())?

beniz · 2015-02-23T07:14:08Z

So, in lcmaes_interface.py, replacing the plot high level function https://github.com/beniz/libcmaes/blob/dev/python/lcmaes_interface.py#L83 with the one below does the trick (no other combination would work on my machine):

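def plot(file=None):
    # use fplot_current when no file is given
    cmaplt.plot(file if file else fplot_current)
    # turn interactive mode off so that show() blocks and the window stays open
    cmaplt.pylab.ioff()
    cmaplt.pylab.show()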

The same applies to simple.py, so I'll commit this change.

beniz · 2015-02-23T07:41:49Z

The legacy format generator and conversion tools are now in the 'dev' branch, ready for the next release. I've tested again against your new cmaplt.py and it works just fine.

I've added a python/README.legacy file with short explanations on how to plot in legacy format. I believe this is mostly useful for comparison with other existing implementations and because the graphs are a bit more informative and beautiful.

Unless there are other details, I believe this ticket can be considered fulfilled for now.

beniz added enhancement python labels Jan 30, 2015

beniz mentioned this issue Jan 30, 2015

Efficiency of surrogates for higher dim #106

Closed

beniz mentioned this issue Feb 12, 2015

simple workflow under python #116

Open

beniz added the in progress label Feb 13, 2015

beniz pushed a commit that referenced this issue Feb 13, 2015

added support for a 'full' output function to file, in order to suppo…

736a179

…rt the legacy format to plotting CMA-ES results + worst candidate, ref #110

beniz pushed a commit that referenced this issue Feb 13, 2015

fixed wrapper override of the plotting function when full output is r…

b40b168

…equested, ref #110

beniz self-assigned this Feb 14, 2015

beniz pushed a commit that referenced this issue Feb 16, 2015

added legacy plotting conversion script: converts from one to multipl…

1702158

…e files as required by legacy format, ref #110

beniz pushed a commit that referenced this issue Feb 16, 2015

removed debug in legacy conversion script, ref #110

425bbd1

beniz pushed a commit that referenced this issue Feb 16, 2015

allow to separate date from other values in header of full (legacy) p…

440814b

…lot output stream, ref #110

beniz pushed a commit that referenced this issue Feb 16, 2015

allowing full legacy output from Python bindings with set_full_fplot …

d63d5a5

…function, ref #110

beniz pushed a commit that referenced this issue Feb 17, 2015

default candidate fvalue is now NaN until an objective function value…

107b0b3

… is computed, ref #110

beniz pushed a commit that referenced this issue Feb 17, 2015

initial condition number value is now 1 in plotting output, ref #110

1def7a6

beniz pushed a commit that referenced this issue Feb 23, 2015

working around the pyplot.ion() oddity that flickers the plotting win…

486837f

…dow when plotting from Python, ref #110

beniz pushed a commit that referenced this issue Feb 23, 2015

added README.legacy with explanations on how to plot with legacy form…

1a8ac08

…at for CMA-ES, ref #110

andrewsali pushed a commit to andrewsali/libcmaes that referenced this issue Jan 31, 2016

added support for a 'full' output function to file, in order to suppo…

da43049

…rt the legacy format to plotting CMA-ES results + worst candidate, ref CMA-ES#110

andrewsali pushed a commit to andrewsali/libcmaes that referenced this issue Jan 31, 2016

fixed wrapper override of the plotting function when full output is r…

4723ef5

…equested, ref CMA-ES#110

andrewsali pushed a commit to andrewsali/libcmaes that referenced this issue Jan 31, 2016

allow to separate date from other values in header of full (legacy) p…

913a222

…lot output stream, ref CMA-ES#110

andrewsali pushed a commit to andrewsali/libcmaes that referenced this issue Jan 31, 2016

default candidate fvalue is now NaN until an objective function value…

8523e2b

… is computed, ref CMA-ES#110

andrewsali pushed a commit to andrewsali/libcmaes that referenced this issue Jan 31, 2016

initial condition number value is now 1 in plotting output, ref CMA-E…

747babf

…S#110
