refactor output stream and / or python plotting script, with conversion format for other plotting scripts outside the lib #110
Comments
I volunteer to move the plotting functionality from the |
Thanks! This may also be the right moment to decide on a standard format for CMA output. What, typically, is the reason for using one file per subfigure, as you described in #106 (comment)? At the moment, the multiple-file design does not fit well with the libcmaes output model. The lib allows for custom and distinct progress and output functions, with defaults provided anyway. The output function writes to a single output stream, and in terms of design and performance I'd favor keeping it this way. But there may be other elements to consider as well.
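To make the single-stream model concrete, here is a minimal Python sketch of an output function that writes one whitespace-separated line per iteration to a single stream, plus a reader for it. The column layout (iteration, evaluations, sigma, best f-value, then the mean vector) and the names `write_iteration` and `read_iterations` are hypothetical; this is not the actual libcmaes output format or API.

```python
import io
from typing import Iterator, List, Sequence, TextIO, Tuple

Header = Tuple[int, int, float, float]

def write_iteration(stream: TextIO, iteration: int, evaluations: int,
                    sigma: float, fbest: float, xmean: Sequence[float]) -> None:
    """Append one record as a single line (hypothetical layout):
    iteration evaluations sigma fbest xmean[0] ... xmean[n-1]
    """
    fields = [str(iteration), str(evaluations), repr(sigma), repr(fbest)]
    fields.extend(repr(x) for x in xmean)  # repr() round-trips floats exactly
    stream.write(" ".join(fields) + "\n")

def read_iterations(stream: TextIO) -> Iterator[Tuple[Header, List[float]]]:
    """Parse the lines written above back into (header, xmean) pairs."""
    for line in stream:
        tokens = line.split()
        if not tokens:
            continue  # tolerate blank lines
        header = (int(tokens[0]), int(tokens[1]),
                  float(tokens[2]), float(tokens[3]))
        # Everything after the four fixed columns is the mean vector.
        yield header, [float(t) for t in tokens[4:]]

if __name__ == "__main__":
    buf = io.StringIO()
    write_iteration(buf, 1, 10, 0.5, 3.25, [0.1, -0.2])
    print(buf.getvalue(), end="")  # 1 10 0.5 3.25 0.1 -0.2
```

The reader only knows where the vector starts because the four leading columns are fixed; putting a second vector-valued quantity on the same line would require an extra layout convention, which is the trade-off against the many-files format discussed in this thread.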
I am open to any suggestions about the best format. The many-files format is, I agree, a little ugly, but it makes reading in vectors of unknown length simple, in particular when future extensions cannot be ruled out (for example, I recently added an output file for the eigenvalues of the correlation matrix). Otherwise, one needs a syntax or special format to discover the dimension and possibly identify groups. I am a little stuck with the described format, because all of my code complies with it (5 implementations of CMA-ES and 3 implementations of plotting the data). I am not likely to change all eight implementations without a very good reason. It should be simple, though, to write a one-to-many-files transformer and/or one going the other way.
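Both points can be sketched in a few lines of Python. Reading a one-quantity-per-file layout needs no declared dimension, because each row's length is simply whatever the line contains, and a one-to-many transformer only has to know the single-stream layout. The '%' header convention, the single-stream layout (iteration, evaluations, sigma, best f-value, then the mean vector) and the 'fit'/'xmean' keys are assumptions for illustration, not the exact legacy format.

```python
import io
from typing import Dict, List, TextIO

def read_quantity_file(stream: TextIO, comment: str = "%") -> List[List[float]]:
    """Read one quantity per file: every non-comment line is one row of floats.

    The row length is never declared; it is inferred from each line, so a new
    file (e.g. for eigenvalues) can be added without changing the reader.
    """
    rows = []
    for line in stream:
        stripped = line.strip()
        if not stripped or stripped.startswith(comment):
            continue  # skip blank lines and header/comment lines
        rows.append([float(tok) for tok in stripped.split()])
    return rows

def split_to_quantity_files(single_stream: TextIO) -> Dict[str, List[str]]:
    """One-to-many: split hypothetical 'iteration evaluations sigma fbest x...'
    lines into one list of lines per output file.

    Each output line keeps iteration and evaluations as its first two columns,
    so every per-quantity file can still be read on its own.
    """
    files: Dict[str, List[str]] = {"fit": [], "xmean": []}
    for line in single_stream:
        tokens = line.split()
        if not tokens:
            continue
        files["fit"].append(" ".join(tokens[:4]))
        files["xmean"].append(" ".join(tokens[:2] + tokens[4:]))
    return files

if __name__ == "__main__":
    files = split_to_quantity_files(io.StringIO("1 10 0.5 3.25 0.1 -0.2\n"))
    xmean_file = io.StringIO("% iteration evaluations xmean\n"
                             + "\n".join(files["xmean"]) + "\n")
    print(read_quantity_file(xmean_file))
```

The many-to-one direction is the mirror image: join the rows of each file that share the same iteration number.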
Understood. It is possible to describe the format in a generic, higher-level language and have it parsed the way we want, into one or many files and back into memory. One widespread tool for defining structured formats across most platforms and languages is 'protocol buffers', https://code.google.com/p/protobuf/ (and the Python tutorial https://developers.google.com/protocol-buffers/docs/pythontutorial). Format descriptions are independent of language and platform, and provide objects to be filled in memory and written to file. The format can evolve, and accommodates optional as well as new variables and structures without breaking compatibility. I am very familiar with protocol buffers and can make a format description proposal that would match the legacy one (yours) while retaining the ability to choose between one or more files, as well as to re-acquire the data without having to worry about the number of columns, etc. In short, the description for the first few columns of you
One drawback is that in serialized (and possibly compressed) form the data file would no longer be human-readable, though some implementations do support writing raw data. The serialized form would, however, be an advantage in high dimensions.
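To get a feel for the high-dimensional trade-off, one can compare the size of a single vector written as text with the same vector packed as 8-byte doubles. The sketch below uses the standard-library `struct` module as a stand-in for a binary serializer (it is not protocol buffers), and the text precision `%.15e` is an assumption; the ratio changes with the number of printed digits.

```python
import struct
from typing import Sequence

def text_size(values: Sequence[float], fmt: str = "%.15e") -> int:
    """Bytes needed to store the vector as one human-readable text line."""
    return len((" ".join(fmt % v for v in values) + "\n").encode("ascii"))

def binary_size(values: Sequence[float]) -> int:
    """Bytes needed to store the vector as packed little-endian doubles."""
    return len(struct.pack("<%dd" % len(values), *values))

if __name__ == "__main__":
    x = [0.001 * i - 0.5 for i in range(1000)]  # one 1000-dimensional vector
    print("text:", text_size(x), "bytes; binary:", binary_size(x), "bytes")
```

Text at full double precision costs roughly 22 bytes per coordinate versus 8 in binary, which is why the serialized form pays off as the dimension grows, at the price of readability.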
How do you describe a field with a variable length / number of data entries?
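In protocol buffers, a vector of unknown length is declared as a `repeated` field (for instance `repeated double xmean = 1;`); proto3 packs repeated scalar numeric fields by default, while proto2 needs `[packed=true]`. On the wire, a packed field is a key, a varint giving the payload length in bytes, and then the concatenated values, which is how a reader recovers the length without knowing the dimension in advance. The hand-rolled Python sketch below reproduces that encoding with the standard library only, for illustration; real code would use the classes generated by `protoc`.

```python
import struct
from typing import List, Sequence, Tuple

WIRE_TYPE_LEN = 2  # protobuf wire type for length-delimited data

def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (low 7 bits first)."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def decode_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a varint starting at buf[pos]; return (value, next position)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

def encode_packed_doubles(field_number: int, values: Sequence[float]) -> bytes:
    """Key, byte length, then the values as little-endian 8-byte doubles."""
    payload = b"".join(struct.pack("<d", v) for v in values)
    key = encode_varint((field_number << 3) | WIRE_TYPE_LEN)
    return key + encode_varint(len(payload)) + payload

def decode_packed_doubles(buf: bytes) -> Tuple[int, List[float]]:
    """Inverse of encode_packed_doubles for a buffer holding one such field."""
    key, pos = decode_varint(buf, 0)
    field_number, wire_type = key >> 3, key & 0x7
    if wire_type != WIRE_TYPE_LEN:
        raise ValueError("expected a length-delimited field")
    length, pos = decode_varint(buf, pos)
    payload = buf[pos:pos + length]
    return field_number, [v for (v,) in struct.iter_unpack("<d", payload)]
```

The dimension is never stored explicitly: the reader divides the byte length by 8.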
I am generally not really in favor of writing encoded/compressed data. Having a quick look at the data file is just too common a use case.
Totally agreed; this is one thing I'd need to check for each implementation. Also, I am not saying we must go down this road; it is just a proposal at this stage.
The |
Below is a first proposal for an extendable output format based on protocol buffers. The following decisions & assumptions apply:
Here is the format proposal:
and an example of a custom extension:
Besides discussion, corrections and improvements, a next step could be for me to open a new, independent git repository with support for the output format: protocol buffers, with Python and C++ routines for using the format in a typical CMA implementation.
I spotted two possible additions:
I wouldn't put Having I guess my concern about human readability remains. |
A human-readable output could be worked out for both the single-file and multiple-file formats. In that case, of course, one of the few remaining advantages of a structured format such as the one above is the clarity it brings to the code. Looking ahead, a few points could be taken into consideration in the present discussion:
There is probably no full use for this in the near future, but I am considering ways in which it could become a building block for later use.
The extensible output format comes with some difficulties, one of which is retaining the ability to serialize it to disk incrementally, i.e. without keeping the full history object in memory. I believe this is worth having in the mid-term, but right now there is enough to do without introducing such a big piece of code. I'd also like to ship the series of bug fixes as a new release. Therefore, I am implementing a first step toward resolving this issue as follows:
This should make it possible to work with the minimal Python workflow of #116. INFO: this is in the 'legacy_106' branch (106 is a mistake; it should have been called 116 or 110).
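Incremental serialization, as mentioned above, amounts to writing each iteration as a self-delimiting record as soon as it is produced, so that neither the writer nor a reader ever needs the full history in memory. Protocol buffers supports this pattern with length-delimited messages (the Java API, for instance, provides `writeDelimitedTo` and `parseDelimitedFrom`). The Python sketch below swaps in a simpler framing, a 4-byte little-endian length prefix followed by packed doubles, purely for illustration; `append_record` and `iter_records` are hypothetical names.

```python
import io
import struct
from typing import BinaryIO, Iterator, List, Sequence

_LEN = struct.Struct("<I")  # 4-byte little-endian record length prefix

def append_record(stream: BinaryIO, values: Sequence[float]) -> None:
    """Write one iteration's values as a length-prefixed record and flush."""
    payload = struct.pack("<%dd" % len(values), *values)
    stream.write(_LEN.pack(len(payload)))
    stream.write(payload)
    stream.flush()  # make the record visible to readers right away

def iter_records(stream: BinaryIO) -> Iterator[List[float]]:
    """Yield records one at a time; a truncated trailing record is skipped."""
    while True:
        header = stream.read(_LEN.size)
        if len(header) < _LEN.size:
            return  # end of data, or a length prefix still being written
        (length,) = _LEN.unpack(header)
        payload = stream.read(length)
        if len(payload) < length:
            return  # record still being written by the optimizer
        yield list(struct.unpack("<%dd" % (length // 8), payload))

if __name__ == "__main__":
    buf = io.BytesIO()
    append_record(buf, [1.0, 2.0])
    append_record(buf, [3.0, 4.0, 5.0])  # records may differ in length
    buf.seek(0)
    for record in iter_records(buf):
        print(record)
```

Because each record carries its own length, vectors of different sizes can share one stream, and a reader can stop at the last complete record of a file that is still growing.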
…rt the legacy format to plotting CMA-ES results + worst candidate, ref #110
I assume this doesn't prevent us from monitoring a run online, right? In practice, this is what I always do, unless the objective function is extremely cheap (which is virtually never the case). In general, if the output format does not allow incremental writing, I doubt that it will ever meet the performance objectives you have for the library.
Yes, this is correct: the output to multiple files will not be incremental for now, which precludes online monitoring with the legacy plotting functions. This will remain so until the full generic format is implemented with incremental serialization to disk. The reason I switched to an easier, immediate solution yesterday is that, for incremental serialization to function properly, the format above needs a full 'line-based' refactoring, which would complicate the plotting code as well. I still believe this is the way to go in the future, but not immediately, as I need to focus on more important tasks, such as the profile likelihood in eigenspace. My immediate target is the simple workflow in Python along with (offline) legacy plotting capability, so that results can be compared more easily across implementations. Let me know if you believe this is not a good intermediate decision.
But if I understand correctly, that prevents online monitoring altogether, e.g. of a remote job?
That depends on whether I can get the one-to-multiple-files conversion script to work on partially filled output. I should be able to, though.
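One way to let a text-based conversion run on partially filled output is to consume only complete, newline-terminated lines and remember the byte offset reached, so the converter can be called again later while the optimizer keeps appending. A minimal Python sketch, assuming one record per line; `read_complete_lines` is a hypothetical helper, not part of the conversion tool in the 'legacy_106' branch.

```python
import io
from typing import BinaryIO, List, Tuple

def read_complete_lines(stream: BinaryIO, offset: int = 0) -> Tuple[List[str], int]:
    """Return the lines completed since `offset`, and the new offset.

    A trailing line without a newline is left for the next call, because the
    writer may still be in the middle of it.
    """
    stream.seek(offset)
    chunk = stream.read()
    end = chunk.rfind(b"\n")
    if end < 0:
        return [], offset  # no complete line yet
    lines = chunk[: end + 1].decode("utf-8").splitlines()
    return lines, offset + end + 1

if __name__ == "__main__":
    buf = io.BytesIO(b"1 10 0.5\n2 20 0.4\n3 30")  # last line still being written
    lines, pos = read_complete_lines(buf)
    print(lines, pos)
    buf.seek(0, io.SEEK_END)
    buf.write(b".3\n")  # the writer finishes the line
    print(read_complete_lines(buf, pos))
```

A monitoring loop would open the output file in binary mode, call the helper periodically with the last returned offset, and append the new lines to the converted files.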
…e files as required by legacy format, ref #110
Added a legacy format conversion tool to branch 'legacy_106', as
where
At the moment, I am able to plot from the converted files with the

```python
import cma
cma.plot()
```

which yields the following error:
For some reason I have no |
The only logical explanation would be that you are missing |
Right, how can I reproduce the problem then? If you can provide |
Now, you can do it with python with |
OK, the reason for the failure is that the first f-value is NaN. I will prepare a fix and also fix the runtime warnings.
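A common defensive pattern on the plotting side is to drop non-finite f-values before taking logarithms, so that a single NaN (such as the first f-value here) cannot break the whole plot. A minimal sketch, assuming a plot of `log10(f - fmin + eps)`; the function name and the `eps` offset are hypothetical, and this is not necessarily how the actual fix works.

```python
import math
from typing import List, Sequence, Tuple

def finite_log_series(iterations: Sequence[int], fvalues: Sequence[float],
                      eps: float = 1e-20) -> Tuple[List[int], List[float]]:
    """Return (iterations, log10(f - fmin + eps)) with NaN/inf entries removed.

    eps keeps the best value itself (f == fmin) finite on a log scale.
    """
    pairs = [(i, f) for i, f in zip(iterations, fvalues) if math.isfinite(f)]
    if not pairs:
        return [], []  # nothing plottable yet
    fmin = min(f for _, f in pairs)
    return ([i for i, _ in pairs],
            [math.log10(f - fmin + eps) for _, f in pairs])
```

Filtering keeps the iteration numbers aligned with the remaining values, so the x-axis stays correct even when the first entries are dropped.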
The median and largest f-values are both 0.0 in iteration 0, which I would consider a semi-bug.
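Instead of defaulting to 0.0, the iteration-0 statistics can be computed from the f-values the initial population actually produced. A minimal sketch with Python's `statistics` module; `population_stats` is a hypothetical name, and returning NaN when no finite value is available is an assumption of this sketch.

```python
import math
import statistics
from typing import Sequence, Tuple

def population_stats(fvalues: Sequence[float]) -> Tuple[float, float, float]:
    """Return (best, median, worst) over the finite f-values of one population."""
    finite = [f for f in fvalues if math.isfinite(f)]
    if not finite:
        nan = float("nan")
        return nan, nan, nan  # explicitly unknown rather than a fake 0.0
    return min(finite), statistics.median(finite), max(finite)
```

Reporting NaN for an empty or fully non-finite population makes the missing information explicit, which pairs naturally with a plotting step that skips non-finite values.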
On the contrary, the axis ratio in iteration 0 could rather be |
A fix for plotting with Python is available here.
The two commits above fix the initial values of the median, the worst f-value and the condition number. I've tested the new plotting script and it works just fine, though the plot still disappears too quickly to be seen. Here is what I do:

```python
import cmaplt
cmaplt.plot()
```

In order to see the plot, I remove the |
I can't reproduce this and don't quite understand why this is the case :-( Does |
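So, in

```python
# assumes cmaplt is imported and fplot_current is defined in the enclosing script
def plot(file=None):
    cmaplt.plot(file if file else fplot_current)  # default to the current output file
    cmaplt.pylab.ioff()  # leave interactive mode so that show() blocks
    cmaplt.pylab.show()  # keep the window open until it is closed manually
```

The same applies to |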
…dow when plotting from Python, ref #110
The legacy format generator and conversion tools are now in the 'dev' branch, ready for the next release. I've tested again against your new cmaplt.py and it works just fine. I've added a Unless there are other details, I believe this ticket can be considered resolved for now.
…rt the legacy format to plotting CMA-ES results + worst candidate, ref CMA-ES#110
…lot output stream, ref CMA-ES#110
A few issues:
EDIT: relevant comment, #106 (comment)