Add parsable grid data output #52

Open
wants to merge 30 commits into base: develop

Conversation

spencerharmon
Contributor

The purpose of this patch is to add an option to output the grid state in JSON format each epoch, alongside the .mfs save files. This output includes a string with data member names and values.
I've built a bit of tooling around this patch, and I'm convinced it's a workable solution for easily analyzing logical (non/extra-spatial?) relationships between atoms, and changes to systems of atoms within the MFM, over time.
There are some improvements that could be made to this feature:

1. It doesn't output base-layer state, which would be nice to have.
2. Data member names and values are output as a non-JSON string that requires further parsing. I've written a Python parser, mfm-griddata-parser, to convert this into standard JSON.
3. Very long strings representing data member names and values are truncated, and some data is lost.
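
For orientation, here is a rough sketch of the per-epoch dump this patch performs: walk the grid and, for each non-empty site, emit one record built from the element and the atom. Only GetWritableAtom, LookupElement, and AsUlamElement come from the patch itself (they appear in the review snippets below); the loop bounds, the skip test, and the EmitSiteJson helper are stand-ins for illustration.

// Sketch only -- not the literal patch code. Walks every site once and emits a
// JSON record per non-empty atom. FileByteSink, SPoint, and u32 are MFM types;
// the size accessors and EmitSiteJson are assumptions made for this sketch.
template <class GC>
static void DumpGridJson(Grid<GC>& grid, FileByteSink& fs)
{
  typedef typename GC::EVENT_CONFIG EC;
  typedef typename Grid<GC>::T T;                        // the grid's atom type

  fs.Printf("{\"event_layer_atoms\":[");
  bool first = true;
  for (u32 y = 0; y < grid.GetHeightSites(); ++y) {      // assumed accessor
    for (u32 x = 0; x < grid.GetWidthSites(); ++x) {     // assumed accessor
      const SPoint siteInGrid(x, y);
      T* atom = grid.GetWritableAtom(siteInGrid);        // as used in the patch
      if (!atom) continue;
      const Element<EC>* e = grid.LookupElement(atom->GetType());
      if (!e) continue;                                   // skip unknown/empty sites
      if (!first) fs.Printf(",");
      first = false;
      EmitSiteJson(fs, *e, *atom, siteInGrid);            // hypothetical helper
    }
  }
  fs.Printf("]}\n");
}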

//for some reason, grid.GetAtomInSite doesn't work. Maybe because it's constant? Optimizing happening?
//for some reason, grid.GetAtomInSite doesn't do what we need. Maybe because it's constant? Optimizing happening?
//todo: figure out why that doesn't work. GetAtomInSite has an option to get the base layer, which would
//be nice to have.
T* atom = grid.GetWritableAtom(siteInGrid);

Owner

Yeah, I'd think 'const T* atom = grid.GetAtomInSite(getFromBase, siteInGrid);' What goes wrong?

Contributor Author

I'll check this out tonight. I made that change initially back in November, so it escapes me what happens in that case. I'd love to have another option to export the base layer or append it in a separate list in the same document.

Contributor Author

I didn't get a chance to check on this tonight after all, but I'll check it out tomorrow.

Contributor Author

Finally got to check this out. You're right: GetAtomInSite works as expected. Latest commit adds base layer to JSON output!
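
For the record, a minimal sketch of the const accessor form suggested above, reading one site from both layers; everything beyond the quoted call (grid.GetAtomInSite(getFromBase, siteInGrid)) is an assumption here.

// Sketch: same site, both layers, via the const accessor (no GetWritableAtom).
// The bool-first parameter order follows the call quoted earlier in this thread.
const T* eventAtom = grid.GetAtomInSite(false, siteInGrid);  // event layer
const T* baseAtom  = grid.GetAtomInSite(true,  siteInGrid);  // base layer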

const Element<EC> * e = grid.LookupElement(t);
const UlamElement<EC> * uelt = e->AsUlamElement();

//todo: are there any alternatives to the string in this buffer, which contains the names of the data members and their values at the current epoch?

Owner

I'd consider adding like
PRINT_FORMAT_JSON = 0x00001000, //< Format data members in JSON
at UlamClass.h:71 or so (hmm and at DebugUtils.ulam:20 or so in ULAM),
then checking for PRINT_FORMAT_JSON like around UlamClass.tcc:103 and 110.
Not super easy, but it simplifies and flattens the JSON.
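
A minimal sketch of that flag as it might sit in the print-flags enum; only the PRINT_FORMAT_JSON line itself comes from the suggestion above, and the enum's name and neighboring values are elided/assumed.

// UlamClass.h (sketch, around line 71): add a JSON mode to the print flags.
enum /* existing print-flags enum */ {
  // ... existing PRINT_* flags ...
  PRINT_FORMAT_JSON = 0x00001000  //< Format data members in JSON
};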

Contributor Author

If it means nobody else has to use the Python deserializer I wrote, or implement their own, I think it's worth it. I'll see if I can pick up on what you're suggesting when I get to this tonight.
I assume the idea would be to add an additional method so as not to mess up the display of data member values in the mfms GUI; is that the right approach?

Contributor Author

I think the new commits are in the spirit of this. I added the flag and a condition for it in UlamClass.tcc. It could be prettier, since I repeated most of what's already there; I can clean that up, but the resulting output looks closer to what I was hoping for in the first place.
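
Roughly, the condition in UlamClass.tcc then branches on the new flag while walking the data members. This is only a sketch of the shape; the member-iteration names (memberCount, MemberName, PrintMemberValue) are invented for illustration, not taken from the actual file.

// Sketch of the PRINT_FORMAT_JSON branch while printing data members.
const bool json = (flags & PRINT_FORMAT_JSON) != 0;
if (json) bs.Printf("{");
for (u32 i = 0; i < memberCount; ++i) {
  if (i > 0) bs.Printf(json ? "," : " ");
  if (json) bs.Printf("\"%s\":", MemberName(i));   // quoted key for JSON
  else      bs.Printf("%s=", MemberName(i));       // existing flat name=value style
  PrintMemberValue(bs, i, flags);                  // emit the value either way
}
if (json) bs.Printf("}");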

@@ -1244,7 +1243,7 @@ namespace MFM
"\"symbol\":\"%s\", "
"\"name\":\"%s\", "
"\"argb\":%d, "
"\"data_string\":\"%s\"}"
"\"data_members\":%s}"

Owner

If this Printf was broken into two parts -- the part up through "data_string":, and the part after the %s there -- then the OString buff could be avoided entirely. (Admittedly it was always gross.) fs.Printf(FRONT_PART); then uelt->Print(ucr,fs,*atom,ETC); then fs.Printf(BACK_PART). Passing fs to uelt->Print instead of buff.

If we're pretty confident OString2048 will do the job, given how much expansion we might get printing one atom in JSON, maybe not such an issue.
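
Concretely, the split might look like the sketch below: print the front of the record, let uelt->Print write the members straight into fs, then print the closing brace, so the OString staging buffer goes away. The symbol/name/argb arguments and the printFlags variable are placeholders for whatever the existing call site already has.

// Sketch of the two-part Printf, writing directly to the epoch's FileByteSink.
fs.Printf("{\"symbol\":\"%s\", \"name\":\"%s\", \"argb\":%d, \"data_members\":",
          symbol, name, argb);
uelt->Print(ucr, fs, *atom, printFlags);  // printFlags would include PRINT_FORMAT_JSON
fs.Printf("}");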

Contributor Author

I could begin to speculate about the max character length of the data members object, but chances are I'd be wrong.
Funnily enough, I left a todo in the latest commit about converting the event_layer_atoms and base_layer_atoms lists into char buffers (up to the number of sites in the grid times the max length of an atom long!). I didn't do it because I'd want to use one of these OverflowableCharBufferByteSink templates with a buffsize that scales with the grid size (though I imagine this might kill compile times, so it may not be worth it for that reason alone).
The problems I was hoping to solve with this approach were:

  1. to avoid excessive fopen/fclose operations by caching the whole JSON document in memory before streaming the whole thing to disk in one go. I mean, I'm running a rather large tmpfs for my /tmp, but supposing someone specifies their output to be written to, e.g. a 5500 rpm HDD, I'm wondering if that would unnecessarily increase iowait on the system running the simulation (and the frustration of the programmer using the simulator). And,
  2. so that I can grab non-empty atoms from the event layer and base layer in one pass.

I tried digging into FileByteSink to see if this problem is already solved, but I didn't get to the bottom of it.
I think maybe I'm trying to optimize too much for an unlikely scenario.

tl;dr: if you think I should drop the buffers altogether, I can pass the FileByteSink object instead.
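
For what it's worth, one possible shape if the buffers do go away (a sketch of an option, not what the patch currently does): keep a single open/close per epoch and stream each list in its own pass over the grid, so nothing is staged in OStrings. The FILE*-based FileByteSink construction and the WriteLayerJson helper are assumptions here.

// Sketch: one fopen/fclose per epoch, records streamed as they are produced.
FILE* fp = fopen(jsonPath, "w");
FileByteSink fs(fp);                    // assumed: FileByteSink wraps a FILE*
fs.Printf("{\"event_layer_atoms\":[");
WriteLayerJson(fs, grid, false);        // hypothetical helper: event-layer pass
fs.Printf("],\"base_layer_atoms\":[");
WriteLayerJson(fs, grid, true);         // hypothetical helper: base-layer pass
fs.Printf("]}\n");
fclose(fp);

That trades a second walk over the grid for never holding the whole document in memory.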

…uary, it seems. It looks like I was in the process of removing this atombuff buffer. This other data_members thing in the atom printf statement rings a bell, too. It fixed some corner case, I think. Anyway, I'm trusting myself here because I need to go troubleshoot a Makefile.
DaveAckley pushed a commit that referenced this pull request Oct 10, 2022