Rework the .hdf5/.npz model format #99

superbock · 2016-02-29T09:00:59Z

Right now the model format is suited only for RNNs (for historical reasons), but we should extend/rework it so that it can handle everything. I hereby propose the following:

we keep .hdf5/.npz as our primary formats, since this works quite well
to flatten a .hdf5 file to .npz (a flat format) we use the names (HDF5_object.name) as keys
objects and functions are stored as a group (arbitrary names are allowed)
the arguments needed to instantiate an object are stored as datasets with the name of the argument
in case a list of objects is needed as an argument, these get nested in a group again (with the argument as the group name)
to determine which class/type an object/function is, we save it as a "type" attribute
HDF5 attributes are stored with key "attrs/attribute_name" in the NPZ dictionary
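
A minimal sketch of the flattening rule proposed above, written against a plain nested dict that stands in for an HDF5 tree so it runs without h5py. The dict layout (`attrs` / `children`) and the `flatten` helper are hypothetical, and prefixing the attribute key with the group name (`<group name>/attrs/<attribute>`) is an assumption, since the proposal only gives the `attrs/attribute_name` part.

```python
# Stand-in for an HDF5 tree: a "group" is a dict with optional "attrs" and
# "children" entries; any non-dict child is treated as a dataset.
# (Hypothetical layout, used only so the sketch runs without h5py.)


def flatten(group, name=""):
    """Return a flat {key: value} dict for `group` and everything below it."""
    flat = {}
    # Attributes go under "<group name>/attrs/<attribute name>" (assumed).
    for attr, value in group.get("attrs", {}).items():
        flat[f"{name}/attrs/{attr}"] = value
    for child_name, child in group.get("children", {}).items():
        path = f"{name}/{child_name}"
        if isinstance(child, dict):
            # Nested group: recurse, using the full path as the key prefix.
            flat.update(flatten(child, path))
        else:
            # Dataset: the full path is the key.
            flat[path] = child
    return flat


tree = {
    "children": {
        "rnn": {
            "attrs": {"type": "madmom.ml.rnn.RecurrentNeuralNetwork"},
            "children": {
                "layers": {
                    "children": {
                        "hidden_layer_0": {
                            "attrs": {"type": "madmom.ml.rnn.RecurrentLayer", "id": 0},
                            "children": {"weights": [[0.1, 0.2]], "bias": [0.0]},
                        }
                    }
                }
            },
        }
    }
}

flat = flatten(tree)
for key in sorted(flat):
    print(key)
```

One consequence of this scheme is that a dataset or group literally named `attrs` would collide with the attribute keys, so that name would have to be reserved.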

Questions:

how do we store the order of the items in a list? Should we attach an "id" attribute to the group or rather encode the order in the group name (making it sortable)?
how do we determine if an instance of the class is wanted or just the class/function itself?

Example:

h5py.File
> Group: rnn
  > Attribute: type='madmom.ml.rnn.RecurrentNeuralNetwork'
  > Group: layers
    > Group: hidden_layer_0
      > Attribute: type='madmom.ml.rnn.BidirectionalLayer'
      > Attribute: id=0
      > Group: fwd_layer
        > Attribute: type='madmom.ml.rnn.RecurrentLayer'
        > Dataset: weights
        > Dataset: bias
        > Dataset: recurrent_weights
        > Group: transfer_fn
          > Attribute: type='madmom.ml.rnn.tanh'
      > Group: bwd_layer
        > Attribute: type='madmom.ml.rnn.RecurrentLayer'
        > Dataset: weights
        > Dataset: bias
        > Dataset: recurrent_weights
        > Group: transfer_fn
          > Attribute: type='madmom.ml.rnn.tanh'
    > Group: hidden_layer_1
      > Attribute: type='madmom.ml.rnn.BidirectionalLayer'
      > Attribute: id=1
      > Group: fwd_layer
        > Attribute: type='madmom.ml.rnn.RecurrentLayer'
        > Dataset: weights
        > Dataset: bias
        > Dataset: recurrent_weights
        > Group: transfer_fn
          > Attribute: type='madmom.ml.rnn.tanh'
      > Group: bwd_layer
        > Attribute: type='madmom.ml.rnn.RecurrentLayer'
        > Dataset: weights
        > Dataset: bias
        > Dataset: recurrent_weights
        > Group: transfer_fn
          > Attribute: type='madmom.ml.rnn.tanh'

fdlm · 2016-03-03T18:24:16Z

Seems fine to me. Some thoughts:

1. If a group represents a list, we will need an attribute type='list' so we can handle it appropriately.
2. In this case, the names of the nested groups do not matter (they are just items in a list) and we could use them to define the order. However, we need to keep in mind that the group names are strings, and therefore the ordering of numbers is not "natural" (e.g., '0', '1', '10', '2', ...). The creator of the model file would thus be responsible for formatting the names accordingly (e.g. '00', '01', '02', '10'). We get the sorting for free, and thus our code is simpler. However, we are dependent on the order in which h5py returns the groups, which might change (although it probably won't).
3. If we define an 'id' attribute, we would need to first go through all the groups, get the id attributes, and sort the groups accordingly. This is easy: l = sorted(g.values(), key=lambda g_: g_.attrs['id']). It seems to me that this is the clearer and more future-proof solution.
4. How do we determine if an instance of the class is wanted or just the class/function itself? By adding an attribute instantiate=False if you want the class/function itself.
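
The two ordering options can be compared in a few lines. This is only an illustration: the `groups` dict is a hypothetical stand-in for h5py groups and their attributes.

```python
# Group names are strings, so plain sorting is lexicographic, not numeric.
names = ["0", "1", "2", "10"]
print(sorted(names))  # ['0', '1', '10', '2']

# Option A: zero-padded names sort in the intended order.
padded = [f"{i:02d}" for i in (0, 1, 2, 10)]
print(sorted(padded))  # ['00', '01', '02', '10']

# Option B: arbitrary names plus an integer "id" attribute.
# Each value stands in for the attrs of one group.
groups = {
    "second": {"id": 1},
    "third": {"id": 2},
    "first": {"id": 0},
}
ordered = sorted(groups, key=lambda name: groups[name]["id"])
print(ordered)  # ['first', 'second', 'third']
```

With the id attribute, the order no longer depends on how the names were formatted or on the order in which the container returns its members.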

fdlm · 2016-03-03T19:42:59Z

Here's a prototype (seems to work, didn't check if the produced model is correct):

Code: https://gist.github.com/fdlm/b4be1190af0bfc9f2e7a
HDF5-File: https://drive.google.com/file/d/0B0gBhdh1fIPKT3BDR2NJY2JVRkU/view?usp=sharing

superbock · 2016-03-04T04:08:12Z

Regarding your points:

1. type='list' is straightforward and in line with my suggestions
2. I prefer 3) over 2) since the ids can be generated automatically when saving a model to HDF5/NPZ format
3. I'm fine with the instantiate attribute as well

Let's do it this way.
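
A rough sketch of how the agreed "type" and "instantiate" attributes could be interpreted when loading. The `resolve` helper is hypothetical, and standard-library names stand in for the madmom classes so the snippet runs on its own.

```python
import importlib


def resolve(type_name, instantiate=True, **kwargs):
    """Import the object named by a dotted `type_name`.

    If `instantiate` is true, call it with `kwargs` (the stored arguments);
    otherwise return the class/function itself.
    """
    module_name, _, attr = type_name.rpartition(".")
    obj = getattr(importlib.import_module(module_name), attr)
    return obj(**kwargs) if instantiate else obj


# Stand-in for an object group: type plus datasets used as arguments.
fraction = resolve("fractions.Fraction", numerator=1, denominator=2)

# Stand-in for a transfer function group with instantiate=False.
tanh = resolve("math.tanh", instantiate=False)

print(fraction, tanh(0.0))
```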

superbock · 2016-03-06T13:34:04Z

As outlined in #102, I think the right way of doing this is to add the loading functionality to the Processor class and adapt the load() method to be able to handle not only pickled but also .hdf5 and .npz files -- preferably by adding a dedicated method for each format and load() just acting as a wrapper.

The dump() method should be adapted to be able to save the processor in the desired format.

Anything I have missed?
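
The wrapper idea could look roughly like this. `ModelIO` and its method names are hypothetical and are not madmom's actual Processor API; only the pickle branch is implemented, and the other two formats are left as placeholders.

```python
import os
import pickle
import tempfile


class ModelIO:
    """Hypothetical stand-in showing load() as a thin dispatcher."""

    @staticmethod
    def load(path):
        # Pick the format-specific loader from the file extension.
        loaders = {
            ".pkl": ModelIO.load_pickle,
            ".hdf5": ModelIO.load_hdf5,
            ".npz": ModelIO.load_npz,
        }
        ext = os.path.splitext(path)[1].lower()
        if ext not in loaders:
            raise ValueError(f"unsupported model format: {ext!r}")
        return loaders[ext](path)

    @staticmethod
    def load_pickle(path):
        with open(path, "rb") as f:
            return pickle.load(f)

    @staticmethod
    def load_hdf5(path):
        raise NotImplementedError("placeholder: rebuild the object from HDF5 groups")

    @staticmethod
    def load_npz(path):
        raise NotImplementedError("placeholder: rebuild the object from flat NPZ keys")

    @staticmethod
    def dump(obj, path):
        # Only the pickle format is sketched here.
        with open(path, "wb") as f:
            pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)


model = {"weights": [0.1, 0.2], "bias": [0.0]}
with tempfile.TemporaryDirectory() as tmp:
    model_path = os.path.join(tmp, "model.pkl")
    ModelIO.dump(model, model_path)
    restored = ModelIO.load(model_path)
print(restored == model)
```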

superbock · 2016-03-07T16:16:25Z

The whole thing is basically a reimplementation of pickle, so I propose to just use pickle and we're done.
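
To illustrate the point, pickle already covers the cases discussed in this thread: list order, the type of each object, and the distinction between an instance and the class/function itself. The snippet below is a small demonstration using standard-library objects as stand-ins for model components.

```python
import math
import pickle
from collections import OrderedDict

model = {
    # List order is preserved without ids or sortable names.
    "layers": [
        {"name": "fwd", "weights": [0.1, 0.2]},
        {"name": "bwd", "weights": [0.3, 0.4]},
    ],
    # A function is stored by reference (module and name), not instantiated.
    "transfer_fn": math.tanh,
    # A class itself versus an instance of it.
    "container_cls": OrderedDict,
    "container": OrderedDict(a=1),
}

restored = pickle.loads(pickle.dumps(model))

print([layer["name"] for layer in restored["layers"]])
print(restored["transfer_fn"] is math.tanh)
print(restored["container_cls"] is OrderedDict)
print(isinstance(restored["container"], OrderedDict))
```

The usual caveat applies: unpickling can execute arbitrary code, so pickled models should only be loaded from trusted sources.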

Refactor the neural network stuff into ml.nn. Additionally, the models are simple pickles now; fixes #99.

superbock added this to the v0.14 milestone Feb 29, 2016

This was referenced Mar 5, 2016

Refactor NN and model io #101

Closed

Load functions for new model format #102

Closed

superbock mentioned this issue Mar 9, 2016

Refactor the neural network stuff into ml.nn #110

Merged

superbock closed this as completed in #110 Mar 9, 2016

superbock pushed a commit that referenced this issue Mar 9, 2016

Merge pull request #110 from CPJKU/refactor_ml_nn

04f3d4d

Refactor the neural network stuff into ml.nn. Additionally, the models are simple pickles now; fixes #99.
