Update API in Suitcase #62

New file: design/suitcase-preprocess-data-binning.rst

=================
Suitcase Update
=================

Summary
=======


Suitcase is a simple utility for exporting data from the databroker
into a stand-alone, portable file, such as an HDF5 file.


Current Implementation
======================

One of the functions currently provided is ``export``. Its main inputs
are a header, a filename, and a databroker instance, and it writes an
HDF5 file that preserves the structure of the data in the databroker.

.. code-block:: python

    from suitcase import hdf5

    last_run = db[-1]  # get header from databroker
    hdf5.export(last_run, 'myfile.h5', db=db)

The first argument may be a single Header or a list of Headers. You
can also pass the ``fields`` keyword to ``export`` to specify exactly
which data sets should be written.

.. code-block:: python

    from suitcase import hdf5

    hdr = db[123]
    un_wanted_fields = ['A', 'B', 'C']
    fds = hdf5.filter_fields(hdr, un_wanted_fields)
    filename = 'scanID_123.h5'
    hdf5.export(hdr, filename, db=db, fields=fds)

Here A, B, and C are assumed to be keys for vector data, such as
images, which we list in ``un_wanted_fields``. If all vector data are
filtered out, saving only the scalar data and header information
should be much faster. Choose a descriptive filename so that you can
tell which run the file came from.

Issues and Proposed Solutions
=============================

Easily support many formats
---------------------------

Currently each file format has to implement ``export`` independently.
This duplicates the logic for turning a header into a document stream,
and it would require every file format to separately implement the
in-line processing described below.

At the top level we should have an export function with a signature

.. code-block:: python

    from typing import (Any, Callable, Dict, Generator, Iterable, List,
                        Optional, Tuple, Union)

    DocPair = Tuple[str, dict]  # a (name, document) pair

    def export(headers: List[Header],
               format: Union[str, Callable[[Generator[DocPair, None, None]], None]],
               format_kwargs: Optional[Dict[str, Any]] = None,
               stream_name: Optional[Union[str, Iterable[str]]] = None,
               fields: Optional[Iterable[str]] = None,
               timestamps: bool = True,
               filters: Optional[Callable[[str, dict], DocPair]] = None):
        ...

Issues:

1. Where to inject the file name?

   - a zipped list of names?
   - a single name?
   - a filename template? (see the sketch below)
   - leave it up to the format / consumer?
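
As one illustration, the filename-template option might look like the
sketch below; ``filename_template`` is a made-up keyword and the
per-header expansion is only one possible behavior.

.. code-block:: python

    # 'filename_template' is an assumed keyword, not an existing API.
    export(headers,
           format='hdf5',
           format_kwargs={'filename_template': '{scan_id}_{uid:.8}.h5'})

    # Internally, export might expand the template once per header, so
    # that exporting several headers writes one file per run:
    for h in headers:
        filename = '{scan_id}_{uid:.8}.h5'.format(scan_id=h.start['scan_id'],
                                                  uid=h.start['uid'])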


In-line processing
------------------

Users want to bin some of the data sets in-line, e.g. reducing data of
shape (100, 100) to (50, 50). This means we need to change both the
data in the events and the data shape information in the descriptor.
Several possible solutions follow.
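
The binning operation itself is small; here is a minimal NumPy sketch
(``rebin`` is our own illustrative helper, reused in the solutions
below):

.. code-block:: python

    import numpy as np

    def rebin(data, n):
        # Average n-by-n blocks, shrinking each dimension by a factor of n.
        # Assumes both dimensions are divisible by n.
        data = np.asarray(data)
        h, w = data.shape
        return data.reshape(h // n, n, w // n, n).mean(axis=(1, 3))

    small = rebin(np.ones((100, 100)), 2)  # shape (50, 50)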

solution 1: decorator
~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    def make_rebinner(n, field):
        def rebinner(name, doc):
            if name == 'descriptor':
                # shrink the recorded shape of the binned field
                shape = doc['data_keys'][field]['shape']
                doc['data_keys'][field]['shape'] = [s // n for s in shape]
            elif name == 'event':
                # rebin the image data (rebin helper sketched above)
                doc['data'][field] = rebin(doc['data'][field], n)
            return doc
        return rebinner

    hdf5.export(last_run, 'myfile.h5', db=db, filter=make_rebinner(3, 'a'))



solution 2: partial function
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    from functools import partial

    def rebinner(n, field, name, doc):
        # same logic as in solution 1, but with n and field as leading
        # arguments so they can be bound with functools.partial
        ...

    make_rebinner = partial(rebinner, 3, 'a')

    hdf5.export(last_run, 'myfile.h5', db=db, filter=make_rebinner)


solution 3: use class
~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    class ReBinner:

        def __init__(self, n, field):
            self.n = n
            self.field = field

        def __call__(self, name, doc):
            ...  # same dispatch-on-name logic as in solution 1

    hdf5.export(last_run, 'myfile.h5', db=db, filter=ReBinner(3, 'a'))

We could reuse a base class from bluesky here, e.g.
``bluesky.callbacks.CallbackBase``, as sketched below.
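
A sketch under the assumption that ``CallbackBase.__call__`` dispatches
each ``(name, doc)`` pair to a method named after the document type, so
only the methods we care about need overriding:

.. code-block:: python

    from bluesky.callbacks import CallbackBase

    class ReBinner(CallbackBase):

        def __init__(self, n, field):
            self.n = n
            self.field = field

        def descriptor(self, doc):
            # shrink the recorded shape of the binned field
            shape = doc['data_keys'][self.field]['shape']
            doc['data_keys'][self.field]['shape'] = [s // self.n for s in shape]
            return doc

        def event(self, doc):
            # rebin the image data (rebin helper sketched above)
            doc['data'][self.field] = rebin(doc['data'][self.field], self.n)
            return doc

One caveat: the inherited methods for the other document types return
``None``, so a real filter built this way would also need to pass those
documents through unchanged.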

solution 4: based on original export function
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    # The filter is applied internally as filter(name, doc, **filter_kwargs),
    # so here rebinner would have the signature rebinner(name, doc, n, field).
    hdf5.export(last_run, 'myfile.h5', db=db,
                filter=rebinner, filter_kwargs={'n': 3, 'field': 'a'})
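
Whichever variant is chosen, the wiring inside ``export`` could look
the same. A sketch, assuming ``header.documents()`` yields
``(name, doc)`` pairs and with ``writer`` as a stand-in for the
format-specific consumer:

.. code-block:: python

    def _apply_filter(header, writer, filter=None, filter_kwargs=None):
        # Hypothetical internals: pass every document through the filter
        # (if any) before handing it to the format-specific writer.
        filter_kwargs = filter_kwargs or {}
        for name, doc in header.documents():
            if filter is not None:
                doc = filter(name, doc, **filter_kwargs)
            writer(name, doc)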