From c6909d869df161f7530c480f0bbbbebe8109edcf Mon Sep 17 00:00:00 2001
From: licode
Date: Wed, 26 Apr 2017 10:30:57 -0400
Subject: [PATCH 1/5] DEV: init of suitcase redesign

---
 design/suitcase-update.rst | 38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)
 create mode 100644 design/suitcase-update.rst

diff --git a/design/suitcase-update.rst b/design/suitcase-update.rst
new file mode 100644
index 0000000..e863b32
--- /dev/null
+++ b/design/suitcase-update.rst
@@ -0,0 +1,38 @@
+===============
+Suitcase Update
+===============
+
+Summary
+=======
+Suitcase is a simple utility for exporting data from the databroker into a stand-alone, portable file, such as an HDF5 file.
+
+
+Current Implementation
+======================
+One of the functions currently used is called "export", which mainly takes header, filename and metadatastore as input and
+outputs an h5 file with the same structure as the data in databroker.
+
+Usage Example
+-------------
+
+.. code-block:: python
+
+    from suitcase import hdf5
+    last_run = db[-1]  # get header from databroker
+    hdf5.export(last_run, 'myfile.h5', mds=db.mds)
+
+The first argument may be a single Header or a list of Headers. You can also use the keyword "fields"
+in the "export" function to specify which data sets you want to output.
+
+.. code-block:: python
+
+    from suitcase import hdf5
+    hdr = db[123]
+    un_wanted_fields = ['A', 'B', 'C']
+    fds = hdf5.filter_fields(hdr, un_wanted_fields)
+    filename = 'scanID_123.h5'
+    hdf5.export(hdr, filename, mds=db.mds, fields=fds)
+
+Here I assume A, B and C are keys for some vector data, like images. You can list them as un_wanted_fields.
+If all vector data are excluded, saving only the scalar data and header information should be much faster.
+Please also choose a clear filename, so you know which data it comes from.
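As a minimal sketch of the behavior described above (hypothetical code, not suitcase's actual implementation), ``filter_fields`` effectively keeps every data key that is not listed as unwanted:

```python
def filter_fields_sketch(all_fields, un_wanted_fields):
    # Hypothetical stand-in for hdf5.filter_fields: given every data
    # key present in a header and the unwanted ones, return the keys
    # that should actually be exported.
    unwanted = set(un_wanted_fields)
    return [field for field in all_fields if field not in unwanted]

# With the image keys A, B, C excluded, only the scalar keys remain.
kept = filter_fields_sketch(['A', 'B', 'C', 'motor', 'counts'], ['A', 'B', 'C'])
# kept == ['motor', 'counts']
```

The real ``filter_fields`` reads the available keys from the header's descriptors; this sketch only illustrates the keep-or-drop semantics.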
From 881d4e71e5115913d2766fdaee73f6106c547b0f Mon Sep 17 00:00:00 2001
From: licode
Date: Wed, 26 Apr 2017 12:06:42 -0400
Subject: [PATCH 2/5] WIP: quick solutions are added

---
 design/suitcase-update.rst | 44 +++++++++++++++++++++++++++++++++-----
 1 file changed, 39 insertions(+), 5 deletions(-)

diff --git a/design/suitcase-update.rst b/design/suitcase-update.rst
index e863b32..31ad86a 100644
--- a/design/suitcase-update.rst
+++ b/design/suitcase-update.rst
@@ -9,11 +9,8 @@ Suitcase is a simple utility for exporting data from the databroker into a stand

 Current Implementation
 ======================
-One of the functions currently used is called "export", which mainly takes header, filename and metadatastore as input and
-outputs an h5 file with the same structure as the data in databroker.
-
-Usage Example
--------------
+One of the functions currently used is called "export", which mainly takes inputs including header, filename and metadatastore, and
+outputs an h5 file with the same structure as the data in databroker.

 .. code-block:: python

@@ -36,3 +33,40 @@ in the "export" function to specify which data sets you want to output.
 Here I assume A, B and C are keys for some vector data, like images. You can list them as un_wanted_fields.
 If all vector data are excluded, saving only the scalar data and header information should be much faster.
 Please also choose a clear filename, so you know which data it comes from.
+
+Issue and Proposed Solution
+===========================
+Users want to do binning on some of the datasets, i.e., changing the shape of a given data set from (100,100) to (50,50).
+So we need to change both the data in the events and the data shape information in the descriptor. Here are some
+of the solutions.
+
+solution 1: decorator
+---------------------
+
+.. code-block:: python
+
+    def make_rebinner(n, field):
+        def rebinner(name, doc):
+            if name == 'descriptor':
+                # change the shape information in the descriptor
+                ...
+            elif name == 'event':
+                # rebin the data from the event
+                ...
+            return doc
+        return rebinner
+
+    hdf5.export(last_run, 'myfile.h5', mds=db.mds, filter=make_rebinner(3, 'a'))
+
+
+solution 2: partial function
+----------------------------
+
+.. code-block:: python
+
+    from functools import partial
+
+    def rebinner(n, field, name, doc):
+        ...
+
+    make_rebinner = partial(rebinner, 3, 'a')
+    hdf5.export(last_run, 'myfile.h5', mds=db.mds, filter=make_rebinner)

From 82997a37fbc4e635a728648da5f2d2410308c6f2 Mon Sep 17 00:00:00 2001
From: licode
Date: Wed, 26 Apr 2017 15:59:26 -0400
Subject: [PATCH 3/5] DEV: more solutions

---
 design/suitcase-update.rst | 32 ++++++++++++++++++++++++++++++--
 1 file changed, 30 insertions(+), 2 deletions(-)

diff --git a/design/suitcase-update.rst b/design/suitcase-update.rst
index 31ad86a..eeda52a 100644
--- a/design/suitcase-update.rst
+++ b/design/suitcase-update.rst
@@ -9,7 +9,7 @@ Suitcase is a simple utility for exporting data from the databroker into a stand

 Current Implementation
 ======================
-One of the functions currently used is called "export", which mainly takes inputs including header, filename and metadatastore, and
+One of the functions currently used is called ``export``, which mainly takes inputs including header, filename and metadatastore, and
 outputs an h5 file with the same structure as the data in databroker.

 .. code-block:: python

@@ -63,10 +63,38 @@ solution 2: partial function
 ----------------------------

 .. code-block:: python

     from functools import partial

     def rebinner(n, field, name, doc):
         ...

     make_rebinner = partial(rebinner, 3, 'a')
     hdf5.export(last_run, 'myfile.h5', mds=db.mds, filter=make_rebinner)

+
+solution 3: use class
+---------------------
+
+.. code-block:: python
+
+    class ReBinner:
+
+        def __init__(self, n, field):
+            self.n = n
+            self.field = field
+
+        def __call__(self, name, doc):
+            ...
+
+    hdf5.export(last_run, 'myfile.h5', mds=db.mds, filter=ReBinner(3, 'a'))
+
+We can use a base class from bluesky.
+
+solution 4: based on original export function
+---------------------------------------------
+
+.. code-block:: python
+
+    hdf5.export(last_run, 'myfile.h5', mds=db.mds, filter=filter, filter_kwargs=filter_kwargs)
+
+    # the filter function is called as filter(name, doc, **filter_kwargs)

From 0f57200aeecc65289a8f7451812998101d07c478 Mon Sep 17 00:00:00 2001
From: licode
Date: Wed, 26 Apr 2017 17:12:33 -0400
Subject: [PATCH 4/5] DEV: better filename with more information as we do
 other designs here. And use db instead of db.mds

---
 ...date.rst => suitcase-preprocess-data-binning.rst} | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)
 rename design/{suitcase-update.rst => suitcase-preprocess-data-binning.rst} (86%)

diff --git a/design/suitcase-update.rst b/design/suitcase-preprocess-data-binning.rst
similarity index 86%
rename from design/suitcase-update.rst
rename to design/suitcase-preprocess-data-binning.rst
index eeda52a..60dcaf1 100644
--- a/design/suitcase-update.rst
+++ b/design/suitcase-preprocess-data-binning.rst
@@ -16,7 +16,7 @@ outputs an h5 file with the same structure as the data in databroker.

     from suitcase import hdf5
     last_run = db[-1]  # get header from databroker
-    hdf5.export(last_run, 'myfile.h5', mds=db.mds)
+    hdf5.export(last_run, 'myfile.h5', db=db)

 The first argument may be a single Header or a list of Headers. You can also use the keyword "fields"
 in the "export" function to specify which data sets you want to output.
@@ -28,7 +28,7 @@ in the "export" function to specify which data sets you want to output.

     un_wanted_fields = ['A', 'B', 'C']
     fds = hdf5.filter_fields(hdr, un_wanted_fields)
     filename = 'scanID_123.h5'
-    hdf5.export(hdr, filename, mds=db.mds, fields=fds)
+    hdf5.export(hdr, filename, db=db, fields=fds)

 Here I assume A, B and C are keys for some vector data, like images. You can list them as un_wanted_fields.
 If all vector data are excluded, saving only the scalar data and header information should be much faster.
@@ -55,7 +55,7 @@ solution 1: decorator
             return doc
         return rebinner

-    hdf5.export(last_run, 'myfile.h5', mds=db.mds, filter=make_rebinner(3, 'a'))
+    hdf5.export(last_run, 'myfile.h5', db=db, filter=make_rebinner(3, 'a'))


@@ -69,7 +69,7 @@ solution 2: partial function

     def rebinner(n, field, name, doc):
         ...

     make_rebinner = partial(rebinner, 3, 'a')
-    hdf5.export(last_run, 'myfile.h5', mds=db.mds, filter=make_rebinner)
+    hdf5.export(last_run, 'myfile.h5', db=db, filter=make_rebinner)

 solution 3: use class
@@ -86,7 +86,7 @@ solution 3: use class

         def __call__(self, name, doc):
             ...

-    hdf5.export(last_run, 'myfile.h5', mds=db.mds, filter=ReBinner(3, 'a'))
+    hdf5.export(last_run, 'myfile.h5', db=db, filter=ReBinner(3, 'a'))

 We can use a base class from bluesky.

@@ -95,6 +95,6 @@ solution 4: based on original export function

 .. code-block:: python

-    hdf5.export(last_run, 'myfile.h5', mds=db.mds, filter=filter, filter_kwargs=filter_kwargs)
+    hdf5.export(last_run, 'myfile.h5', db=db, filter=filter, filter_kwargs=filter_kwargs)

     # the filter function is called as filter(name, doc, **filter_kwargs)

From 0986e6f063722e2696ed4713422f6bc196081689 Mon Sep 17 00:00:00 2001
From: Thomas A Caswell
Date: Thu, 28 Sep 2017 19:54:26 -0700
Subject: [PATCH 5/5] DOC: update suitcase pre-processing docs

---
 design/suitcase-preprocess-data-binning.rst | 81 ++++++++++++++++-----
 1 file changed, 61 insertions(+), 20 deletions(-)

diff --git a/design/suitcase-preprocess-data-binning.rst b/design/suitcase-preprocess-data-binning.rst
index 60dcaf1..8258a74 100644
--- a/design/suitcase-preprocess-data-binning.rst
+++ b/design/suitcase-preprocess-data-binning.rst
@@ -1,16 +1,21 @@
-===============
-Suitcase Update
-===============
+=================
+ Suitcase Update
+=================

 Summary
 =======
-Suitcase is a simple utility for exporting data from the databroker into a stand-alone, portable file, such as an HDF5 file.
+
+
+Suitcase is a simple utility for exporting data from the databroker
+into a stand-alone, portable file, such as an HDF5 file.


 Current Implementation
 ======================
-One of the functions currently used is called ``export``, which mainly takes inputs including header, filename and metadatastore, and
-outputs an h5 file with the same structure as the data in databroker.
+
+One of the functions currently used is called ``export``, which mainly
+takes inputs including header, filename and metadatastore, and outputs
+an h5 file with the same structure as the data in databroker.

 .. code-block:: python

@@ -18,8 +23,9 @@ an h5 file with the same structure as the data in databroker.
     last_run = db[-1]  # get header from databroker
     hdf5.export(last_run, 'myfile.h5', db=db)

-The first argument may be a single Header or a list of Headers. You can also use the keyword "fields"
-in the "export" function to specify which data sets you want to output.
+The first argument may be a single Header or a list of Headers. You
+can also use the keyword "fields" in the "export" function to specify
+which data sets you want to output.

 .. code-block:: python

@@ -30,18 +36,53 @@ which data sets you want to output.
     filename = 'scanID_123.h5'
     hdf5.export(hdr, filename, db=db, fields=fds)

-Here I assume A, B and C are keys for some vector data, like images. You can list them as un_wanted_fields.
-If all vector data are excluded, saving only the scalar data and header information should be much faster.
-Please also choose a clear filename, so you know which data it comes from.
+Here I assume A, B and C are keys for some vector data, like
+images. You can list them as un_wanted_fields. If all vector data
+are excluded, saving only the scalar data and header information
+should be much faster. Please also choose a clear filename, so you
+know which data it comes from.
+
+Issues and Proposed Solutions
+=============================
+
+Easily support many formats
+---------------------------
+
+Currently each file format needs to implement ``export``
+independently, which duplicates the logic for turning a header into a
+document stream and will require each file format to implement the
+in-line processing described below.
+
+At the top level we should have an export function with a signature
+
+.. code-block:: python
+
+    def export(headers: List[Header],
+               format: Union[str, Callable[[Generator[Tuple[str, dict], None, None]], None]],
+               format_kwargs: Optional[Dict[str, Any]] = None,
+               stream_name: Optional[Union[str, Iterable[str]]] = None,
+               fields: Optional[Iterable[str]] = None,
+               timestamps: bool = True,
+               filters: Optional[Iterable[Callable[[str, dict], Tuple[str, dict]]]] = None):
+        ...
+
+Issues:
+  1. where to inject the file name?
+
+     - zipped list of names?
+     - single name?
+     - filename template?
+     - leave it up to the format / consumer?
+
+
+in-line processing
+------------------

-Issue and Proposed Solution
-===========================
-Users want to do binning on some of the datasets, i.e., changing the shape of a given data set from (100,100) to (50,50).
-So we need to change both the data in the events and the data shape information in the descriptor. Here are some
-of the solutions.
+Users want to do binning on some of the datasets, i.e., changing the
+shape of a given data set from (100,100) to (50,50). So we need to
+change both the data in the events and the data shape information in
+the descriptor. Here are some of the solutions.

 solution 1: decorator
----------------------
+~~~~~~~~~~~~~~~~~~~~~

 .. code-block:: python

@@ -60,7 +101,7 @@ solution 1: decorator

 solution 2: partial function
-----------------------------
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 .. code-block:: python

@@ -73,7 +114,7 @@ solution 2: partial function

 solution 3: use class
----------------------
+~~~~~~~~~~~~~~~~~~~~~

 .. code-block:: python

@@ -91,7 +132,7 @@ solution 3: use class
 We can use a base class from bluesky.

 solution 4: based on original export function
----------------------------------------------
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 .. code-block:: python
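All four solutions share one filter contract: a callable that receives ``(name, doc)`` and returns a possibly modified document. The class below is a minimal, self-contained sketch of the "solution 3" idea under simplifying assumptions (documents are plain dicts with ``data_keys`` / ``data`` entries; real descriptors and events carry more structure), with the rebinning done as a numpy block mean:

```python
import numpy as np


class ReBinner:
    """Sketch of a (name, doc) filter that bins one field by a factor n.

    Hypothetical simplification: descriptors are dicts with a
    'data_keys' entry, events are dicts with a 'data' entry.
    """

    def __init__(self, n, field):
        self.n = n          # binning factor along each axis
        self.field = field  # data key to rebin

    def __call__(self, name, doc):
        if name == 'descriptor':
            # shrink the recorded shape to the post-binning shape
            key = doc['data_keys'][self.field]
            key['shape'] = [s // self.n for s in key['shape']]
        elif name == 'event':
            # block-mean the array down by a factor of n per axis
            arr = np.asarray(doc['data'][self.field])
            h, w = arr.shape
            doc['data'][self.field] = arr.reshape(
                h // self.n, self.n, w // self.n, self.n).mean(axis=(1, 3))
        return doc


# usage sketch: bin field 'img' by 2, e.g. (100, 100) -> (50, 50)
rebinner = ReBinner(2, 'img')
```

A plain closure or ``functools.partial`` (solutions 1 and 2) satisfies the same contract; the class form just keeps ``n`` and ``field`` as inspectable state.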