Merge pull request #1 from tacaswell/suitcase
DOC: update suitcase pre-processing docs
licode authored Sep 29, 2017
2 parents 0f57200 + 0986e6f commit f61d6c8
1 changed file: design/suitcase-preprocess-data-binning.rst (61 additions, 20 deletions)

=================
Suitcase Update
=================

Summary
=======

Suitcase is a simple utility for exporting data from the databroker
into a stand-alone, portable file, such as an HDF5 file.

Current Implementation
======================

One of the functions currently used is called ``export``; it takes a
header, a filename, and a metadatastore/databroker handle, and outputs
an HDF5 file with the same structure as the data in the databroker.

.. code-block:: python

   from suitcase import hdf5
   last_run = db[-1]  # get the most recent header from the databroker
   hdf5.export(last_run, 'myfile.h5', db=db)

The first argument may be a single Header or a list of Headers. You
can also use the keyword argument ``fields`` in ``export`` to specify
exactly which data sets you want to output.

.. code-block:: python

   ...  # assume hdr is a Header and fds is a list of field names
   filename = 'scanID_123.h5'
   hdf5.export(hdr, filename, db=db, fields=fds)

Here I assume A, B, and C are keywords for some vector data, such as
images. You can define them as ``un_wanted_fields``: if all vector
data are excluded, saving only the scalar data and the header
information is much faster. Please also choose the filename carefully,
so you know which data it comes from.
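
For instance, one way to build such a whitelist is to start from the
header's field names and drop the unwanted vector fields. A minimal
sketch, assuming databroker's ``Header.fields()`` method and the
illustrative names below:

.. code-block:: python

   hdr = db[-1]
   un_wanted_fields = {'A', 'B', 'C'}   # vector fields to leave out
   # assumes Header.fields() returns the names of the saved fields
   fds = [f for f in hdr.fields() if f not in un_wanted_fields]
   hdf5.export(hdr, 'scanID_123_scalars_only.h5', db=db, fields=fds)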

Issues and Proposed Solutions
=============================

Easily support many formats
---------------------------

Currently each file format has to implement ``export`` independently,
which duplicates the logic for turning a header into a document
stream and forces each file format to implement the in-line
processing described below on its own.

At the top level we should have an export function with a signature
along the following lines (a sketch of how the string-or-callable
``format`` dispatch could work follows the list of open issues):

.. code-block:: python
def export(headers: List[Header],
format: Union[str, Callable[[Generator[None, [str, dict]], None]]],
format_kwargs=None: Optional[Dict[str, Any]]
stream_name=None : Optional[Union[str, Iterable[str]]],
fields=None: Optional[Iterable[str]],
timestamps=True: Bool,
filters=None: Optional[Generator[[str, dict], [str, dct]])
Issues:

1. Where to inject the file name?

   - zipped list of names?
   - single name?
   - filename template?
   - leave it up to the format / consumer?
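
To make the dispatch concrete, here is a minimal sketch; the registry,
the serializer names, and the ``db.get_documents`` call are
illustrative assumptions, not existing suitcase API:

.. code-block:: python

   FORMAT_REGISTRY = {}   # e.g. {'hdf5': hdf5_serializer, 'csv': csv_serializer}

   def export(headers, format, format_kwargs=None, db=None):
       def documents():
           # assumes db.get_documents(header) yields (name, document) pairs
           for header in headers:
               yield from db.get_documents(header)

       # a string selects a registered serializer; a callable is used directly
       consumer = FORMAT_REGISTRY[format] if isinstance(format, str) else format
       consumer(documents(), **(format_kwargs or {}))
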
in-line processing
------------------

Users want to do binning on some of the datasets, i.e., changing the
shape of a given data set from (100, 100) to (50, 50). So we need to
change both the data in the events and the shape information in the
descriptor. Here are some possible solutions.
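
As a concrete example of the binning operation itself, independent of
how it is wired into ``export``, a block average could look like the
following sketch (the ``rebin`` helper is illustrative, not existing
suitcase code):

.. code-block:: python

   import numpy as np

   def rebin(image, factor=2):
       """Block-average a 2D array, e.g. (100, 100) -> (50, 50) for factor=2."""
       ny, nx = image.shape
       return image.reshape(ny // factor, factor,
                            nx // factor, factor).mean(axis=(1, 3))
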
solution 1: decorator
~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   ...

solution 2: partial function
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   ...

solution 3: use class
~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   ...

We can use a base class from bluesky.
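
One possible shape for such a class, assuming bluesky's
``CallbackBase`` and reusing the ``rebin`` helper sketched above (the
details here are illustrative, not the original code):

.. code-block:: python

   from bluesky.callbacks.core import CallbackBase

   class BinningFilter(CallbackBase):
       """Rebin one field in each event and update its descriptor shape."""

       def __init__(self, field, factor=2):
           self.field = field
           self.factor = factor

       def descriptor(self, doc):
           # shrink the advertised shape, e.g. (100, 100) -> (50, 50)
           shape = doc['data_keys'][self.field]['shape']
           doc['data_keys'][self.field]['shape'] = [s // self.factor for s in shape]
           return doc

       def event(self, doc):
           doc['data'][self.field] = rebin(doc['data'][self.field], self.factor)
           return doc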
solution 4: based on original export function
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   ...
