Merge pull request #1 from tacaswell/suitcase
DOC: update suitcase pre-processing docs
licode authored Sep 29, 2017
2 parents 0f57200 + 0986e6f commit f61d6c8
1 changed file: design/suitcase-preprocess-data-binning.rst (61 additions, 20 deletions)

=================
Suitcase Update
=================

Summary
=======

Suitcase is a simple utility for exporting data from the databroker
into a stand-alone, portable file, such as an HDF5 file.

Current Implementation
======================

One of the functions currently used is called ``export``; it takes a
header, a filename, and a metadatastore/databroker handle, and outputs
an HDF5 file with the same structure as the data in the databroker.

.. code-block:: python

   from suitcase import hdf5
   last_run = db[-1]  # get the most recent header from the databroker
   hdf5.export(last_run, 'myfile.h5', db=db)

The first argument may be a single Header or a list of Headers. You
can also use the keyword argument ``fields`` in ``export`` to specify
exactly which data sets you want to output.

.. code-block:: python

   ...  # assume hdr is a Header and fds is a list of field names
   filename = 'scanID_123.h5'
   hdf5.export(hdr, filename, db=db, fields=fds)

Here I assume A, B, and C are keywords for some vector data, such as
images. You can define them as ``un_wanted_fields``: if all vector
data are excluded, saving only the scalar data and the header
information is much faster. Please also choose the filename carefully,
so you know which data it comes from.
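
For instance, one way to build such a whitelist is to start from the
header's field names and drop the unwanted vector fields. A minimal
sketch, assuming databroker's ``Header.fields()`` method and the
illustrative names below:

.. code-block:: python

   hdr = db[-1]
   un_wanted_fields = {'A', 'B', 'C'}   # vector fields to leave out
   # assumes Header.fields() returns the names of the saved fields
   fds = [f for f in hdr.fields() if f not in un_wanted_fields]
   hdf5.export(hdr, 'scanID_123_scalars_only.h5', db=db, fields=fds)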

Issues and Proposed Solutions
=============================

Easily support many formats
---------------------------

Currently each file format has to implement ``export`` independently,
which duplicates the logic for turning a header into a document
stream and forces each file format to implement the in-line
processing described below on its own.

At the top level we should have an export function with a signature
along the following lines (a sketch of how the string-or-callable
``format`` dispatch could work follows the list of open issues):

.. code-block:: python
def export(headers: List[Header],
format: Union[str, Callable[[Generator[None, [str, dict]], None]]],
format_kwargs=None: Optional[Dict[str, Any]]
stream_name=None : Optional[Union[str, Iterable[str]]],
fields=None: Optional[Iterable[str]],
timestamps=True: Bool,
filters=None: Optional[Generator[[str, dict], [str, dct]])
Issues:

1. Where to inject the file name?

   - zipped list of names?
   - single name?
   - filename template?
   - leave it up to the format / consumer?
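
To make the dispatch concrete, here is a minimal sketch; the registry,
the serializer names, and the ``db.get_documents`` call are
illustrative assumptions, not existing suitcase API:

.. code-block:: python

   FORMAT_REGISTRY = {}   # e.g. {'hdf5': hdf5_serializer, 'csv': csv_serializer}

   def export(headers, format, format_kwargs=None, db=None):
       def documents():
           # assumes db.get_documents(header) yields (name, document) pairs
           for header in headers:
               yield from db.get_documents(header)

       # a string selects a registered serializer; a callable is used directly
       consumer = FORMAT_REGISTRY[format] if isinstance(format, str) else format
       consumer(documents(), **(format_kwargs or {}))
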
in-line processing
------------------

Users want to do binning on some of the datasets, i.e., changing the
shape of a given data set from (100, 100) to (50, 50). So we need to
change both the data in the events and the shape information in the
descriptor. Here are some possible solutions.
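
As a concrete example of the binning operation itself, independent of
how it is wired into ``export``, a block average could look like the
following sketch (the ``rebin`` helper is illustrative, not existing
suitcase code):

.. code-block:: python

   import numpy as np

   def rebin(image, factor=2):
       """Block-average a 2D array, e.g. (100, 100) -> (50, 50) for factor=2."""
       ny, nx = image.shape
       return image.reshape(ny // factor, factor,
                            nx // factor, factor).mean(axis=(1, 3))
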
solution 1: decorator
~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   ...

solution 2: partial function
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   ...

solution 3: use class
~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   ...

We can use a base class from bluesky.
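
One possible shape for such a class, assuming bluesky's
``CallbackBase`` and reusing the ``rebin`` helper sketched above (the
details here are illustrative, not the original code):

.. code-block:: python

   from bluesky.callbacks.core import CallbackBase

   class BinningFilter(CallbackBase):
       """Rebin one field in each event and update its descriptor shape."""

       def __init__(self, field, factor=2):
           self.field = field
           self.factor = factor

       def descriptor(self, doc):
           # shrink the advertised shape, e.g. (100, 100) -> (50, 50)
           shape = doc['data_keys'][self.field]['shape']
           doc['data_keys'][self.field]['shape'] = [s // self.factor for s in shape]
           return doc

       def event(self, doc):
           doc['data'][self.field] = rebin(doc['data'][self.field], self.factor)
           return doc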
solution 4: based on original export function
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   ...
