DOC: update suitcase pre-processing docs #1

Merged: 1 commit, Sep 29, 2017
design/suitcase-preprocess-data-binning.rst (61 additions & 20 deletions)

=================
Suitcase Update
=================

Summary
=======
Suitcase is a simple utility for exporting data from the databroker
into a stand-alone, portable file, such as an h5 file.


Current Implementation
======================
One of the functions currently used is called ``export``. It takes a
header, a filename, and a metadatastore reference as inputs, and
writes an h5 file with the same structure as the data in the
databroker.

.. code-block:: python

    from suitcase import hdf5

    last_run = db[-1]  # get the most recent header from the databroker
    hdf5.export(last_run, 'myfile.h5', db=db)  # write the run to an h5 file

The first argument may be a single Header or a list of Headers. You
can also use the ``fields`` keyword of ``export`` to specify exactly
which data sets to output. For example (the header lookup and the
field list below are illustrative):

.. code-block:: python

    hdr = db[123]  # illustrative: look up a run, e.g. by scan ID
    un_wanted_fields = ['A', 'B', 'C']  # vector fields to leave out
    # assumes Header.fields() lists the available data keys
    fds = [f for f in hdr.fields() if f not in un_wanted_fields]
    filename = 'scanID_123.h5'
    hdf5.export(hdr, filename, db=db, fields=fds)

Here I assume A, B, C are keys for some vector data, like images, and
list them as ``un_wanted_fields``. If all vector data are excluded,
saving only the scalar data and header information should be much
faster. Please also choose a descriptive filename, so you know which
data the file came from.

Issues and Proposed Solutions
=============================

Easily support many formats
---------------------------

Currently each file format has to implement ``export`` independently,
which duplicates the header-to-document handling logic and would
require each file format to re-implement the in-line processing
described below.

At the top level we should have an export function with a signature
along these lines (reading ``filters`` as an iterable of callables
that each map one ``(name, doc)`` pair to another):

.. code-block:: python

    from typing import (Any, Callable, Dict, Generator, Iterable,
                        List, Optional, Tuple, Union)

    def export(headers: List[Header],
               format: Union[str,
                             Callable[[Generator[Tuple[str, dict],
                                                 None, None]], None]],
               format_kwargs: Optional[Dict[str, Any]] = None,
               stream_name: Optional[Union[str, Iterable[str]]] = None,
               fields: Optional[Iterable[str]] = None,
               timestamps: bool = True,
               filters: Optional[Iterable[Callable[[str, dict],
                                                   Tuple[str, dict]]]] = None):
        ...
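
For illustration, a ``format`` callable just consumes the document
stream; this made-up CSV writer (not a proposed built-in) shows the
shape of such a consumer:

.. code-block:: python

    import csv

    def csv_format(documents, path='out.csv'):
        """Consume (name, doc) pairs and write each event's data
        as one row of a CSV file."""
        with open(path, 'w') as f:
            writer = None
            for name, doc in documents:
                if name != 'event':
                    continue
                if writer is None:
                    writer = csv.DictWriter(f, fieldnames=sorted(doc['data']))
                    writer.writeheader()
                writer.writerow(doc['data'])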

Issues:

1. Where to inject the file name?

   - zipped list of names?
   - single name?
   - filename template?
   - leave it up to the format / consumer?


In-line processing
------------------

Users want to do binning on some of the datasets, i.e., changing the
shape of given data from (100, 100) to (50, 50). So we need to change
both the data in the events and the shape information recorded in the
descriptor. Here are some possible solutions.

solution 1: decorator
~~~~~~~~~~~~~~~~~~~~~

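A minimal sketch of the decorator idea, assuming the documents flow
through a generator of ``(name, doc)`` pairs (the names ``bin_field``
and ``factor`` are illustrative, not the original API):

.. code-block:: python

    import numpy as np

    def bin_field(field, factor):
        """Decorator factory: wrap a generator of (name, doc) pairs so
        that ``field`` is binned by ``factor`` in each event and the
        shape recorded in the descriptor is updated to match."""
        def decorator(gen_func):
            def wrapper(*args, **kwargs):
                for name, doc in gen_func(*args, **kwargs):
                    if name == 'descriptor' and field in doc['data_keys']:
                        key = dict(doc['data_keys'][field])
                        key['shape'] = [s // factor for s in key['shape']]
                        doc = dict(doc, data_keys=dict(doc['data_keys'],
                                                       **{field: key}))
                    elif name == 'event' and field in doc['data']:
                        img = np.asarray(doc['data'][field])
                        h, w = img.shape
                        binned = img.reshape(h // factor, factor,
                                             w // factor, factor).mean(axis=(1, 3))
                        doc = dict(doc, data=dict(doc['data'], **{field: binned}))
                    yield name, doc
            return wrapper
        return decorator

A document stream (for example from databroker's ``restream``) could
then be wrapped once and handed to any exporter.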


solution 2: partial function
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

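A partial function packages the same transformation without a
decorator; a sketch with made-up names (``bin_doc``,
``bin_image_by_two``):

.. code-block:: python

    from functools import partial

    import numpy as np

    def bin_doc(field, factor, name, doc):
        """Bin ``field`` by ``factor`` in events and patch the shape
        recorded in the descriptor; other documents pass through."""
        if name == 'descriptor' and field in doc['data_keys']:
            key = dict(doc['data_keys'][field])
            key['shape'] = [s // factor for s in key['shape']]
            doc = dict(doc, data_keys=dict(doc['data_keys'], **{field: key}))
        elif name == 'event' and field in doc['data']:
            img = np.asarray(doc['data'][field])
            h, w = img.shape
            binned = img.reshape(h // factor, factor,
                                 w // factor, factor).mean(axis=(1, 3))
            doc = dict(doc, data=dict(doc['data'], **{field: binned}))
        return name, doc

    # freeze the configuration, leaving a (name, doc) -> (name, doc) filter
    bin_image_by_two = partial(bin_doc, 'image', 2)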


solution 3: use class
~~~~~~~~~~~~~~~~~~~~~

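A class can hold the configuration and keep state between documents;
a sketch (``BinningFilter`` is a made-up name) that remembers which
descriptors it has rewritten:

.. code-block:: python

    import numpy as np

    class BinningFilter:
        """Stateful (name, doc) filter: bins ``field`` by ``factor``
        and only touches events whose descriptor contains the field."""

        def __init__(self, field, factor):
            self.field = field
            self.factor = factor
            self._descriptors = set()  # uids of rewritten descriptors

        def __call__(self, name, doc):
            if name == 'descriptor' and self.field in doc['data_keys']:
                self._descriptors.add(doc['uid'])
                key = dict(doc['data_keys'][self.field])
                key['shape'] = [s // self.factor for s in key['shape']]
                doc = dict(doc, data_keys=dict(doc['data_keys'],
                                               **{self.field: key}))
            elif name == 'event' and doc.get('descriptor') in self._descriptors:
                img = np.asarray(doc['data'][self.field])
                h, w = img.shape
                binned = img.reshape(h // self.factor, self.factor,
                                     w // self.factor, self.factor).mean(axis=(1, 3))
                doc = dict(doc, data=dict(doc['data'], **{self.field: binned}))
            return name, doc
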
We can also reuse a base class from bluesky, such as ``CallbackBase``.

solution 4: based on original export function
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

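A sketch of folding the binning directly into an export-style
function, assuming events are pulled with databroker's
``db.get_events`` and written with ``h5py`` (``export_binned`` is a
made-up name, not the original implementation):

.. code-block:: python

    import h5py
    import numpy as np

    def export_binned(hdr, filename, db, field, factor):
        """Pull events from a header, bin ``field`` by ``factor``,
        and write one dataset per event to an h5 file."""
        with h5py.File(filename, 'w') as f:
            events = db.get_events(hdr, fill=True)  # fill loads image data
            for i, event in enumerate(events):
                img = np.asarray(event['data'][field])
                h, w = img.shape
                binned = img.reshape(h // factor, factor,
                                     w // factor, factor).mean(axis=(1, 3))
                f.create_dataset('{}/{}'.format(field, i), data=binned)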