Improve speed of getting local OCAT data #272

taldcroft · 2022-05-09T21:08:46Z (edited by javierggt)

Description

Improve the speed of reading the local OCAT. Most of the read time is spent decoding the UTF-8 string columns.
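Below is a minimal NumPy sketch of the ASCII fast path discussed in the review of `mica/archive/cda/services.py`. It assumes a 1-D fixed-width bytes column (dtype `S`). The helper name `decode_ascii_fast` and the `np.char.decode` fallback are illustrative assumptions, not the merged code. If every byte is below 128, each byte is copied into the low byte of a 4-byte little-endian UTF-32 code unit, and the buffer is reinterpreted as a `<U` array. This avoids decoding each element separately.

```python
import numpy as np


def decode_ascii_fast(col):
    """Decode a 1-D fixed-width bytes array (dtype 'S') to a unicode array (dtype 'U').

    Hypothetical helper for illustration; not the function merged in mica.
    """
    n_rows = len(col)
    itemsize = col.dtype.itemsize
    # Raw bytes of each string, one row per element: shape (n_rows, itemsize)
    col_bytes = np.ascontiguousarray(col).view(np.uint8).reshape(n_rows, itemsize)

    # Any byte >= 128 belongs to a multi-byte UTF-8 sequence, so the shortcut
    # does not apply; use the general (slower) decoder instead (assumed fallback).
    if not np.all(col_bytes < 128):
        return np.char.decode(col, "utf-8")

    # An ASCII code point fits in one byte, so in little-endian UTF-32 each
    # 4-byte code unit is (byte, 0, 0, 0). Build that buffer directly ...
    col_utf32 = np.zeros((n_rows, itemsize * 4), dtype=np.uint8)
    col_utf32[:, ::4] = col_bytes
    # ... and reinterpret it in place as little-endian fixed-width unicode.
    return col_utf32.view(f"<U{itemsize}").reshape(n_rows)
```

The buffer is always laid out little-endian. Because the view uses an explicit `<U` dtype rather than the native one, the strings read back correctly on any host byte order.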

Interface impacts

None.

Testing

Unit tests

Mac

Independent check of unit tests by Javier

Functional tests

Current release

(ska3) ➜  docs git:(ocat-local-faster) ipython
Python 3.8.12 (default, Oct 12 2021, 06:23:56) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.29.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: from mica.archive.cda import get_ocat_local
In [2]: %time dat = get_ocat_local()
CPU times: user 1.69 s, sys: 83 ms, total: 1.77 s
Wall time: 1.77 s

New using current compressed HDF5

(ska3) ➜  mica git:(ocat-local-faster) ipython
Python 3.8.12 (default, Oct 12 2021, 06:23:56) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.29.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: from mica.archive.cda import get_ocat_local
In [2]: %time dat = get_ocat_local()
CPU times: user 456 ms, sys: 93.3 ms, total: 549 ms
Wall time: 549 ms

New using uncompressed HDF5

In [2]: %time dat = get_ocat_local(datafile="ocat.h5")
CPU times: user 298 ms, sys: 103 ms, total: 401 ms
Wall time: 403 ms

javierggt · 2022-05-11T16:23:12Z

mica/archive/cda/services.py

+                # above 128 that signify a non-ASCII character.
+                itemsize = col.dtype.itemsize
+                col_bytes = col.view((np.uint8, (itemsize,)))
+                if np.all(col_bytes.flatten() < 128):


Doesn't np.all handle arrays of any shape?

Oops, good point, will fix.
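The following example illustrates the reviewer's point. With its default `axis=None`, `np.all` reduces over every axis of an N-d array, so the `.flatten()` in the diff is redundant. The column values are made up for the example.

```python
import numpy as np

# Three fixed-width strings; the last one contains a non-ASCII UTF-8 sequence
col = np.array([b"ACIS-S", b"HRC-I", "café".encode("utf-8")], dtype="S6")
itemsize = col.dtype.itemsize
# Raw bytes, one row per string: shape (3, 6)
col_bytes = col.view(np.uint8).reshape(len(col), itemsize)

# With the default axis=None, np.all reduces over every axis of the 2-D array,
# so flattening first gives the same answer at the cost of an extra copy
all_ascii = np.all(col_bytes < 128)
assert all_ascii == np.all(col_bytes.flatten() < 128)

# Passing axis=1 instead yields one flag per string
row_is_ascii = np.all(col_bytes < 128, axis=1)
```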


javierggt · 2022-05-11T16:53:31Z

mica/archive/cda/services.py

+                    # but with the single leading byte set.
+                    col_utf8 = np.zeros((col_bytes.shape[0], itemsize * 4), dtype=np.uint8)
+                    for ii in range(itemsize):
+                        col_utf8[:, ii * 4] = col_bytes[:, ii]


Isn't this loop the same as this?

col_utf8_2[:,::4] = col_bytes

Probably, but it requires a little thinking to be sure it will be right. Writing it out in a loop makes the intent blindingly obvious and is effectively just as fast (given all the other overhead).

I made this comment after testing that it was equivalent, so my question was a bit rhetorical, but ok.
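One way to confirm the equivalence javierggt mentions is to build both arrays from the same random ASCII bytes and compare them. The array sizes and random data are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
itemsize = 5
# Random printable ASCII bytes, shape (n_rows, itemsize)
col_bytes = rng.integers(32, 128, size=(1000, itemsize), dtype=np.uint8)

# Loop version from the diff: byte ii goes to column ii * 4
col_utf8_loop = np.zeros((col_bytes.shape[0], itemsize * 4), dtype=np.uint8)
for ii in range(itemsize):
    col_utf8_loop[:, ii * 4] = col_bytes[:, ii]

# Strided-slice version from the review: columns 0, 4, 8, ... receive bytes 0, 1, 2, ...
col_utf8_slice = np.zeros_like(col_utf8_loop)
col_utf8_slice[:, ::4] = col_bytes

assert np.array_equal(col_utf8_loop, col_utf8_slice)

# Reinterpreting each row as one little-endian UTF-32 string recovers the text
as_str = col_utf8_slice.view(f"<U{itemsize}")[:, 0]
expected = np.char.decode(col_bytes.view(f"S{itemsize}")[:, 0], "ascii")
assert np.array_equal(as_str, expected)
```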

taldcroft · 2022-05-14T09:57:06Z

I fixed the thing about using an observer name, though that comment seems to be gone now.

javierggt

I did the following:

Ran pytest and all tests passed.
Reproduced the one-line functional tests in the description.
Made the change to use col_utf8[:,::4] = col_bytes instead of the loop and reran the one-line tests:

In [1]: from mica.archive.cda import get_ocat_local

In [2]: %time dat = get_ocat_local()
CPU times: user 400 ms, sys: 104 ms, total: 504 ms
Wall time: 509 ms

In [3]: %time dat = get_ocat_local(datafile="ocat.h5")
CPU times: user 265 ms, sys: 63.9 ms, total: 329 ms
Wall time: 328 ms
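The timings above measure the whole `get_ocat_local()` call. A rough harness like the one below could isolate the widening step and time the loop and the strided slice directly. The helper names, array size and repeat count are illustrative assumptions, and results will vary by machine.

```python
import time

import numpy as np


def widen_loop(col_bytes):
    """Loop form: byte ii of each row goes to column ii * 4."""
    n_rows, itemsize = col_bytes.shape
    out = np.zeros((n_rows, itemsize * 4), dtype=np.uint8)
    for ii in range(itemsize):
        out[:, ii * 4] = col_bytes[:, ii]
    return out


def widen_slice(col_bytes):
    """Strided-slice form of the same operation."""
    n_rows, itemsize = col_bytes.shape
    out = np.zeros((n_rows, itemsize * 4), dtype=np.uint8)
    out[:, ::4] = col_bytes
    return out


def best_time(func, arg, repeat=5):
    """Best wall-clock time in seconds over `repeat` calls."""
    times = []
    for _ in range(repeat):
        t0 = time.perf_counter()
        func(arg)
        times.append(time.perf_counter() - t0)
    return min(times)


rng = np.random.default_rng(0)
col_bytes = rng.integers(32, 128, size=(100_000, 20), dtype=np.uint8)

for func in (widen_loop, widen_slice):
    print(f"{func.__name__}: {best_time(func, col_bytes) * 1e3:.2f} ms")
```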

javierggt · 2022-05-17T14:47:32Z


taldcroft added 2 commits May 9, 2022 16:51

Improve speed of getting local OCAT data

f18dcf9

Do not use compression saving OCAT local HDF5 file

cd10ddf

javierggt reviewed May 11, 2022


taldcroft requested review from javierggt and jeanconn May 12, 2022 12:24

Address review comments and add a test

db7b2e0

taldcroft force-pushed the ocat-local-faster branch from b04bd13 to db7b2e0 May 14, 2022 09:51

javierggt approved these changes May 17, 2022


taldcroft merged commit 756a27c into master May 17, 2022

taldcroft deleted the ocat-local-faster branch May 17, 2022 15:43

This was referenced May 17, 2022

Update sot/mica to 4.29.0 sot/skare3#843

Closed

2022.5 (closed) sot/skare3#850

Closed

javierggt mentioned this pull request May 31, 2022

2022.6 sot/skare3#860

Merged

javierggt mentioned this pull request Aug 3, 2022

2022.5 sot/skare3#899

Merged
