Improve speed of getting local OCAT data #272
mica/archive/cda/services.py
Outdated
# above 128 that signify a non-ASCII character.
itemsize = col.dtype.itemsize
col_bytes = col.view((np.uint8, (itemsize,)))
if np.all(col_bytes.flatten() < 128):
Doesn't np.all handle all shapes?
Oops, good point, will fix.
Fixed
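For reference, the point raised above can be checked directly. This is a minimal sketch using a made-up three-element bytes column (not the real OCAT data): `np.all` with no `axis` argument reduces over every axis, so the 2-D byte view gives the same answer with or without `flatten()`.

```python
import numpy as np

# Hypothetical stand-in for a fixed-width bytes column (dtype "S4").
col = np.array([b"abc", b"de", b"fghi"], dtype="S4")
itemsize = col.dtype.itemsize

# View each element as a row of `itemsize` uint8 values: shape (3, 4).
col_bytes = col.view((np.uint8, (itemsize,)))

# With no axis argument, np.all reduces over all axes of the 2-D array.
all_ascii_flat = np.all(col_bytes.flatten() < 128)
all_ascii = np.all(col_bytes < 128)

# UTF-8 encoded non-ASCII text contains at least one byte >= 128.
col_non_ascii = np.array(["café".encode("utf-8")])
non_ascii_bytes = col_non_ascii.view((np.uint8, (col_non_ascii.dtype.itemsize,)))
has_only_ascii = np.all(non_ascii_bytes < 128)
```

Dropping `flatten()` also avoids an extra copy of the byte array, since `flatten()` always returns a copy.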
# but with the single leading byte set.
col_utf8 = np.zeros((col_bytes.shape[0], itemsize * 4), dtype=np.uint8)
for ii in range(itemsize):
col_utf8[:, ii * 4] = col_bytes[:, ii]
Isn't this loop the same as this?
col_utf8_2[:,::4] = col_bytes
Probably, but it requires a little thinking to be sure it will be right. Writing it out in a loop makes the intent blindingly obvious and is effectively just as fast (given all the other overhead).
I made this comment after testing that it was equivalent, so my question was a bit rhetorical, but ok.
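The equivalence discussed in this thread can be sketched as follows. The sample column is made up for illustration, and the variable names `col_loop` and `col_slice` are not from the PR; the two assignments are the loop from the diff and the strided-slice form proposed in the review.

```python
import numpy as np

# Hypothetical stand-in for a fixed-width ASCII bytes column.
col = np.array([b"abc", b"de", b"fghi"], dtype="S4")
itemsize = col.dtype.itemsize
col_bytes = col.view((np.uint8, (itemsize,)))  # shape (3, 4)

# Loop form from the diff: byte ii of each row goes to position ii * 4.
col_loop = np.zeros((col_bytes.shape[0], itemsize * 4), dtype=np.uint8)
for ii in range(itemsize):
    col_loop[:, ii * 4] = col_bytes[:, ii]

# Strided-slice form from the review: columns 0, 4, 8, ... in one assignment.
col_slice = np.zeros((col_bytes.shape[0], itemsize * 4), dtype=np.uint8)
col_slice[:, ::4] = col_bytes

same = np.array_equal(col_loop, col_slice)
```

The slice `[:, ::4]` selects exactly `itemsize` columns out of `itemsize * 4`, so its shape matches `col_bytes` and no broadcasting is involved.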
b04bd13 to db7b2e0
I fixed the thing about using an observer name, though that comment seems to be gone now.
I did the following:
- ran pytest and it was ok.
- reproduced the one-line functional tests in the description.
- made the change to use col_utf8[:,::4] = col_bytes instead of the loop and ran the one-line tests:
In [1]: from mica.archive.cda import get_ocat_local
In [2]: %time dat = get_ocat_local()
CPU times: user 400 ms, sys: 104 ms, total: 504 ms
Wall time: 509 ms
In [3]: %time dat = get_ocat_local(datafile="ocat.h5")
CPU times: user 265 ms, sys: 63.9 ms, total: 329 ms
Wall time: 328 ms
Description
Improve the speed of reading the local OCAT. Most of the time is spent decoding the UTF-8
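As a rough illustration of the approach visible in the reviewed diff, here is a sketch of a fast path for all-ASCII columns with a fallback to `np.char.decode`. The helper name `bytes_col_to_str` is hypothetical and not part of mica. The final `view` step is an assumption about how the padded buffer is used, based on NumPy storing `str` arrays as 4 bytes per character; the explicit `<U` dtype assumes the bytes are laid out little-endian, which is how the buffer is filled here.

```python
import numpy as np


def bytes_col_to_str(col, encoding="utf-8"):
    """Hypothetical helper: convert a fixed-width bytes array to a str array."""
    col = np.ascontiguousarray(col)
    itemsize = col.dtype.itemsize
    col_bytes = col.view((np.uint8, (itemsize,)))

    if np.all(col_bytes < 128):
        # ASCII only: each character becomes 4 bytes with just the low byte set.
        buf = np.zeros((col_bytes.shape[0], itemsize * 4), dtype=np.uint8)
        for ii in range(itemsize):
            buf[:, ii * 4] = col_bytes[:, ii]
        # Reinterpret each 4 * itemsize byte row as one little-endian str element.
        return buf.view(f"<U{itemsize}").flatten()

    # Non-ASCII bytes present: fall back to element-wise decoding.
    return np.char.decode(col, encoding)


ascii_col = np.array([b"abc", b"de", b""], dtype="S4")
mixed_col = np.array(["café".encode("utf-8"), b"xyz"])
fast = bytes_col_to_str(ascii_col)
slow = bytes_col_to_str(mixed_col)
```

The fast path avoids a per-element decode call, which is presumably where the time went; the timings quoted in the review above are the only measurements available here.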
Interface impacts
None.
Testing
Unit tests
Independent check of unit tests by Javier
Functional tests
Current release
New using current compressed HDF5
New using uncompressed HDF5