add timeseries time query utils #1175

bendichter · 2020-02-05T23:08:18Z

Motivation

Efficient methods for getting time information for TimeSeries objects. This needs tests, but I wanted to check in and see what you all think.

codecov · 2020-02-05T23:31:48Z

Codecov Report

Merging #1175 into dev will increase coverage by 0.12%.
The diff coverage is 95.45%.

@@            Coverage Diff             @@
##              dev    #1175      +/-   ##
==========================================
+ Coverage   67.44%   67.57%   +0.12%     
==========================================
  Files          37       37              
  Lines        2310     2319       +9     
  Branches      406      410       +4     
==========================================
+ Hits         1558     1567       +9     
  Misses        683      683              
  Partials       69       69

Impacted Files	Coverage Δ
src/pynwb/icephys.py	`88.99% <95.45%> (+0.99%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4e27b5d...359f8af. Read the comment docs.

bendichter · 2020-02-20T20:49:17Z

@oruebel @ajtritt @rly I think this would be useful for quite a few projects. Should I move forward with writing tests?

ajtritt · 2020-02-20T21:06:52Z

@bendichter It would be nice if you could use these somehow. https://hdmf.readthedocs.io/en/latest/hdmf.array.html

One option would be to construct these in the ObjectMapper, and always pass a SortedArray or LinSpace for the timestamps argument of the constructor.

bendichter · 2020-02-20T22:04:48Z

@ajtritt oh cool, I didn't know about those! It would definitely be nice to use those for timestamps in pynwb, since they should always be sorted. That should help with the methods you've implemented, <,>,<=,>=,==, and !=, and find_point (aka bisect I'm guessing). I don't see how they would help with any of the methods I have added here though.

As a side note, since you are already doing these, it would be nice to also have UniqueArray.

ajtritt · 2020-02-20T22:16:48Z

@bendichter

I don't see how they would help with any of the methods I have added here though.

LinSpace would need to be modified to have an appropriate __getitem__, but then get_mint, get_maxt, and get_timestamps would just be regular slices on TimeSeries.timestamps instead of special methods.

As a side note, since you are already doing these, it would be nice to also have UniqueArray.

What do you mean by unique?

bendichter · 2020-02-20T23:31:11Z

@ajtritt ah, I see. Using time_series.timestamps[0] and time_series.timestamps[-1] for regularly and irregularly sampled TimeSeries objects would be nice syntax. On the other hand, I generally prefer those types of things to be explicit. .timestamps is currently an attribute call. It returning non-None implies that that attribute was there, and might mislead a user into thinking that the TimeSeries is likely irregularly sampled. I generally prefer get so that it is clear to the user that you might be doing something under the hood to get the output. But I guess that's a stylistic difference. In your approach, how would you recommend a user determine if a TimeSeries stores a timestamps attribute (signaling that it is likely irregularly sampled)?

Is LinSpace 1D or can it be n-d?

ajtritt · 2020-02-20T23:52:43Z

On the other hand, I generally prefer those types of things to be explicit. .timestamps is currently an attribute call. It returning non-None implies that that attribute was there, and might mislead a user into thinking that the TimeSeries is likely irregularly sampled.

I appreciate the need for being explicit, but I think making the API clean and easy to use is also important. It's a disservice to users to just pass forward the complexity of the NWB standard (this is actually something drove away early users). PyNWB is not just meant to be a fancy layer on top of h5py--it is supposed to make it easier to work with data. At the end of the day, users are storing data so they can use it to do something--not just know how it was stored. If the user can always count on time information to be in the same place i.e. .timestamps, regardless of how it is stored, it makes it easier for them to work with their data.

If they need to know if data was regularly sampled or stored with starting_time/rate, then they can just look at those values. If timeseries.starting_time is None evaluates to True, then that means it wasn't stored, and they satisfy one-more condition for determining if the data was sampled irregularly. Alternatively, if you use SortedArray/LinSpace, then isinstance(timeseries.timestamps, LinSpace) means data was regularly sampled.

bendichter · 2020-02-21T00:45:35Z

@ajtritt I hear that you want to make this easy to use. I do too. I am pulling these functions from a pool of convenience functions I have made to deal with pynwb objects. @lvsltz told me during the hackathon that he found them helpful as well and suggested that other users might also want to access them.

I think in these types of situations if we know exactly how the user is going to use this, it can be easier to use, but if they use it differently than we expect, it could be more confusing for them. Overloading __getitem__ is a perfect example of this. If we implement it with standard array indexing, we are basically implying to users that this object should be treated like a numpy array.

On the face of it, it seems pretty easy, but actually __getitem__ for numpy arrays is pretty complex. What if a user wants to skip values, or increment backwards, or index timestamps with a boolean array? Those would all be straightforward slice calls in numpy, but are we covering all of those slice use-cases here? What if they are trying to iterate over timestamps? Now we need a next method. I think this will end up being more work than a few convenience methods. It's all doable though, if that's the direction we want to go.

ajtritt · 2020-02-21T00:53:34Z

I think this will end up being more work than a few convenience methods. It's all doable though, if that's the direction we want to go.

That is the direction I would like to go. I much prefer it to a catalog of instance methods that only do one thing.

@oruebel @rly thoughts?

oruebel · 2020-02-21T19:43:24Z

It's all doable though, if that's the direction we want to go.

I think ultimately that is the direction we should aim towards. I think the catalog of helper functions is useful as a means to figure out what the API should support and it is useful for specialized use-cases that may be out of scope, but for common things (like slicing into data), I think we want to have it under the main API.

ajtritt · 2020-02-21T19:47:58Z

I think the catalog of helper functions is useful as a means to figure out what the API should support and it is useful for specialized use-cases that may be out of scope

I don't want to go down this road unless we have very good reason to. Every function we add is something we can never get rid of and requires maintenance. I think we need to be more deliberate about what we add and how we add it. Otherwise, this thing is just going to continue to balloon into a maintenance nightmare with so many options for users that they don't know where to start.

I think we need to reprioritize simplicity and elegance, rather than blindly adding things for the sake of convenience.

oruebel · 2020-02-21T20:12:38Z

I don't want to go down this road unless we have very good reason to

Just to clarify, I didn't mean to imply a catalog of helper functions inside PyNWB, I was speaking more broadly to users with collections of helper functions. E.g, the functions that Ben is proposing are not part of NWB but come from a collection of functions he wrote outside of PyNWB.

bendichter · 2020-02-23T19:14:24Z

OK, I'll continue to maintain my own helper functions as I need them and transition to pynwb functionality as that becomes available.

bendichter · 2020-05-04T18:53:11Z

@ajtritt any progress on this front? It would be really great to have this tool as a response to the concern raised in AllenInstitute/AllenSDK#1421 (comment)

add timeseries time query utils

21717c2

bendichter requested review from ajtritt, rly and oruebel February 5, 2020 23:09

Merge branch 'dev' into enh/timeseries_utils

4e27b5d

Merge branch 'dev' into enh/timeseries_utils

359f8af

bendichter closed this Feb 23, 2020

bendichter mentioned this pull request Feb 23, 2020

add efficient methods for getting first and last spike time #1185

Closed

bendichter reopened this May 4, 2020

bendichter closed this May 4, 2020

bendichter mentioned this pull request May 4, 2020

Record sampling rate as part of the Electrical series AllenInstitute/AllenSDK#1421

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add timeseries time query utils #1175

add timeseries time query utils #1175

bendichter commented Feb 5, 2020

codecov bot commented Feb 5, 2020 •

edited

Loading

bendichter commented Feb 20, 2020

ajtritt commented Feb 20, 2020

bendichter commented Feb 20, 2020

ajtritt commented Feb 20, 2020 •

edited

Loading

bendichter commented Feb 20, 2020

ajtritt commented Feb 20, 2020

bendichter commented Feb 21, 2020

ajtritt commented Feb 21, 2020

oruebel commented Feb 21, 2020

ajtritt commented Feb 21, 2020

oruebel commented Feb 21, 2020

bendichter commented Feb 23, 2020

bendichter commented May 4, 2020

add timeseries time query utils #1175

add timeseries time query utils #1175

Conversation

bendichter commented Feb 5, 2020

Motivation

codecov bot commented Feb 5, 2020 • edited Loading

Codecov Report

bendichter commented Feb 20, 2020

ajtritt commented Feb 20, 2020

bendichter commented Feb 20, 2020

ajtritt commented Feb 20, 2020 • edited Loading

bendichter commented Feb 20, 2020

ajtritt commented Feb 20, 2020

bendichter commented Feb 21, 2020

ajtritt commented Feb 21, 2020

oruebel commented Feb 21, 2020

ajtritt commented Feb 21, 2020

oruebel commented Feb 21, 2020

bendichter commented Feb 23, 2020

bendichter commented May 4, 2020

codecov bot commented Feb 5, 2020 •

edited

Loading

ajtritt commented Feb 20, 2020 •

edited

Loading