Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add timeseries time query utils #1175

Closed
wants to merge 3 commits into from
Closed

add timeseries time query utils #1175

wants to merge 3 commits into from

Conversation

bendichter
Copy link
Contributor

Motivation

Efficient methods for getting time information for TimeSeries objects. This needs tests, but I wanted to check in and see what you all think.

@bendichter bendichter requested review from ajtritt, rly and oruebel February 5, 2020 23:09
@codecov
Copy link

codecov bot commented Feb 5, 2020

Codecov Report

Merging #1175 into dev will increase coverage by 0.12%.
The diff coverage is 95.45%.

Impacted file tree graph

@@            Coverage Diff             @@
##              dev    #1175      +/-   ##
==========================================
+ Coverage   67.44%   67.57%   +0.12%     
==========================================
  Files          37       37              
  Lines        2310     2319       +9     
  Branches      406      410       +4     
==========================================
+ Hits         1558     1567       +9     
  Misses        683      683              
  Partials       69       69
Impacted Files Coverage Δ
src/pynwb/icephys.py 88.99% <95.45%> (+0.99%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4e27b5d...359f8af. Read the comment docs.

@bendichter
Copy link
Contributor Author

@oruebel @ajtritt @rly I think this would be useful for quite a few projects. Should I move forward with writing tests?

@ajtritt
Copy link
Member

ajtritt commented Feb 20, 2020

@bendichter It would be nice if you could use these somehow. https://hdmf.readthedocs.io/en/latest/hdmf.array.html

One option would be to construct these in the ObjectMapper, and always pass a SortedArray or LinSpace for the timestamps argument of the constructor.

@bendichter
Copy link
Contributor Author

@ajtritt oh cool, I didn't know about those! It would definitely be nice to use those for timestamps in pynwb, since they should always be sorted. That should help with the methods you've implemented, <,>,<=,>=,==, and !=, and find_point (aka bisect I'm guessing). I don't see how they would help with any of the methods I have added here though.

As a side note, since you are already doing these, it would be nice to also have UniqueArray.

@ajtritt
Copy link
Member

ajtritt commented Feb 20, 2020

@bendichter

I don't see how they would help with any of the methods I have added here though.

LinSpace would need to be modified to have an appropriate __getitem__, but then get_mint, get_maxt, and get_timestamps would just be regular slices on TimeSeries.timestamps instead of special methods.

As a side note, since you are already doing these, it would be nice to also have UniqueArray.

What do you mean by unique?

@bendichter
Copy link
Contributor Author

@ajtritt ah, I see. Using time_series.timestamps[0] and time_series.timestamps[-1] for regularly and irregularly sampled TimeSeries objects would be nice syntax. On the other hand, I generally prefer those types of things to be explicit. .timestamps is currently an attribute call. It returning non-None implies that that attribute was there, and might mislead a user into thinking that the TimeSeries is likely irregularly sampled. I generally prefer get so that it is clear to the user that you might be doing something under the hood to get the output. But I guess that's a stylistic difference. In your approach, how would you recommend a user determine if a TimeSeries stores a timestamps attribute (signaling that it is likely irregularly sampled)?

Is LinSpace 1D or can it be n-d?

@ajtritt
Copy link
Member

ajtritt commented Feb 20, 2020

On the other hand, I generally prefer those types of things to be explicit. .timestamps is currently an attribute call. It returning non-None implies that that attribute was there, and might mislead a user into thinking that the TimeSeries is likely irregularly sampled.

I appreciate the need for being explicit, but I think making the API clean and easy to use is also important. It's a disservice to users to just pass forward the complexity of the NWB standard (this is actually something drove away early users). PyNWB is not just meant to be a fancy layer on top of h5py--it is supposed to make it easier to work with data. At the end of the day, users are storing data so they can use it to do something--not just know how it was stored. If the user can always count on time information to be in the same place i.e. .timestamps, regardless of how it is stored, it makes it easier for them to work with their data.

If they need to know if data was regularly sampled or stored with starting_time/rate, then they can just look at those values. If timeseries.starting_time is None evaluates to True, then that means it wasn't stored, and they satisfy one-more condition for determining if the data was sampled irregularly. Alternatively, if you use SortedArray/LinSpace, then isinstance(timeseries.timestamps, LinSpace) means data was regularly sampled.

@bendichter
Copy link
Contributor Author

@ajtritt I hear that you want to make this easy to use. I do too. I am pulling these functions from a pool of convenience functions I have made to deal with pynwb objects. @lvsltz told me during the hackathon that he found them helpful as well and suggested that other users might also want to access them.

I think in these types of situations if we know exactly how the user is going to use this, it can be easier to use, but if they use it differently than we expect, it could be more confusing for them. Overloading __getitem__ is a perfect example of this. If we implement it with standard array indexing, we are basically implying to users that this object should be treated like a numpy array.

On the face of it, it seems pretty easy, but actually __getitem__ for numpy arrays is pretty complex. What if a user wants to skip values, or increment backwards, or index timestamps with a boolean array? Those would all be straightforward slice calls in numpy, but are we covering all of those slice use-cases here? What if they are trying to iterate over timestamps? Now we need a next method. I think this will end up being more work than a few convenience methods. It's all doable though, if that's the direction we want to go.

@ajtritt
Copy link
Member

ajtritt commented Feb 21, 2020

I think this will end up being more work than a few convenience methods. It's all doable though, if that's the direction we want to go.

That is the direction I would like to go. I much prefer it to a catalog of instance methods that only do one thing.

@oruebel @rly thoughts?

@oruebel
Copy link
Contributor

oruebel commented Feb 21, 2020

It's all doable though, if that's the direction we want to go.

I think ultimately that is the direction we should aim towards. I think the catalog of helper functions is useful as a means to figure out what the API should support and it is useful for specialized use-cases that may be out of scope, but for common things (like slicing into data), I think we want to have it under the main API.

@ajtritt
Copy link
Member

ajtritt commented Feb 21, 2020

I think the catalog of helper functions is useful as a means to figure out what the API should support and it is useful for specialized use-cases that may be out of scope

I don't want to go down this road unless we have very good reason to. Every function we add is something we can never get rid of and requires maintenance. I think we need to be more deliberate about what we add and how we add it. Otherwise, this thing is just going to continue to balloon into a maintenance nightmare with so many options for users that they don't know where to start.

I think we need to reprioritize simplicity and elegance, rather than blindly adding things for the sake of convenience.

@oruebel
Copy link
Contributor

oruebel commented Feb 21, 2020

I don't want to go down this road unless we have very good reason to

Just to clarify, I didn't mean to imply a catalog of helper functions inside PyNWB, I was speaking more broadly to users with collections of helper functions. E.g, the functions that Ben is proposing are not part of NWB but come from a collection of functions he wrote outside of PyNWB.

@bendichter
Copy link
Contributor Author

OK, I'll continue to maintain my own helper functions as I need them and transition to pynwb functionality as that becomes available.

@bendichter
Copy link
Contributor Author

@ajtritt any progress on this front? It would be really great to have this tool as a response to the concern raised in AllenInstitute/AllenSDK#1421 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants