Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add bad times object #224

Closed
bendichter opened this issue Nov 14, 2018 · 12 comments · Fixed by NeurodataWithoutBorders/pynwb#731
Closed

add bad times object #224

bendichter opened this issue Nov 14, 2018 · 12 comments · Fixed by NeurodataWithoutBorders/pynwb#731
Assignees
Labels
category: enhancement improvements of code or code behavior

Comments

@bendichter
Copy link
Contributor

I have previously discussed this with @ajtritt but wanted to get group feedback before implementing.

Several groups I am working with require a standardized way to store time intervals that should be excluded from analysis due to artifacts in the signal. This is commonplace in ECoG, and I believe could be useful in nearly every domain. Andrew and I settled on storing this as a TimeIntervals object inside /intervals. I would like the name of this to be standardized. What should it be? 'bad_times', 'artifacts', or 'exclude_intervals' would be options I am OK with. This data will be set to optional, so the change will be non-schema-breaking.

@bendichter bendichter added the category: enhancement improvements of code or code behavior label Nov 14, 2018
@bendichter bendichter changed the title ass bad times object add bad times object Nov 14, 2018
@bendichter
Copy link
Contributor Author

lol I meant "add" not "ass". What an unfortunate typo. Sorry if I sullied anyone's email inbox!

@oruebel
Copy link
Contributor

oruebel commented Nov 14, 2018

How about calling it invalid_times ?

Is there any additional metadata you want to associate with these times, e.g., descriptions of why these intervals are invalid?

@bendichter
Copy link
Contributor Author

@oruebel yeah good point. This should include a required description field. IntervalSeries does not have a description field, so we should subclass it and add it. We could call the class InvalidTimes with the group name pegged to invalid_times.

@tjd2002
Copy link
Contributor

tjd2002 commented Nov 14, 2018

I am not convinced we want a global 'bad_times' entry (or whatever name) across modalities since this 'bakes in' a particular decision about how to use that information that is better made by the analyst.

Instead, I think we should be more descriptive. Create top-level interval series objects called 'ecog_artifacts', or 'camera_occluded', or 'patient_sneezed', or whatever. These names could be standardized for a particular application.

Consider the case where some part of your data is bad (e.g. there's an electrode artifact), but you still want to analyze the subject's behavior, as measured by keypresses or video record. In this case, it's not really useful to call it 'Bad_times'

@bendichter
Copy link
Contributor Author

@tjd2002 I like the idea of allowing more specificity, but I think standardizing across that many different types of exclusion criteria the way you describe would be difficult. What if we had a table with the following fields:

  • start_time
  • stop_time
  • comments
  • timeseries (optional)

Every bad time interval row is associated with a required comment that explains why it should be removed. There is also an option to link explicitly to a TimeSeries object. I think this would strike a balance so that the unpredictable things that can go wrong are documentable and the TimeSeries reference would make this somewhat machine-readable.

@tjd2002
Copy link
Contributor

tjd2002 commented Nov 14, 2018

@bendichter Even if the names aren't standardized, perhaps we could still allow for multiple InvalidTimes objects per file. Then users could name and describe these as needed. An analyst could extract all the objects of type InvalidTimes and 'or' them together to get a list of all the times to exclude.

I don't really like the idea of implementing this as a single list with a per-row free-form comment field. Writing and parsing comment fields seems awkward, and generally I think we anticipate that there would only be a small number of categories of artifact.

I suppose I'm proposing a top-level MultiContainerInterface called invalid_times, which contains 0 or more objects of type InvalidTimes.

(Lastly, by 'timeseries', are you proposing a reference to the timeseries that the list of invalid times applies to? If so, I think this gets tricky very fast, and we should avoid it for now).

@oruebel
Copy link
Contributor

oruebel commented Nov 14, 2018

Even if the names aren't standardized, perhaps we could still allow for multiple InvalidTimes objects per file

I think the idea would be to have a default name of invalid_times but to allow multiple instance of this.

I suppose I'm proposing a top-level MultiContainerInterface called invalid_times, which contains 0 or more objects of type InvalidTimes.

I think the idea here is to have this as just another TimeIntervals dynamic table as part of the /intervals group which currently stores , e.g., epochs and trials.

I don't really like the idea of implementing this as a single list with a per-row free-form comment field.

I agree that we generally want to avoid free-form as much as possible. However, here the proposal is to just have comment field for which free-form text makes sense. Adding a more specific category column to classify these intervals but for now I would leave that up to the user to add. Since this is a dynamic table this should be easy for users to do. The problem here is that we simply don't know the reasons for why things are invalid so if there are categories for this then a user would need to supply them. Unfortunately we currently don't really have a good way of dealing with custom controlled vocabularies for terms. This is something that is on our roadmap.

Lastly, by 'timeseries', are you proposing a reference to the timeseries that the list of invalid times applies to?

I assume that is the case, but @bendichter can you clarify? For now, my feeling is that it's probably be best to leave this out of the table (but I'm happy to be convinced otherwise).

@tjd2002
Copy link
Contributor

tjd2002 commented Nov 14, 2018

I think the idea would be to have a default name of invalid_times but to allow multiple instance of this.

Great.

I think the idea here is to have this as just another TimeIntervals dynamic table as part of the /intervals group which currently stores , e.g., epochs and trials.

I see @oruebel, sounds good, I'll retract my multi-container suggestion then.

I agree that we generally want to avoid free-form as much as possible. However, here the proposal is to just have comment field for which free-form text makes sense. Adding a more specific category column to classify these intervals but for now

OK. Are TimeIntervals required to be non-overlapping? If so, it would not always be possible to represent multiple types of artifacts in one list (e.g. if an 'ecog artifact' happens during a 'camera occlusion' event.)

In my experience it's more useful for analysts if there's a separate table for each type of artifact, but it sounds like @bendichter sees a use case for a global, top-level 'these times are bad' list. As long as it's possible to have multiple instances, then this can be something that just gets worked out by convention.

@oruebel
Copy link
Contributor

oruebel commented Nov 14, 2018

Are TimeIntervals required to be non-overlapping?

I don't think that is something we would enforce here. Generally speaking, I think with what we have discussed so far it would be up to the user to decide whether they want to put all invalid times into one table or have multiple such tables depending on the type. To be more specific, there are to basic ways to deal with categories. Either: 1) you have a column in the table that explicitly describes the category or 2) you have a category for each table that then applies to all elements in that table. The only problem with 2) is that currently this would be implicit in the name of the table, i.e., unless you define an extension or we add an attribute for a category that applies to the table as a whole.

@bendichter
Copy link
Contributor Author

@tjd2002, @oruebel

I mean allowing for a column that for each row indicates what TimeSeries or list of TimeSeries objects this artifact is referring to. The use-case that motivated this feature is storing times where the ECoG signal should be excluded due to artifact. In this case the rows that mark ECoG artifacts would include a reference to the ElectricalSeries object. I could see this also being useful for indicating, for instance, a time interval when a position tracking cut out, which would have a reference to the appropriate BehavioralSeries object(s). Analysis libraries could automatically apply the appropriate bad intervals when a specific data source is used. For instance, a bad_interval row that is linked to a BehavioralSeries might be ignored when computing a periodogram of an ECoG electrode. The infrastructure for references to TimeSeries objects, is there, so implementation isn't an issue. Can you explain why you think this would get tricky?

It could be possible to use multiple tables for this. Could you give me an example of where this would be needed? I want to somehow make it machine-readable which time intervals are to be interpreted as artifactual/bad/excluded. If we have a single table we can do that by the name of the table. If we have multiple tables in the same group they will have different names, but we can still mark them as being in this category by putting them in a specific MultiContainerInterface or by making them a sub-type of TimeIntervals, like InvalidTimes.

@oruebel
Copy link
Contributor

oruebel commented Nov 15, 2018

Looking at the TimeIntervals as it is defined right now, I think it already does everything we discussed here (including the referencing of the timeseries). In addition, it uses tags with user-defined tag categories, so that addresses also (at least in part) the question of free-form comments (i.e, I think we could just use the tags rather than adding a comments field). So it sounds like really what this proposal comes down to is to add an optional TimeIntervals table to /intervals with the name invalid_times. I think that is reasonable.

An alternative approach would also be to simply have a default tag invalid for epochs but my guess is that the metadata that users would want to store for epochs and invalid times is different so having a separate invalid_times table makes sense.

It could be possible to use multiple tables for this. Could you give me an example of where this would be needed?

I think the use case here comes down to the question of which information you want to encode as part of columns of the table vs. general attributes of the table. E.g., instead of having one table for everything your could have a table with invalid times for your EcOG recordings and another table with the invalid times for your eye tracking.

The use-case that motivated this feature is storing times where the ECoG signal should be excluded due to artifact. In this case the rows that mark ECoG artifacts would include a reference to the ElectricalSeries object.

Again, TimeIntervals actually already does this so it's just doing the same we do for epochs already. That being said, the look-up you describe here seems to point the other way around,
i.e., rather than look-up for a particular interval the timeseries it applies to, you seem to want to look up for a TimeSeries all the intervals that apply to it. With the current design, this would come down to a linear search in the table. This sounds like a useful feature for the API (if we don't have it already).

@bendichter
Copy link
Contributor Author

OK sounds good. Let's add /intervals/invalid_times as an optional table and continue to allow other tables as @tjd2002 suggests. I still think this table requires a comments field but I'd be OK with leaving the naming of that field to the user since that is not going to be machine-readable anyway.

You are right that the lookup is inverse and it would be great to have a convenience function for it! That would be a good first issue for someone...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category: enhancement improvements of code or code behavior
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants