Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor unit/cluster data #674

Closed
10 tasks done
ajtritt opened this issue Oct 15, 2018 · 6 comments
Closed
10 tasks done

Refactor unit/cluster data #674

ajtritt opened this issue Oct 15, 2018 · 6 comments
Assignees
Milestone

Comments

@ajtritt
Copy link
Member

ajtritt commented Oct 15, 2018

Problem/Use Case

Spiking unit data/metadata is spread out across multiple locations/objects (UnitTimes, NWBFile.units, Clustering, ClusterWaveforms). These specifications should be combined to make this feature simpler and easier to use.

  • Merge UnitTimes into NWBFile.units
    • Make UnitTimes a DynamicTable
      • Add NWBFile.units predefined columns to UnitTimes
    • Add optional columns to UnitTimes
      • waveform_mean
      • waveform_sd
      • electrode_id
      • electrode_group
    • Make NWBFile.units an instance of UnitTimes
  • Deprecate ClusterWaveforms

Checklist

  • Have you ensured the feature or change was not already reported ?
  • Have you included a brief and descriptive title?
  • Have you included a clear description of the problem you are trying to solve?
  • Have you included a minimal code snippet that reproduces the issue you are encountering?
@ajtritt ajtritt self-assigned this Oct 15, 2018
@ajtritt ajtritt added this to the NWB 2.0 Full milestone Oct 15, 2018
@ajtritt
Copy link
Member Author

ajtritt commented Oct 15, 2018

@tjd2002 @bendichter @NileGraddis please provide any feedback

@bendichter
Copy link
Contributor

I like the idea of incorporating UnitTimes into the units table, especially now that you've made it possible to put region references in dynamic tables. This would solve some usability issues and bring everything together in a more cohesive framework. Though it would be redundant to keep "Unit" in the name if it's in the units table, so I would suggest calling the column "SpikeTimes" rather than "UnitTimes."

It would be nice to remove the redundancy of supporting both Clustering and UnitTimes, since they hold the same info, but I have a few concerns about deprecating Clustering and ClusterWaveforms

  1. Clustering and ClusterWaveform are easier to stream to, because they progress in time. Clustering is more similar in form to other data standards for spike times, e.g. NeuroScope, where spikes can be written in as they are recorded. UnitTimes is better if you are writing data by cell but that is not usually the case unless you are writing simulation output. I've heard one possible solution for this: explicitly allowing multiple regions in the spike_times dataset for a single unit. This would allow us to iterate through every unit at every buffer cycle and write the spikes of each unit for that cycle, and then sort them at the end so that all regions of a specific unit are contiguous. I think this makes streaming technically possible, but painful in comparison to just using Clustering. We'd have num_cycles * num_units regions that we would need to sort, whereas it seems like streaming to Clustering would probably be very straightforward. A similar argument can be made for ClusterWaveforms.

  2. The form of Clustering caters better to time window queries than UnitTimes does, so it may be better for some analyses

  3. With Clustering it was easy to indicate which shank a unit was from by providing multiple Clustering objects, one for each shank. There appears to be no standard way to do that right now in the units table. One possible solution to this is to have a default column "electrodes_group" and have each group correspond to a shank. This would need to be an optional column or be allowed to be some pre-specified default value because simulation output will not have electrodes nor electrode_groups in the units table.

@ajtritt
Copy link
Member Author

ajtritt commented Oct 15, 2018

@bendichter

With Clustering it was easy to indicate which shank a unit was from by providing multiple Clustering objects, one for each shank.

This is one of the unmentioned motivations of making UnitTimes a DynamicTable. UnitTimes would contain optional columns for specifying which group or electrode a unit came from.

Given that there Clustering has some query/streaming capabilities that UnitTimes does not, it makes sense to keep it around.

With ClusterWaveform in UnitTimes, do you still see a reason to keep it around?

@bendichter
Copy link
Contributor

ClusterWaveform is fine to move to the units table since it's just mean and std of the waveform.

@bendichter
Copy link
Contributor

see https://github.com/NeurodataWithoutBorders/pynwb/issues/675 for a proposal for dealing with Clustering

@tjd2002
Copy link
Contributor

tjd2002 commented Oct 16, 2018

Though it would be redundant to keep "Unit" in the name if it's in the units table, so I would suggest calling the column "SpikeTimes" rather than "UnitTimes."

I think the reason we opted not to call the Unit table the 'Spike table' in the first place was to account for use cases other than spikes. E.g. time of detected calcium transients. So I'd vote either for 'UnitTimes', 'timestamps', or 'Times'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants