-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add electrodes as optional argument to ecephys.Clustering #194
Comments
If you use the Unit table for unit metadata instead of Clustering, then you can just add a column for electrodes (actually an electrodeTableRegion). For instance, some of our Unit columns look like this: # External clustering software gives names for each cluster--we want to preserve these
nwbf.add_unit_column('cluster_name', '(str) cluster name from clustering software')
nwbf.add_unit_column('elec_group', '(electrodeGroup) nTrode on which spikes were recorded')
# For tetrode data, this will usually be all channels in the tetrode
nwbf.add_unit_column('neighborhood', '(electrodeTableRegion) list of electrodes on which spikes were clustered') Notably, the datatypes (str, electrodeTableRegion,...) are not enforced by the schema in this approach, but that seems like something that could be addressed separately. |
Same logic applies to #111, which requests new fields be added to Clustering. |
@tjd2002 The dev team has had a few conversations about what is the most efficient way to store spike data. It basically comes down to 2 options: index by unit (as is done in the |
The release notes suggest that at the time of the refactoring the intent was to deprecate Clustering, for a whole bunch of nice reasons. ( There would be a big penalty to pay in terms of both code complexity and user confusion if we have 2 redundant ways of storing a fundamental type of data like spike times. So I think the bar should be pretty high before we go this route.
Why can't you just append to each unit's list of
On read, we are always selecting on both units and time windows, with # of units typically << 1,000 and the number of spikes per unit also on the order of 1,000s, so it's not obvious that one approach would be faster than the other, and in general lookup times are not really a bottleneck for us. Since the argument seems to boil down to whether getting rid of Clustering would impose a large performance hit on the streaming use case, I have 2 main questions: 1) is this a real and important use case, and 2) are there ways to mitigate any performance issues without 'forking' the representation of spikes. One possible solution would be to rename Clustering to something like 'StreamingUnitSpikeTimes', or something else that makes it clear that it's only really there for a very specialized purpose. and then make it clear that in that case, you would still use a Units table to keep track of unit metadata, including electrodes, etc. (to come belatedly back to the topic of this issue :) ) |
If a data structure is needed for streaming spike times, we should design something for that. Regarding the original request in this issue: I think it makes more sense for neurodata_type to have a reference to Units table. Units is the structure for storing metadata about spiking units, including associations between spiking units and electrodes. I just submitted a PR with such a type specced out. |
I like this. Ideally the main API/query calls should (eventually) be agnostic as to which style is used ‘under the hood’. E.g. if you just called |
Clustering is deprecated. |
It would be great if
ecephys.Clustering
could support an optionalelectrodes
argument that would expect an electrode_table_region object.The text was updated successfully, but these errors were encountered: