# [PROPOSAL] Interface for derived (kinematic) variables #162
## Comments
Some further thoughts on this, based on today's meeting:

**Accessor object names**

**Data variables in polar coordinates**

We speculated that polar coordinate systems may not be that often used in practice. Therefore it's probably fine to omit such properties from the accessor and provide vector utilities instead:

```python
from movement.utils.vector import cart2pol, magnitude

velocity = ds.move.velocity  # we still keep the cartesian versions in the accessor
velocity_pol = cart2pol(velocity)
speed = magnitude(velocity)
```

As mentioned in alternative 3, we can also implement vector utils as custom methods of a `DataArray` accessor:

```python
velocity = ds.move.velocity
velocity_pol = velocity.move.cart2pol()
speed = velocity.move.magnitude()
```

I currently tend to like this option. As a drawback, I had mentioned that people would need to know that "speed" is the magnitude of the velocity vector, but perhaps that's not a bad thing: it will make everyone aware of how we define and use these terms. I'm still somewhat undecided though.
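For concreteness, here is a minimal sketch of what such a `DataArray` accessor could look like. The accessor name `move`, the `space` labels and the `space_pol`/`rho`/`phi` coordinate names follow the conventions discussed in this thread; the implementation itself is only an illustration, not movement's actual code.

```python
import numpy as np
import xarray as xr


@xr.register_dataarray_accessor("move")
class MoveVectorAccessor:
    """Illustrative accessor adding vector utilities to DataArrays."""

    def __init__(self, da):
        self._da = da

    def magnitude(self):
        # Euclidean norm along the spatial dimension
        return xr.apply_ufunc(
            np.linalg.norm,
            self._da,
            input_core_dims=[["space"]],
            kwargs={"axis": -1},
        )

    def cart2pol(self):
        # Transform (x, y) into (rho, phi) along a new "space_pol" dimension
        rho = self.magnitude()
        phi = np.arctan2(
            self._da.sel(space="y", drop=True),
            self._da.sel(space="x", drop=True),
        )
        return xr.concat([rho, phi], dim="space_pol").assign_coords(
            space_pol=["rho", "phi"]
        )


# A toy (time, space) velocity array
velocity = xr.DataArray(
    [[3.0, 4.0], [6.0, 8.0]],
    dims=["time", "space"],
    coords={"space": ["x", "y"]},
)
speed = velocity.move.magnitude()
velocity_pol = velocity.move.cart2pol()
```

With something like this in place, `speed` is explicitly the magnitude of `velocity`, which matches the naming concern raised above.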
---

I think, if we go for alternative 1 (i.e. storing everything) and the user decides to modify (e.g. apply a confidence filter to), say, `position`, we run into a recomputation problem:

```python
from movement.filtering import filter_by_confidence
from movement.analysis import kinematics as kin
from movement.utils.vector import norm  # not implemented

ds.move.velocity  # this stores velocity in ds
ds = filter_by_confidence(ds, threshold=0.6, print_report=True)  # this (atm) filters position only

# neither of the below recomputes velocity based on the filtered position
ds.move.velocity
ds.velocity

# need to explicitly recompute velocity
ds["velocity"] = kin.compute_velocity(ds.position)
```

Conversely, if we only provide methods that do not store anything (i.e. no property accessors) across the package:

```python
ds["velocity"] = ds.move.compute_velocity()  # preferred method
ds["velocity"] = kin.compute_velocity(ds.position)  # alternative

ds = filter_by_confidence(ds, threshold=0.6, print_report=True)
ds["velocity"] = ds.move.compute_velocity()  # recompute velocity after filtering
ds["speed"] = norm(ds.velocity)
```

Plus, we do not need to decide what to store.
---

Thanks for the response @lochhh! I kind of like your approach, though I need to sleep over it and come back to it tomorrow. I like having some frequently used functions as accessor methods. In this model, we could even have convenience methods in the accessor, like:

```python
def compute_speed(self):
    velocity = self.compute_velocity()
    return norm(velocity)

def compute_distance_traveled(self, start, stop):
    # `from` is a reserved word in Python, hence start/stop
    displacement = self.compute_displacement()
    return displacement.sel(time=slice(start, stop)).cumsum()
```

This would not store velocity or displacement, and would only return what the user asked for. If we ever add custom accessor properties, they have to be about something that doesn't change with filtering/interpolation etc., to avoid the re-computing problem you described.

I have two further (related) questions:
---

I like the convenience methods you suggested, and if we decide to go for this "store-nothing" model, more of these methods can be added as required. I am not sure we need a filtering method on the accessor, though:

```python
ds["position"] = ds.move.filter_by_confidence(threshold=0.9)  # this would overwrite the existing position variable
```

This is somewhat confusing, as I would expect the entire dataset to be filtered. We could instead let the existing filtering functions operate on both datasets and individual data variables:

```python
ds = filter_by_confidence(ds, threshold=0.6)  # apply filter on all data_vars, default for Dataset
ds = filter_by_confidence(ds, threshold=0.6, data_vars=["position", "velocity"])  # apply filter on selected data_vars
position = filter_by_confidence(ds.position, threshold=0.6)
```

With the above, we could perhaps add this as a convenience method that takes `data_vars`:

```python
ds = ds.move.filter_by_confidence(threshold=0.6)
ds = ds.move.filter_by_confidence(threshold=0.6, data_vars=["position", "velocity"])
```

If we can override
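To make the `data_vars` idea concrete, here is a simplified sketch. The function name and the `data_vars` parameter come from the snippets above; the NaN-masking logic is only a stand-in for movement's actual filtering behaviour.

```python
import xarray as xr


def filter_by_confidence(ds, threshold=0.6, data_vars=None):
    """Set values to NaN wherever confidence falls below the threshold.

    By default all data variables (except confidence itself) are
    filtered; `data_vars` restricts filtering to the named variables.
    """
    ds = ds.copy()
    targets = (
        data_vars
        if data_vars is not None
        else [v for v in ds.data_vars if v != "confidence"]
    )
    mask = ds["confidence"] >= threshold
    for var in targets:
        ds[var] = ds[var].where(mask)
    return ds


# Toy dataset: 3 time points, a 1D position and a per-point confidence
ds = xr.Dataset(
    {
        "position": ("time", [1.0, 2.0, 3.0]),
        "confidence": ("time", [0.9, 0.3, 0.8]),
    }
)
filtered = filter_by_confidence(ds, threshold=0.6)  # low-confidence points become NaN
```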
---

I quite like this:

```python
ds = filter_by_confidence(ds, threshold=0.6, data_vars=["position", "velocity"])  # apply filter on selected data_vars

# and for convenience
ds = ds.move.filter_by_confidence(threshold=0.6, data_vars=["position", "velocity"])
```

The only downside is that, conceptually, I'd expect people to first filter position and then derive velocity, instead of deriving velocity and then filtering both. But I'm fine with giving people options, as long as our examples reflect best practice. Relevant for your work @b-peri.

---

I don't see why we shouldn't (but maybe there are downsides we'll bump into).
---

It sounds to me that we have an agreement @lochhh. I'll keep the discussion open till the end of this week, and then I'd say we can move on with the implementation. Would you be willing to take up the conversion of the existing properties to this?

```python
from movement.utils.vector import cart2pol

velocity = ds.move.compute_velocity()
velocity_pol = cart2pol(velocity)
```
---

Yep, happy to do this once we run the idea by everyone in this week's behav meeting.
---

**Summary 2024-04-30**

This is my attempt at summarising the above discussion, with an emphasis on the points that @lochhh and I agree on.
---

This issue is meant for discussing the way we compute, access and store variables derived from pose tracks in `movement`. Apologies for the very long read, but this is an important design choice and warrants some deliberation. The conversation started during our last meeting with @lochhh and @sfmig. This is my attempt to write it up.
## Context: movement's dataset structure

Predicted poses (pose tracks) are represented in `movement` as an `xarray.Dataset` object - hereafter referred to as a movement dataset (`ds` in the example code snippets below).

Right after loading, each movement dataset contains two data variables stored as `xarray.DataArray` objects:

- `position`: with shape (`time`, `individuals`, `keypoints`, `space`)
- `confidence`: with shape (`time`, `individuals`, `keypoints`)

You can think of each data variable as a multi-dimensional `pandas.DataFrame` or as a `numpy.array` with labeled axes. In `xarray` terms, each axis (e.g. `time`) is called a dimension (`dim`), while the labelled 'ticks' along each axis are called coordinates (`coords`).

Grouping data variables together in a dataset makes sense when they share some common `dims`. In the movement dataset the two variables share 3 out of 4 `dims` (see image above).

Other related data that do not constitute arrays but instead take the form of key-value pairs can be stored as attributes - i.e. inside the `attrs` dictionary.

All data variables and attributes can be conveniently accessed via the usual `.` attribute syntax, e.g. `ds.position`.

## Problem formulation
The `position` and `confidence` data variables (+ some attributes) are created automatically after loading predicted poses from one of our supported pose estimation frameworks.

The question is what to do with variables that `movement` derives from these 'primary' variables. For purposes of illustration, we will consider three example variables:

- `velocity`: an `xarray.DataArray` object with the same `dims` and `coords` as `position`.
- `velocity_pol`: velocity in polar coordinates. As of PR #155, this is a transformation of the above variable from cartesian to polar coordinates. It's also an `xarray.DataArray`, but its `space` dimension is replaced by `space_pol`, with `rho` (magnitude) and `phi` (angle) as coordinates.
- `speed`: the magnitude (euclidean norm) of `velocity`, and therefore equivalent to `rho` in `velocity_pol`. This could be represented as an `xarray.DataArray` that lacks a spatial dimension altogether (similar to the `confidence` variable).

## Alternatives
Each of the above derived `xarray.DataArray` objects could be requested and stored in a variety of ways. Below, I'll go through some alternatives and attempt to supply pros/cons for each.

### 1. Status quo: derived variables as accessor properties
The status quo relies on extending xarray using accessors. In short, accessors are xarray's way of adding domain-specific functionality to its `xarray.DataArray` and `xarray.Dataset` objects. This is strongly preferred over the standard way of extending objects (inheritance).

Accordingly, we have implemented a `MoveAccessor`, which extends `xarray.Dataset` and is accessed via the keyword "move".

Currently, derived variables can be computed via the accessor, e.g. `ds.move.velocity`. Under the hood, when we access the property for the first time, `velocity` is computed and stored as a data variable within the original dataset, alongside `position` and `confidence`. Once computed, it can be accessed in the same way as the 'primary' variables - i.e. `ds.velocity` or `ds["velocity"]`.

All currently implemented kinematic variables - `displacement`, `velocity`, and `acceleration` - behave in this way. Through PR #155, so do their polar transformations.
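A minimal sketch of the compute-once-and-store mechanism described above (the accessor name `demo_move` and the differentiation step are illustrative stand-ins, not movement's actual implementation):

```python
import xarray as xr


@xr.register_dataset_accessor("demo_move")  # illustrative name, to avoid clashing with the real accessor
class DemoMoveAccessor:
    def __init__(self, ds):
        self._ds = ds

    @property
    def velocity(self):
        # Compute on first access, store in the dataset,
        # and return the stored variable thereafter
        if "velocity" not in self._ds:
            self._ds["velocity"] = self._ds["position"].differentiate("time")
        return self._ds["velocity"]


ds = xr.Dataset(
    {"position": ("time", [0.0, 1.0, 4.0])},
    coords={"time": [0.0, 1.0, 2.0]},
)
v = ds.demo_move.velocity  # computed and stored on first access
v_again = ds.velocity      # now accessible like a 'primary' variable
```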
**Pros**

- If the user requests `velocity` again, the stored variable will be returned, with no need for re-computation.

**Cons**
- Users may find the `.move` syntax strange and may not expect the automatic storage of variables.
- By storing `velocity`, `velocity_pol` and `speed`, we will be storing the same data in many different ways. `velocity_pol` is just a simple transform of `velocity`, so it may not be worth storing within the dataset. The case is even more extreme for `speed`: if we store both `velocity_pol` and `speed`, we would be keeping the exact same array of numbers twice. Moreover, calling `ds.move.speed` would result in calling both `ds.move.velocity` and `ds.move.velocity_polar` under the hood, and users may be surprised by all the extra data variables they suddenly end up with.

### 2. Getting derived variables via accessor methods
This alternative still relies on the `MoveAccessor`, but gets at the derived variables via custom methods (e.g. `ds.move.compute_velocity()`) instead of custom properties.

Each of these methods would return a separate `xarray.DataArray` object which would NOT be automatically stored in the original dataset. If the user wishes to store these in the dataset, they could do so explicitly, e.g. `ds["velocity"] = ds.move.compute_velocity()`.
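The method-based variant can be sketched in the same way (again with an illustrative accessor name; nothing is written back to the dataset unless the user assigns it):

```python
import xarray as xr


@xr.register_dataset_accessor("demo_kin")  # illustrative name
class DemoKinAccessor:
    def __init__(self, ds):
        self._ds = ds

    def compute_velocity(self):
        # Returns a new DataArray; deliberately does NOT store it
        return self._ds["position"].differentiate("time")


ds = xr.Dataset(
    {"position": ("time", [0.0, 2.0, 6.0])},
    coords={"time": [0.0, 1.0, 2.0]},
)
velocity = ds.demo_kin.compute_velocity()
assert "velocity" not in ds  # nothing stored automatically
ds["velocity"] = velocity    # storing is explicit and opt-in
```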
**Pros**

**Cons**

- Still relies on the `.move` syntax.
- `ds.move.compute_speed()` would re-compute `velocity_polar` to get its magnitude, even if the user had previously computed `velocity_polar` (but hadn't stored it in the same dataset).

### 3. A mix of accessor properties and methods
From the above, it seems like using accessor properties duplicates data, while using accessor methods duplicates computation. Maybe it's possible to strike a balance between the two:

- Some variables (`velocity`, `acceleration`) would be good candidates for this.

This mixed approach could look something like this:

This variant would require us to provide an extra accessor to extend `xarray.DataArray` objects and specifically operate on data variables that contain an appropriate spatial dimension (this is where the `cart2pol` and `magnitude` methods would be implemented).
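As an illustration of how the mix could fit together - a stored Dataset property for the frequently used variable, plus an on-demand DataArray method for the cheap transform (all names here are illustrative, not movement's API):

```python
import numpy as np
import xarray as xr


@xr.register_dataset_accessor("demo_mix")  # illustrative accessor names throughout
class DemoMixAccessor:
    def __init__(self, ds):
        self._ds = ds

    @property
    def velocity(self):
        # Frequently used variable: computed once and stored (alternative 1 behaviour)
        if "velocity" not in self._ds:
            self._ds["velocity"] = self._ds["position"].differentiate("time")
        return self._ds["velocity"]


@xr.register_dataarray_accessor("demo_vec")
class DemoVecAccessor:
    def __init__(self, da):
        self._da = da

    def magnitude(self):
        # Cheap transform: computed on demand, never stored (alternative 2 behaviour)
        return xr.apply_ufunc(
            np.linalg.norm, self._da, input_core_dims=[["space"]], kwargs={"axis": -1}
        )


ds = xr.Dataset(
    {"position": (("time", "space"), [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])},
    coords={"time": [0.0, 1.0, 2.0], "space": ["x", "y"]},
)
speed = ds.demo_mix.velocity.demo_vec.magnitude()  # stored velocity, derived speed
```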
**Pros**

**Cons**

- Still relies on the `.move` syntax.
- Requires an additional accessor for `xarray.DataArray` objects, potentially leading to further confusion.

### 4. Use both accessor properties and methods
Another approach could be to always supply both alternatives 1 and 2 for every variable, so the user could choose between them.

**Pros**

**Cons**

- Still relies on the `.move` syntax.

### 5. Forget about accessors
We can always abandon the accessor way of doing things and (given that inheritance and composition are discouraged for `xarray` objects) forget about object-oriented programming (OOP) altogether.

We could instead rely on analysis and utility functions that take one `xarray.DataArray`, apply some operation to it, and return another `xarray.DataArray`.

The above is already possible, by the way (apart from the `magnitude()` function, which could be easily added).
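In this style everything is a plain function over DataArrays. A sketch, including a possible `magnitude()` helper (the exact signatures are assumptions, not the library's actual API):

```python
import numpy as np
import xarray as xr


def compute_velocity(position):
    """Differentiate position along the time dimension."""
    return position.differentiate("time")


def magnitude(da, dim="space"):
    """Euclidean norm along a spatial dimension (the to-be-added helper)."""
    return np.sqrt((da ** 2).sum(dim))


position = xr.DataArray(
    [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]],
    dims=["time", "space"],
    coords={"time": [0.0, 1.0, 2.0], "space": ["x", "y"]},
)
velocity = compute_velocity(position)
speed = magnitude(velocity)  # no accessors, no hidden storage
```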
Cons
My personal take
After considering these alternatives, I lean towards sticking with the status quo (alternative 1) - i.e. every derived variable is an accessor property, and they all get stored as data variables in the dataset, duplication be damned.
This means that users will have to get used to the slightly strange
.move
syntax and behaviour, but at least these will be consistent throughout and there will be one main syntax to learn.Power users who wish to override the default 'magic' behaviour can do so by using alternative 5, which already works anyway (and is what actually happens under the hood).
That said, I'm open to counter-arguments, and there may well be alternatives I haven't considered, so please chime in @neuroinformatics-unit/behaviour @b-peri !