-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support video and frames files associated with sample data #171
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #171 +/- ##
=======================================
Coverage 99.66% 99.67%
=======================================
Files 10 10
Lines 605 619 +14
=======================================
+ Hits 603 617 +14
Misses 2 2 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @niksirbi! The new sample_data functions are great and make a lot of sense. Most of my comments are related to restructuring the metadata.yml to have the "pose" file as the key, which will simplify a few functions. I'll let you decide on this.
@@ -74,7 +74,7 @@ def test_filter_by_confidence(sample_dataset, caplog): | |||
).values[:, 0] | |||
) | |||
) | |||
assert n_nans == 3213 | |||
assert n_nans == 2555 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Asking out of curiosity, how has this changed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because I actually changed the underlying dataset! Before we had "DLC_single-mouse_EPM" and "SLEAP_single-mouse_EPM", but they did not actually correspond to the same video. I updated the DLC one so it matches the SLEAP one (enabling us to use the same video and frame.
Thanks a lot @lochhh! I like your suggestion to use the pose file names as keys, it indeed simplifies things a lot. I will give it a try. |
Co-authored-by: Chang Huan Lo <changhuan.lo@ucl.ac.uk>
Quality Gate passedIssues Measures |
I've implemented this and it works! I've added this as an "EDIT" to the PR's description. |
Description
What is this PR
Why is this PR needed?
Having video files, and/or frames extracted from those videos, associated with existing sample pose files will greatly facilitate the development and debugging of GUIs, because it would allow us to plot trajectories over a meaningful background, define ROIs etc. See all the issues linked in References.
What does this PR do?
It overhauls the
sample_data.py
module to allow for the fetching of videos and/or frames alongside the fetching of pose files.All the changes were done in conjunction with changes to the data repository on GIN and should be interpreted together.
Changes to the data repository:
metadata.yaml
. I added new metadata fields, to express the association between pose datasets and videos/frames. Here's an example entry:get_sha256_hashes.py
which will iterate over all files inposes
,videos
, andframes
and write the results to txt files (poses_hashes.txt
,videos_hashes.txt
,frames_hashes.txt
). It doesn't go all the way to fully automate the generation of metadata.yaml entries, but it is an improvement on previous practices.Changes to the code repository (this PR):
sample_data.py
module now exposes 3 public functions:list_datasets()
: returns the filenames of the sample pose files (the one in theposes
folder)fetch_dataset_paths(filename)
: given a filename of a valid pose dataset (one that the above function returns), return a dict of 3 local paths, with keys "poses", "video", "frame". If video or frame is missing, their value is None.fetch_dataset(filename
) : given a filename of a valid pose dataset (one thatlist_datasets()
returns), callsfetch_dataset_paths(filename)
and proceed to load the "poses" into a movement dataset. The "video" and "frame" paths do not get loaded (for now), they are simply stored as dataset attributes.References
Closes #38.
Closes #121 because the syntax is much less awkward now (with fewer redundancies), and I think there is no longer a clear need for rewriting the
sample_data.py
module into a class.Facilitates #105, #49, #50, #48, #164.
How has this PR been tested?
Updated existing tests in
test_sample_data.py
.Is this a breaking change?
Yes, the API for fetching sample datasets has changed. This PR need to be merged ahead of any others, because the changes to the GIN data repository have broken CI, and it will remain broken until this is merged.
Does this PR require an update to the documentation?
Yes, I've updated the relevant sections of the docs.
Checklist:
EDIT 2024-05-07
Following @lochhh 's suggestion, the
metadata.yaml
has been reformatted as a dict of dicts, using the pose file names as top-level dict keys:This simplifies the logic inside
sample_data.py
quite a bit!