Review load_data #107
Conversation
Interesting work, Joe!
I left some comments on the parts of the code that are not clear to me. Open to discussing them when you have time!
```python
sub_name = "sub-1119617"
sessions_and_runs = {
```
Beyond this example, will the user need to scrape these names from the filesystem?
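If so, a small helper could do the discovery. A minimal sketch, assuming an SWCBlueprint-style layout in which the subject folder contains `ses-*` folders that each hold `run-*` folders (the helper name and glob patterns are my assumptions, not the existing API):

```python
from pathlib import Path


def scrape_sessions_and_runs(base_path: Path, sub_name: str) -> dict:
    """Discover session and run names by scanning the filesystem.

    Assumes `base_path / sub_name` contains `ses-*` folders, each
    holding `run-*` folders (a hypothetical layout for this sketch).
    """
    sessions_and_runs = {}
    for ses_path in sorted((base_path / sub_name).glob("ses-*")):
        sessions_and_runs[ses_path.name] = [
            run_path.name for run_path in sorted(ses_path.glob("run-*"))
        ]
    return sessions_and_runs
```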
```python
    return list(zip(ordered_ses_names, ordered_run_names))

def _validate_inputs(
```
I like this set of `assert` statements; it seems like a clean way to hold all the validations together.
Do you think you will define separate methods to get the folders? For instance, will `get_sub_path` change depending on the implementation of `load_data()`? If so, it makes sense to inject the `Callable`s.
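For reference, the injection could look something like this. A rough sketch, where the `rawdata` layout, the `load_data` signature, and `default_get_sub_path` are my assumptions:

```python
from pathlib import Path
from typing import Callable


def default_get_sub_path(base_path: Path, sub_name: str) -> Path:
    # Assumed SWCBlueprint-style default layout for this sketch.
    return base_path / "rawdata" / sub_name


def load_data(
    base_path: Path,
    sub_name: str,
    get_sub_path: Callable[[Path, str], Path] = default_get_sub_path,
) -> Path:
    # An alternative implementation can inject its own path logic
    # without the validation / traversal code having to change.
    sub_path = get_sub_path(base_path, sub_name)
    # ... validation and loading would continue from here.
    return sub_path
```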
This PR is to review the `load_data` code from spikewrap.

The PR system is a little strange, but I think it should (hopefully) work well. The aim of this system is to avoid having to review the entire codebase at once, or review since-reverted commits, without changing the `dev`/`main` branch git history. `reviewed_code` was created by branching `dev` and removing all commits. `rev_load_data` was created by branching `dev` and removing all of the code unrelated to loading data. Now, merging `rev_load_data > reviewed_code` should isolate these commits of interest. For new features, I will just add them from `dev` to here. Once the main bulk of the code is reviewed, we can revert to reviewing in the normal way. This is a very weird system :'D but it should work out OK and allow tailored commits to be reviewed.

This PR includes all the general package-related files (e.g. `pyproject.toml`) as well as:

- `examples/load_data.py`: an example of how to use the code to load data into spikewrap.
- `pipeline/load_data.py`: wrapper code to load the raw data into a class data structure. Under the hood, the data is loaded with spikeinterface. Note that all data objects in spikeinterface are 'lazy' and do not actually contain the data, so these objects can be easily stored in classes (see the sketch after this list).
- `data_classes/base.py` and `data_classes/preprocessing.py`: these data classes hold the raw data objects. At the moment, they don't do anything with them, but they become more important when preprocessing is added.
Some notes:

1) The dataclass inheritance reduces boilerplate but is quite confusing; I am considering reverting to normal classes just for readability. The way it works is that the class attributes are placed at the start of the class. For sub-classing, only the class attributes of the subclass are placed at the start of the subclass. Then, when instantiating the class, you must pass all required arguments (first the superclass's, then the subclass's). Because this only works for attributes, any logic must be put in a `__post_init__` function.
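Concretely, the pattern looks like this (class and attribute names here are illustrative, not the actual spikewrap definitions):

```python
from dataclasses import dataclass, field


@dataclass
class BaseUserDict:
    # Superclass attributes come first in the generated __init__.
    base_path: str
    sub_name: str


@dataclass
class PreprocessingData(BaseUserDict):
    # Subclass attributes follow, so the construction order is:
    # PreprocessingData(base_path, sub_name, sessions_and_runs).
    sessions_and_runs: dict = field(default_factory=dict)

    def __post_init__(self):
        # Any logic beyond plain attribute assignment must live here,
        # because the dataclass machinery generates __init__ itself.
        self.all_run_names = [
            run for runs in self.sessions_and_runs.values() for run in runs
        ]
```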
The idea is that the user will have data stored in SWCBlueprint style. This consists of subject, session, and ephys levels; typically, within each session there are different runs. The user must indicate the sessions and runs to load in the `sessions_and_runs` dictionary (e.g. `examples/load_data.py`). The filesystem is then traversed based on these names to load the specified sessions and runs into the `preprocessingData` class. This is a UserDict (basically a dictionary with additional functions) in which the loaded data objects are stored. Later (future PR), these objects are operated on during preprocessing.
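As a usage sketch (the import path and call signature are assumed from the description above, not the exact API; the session and run names are placeholders):

```python
from pathlib import Path

from spikewrap.pipeline.load_data import load_data  # import path assumed

sub_name = "sub-1119617"
sessions_and_runs = {
    "ses-001": ["run-001", "run-002"],  # placeholder session / run names
    "ses-002": ["run-001"],
}

# Traverse the filesystem for these names and collect the lazily
# loaded recordings into the preprocessing data class.
data = load_data(Path("/path/to/rawdata"), sub_name, sessions_and_runs)
```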
"/ceph/neuroinformatics/neuroinformatics/scratch/jziminski/ephys/test_data/steve_multi_run/1119617/time-short-multises"
Please let me know if you have any trouble accessing it! Currently I am running off a mounted drive but let me know if you have trouble on macos with the path and I can update the usage instructions for the cluster (sorry these are currently unfinished).