-
Notifications
You must be signed in to change notification settings - Fork 151
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ticket/ecephys/33a/metadata/writer #2407
Conversation
I'm attaching an input_json that can be run through |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall notes:
- This design does not make use of the existing design for the
BehaviorProjectMetadataWriter
. These are essentially 2 different strategies for metadata writing. Is it possible to make this design share more in common with that design? If not, why not? - There may be potentially duplicated/slightly modified logic around ecephys LIMS queries. There already exists
EcephysProjectLimsApi
which I think does most of what we need. Are you able to reuse this class/share code in any way? - Some of the code in the
vbn_2022
seems borderline generally applicable. But if it is in a package calledvbn_2022
it will not be reusable in the future. Please only place code within this package that is truly specific to this release. allensdk.brain_observatory.ecephys.ecephys_project_api
seems like an already existing suitable home for project-level code relating to ecephys projects- Please indent SQL queries when the statement runs to the next line. I.e instead of
SELECT
foo,
bar
JOIN bar
ON bar.bar = foo.bar
WHERE
foo = 3
it is much easier to read as
SELECT
foo,
bar
JOIN bar ON bar.bar = foo.bar
WHERE
foo = 3
On a related note, I find it easier to read queries if the line doesn't start with a comma.
...k/brain_observatory/behavior/behavior_project_cache/tables/util/prior_exposure_processing.py
Outdated
Show resolved
Hide resolved
allensdk/brain_observatory/behavior/data_files/stimulus_file.py
Outdated
Show resolved
Hide resolved
allensdk/brain_observatory/vbn_2022/metadata_writer/lims_queries.py
Outdated
Show resolved
Hide resolved
@@ -0,0 +1,758 @@ | |||
from typing import List, Tuple, Dict, Any, Optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like some of these functions should be private. Unable to tell what functions are supposed to be exposed to outside consumers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
FileIDGenerator) | ||
|
||
|
||
def add_file_paths_to_session_table( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. The VBO strategy relied on the files being released being well known files that were tracked in LIMS. Given that we are not making NWB generation a LIMS strategy, the files we are releasing are not going to be well known files tracked in LIMS, so we are going to need a different way to list the files for release. We can discuss this if you want to revisit that decision, but the only way to reuse the VBO strategy is to make sure the VBN NWB files are tracked in the well_known_files
table on LIMS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense. Ideally we would have uniformity for these data releases. I know that the old strategy of using LIMS to generate a well known file wasn't great. If we prefer the new strategy, I'd like to update the old strategy to use the new one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move to allensdk/data_releases/metadata_writing
# the VBN 2022 metadata dataframes as they are directly queried | ||
# from LIMS. | ||
|
||
import pandas as pd |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unsure why this module doesn't make use of prior_exposure_processing.py
. Is it possible to share code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The prior_exposure_processing.py
code used in VBO is all very VBO-specified, for instance
which, based on the docstring, sounds like a VBO-specific bug or
which assumes that these are ophys sessions.
Talking to Corbett, his attitude towards omissions was "there were only omissions in the ecephys sessions, so really, prior exposures to omissions is just a question of 'has the mouse seen an ecephys session, yet" which is different than what we did for VBO, but also very VBN specific (which is why I placed it here, where it is unlikely to get reused for future data releases).
I do reuse prior_exposure_processing.py
where appropriate in vbn_2022/metadata_writer/lims_queries.py
. Prior exposures to omissions was a weird edge case for this release.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The VBO code does something like (ignoring the habituation session bug)
Select all ophys sessions
calculate prior exposure to omissions for these sessions using __get_prior_exposure_count
merge these values into the other table
Isn't this essentially the same algorithm? The only difference being select all ecephys sessions. The function I think should be used is __get_prior_exposure_count
(making it public and moving it if needed). I'm unsure why you have such complicated logic to fill in the behavior session values. Wouldn't a merge from one table to the other work, similar to how we did it for VBO?
allensdk/test/brain_observatory/vbn_2020/metadata_writer/conftest.py
Outdated
Show resolved
Hide resolved
allensdk/brain_observatory/vbn_2022/metadata_writer/lims_queries.py
Outdated
Show resolved
Hide resolved
allensdk/brain_observatory/vbn_2022/metadata_writer/lims_queries.py
Outdated
Show resolved
Hide resolved
I haven't really thought through what this means with respect to the VCN LIMS API, but I am summarizing a Teams conversation I just had with Corbett so that we know why we did what we ended up doing: Scott:
After a brief back-and-forth, the decision was: Release all the data (even I'm not sure if this means we can/should reuse the existing |
@aamster Given the above comment about the My reasoning for this is that, once this ticket is merged, we can start working on the issue Corbett opened regarding session_type, which unblocks us for release. We are then free to take as long as we need to updating the To be explicit, the new ticket we are opening would involve
|
@danielsf I'm fine with pushing the reuse of |
3c9a07a
to
95c6446
Compare
#2411 is the ticket to make this work rely on the legacy ecephys LIMS API |
#2412 is to address a misalignment in date_of_acquisition discovered in the course of this work |
#2414 has been opened to add functionality to the metadata writer to create the input.json that will ultimately be fed into the informatics_data_release_tool |
@aamster I think this is ready for another look. The method for processing prior_exposures_to_omissions is still ugly (it has to create a combined history from the ecephys_sessions table and the behavior_sessions_table), but it relies on the prior_exposure_processing module from VBO (and now actually does what Corbett wanted, as I understand it). I haven't done much to address your "what should be private and what should be public" comment. The reason is, "I don't know." As I'm sure you've noticed, I've become much less sanguine than you of late that code from any particular release can be generalized to releases past or present. I think the best we can do is keep the data munging methods siloed and documented so that we can borrow things when we want them and don't accidentally borrow them when we don't want them. I'd like to get this merged so we can move forward with the release at hand. If you want to have a larger organizational conversation about the VBN release code, I'm open to that in the next few weeks, once we know that we are on our way to actually getting the data into S3. |
4d5ebef
to
6e53840
Compare
@aamster I think, modulo the cases where we have tentatively agreed to disagree (and whatever we are calling the situation with Talk to you about this on Monday. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. I do think there needs to be more cohesion between this metadata writer and the previous VBO one, but as already discussed doing so would require some additional time and effort. Things I am referring to are reuse of ProjectTable
, common base class between the existing BehaviorProjectMetadataWriter
and VBN2022MetadataWriterClass
, and reorganization of data release packages for different projects, as you've already pointed out. I don't know how much of the above we want to take on in the future. We've also pointed out that much of database queries can share with the VCN queries, which I think we plan on taking on in the future.
These methods are going to be useful for the VBN metadata.
- move functions for getting behavior -> session_utils - move VBN utils into internal/api/queries where appropriate
split up methods that are principally LIMS queries from methods that are local dataframe manipulations
6e53840
to
e8be731
Compare
This PR adds a module to write the metadata files for the 2022 VBN release.
Some tangentially related changes are
BehaviorStimulus
file allowing us to access date_of_acquisition and session_type directly from the pickle file when they are absent or wrong in MTRAIN or LIMSbehavior/behavior_project_cache
and intoallensdk/internal/api/queries
, where they are more accessible to more packages within the SDK.This PR does not adequately unit-test the code in
allensdk/brain_observatory/vbn_2022/lims_queries.py
. Writing a mock LIMS database at test time is pretty straightforward but will take a lot of personnel hours. I will open a separate ticket (#2408) to add those tests. For now, our principal means of testing the LIMS queries is to use this code to generate metadata tables and present them to Corbett for validation.