-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
make missing data work with -999 #540
Conversation
Instructions for code reviewerHello reviewer, thanks for taking the time to review this PR!
Checklist
|
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #540 +/- ##
==========================================
+ Coverage 75.28% 75.35% +0.07%
==========================================
Files 39 39
Lines 2080 2086 +6
Branches 140 140
==========================================
+ Hits 1566 1572 +6
Misses 471 471
Partials 43 43 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ChristineStawitz-NOAA I took a look and don't see any issues with this. I had a few questions, mostly for my understanding. The do not need to be resolved to merge, but answering them would help me :).
One thing that I'm not sure about is how much documentation this should have - perhaps at least saying how users would use this -999 on the description of this PR might be helpful? That way, it will give us something to start from once writing more formal documentation!
dnorm.mean = fims_math::log(this->expected_index[i]); | ||
dnorm.sd = fims_math::exp(this->log_obs_error[i]); | ||
nll -= dnorm.evaluate(true); | ||
if(this->observed_index_data->at(i) != this->observed_index_data->na_value){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like if we add more data types besides an index and age comps, more if()
statements would need to be added to each of the data types' negative log likelihood assignment statements to allow for NAs to work, correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
correct! When coding we also discussed if we need to handle a case where there are NA values in some ages but not others despite the existence of some comp samples, but for now, you can only specify a full missing year. Are you thinking we should document this or add to an issue?
testindex <- 2 | ||
na_value <- -999 | ||
if(i==4){ | ||
fishing_fleet_index$index_data[testindex] <- na_value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is how R users would use it currently, correct? Assigning -999 to anything that should be an NA?
Maybe this should be added to the PR description at least for some minimal documentation on how this works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point - I added something to the fims-demo vignette about how to specify missing data!
@ChristineStawitz-NOAA before you merge - I just realized that the missing data commit is in the proportion female (#543) PR as well. Perhaps then it would be better to either 1) Not merge this in and only merge in the #543 ; Or 2) Merge this in, and do some rebasing magic on the #543 before merging this in |
One observation from exploring this branch and putting missing data in. When I put -999 in for a year, I also put -1 in for input multinomial sample size, thinking that it would ignore this value since it's skipped in the calculations. But it caused some really unexpected behavior. The model would run but had the wrong results. I could try to reproduce this if necessary at some point. But I think we need more error checking for inputs to catch some of this early on. Otherwise I approve the review and closing it. Sorry I was late to this. |
Did you mean to close Allow ability for proportion female to vary by age by
ChristineStawitz-NOAA · Pull Request #543 · NOAA-FIMS/FIMS (github.com)
<#543> ? That's the PR I need to be
approved to merge into main
I'm confused where you put the -1 in for multinomial sample size. The
age_comp_data input is a vector of integers such that sample size is
multiplied by the proportion in each age group. So did you have a vector of
negative numbers in that year?
…On Thu, Feb 8, 2024 at 4:38 PM Cole Monnahan ***@***.***> wrote:
One observation from exploring this branch and putting missing data in.
When I put -999 in for a year, I also put -1 in for input multinomial
sample size, thinking that it would ignore this value since it's skipped in
the calculations. But it caused some really unexpected behavior. The model
would run but had the wrong results. I could try to reproduce this if
necessary at some point. But I think we need more error checking for inputs
to catch some of this early on.
Otherwise I approve the review and closing it. Sorry I was late to this.
—
Reply to this email directly, view it on GitHub
<#540 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/ALNPO3MC32PKHUXEHJCZR2TYSVVZ5AVCNFSM6AAAAABCCV4BQGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSMZVGE2TAMJYG4>
.
You are receiving this because you modified the open/close state.Message
ID: ***@***.***>
--
Christine C. Stawitz, PhD. (pronouns: she/her)
FIMS Project Lead
NOAA Fisheries Office of Science and Technology | U.S. Department of
Commerce
Schedule a meeting with me! <https://calendly.com/christine-stawitz/30min>
Mobile: 206-617-2060
www.fisheries.noaa.gov
|
It seems like the original plan of the data team to handle missing values behind the scenes would make things more consistent than having people put -999 in their data input. |
Agree - that is the long-term goal @kellijohnson-NOAA. This was an interim fix to get the case studies running with missing years of data prior to M2 development
starting, with the eventual goal of populating the na_values internally
within the R interface so the user doesn't have to do it.
|
Thanks @ChristineStawitz-NOAA for the clarification. |
What is the feature?
How have you implemented the solution?
Does the PR impact any other area of the project?
How to test this change
Developer pre-PR checklist