Create inits from fit and draws objects #937

SteveBronder · 2024-03-21T19:40:27Z

Submission Checklist

Run unit tests
Declare copyright holder and agree to license (see below)

Summary

Adds a process_inits function to turn fit objects into a list of list of inits. All types of fits from cmdstanr as well as draws objects can be used as inits now.

Tests can be run by calling

testthat::test_file("./tests/testthat/test-fit-init.R")

Copyright and Licensing

Please list the copyright holder for the work you are submitting
(this will be you or your assignee, such as a university or company):
Simon's Foundation

By submitting this pull request, the copyright holder is agreeing to
license the submitted work under the following licenses:

Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)

SteveBronder · 2024-03-21T19:42:55Z

@avehtari Sorry for closing the other PR but I started doing some heavy feature creep there and this one is much simpler

avehtari · 2024-03-22T12:41:07Z

An example Pathfinder object which caused a failure with the previous PR is at https://users.aalto.fi/~ave/casestudies/Birthdays/pth5.rds

SteveBronder · 2024-03-22T16:02:07Z

@avehtari when I try loading that into R I'm getting

 load("~/Downloads/pth5.rds")
Error in load("~/Downloads/pth5.rds") : 
  bad restore file magic number (file may be corrupted) -- no data loaded
In addition: Warning messages:
1: In readChar(con, 5L, useBytes = TRUE) :
  truncating string with embedded nuls
2: file ‘pth5.rds’ has magic number 'X'
  Use of save versions prior to 2 is deprecated

Does that rds file come from one of the scripts in that folder?

avehtari · 2024-03-22T16:14:17Z

The R code is at https://github.com/avehtari/casestudies/blob/master/Birthdays/birthdays.R
and Stan files in https://github.com/avehtari/casestudies/tree/master/Birthdays
and data in https://github.com/avehtari/casestudies/tree/master/Birthdays/data

Which R version you are using?

SteveBronder · 2024-03-22T21:06:28Z

Which R version you are using?

4.3.0. Maybe I need to upgrade? But I'll run the scripts you sent over and see if that causes my code to break

SteveBronder · 2024-03-25T18:09:18Z

@avehtari running pth5 from the script I got the error about not enough distinct draws which I think is what we want.

Not enough distinct draws (4) in pathfinder fit to create inits. Try running Pathfinder with psis_resample=FALSE

Turning off psis resampling stopped the error so I think that is what we want

avehtari · 2024-03-25T18:22:22Z

Yes, this is what we want. It just didn't work like that with the previous PR. I'll test with this PR

avehtari · 2024-03-28T17:18:17Z

@SteveBronder was using an older posterior version. The default value for pareto_smooth() argument return_k has changed from TRUE to FALSE. I pushed a commit that explicitly sets the argument to FALSE, so that the call should work now both with old and new posterior version. Now the Birthday case study runs further. I'll comment after I've been able to run it to the end (need to make some minor changes in my notebook, but can't do those at this minute)

jgabry · 2024-03-28T18:50:29Z

@SteveBronder was using an older posterior version. The default value for pareto_smooth() argument return_k has changed from TRUE to FALSE. I pushed a commit that explicitly sets the argument to FALSE, so that the call should work now both with old and new posterior version. Now the Birthday case study runs further. I'll comment after I've been able to run it to the end (need to make some minor changes in my notebook, but can't do those at this minute)

Ok thanks @avehtari. I think the windows unit tests are failing due to an unrelated reason. Looks like some issue setting up R, which often is resolved at some point without us having to do anything. But I will merge master into this branch just in case (there was an update to something related to GitHub actions).

@SteveBronder I've started taking a look and so far it looks great, but I will do a deeper dive after @avehtari finishes his testing in case he finds any larger issues.

venpopov · 2024-03-28T19:14:00Z

just a comment on the windows tests, not related to this PR per se - we are seeing the same failures for other packages, so definitely unrelated to the PR. They seem intermittent, sometimes passing, sometimes failing...

venpopov · 2024-03-28T19:14:02Z

just a comment on the windows tests, not related to this PR per se - we are seeing the same failures for other packages, so definitely unrelated to the PR. They seem intermittent, sometimes passing, sometimes failing...

avehtari · 2024-04-03T12:39:53Z

My Birthdays case study runs now completely with new inits and other recent CmdStanR features. I will next check the documentation added in this PR

avehtari

Added some comments. In addition, the PR seems to be missing a doc update for the init arguments in model.R

avehtari · 2024-04-03T15:13:00Z

R/args.R

+#' @param init A `CmdStanMCMC` class
+#' @param num_procs Number of inits requested
+#' @param model_variables  A list of all parameters with their types and
+#'   number of dimensions. Typically the output of model$variables().


The output of model$variables() includes also other variables than parameters (data, transformed parameters, and generated quantities). More accurate would be "output of model$variables()$parameters". The same comment repeats in many function docs. As the dimensionality of generated quantities is often much bigger than the parameters, it is not useful to include them in init.

avehtari · 2024-04-03T15:20:20Z

R/args.R

+  init_draws_df = posterior::resample_draws(draws_df, ndraws = num_procs,
+                                            weights = draws_df$pareto_weight, method = "simple_no_replace")


If I read correctly, this resampling is done always, even if Stan did already do resampling. If model$pathfinder(..., psis_resample=TRUE) was used and there are enough unique draws, then the resampling here should be done with uniform weights.

Makes sense!

avehtari · 2024-04-03T15:25:23Z

R/args.R

+  init_draws_df = posterior::resample_draws(draws_df, ndraws = num_procs,
+                                            method = "simple")


CmdStanMLE should only have one draw. Resampling just copies that one draw several times. There is no performance hit, but it just looks strange.

Yeah I'll fix this to just rep the draw a bunch of times

codecov-commenter · 2024-04-03T15:38:58Z

Codecov Report

Attention: Patch coverage is 87.50000% with 12 lines in your changes are missing coverage. Please review.

Project coverage is 88.31%. Comparing base (1b45e70) to head (7f1fa91).

❗ Current head 7f1fa91 differs from pull request most recent head 1197ad4. Consider uploading reports for the commit 1197ad4 to get more accurate results

Files	Patch %	Lines
R/args.R	87.36%	12 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #937      +/-   ##
==========================================
- Coverage   88.33%   88.31%   -0.02%     
==========================================
  Files          12       12              
  Lines        4553     4640      +87     
==========================================
+ Hits         4022     4098      +76     
- Misses        531      542      +11

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

avehtari · 2024-04-03T16:05:11Z

Additional observations using the Birthday case study

Init with opt1 (CmdStanMLE object) works, but does not work with as_draws(opt1) unless if more than one init is required (multiple chains or pathfinders)
Init with CmdStanLaplace, CmdStanPathfinder, CmdStanMCMC and corresponding as_draws() works
If draws object has fewer draws than inits, there is an error, but the behavior could be the same as when using CmdStanMLE object, that is resample to get the needed number. This would make sense especially when using a single init value. There could be a warning in cases when a single init is copied.
warn_partial argument indicates possibility of providing init only for some parameters, but at least with draws object as init, this causes an error
it would be good to check that the inits don't include -Inf or Inf

…s draws than procs

SteveBronder · 2024-04-04T20:01:55Z

Init with opt1 (CmdStanMLE object) works, but does not work with as_draws(opt1) unless if more than one init is required (multiple chains or pathfinders)
If draws object has fewer draws than inits, there is an error, but the behavior could be the same as when using CmdStanMLE object, that is resample to get the needed number. This would make sense especially when using a single init value. There could be a warning in cases when a single init is copied.

Yes I changed this now so that if the number of draws is less than the number of processes it repeats the inits so that we have the amount of inits as processes. Since all the other process_init functions always return the same number of draws as the number of processes this logic will only happen when the user passes a draws object.

warn_partial argument indicates possibility of providing init only for some parameters, but at least with draws object as init, this causes an error

Oh that should just give a warning. I think it makes sense to only have some parameters if for instance you are doing iterative model building

it would be good to check that the inits don't include -Inf or Inf

Agree! Just added that with a nice error message

@jgabry one more thing. It would be pretty easy to add something here so that if a user passes a sampling object, has not included their own inverse metric, and is doing sampling with the same model / parameters we can use the sampling init object's inverse metric. Do you want me to include that here or should that be another PR?

…fit-inits

jgabry · 2024-04-22T20:00:30Z

@SteveBronder Anything I can do to help test/debug this?

@venpopov

also add @venpopov as contributor

jgabry · 2024-04-22T20:07:34Z

@SteveBronder Anything I can do to help test/debug this?

Also I just fixed a merge conflict with master, moved you up in the DESCRIPTION file with the other full authors (which we probably should have done when you added the pathfinder method initially), and added @venpopov to DESCRIPTION as a contributor. I think we're super close to merging this.

SteveBronder · 2024-04-22T20:09:37Z

Not yet, I want to test if I get this same issue on master from just running a bunch of the algorithms multiple times. It looks like this code is not touching anything related to reading / writing so I'm pretty confused by the failures

SteveBronder · 2024-04-22T20:10:03Z

If you have time to test that today it would be nice but otherwise I'll try that tmrw

jgabry · 2024-04-22T20:27:01Z

I can make a branch off of master than runs basically the same test file you added but overrides the fit_init argument to test_inits and just passes in old fashioned initial values (since it would otherwise fail if trying to init from a model object).

jgabry · 2024-04-22T20:30:37Z

I can make a branch off of master than runs basically the same test file you added but overrides the fit_init argument to test_inits and just passes in old fashioned initial values (since it would otherwise fail if trying to init from a model object).

Ok this should be running the same code as on your branch, just replacing the inits with an init function instead of using the fitted model objects:

https://github.com/stan-dev/cmdstanr/pull/954/files

Let's see if it fails.

SteveBronder · 2024-04-22T20:38:43Z

Much appreciated!

SteveBronder · 2024-05-03T22:13:39Z

This and master are getting a weird error in windows latest when trying to build stringi?

SteveBronder · 2024-05-03T22:13:53Z

Besides that though I think I figured out the original bug with the tests

andrjohns · 2024-05-04T11:03:52Z

This and master are getting a weird error in windows latest when trying to build stringi?

Yeah stringi is broken on windows r-devel atm, so safe to ignore.

Besides that though I think I figured out the original bug with the tests

Brilliant! Let me know when you're done with changes and I'll do a final look-over before merging

SteveBronder · 2024-05-04T12:30:05Z

I think it is ready to merge!

Addressed

andrjohns

Thanks, this is going to be very well-appreciated!

adds feature to make inits from fit and draws objects

ae46447

SteveBronder added 2 commits March 21, 2024 16:32

update to import stats::aggregate

a89f31f

remove use of \() anonymous function

0ebe16d

SteveBronder requested review from jgabry and andrjohns March 22, 2024 15:58

fix pareto_smooth call with explicit argument

099a352

Merge branch 'master' into feature/fit-inits

e4ec68e

SteveBronder added 3 commits April 3, 2024 10:31

update stevebronder from ctb to auth

6c1092b

remove extra files

7da1f17

remove extra file

872405f

avehtari reviewed Apr 3, 2024

View reviewed changes

add tests for partial variable matching and allow draws objs with les…

d85f3b5

…s draws than procs

SteveBronder added 3 commits April 4, 2024 16:12

update r-setup action version

6cd3a63

update rtools version for github actions

a7c3400

update rtools for github workflow

feee313

SteveBronder added 2 commits April 19, 2024 12:46

testing

392941e

Merge remote-tracking branch 'origin/feature/fit-inits' into feature/…

e03a643

…fit-inits

jgabry changed the title ~~Adds feature to make inits from fit and draws objects~~ Create inits from fit and draws objects Apr 20, 2024

fix merge conflicts

df7617d

also add @venpopov as contributor

jgabry mentioned this pull request Apr 22, 2024

debug inits tests #954

Closed

jgabry and others added 7 commits April 22, 2024 16:25

Merge branch 'master' into feature/fit-inits

f14641e

remove set.seed() in each init test

e711265

update

213cfe7

update for stringi temporary fix

25d0a74

trying workaround for stringi

cd4eedd

...

d984931

remove things to try to fix stringi cli error

ef9e718

andrjohns approved these changes May 4, 2024

View reviewed changes

andrjohns merged commit cc2e36d into master May 4, 2024
10 of 11 checks passed

andrjohns deleted the feature/fit-inits branch May 18, 2024 11:26

avehtari mentioned this pull request May 18, 2024

Add method to create inits from fit or draws objects #876

Closed

avehtari mentioned this pull request May 27, 2024

Multiple inits for multi-path Pathfinder? #907

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Create inits from fit and draws objects #937

Create inits from fit and draws objects #937

SteveBronder commented Mar 21, 2024

SteveBronder commented Mar 21, 2024

avehtari commented Mar 22, 2024

SteveBronder commented Mar 22, 2024

avehtari commented Mar 22, 2024

SteveBronder commented Mar 22, 2024

SteveBronder commented Mar 25, 2024 •

edited

Loading

avehtari commented Mar 25, 2024

avehtari commented Mar 28, 2024

jgabry commented Mar 28, 2024 •

edited

Loading

venpopov commented Mar 28, 2024

venpopov commented Mar 28, 2024

avehtari commented Apr 3, 2024

avehtari left a comment

avehtari Apr 3, 2024

SteveBronder Apr 4, 2024

avehtari Apr 3, 2024

SteveBronder Apr 4, 2024

avehtari Apr 3, 2024

SteveBronder Apr 4, 2024

codecov-commenter commented Apr 3, 2024 •

edited

Loading

avehtari commented Apr 3, 2024

SteveBronder commented Apr 4, 2024

jgabry commented Apr 22, 2024

jgabry commented Apr 22, 2024 •

edited

Loading

SteveBronder commented Apr 22, 2024

SteveBronder commented Apr 22, 2024

jgabry commented Apr 22, 2024

jgabry commented Apr 22, 2024

SteveBronder commented Apr 22, 2024

SteveBronder commented May 3, 2024

SteveBronder commented May 3, 2024

andrjohns commented May 4, 2024

SteveBronder commented May 4, 2024

andrjohns left a comment

		init_draws_df = posterior::resample_draws(draws_df, ndraws = num_procs,
		weights = draws_df$pareto_weight, method = "simple_no_replace")

		init_draws_df = posterior::resample_draws(draws_df, ndraws = num_procs,
		method = "simple")

Create inits from fit and draws objects #937

Create inits from fit and draws objects #937

Conversation

SteveBronder commented Mar 21, 2024

Submission Checklist

Summary

Copyright and Licensing

SteveBronder commented Mar 21, 2024

avehtari commented Mar 22, 2024

SteveBronder commented Mar 22, 2024

avehtari commented Mar 22, 2024

SteveBronder commented Mar 22, 2024

SteveBronder commented Mar 25, 2024 • edited Loading

avehtari commented Mar 25, 2024

avehtari commented Mar 28, 2024

jgabry commented Mar 28, 2024 • edited Loading

venpopov commented Mar 28, 2024

venpopov commented Mar 28, 2024

avehtari commented Apr 3, 2024

avehtari left a comment

Choose a reason for hiding this comment

avehtari Apr 3, 2024

Choose a reason for hiding this comment

SteveBronder Apr 4, 2024

Choose a reason for hiding this comment

avehtari Apr 3, 2024

Choose a reason for hiding this comment

SteveBronder Apr 4, 2024

Choose a reason for hiding this comment

avehtari Apr 3, 2024

Choose a reason for hiding this comment

SteveBronder Apr 4, 2024

Choose a reason for hiding this comment

codecov-commenter commented Apr 3, 2024 • edited Loading

Codecov Report

avehtari commented Apr 3, 2024

SteveBronder commented Apr 4, 2024

jgabry commented Apr 22, 2024

jgabry commented Apr 22, 2024 • edited Loading

SteveBronder commented Apr 22, 2024

SteveBronder commented Apr 22, 2024

jgabry commented Apr 22, 2024

jgabry commented Apr 22, 2024

SteveBronder commented Apr 22, 2024

SteveBronder commented May 3, 2024

SteveBronder commented May 3, 2024

andrjohns commented May 4, 2024

SteveBronder commented May 4, 2024

andrjohns left a comment

Choose a reason for hiding this comment

SteveBronder commented Mar 25, 2024 •

edited

Loading

jgabry commented Mar 28, 2024 •

edited

Loading

codecov-commenter commented Apr 3, 2024 •

edited

Loading

jgabry commented Apr 22, 2024 •

edited

Loading