Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Breaking change: Implement param_constrain in a thread-safe way #98

Merged
merged 21 commits into from
Apr 14, 2023

Conversation

WardBrian
Copy link
Collaborator

@WardBrian WardBrian commented Mar 29, 2023

This closes #92.

  1. The bs_model_rng struct no longer holds an RNG (and has thus been renamed to bs_model).
  2. The bs_param_constrain function now expects a pointer to a bs_rng struct, which holds the boost::ecuyer1988 RNG
  3. Functions to create and destroy the bs_rng struct have been added. These are exposed in the interfaces to create new RNG holders from a seed.
  4. The interfaces have all been updated to take in an rng argument to param_constrain. This must be set if include_gq == true.

@WardBrian
Copy link
Collaborator Author

Note: this does not implement the idea discussed in #92 where the interfaces could have a non-thread-safe default mode where they store and use an RNG on your behalf.

We could implement this afterwards quite easily, and it would not be a further breaking change to do so (if anything it makes it slightly less breaking)

@aseyboldt
Copy link
Contributor

I think this looks nice :-)

Is there a reason why the bs_construct function still has a seed argument? I don't think this should be doing anything, does it?

I also noticed that it this is using ecuyer1988 to get random numbers. I'm very far from an expert on those, but I think this is a pretty old one that might not be all that reliable? I couldn't find detailed analysis on it, but it is mentioned here as having issues: https://www.pcg-random.org/pdf/hmc-cs-2014-0905.pdf

@WardBrian
Copy link
Collaborator Author

The model’s constructor can have some randomization in it (RNG functions in the transformed data block), so a seed is still needed there, but it’s much shorter lived

The ecuyer1988 generator does seem to be suboptimal in the modern landscape, and we’ve encountered strange issues with it before (the fact that I always discard 1 draw from it is actually working around a bug someone reported to us recently: stan-dev/stan#3167)

Unfortunately the Stan model base class only exposes write_array with that argument, so upstream changes would be needed for us to use a different one in bridgestan

@aseyboldt
Copy link
Contributor

The model’s constructor can have some randomization in it (RNG functions in the transformed data block), so a seed is still needed there, but it’s much shorter lived

Ah, makes sense.

If there is a global model seed anyway, would it then make sense to turn the seed argument of bs_construct_rng to a stream argument as in the create_rng function in stan (maybe using that function)? https://github.com/stan-dev/stan/blob/develop/src/stan/services/util/create_rng.hpp#LL25C30-L25C30

If bs_construct_rng(stream_id) were to call create_rng(model.seed, stream_id + 1), the first stream would be for rng in transformed_data, and all other random streams would be independent from each other and from the transformed_data stream.

With independent seeds as in the current design it is much harder for consumers of the API to figure out a good system for those seeds.
(for instance using global_seed + chain for the individual bs_construct_rng calls would mean that you get overlapping chains for global_seed = 0 and global_seed = 1.)

@WardBrian
Copy link
Collaborator Author

I agree that is a better interface @aseyboldt. Reworking now

@WardBrian WardBrian force-pushed the feature/gq-thread-safety branch from 018e69b to 7304b04 Compare April 4, 2023 18:51
@WardBrian WardBrian changed the base branch from feature/return-error-messages to main April 4, 2023 18:51
@WardBrian WardBrian marked this pull request as ready for review April 4, 2023 19:04
@aseyboldt
Copy link
Contributor

A small inconsistency: bs_destruct_rng can return an error, but bs_destruct can not. Is there a reason for this?

@WardBrian
Copy link
Collaborator Author

Nope! Just forgot to update after we changed bs_construct in #91. I’ll fix tomorrow

@WardBrian
Copy link
Collaborator Author

I believe this is ready for feedback/review from you both @roualdes @bob-carpenter

@WardBrian
Copy link
Collaborator Author

Discussed with @bob-carpenter and he is in favor of ditching any logic where we deal with chain IDs or implicitly constructing an RNG for the user if they give us one.

The interface would then be:

  • param_constrain accepts a nullable RNG which must be non-null if include_gq is True.
  • The user is responsible for creating an RNG from a seed to pass to param_constrain.

This essentially removes bullet point 4 in the original PR post and greatly simplifies 5.

I think I am also in favor of this, since it's not clear to me that the "advance by some huge number times the chain ID" scheme makes sense when the only thing the RNG is being used for is generated quantities and not the algorithm itself.

Thoughts @roualdes?

@roualdes
Copy link
Owner

Overall the conclusions of your discussions sound good. Ditching any logic surrounding RNGs where they are not necessary is a win.

I feel like I'm missing some details, but I'll pick those up after looking over the next draft.

@WardBrian
Copy link
Collaborator Author

Updated the code and the PR description per the above.

@roualdes
Copy link
Owner

bs_construct takes a seed and passes it to bs_model, which passes it to new_model. Great. But when is that specific seed ever used again? Can't we replace this seed with a dummy seed?

@WardBrian
Copy link
Collaborator Author

new_model uses that seed if the Stan code has RNG functions in transformed data

@roualdes
Copy link
Owner

I certainly did not consider that idea. Thanks.

I should have time Thursday, at latest, to review this.

Copy link
Owner

@roualdes roualdes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a big one. Thanks much for your effort, @WardBrian.

Overall, the structure looks great. Lots of little tidbits in this review. Mostly I'm interested in a new Julia test to confirm that these changes are thread-safe.

src/bridgestan.cpp Outdated Show resolved Hide resolved
julia/src/model.jl Outdated Show resolved Hide resolved
julia/test/model_tests.jl Outdated Show resolved Hide resolved
src/bridgestan.h Show resolved Hide resolved
python/bridgestan/model.py Show resolved Hide resolved
src/bridgestan.cpp Outdated Show resolved Hide resolved
R/R/bridgestan.R Outdated Show resolved Hide resolved
julia/src/model.jl Outdated Show resolved Hide resolved
python/bridgestan/model.py Outdated Show resolved Hide resolved
R/R/bridgestan.R Outdated Show resolved Hide resolved
@WardBrian WardBrian requested a review from roualdes April 13, 2023 19:38
@roualdes
Copy link
Owner

@WardBrian, I just pushed a re-worked version of the julia test "threaded model: full". As I understand things, the previous test passed but did not test what we wanted for two reasons.

  1. gq1[r, :] creates a copy of the rth row of gq1 into a newly allocated Vector{Float64} which then gets passed into param_constrain!(), is modified, and then is immediately thrown away since that copy is never assigned. Ugh. Lots of complaints in Julialand about slices creating copies and not views, i.e. not referencing the original array.

  2. Even if 1. weren't the case, gq1[r, :] references non-contiguous memory in column-major Julia and our C++ (Eigen) silently relies on contiguous memory.

This all speaks to a larger, julia specific, issue of our modifying functions, e.g. f!(). Right now we hard-code Vector{Float64} everywhere, as Julia doesn't have an obvious way to throw on non-contiguous views; re #12

Let me know what you think, then I'll do one last pass and approve.

@WardBrian
Copy link
Collaborator Author

Sounds good, I heartily appreciate the look-over on that test. That does sound like some possibly messy Julia inside baseball.
Looks like the fixed test is still passing, so that's good!

Copy link
Owner

@roualdes roualdes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Thanks!

@WardBrian WardBrian merged commit 38f9ac0 into main Apr 14, 2023
@WardBrian WardBrian deleted the feature/gq-thread-safety branch April 14, 2023 16:31
This was referenced Apr 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Thread safety of param_constrain
3 participants