-
Notifications
You must be signed in to change notification settings - Fork 923
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Model.rng for SPEC-7 compliant numpy random number generation #2352
Conversation
for more information, see https://pre-commit.ci
Performance benchmarks:
|
Implementation looks clean and good!
I think we should have one consistent recommended way to pass seeds. In my view, the random number generator should be reset directly in Model init with a certain seed, to make model runs also reproducable when run in (for example) a Jupyter notebook multiple times, or in (custom) batch scripts. How it's implemented internally doesn't matter that much in my opinion. If that seed number is passed to both python.random and numpy.random and that provides consistently reproducable results, that perfectly fine. If it uses numpy.random for everything, that's also great. It's something we should test on a very minimal model I guess.
I'm willing to advocate and enable an aggressive timeline for this. |
I started out this way but its not trivial. For example, If there is a broader concensus on moving aggresively on this, I would probably approach it a bit differently. Now, I just added |
Can we solve this by requiring the user to input the seed in a certain dtype, and doing the rest behind the scenes? In this case, require inputting a int and constructing the remainder of the dict in the Model init? |
In general, there is no need to rush, as SciPy (one of the endorsers of the spec and the one behind the spec) itself has yet to make a move scipy/scipy#14322 (because their user-facing surface with random state is so huge). No project hosted on GH has completely adopted the spec yet (https://github.com/topics/spec-7). The restriction in not being able to use a float for the seed will be a huge one. What should be the recommended conversion from users's existing seeds? If we move aggressively with the |
There is a rush in another way. If we want to make breaking changes to how random number generation is handled in MESA it would be good to make these as part of the shift to MESA 3.0. My current idea (based on thoughts below) is to make
I deliberately left random in place for now to make migration easier (and to have this conversation). In case of seed=None, we can easily instantiate rng and use this to draw the seed for random (instead of using random.random as currently done). If we want to use seed/rng to seed both So, for the float case, we could no longer allow it because There are various other changes in MESA 3.0 that make me wonder whether perfect numerical reproducability is not already impossible when switching from MESA 2.x to MESA 3. |
If it is already at the point of no return, then it is fine to do the breaking change. (scikit-learn/scikit-learn#16988 (comment) (from the creator of https://github.com/pyutils/line_profiler) has an extensive read on why a generator construct eliminates a lot of headaches in scikit-learn. I haven't read the lengthy explanation version.) I haven't said elsewhere, but |
Ok, I'll update the PR based on some of the other points discussed but make sure we keep random in there while adding spec-7 compliant rng.
This is an important point and a possible argument for keeping both stdlib random and rng if the benchmarks further done the line indicate that the overhead is massive. |
for more information, see https://pre-commit.ci
Ok, I have updated the PR. Users can not pass both If both If either is passed, we seed the associated random number generator. We try to also use this number for the other. If this raises a type error, we draw a random integer from the seeded random number generator and use this to seed the other.
|
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don’t really understand all the little details and nuances of this implementation and how it will interact with everything, but in general it looks clean and doesn’t break anything, and I agree on a conceptual level a SPEC 7 implementation in Mesa is useful.
So I’m approving this, but requesting for the PR to remain open for about a day or so other maintainers have a chance to take a look at it.
@projectmesa/maintainers any of you additional insights / an opinion on this? |
@quaquel Unfortunately, this PR broke pre-commit. |
so, what does that mean? I don't know what is covered by pre-commit? |
The |
Was this not noticed before the merge? |
A while ago we un-required some checks, because some times they error either erroneously or because of unrelated issues to that specific PR, which than blocked it unnecessarily. Yesterday I re-required pre-commit and one of the build jobs again. It’s a bit about finding a balance, but in general I’m a fan of educating ourselves about the systems and checks we use and interpreting them as guidelines, then making everything strict rules and being held hostage to automated processes. These things are really minor and easily fixed, so also really not a big deal. As we say in Dutch: “Waar gehakt wordt, vallen spaanders”. |
The problem was that when you have a PR that doesn't change code, for example a pure documentation update, then the build runs are not triggered at all. But they are still required so you run into a situation where you have to use the big red button, which isn't available to all maintainers. I just wish there would be an extra step involved when you want to merge something, but some checks are failing. Right now it's way too easy to miss failed checks. But making everything required would be too strict, sometimes it's okay when you know you have a dependency that will fix the error soon. But let's see how it works out now with the required workflows |
You’re right, that’s why we removed that one. I removed it again as a required check.
Agreed, that would be useful. I don’t think GitHub offers such a feature unfortunately. @quaquel Tom did it a while back before I had admin permissions. I will communicate clearly if I change anything in the future. Currently only pre-commit is a required check, since that runs always. It’s good habit as a maintainer to always check if all checks in GitHub are green before merging, and if something isn’t to check the root cause. If it’s due to the PR, fix it within the PR, otherwise post it somewhere relevant. |
Found this FIXME in the Agent code, with this merged, could this be resolved? Line 125 in 22784df
|
The issue still exists for both At the moment, they all create a new Given spec-7, we cannot really go the global route, and as discussed in #1980, there are additional reasons why that is undesirable. We might consider raising a user warning? |
I love sensible user warnings, sounds like a good idea! |
This PR adds a SPEC 7 compliant numpy random number generator to the Model class.
Implementation
I have chosen a straightforward approach. I have added an additional optional keyword argument
rng
that can be used to seed thenumpy.random.Generator
instance when instantiating a model. I have also added 2 new attributes toModel
:rng
and_rng
. The first is the generator which can be used analogously to model.random. The second is_rng
which is the dictionary describing the state of the numpy.random.Generator instance. Knowing this dict allows for reproducibility.The code itself and e.g., the typing behavior is lifted straight from SPEC 7.
At the moment, a user has to pass either seed or rng, but not both which will raise a value error. The value passed for the one will be used for the other as well. If this argument is invalid, a valid integer based seed/rng number will be generated instead. What does this mean practically: if you pass seed we start with seeding
random.Random
. We try to use seed also fornumpy.random.default_rng
. If this raises a type error because seed is not valid here (e.g., seed is a float). Instead, we draw a valid integer via the seededrandom.Random()
. It works the same way if you only pass rng. So we create a Generator, try to use rgn forrandom.Random
. If this failes because it is e.g. an ArrayLike sequence of ints, we draw a random int to use forrandom.Random(()
What's next?
At present, I have not touched the existing random throughout the code base. SPEC 7 is only about numpy and does not imply anything for python.random. It is for me an open question whether we should completely deprecate python.random throughout mesa in favor of numpy.random. I am inclined to be in favor of making this change because the quality of the random number generators in numpy is superior (long story about mersene twister's "bad" behavior when seeded with non 32 bit numbers or with adjacent numbers in case of seed analysis). However, this requires a substantial effort throughout the code base and might come with a performance penalty. What do others think and should we do this before MESA 3.0 becomes stable?