Make mesa scalable! #798
Comments
Hey there, first of all, if you really want to scale up ABMs I think Python is the wrong modelling language. That said, I fully agree that the "cost of mesa" should be as small as possible. From my experience there are several possible performance gains.

For the data collection I don't see an urgent problem. If I understand you correctly, you were simply running out of memory. However, data collection in mesa is pretty explicit: you have to call the collector yourself. An alternative would be to write the data of the collector to disk or a database. But again, this is something that can already be done explicitly. I agree that this could be made more user-friendly with a dedicated function of the datacollector (with the open question of whether values should also remain in memory or not). Note, however, that this will likely incur some performance cost, because IO and/or writing to a database is rather slow (compared to doing nothing).

Lastly, I have had some good experiences with pickling whole models. This also allows you to restart your model from a later state, if it ever crashes.
@rithwikjc to add to @Corvince's comments... Any chance you can create a "hello world"-like model that illustrates the breakdown? We have heard various feedback here and there, but have never had a model illustrate it. This would be helpful. (Or maybe modify one of the models in the models folder?)

RE: Datacollector updates -- I could see that. (I think our issue is that we have a lot of things in motion and not enough contributors, so it is harder to prioritize something like that.)

RE: Parallel Processing -- This is a tricky one, because splitting a model over multiple cores is known to introduce unintended artifacts into the outcomes of models. I am sure there is research out there about mitigating this, but to integrate this kind of thing I wouldn't want to build something off of one paper, since real research would depend on how this thing works. This would be a very intensive research project to ensure that the only artifacts are speed gains, and, if there were other artifacts, that a model creator was aware of those and could choose to live their best lives with them. (I hope a PhD student comes along and builds a dissertation off of this and contributes it back. #shameless)
Also... probably not what you are looking for, but BatchRunnerMP is a multi-processing class for running batches.
RE: Parallel Processing -- At first glance multiprocessing seems like a natural fit for ABMs, since a lot of the tasks of the agents could theoretically be done in parallel. However, the interesting aspects of ABMs (and where they differentiate themselves from other models) come from the interaction of agents. And this is known to be tricky to do in parallel. I don't think it is usually worth it: even if you managed to improve model run times by some amount, the added complexity and the risk of artifacts would likely outweigh the gain.
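To make the "artifacts" point concrete, here is a toy illustration (not from this thread): the very same copy-your-neighbor rule produces different results depending on whether agents see each other's updates within a step. Naive parallelization silently changes the update semantics in exactly this way.

```python
def step_sequential(states):
    """Each agent copies its left neighbor, seeing updates made earlier
    in the same step (typical single-threaded, in-place activation)."""
    states = states[:]
    for i in range(1, len(states)):
        states[i] = states[i - 1]
    return states


def step_synchronous(states):
    """Each agent copies its left neighbor's *old* state, as would happen
    if all agents were stepped concurrently from a shared snapshot."""
    return [states[0]] + [states[i - 1] for i in range(1, len(states))]


start = [1, 0, 0, 0]
print(step_sequential(start))   # → [1, 1, 1, 1]
print(step_synchronous(start))  # → [1, 1, 0, 0]
```

Same rule, same initial state, different outcome after one step; that divergence is the kind of artifact a parallel scheduler would have to account for.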
Hey @Corvince, just curious: which language would be suited for something like an ABM with, let's say, agents on the order of 100s and time steps on the order of millions? Since the interactions have to happen sequentially in the population, I don't see how a different language can really improve things in many cases.

Indeed, it was a memory error. 😄

Woah, pickling whole models, as in writing the entire model at some state to a .pkl? Do you suggest that as a good way of storing the state of a model? I have never really used pickle before, and the general advice was to stick to simpler file I/O.
Sure, makes sense. I would retract my suggestion to improve the parallel processing part for now. 😄
@Corvince Thanks, I've wondered about this.
Just to put the amount of work it requires for a user/modeler and the amount of work it would require for a mesa developer into perspective: as a modeler, I would do something like this:

```python
# inside the model's step function
if self.schedule.steps % 1000 == 0:
    df_out = self.datacollector.get_agent_vars_dataframe()
    df_out.to_csv(f"awesome_model_run_{self.schedule.steps}.csv")
    self.datacollector = DataCollector(self.datacollector.model_reporters, self.datacollector.agent_reporters)
```

That is exactly 4 lines of code. However, there are several design decisions I took that might not work for everyone or are unsafe. Let's go through the lines.

`if self.schedule.steps % 1000 == 0:` — I want to output every 1000 steps. Obviously this would be different for everyone. As I said before, I definitely see the general use of this functionality, but right now the DataCollector class does not know of the model or the schedule, so there is no straightforward way to implement this.

`df_out = self.datacollector.get_agent_vars_dataframe()` — I am only interested in the agent variables, but if I also wanted to save model variables things get more complicated regarding file names (see below).

`df_out.to_csv(f"awesome_model_run_{self.schedule.steps}.csv")` — This is the crucial line. I made a total of four design decisions here.

`self.datacollector = DataCollector(self.datacollector.model_reporters, self.datacollector.agent_reporters)` — I am resetting the datacollector every time I save to file. This might not always be desired, so it should be optional.

I hope this shows how something as simple as four lines of a modeler's work translates into much more work for a library developer, if done properly. It would also mean we are potentially offering so many customization options that it takes more time to read and understand the documentation. And just to put @jackiekazil's comment about a lack of contributors into perspective: contributions are always welcomed!
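The periodic-dump idea above can be sketched without mesa at all. This is a hypothetical helper (the name `FlushingCollector` and its API are my own, not part of mesa): it buffers rows in memory and appends them to a CSV every `flush_every` collection calls, so memory stays bounded no matter how long the run is.

```python
import csv
import os


class FlushingCollector:
    """Hypothetical sketch of a bounded-memory collector (not part of mesa):
    buffer (step, agent_id, value) rows and flush them to CSV periodically."""

    def __init__(self, path, flush_every=1000):
        self.path = path
        self.flush_every = flush_every
        self.rows = []
        self.step = 0

    def collect(self, agent_vars):
        """agent_vars: mapping of agent id -> value for the current step."""
        self.step += 1
        for agent_id, value in agent_vars.items():
            self.rows.append((self.step, agent_id, value))
        if self.step % self.flush_every == 0:
            self.flush()

    def flush(self):
        """Append buffered rows to disk, then drop them from memory."""
        mode = "a" if os.path.exists(self.path) else "w"
        with open(self.path, mode, newline="") as f:
            csv.writer(f).writerows(self.rows)
        self.rows = []


# usage: flush every 2 steps into a temp file
import tempfile

path = os.path.join(tempfile.mkdtemp(), "agent_data.csv")
collector = FlushingCollector(path, flush_every=2)
collector.collect({"agent_1": 0.5})
collector.collect({"agent_1": 0.7})  # triggers a flush; buffer is now empty
```

The same design questions from the comment above apply: the flush interval, the file format, and whether to reset the buffer would all need to be configurable in a real library version.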
If you want to save the "real" state of a model this is actually the only (and very easy) way to do so:

```python
import pickle

with open("filename.p", "wb") as f:
    pickle.dump(model, f)
```

This also happens quite fast, and later you can do

```python
import pickle

with open("filename.p", "rb") as f:
    model = pickle.load(f)
```

and continue your model from the saved state (just continue with the model's step function). The only caveats are that pickle files are insecure (though not if you only use your own files) and are not interoperable outside of Python. If you hit a recursion error while pickling, raise the recursion limit:

```python
import sys
sys.setrecursionlimit(10000)
```
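A minimal round-trip of the pickling idea, using a toy stand-in for a model (the `ToyModel` class is invented for illustration, not a mesa class): snapshot the model, restore it, and keep stepping from where it left off.

```python
import pickle


class ToyModel:
    """Stand-in for an ABM: the whole state is one step counter."""

    def __init__(self):
        self.steps = 0

    def step(self):
        self.steps += 1


model = ToyModel()
for _ in range(3):
    model.step()

blob = pickle.dumps(model)      # snapshot the entire model state
restored = pickle.loads(blob)   # ...possibly much later, or after a crash
restored.step()                 # continue right where the run stopped
print(restored.steps)           # → 4
```

In practice you would dump to a file as shown in the comment above; `dumps`/`loads` is used here only to keep the example self-contained.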
I'm going to add a few thoughts on here based on my brief experience with Mesa.
Thank you for your feedback!
Thank you @Corvince for the very elaborate comments. These are surely helpful. In fact, I am already using something like what you mentioned to limit the data writing. I also understand the trade-off between complexity and learning-curve/ease-of-use. However, I felt that if mesa is to be used for research purposes, it is better to have much more scalability built into it, as optional features. But I understand the problems you are raising. I will try to contribute to the repository if I find elegant ways to tackle these issues.

A small but significant improvement in the meanwhile could be adding tips such as the ones you've mentioned here to the documentation, so that new users don't feel completely lost or helpless when coding models that operate at larger scales. I will see about augmenting the documentation as well, as I think that would be one of the best additions at this point.

Thanks for the helpful comments regarding pickling as well. I imagine it can be very helpful for my application (and again could possibly be added to the documentation as helpful tips). I was very apprehensive about pickling, since almost everywhere it is frowned upon to pickle objects for safety/security reasons. I feel safer to proceed now and try it, as the benefits could be huge for me.

@pmbaumgartner also has made some really good suggestions. Some profiling can be immensely helpful for improving the performance of models that work at high scales. I have used snakeviz to improve performance on my model drastically. It could be a valuable addition for research applications. I am not savvy enough to comment on Cython, however.
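For readers who haven't profiled before, the standard-library pattern behind tools like snakeviz looks like this (a generic sketch, not code from this thread; `expensive` is a made-up placeholder for a model's step function):

```python
import cProfile
import io
import pstats


def expensive(n):
    """Placeholder for a hot path in a model, e.g. one step of the schedule."""
    return sum(i * i for i in range(n))


profiler = cProfile.Profile()
profiler.enable()
expensive(100_000)
profiler.disable()

# print the five most time-consuming calls
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
print(out.getvalue())
```

snakeviz consumes the same profile data (`python -m cProfile -o model.prof model.py`, then `snakeviz model.prof`) and renders it interactively instead of as a text table.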
I second the spirit of the starting post. My recent efforts to provide a reasonable model for COVID-19 require representing a large number of agents to make it realistic. https://github.com/snunezcr/COVID19-mesa While I understand Python may not be the best tool for more than 10^5 agents, there may be significant opportunities to utilize multiprocessing libraries.
Hi @snunezcr, thanks for your feedback and for sharing your model. It looks very interesting. I took a look at the performance of your model and I actually don't see any slowness caused by mesa. It runs relatively slowly because you are generating a lot of random numbers, which is rather expensive (especially in a setting where you generate many single numbers, i.e. generating 1000 x 1 random numbers is much slower than generating 1 x 1000 random numbers).

On my machine, running your starter model for 50 steps takes about 7 seconds. Profiling revealed that in your move function you have this line:

```python
self.curr_dwelling = poisson(self._model.avg_dwell).rvs()
```

If I change that to

```python
self.curr_dwelling = poisson.rvs(self._model.avg_dwell)
```

the run time goes down to 3.5 seconds. Apparently "freezing" distributions in scipy is rather expensive, interestingly because the docstring is always being generated; if you freeze a distribution only once, there is not much difference.

Furthermore, changing the susceptible agents' stage step function from this:

```python
if (self.detection.rvs()) and (self.astep >= self.model.days_detection):
    pass
```

to first checking the second condition, so that you don't need to generate the first random number:

```python
if (self.astep >= self.model.days_detection) and (self.detection.rvs()):
    pass
```

brings my run time for 50 steps down to 0.5 seconds.

However, the visualization is indeed much slower than that. It seems to be related to the chart; if you deactivate the chart, the grid view produces minimal overhead. Maybe we should investigate why the charts are relatively slow.
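The reordering trick works because Python's `and` short-circuits: when the left operand is false, the right one is never evaluated. A pure-Python sketch (the counter and `expensive_draw` are invented here to stand in for the `rvs()` call) makes the saving countable:

```python
# count how often the "expensive" random draw actually happens
calls = {"draws": 0}


def expensive_draw():
    """Stand-in for self.detection.rvs(): pretend it is costly."""
    calls["draws"] += 1
    return True


days_detection = 90

# expensive check first: one draw on every single step
calls["draws"] = 0
for astep in range(100):
    if expensive_draw() and astep >= days_detection:
        pass
print(calls["draws"])  # → 100

# cheap check first: draws happen only once astep reaches the threshold
calls["draws"] = 0
for astep in range(100):
    if astep >= days_detection and expensive_draw():
        pass
print(calls["draws"])  # → 10
```

The same 10x-style saving is what shows up in the profiled run times above.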
Hello @Corvince, thank you for looking into the profiling aspects of the model. This is extremely useful. I am surprised about the behavior of distributions in scipy as well; I did not expect that. I do worry, at the back of my mind, about whether the cost is related to some guarantees that ensure proper behavior of the distribution. Small experiments I have performed using your method do not indicate significant differences, so, for the moment, I will go ahead with it. While the charts are useful, my primary concern is to scale up to ensemble computations with fixed parameters, up to 100 instances, and then compute averages. Thanks again.
Hey @Corvince, I'm new around here, but I think I can take on writing a function that does that. Is there anything you would recommend taking into account?
There has been some initial work by @dmasad on this thread: there is a link to a GitHub gist, and note my improvements in the first reply (you can ignore my other comments cheering on a non-existent solution). Apart from that, I think the most difficult thing is deciding if and when you want to override data in the database. Sometimes you just change parameters and want to compare, but sometimes the model itself changes and you want to get rid of the old data. So I would say it is easy to store the data itself, but not to identify the model behind the data. Be sure to keep this in mind.
@Corvince The link does not work. Could you link it again, or paste the gist and your comments here?
Oh, the link only works if I am logged in. You can search for "data collection profiling" here. Direct link to the gist: My changes:

```python
def record_data(self):
    self.insert_sql = "INSERT INTO agent_data VALUES (?, ?, ?, ?)"
    self.c.execute("BEGIN TRANSACTION;")
    values = [(self.schedule.steps, a.unique_id, a.x, a.y) for a in self.schedule.agents]
    self.c.executemany(self.insert_sql, values)
    self.conn.commit()
```
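A self-contained version of that pattern, runnable without the rest of the gist (the agent values here are made up, and an in-memory database stands in for a real file): all agents for a step go into one `executemany` inside a single transaction, rather than one `INSERT` per agent.

```python
import sqlite3

# manual transaction control, so BEGIN/COMMIT below behave predictably
conn = sqlite3.connect(":memory:", isolation_level=None)
c = conn.cursor()
c.execute("CREATE TABLE agent_data (step INTEGER, unique_id INTEGER, x REAL, y REAL)")

# fabricated per-agent rows for one model step
step = 1
values = [(step, uid, float(uid), float(uid) * 2) for uid in range(5)]

c.execute("BEGIN TRANSACTION;")
c.executemany("INSERT INTO agent_data VALUES (?, ?, ?, ?)", values)
c.execute("COMMIT;")

print(c.execute("SELECT COUNT(*) FROM agent_data").fetchone()[0])  # → 5
```

Batching like this matters because each committed transaction forces a sync to disk; one transaction per step instead of one per agent is usually the difference between seconds and minutes for large runs.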
For everyone interested in this, there is now an effort to create a vectorized subset of Mesa:
@EwoutH thank you for connecting the dots!
I'm going to close this issue as completed, now that we have mesa-frames officially under the Mesa umbrella: If there are any specific issues or ideas for Mesa performance or scalability, feel free to open a new issue or discussion! For anyone encountering this issue, there's a ChatGPT 4o generated summary:
What's the problem this feature will solve?
Currently mesa is a great tool for visualizing and studying ABMs (and the best in Python), but in my experience it, like the other tools I've tried, prevents us from taking full advantage of ABMs. What is required for this is scalability. ABMs are currently a hot area of research; to make sure mesa doesn't die out, it should be able to handle large numbers of agents and very long runs. Currently the mesa basic package simply crashes for large numbers of agents and large numbers of steps, mainly owing to the datacollector getting overloaded.
Describe the solution you'd like
An alternate class of datacollector could be made which could store/write data periodically, so as to not crash the system when it runs out of memory. More support for, and documentation of, parallel processing is also needed. Documentation here is important, as without it anything currently implemented just goes unused. I think both of these need to be addressed quickly and systematically. I am willing to help in any way toward making mesa future-proof and a proper research tool.

Additional context
I was working on my masters thesis, which needs ABMs to run up to $10^6 - 10^8$ steps, and mesa simply failed at this. I've had to write my own code to even just run the model without memory overload.