AgentSet: Add agg method #2266

Merged
merged 2 commits into from
Sep 3, 2024

Conversation

EwoutH
Member

@EwoutH EwoutH commented Sep 2, 2024

This PR introduces the agg method to the AgentSet class, allowing users to apply aggregation functions (e.g., min, max, sum, np.mean) to attributes of agents within the AgentSet. This enhancement makes it easier to compute summary statistics across agent attributes directly within the AgentSet interface.

This will be useful in both the model operation itself as well as for future DataCollector use.

New Method: agg

def agg(self, attribute: str, func: Callable) -> Any:

Parameters:

  • attribute (str): The name of the attribute to aggregate.
  • func (Callable): The aggregation function to apply (e.g., min, max, sum, np.mean).

Returns:

  • The result of applying the aggregation function to the attribute values of all agents in the AgentSet.
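
As a rough illustration of the behavior described above, here is a minimal sketch. `ToyAgent` and `ToyAgentSet` are hypothetical stand-ins written for this example, not Mesa's actual classes or implementation; the sketch only assumes that `agg` collects the attribute values and hands them to the function.

```python
from collections.abc import Callable
from typing import Any


class ToyAgent:
    """Hypothetical stand-in for an agent; not a Mesa class."""

    def __init__(self, energy: float) -> None:
        self.energy = energy


class ToyAgentSet:
    """Hypothetical, simplified stand-in for AgentSet (an assumption, not Mesa's code)."""

    def __init__(self, agents: list[ToyAgent]) -> None:
        self._agents = list(agents)

    def get(self, attribute: str) -> list[Any]:
        # Collect the named attribute from every agent, in storage order.
        return [getattr(agent, attribute) for agent in self._agents]

    def agg(self, attribute: str, func: Callable) -> Any:
        # Pass the collected values to the aggregation function.
        return func(self.get(attribute))


agents = ToyAgentSet([ToyAgent(3.0), ToyAgent(1.0), ToyAgent(5.0)])
min_energy = agents.agg("energy", min)
total_energy = agents.agg("energy", sum)
```

Any callable that accepts a list of values works here, which is why built-ins such as `min` and `sum` can be passed directly.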

Usage Examples

# Get the minimum energy value
min_energy = agentset.agg("energy", min)

# Calculate the total energy of sheep in the model
total_energy = model.get_agents_of_type(Sheep).agg("energy", sum)

# Compute the average wealth of the poorest 10% using numpy
average_wealth = agentset.select(at_most=0.1).agg("wealth", np.mean)

# Custom aggregation function
def custom_func(values):
    return sum(values) / len(values)

custom_avg_energy = agentset.agg("energy", custom_func)

Future work

This function could be extended by:

  • Allowing multiple attributes as input for the functions
  • Allowing multiple attribute-function pairs as input
  • Allowing multiple pairs of multiple attributes / functions as input

However, this PR limits the scope to a single attribute and a single function, since that's the most common use case.

@EwoutH EwoutH added the feature Release notes label label Sep 2, 2024

github-actions bot commented Sep 2, 2024

Performance benchmarks:

| Model | Size | Init time [95% CI] | Run time [95% CI] |
|---|---|---|---|
| BoltzmannWealth | small | 🔵 +3.3% [+1.9%, +4.8%] | 🔵 +0.0% [-0.2%, +0.2%] |
| BoltzmannWealth | large | 🔵 +2.7% [-31.1%, +41.3%] | 🔵 +2.9% [+1.4%, +4.4%] |
| Schelling | small | 🔵 +0.1% [-0.1%, +0.4%] | 🔵 +1.5% [+1.2%, +1.8%] |
| Schelling | large | 🔵 -0.9% [-1.9%, +0.1%] | 🔵 +0.1% [-3.4%, +3.7%] |
| WolfSheep | small | 🔵 -1.1% [-2.4%, +0.3%] | 🔵 -0.6% [-1.0%, -0.3%] |
| WolfSheep | large | 🔵 -2.6% [-3.2%, -2.0%] | 🔵 -4.0% [-6.9%, -1.7%] |
| BoidFlockers | small | 🔵 +1.6% [+1.2%, +2.1%] | 🔵 +0.4% [-0.2%, +1.0%] |
| BoidFlockers | large | 🔵 +1.7% [+0.7%, +2.6%] | 🔵 +0.8% [-0.0%, +1.6%] |

@EwoutH
Member Author

EwoutH commented Sep 2, 2024

While it will probably be used mostly for aggregation, in theory it could also be used for other things. Is it the most fitting name?

@quaquel
Member

quaquel commented Sep 2, 2024

I am fine with agg. It is true you can do much more (see also agg in pandas), but the main use case is to calculate some descriptive statistic on top of a list of attribute values.

Another future PR is to bring agg to the groupby helper class so you can also do this: some_agentset.groupby(type).agg.
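
To make the idea concrete, here is a plain-Python sketch of what grouping by type and then aggregating could compute. The `Sheep` and `Wolf` classes and the `groupby_agg` helper are hypothetical names made up for this illustration; this is not the Mesa groupby API, only an assumption about what such a call might return.

```python
from collections import defaultdict
from collections.abc import Callable, Iterable
from typing import Any


class Sheep:
    """Hypothetical agent class for illustration."""

    def __init__(self, energy: int) -> None:
        self.energy = energy


class Wolf:
    """Hypothetical agent class for illustration."""

    def __init__(self, energy: int) -> None:
        self.energy = energy


def groupby_agg(agents: Iterable[Any], by: Callable, attribute: str, func: Callable) -> dict:
    """Hypothetical helper: group attribute values by by(agent), then aggregate each group."""
    groups: dict[Any, list[Any]] = defaultdict(list)
    for agent in agents:
        groups[by(agent)].append(getattr(agent, attribute))
    # One aggregated value per group key.
    return {key: func(values) for key, values in groups.items()}


population = [Sheep(4), Sheep(6), Wolf(10)]
energy_by_type = groupby_agg(population, type, "energy", sum)
```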

@EwoutH
Member Author

EwoutH commented Sep 2, 2024

I just realized these are totally equivalent:

# Get the minimum energy value
min_energy = agentset.agg("energy", min)
min_energy = min(agentset.get("energy"))

# Calculate the total energy of sheep in the model
total_energy = model.get_agents_of_type(Sheep).agg("energy", sum)
total_energy = sum(model.get_agents_of_type(Sheep).get("energy"))

# Compute the average wealth of the poorest 10% using numpy
average_wealth = agentset.select(at_most=0.1).agg("wealth", np.mean)
average_wealth = np.mean(agentset.select(at_most=0.1).get("wealth"))

Is there such a thing as too much syntactic sugar?

It does read nicely left to right though.

@quaquel
Member

quaquel commented Sep 2, 2024

Not sure about the last example, but yes, on the others, you are correct. If I read the last example correctly, you select 10% of the agents and then take their mean wealth, which is not equivalent to computing the average wealth of the poorest 10% using numpy.
To do what you write requires something like

agentset.sort("wealth").select(at_most=0.1).agg("wealth", np.mean)
np.mean(agentset.sort("wealth").select(at_most=0.1).get("wealth"))
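
The difference between "some 10%" and "the poorest 10%" can be seen with plain lists. This is only a sketch with made-up wealth values; it uses Python's `sorted` and makes no claim about how `AgentSet.sort` orders agents by default.

```python
from statistics import mean

# Hypothetical wealth values for 20 agents, in arbitrary storage order.
wealth = [50, 3, 20, 1, 40, 7, 90, 15, 60, 30,
          5, 80, 2, 70, 25, 10, 45, 35, 55, 65]

cutoff = len(wealth) // 10  # 10% of 20 agents -> 2 agents

# Without sorting: the first 10% in whatever order the agents happen to be stored.
first_tenth_mean = mean(wealth[:cutoff])

# Sorting ascending first: the 10% with the lowest wealth.
poorest_tenth_mean = mean(sorted(wealth)[:cutoff])
```

With these values the unsorted selection averages 50 and 3, while the sorted selection averages 1 and 2, so the two results differ.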

So for just one attribute and one aggregation function, we now have two ways to do it if we merge this PR. I think the real use case is when you extend this to multiple attributes and/or multiple callables.

@EwoutH
Member Author

EwoutH commented Sep 3, 2024

I think the reading from left to right and the fact that this method is documented, and thus reminds people of "oh this is something we can do", provide enough value to add this relatively minor feature to the codebase. It's also consistent with how set() can be used.

We could extend this to allow a single dict input:

.agg({"attribute": Callable, ("multiple", "attributes"): Callable})

Then the output would be a list or tuple.
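
A minimal sketch of how such a dict input might behave, restricted to single-attribute keys. The `agg_many` helper is a hypothetical name, and the agents are `SimpleNamespace` stand-ins rather than Mesa agents; nothing here is part of this PR.

```python
from collections.abc import Callable
from statistics import mean
from types import SimpleNamespace
from typing import Any


def agg_many(agents: list[Any], spec: dict[str, Callable]) -> tuple:
    """Hypothetical helper: apply one callable per attribute, in dict order."""
    return tuple(
        func([getattr(agent, attribute) for agent in agents])
        for attribute, func in spec.items()
    )


# Stand-in agents; SimpleNamespace just holds the attributes.
agents = [SimpleNamespace(wealth=1, energy=4), SimpleNamespace(wealth=3, energy=6)]
total_wealth, mean_energy = agg_many(agents, {"wealth": sum, "energy": mean})
```

Returning a tuple in the dict's insertion order keeps the results easy to unpack, at the cost of losing the attribute names in the output.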

@quaquel
Member

quaquel commented Sep 3, 2024

For inspiration, I looked at DataFrame.agg. This allows

  • callable
  • a string that resolves to a function/method on the dataframe (I guess)
  • a list of callables and/or strings,
  • dict

Of course, their use case is different because agg applies over all rows/columns, while our use case involves first selecting one or more attributes upon which to apply the aggregation function. My hunch is to allow either two arguments (i.e., an attribute name and a callable) or a single dict, as you suggest.

I like the clean, simple case of just passing an attribute/list of attributes and a single callable and would love to preserve that API:

agentset.agg("wealth", calculate_gini)

Alternatively, you just wrap everything into a long list of arguments rather than use a dict:

agentset.agg("wealth", calculate_gini,
             ["attr_a", "attr_b"], some_other_function)

So basically, you always pass an even number of arguments, where each odd entry is a string or list of strings and each even entry is the callable to apply.
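
The alternating-argument convention could be validated along these lines. This is only a sketch: `parse_agg_args` is a hypothetical helper name, and the error handling is an assumption rather than anything decided in this thread.

```python
from collections.abc import Callable


def parse_agg_args(*args) -> list[tuple[list[str], Callable]]:
    """Hypothetical helper: split alternating (attribute(s), callable) arguments into pairs."""
    if len(args) % 2 != 0:
        raise ValueError("expected an even number of arguments: attribute(s), callable, ...")
    pairs = []
    # Odd entries (1st, 3rd, ...) are attribute names; even entries are callables.
    for attrs, func in zip(args[0::2], args[1::2]):
        if isinstance(attrs, str):
            attrs = [attrs]  # normalize a single name to a list
        if not callable(func):
            raise TypeError(f"{func!r} is not callable")
        pairs.append((list(attrs), func))
    return pairs


pairs = parse_agg_args("wealth", max, ["attr_a", "attr_b"], min)
```

Normalizing a single string to a one-element list means the code that applies the callables only has to handle one shape of input.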

I agree that this can come in a later PR.

@EwoutH
Member Author

EwoutH commented Sep 3, 2024

So, for now, merge as is?

If so, this PR is ready for final review.

@EwoutH EwoutH merged commit e0d1156 into projectmesa:main Sep 3, 2024
11 of 12 checks passed
EwoutH added a commit to EwoutH/mesa that referenced this pull request Sep 24, 2024