Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GroupBy: Add count and agg methods #2290

Merged
merged 4 commits into from
Sep 20, 2024
Merged

GroupBy: Add count and agg methods #2290

merged 4 commits into from
Sep 20, 2024

Conversation

EwoutH
Copy link
Member

@EwoutH EwoutH commented Sep 10, 2024

Added two new methods to the GroupBy class:

  • count: Returns the count of agents in each group.
  • agg: Performs aggregation on a specific attribute across groups, applying a function like sum, min, max, or mean.

Usage examples

  1. count():

    • Returns the count of agents in each group.
    • Usage Example:
      grouped_agents.count()
      Output:
      {
          "group_1": 10,  # 10 agents in group_1
          "group_2": 8,   # 8 agents in group_2
      }
  2. agg(attr_name, func):

    • Aggregates the values of a specific attribute across each group using a provided function (e.g., sum, min, max, mean).
    • Usage Example:
      # Sum the wealth of all agents in each group
      grouped_agents.agg("wealth", sum)
      Output:
      {
          "group_1": 15000,  # Total wealth of agents in group_1
          "group_2": 12000,  # Total wealth of agents in group_2
      }

I'm a bit worried about our AgentSet and now GroupBy exploding our code base.

  1. Is there any built-in functionality or libraries for this? It seems we're implementing very obvious behavior ourselves here.
  2. Do we need to split into multiple files?
  3. How do we keep this properly documented? Do we need guides for this?

(obviously tests are still needed)

@EwoutH EwoutH added the feature Release notes label label Sep 10, 2024
Copy link

Performance benchmarks:

Model Size Init time [95% CI] Run time [95% CI]
BoltzmannWealth small 🔵 -1.7% [-2.5%, -0.9%] 🔵 -0.4% [-0.5%, -0.3%]
BoltzmannWealth large 🔵 +0.4% [-0.6%, +1.4%] 🔵 +0.0% [-2.9%, +3.3%]
Schelling small 🔵 +0.2% [-0.1%, +0.4%] 🔵 -0.3% [-0.5%, -0.0%]
Schelling large 🔵 -0.4% [-1.1%, +0.3%] 🔵 -2.4% [-3.7%, -1.1%]
WolfSheep small 🔵 -2.0% [-2.9%, -1.2%] 🔵 -2.2% [-2.6%, -1.9%]
WolfSheep large 🟢 -6.0% [-7.3%, -4.9%] 🟢 -12.7% [-14.6%, -10.9%]
BoidFlockers small 🔵 -0.3% [-0.7%, +0.3%] 🔵 +1.2% [+0.6%, +1.8%]
BoidFlockers large 🔵 +0.1% [-0.8%, +1.1%] 🔵 +1.2% [+0.5%, +1.8%]

@quaquel
Copy link
Member

quaquel commented Sep 10, 2024

  1. Not sure you need count. You can already do it via some_agentset.groupby(type).map(len). So, if you are worried about the number of lines of code in the code base, you could leave it out. I personally would not worry too much about the number of lines of code and am open to adding some shorthands for frequently used patterns (like getting the number of agents of a particular type or that meet a given criterion).
  2. agg makes good sense and is a useful addition. I think I even mentioned adding it to groupby while adding it to agentset.

@@ -606,6 +606,31 @@ def do(self, method: Callable | str, *args, **kwargs) -> GroupBy:

return self

def count(self) -> dict[Any, int]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although it can be expressed with agg, I am in favour of having a separate method for count. It just makes code more readable and easier to write.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be dict[Hashable, int] as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, missed this one.

Aren't all dicts by design required to have Hashable as a key? Seems a bit redundant to define that every time.

Like, the int part adds information. The Hashable doesn't (over Any).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please "Unresolve conversation" if you have additional comments by the way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Hashable adds information to the type checker (in an IDE) to constraint user code. Arguably, any dict[Any, *] should be dict[Hashable, *]. I don't have a citation on why this is better. ChatGPT (the fallback version after my quota of GPT-4o has run out) seems to agree.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that sounds about right.

A smart type checker should know you can’t use non-hashable types as dict keys.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried Mypy:

from typing import Hashable, Any

a: dict[Hashable, int] = {[1]: 1}
a: dict[Any, int] = {[1]: 1}

It caught the first one, but doesn't consider the 2nd one to be a type error.

mesa/agent.py Outdated Show resolved Hide resolved
Added two new methods to the `GroupBy` class to enhance aggregation and group operations:

- `count`: Returns the count of agents in each group.
- `agg`: Performs aggregation on a specific attribute across groups, applying a function like `sum`, `min`, `max`, or `mean`.

These methods improve flexibility in applying both group-level and attribute-specific operations.
- Extend test_agentset_groupby() function
- Include tests for count() method to verify group sizes
- Add tests for agg() method with sum, max, min, and custom functions
- Ensure proper handling of grouped data and attribute aggregation
Change return type annotation of GroupBy.agg() from dict[Any, Any] to dict[Hashable, Any]
@EwoutH
Copy link
Member Author

EwoutH commented Sep 19, 2024

Added tests and cleaned up commits, ready for final review.

@adamamer20 adamamer20 self-requested a review September 20, 2024 06:59
Copy link
Member

@quaquel quaquel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

mesa/agent.py Show resolved Hide resolved
@EwoutH EwoutH merged commit 84881e7 into main Sep 20, 2024
10 checks passed
@rht rht deleted the groupby_funcs branch September 20, 2024 08:35
EwoutH added a commit to EwoutH/mesa that referenced this pull request Sep 24, 2024
Added two new methods to the `GroupBy` class to enhance aggregation and group operations:

- `count`: Returns the count of agents in each group.
- `agg`: Performs aggregation on a specific attribute across groups, applying a function like `sum`, `min`, `max`, or `mean`.

These methods improve flexibility in applying both group-level and attribute-specific operations.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Release notes label
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants