AgentSet: Allow selecting a fraction of agents in the AgentSet #2253

Merged: 16 commits merged into main on Aug 30, 2024


EwoutH
Member

@EwoutH EwoutH commented Aug 28, 2024

This PR updates the select method in the AgentSet class by replacing the n parameter with a more versatile at_most parameter. The at_most parameter allows for selecting either a specific number of agents or a fraction of the total agents when provided as an integer or a float, respectively. Additionally, backward compatibility is maintained by supporting the deprecated n parameter, which will trigger a warning when used.

Motive

Previously, the select method only allowed users to specify a fixed number of agents (n) to be selected. The new at_most parameter extends this functionality by enabling the selection of agents based on a proportion of the total set, which is particularly useful in scenarios where relative selection is desired over absolute selection.

Implementation

  • at_most Parameter:
    • Accepts either an integer (to select a fixed number of agents) or a float between 0.0 and 1.0 (to select a fraction of the total agents).
    • at_most=1 selects one agent, while at_most=1.0 selects all agents.
    • If a float is provided, it determines the maximum fraction of agents to be selected from the total set. It rounds down to the nearest number of whole agents.
  • Backward Compatibility:
    • The deprecated n parameter is still supported, but it now serves as a fallback for at_most and triggers a deprecation warning.
  • Behavior Notes:
    • at_most serves as an upper limit on the number of selected agents. If additional filtering criteria are provided, the final selection may include fewer agents.
    • For random sampling, users should shuffle the AgentSet before applying at_most.
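The int-versus-float rule in the list above can be sketched as a small helper. This is only an illustration of the described behavior, not the code in mesa/agent.py; `resolve_at_most` is a hypothetical name, and the sketch assumes a float always means a fraction of the total that is rounded down.

```python
import math


def resolve_at_most(at_most, total):
    """Turn an at_most value into a whole-number cap (hypothetical helper)."""
    if isinstance(at_most, float):
        if not 0.0 <= at_most <= 1.0:
            raise ValueError("a float at_most must lie between 0.0 and 1.0")
        # Fraction of the total set, rounded down so the cap is never exceeded.
        return math.floor(total * at_most)
    # An integer is already an absolute cap.
    return at_most


print(resolve_at_most(1, 10))    # integer: one agent
print(resolve_at_most(1.0, 10))  # float: the whole set
print(resolve_at_most(0.2, 9))   # 1.8 agents rounds down to 1
```

The type of the argument, not its value, decides the interpretation, which is why `1` and `1.0` give different results.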

Usage Examples

# Select the first 5 agents from the AgentSet
selected_agents = agents.select(at_most=5)
selected_agents = agents.select(n=5)  # Still works but throws a deprecation warning
# Select the first 20% of agents from the AgentSet (as currently sorted, rounded down)
selected_agents = agents.select(at_most=0.2)

To randomly select a fraction, add a shuffle():

# Select 20% of agents randomly from the AgentSet
random_agents = agents.shuffle().select(at_most=0.2)

Combining with sorting:

# Select the 20% of agents with the lowest wealth
selected_agents = agents.sort("wealth", ascending=True).select(at_most=0.2)

The most powerful feature is that you can combine at_most with additional criteria:

# Select agents with "wealth" less than 5, and at most 20% of the total
selected_agents = agents.select(lambda agent: agent.wealth < 5, at_most=0.2)

# First filter agents, then select 20% of those remaining
filtered_agents = agents.select(lambda agent: agent.wealth < 5).select(at_most=0.2)
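
The difference between the two calls above may be easier to see on plain data. The sketch below uses a hypothetical `select` function on a list of integers as a stand-in for the AgentSet method; it assumes, as described in this PR, that a float cap is a floored fraction of the set that `select` is called on.

```python
import math


def select(items, predicate=None, at_most=None):
    """Hypothetical stand-in for AgentSet.select, operating on a plain list."""
    if isinstance(at_most, float):
        # The fraction refers to the set select() was called on.
        cap = math.floor(len(items) * at_most)
    else:
        cap = len(items) if at_most is None else at_most
    matches = [x for x in items if predicate is None or predicate(x)]
    return matches[:cap]


wealth = list(range(10))  # ten "agents", represented by their wealth 0..9

# Filter and cap in one call: the cap is 20% of all ten agents.
capped = select(wealth, lambda w: w < 8, at_most=0.2)

# Filter first, then take 20% of the eight that remain (1.6 rounds down).
chained = select(select(wealth, lambda w: w < 8), at_most=0.2)

print(capped, chained)
```

With ten agents, the single call keeps two matches while the chained version keeps one, because the fraction is taken of a different total in each case.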

You can also use it with chaining:

# Randomly select 40% of the agents from the AgentSet and set a value
model.agents.shuffle().select(at_most=0.4).set('has_license', True)

@EwoutH EwoutH added the enhancement Release notes label label Aug 28, 2024

@quaquel
Member

quaquel commented Aug 28, 2024

what is the motivation for adding this to the agentset?

@EwoutH EwoutH mentioned this pull request Aug 28, 2024
@Corvince
Contributor

That seems useful, thanks!

The only worry I have is how this behaves if a user specifies both n and p. That probably should raise an error?

Or maybe there is a good name that could incorporate both p and n? So if it is between 0 and 1 use a fraction and if it is a whole number above 1 use that number?

@EwoutH
Member Author

EwoutH commented Aug 28, 2024

what is the motivation for adding this to the agentset?

Sorry, was still working on other features (and my actual model), wrote it up.

That seems useful, thanks!

The only worry I have is how this behaves if a user specifies both n and p. That probably should raise an error?

Yeah I was thinking about that. Maybe just don't do that (and we mention it in the docstring)?

If you just want to select a fraction of n, you can do n=round(n*p), so having both doesn't make sense.

Or maybe there is a good name that could incorporate both p and n? So if it is between 0 and 1 use a fraction and if it is a whole number above 1 use that number?

Very interesting idea, but maybe in this case explicit is better than implicit. Except if you can come up with a killer name.

@quaquel
Member

quaquel commented Aug 28, 2024

I like the clarity of p. So my suggestion would be to raise a value error if both n and p are passed

@quaquel
Member

quaquel commented Aug 28, 2024

see the few minor comments and once unit tests are added, this is good to go.

@EwoutH
Member Author

EwoutH commented Aug 28, 2024

Okay, I:

  • Changed p to fraction
  • Used the ValueError
  • Updated the other docstring, including notes
  • Added tests
  • Updated the examples

However, I noticed that there's an important difference between n and fraction. n is always fixed, it's just an upper limiter. fraction does matter when you apply it, before or after the rest of the selection.

Currently fraction is interpreted as a fraction of the input AgentSet. When writing the usage examples, that felt really counterintuitive. It would be more logical if you could apply it afterwards, such that a fraction of the selected AgentSet is returned.

Why? Because if you take these two use cases:

  • Select the agents with "wealth" less than 5 but at most 20% of total agents
  • Select the agents with "wealth" less than 5, and then 20% of those agents

The latter is used way more than the former. And it will be way more logical if you select by type.

So I would suggest applying fraction afterwards, on the selected AgentSet after all other operations are done. Then you could still do both:

# Select the agents with "wealth" less than 5, and at most 20% of total agents
agents.select(fraction=0.2).select(lambda agent: agent.wealth < 5)

# Select the agents with "wealth" less than 5, and then 20% of those agents
agents.select(lambda agent: agent.wealth < 5, fraction=0.2)
# or, equivalently:
agents.select(lambda agent: agent.wealth < 5).select(fraction=0.2)

But now the one that's more used and more intuitive will go well by default.


Entirely different options could be:

  • Don't allow fraction and/or n together with other functions, but enforce chaining
  • Introduce a new method, like sample, that gives a sample of n or a sample of fraction.

@rht
Contributor

rht commented Aug 28, 2024

what is the motivation for adding this to the agentset?

@EwoutH I'm also wondering about this. Not saying that this shouldn't be in the library, but a concrete example could give some illustration. Is this used in your project?

@EwoutH
Member Author

EwoutH commented Aug 28, 2024

This was the thing I wanted to do:

# Randomly select 40% of the agents from the AgentSet and give them a license
model.agents.shuffle().select(fraction=0.4).set('has_license', True)

I needed to do this:

n_license = round(len(model.agents) * license_chance)
model.agents.shuffle().select(n=n_license).do(lambda agent: setattr(agent, 'has_license', True))

With #2254 it got simplified to:

n_license = round(len(model.agents) * license_chance)
model.agents.shuffle().select(n=n_license).set('has_license', True)

It's not a huge use case, but it's nice. Especially that you don't need to break the chain.

Combine it with a function and it gets really powerful though. Assume I want to distribute some cars around (I know a certain percentage of all people have a car), but only to agents with licenses.

agents.select(lambda a: a.has_license, fraction=car_chance).set('has_car', True)

Without the fraction, this would have been:

n_car = round(len(model.agents) * car_chance)
model.agents.select(lambda a: a.has_license, n=n_car).set('has_car', True)

So yeah, it's not a huge use case. Maybe it adds some complexity.


There's a unique application for fraction as an upper limit (cap), as currently implemented, and a unique application for applying it afterwards. I need to think about this a bit longer.

@EwoutH
Member Author

EwoutH commented Aug 28, 2024

Right, n=0 has a special status. With a small fraction or small agentset, n can become 0, returning all agents.
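
A small sketch of the pitfall, using a hypothetical `legacy_select` on a plain list that treats `n=0` as "no limit" in the way described above (an assumption based on this comment, not a copy of the Mesa code):

```python
def legacy_select(items, n=0):
    """Hypothetical stand-in where n=0 means "no limit", as described above."""
    return list(items) if n == 0 else list(items)[:n]


agents = ["a", "b", "c"]
fraction = 0.1

n = round(len(agents) * fraction)  # 0.3 rounds to 0
selected = legacy_select(agents, n=n)

print(n, selected)  # the cap silently turned into "select everything"
```

Asking for 10% of three agents should give none, but the computed cap of 0 collides with the sentinel value and all three come back.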

@Corvince
Contributor

Right, n=0 has a special status. With a small fraction or small agentset, n can become 0, returning all agents.

Good catch!

I see two possibilities now. Either just change the special meaning from 0 to -1. I don't know if there was a good use case for 0, but it's rather strange for 0 to indicate all agents.

The more holistic approach would be to split select into a filter function and a sample function. This would also simplify the logic and solve the "before or after" question (which was present but unconsidered before fraction was introduced)

@EwoutH
Member Author

EwoutH commented Aug 29, 2024

The brain is so interesting: after a night's sleep you look at it again, think "oh", and it all clicks together.

Now I just have to write it up and rewrite the code, tests and examples.

Can’t wait for 2026/2027, when a bot does that automatically from a voice message.


Long story short: There’s a special use case for when filtering, you want a certain number or fraction at most. Especially the fraction should happen right there in the function, because after the function is done, you don’t know how large the

For all other cases (before, after) a sample method would be perfect (and can be implemented pretty fast I think). sample could also draw a random sample, where select selects the first n/fraction.

Or maybe there is a good name that could incorporate both p and n? So if it is between 0 and 1 use a fraction and if it is a whole number above 1 use that number?

Obviously the way to go. I was thinking max, limit, ceiling or at_most.

@Corvince
Contributor

Corvince commented Aug 29, 2024

Agreed on the performance aspect. One way to solve this but keep the chainable approach would be to use generator functions to return iterators instead of the complete AgentSet. But maybe as you said this is all mainly catered towards nice semantics and there are other ways already available for performance critical operations.

That's an interesting idea worth exploring at some point (but not in this PR). Basically, what if we have a generator interface to an AgentSet? And can we make a chainable API work with generators?

I think having an __iter__ method is kind of enough, so

(agent for agent in agentset)

should already give you an iterator over the agentset. Definitely worth exploring that more, but certainly way out of scope for this PR

//Edit
Ah, sorry, didn't think this through. Definitely needs more thought on the possibility to make this chainable. This of course only iterates over the agents themselves.
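
One way the chaining question could be approached is a thin wrapper whose methods return new lazy wrappers instead of materialized sets. The sketch below is purely exploratory: `LazyAgents` and its methods are hypothetical and are not part of Mesa.

```python
from itertools import islice


class LazyAgents:
    """Hypothetical lazy, chainable wrapper; not part of Mesa."""

    def __init__(self, iterable):
        self._iterable = iterable

    def __iter__(self):
        return iter(self._iterable)

    def select(self, predicate):
        # A generator expression defers the filtering until iteration.
        return LazyAgents(x for x in self._iterable if predicate(x))

    def at_most(self, n):
        # islice stops pulling items once n have been produced.
        return LazyAgents(islice(self._iterable, n))


seen = []


def is_even(x):
    seen.append(x)  # record which items were actually inspected
    return x % 2 == 0


result = list(LazyAgents(range(100)).select(is_even).at_most(3))
print(result, len(seen))
```

Only as many items are inspected as are needed to fill the cap. A fractional cap would still need the total length up front, so it does not fit this lazy model as easily.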

n is removed with a fallback

max (int | float, optional): The maximum amount of agents to select. Defaults to infinity.
  - If an integer of 1 or larger, the first n matching agents are selected.
  - If a float between 0 and 1, at most that fraction of the original agents are selected.
@EwoutH
Member Author

EwoutH commented Aug 29, 2024

I updated this PR to replace n with max.

max (int | float, optional): The maximum amount of agents to select. Defaults to infinity.

  • If an integer of 1 or larger, the first n matching agents are selected.
  • If a float between 0 and 1, at most that fraction of the original agents are selected.

Some details:

  • max=1 will give one agent, max=1.0 gives all agents.
  • A fallback for n was added, which does max = n and throws a warning.
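
A fallback of this kind could look roughly like the sketch below. It is a hypothetical stand-in on plain lists rather than the code in this PR, it uses the name `at_most` from the PR description for the new keyword, and the warning text is made up.

```python
import warnings


def select(items, at_most=None, n=None):
    """Hypothetical stand-in showing an n -> at_most fallback."""
    if n is not None:
        warnings.warn(
            "The parameter 'n' is deprecated; use 'at_most' instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        at_most = n
    items = list(items)
    return items if at_most is None else items[:at_most]


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_style = select(range(10), n=3)

new_style = select(range(10), at_most=3)
print(old_style == new_style, caught[0].category.__name__)
```

The old keyword keeps working and returns the same result; it only adds a `DeprecationWarning`, which is a warning rather than a raised exception.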

Tests are updated. Please double check the internal agent_generator function.

If we decide this is the way to go, I will update the PR description.


I plan on adding a separate sample() function that implements max in the same way, including with a shuffle=True option. Fun fact: sample(n, shuffle=True) will be equivalent to NetLogo's up-to-n-of. @quaquel I know you hate NetLogo with all your heart, but sometimes you can learn a lot from them ;).

But that would be a separate PR.
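
To make the idea concrete, here is a rough sketch of what such a function might look like on a plain list. It is not the planned implementation: the signature is guessed from the `sample(n, shuffle=True)` call above, the `rng` argument is an added assumption for reproducibility, and a float is assumed to be a floored fraction.

```python
import math
import random


def sample(items, at_most, shuffle=False, rng=None):
    """Hypothetical sample(); an int is a count, a float a floored fraction."""
    pool = list(items)  # copy, so the caller's sequence is left untouched
    if shuffle:
        (rng or random).shuffle(pool)
    if isinstance(at_most, float):
        at_most = math.floor(len(pool) * at_most)
    return pool[:at_most]


agents = list(range(10))
first_three = sample(agents, 3)
random_fraction = sample(agents, 0.4, shuffle=True, rng=random.Random(42))
print(first_three, len(random_fraction))
```

Without `shuffle`, the function takes the first items in their current order; with it, the same cap is applied to a randomly ordered copy.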

@quaquel
Member

quaquel commented Aug 30, 2024

I am unsure about using a single keyword for both the number and the percentage, but I won't object to it either. I would change the name, however: max shadows the name of a built-in.

It would be nice to see a quick overview of what the API is now becoming just for clarity.

sample(n, shuffle=True) will be equivalent to NetLogo's up-to-n-of. @quaquel I know you hate NetLogo with all your heart, but sometimes you can learn a lot from them ;).

I hate the language, but, yes, we can pick up useful ideas and give them a better name. sample is much better than that weird construct with hyphens in the name 😉.

@EwoutH
Member Author

EwoutH commented Aug 30, 2024

I was thinking max, limit, ceiling or at_most.

Any suggestions (either these or another)?

@Corvince
Contributor

I like at_most the best. It conveys that "n" can be arbitrarily large, but that the number of returned agents need not match it. It also sort of implies that you first apply a filter and then take a sample. And it also makes the rounding clear for fractions. So 1/3 of 5 (1.67) will be 1 agent, otherwise it would be more than 1/3.

@EwoutH
Member Author

EwoutH commented Aug 30, 2024

So 1/3 of 5 (1.67) will be 1 agent, otherwise it would be more than 1/3.

Currently it does round, do you think it shouldn't?

@Corvince
Contributor

If it's an upper limit, I think it should always round down/floor.

@EwoutH
Member Author

EwoutH commented Aug 30, 2024

Difficult one. Because if you describe it as "selecting a fraction" I would expect it to select the closest match.

I think in many practical scenarios the closest selection to the fraction you wanted is most logical.

@quaquel
Member

quaquel commented Aug 30, 2024

If we go with at_most, it should round down in the case of fractions. Otherwise, the name and behavior don't match.

@Corvince
Contributor

Valid argument for "selecting a fraction", but for selecting "at most" 33% I would not expect it to select 40%
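
The arithmetic behind the 1/3-of-5 example, written out with the standard library only:

```python
import math

total = 5
fraction = 1 / 3

rounded = round(total * fraction)       # nearest whole number
floored = math.floor(total * fraction)  # never exceeds the fraction

print(rounded, rounded / total)  # 2 agents, 40% of the set
print(floored, floored / total)  # 1 agent, 20% of the set
```

Rounding to the nearest integer overshoots the requested third, while flooring stays at or below it, which is the behavior the name at_most suggests.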

@Corvince
Contributor

If we go with at_most, it should round down in the case of fractions. Otherwise, the name and behavior don't match.

Exactly. That's why I think it's a good name (if we floor), because people will always have different expectations for "selecting a fraction" with respect to rounding.

@EwoutH
Member Author

EwoutH commented Aug 30, 2024

I renamed max to at_most, made sure it rounded down, and updated the tests.

@EwoutH EwoutH added breaking Release notes label deprecation When a new deprecation is introduced and removed breaking Release notes label labels Aug 30, 2024
@EwoutH
Member Author

EwoutH commented Aug 30, 2024

PR description is updated, including the usage examples

@EwoutH
Member Author

EwoutH commented Aug 30, 2024

@projectmesa/maintainers ready to go? (would like to merge myself)

@EwoutH EwoutH merged commit efa51cd into main Aug 30, 2024
9 of 10 checks passed
@EwoutH
Member Author

EwoutH commented Aug 30, 2024

(keeping the branch in case of regressions)

@EwoutH EwoutH deleted the select_fraction branch September 20, 2024 09:09
EwoutH added a commit to EwoutH/mesa that referenced this pull request Sep 24, 2024