add large scale simulation readme
panxuchen committed Aug 13, 2024
1 parent 21377d0 commit 0572991
Showing 6 changed files with 131 additions and 11 deletions.
112 changes: 112 additions & 0 deletions examples/paper_large_scale_simulation/README.md
@@ -0,0 +1,112 @@
# Very Large-Scale Multi-Agent Simulation in AgentScope

> **WARNING:**
>
> **This example will consume a huge amount of tokens.**
> **Using a paid model API with this example can incur a high cost.**
> **Users with powerful GPUs (A100 or better) can use local inference services (such as vLLM) to run this example instead.**

The code in this folder implements the experiments of the paper [Very Large-Scale Multi-Agent Simulation in AgentScope](https://arxiv.org/abs/2407.17789).

In the experiment, we set up a large number of agents to participate in the classic game "guess the 2/3 of the average", where each agent reports a real number between 0 and 100, and the agent who reports a number closest to 2/3 of the average of all the reported numbers wins the game.
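
For a quick illustration of the rule, here is a minimal Python sketch (an illustration only, not the code used in the experiment):

```python
# Illustration of the "guess 2/3 of the average" rule (not the experiment code).
def two_thirds_winner(reports: list[float]) -> tuple[float, float]:
    """Return (target, winning report) for the reported numbers."""
    target = 2 / 3 * (sum(reports) / len(reports))
    winner = min(reports, key=lambda r: abs(r - target))
    return target, winner


# Three agents report 10, 20 and 60: the average is 30, the target is 20,
# so the agent who reported 20 wins.
print(two_thirds_winner([10, 20, 60]))  # (20.0, 20)
```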

## Tested Models

Only the vLLM local inference service has been tested with this example.

This example will consume a huge amount of tokens. Please do not use a model API that requires payment.

## Prerequisites

- Multiple machines (Linux system) with powerful GPUs (A100 or better).
- The distributed version of AgentScope is installed on all machines.
- [vLLM](https://github.com/vllm-project/vllm) v0.4.3 or higher is installed on all machines.


## Usage

### Step 1: Start the Local Inference Service

> If you only have one machine and don't have a powerful GPU (A800 or better), you can skip this step.

You can use `start_vllm.sh` to start vLLM inference services on each of your machines.
Before running the script, please set `gpu_num`, `model_path`, `gpu_per_model` and `base_port` properly.

- `gpu_num`: the number of GPUs on this machine.
- `model_path`: the model checkpoint path.
- `gpu_per_model`: the number of GPUs required by each model.
- `base_port`: the starting port number used by the local inference services.

For example, if `base_port` is `8010`, `gpu_num` is `8` and `gpu_per_model` is `4`, two inference services will be started, listening on ports `8010` and `8014` respectively.
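
The port numbering can be summarized with a small Python sketch (illustration only, mirroring the rule described above):

```python
# One inference service is started per group of gpu_per_model GPUs,
# and each service listens on base_port + index of its first GPU.
base_port, gpu_num, gpu_per_model = 8010, 8, 4
ports = [base_port + i for i in range(0, gpu_num, gpu_per_model)]
print(ports)  # [8010, 8014]
```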

vLLM inference services start slowly, so you need to wait for these servers to actually start before proceeding to the next step.

> The above configuration requires that the model checkpoint can be loaded by a single GPU.
> If you need to use a model that must be loaded by multiple GPUs, you need to modify the script.

### Step 2: Configure the Experiment

Modify the following files according to your environment:

- `configs/model_configs.json`: set the model configs for your experiment. Note that the `config_name` field should follow the format `{model_name}_{model_per_machine}_{model_id}`, where `model_name` is the name of the model, `model_per_machine` is the number of models per machine, and `model_id` is the id of the model (starting from 1). For example, with two `qwen2_72b` services per machine, the config names are `qwen2_72b_2_1` and `qwen2_72b_2_2` (see the sketch below the list).

- `configs/experiment.csv`: set the test cases for your experiment.

- `scripts/start_all_server.sh`: activate your python environment properly in this script.
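
For illustration, one entry of `configs/model_configs.json` for the first of two `qwen2_72b` services on a machine might look like the sketch below. Only the `config_name` convention comes from this example; the other fields (model wrapper type, base URL of the local vLLM service, model name, etc.) are assumptions that depend on your AgentScope version and deployment, so adapt them to your own setup:

```json
[
    {
        "config_name": "qwen2_72b_2_1",
        "model_type": "openai_chat",
        "model_name": "qwen2_72b",
        "api_key": "EMPTY",
        "client_args": {
            "base_url": "http://localhost:8010/v1"
        }
    }
]
```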

### Step 3: Run the Experiment

Suppose you have 4 machines whose hostnames are `worker1`, `worker2`, `worker3` and `worker4`. You can then run all your experiment cases with the following command:

```bash
python benchmark.py -name large_scale -config experiment --hosts worker1 worker2 worker3 worker4
```

### Step 4: View the Results

All results will be saved in the `./result` folder and organized as follows:
```text
result
`-- <benchmark_name>
`-- <model_name>
`-- <settings>
|-- <timestamp>
| |-- result_<round_num>.json # the raw text result of round <round_num>
| `-- result_<round_num>.pdf # the distribution histogram of round <round_num>
`-- <timestamp>
|-- result_<round_num>.json
`-- result_<round_num>.pdf
```
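
If you want to post-process the raw results programmatically, a minimal sketch (assuming only the directory layout shown above) could enumerate them like this:

```python
# Minimal sketch: list all raw result files under the layout shown above.
from pathlib import Path

# result/<benchmark_name>/<model_name>/<settings>/<timestamp>/result_<round_num>.json
for json_file in sorted(Path("./result").glob("*/*/*/*/result_*.json")):
    print(json_file)
```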

During the experiment, you can also view intermediate results on the command line:
```text
2024-08-13 06:20:40.028 | INFO | participant:_generate_participant_configs:525 - init 100 random participant agents...
2024-08-13 06:20:40.028 | INFO | participant:_init_env:574 - init 1 envs...
2024-08-13 06:20:40.171 | INFO | participant:_init_env:603 - [init takes 0.1432037353515625 s]
Moderator: The average value is 49.70 [takes 1.130 s]
Moderator: The average value is 48.44 [takes 1.125 s]
Moderator: The average value is 47.81 [takes 1.129 s]
Moderator: Save result to ./result/studio/qwen2_72b/1-1-100-1-0.667/2024-08-13-06:20:43
```

## References

```
@article{agentscope_simulation,
  title   = {Very Large-Scale Multi-Agent Simulation in AgentScope},
  author  = {Xuchen Pan and Dawei Gao and Yuexiang Xie and Zhewei Wei and
             Yaliang Li and Bolin Ding and Ji-Rong Wen and Jingren Zhou},
  journal = {CoRR},
  volume  = {abs/2407.17789},
  year    = {2024},
}
```
8 changes: 6 additions & 2 deletions examples/paper_large_scale_simulation/benchmark.py
@@ -94,9 +94,12 @@ def load_exp_config(cfg_path: str) -> list:
     return configs


-def main(name: str = None, config: str = None) -> None:
+def main(
+    name: str = None,
+    hosts: list[str] = None,
+    config: str = None,
+) -> None:
     """The main function of the benchmark"""
-    hosts = ["worker1", "worker2", "worker3", "worker4"]
     configs = load_exp_config(config)
     for cfg in configs:
         run_case(
@@ -128,5 +131,6 @@ def main(name: str = None, config: str = None) -> None:
     args = parser.parse_args()
     main(
         name=args.name,
+        hosts=args.hosts,
         config=os.path.join("./configs", f"{args.config}.csv"),
     )
2 changes: 1 addition & 1 deletion examples/paper_large_scale_simulation/configs/experiment.csv
@@ -1,2 +1,2 @@
 participant_num,agent_type,agent_server_num,env_server_num,model_per_host,model_name,sys_id,usr_id,host_num,ratio,round
-8,random,4,1,2,qwen2_72b,1,1,1,2/3,3
+100,random,4,1,2,qwen2_72b,1,1,1,2/3,3
2 changes: 1 addition & 1 deletion examples/paper_large_scale_simulation/main.py
@@ -76,7 +76,7 @@ def setup_participant_agent_server(host: str, port: int) -> None:
         save_api_invoke=False,
         model_configs="configs/model_configs.json",
         use_monitor=False,
-        logger_level="INFO",
+        logger_level="ERROR",
         save_dir=SAVE_DIR,
     )
     assistant_server_launcher = RpcAgentServerLauncher(
12 changes: 8 additions & 4 deletions examples/paper_large_scale_simulation/participant.py
@@ -375,7 +375,6 @@ def run(self, round: int, winner: float) -> tuple:
                 self.cnt += 1
             except Exception as e:
                 print(e)
-        logger.info(f"sum: {self.sum}, cnt: {self.cnt}")
         return (self.sum, self.cnt)


@@ -400,14 +399,12 @@ def save_result(
     ratio: str = "2/3",
 ) -> None:
     """Save the result into file"""
-    print(f"Round: {len(results)}")
     os.makedirs(save_path, exist_ok=True)
     import numpy as np
     from matplotlib import pyplot as plt

     for r, result in enumerate(results):
         values = [v["value"] for v in result.values()]
-        logger.info(f"get {len(values)} values")
         win = np.mean(values) * RATIO_MAP[ratio]
         stats = {
             "win": win,
@@ -628,7 +625,7 @@ def step(self) -> None:
             Msg(
                 name="Moderator",
                 role="assistant",
-                content=f"The average value is {summ / cnt :.2f} [takes {et - st :.3f} s]",
+                content=f"The average value of round {self.round + 1} is {summ / cnt :.2f} [takes {et - st :.3f} s]",
             ),
         )

@@ -650,6 +647,13 @@ def record(self, run_time: float) -> None:
             _get_timestamp(format_="%Y-%m-%d-%H:%M:%S"),
         )
         save_result(result, run_time, save_path, self.ratio)
+        log_msg(
+            Msg(
+                name="Moderator",
+                role="assistant",
+                content=f"Save result to {save_path}",
+            ),
+        )

     def run(self) -> None:
         """Run the game"""
6 changes: 3 additions & 3 deletions examples/paper_large_scale_simulation/scripts/start_vllm.sh
@@ -2,14 +2,14 @@

 # default values
 gpu_num=8
-model_per_gpu=1
-model_path="/home/data/shared/checkpoints/llama3/llama3-8b-instruct"
+gpu_per_model=1
+model_path=<your_model_path>
 base_port=8010

 touch .vllm_pid
 mkdir -p log

-for ((i=0; i < ${gpu_num}; i=i+${model_per_gpu})); do
+for ((i=0; i < ${gpu_num}; i=i+${gpu_per_model})); do
     port=$((base_port + i))
     export CUDA_VISIBLE_DEVICES=$i
     python -m vllm.entrypoints.openai.api_server --model "${model_path}" --port ${port} --enforce-eager > log/vllm-${port}.log 2>&1 &
