main.py provides code to evaluate the actions of an agent in a given grid environment.
The data/grids directory contains all the grid instances. Each jsonl file contains 100 grid instances, and the file name encodes the specification of the instances it holds. inner and outer denote which agent starting position configuration the grid uses. random, updown, leftright, cluster, and spiral indicate which energy distribution was used to generate the grid instances; leftright means a horizontally skewed distribution and updown means a vertically skewed distribution. free and block indicate whether the grid instances contain obstacles, with free meaning no obstacles.
Each row of a jsonl file contains one JSON object with the following keys: index, energy, energy_probability, obstacle, start_position, grid, start. energy_probability is the distribution-dependent probability used to generate the grid instance. start is the agent's starting position in (x, y) coordinates.
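The row layout described above can be read with a few lines of Python. This is a minimal sketch rather than the code in main.py; the helper name load_grids is hypothetical, and it assumes only that each non-empty line is a JSON object with an index key.

```python
import json
from pathlib import Path


def load_grids(path):
    """Read a grid jsonl file and return its rows keyed by their 'index' value.

    Hypothetical helper: it assumes one JSON object per non-empty line,
    each carrying an 'index' key as described in this README.
    """
    grids = {}
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:  # tolerate blank lines, e.g. a trailing newline
                continue
            row = json.loads(line)
            grids[row["index"]] = row
    return grids
```

Calling load_grids on one of the files in data/grids should give a dictionary from which a single instance can be looked up by its index, for example to read its start position.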
The data/results directory contains the action answers for the two baseline agents and the two LLM-based agents. Each file in data/results/greedy_results and data/results/random_results corresponds to the file with the same name in the data/grids directory.
Each row of a jsonl file in data/results/greedy_results and data/results/random_results contains one JSON object with the following keys: baseline, index, answer, movement_prompt, energy_limit_prompt, cost_of_step_prompt. index identifies the grid environment used, which is the one with the same index number in the corresponding file in the data/grids directory. answer contains the list of actions that the agent produced. movement_prompt indicates which movement-related action set the agent was allowed during evaluation, with 4 indicating the μ₁ set and 8 indicating the μ₂ set. energy_limit_prompt indicates the limit on the amount of energy the agent was allowed to carry during evaluation, with 100 indicating no limit and 2 indicating a limit of 2 units of energy. cost_of_step_prompt indicates the energy cost per step the agent was subjected to during evaluation.
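To make the baseline encoding concrete, the sketch below translates the prompt fields of one result row into readable settings. The function name describe_baseline_row and the output keys are hypothetical. Whether the stored values are numbers or strings is an assumption either way, so the code converts them explicitly.

```python
def describe_baseline_row(row):
    """Translate the prompt fields of one baseline result row.

    Hypothetical helper based on the encoding described in this README:
    movement_prompt 4 -> mu_1 set, 8 -> mu_2 set;
    energy_limit_prompt 100 -> no limit, 2 -> limit of 2 units.
    """
    movement_sets = {4: "mu_1", 8: "mu_2"}
    limit = int(row["energy_limit_prompt"])
    return {
        "index": row["index"],
        "movement_set": movement_sets[int(row["movement_prompt"])],
        # None stands for "no limit" (encoded as 100 in the files)
        "energy_limit": None if limit == 100 else limit,
        "cost_per_step": float(row["cost_of_step_prompt"]),
        "num_actions": len(row["answer"]),
    }
```

An unexpected movement_prompt value raises a KeyError, which keeps malformed rows from being silently misread.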
data/results/batch_gpt4o_output.jsonl contains the responses from GPT-4o, and data/results/batch_gpt35_output.jsonl contains the responses from GPT-3.5-Turbo. Each row in both files is a JSON object with the following keys: id, custom_id, response, error. id is assigned by OpenAI. custom_id indicates which grid environment was used and which constraints the agent was subjected to during evaluation. The custom_id comprises elements concatenated by - in the following order: fileName, index, movement_prompt, energy_limit_prompt, cost_of_step_prompt. fileName is the name of the file in the data/grids directory. index identifies the grid environment used, which is the one with the same index number in fileName. movement_prompt indicates which movement-related action set the agent was allowed during evaluation, with 0 indicating the μ₁ set and 1 indicating the μ₂ set. energy_limit_prompt indicates the limit on the amount of energy the agent was allowed to carry during evaluation, with 0 indicating no limit and 1 indicating a limit of 2 units of energy. cost_of_step_prompt indicates the energy cost per step the agent was subjected to during evaluation, with 0 indicating no cost and 1 indicating a cost of 0.3 units of energy per step. response contains the full response from the API. error indicates whether any error occurred. Similarly, data/results/gpt_o1_mini_output.jsonl contains the responses from GPT-o1-mini.
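The custom_id layout can be decoded as sketched below. The function name parse_custom_id and the output keys are hypothetical, and the sketch assumes that index is an integer. Splitting from the right means a fileName that itself contains - would still be handled.

```python
# Flag encodings as described in this README (custom_id uses 0/1 flags).
MOVEMENT = {"0": "mu_1", "1": "mu_2"}
ENERGY_LIMIT = {"0": None, "1": 2}   # None stands for "no limit"
STEP_COST = {"0": 0.0, "1": 0.3}     # energy units per step


def parse_custom_id(custom_id):
    """Split a custom_id into its five elements and decode the flags.

    Hypothetical helper. The last four elements are taken from the right,
    so any '-' inside fileName is left untouched.
    """
    file_name, index, movement, energy_limit, cost = custom_id.rsplit("-", 4)
    return {
        "file_name": file_name,
        "index": int(index),  # assumption: index is an integer
        "movement_set": MOVEMENT[movement],
        "energy_limit": ENERGY_LIMIT[energy_limit],
        "cost_per_step": STEP_COST[cost],
    }
```

For a made-up identifier such as some_file-7-1-0-1, this yields index 7, the μ₂ movement set, no energy limit, and a cost of 0.3 per step. Note that these 0/1 flags differ from the 4/8 and 100/2 values used in the baseline result files.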
data/batch_gpt4o.jsonl and data/batch_gpt35.jsonl contain the batched API requests for GPT-4o and GPT-3.5-Turbo, respectively. data/gpt_o1_mini.jsonl contains the API requests for GPT-o1-mini.