A simple Q-learning implementation in OpenAI Gym's Taxi-v3 environment.
The "Taxi-v3" environment simulates a simple grid world in which an autonomous taxi operates.
Here are the key characteristics and components of the "Taxi-v3" environment:
-
Grid World
- The environment consists of a grid world with a fixed size.
- The grid world includes walls, empty cells, passenger locations, a taxi, and a destination location.
-
Taxi and Passengers
- The taxi is represented as a yellow rectangle, and its position on the grid can change.
- Passengers are represented by letters (R, G, Y, and B) that indicate their destinations.
- Passengers are initially located at random positions within the grid.
-
Destination Locations
- There are four fixed destinations represented by the corresponding letters (R, G, Y, and B).
- Passengers request rides to one of these destinations.
-
Objective
- The objective of the taxi is to pick up passengers and drop them off at their requested destinations while navigating the grid world.
The agent (the taxi) receives a reward for successful pickups and drop-offs and penalties for illegal moves, such as hitting walls or attempting to pick up passengers who are not at the designated location.
The action space consists of discrete actions that the taxi can take, including moving in four directions (up, down, left, right), picking up a passenger, and dropping off a passenger.