A collection of environments for highway driving and tactical decision-making tasks
An episode of one of the environments available in highway-env.
pip install --user git+https://github.com/eleurent/highway-env
import gym
import highway_env

env = gym.make("highway-v0")

done = False
while not done:
    action = ...  # Your agent code here
    obs, reward, done, _ = env.step(action)
    env.render()
env = gym.make("highway-v0")
In this task, the ego-vehicle is driving on a multilane highway populated with other vehicles. The agent's objective is to reach a high velocity while avoiding collisions with neighbouring vehicles. Driving on the right side of the road is also rewarded.
env = gym.make("highway-merge-v0")
In this task, the ego-vehicle starts on a main highway but soon approaches a road junction with incoming vehicles on the access ramp. The agent's objective is now to maintain a high velocity while making room for the vehicles so that they can safely merge into the traffic.
The highway-merge-v0 environment.
New highway driving environments can easily be made from a set of building blocks.
A Road is composed of several Lanes and a list of Vehicles. The Lanes are described by their center line curve and local coordinate system.
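For intuition only, this composition can be pictured roughly as follows; this is an illustrative sketch of the data layout, not highway-env's actual class definitions.

from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Lane:
    center_line: List[Tuple[float, float]]  # polyline approximating the lane center line curve

@dataclass
class Vehicle:
    x: float
    y: float
    v: float
    psi: float  # position, forward velocity and heading

@dataclass
class Road:
    lanes: List[Lane] = field(default_factory=list)
    vehicles: List[Vehicle] = field(default_factory=list)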
The vehicle dynamics are represented in the Vehicle class by a bicycle model:
dx = v*cos(psi)
dy = v*sin(psi)
dv = a
dpsi = v/l*tan(beta)
where (x, y) is the vehicle position, v its forward velocity, and psi its heading; a is the acceleration command and beta is the slip angle at the center of gravity, used as a steering command.
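As a quick illustration, these dynamics can be integrated numerically as follows; the function and default parameters are illustrative, not the library's actual API.

import numpy as np

def step_bicycle(x, y, v, psi, a, beta, l=5.0, dt=0.1):
    # State derivatives of the bicycle model...
    dx = v * np.cos(psi)
    dy = v * np.sin(psi)
    dv = a
    dpsi = v / l * np.tan(beta)
    # ...integrated with a forward Euler step of duration dt.
    return x + dx * dt, y + dy * dt, v + dv * dt, psi + dpsi * dt

# Example: drive straight for one second at 20 m/s.
state = (0.0, 0.0, 20.0, 0.0)
for _ in range(10):
    state = step_bicycle(*state, a=0.0, beta=0.0)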
The ControlledVehicle class implements a low-level controller on top of a Vehicle, allowing it to track a given target velocity and follow a target lane.
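A rough sketch of what such a low-level controller can look like is given below: a proportional term on velocity and a simple steering law toward the lane center. The gains and structure are assumptions for illustration, not the actual ControlledVehicle implementation.

import numpy as np

def low_level_control(v, target_v, lateral_offset, heading_error,
                      k_v=1.0, k_lat=0.2, k_psi=1.0, beta_max=np.pi / 6):
    # v: current velocity, target_v: target velocity to track,
    # lateral_offset: signed distance to the target lane center line,
    # heading_error: lane heading minus vehicle heading.
    a = k_v * (target_v - v)                               # proportional velocity control
    beta = k_psi * heading_error - k_lat * lateral_offset  # steer back toward the lane center
    return a, float(np.clip(beta, -beta_max, beta_max))    # clip the steering command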
The vehicles populating the highway follow simple and realistic behaviours that dictate how they accelerate and steer on the road.
In the IDMVehicle class:
- Longitudinal Model: the acceleration of the vehicle is given by the Intelligent Driver Model (IDM) from (Treiber et al., 2000).
- Lateral Model: the discrete lane-change decisions are given by the MOBIL model from (Kesting et al., 2007).
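For reference, the IDM acceleration can be sketched as follows; the parameter values are typical choices from the literature and may differ from those used in IDMVehicle.

import numpy as np

def idm_acceleration(v, v0, gap, dv, a_max=1.5, b=2.0, delta=4.0, d0=2.0, T=1.5):
    # Intelligent Driver Model (Treiber et al., 2000).
    # v: ego velocity, v0: desired velocity, gap: distance to the leading vehicle (> 0),
    # dv: approach rate (ego velocity minus leader velocity),
    # a_max: maximum acceleration, b: comfortable deceleration,
    # delta: velocity exponent, d0: minimum gap, T: desired time headway.
    d_star = d0 + max(0.0, v * T + v * dv / (2 * np.sqrt(a_max * b)))  # desired gap
    return a_max * (1 - (v / v0) ** delta - (d_star / gap) ** 2)

# Example: ego at 25 m/s, desired 30 m/s, 40 m behind a leader driving 5 m/s slower.
print(idm_acceleration(v=25.0, v0=30.0, gap=40.0, dv=5.0))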
In the LinearVehicle class, the longitudinal behaviour is defined as a linear weighting of several features, such as the distance and velocity difference to the leading vehicle.
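An illustrative sketch of such a linear behaviour is given below; the feature names and weights are made up for the example and are not LinearVehicle's actual parameters.

import numpy as np

def linear_acceleration(v, target_v, gap, dv, weights=(0.5, 0.05, 0.3), d0=10.0):
    # Acceleration as a linear weighting of hand-crafted features.
    features = np.array([
        target_v - v,   # velocity difference to the target velocity
        gap - d0,       # deviation from a nominal safe distance to the leader
        -dv,            # negative approach rate (slow down when closing in)
    ])
    return float(np.dot(weights, features))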
Agents solving the highway-env environments are available in the RL-Agents repository.
pip install --user git+https://github.com/eleurent/rl-agents
The DQN agent solving highway-v0.
This model-free reinforcement learning agent performs Q-learning with function approximation, using a neural network to represent the state-action value function Q.
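As a rough sketch, the value network and epsilon-greedy exploration can look like the following; the architecture and hyperparameters are illustrative and may differ from the rl-agents implementation.

import numpy as np
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    # Maps a flattened observation to one Q-value per discrete action.
    def __init__(self, obs_size, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_size, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs):
        return self.net(obs)

def epsilon_greedy(q_net, obs, n_actions, epsilon=0.1):
    # Explore with probability epsilon, otherwise act greedily w.r.t. the Q-network.
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)
    with torch.no_grad():
        q_values = q_net(torch.as_tensor(obs, dtype=torch.float32).flatten())
    return int(torch.argmax(q_values).item())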
The Value Iteration agent solving highway-v0.
Value Iteration is only compatible with finite discrete MDPs, so the environment is first approximated by a finite-mdp environment using env.to_finite_mdp(). This simplified state representation describes the nearby traffic in terms of the predicted Time-To-Collision (TTC) on each lane of the road. The transition model is simplistic and assumes that each vehicle will keep driving at a constant velocity without changing lanes. This model bias can be a source of mistakes.
The agent then performs a Value Iteration to compute the corresponding optimal state-value function.
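A minimal sketch of Value Iteration on such a finite MDP is given below; the deterministic (state, action) -> next state array mirrors the constant-velocity assumption above, but the exact shapes are assumptions about the finite-mdp representation.

import numpy as np

def value_iteration(transition, reward, gamma=0.9, iterations=100):
    # transition: int array of shape (S, A), next state of each (state, action) pair,
    # reward: array of shape (S, A), gamma: discount factor.
    n_states, n_actions = reward.shape
    value = np.zeros(n_states)
    for _ in range(iterations):
        q = reward + gamma * value[transition]  # state-action values, shape (S, A)
        value = q.max(axis=1)                   # Bellman optimality backup
    return value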
This agent leverages transition and reward models to perform a stochastic tree search (Coulom, 2006) for the optimal trajectory. No particular assumption is required on the state representation or the transition model.
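To illustrate the idea, here is a simplified rollout planner over a generative model: it samples random trajectories from the transition and reward models and picks the best root action. A full Monte-Carlo Tree Search as in (Coulom, 2006) additionally grows a tree and biases sampling toward promising branches; the model interface assumed below is hypothetical.

import numpy as np

def rollout_plan(model, state, n_actions, horizon=10, samples=20, gamma=0.9):
    # model(state, action) is assumed to sample the stochastic transition and reward
    # models and return (next_state, reward).
    returns = np.zeros(n_actions)
    for action in range(n_actions):
        for _ in range(samples):
            s, a = state, action
            total, discount = 0.0, 1.0
            for _ in range(horizon):
                s, r = model(s, a)
                total += discount * r
                discount *= gamma
                a = np.random.randint(n_actions)  # random policy beyond the root action
            returns[action] += total / samples
    return int(np.argmax(returns))  # root action with the best estimated return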