Accepted by AROB 2021. A car-agent navigates in complex traffic conditions by Mixed_Input_PPO_CNN_LSTM model.

Mixed_Input_PPO_CNN_LSTM_Car_Navigation

A car-agent navigates complex traffic conditions using the Mixed_Input_PPO_CNN_LSTM model.

Accepted by AROB 2021

See my paper, "Generation of Traffic Flows in Multi-Agent Traffic Simulation with Agent Behavior Model based on Deep Reinforcement Learning".

Feature extraction and image inverse generation


Partially Observable Markov Games

In this work, I consider a multi-agent extension of Markov decision processes (MDPs) called partially observable Markov games.
Every cycle the agent obtains an observation in which the agent sits at the image's center.
The inverse-generated images are built from the features the agent should be careful about, for example the cars in front of and behind it.
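The agent-centered observation can be sketched as a crop of an occupancy grid around the agent. This is a minimal illustration; the grid size, padding, and encoding are assumptions, not the repository's actual image pipeline:

```python
import numpy as np

def ego_centered_view(grid, agent_rc, half=2):
    """Crop a (2*half+1)-square window from `grid`, centered on the
    agent's (row, col), padding with zeros at the map border.
    (Hypothetical sketch of an agent-centered observation.)"""
    h, w = grid.shape
    r, c = agent_rc
    view = np.zeros((2 * half + 1, 2 * half + 1), dtype=grid.dtype)
    for dr in range(-half, half + 1):
        for dc in range(-half, half + 1):
            rr, cc = r + dr, c + dc
            if 0 <= rr < h and 0 <= cc < w:
                view[dr + half, dc + half] = grid[rr, cc]
    return view

# Example: a 6x6 occupancy grid; an agent at the corner gets zero-padding
grid = np.arange(36).reshape(6, 6)
print(ego_centered_view(grid, (0, 0)))
```

Whatever the real renderer does, the key property is the same: the agent is always at the center of its own observation, so the network sees traffic relative to itself.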

Mixed input architecture


Sequential data && LSTM

Input: [real_speed/10, target_speed/10, elapsed_time_ratio, reward, done, time_pass, over]
State representation: [real_speed/10, target_speed/10, elapsed_time_ratio]
It is notable that the data elements are related to each other rather than randomly distributed:
target_speed is a constant, while elapsed_time_ratio and distance_to_goal are monotonically increasing or decreasing.
So we can use an LSTM, a kind of recurrent neural network (RNN), to capture the temporal relationships in the data.
To confirm this, I feed three consecutive steps [t-2, t] as one input bunch; the same applies to images.
The LSTM layers use the hidden state (h_{t-1}, c_{t-1}) at time t.
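The mixed-input idea with the three-step window can be sketched as follows. This is a PyTorch sketch under assumptions: the layer sizes, the CNN branch, and the exact fusion are placeholders, not the repository's actual architecture:

```python
import torch
import torch.nn as nn

class MixedInputPolicy(nn.Module):
    """Sketch: a CNN branch encodes the agent-centered image, a linear
    branch encodes the state vector, and an LSTM fuses the concatenated
    features over a 3-step window [t-2, t]. Sizes are assumptions."""
    def __init__(self, state_dim=6, hidden=32):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())          # -> 8 features
        self.fc_state = nn.Linear(state_dim, 8)
        self.lstm = nn.LSTM(input_size=16, hidden_size=hidden,
                            batch_first=True)
        self.head = nn.Linear(hidden, 1)                    # acceleration

    def forward(self, images, states, hc=None):
        # images: (B, T, 1, H, W), states: (B, T, state_dim), T = 3 here
        b, t = images.shape[:2]
        img_feat = self.cnn(images.flatten(0, 1)).view(b, t, -1)
        st_feat = torch.relu(self.fc_state(states))
        seq = torch.cat([img_feat, st_feat], dim=-1)
        out, hc = self.lstm(seq, hc)      # hc carries (h_{t-1}, c_{t-1})
        return self.head(out[:, -1]), hc  # action from the last step

policy = MixedInputPolicy()
imgs = torch.zeros(2, 3, 1, 32, 32)       # batch of 2, window [t-2, t]
states = torch.zeros(2, 3, 6)
accel, hc = policy(imgs, states)
print(accel.shape)  # torch.Size([2, 1])
```

Passing `hc` back in on the next call is what lets the LSTM carry (h_{t-1}, c_{t-1}) across cycles, as described above.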

Traffic conditions && Collision Detection

When a car-agent navigates on the road, it may encounter other cars.
In some conditions, the acceleration chosen by the car-agent will cause a jam or a collision.
Since the conditions can become very complex and the GAMA simulator has no built-in notion of collisions, I have to implement collision and jam detection myself.
The detector chooses the 10 cars closest to the agent and calculates the distances to them, using Euclidean distance for safe driving.
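Selecting the 10 nearest cars by Euclidean distance can be sketched like this (a minimal illustration; car positions are assumed to be available as 2D coordinates):

```python
import math

def closest_cars(agent_pos, car_positions, k=10):
    """Return the k nearest cars as (distance, index) pairs, sorted from
    nearest to farthest, using Euclidean distance."""
    ax, ay = agent_pos
    dists = [(math.hypot(x - ax, y - ay), i)
             for i, (x, y) in enumerate(car_positions)]
    return sorted(dists)[:k]

cars = [(0, 5), (3, 4), (30, 0), (1, 1)]
# Nearest is car 3 at distance sqrt(2); ties are broken by index.
print(closest_cars((0, 0), cars, k=2))
```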

On the same road

First, the agent computes the useful distances (the distance to the car behind and/or the car in front).
Then, after the agent chooses an acceleration, the detections are executed to check whether that acceleration will cause a jam or a collision.
A unit of time is one cycle.

Collision Detection

When another car is in front of the car-agent and the two cars are on the same road, if the chosen acceleration would close the gap to the front car within one cycle, the acceleration is assumed to cause a collision with the front cars. (There may be more than one front car.)

Jam Detection

When another car is behind the car-agent and the two cars are on the same road, if the chosen acceleration would let the car behind close in on the agent within one cycle, the acceleration is assumed to cause a jam with the cars behind. (There may be more than one car behind.)
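The original README gives the exact inequalities only as images, so here is a plausible one-cycle kinematic check of the same shape. Treat it as a reconstruction under assumptions (constant front/behind-car speed over the cycle, an assumed `safe_interval`), not the paper's equations:

```python
def will_collide(gap_front, v_agent, v_front, accel,
                 dt=1.0, safe_interval=1.0):
    """Collision check: after applying `accel` for one cycle, does the
    agent close the gap to the front car below the safe interval?
    (Plausible reconstruction; the original equation is an image.)"""
    agent_advance = v_agent * dt + 0.5 * accel * dt * dt
    front_advance = v_front * dt
    return gap_front + front_advance - agent_advance < safe_interval

def will_jam(gap_behind, v_agent, v_behind, accel,
             dt=1.0, safe_interval=1.0):
    """Jam check: does braking let the car behind close in below the
    safe interval within one cycle?"""
    agent_advance = v_agent * dt + 0.5 * accel * dt * dt
    behind_advance = v_behind * dt
    return gap_behind + agent_advance - behind_advance < safe_interval

# Braking hard with a fast car close behind is flagged as a jam risk:
print(will_jam(gap_behind=2.0, v_agent=5.0, v_behind=10.0, accel=-6.0))
```

The same pair of checks is evaluated for every relevant car among the 10 nearest, since there may be several cars in front of or behind the agent.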

Jam


On different roads

The calculation process is the same as in the same-road conditions, but the situations become much more complex.
Are the 10 closest cars on the same road as the agent?
If so, is each car in front of or behind the agent?
These conditions are resolved explicitly in the gaml file.

State representation

[real_speed/10, target_speed/10, elapsed_time_ratio, distance_to_goal/100, distance_front_car/10, distance_behind_car/10]

Action representation

The network's outputs are accelerations, constrained to [-6, 6] m/s^2 to stay close to real driving situations.
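One common way to constrain a continuous action to [-6, 6] m/s^2 is to squash the raw network output with tanh and rescale. This is a sketch of the general technique; the repository may clip or parameterize the bound differently:

```python
import math

A_MAX = 6.0  # |acceleration| bound in m/s^2

def to_acceleration(raw_output):
    """Map an unbounded network output to an acceleration in [-6, 6].
    (Illustrative; the repo's exact constraint mechanism may differ.)"""
    return A_MAX * math.tanh(raw_output)

print(to_acceleration(0.0))    # 0.0
print(to_acceleration(100.0))  # saturates near 6.0
```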

Reward shaping

The network outputs an acceleration; the action representation is [acceleration]. The car learns to control its acceleration under the restrictions shown below.

Reward shaping:

  • r_t = r_terminal + r_danger + r_speed
  • r_terminal: -0.013 (when target_speed > real_speed) or -0.1 (when target_speed < real_speed), given on a crash or when the time expires
  • r_speed: related to the target speed
  • if s_a ≤ s_t: 0.001 - 0.004*((target_speed - instantaneous_speed)/target_speed);
      if distance_front_car_before <= safe_interval or time_after_safe_interval > 0: 0.001*(instantaneous_speed/target_speed);
      time_after_safe_interval can be extended while front cars remain within safe_interval
  • if s_a > s_t: 0.001 - 0.006*((instantaneous_speed - target_speed)/target_speed)

In my experiment the goal is clearly for the agent to learn to keep its speed around the target speed.
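The speed term of the reward can be written out directly from the bullets above. This is my reading of the list (interpreting s_a/s_t as actual vs. target speed); the branch structure and constants come from the README, the function signature is illustrative:

```python
def speed_reward(v, v_target, front_gap=None, safe_interval=10.0,
                 time_after_safe=0):
    """r_speed from the reward-shaping list above (a sketch; s_a/s_t are
    interpreted as actual vs. target speed, safe_interval is assumed)."""
    if front_gap is not None and (front_gap <= safe_interval
                                  or time_after_safe > 0):
        # Near a front car: reward scales with speed, so crawling safely
        # behind traffic is not punished as a shortfall from target speed.
        return 0.001 * (v / v_target)
    if v <= v_target:
        return 0.001 - 0.004 * (v_target - v) / v_target
    return 0.001 - 0.006 * (v - v_target) / v_target

print(speed_reward(10.0, 10.0))   # at target speed: the full 0.001 bonus
print(speed_reward(12.0, 10.0))   # overspeed is penalized more steeply
```

Note the asymmetry: exceeding the target speed is penalized with a 0.006 slope versus 0.004 for falling short, which pushes the agent toward driving at, not above, the target.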

Result

It is obvious that models with LSTM layers train much better than models without them.

Actor-Critic, 2 LSTM layers


Actor-Critic, 0 LSTM layers


PPO2

TD

MC

Actor Critic (TD)


About GAMA

GAMA is a platform for running agent-based simulations.
I have a GAMA model named "PPO_Mixedinput_Navigation.gaml", which contains a car and some traffic lights. The model sends the data
[real_speed, target_speed, elapsed_time_ratio, distance_to_goal, reward, done, time_pass, over]
as a matrix to the Python environment, which calculates the car's acceleration (here by A2C). Following the Markov decision process framework, the car in GAMA applies the acceleration and sends the latest data back to Python over and over again until it reaches the destination.
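The GAMA-Python exchange loop can be sketched as below. The transport and the simulator stub are assumptions for illustration (the real model runs inside GAMA and the actual bridge may use sockets or files); only the observe-act-apply loop structure is taken from the description above:

```python
def gama_step(state, accel):
    """Stand-in for one GAMA cycle: advance the car and report back.
    (Hypothetical stub; in the real setup this happens inside GAMA.)"""
    real_speed, pos = state
    real_speed = max(0.0, real_speed + accel)  # apply one cycle of accel
    pos += real_speed
    done = pos >= 100.0                        # reached the destination
    return (real_speed, pos), done

def run_episode(policy, max_cycles=200):
    """The MDP loop: observe, choose acceleration, apply it in the
    simulator, receive the next state, repeat until done."""
    state, done, cycles = (0.0, 0.0), False, 0
    while not done and cycles < max_cycles:
        accel = policy(state)          # e.g. the PPO/A2C network's output
        state, done = gama_step(state, accel)
        cycles += 1
    return state, cycles

# A trivial constant-acceleration "policy" reaches the goal:
final_state, n = run_episode(lambda s: 1.0)
print(final_state, n)
```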
