
Encode, Attend & Navigate

Overview

TensorFlow implementation of "Learning Heuristics for the TSP by Policy Gradient"
(Michel Deudon, Pierre Cournut, Alexandre Lacoste, Yossiri Adulyasak, Louis-Martin Rousseau).

Requirements

Usage

  • To train a model from scratch (data is generated on the fly), run blocks 1.DataGenerator, 2.Config, 3.Model and 4.Train in the Jupyter notebook (Neural_Reinforce.ipynb). You can change parameters in the Config block; the default parameters should replicate the results reported in our paper (2D TSP50).

  • If training is successful, the model will be saved in a "save" folder (the filename depends on the config) and training statistics will be reported in a "summary" folder. To visualize training in TensorBoard, run:

> tensorboard --logdir=summary
  • To test a trained model, run block 5.Test in the Jupyter notebook (Neural_Reinforce.ipynb).

What is Combinatorial Optimization?

  • Combinatorial Optimization: The task of finding an optimal object from a finite set of objects.
  • Sequencing problems: The best order for performing a set of tasks must be determined.
  • Applications: Manufacturing, routing, astronomy, genetics...

Can we learn data-driven heuristics that are competitive with existing hand-engineered heuristics?

What is Deep Reinforcement Learning?

  • Reinforcement Learning: A general-purpose framework for decision making, in which a learner actively interacts with an environment to achieve a certain goal.
  • Deep Learning: A general-purpose framework for representation learning.
  • Successful applications: Playing games, navigating worlds, controlling physical systems and interacting with users.

Related Work

Our work draws inspiration from Neural Combinatorial Optimization with Reinforcement Learning to solve the Euclidean TSP. Our framework achieves a 5x speedup over the original framework while obtaining solutions of similar optimality.

Architecture

Following Bello et al., 2016, our neural network parameterizes a stochastic policy over city permutations. The model is trained by policy gradient (REINFORCE; Williams, 1992) to learn to assign high probability to "good tours" and low probability to "undesirable tours".
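
To make the training signal concrete, here is a minimal NumPy sketch of the REINFORCE objective for the TSP: the advantage (tour length minus a baseline) weights each sampled tour's log-probability. The function names and the baseline argument are illustrative, not the repository's actual API.

    import numpy as np

    def tour_length(cities, perm):
        # Total Euclidean length of the closed tour visiting `cities` in order `perm`.
        ordered = cities[perm]
        hops = ordered - np.roll(ordered, -1, axis=0)
        return np.sqrt((hops ** 2).sum(axis=1)).sum()

    def reinforce_loss(lengths, log_probs, baseline):
        # Tours shorter than the baseline get their log-probability pushed up,
        # longer tours pushed down; the gradient of this surrogate is the
        # REINFORCE estimator.
        advantage = lengths - baseline
        return np.mean(advantage * log_probs)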

Neural Encoder

Our neural encoder takes inspiration from recent advances in Neural Machine Translation. The purpose of the encoder is to obtain a representation of each action (city) given its context.

The output of our encoder is a set of reference vectors ref = (enc1, ..., encn), each representing a city as it interacts with the other cities.
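
As an illustration, here is a single-head, NumPy-only sketch of a scaled dot-product self-attention step that produces such context-aware vectors. The actual encoder is more elaborate (stacking attention and feed-forward layers), and the weight matrices Wq, Wk, Wv are assumed inputs here.

    import numpy as np

    def attention_encoder(x, Wq, Wk, Wv):
        # x: (n_cities, d) city embeddings. Each output row enc_i is a mixture
        # of all cities, weighted by attention scores, so every city is encoded
        # in the context of the others.
        q, k, v = x @ Wq, x @ Wk, x @ Wv
        scores = q @ k.T / np.sqrt(k.shape[-1])
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
        return weights @ v                               # ref = (enc_1, ..., enc_n)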

Neural Decoder

Similar to Bello et al., 2016, our neural decoder uses a pointer mechanism to effectively point to a city given a trajectory. Our model, however, explicitly forgets after K steps, dispensing with LSTM networks.
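
Below is a minimal sketch of the standard pointing mechanism from Bello et al., 2016: compatibility scores between a decoder query and the encoder's reference vectors are masked over visited cities, then normalized into a distribution over the next city. The explicit K-step forgetting is omitted; in this sketch it would amount to building `query` from only the last K visited cities. All weight names are illustrative.

    import numpy as np

    def pointer_distribution(query, refs, W_ref, W_q, v, visited):
        # u_i = v . tanh(W_ref @ enc_i + W_q @ query); visited cities are
        # masked out so the policy can only point to unvisited ones.
        u = np.tanh(refs @ W_ref.T + query @ W_q.T) @ v
        u[visited] = -np.inf                  # boolean mask of visited cities
        e = np.exp(u - u[~visited].max())     # stable softmax over candidates
        return e / e.sum()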


Local Search

We use simple 2-opt post-processing to refine the best sampled tours at test time. One contribution we would like to emphasize here is that simple heuristics can be used in conjunction with Deep Reinforcement Learning, shedding light on interesting hybridizations between Artificial Intelligence (AI) and Operations Research (OR).
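
For reference, a first-improvement 2-opt sketch (the repository's own implementation may differ): it keeps reversing tour segments as long as doing so shortens the tour.

    import numpy as np

    def two_opt(cities, tour):
        # `tour` is an index array over `cities`; reverse segments while the
        # reversal shortens the closed tour.
        def length(t):
            hops = cities[t] - cities[np.roll(t, -1)]
            return np.sqrt((hops ** 2).sum(axis=1)).sum()
        improved = True
        while improved:
            improved = False
            for i in range(1, len(tour) - 1):
                for j in range(i + 1, len(tour)):
                    candidate = np.concatenate([tour[:i], tour[i:j][::-1], tour[j:]])
                    if length(candidate) < length(tour):
                        tour, improved = candidate, True
        return tour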

Results

[Figures: sample tours produced by the model (tsp1000.1, tsp100.2)]

We evaluate on TSP100 our model pre-trained on TSP50. The results show that it performs relatively well even though the model was not trained directly on the same instance size, as in Bello et al., 2016.

Acknowledgments

Special thanks (chronological order):

Author

Michel Deudon / @mdeudon
