Using neural networks to master the tile-laying game Azul. This project explores TD(λ) reinforcement learning to build an agent, possibly incorporating human a priori knowledge, to mitigate the sample inefficiency problem of RL.
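For readers unfamiliar with TD(λ), the following is a minimal sketch of one update step with accumulating eligibility traces. It assumes a linear value function over a feature vector rather than a neural network, and every name in it (`td_lambda_update`, `features`, `alpha`, `gamma`, `lam`) is hypothetical and not taken from this project's code.

```python
def td_lambda_update(weights, traces, features, next_features, reward,
                     alpha=0.1, gamma=1.0, lam=0.7, terminal=False):
    """One TD(lambda) step for a linear value function V(s) = w . x(s).

    Assumption: the linear model stands in for the neural network so the
    gradient of V with respect to the weights is simply the feature vector.
    """
    value = sum(w * x for w, x in zip(weights, features))
    # The value of a terminal state is defined as zero.
    next_value = 0.0 if terminal else sum(
        w * x for w, x in zip(weights, next_features))

    # TD error: how much better or worse the transition was than predicted.
    delta = reward + gamma * next_value - value

    # Accumulating traces: decay the old traces, then add the current gradient.
    new_traces = [gamma * lam * e + x for e, x in zip(traces, features)]

    # Move each weight along its trace, scaled by the TD error.
    new_weights = [w + alpha * delta * e for w, e in zip(weights, new_traces)]
    return new_weights, new_traces, delta
```

In a self-play loop, the traces would typically be reset to zero at the start of each game and the function called once per move.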
The rules of Azul are simplified as follows for self-play dataset generation:
- The tile bag is endless.
- Random moves are picked in the following order: pick factory, pick tile_type, pick board row.
- Pointless moves of [0] are not allowed, except picking the ONE/NULL tile in the middle pool.
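The first two simplifications can be sketched as follows. This is only an illustration under stated assumptions: the state layout (`factories` as lists of tile names, `pattern_lines` as `(tile_type, count)` pairs), the constants, and all function names are hypothetical, the middle pool and the ONE/NULL tile are left out, and the wall check for already-placed tile types is omitted for brevity.

```python
import random

# Hypothetical names; the project's own encoding may differ.
TILE_TYPES = ["blue", "yellow", "red", "black", "white"]
TILES_PER_FACTORY = 4
FLOOR = -1  # hypothetical marker for the floor line


def fill_factories(num_factories, rng):
    """Endless bag: sample tile types with replacement, so it never runs out."""
    return [[rng.choice(TILE_TYPES) for _ in range(TILES_PER_FACTORY)]
            for _ in range(num_factories)]


def legal_rows(pattern_lines, tile_type):
    """Rows that can accept tile_type; the floor line is always allowed."""
    rows = [FLOOR]
    for i, (held_type, count) in enumerate(pattern_lines):
        has_space = count < i + 1  # row i holds at most i + 1 tiles
        if has_space and held_type in (None, tile_type):
            rows.append(i)
    return rows


def random_move(factories, pattern_lines, rng):
    """Pick factory, then tile type, then board row, in that order."""
    non_empty = [i for i, tiles in enumerate(factories) if tiles]
    if not non_empty:
        return None
    factory = rng.choice(non_empty)
    # Assumption: choose uniformly among the distinct types on that factory.
    tile_type = rng.choice(sorted(set(factories[factory])))
    row = rng.choice(legal_rows(pattern_lines, tile_type))
    return factory, tile_type, row
```

Because the factory is chosen before the tile type, tile types that appear on few factories are picked less often than they would be if all (factory, tile type, row) triples were sampled uniformly.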