Python project using Q-learning for a logistics simulation
The goal is to find the best possible selection of lifters between the farmer and the supermarket. Currently, the reward system is based on the markup that each lifter charges for their services.
The results of the code show that D is avoided and that G is always part of the selection.
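
For readers new to the approach, here is a minimal sketch of how tabular Q-learning could pick a chain of lifters when the reward comes from the markup. It is not the project's actual code. The network, the node names (L1, L2, L3), the markup values, and the hyperparameters are all made up for illustration, and it assumes that a lower markup is better, so the reward is the negative markup.

```python
import random

# Hypothetical network: each edge is (next_node, markup charged on that step).
# Node names and markup values are invented for this sketch.
GRAPH = {
    "farmer": [("L1", 4.0), ("L2", 1.0)],
    "L1": [("supermarket", 1.0)],
    "L2": [("L3", 1.0), ("supermarket", 5.0)],
    "L3": [("supermarket", 1.0)],
    "supermarket": [],  # terminal node: no outgoing edges
}


def train(graph, episodes=2000, alpha=0.1, gamma=1.0, epsilon=0.2, seed=0):
    """Learn one Q-value per (node, outgoing edge) with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {state: [0.0] * len(edges) for state, edges in graph.items()}
    for _ in range(episodes):
        state = "farmer"
        while graph[state]:  # walk until the terminal node is reached
            n_actions = len(graph[state])
            if rng.random() < epsilon:
                action = rng.randrange(n_actions)  # explore
            else:
                action = max(range(n_actions), key=lambda a: q[state][a])  # exploit
            next_state, markup = graph[state][action]
            reward = -markup  # assumption: a lower markup is better
            best_next = max(q[next_state], default=0.0)  # 0.0 at the terminal node
            # Standard Q-learning update
            q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
            state = next_state
    return q


def greedy_route(graph, q, start="farmer"):
    """Follow the highest-valued edge from each node until the terminal node."""
    route = [start]
    state = start
    while graph[state]:
        action = max(range(len(graph[state])), key=lambda a: q[state][a])
        state = graph[state][action][0]
        route.append(state)
    return route


if __name__ == "__main__":
    q_table = train(GRAPH)
    print(greedy_route(GRAPH, q_table))
```

With these invented markups, the chain farmer, L2, L3, supermarket has the lowest total markup (3.0, against 5.0 via L1 and 6.0 via L2 directly), so the learned greedy route should settle on it. If the real reward favors a higher markup instead, the sign of `reward` would flip.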