The program runs several instances of a learning algorithm on a Rubik's cube restricted to 180-degree turns (double turns). The following parameters of the problem can be set through command-line arguments:
- The learning algorithm to perform on the problem.
- The action selection policy used by the algorithm.
- The number of instances of the algorithm to run.
- The number of episodes per instance.
And optionally:
- Q-Learning and SARSA parameters:
  - Learning rate (Alpha) - Default: 0.4
  - Discount factor (Gamma) - Default: 0.95
- MENACE Approach parameters:
  - Base factor (Lambda) - Default: 0.78
  - Reward (Reward) - Default: 1
- Action Selection parameters:
  - Epsilon greedy (Epsilon) - Default: 0.2
  - Softmax temperature (Tau) - Default: 0.2
  - Simulated annealing (Temperature scale) - Default: 0.2
How to run the program with these parameters is described in more detail in: Run the program.
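For orientation, Alpha and Gamma enter the standard textbook update rules for Q-Learning and SARSA as sketched below. This is a minimal sketch and not the program's actual code: the function names and the flat array of next-state action values are assumptions made for illustration only.

```c
#include <stddef.h>

/* Largest value in q[0..n-1]; n must be at least 1. */
static double max_q(const double *q, size_t n)
{
    double best = q[0];
    for (size_t i = 1; i < n; i++) {
        if (q[i] > best) {
            best = q[i];
        }
    }
    return best;
}

/* Q-Learning: bootstrap from the best action value in the next state.
 * alpha is the learning rate, gamma the discount factor. */
double q_learning_update(double q_sa, double reward,
                         const double *q_next, size_t n_actions,
                         double alpha, double gamma)
{
    double target = reward + gamma * max_q(q_next, n_actions);
    return q_sa + alpha * (target - q_sa);
}

/* SARSA: bootstrap from the action actually chosen in the next state. */
double sarsa_update(double q_sa, double reward, double q_next_chosen,
                    double alpha, double gamma)
{
    double target = reward + gamma * q_next_chosen;
    return q_sa + alpha * (target - q_sa);
}
```

With the defaults Alpha = 0.4 and Gamma = 0.95, each update moves the stored value 40% of the way towards the discounted target.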
The code can be compiled through:
gcc *.c -o cube -O3 -lm
To enable multithreading (OpenMP), add the flag:
-fopenmp
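As an illustration of what the flag enables, independent instances could be distributed over threads roughly as follows. This is only a sketch under assumptions: `run_instance` is a hypothetical stand-in that returns a made-up turn count, and the program's real parallel structure may differ. Without -fopenmp the pragma is ignored and the loop runs sequentially.

```c
/* Hypothetical stand-in for one learning run; returns a fake turn count. */
static double run_instance(int instance, int n_episodes)
{
    return (double)(instance + 1) * n_episodes;
}

/* Runs all instances and sums their results.
 * The loop is parallelised only when compiled with -fopenmp. */
double run_all_instances(int n_instances, int n_episodes)
{
    double total = 0.0;

    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < n_instances; i++) {
        total += run_instance(i, n_episodes);
    }
    return total;
}
```

The `reduction(+:total)` clause gives each thread its own partial sum and combines them at the end, so no explicit locking is needed.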
The program can be run through:
./cube <Algorithm> <Policy> <# Instances> <# Episodes> [Param1] [Param2] [Param3]
The arguments must follow these rules:
Algorithm: The learning algorithm to perform on the problem. Select either 0, 1 or 2.
- Q-Learning: 0
- SARSA: 1
- MENACE Approach: 2
Policy: The action selection policy used by the algorithm. Select 0, 1 or 2.
- Epsilon Greedy: 0
- Softmax: 1
- Simulated Annealing: 2 - (Only available for MENACE Approach)
# Instances: The number of instances of the algorithm to run. Select any integer > 0.
# Episodes: The number of episodes per instance. Select any integer > 0.
Param 1: (Optional) Parameter for the algorithm. Select any float > 0.
- Q-Learning: Param 1 = Alpha
- SARSA: Param 1 = Alpha
- MENACE Approach: Param 1 = Lambda
Param 2: (Optional) Parameter for the algorithm. Select any float > 0.
- Q-Learning: Param 2 = Gamma
- SARSA: Param 2 = Gamma
- MENACE Approach: Param 2 = Reward
Param 3: (Optional) Parameter for the action selection policy. Select any float > 0.
- Epsilon Greedy: Param 3 = Epsilon
- Softmax: Param 3 = Tau
- Simulated Annealing: Param 3 = Temperature Scale
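The rules above can be sketched as an argument parser. This is a minimal sketch and not the program's own parsing code: the `cube_config` struct and the `parse_cube_args`, `parse_long` and `parse_double` helpers are hypothetical names, and rejecting invalid combinations with a -1 return value is an assumption about the behaviour. The defaults are the ones listed earlier in this document.

```c
#include <stdlib.h>

/* Hypothetical container for the command-line settings. */
typedef struct {
    int algorithm;     /* 0 = Q-Learning, 1 = SARSA, 2 = MENACE Approach */
    int policy;        /* 0 = Epsilon Greedy, 1 = Softmax, 2 = Simulated Annealing */
    long n_instances;
    long n_episodes;
    double param1;     /* Alpha, or Lambda for MENACE */
    double param2;     /* Gamma, or Reward for MENACE */
    double param3;     /* Epsilon, Tau or Temperature scale */
} cube_config;

/* Parses a whole string as a base-10 integer; returns 0 on success. */
static int parse_long(const char *s, long *out)
{
    char *end;
    long v = strtol(s, &end, 10);
    if (end == s || *end != '\0') return -1;
    *out = v;
    return 0;
}

/* Parses a whole string as a floating-point number; returns 0 on success. */
static int parse_double(const char *s, double *out)
{
    char *end;
    double v = strtod(s, &end);
    if (end == s || *end != '\0') return -1;
    *out = v;
    return 0;
}

/* Fills cfg from argv; returns 0 on success, -1 on invalid input. */
int parse_cube_args(int argc, char **argv, cube_config *cfg)
{
    long algorithm, policy;
    double *params[3];

    if (argc < 5 || argc > 8) return -1;
    if (parse_long(argv[1], &algorithm) != 0 || algorithm < 0 || algorithm > 2) return -1;
    if (parse_long(argv[2], &policy) != 0 || policy < 0 || policy > 2) return -1;
    /* Simulated Annealing is only available for the MENACE Approach. */
    if (policy == 2 && algorithm != 2) return -1;
    if (parse_long(argv[3], &cfg->n_instances) != 0 || cfg->n_instances <= 0) return -1;
    if (parse_long(argv[4], &cfg->n_episodes) != 0 || cfg->n_episodes <= 0) return -1;

    cfg->algorithm = (int)algorithm;
    cfg->policy = (int)policy;

    /* Documented defaults, used when the optional parameters are omitted. */
    cfg->param1 = (algorithm == 2) ? 0.78 : 0.4;
    cfg->param2 = (algorithm == 2) ? 1.0 : 0.95;
    cfg->param3 = 0.2;

    params[0] = &cfg->param1;
    params[1] = &cfg->param2;
    params[2] = &cfg->param3;
    for (int i = 0; i < argc - 5; i++) {
        if (parse_double(argv[5 + i], params[i]) != 0 || *params[i] <= 0.0) return -1;
    }
    return 0;
}
```

Under these rules, `./cube 0 0 10 1000 0.4 0.95 0.2` would run 10 instances of Q-Learning with Epsilon Greedy for 1000 episodes each, with the default values given explicitly.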
The output is written in CSV format to the standard output. The first row contains information on the algorithm. The second row provides statistics on the performance of the run. Each remaining row contains a single data entry: the number of turns in one episode, averaged over all instances.
First row: Algorithm, Policy, # Instances, # Episodes
Second Row: Xbar, SD
- Xbar: The mean number of turns in all episodes from all instances.
- SD: The standard deviation of the per-instance means, where each instance's mean is the mean number of turns over all of its episodes.
Third Row and onwards: # actions
- # actions: The mean number of turns of the Nth episode over all instances. For the third row N = 0, for the fourth row N = 1, and so on.
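The reported quantities can be sketched as follows. This is a minimal sketch under assumptions: the function names are hypothetical, the turn counts are assumed to be stored row by row as `turns[instance * n_episodes + episode]`, and the sample variance (dividing by the number of instances minus one) is an assumption, since the document does not say which estimator the program uses. SD would then be the square root of the value returned by `instance_mean_variance`.

```c
#include <stddef.h>

/* Mean number of turns of one instance over all of its episodes. */
static double instance_mean(const double *turns, size_t n_episodes, size_t instance)
{
    double sum = 0.0;
    for (size_t e = 0; e < n_episodes; e++) {
        sum += turns[instance * n_episodes + e];
    }
    return sum / (double)n_episodes;
}

/* Xbar: mean number of turns over all episodes of all instances. */
double overall_mean(const double *turns, size_t n_instances, size_t n_episodes)
{
    double sum = 0.0;
    for (size_t i = 0; i < n_instances * n_episodes; i++) {
        sum += turns[i];
    }
    return sum / (double)(n_instances * n_episodes);
}

/* Sample variance of the per-instance means (assumed estimator).
 * Returns 0 when there are fewer than two instances. */
double instance_mean_variance(const double *turns, size_t n_instances, size_t n_episodes)
{
    if (n_instances < 2) return 0.0;

    double xbar = overall_mean(turns, n_instances, n_episodes);
    double sum_sq = 0.0;
    for (size_t i = 0; i < n_instances; i++) {
        double d = instance_mean(turns, n_episodes, i) - xbar;
        sum_sq += d * d;
    }
    return sum_sq / (double)(n_instances - 1);
}

/* # actions: mean number of turns of one episode over all instances. */
double episode_mean(const double *turns, size_t n_instances, size_t n_episodes, size_t episode)
{
    double sum = 0.0;
    for (size_t i = 0; i < n_instances; i++) {
        sum += turns[i * n_episodes + episode];
    }
    return sum / (double)n_instances;
}
```

Because every instance runs the same number of episodes, the mean of the per-instance means equals Xbar, which is why `instance_mean_variance` can centre on `overall_mean`.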