Reinforcement Learning with Tabular Methods

Rubik's cube restricted to 180 degree turns

The program runs several instances of a learning algorithm on the Rubik's cube restricted to 180 degree turns (double turns). The following parameters can be set through command-line arguments:

The learning algorithm to perform on the problem.
The action selection policy used by the algorithm.
The number of instances of the algorithm to run.
The number of episodes per instance.

And optionally:

Q-Learning and SARSA parameters:
- Learning rate (Alpha) - Default: 0.4
- Discount factor (Gamma) - Default: 0.95
MENACE Approach parameters:
- Base factor (Lambda) - Default: 0.78
- Reward (Reward) - Default: 1
Action Selection parameters:
- Epsilon greedy (Epsilon) - Default: 0.2
- Softmax temperature (Tau) - Default: 0.2
- Simulated annealing (Temperature scale) - Default: 0.2
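
To make the roles of Alpha, Gamma and Epsilon concrete, here is a minimal sketch of a textbook Q-Learning update combined with epsilon-greedy selection. It is not taken from this project's source: the function names (`argmax`, `epsilon_greedy`, `q_update`) and the flat array of action values are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: index of the largest value in q[0..n-1].
   Ties resolve to the lowest index. */
int argmax(const double *q, int n) {
    assert(n > 0);
    int best = 0;
    for (int a = 1; a < n; a++) {
        if (q[a] > q[best]) best = a;
    }
    return best;
}

/* Epsilon-greedy: with probability epsilon pick a uniformly random action,
   otherwise pick the action with the highest value. */
int epsilon_greedy(const double *q, int n, double epsilon) {
    assert(n > 0);
    double u = (double)rand() / ((double)RAND_MAX + 1.0); /* u in [0, 1) */
    if (u < epsilon) return rand() % n;
    return argmax(q, n);
}

/* One Q-Learning step:
   Q(s,a) += alpha * (reward + gamma * max_a' Q(s',a') - Q(s,a)).
   q_sa points at Q(s,a); q_next holds the n action values of the next state. */
void q_update(double *q_sa, double reward, const double *q_next, int n,
              double alpha, double gamma) {
    double target = reward + gamma * q_next[argmax(q_next, n)];
    *q_sa += alpha * (target - *q_sa);
}
```

With the default Alpha of 0.4 and Gamma of 0.95, a reward of 1 and a best next-state value of 1 move a zero-initialized entry to 0.4 * (1 + 0.95) = 0.78.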

A more detailed description of how to run the program with these parameters is given in: Run the program.

Compile the C source code (gcc)

The code can be compiled through:
gcc *.c -o cube -O3 -lm
To enable multithreading (OpenMP), add the flag:
-fopenmp
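
As a rough illustration of what the flag changes, the sketch below parallelizes a loop over independent instances with an OpenMP pragma. This is an assumption about how such a program could use OpenMP, not a description of this project's code; `run_instance` and `run_all` are hypothetical names, and the work inside `run_instance` is a placeholder.

```c
#include <assert.h>

/* Hypothetical stand-in for one learning run; returns a per-instance result. */
double run_instance(int instance, int episodes) {
    assert(episodes > 0);
    double total = 0.0;
    for (int e = 0; e < episodes; e++) {
        total += (double)(instance + e); /* placeholder work */
    }
    return total / episodes;
}

/* Run every instance. The iterations are independent, so OpenMP may split
   them across threads. Without -fopenmp the pragma is ignored and the loop
   runs serially with the same results. */
void run_all(double *results, int instances, int episodes) {
    #pragma omp parallel for
    for (int i = 0; i < instances; i++) {
        results[i] = run_instance(i, episodes);
    }
}
```

Each iteration writes only to its own slot of `results`, so no locking is needed in this sketch.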

Run the program

The program can be run through:
./cube <Algorithm> <Policy> <# Instances> <# Episodes> [Param1] [Param2] [Param3]

The arguments must follow these rules:

Algorithm: The learning algorithm to perform on the problem. Select either 0, 1 or 2.

Q-Learning: 0
SARSA: 1
MENACE Approach: 2

Policy: The action selection policy used by the algorithm. Select 0, 1 or 2.

Epsilon Greedy: 0
Softmax: 1
Simulated Annealing: 2 - (Only available for MENACE Approach)

# Instances: The number of instances of the algorithm to run. Select any integer > 0.

# Episodes: The number of episodes per instance. Select any integer > 0.

Param 1: (Optional) Parameter for the algorithm. Select any float > 0.

Q-Learning: Param 1 = Alpha
SARSA: Param 1 = Alpha
MENACE Approach: Param 1 = Lambda

Param 2: (Optional) Parameter for the algorithm. Select any float > 0.

Q-Learning: Param 2 = Gamma
SARSA: Param 2 = Gamma
MENACE Approach: Param 2 = Reward

Param 3: (Optional) Parameter for the action selection policy. Select any float > 0.

Epsilon Greedy: Param 3 = Epsilon
Softmax: Param 3 = Tau
Simulated Annealing: Param 3 = Temperature Scale
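
The rules above can be sketched as a small argument parser. This is an illustrative sketch only: the `Settings` struct and the functions `parse_long`, `parse_double` and `parse_settings` are hypothetical and do not come from this project's source, and the actual program may handle invalid input differently. The defaults are the ones listed earlier in this README.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical container for the command-line settings. */
typedef struct {
    int algorithm;   /* 0 = Q-Learning, 1 = SARSA, 2 = MENACE Approach */
    int policy;      /* 0 = Epsilon Greedy, 1 = Softmax, 2 = Simulated Annealing */
    long instances;
    long episodes;
    double param1;   /* Alpha or Lambda */
    double param2;   /* Gamma or Reward */
    double param3;   /* Epsilon, Tau or Temperature Scale */
} Settings;

/* Parse a whole number; returns 0 if the text is not a clean integer. */
int parse_long(const char *text, long *out) {
    char *end;
    *out = strtol(text, &end, 10);
    return end != text && *end == '\0';
}

/* Parse a float; returns 0 if the text is not a clean number. */
int parse_double(const char *text, double *out) {
    char *end;
    *out = strtod(text, &end);
    return end != text && *end == '\0';
}

/* Returns 1 on success, 0 if an argument is missing or breaks a rule. */
int parse_settings(int argc, char **argv, Settings *s) {
    assert(s != NULL);
    long alg, pol;
    if (argc < 5 || argc > 8) return 0;
    if (!parse_long(argv[1], &alg) || alg < 0 || alg > 2) return 0;
    if (!parse_long(argv[2], &pol) || pol < 0 || pol > 2) return 0;
    if (pol == 2 && alg != 2) return 0; /* Simulated Annealing needs MENACE */
    if (!parse_long(argv[3], &s->instances) || s->instances <= 0) return 0;
    if (!parse_long(argv[4], &s->episodes) || s->episodes <= 0) return 0;
    s->algorithm = (int)alg;
    s->policy = (int)pol;
    /* Defaults as listed in this README. */
    s->param1 = (alg == 2) ? 0.78 : 0.4;
    s->param2 = (alg == 2) ? 1.0 : 0.95;
    s->param3 = 0.2;
    /* Optional parameters override the defaults and must be > 0. */
    if (argc > 5 && (!parse_double(argv[5], &s->param1) || s->param1 <= 0)) return 0;
    if (argc > 6 && (!parse_double(argv[6], &s->param2) || s->param2 <= 0)) return 0;
    if (argc > 7 && (!parse_double(argv[7], &s->param3) || s->param3 <= 0)) return 0;
    return 1;
}
```

For example, `./cube 2 2 4 100 0.5` would select the MENACE Approach with Simulated Annealing, 4 instances of 100 episodes, Lambda set to 0.5, and the remaining parameters left at their defaults.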

Output

The output is written in CSV format to standard output. The first row describes the run configuration. The second row gives summary statistics on the performance of the run. Every following row holds a single value: the number of turns of one episode, averaged over all instances.

First row: Algorithm, Policy, # Instances, # Episodes

Second Row: Xbar, SD

Xbar: The mean number of turns in all episodes from all instances.
SD: The standard deviation of the per-instance means, where each instance's mean is the mean number of turns over all of its episodes.

Third Row and onwards: # actions

# actions: The mean number of turns of the Nth episode across all instances. For the third row N = 0, for the fourth N = 1, and so on.

RFLeijenaar/RL-Tabular-Rubikscube
