-
Notifications
You must be signed in to change notification settings - Fork 72
cmd_decide
Commands and settings related to the selection of operators during the Soar decision process
=============================================================================
- Decide Sub-Commands and Options -
=============================================================================
decide [? | help]
-----------------------------------------------------------------------------
decide numeric-indifferent-mode [--avg --sum]
-----------------------------------------------------------------------------
decide indifferent-selection
decide indifferent-selection <policy>
<policy> = [--boltzmann | --epsilon-greedy |
--first | --last | -- softmax ]
decide indifferent-selection <param> [value]
<param> = [--epsilon --temperature]
decide indifferent-selection [--reduction-policy| -p] <param> [<policy>]
decide indifferent-selection [--reduction-rate| -r] <param> <policy> [<rate>]
decide indifferent-selection [--auto-reduce] [setting]
decide indifferent-selection [--stats]
----------------------------------------------------------------------------
decide predict
decide select <operator ID>
-----------------------------------------------------------------------------
decide set-random-seed [<seed>]
-----------------------------------------------------------------------------
For a detailed explanation of sub-commands: help decide
Using the decide
command without any arguments will display key elements of Soar's current decision settings:
=======================================================
Decide Summary
=======================================================
Numeric indifference mode: sum
-------------------------------------------------------
Exploration Policy: softmax
Automatic Policy Parameter Reduction: off
Epsilon: 0.100000
Epsilon Reduction Policy: exponential
Temperature: 25.000000
Temperature Reduction Policy: exponential
-------------------------------------------------------
Use 'decide ?' for a command overview or 'help decide' for the manual page.
The numeric-indifferent-mode
command sets how multiple numeric indifferent preference values given to an operator are combined into a single value for use in random selection.
The default procedure is --sum
which sums all numeric indifferent preference values given to the operator, defaulting to 0 if none exist. The alternative --avg
mode will average the values, also defaulting to 0 if none exist.
The indifferent-selection
command allows the user to set options relating to selection between operator proposals that are mutually indifferent in preference memory.
The primary option is the exploration policy (each is covered below). When Soar starts, softmax is the default policy.
Note: As of version 9.3.2, the architecture no longer automatically changes the policy to epsilon-greedy the first time Soar-RL is enabled.
Some policies have parameters to temper behavior. The indifferent-selection command provides basic facilities to automatically reduce these parameters exponentially and linearly each decision cycle by a fixed rate. In addition to setting these policies/rates, the auto-reduce option enables the automatic reduction system (disabled by default), for which the Soar decision cycle incurs a small performance cost.
Option | Description |
---|---|
-s, --stats |
Summary of settings |
policy |
Set exploration policy |
parameter [exploration policy parameters] |
Get/Set exploration policy parameters (if value not given, returns the current value) |
parameter [reduction_policy](value] |
Get/Set exploration policy parameter reduction policy (if policy not given, returns the current) |
parameter reduction_policy [exploration policy parameter] |
Get/Set exploration policy parameter reduction rate for a policy (if rate not give, returns the current) |
-a, --auto-reduce [on,off](reduction-rate] |
Get/Set auto-reduction setting (if setting not provided, returns the current) |
Option | Description |
---|---|
-b, --boltzmann |
Tempered softmax (uses temperature) |
-g, --epsilon-greedy |
Tempered greedy (uses epsilon) |
-x, --softmax |
Random, biased by numeric indifferent values (if a non-positive value is encountered, resorts to a uniform random selection) |
-f, --first |
Deterministic, first indifferent preference is selected |
-l, --last |
Deterministic, last indifferent preference is selected |
Parameter Name | Acceptable Values | Default Value |
---|---|---|
-e, --epsilon |
[0, 1] |
0.1 |
-t, --temperature |
(0, inf) |
25 |
Parameter Name | Acceptable Values | Default Value |
---|---|---|
exponential default |
[0, 1] |
1 |
linear |
[0, inf] |
0 |
The predict command determines, based upon current operator proposals, which operator will be chosen during the next decision phase. If predict determines an operator tie will be encountered, "tie" is returned. If predict determines no operator will be selected (state no-change), "none" is returned. If predict determines a conflict will arise during the decision phase, "conflict" is returned. If predict determines a constraint failure will occur, "constraint" is returned. Otherwise, predict will return the id of the operator to be chosen. If operator selection will require probabilistic selection, and no alterations to the probabilities are made between the call to predict and decision phase, predict will manipulate the random number generator to enforce its prediction.
The select command will force the selection of an operator, whose id is supplied as an argument, during the next decision phase. If the argument is not a proposed operator in the next decision phase, an error is raised and operator selection proceeds as if the select command had not been called. Otherwise, the supplied operator will be selected as the next operator, regardless of preferences. If select is called with no id argument, the command returns the operator id currently forced for selection (by a previous call to select), if one exists.
Assuming operator "O2" is a valid operator, this would select it as the next operator to be selected:
decide select O2
Seeds the random number generator with the passed seed. Calling decide set-random-seed
(or equivalently, decide srand
) without providing a seed will seed the generator based on the contents of /dev/urandom (if available) or else based on time() and clock() values.
decide set-random-seed 23
inds indifferent-selection
srand set-random-seed