Training

Training with Base Environment

To train the heuristic function, state/goal pairs are generated by starting from a given start state and taking random actions to reach a goal state; a goal is then sampled from this goal state.
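
As a rough illustration of this scheme, the sketch below generates state/goal pairs with a random walk. The methods get_start_state, get_actions, next_state, and sample_goal are hypothetical stand-ins for an environment interface, not necessarily the actual deepxube Environment API.

import random

def generate_state_goal_pairs(env, num_pairs: int, step_max: int):
    # Sketch only: the env methods below are hypothetical stand-ins.
    pairs = []
    for _ in range(num_pairs):
        state = env.get_start_state()                 # a given start state
        goal_state = state
        for _ in range(random.randint(0, step_max)):  # random walk to a goal state
            action = random.choice(env.get_actions(goal_state))
            goal_state = env.next_state(goal_state, action)
        goal = env.sample_goal(goal_state)            # sample a goal from the goal state
        pairs.append((state, goal))
    return pairs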

Training with ASP Environment

Deep Approximate Value Iteration

Given an environment, deep approximate value iteration (DAVI) trains a heuristic function to estimate the cost-to-go from a given state to a given goal. For more details, see our DeepCubeA paper.
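
For intuition, the sketch below shows the value-iteration target that the heuristic network is regressed toward for a single state/goal pair. The calls is_solved, get_actions, next_state_and_cost, and the callable nnet are hypothetical stand-ins, not the actual deepxube API.

def davi_target(env, nnet, state, goal) -> float:
    # Sketch only: env/nnet calls are hypothetical stand-ins.
    if env.is_solved(state, goal):
        return 0.0  # cost-to-go is zero once the goal is reached
    costs = []
    for action in env.get_actions(state):
        next_state, transition_cost = env.next_state_and_cost(state, action)
        costs.append(transition_cost + nnet(next_state, goal))
    return min(costs)  # one-step Bellman backup using the current heuristic

The heuristic network is then trained to regress toward these targets, and the updated network is used to compute targets in the next iteration.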

Examples

Training a heuristic function for an environment:

from deepxube.environments.environment_abstract import Environment
from deepxube.training import avi

env: Environment = <construct_your_environment>  # replace with your environment instance
avi.train(env, 30, "<directory_to_save_model>", batch_size=10000, itrs_per_update=5000, max_itrs=2000000,
          greedy_update_step_max=30, num_update_procs=48)