train function in MonteCarlo.ipynb
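agent.update(one_ep_transition) should be outside the inner for-loop over steps, so that the agent is updated once per episode, after the full list of transitions has been collected.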
    def train(cfg, env, agent):
        print('Start training!')
        print(f'Env: {cfg.env_name}, Algorithm: {cfg.algo_name}, Device: {cfg.device}')
        rewards = []  # record the reward of every episode
        for i_ep in range(cfg.train_eps):
            ep_reward = 0  # reward accumulated in this episode
            one_ep_transition = []
            state = env.reset(seed=cfg.seed)  # reset the environment, i.e. start a new episode
            for _ in range(cfg.max_steps):
                action = agent.sample_action(state)  # sample an action according to the algorithm
                next_state, reward, terminated, info = env.step(action)  # take one step in the environment
                one_ep_transition.append((state, action, reward))  # save the transition
                state = next_state  # update the state
                ep_reward += reward
                if terminated:
                    break
            agent.update(one_ep_transition)  # update the agent once per episode, outside the step loop
            rewards.append(ep_reward)
            print(f"Episode: {i_ep+1}/{cfg.train_eps}, Reward: {ep_reward:.1f}")
        print('Training finished!')
        return {"rewards": rewards}
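To illustrate why the update needs the whole episode, here is a minimal sketch of how Monte Carlo returns are typically computed from a list of (state, action, reward) tuples. The function name discounted_returns and the toy episode are hypothetical and are not taken from the notebook; the agent's actual update method may differ.

```python
def discounted_returns(transitions, gamma=0.9):
    """Compute the discounted return G_t for every step of one episode.

    `transitions` is a list of (state, action, reward) tuples, in the
    same shape as one_ep_transition in the train function.
    """
    returns = []
    g = 0.0
    # Walk backwards so each return reuses the return of the following step.
    for _state, _action, reward in reversed(transitions):
        g = reward + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


# Hypothetical three-step episode whose only reward arrives at the end.
episode = [("s0", "a0", 0.0), ("s1", "a1", 0.0), ("s2", "a2", 1.0)]

full = discounted_returns(episode, gamma=0.5)         # uses the complete episode
partial = discounted_returns(episode[:2], gamma=0.5)  # what a mid-episode update would see
```

With the complete episode the early steps receive credit for the final reward; with the truncated list they do not, which is why updating inside the step loop would give the agent misleading returns.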
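There is also a bug in the train function of Sarsa.ipynb. The line action = agent.sample(state) after while True should be deleted, because it resamples the action and overwrites the next_action that was just used in agent.update. The correct code is: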
    def train(cfg, env, agent):
        print('Start training!')
        print(f'Env: {cfg.env_name}, Algorithm: {cfg.algo_name}, Device: {cfg.device}')
        rewards = []  # record the reward of every episode
        for i_ep in range(cfg.train_eps):
            ep_reward = 0  # reward accumulated in this episode
            state = env.reset()  # reset the environment, i.e. start a new episode
            action = agent.sample(state)
            while True:
                # action = agent.sample(state) should be deleted
                next_state, reward, done, _ = env.step(action)  # take one step in the environment
                next_action = agent.sample(next_state)
                agent.update(state, action, reward, next_state, next_action, done)  # algorithm update
                state = next_state  # update the state
                action = next_action
                ep_reward += reward
                if done:
                    break
            rewards.append(ep_reward)
            print(f"Episode: {i_ep+1}/{cfg.train_eps}, Reward: {ep_reward:.1f}, Epsilon: {agent.epsilon}")
        print('Training finished!')
        return {"rewards": rewards}
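For context, a minimal sketch of a tabular Sarsa update is given below. It is meant only to show that the update target depends on next_action, so the action executed on the following step has to be that same next_action. The function sarsa_update, the dictionary-based Q-table, and the state and action labels are assumptions for illustration and are not the notebook's implementation.

```python
from collections import defaultdict


def sarsa_update(q, state, action, reward, next_state, next_action, done,
                 alpha=0.5, gamma=0.9):
    """Apply one tabular Sarsa update to the Q-table `q` and return the new value."""
    if done:
        target = reward  # no bootstrapping from a terminal state
    else:
        # The target bootstraps from the action that will actually be taken next.
        target = reward + gamma * q[(next_state, next_action)]
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# Hypothetical Q-table: only ("s1", "right") has a non-zero estimate.
q = defaultdict(float)
q[("s1", "right")] = 1.0

new_value = sarsa_update(q, "s0", "up", 0.0, "s1", "right",
                         done=False, alpha=0.5, gamma=0.5)
```

If the training loop resampled the action at the top of each iteration, the action executed in s1 could differ from the "right" used in the target above, and the update would no longer be on-policy.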