about steps related to the reward #11

Open
congling opened this issue Sep 3, 2016 · 6 comments
congling commented Sep 3, 2016

Hi Kosuke,
I've tried your model on the Breakout game. The performance was amazing: the average score went up to 520 after 80M steps, far better than any other model I've tried.
But the average score didn't go up much after 80M. Sometimes a game takes a huge number of steps: when only one or two bricks are left, the ball bounces between the paddle and the wall but never hits the remaining bricks.
Do you think it would be better to add the number of steps per game as a penalty when computing R? Something like:
R -= beta * sqrt(steps)
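To be concrete, something like the following is what I have in mind. This is just a sketch; `beta` and the per-episode `steps` counter are names I'm inventing here, not variables from the repo:

```python
import math

# Sketch only: fold a step-count penalty into the n-step discounted returns.
# `beta` and `episode_steps` are illustrative names, not existing repo variables.
def discounted_returns(rewards, bootstrap_value, episode_steps,
                       gamma=0.99, beta=0.01):
    # Penalize long games by shrinking the bootstrapped return.
    R = bootstrap_value - beta * math.sqrt(episode_steps)
    returns = []
    for r in reversed(rewards):
        R = r + gamma * R
        returns.append(R)
    returns.reverse()
    return returns

# Example: a 3-step rollout after 1200 steps of the current episode.
print(discounted_returns([0.0, 1.0, 0.0], bootstrap_value=0.5, episode_steps=1200))
```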
Thanks

BTW, I've made a couple of changes:
1. Changed ACTION_SIZE = 4, because Breakout has 4 actions in ALE (a quick way to check this is sketched after the code below).
2. If a life is lost, treat it as terminal:

# if not terminal_end:
if lives == new_lives and not terminal_end:
    # no life lost and episode not over: bootstrap from the current value estimate
    R = self.local_network.run_value(sess, self.game_state.s_t)
else:
    # a life was lost (or the episode ended): leave R at 0.0 and record the new life count
    # print("lives dropped from %d to %d" % (lives, new_lives))
    lives = new_lives
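For change 1, here is a quick standalone way to check the action count with the ALE Python bindings. This is not code from this repo, and the ROM path is just an example:

```python
from ale_python_interface import ALEInterface  # standalone ALE bindings

ale = ALEInterface()
ale.loadROM(b'breakout.bin')  # example ROM path (str vs. bytes depends on the ALE version)

# Breakout's minimal action set has 4 entries (NOOP, FIRE, RIGHT, LEFT),
# even though ALE defines 18 legal actions overall.
minimal_actions = ale.getMinimalActionSet()
print(minimal_actions)       # e.g. [0 1 3 4]
print(len(minimal_actions))  # 4
```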


miyosuda commented Sep 3, 2016

But the average score didn't go up much after 80M.

With the default "MAX_TIME_STEP" constant in constants.py, training runs for 100M steps.
The learning rate is annealed to 0.0 over those 100M steps, so the learning rate around 80M is already very small.
One thing to try is a bigger MAX_TIME_STEP, like 150M or 200M, but it might not help.
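For reference, the annealing is roughly of this form. This is a simplified sketch, not a copy of the repo's code, and 7e-4 is just an example initial learning rate:

```python
def anneal_learning_rate(initial_learning_rate, global_t, max_time_step):
    # Decay the learning rate towards 0.0 as global_t approaches max_time_step.
    learning_rate = initial_learning_rate * (max_time_step - global_t) / float(max_time_step)
    return max(learning_rate, 0.0)  # clamp to 0.0 once max_time_step is exceeded

# At 80M out of 100M steps, only 20% of the initial learning rate remains:
print(anneal_learning_rate(7e-4, 80 * 10**6, 100 * 10**6))  # -> 0.00014
```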

BTW, @Itsukara has also reported the same result: the score stops improving around 80M.
http://itsukara.hateblo.jp/entry/2016/08/02/190029

Maybe he is using A3C-FF mode?

He also tried treating life loss as terminal, and learning was faster
(reaching a score of 400 around 18M steps?).
https://cdn-ak.f.st-hatena.com/images/fotolife/I/Itsukara/20160824/20160824034536.png
http://itsukara.hateblo.jp/entry/2016/08/11/003715

Do you think it would be better to add the number of steps per game as a penalty when computing R?

Ah, I see. That might help.

There is another A3C implementation which reports a higher Breakout score.

https://github.com/ppwwyyxx/tensorpack/tree/master/examples/Atari2600

Their A3C training code doesn't seem to be released yet, but please check it once it is released.


congling commented Sep 4, 2016

Thanks for your reply. @Itsukara's improvement looks quite good; I'm trying it now.


miyosuda commented Sep 4, 2016

@congling
Sorry, my mistake: his graph
https://cdn-ak.f.st-hatena.com/images/fotolife/I/Itsukara/20160824/20160824034536.png
was the graph for "life loss as -1 reward."

And he said that he was using A3C-FF when recording this graph.
Thanks, @Itsukara!

@sahiliitm

Btw, the code @congling used is not equivalent to resetting the game on a lost life. :p All it does is say: if you just lost a life, ignore the value estimate of the next state and treat it as 0.
A more correct approach to resetting the simulation on a lost life is:

    if terminal or (RESET_ON_LOST_LIFE and old_lives != new_lives):
        terminal_end = True

on this line.
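To make the difference concrete, here is a small self-contained sketch of the two variants (RESET_ON_LOST_LIFE, old_lives, and new_lives are illustrative names, as above; this is not the repo's actual code):

```python
RESET_ON_LOST_LIFE = True  # hypothetical switch, as in the snippet above

def bootstrap_and_reset(terminal, old_lives, new_lives, value_estimate):
    life_lost = old_lives != new_lives

    # @congling's change: only the bootstrap value is zeroed on life loss;
    # the rollout keeps running from the current frame.
    R_zeroed_only = 0.0 if (terminal or life_lost) else value_estimate

    # Setting terminal_end additionally makes the training thread finish the
    # episode (reset the game state and the episode statistics).
    terminal_end = terminal or (RESET_ON_LOST_LIFE and life_lost)
    R_reset = 0.0 if terminal_end else value_estimate

    return R_zeroed_only, R_reset, terminal_end

# A life was just lost mid-game: both variants return R = 0.0,
# but only the second one also ends the episode.
print(bootstrap_and_reset(terminal=False, old_lives=5, new_lives=4, value_estimate=0.7))
```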

@1601214542

@congling Hello, I am a little confused about where to add the lives code in the released code. I have a problem training on Breakout: I only reached a score of 50 at 18M steps. Can you give me your implementation code? Thank you.

@xiaoschannel

@1601214542 Hey, I have this problem too! Hmm. Is 50 a training score or a test score (where you get 5 lives to spare)? How often do you see it?
