
Model improvements6 #206
Merged: 21 commits merged into main on Dec 13, 2021

Conversation

detlefarend (Member) commented Dec 10, 2021

Description

Background

Types of changes

  • Bug fix
  • New Example, please read the detailed guideline here
  • New Feature
  • Documentation

New Feature

  • Model (Improvement on the Core Model)
  • Environment (New Environment), please read the detailed guideline here
  • Policy (New Policy), please read the detailed guideline here
  • .....

Checklist:

  • Read the CONTRIBUTION guide.
  • Update the history on the source code (required).
  • This change requires a change to the documentation.
  • Update the tests accordingly.
  • Update the documentation accordingly.

detlefarend added the labels enhancement (New feature or request) and v0.8.0 (In scope of Release 0.8.0) on Dec 10, 2021
detlefarend linked an issue on Dec 12, 2021 that may be closed by this pull request
detlefarend marked this pull request as ready for review on December 12, 2021, 23:31
detlefarend (Member Author)

Hi Rizky and William, the classes Training and RLTraining have been reworked and enhanced as described in #161 and are demonstrated in the new Howto 17. Please take a look. Thx!

rizkydiprasetya (Contributor)

@detlefarend could you please check? There are still some errors.

detlefarend (Member Author) commented Dec 13, 2021

Hi @rizkydiprasetya, I guess the problem is caused by this code:

[screenshot of the code in question]

rew is not a scalar but a NumPy array, while the method Reward.set_overall_reward() expects a scalar.
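For illustration, a minimal sketch of the mismatch; the shape and value of rew are assumptions, only Reward.set_overall_reward() comes from the discussion:

```python
import numpy as np

# rew as it typically comes out of a NumPy-based environment: a 1-element array
rew = np.array([0.75])

# Reward.set_overall_reward() expects a plain scalar, so the array has to be
# reduced first. .item() raises a ValueError if rew unexpectedly holds more
# than one element, which is safer than silently picking rew[0].
overall_reward = rew.item()                    # -> 0.75 as a Python float

# reward.set_overall_reward(overall_reward)    # 'reward' being the Reward instance
```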

In addition, a parameter of class RLTraining changed its name. I have already fixed and pushed it:

[screenshot of the corresponding fix]

rizkydiprasetya (Contributor)

Why should it be a scalar? It used to work with a NumPy array.

detlefarend (Member Author)

Why should it be a scalar? It used to work with a NumPy array.

A reward is a scalar value, not an array. That's why both methods Reward.set_overall_reward() and Reward.add_agent_reward() expect scalars.

FYI: I edited my last comment (regarding the parameter of class RLTraining).

rizkydiprasetya (Contributor)

OK, and why was it working before the stagnation, evaluation, and other stuff that you added?

detlefarend (Member Author)

OK, and why was it working before the stagnation, evaluation, and other stuff that you added?

It is a simple thing: you use the method beyond its specification. If you do so, the behavior is undefined. The new evaluation logic expects scalar values but gets arrays. We can surely improve both methods by adding try/except logic, but it's not a bug. Please adjust your code or add the logic to Reward yourself.
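One possible shape of that try/except logic, sketched on a reduced stand-in for the real Reward class; the attribute name and return value are assumptions:

```python
import numpy as np

class Reward:
    """Reduced stand-in for the real Reward class, just to illustrate the idea."""

    def __init__(self):
        self._overall_reward = 0.0

    def set_overall_reward(self, p_reward) -> bool:
        # Accept plain scalars as well as single-element NumPy arrays
        try:
            self._overall_reward = float(np.asarray(p_reward).item())
            return True
        except (TypeError, ValueError):
            # p_reward is neither a scalar nor a one-element array
            return False


# Both calls would then be accepted:
r = Reward()
r.set_overall_reward(0.75)
r.set_overall_reward(np.array([0.75]))
```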

rizkydiprasetya (Contributor) commented Dec 13, 2021

OK, then we need to change all of the environments, since all of them are based on NumPy. You can check them.

rizkydiprasetya (Contributor)

OK, and why was it working before the stagnation, evaluation, and other stuff that you added?

It is a simple thing: you use the method beyond its specification. If you do so, the behavior is undefined. The new evaluation logic expects scalar values but gets arrays. We can surely improve both methods by adding try/except logic, but it's not a bug. Please adjust your code or add the logic to Reward yourself.

It's not that far beyond the specification. It is just a question of whether NumPy is used or not.

detlefarend (Member Author)

I have no problem solving this in Reward, except for the time. Feel free to do it if you prefer this solution.

rizkydiprasetya (Contributor)

done

rizkydiprasetya (Contributor)

[Screenshot 2021-12-13 at 11 20 50]

@detlefarend why?

detlefarend (Member Author)

[Screenshot 2021-12-13 at 11 20 50]

@detlefarend why?

What is the problem? Give me a chance to understand.

rizkydiprasetya (Contributor)

[Screenshot 2021-12-13 at 11 20 50] @detlefarend why?

What is the problem? Give me a chance to understand.

Why are you using try and except when there is no error that you want to catch?

detlefarend (Member Author)

Because you can use an env as one (or all) of the afcts in an envmodel. But an env is not adaptive and raises an exception if you try to switch its adaptivity. See EnvBase.switch_adaptivity().
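A minimal sketch of the pattern being discussed; the surrounding function and its parameter names are hypothetical, only switch_adaptivity() and the env-as-adaptive-function scenario come from the comment above:

```python
def switch_adaptivity_of_afcts(p_afcts, p_adaptivity: bool):
    """Switch adaptivity for all adaptive functions of an environment model."""
    for afct in p_afcts:
        try:
            afct.switch_adaptivity(p_adaptivity)
        except Exception:
            # afct is a plain environment acting as an adaptive function.
            # Environments are not adaptive, so EnvBase.switch_adaptivity()
            # raises; the exact exception type is not specified here.
            pass
```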

rizkydiprasetya (Contributor) commented Dec 13, 2021

I see, OK. It is fine like that, but I would prefer to check the object's inheritance to decide about adaptivity.

Oh, I just checked again: EnvBase also inherits from the Model class. That is why you put the raise in the function.
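A stripped-down illustration of why an isinstance check cannot separate the two cases here; the class bodies are placeholders, only the names Model, EnvBase, and switch_adaptivity() are taken from the discussion:

```python
class Model:
    """Placeholder for the adaptive base class."""

    def switch_adaptivity(self, p_ada: bool):
        self._adaptive = p_ada


class EnvBase(Model):
    """Placeholder for the environment base class; environments are not adaptive."""

    def switch_adaptivity(self, p_ada: bool):
        raise NotImplementedError('Environments are not adaptive')


env = EnvBase()

# EnvBase is a Model, so an isinstance check cannot distinguish adaptive models
# from environments used as adaptive functions:
print(isinstance(env, Model))       # True

# Hence the exception handling around the call:
try:
    env.switch_adaptivity(True)
except NotImplementedError:
    pass
```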

@detlefarend
Copy link
Member Author

What is your problem? Does something behave differently from your expectation?

detlefarend merged commit a3de2a8 into main on Dec 13, 2021
detlefarend deleted the model_improvements6 branch on December 14, 2021, 07:33
Labels: enhancement (New feature or request), v0.8.0 (In scope of Release 0.8.0)
Linked issue that may be closed by this pull request: Class RLTraining - Completion
3 participants