Maximal Iteration cycle #34
This should work. One thing that I immediately note is that you have written …
Sorry, that was a typo. I checked it before without the typo, and this is an approach that will work. The disadvantage I see is that it extends the state space by one dimension, which I guess could influence the performance of the solver (for example in the Grid Interpolation example). What I am currently trying to test is whether the CommonRLInterface would help as a workaround. I have not tested it yet.
Something like this stops the simulation after MAXITER iterations. Now, as far as I understand CommonRLInterface, I can add POMDPs.jl, ReinforcementLearning.jl, etc.

using CommonRLInterface

include("envs/myEnv.jl")

env = myEnv(0.1, 0)
reset!(env)
rsum = 0.0
while !terminated(env)
    global rsum += act!(env, rand(actions(env)))
end
@show rsum

The environment in envs/myEnv.jl:

using CommonRLInterface
using StaticArrays
using Compose
using Plots
import ColorSchemes

MAXITER = 5   # maximum number of steps before the episode terminates

# Environment with a continuous state s and a step counter c
mutable struct myEnv <: AbstractEnv
    s::Float64
    c::Int64
end

function CommonRLInterface.reset!(env::myEnv)
    env.s = 0.0
    env.c = 0
end

CommonRLInterface.actions(env::myEnv) = (-1.0, 0.0, 1.0)
CommonRLInterface.observe(env::myEnv) = env.s

# Terminate once the step counter reaches MAXITER
CommonRLInterface.terminated(env::myEnv) = env.c >= MAXITER

function CommonRLInterface.act!(env::myEnv, a)
    print(".")
    env.c += 1
    r = -env.s^2 - a^2
    env.s = env.s + a + randn()
    return r
end
I am struggling to use POMDPs with the CommonRLInterface. Is there a minimal example? At least something is mentioned here: https://juliareinforcementlearning.org/CommonRLInterface.jl/dev/faqs/
Is it something like the following? First I provide the environment with the state, action, and observation spaces in CommonRLInterface, and then I have to use the convert function, parsing the action, observation, and state spaces?
The instructions that you quote above are for package developers, not users. As a user, you can just use https://juliapomdp.github.io/POMDPModelTools.jl/stable/common_rl/#CommonRLInterface-Integration
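A rough, untested sketch of what that integration might look like (the convert call, the policy, and the simulator setup are assumptions based on the linked POMDPModelTools docs, not verified code):

using CommonRLInterface
using POMDPs
using POMDPModelTools    # provides the CommonRLInterface integration
using POMDPPolicies
using POMDPSimulators

include("envs/myEnv.jl")   # the MAXITER environment from above
env = myEnv(0.1, 0)

# Assumption: the integration lets a CommonRLInterface environment be
# wrapped as a POMDPs.jl MDP via convert (see the linked docs)
m = convert(POMDPs.MDP, env)

# The wrapped model can then be used like any other MDP, e.g. in a rollout
policy = RandomPolicy(m)
r = simulate(RolloutSimulator(max_steps=100), m, policy)
@show r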
@zsunberg thank you. It seems that I misunderstood the CommonRL package, but I think it can solve my initial question about the maximum iteration cycle: POMDP <-- CommonRLInterface. In QuickPOMDP, is there no other method possible?!
Is there a possibility to use additional functions in QuickMDP or QuickPOMDP? For example, in this minimal, incomplete example:

cnt = 0
mdp = QuickMDP(
    function gen(s, a, rng)
        x, v = s
        # incr_cnt()
        xₚ = clamp(x + Ts*v + rand(rng), PXMIN, PXMAX)
        vₚ = clamp(v + Ts*a, VMIN, VMAX)
        r = v > 0.5 ? 0.5 : -1
        return (sp=[xₚ, vₚ], r=r)
    end,
    actions = collect(0.0:0.1:1.0),
    initialstate = [[0.0, 0.0]],
    discount = 0.95,
    cnt += 1,
    isterminal = function(cnt)   # or isterminal = cnt -> cnt > 10
        cnt > 10
    end,
Sorry for the delay in responding to this. Using a global … You may also be interested in https://github.com/JuliaPOMDP/FiniteHorizonPOMDPs.jl.
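A rough sketch of how that package might be used (the fixhorizon call, the toy QuickMDP, and the 20-step horizon are assumptions based on the FiniteHorizonPOMDPs.jl README, not tested):

using POMDPs
using QuickPOMDPs
using FiniteHorizonPOMDPs

# Some ordinary infinite-horizon MDP defined with QuickPOMDPs
m = QuickMDP(
    function (s, a, rng)
        sp = s + a + randn(rng)
        return (sp=sp, r = -s^2 - a^2)
    end,
    actions = (-1.0, 0.0, 1.0),
    initialstate = [0.0],
    discount = 0.95
)

# Assumption: fixhorizon wraps m so that the stage (time step) becomes part
# of the state and every episode terminates after 20 steps
fh_m = fixhorizon(m, 20)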
Thank you for the link, I appreciate your help and information. It might be interesting for my application, and it could be a quick way to augment the state space. I agree.
Hi @ga72kud, in order to find an optimal policy for a finite horizon problem, you have two options:
You are right to say that if you want to find a single optimal policy for reaching any goal, you have to include both the goal and the vehicle's position in the MDP state, so, for a 2D grid world, the MDP state would be four-dimensional, and five-dimensional if time is included. My advice is to make the MDP model include everything so that it represents the problem correctly; i.e. do not include any approximation and do not worry about how hard it is to solve the problem when you are formulating it. Then, when you get to the solution stage, you can make approximations. This may include using neural networks for the value function, as in DQN, or starting with simplified formulations of the problem.
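To make that augmented state concrete, a purely illustrative sketch (the struct and field names are made up for this example, not from any package):

using StaticArrays

# Illustrative MDP state for a 2D grid world:
# vehicle position (2 dims) + goal position (2 dims) + time step (1 dim)
struct GridState
    pos::SVector{2, Int}    # vehicle cell
    goal::SVector{2, Int}   # goal cell
    t::Int                  # current time step
end

# Terminal when the goal is reached or the horizon is exceeded
is_terminal(s::GridState, horizon::Int) = s.pos == s.goal || s.t >= horizon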
Thank you, I appreciate your information |
I want to set the maximum number of iterations for the MDP (here given by a variable x[3] describing the current iteration). If x[3] is greater than 20, the terminal state is reached. Warning: the following code is only for illustration. I am wondering if there is another way to set the maximum limit and whether the current iteration is readable.
In the solver one can set such a variable, but it does not seem to fit here.
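A minimal sketch of that idea, assuming a QuickMDP-style generative model in which x[3] carries the current iteration count (all names and constants here are illustrative, not the original code):

using POMDPs
using QuickPOMDPs

MAXITER = 20   # illustrative iteration limit

mdp = QuickMDP(
    function (x, a, rng)
        # x[1], x[2]: the actual state variables; x[3]: the current iteration
        xp = [x[1] + a + randn(rng), x[2], x[3] + 1.0]
        r  = -x[1]^2 - a^2
        return (sp = xp, r = r)
    end,
    actions = (-1.0, 0.0, 1.0),
    initialstate = [[0.0, 0.0, 0.0]],
    discount = 0.95,
    isterminal = x -> x[3] > MAXITER   # terminal once the iteration count exceeds 20
)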