-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Gaussian noise to the dynamics for improved parameter identification #250
base: master
Are you sure you want to change the base?
Conversation
Could you snd me the complete paper, I am not allowed to download it, as I do not belong to a university. |
This comment was marked as duplicate.
This comment was marked as duplicate.
@HuaweiWang, if you have a moment, could you confirm that I'm doing the same thing you do in your paper? I've made a small modification to this example: https://opty.readthedocs.io/latest/examples/plot_non_contiguous_parameter_identification.html by adding process noise to the dynamics. |
Did you check, whether Mr. Wang's noise will also get the correct parameters if you add bias to your measurements? |
I saw your bias PR, but I'm not sure having biased measurements is so common (at least not large bias relative to the measurements). |
I agree, such large biases as I tested would make a measurement instrument useless. I just thought, 'my' way would make the simulation non-susceptible to some bias in the measurements. I will test, whether Mr. Wang's method will give similar results. |
Do you think your bias addition reduces the chance the optimizer finds a local minima? |
I did not find this in my tests - and I ran quite a few. They'd always get the correct parameters, regardless of how much bias I added. The accuracy seemed in line with the simulation without bias, errors maybe 10% of correct value. |
My VERY initial findings with Wang's method are:
I use my model with spring coefficient, dampening and friction. Seems the spring coefficient is estimated the best, always seems so. |
If we can't get these things to work well with a simple system, I don't see how jumping to a complex system will help us see the forest for the trees. |
I just thought, that maybe this simple system only has one minimum for (c, k, friction), so Wang's method cannot help to jump out of a local minimum to a lower minimum. |
A test showed these results, the % number are the errors:
So, at least in this simulation, barring programming errors, the addition of noise does not seem to help, |
If you read the paper, do you think I'm implementing it correctly? It seems too simple to be true. |
Without the bounds and good guesses, I was regularly landing in local minima when we originally were playing around with the problem. |
I will. I implemented it, same way you did it, in my slightly more complicated model with friction, and my maybe 15 - 20 test runs gave no advantage, actually it got worse with larger noise_scale. |
Here is Huawei's code: https://github.com/Huawei-ClevelandStateUniv/Stochastic_Paper/blob/master/controller_identification/StandingModel_PD_Eps_Noise_SameInit.py#L306 It seems like the noise is added just like we are doing. |
From figure 1 on page 7, it is clear, what We are adding the noise as if it were a random_force(t) acting on the system. |
I would guess that omega_a = theta_a.diff(t). |
If I understood correctly it is not exactly the same what you are doing:
If I understood correctly, it is a different set up. |
Are you saying he fits multiple models to one set of measurements, with each model having different noise? I think I was interpreting it to multiple measurements (his episodes being multiple measurements). |
This is how I would read the very top of page 11. I guess by process noise signal he means the |
I wonder why he did not use sympy mechanics to set up the eoms. |
I think he likely did. |
Makes sense. From the function you sent, I could not tell. |
Ah, I see what you are saying. The |
Yes, this is what I think. |
If his |
Yes, each episode s uses the same measurements in the objective, but each episode has its own trajectory \theta(t) which is indexed with s. All M trajectories are optimized simultaneously. |
I just pushed another change that creates two independent measurements and three episodes for each of the two measurements. This also works. |
If I understand correctly, these |
On this problem with a different random seed on each execution of the script, I'm getting about 1 in 6 do not converge to the global optimum. |
But that is with an initial guess of all zeros. |
Nice thing is that opty supports this as is! The only negative is the repetition in the equations of motion, which may cause some slow down for very large equations of motion systems. |
This repetition of eoms is the same with Wang's method, is it not? |
@tvdbogert thanks for the tip! Nice idea you all came up with. We are doing some bicycle control experiments right now and amassing a bunch of experimental data to identify the rider's heading feedback control for use in multi-actor traffic simulations. We are hitting the same issues you and Huawei came upon (unstable solutions, not always global optimum). This may help us home in on consistent controller parameter identification. |
@Peter230655 you've got sharper eyes than I do! I blazed over that missing subscript s! |
I think there two separable concepts. Firstly, at least for the experiments we do in my lab, we collect data from many trials from the same person controlling a bicycle. So I have lots of noncontiguous measurements where the rider is roughly using the same internal control algorithm. I want to know the best fitting control parameters for all of the noncontiguous measurements we take. So, your example solved that. Secondly, we need to get reliable global optima from real measurements where we don't know what the optimal parameters are. Wang's paper helps with that. But, if you are stuck with only a single (hopefully long) measurement, then Wang's method helps you get to the global optima also. So I don't think the addition of process noise is exactly equivalent, but it does seem to have some similar effects. I suppose having many noncontiguous measurements could be sufficient for getting to a global optima and the process noise may not be needed. But using them both might be the best of both solutions. |
They are old, but they have time! :-) |
Makes sense. Dumb question: Say, I have x measurements, and I want to add y noises |
Yes, fun!
The way I now have this one is that I have a dynamical differential equation for each measurement and for each "episode" of that measurement. So if I have 10 noncontiguous measurements (where measurements = subset of all or all state trajectories) and then I want to do 5 episodes then I need 50 equations of motion. |
I will change my simulation accordingly tomorrow. |
I am trying my simulation with variable number of measurements and variable number of repeats per measurement. |
I made PR 253, but with examples-gallery, I do not know where else to do it. |
Question: |
Huawei released the data publicly for his paper. Is that what data you are asking about? |
Yes, this is what I meant. I just want to play around, using opty to find the |
You'll have to search for the data. I think I saw that it was available (maybe in his dissertation). Also, the Park2004 model in this repository already implements his entire model except for the process noise part. That's why I opened #252. It uses generated data. |
I also don't intend to merge this example or mine, because it is just a duplicate of what is there. We should have a separate different example to show how to do Wang's method. |
I fully agree! I enjoy doing these simulations. But no point in adding something which is already there. I am not the least bit offended!! |
You can leave it open while you work on it. Eventually I'll close both. |
I am finished with both in their present stage. I am thinking about a simple model of that person standing of a tread mill, but initial stage only. I think 249 serves no practical purpose: it converges even with very large (unrealistic) biases, but very slowly so. |
This is already implemented in examples/Park2004. |
I saw it, but it did not lok 'simple' to me, surely it is realistic. |
Hi all,
As @moorepants already mentioned, I am working on a system id problem that occasionally has issues finding stable results. So I gave this approach a try with my more complex problem and real data. Here is what I did:
As was discussed here previously, I believe that this corresponds to what Wang and van den Bogert do in their paper. With this setup, I selected a data sample and compared the results of optimization with and without process noise. For each case, I ran the same 100 different inital guesses that where sampled from wide spectrum of system parameters that result in a stable system. I tried 3 repetitions. Here are the results:
So indeed, the optimisation is more successful at finding stable results. However, the added process noise seems to make it more difficult to find what seems to be the global optimum. The best solutions of both scenarios are not identical but very close-by. Diving the objective value of the case with process noise by the number of repetitions, it is actually even a little better than the case without process noise. However adding process noise seems to make it much harder to reach the global optimum from any given initial guess. Only 28 initial guesses end up at the optimum under process noise, while 60 end up there without it. I don't think one can draw too many conclusions from this one sample. But it definitely shows a case where adding process noise significantly increases the success at finding stable solutions. I find it quite interesting though that less initial guesses find the best optimum. It seems that the noise somehow makes it harder for the optimization to perform large jumps through the parameter space, in my case. The slightly better objective value of the process noise case seems too marginal to draw any generalizing conclusions from in. |
@chrismo-schmidt once you find one of these optima that are close to global from the noise model, can you then solve one time with the non-noise model using that as an initial guess, thus converging to the true optima? |
Huawei Wang has this paper:
"Identification of the human postural control system through stochastic trajectory optimization" https://doi.org/10.1016/j.jneumeth.2020.108580
It seems that the technique is very similar to our noncontiguous example but he adds a small amount of fixed Gaussian noise to the equations of motion (different noise for each measurement trial). opty supports such an addition of noise via the
known_trajectory_map
, so I have tested that here. The result converges to the correct globally optimal solution even from an initial guess of zeros for the trajectories and choosing an initial guess as the lower bound of the parameters, but this also works without the noise too. It isn't yet clear to me that this reaches the global solution any better than what we have. It is worth investigating it more, as the paper claims adding the noise improves things. I also need to confirm that what I'm doing is the same as what Huawei did.cc @chrismo-schmidt, @Peter230655