
Add Gaussian noise to the dynamics for improved parameter identification #250

Draft · wants to merge 3 commits into master
Conversation

moorepants
Member

Huawei Wang has this paper:

"Identification of the human postural control system through stochastic trajectory optimization" https://doi.org/10.1016/j.jneumeth.2020.108580

It seems that the technique is very similar to our noncontiguous example, except that he adds a small amount of fixed Gaussian noise to the equations of motion (different noise for each measurement trial). opty supports such an addition of noise via the known_trajectory_map, so I have tested that here. The result converges to the correct globally optimal solution even from an initial guess of zeros for the trajectories and the lower bounds as the initial guess for the parameters, but it also works without the noise. It isn't yet clear to me that this reaches the global solution any more reliably than what we have. It is worth investigating more, as the paper claims adding the noise improves things. I also need to confirm that what I'm doing is the same as what Huawei did.
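A rough sketch of how the noise can enter via known_trajectory_map (the mass-spring-damper model and all symbol names here are illustrative, not the actual example code): a noise function is appended to the equations of motion as a specified exogenous input and its sampled trajectory is supplied as a known trajectory.

```python
import numpy as np
import sympy as sm

# Illustrative: one trial of a mass-spring-damper with additive
# process noise e(t) entering the dynamics as a specified input.
t = sm.symbols('t')
x, v, e = [sm.Function(s)(t) for s in ('x', 'v', 'e')]
m, c, k = sm.symbols('m, c, k')  # c and k would be the unknowns to identify

# First-order equations of motion with the noise term added to the
# force balance; e(t) is what known_trajectory_map will supply.
eom = sm.Matrix([
    x.diff(t) - v,
    m*v.diff(t) + c*v + k*x - e,
])

num_nodes = 101
rng = np.random.default_rng(42)
noise_scale = 0.5  # std. dev. of the fixed Gaussian process noise

# The sampled noise trajectory is fixed over the whole optimization;
# opty treats it like any other known exogenous input.
known_trajectory_map = {e: noise_scale*rng.standard_normal(num_nodes)}
```

This dictionary would then be passed to opty's Problem alongside the equations of motion.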

cc @chrismo-schmidt, @Peter230655

@Peter230655
Contributor

Huawei Wang has this paper:

"Identification of the human postural control system through stochastic trajectory optimization" https://doi.org/10.1016/j.jneumeth.2020.108580


Could you send me the complete paper? I am not allowed to download it, as I do not belong to a university.
Thanks!


@moorepants
Member Author

@HuaweiWang, if you have a moment, could you confirm that I'm doing the same thing you do in your paper? I've made a small modification to this example: https://opty.readthedocs.io/latest/examples/plot_non_contiguous_parameter_identification.html by adding process noise to the dynamics.

@Peter230655
Contributor

Huawei Wang has this paper:

"Identification of the human postural control system through stochastic trajectory optimization" https://doi.org/10.1016/j.jneumeth.2020.108580


Did you check whether Mr. Wang's noise approach will also recover the correct parameters if you add bias to your measurements?
I think I found a way to handle bias and made a PR, but Mr. Wang's idea may be simpler. Of course I will test it.

@moorepants
Member Author

I saw your bias PR, but I'm not sure having biased measurements is so common (at least not large bias relative to the measurements).

@Peter230655
Contributor

I saw your bias PR, but I'm not sure having biased measurements is so common (at least not large bias relative to the measurements).

I agree, such large biases as I tested would make a measurement instrument useless. I just thought 'my' way would make the identification insensitive to some bias in the measurements. I will test whether Mr. Wang's method gives similar results.

@moorepants
Member Author

Do you think your bias addition reduces the chance the optimizer finds a local minimum?

@Peter230655
Contributor

Do you think your bias addition reduces the chance the optimizer finds a local minimum?

I did not find this in my tests, and I ran quite a few. They always recovered the correct parameters, regardless of how much bias I added. The accuracy seemed in line with the simulation without bias, with errors of maybe 10% of the correct values.
Of course, I have no theoretical argument for whether this should be so or not.
Right now I am setting up a simulation to see how Wang's method works.

@Peter230655
Contributor

My VERY initial findings with Wang's method are:

  • it does not work with (unrealistically) large bias; I am pretty sure of this.
  • it does not seem to help without bias. With a large noise_scale (I use the term you used) it seems to hurt.

I use my model with a spring coefficient, damping, and friction. The spring coefficient always seems to be estimated best.
Do you think this simple spring-damper-friction mass system is too simple a system to test these methods? If so, should I try a more complicated system, say with some rigid bodies involved?

@moorepants
Member Author

Do you think, this simple spring_damper_friction mass system is too simple a system to test these methods?

If we can't get these things to work well with a simple system, I don't see how jumping to a complex system will help us see the forest for the trees.

@Peter230655
Contributor

Do you think, this simple spring_damper_friction mass system is too simple a system to test these methods?

If we can't get these things to work well with a simple system, I don't see how jumping to a complex system will help us see the forest for the trees.

I just thought that maybe this simple system only has one minimum for (c, k, friction), so Wang's method cannot help it jump out of a local minimum to a lower minimum.

@Peter230655
Contributor

Peter230655 commented Sep 24, 2024

A test showed these results; the % numbers are the errors:

noise scale       0.0     1.0     2.0
damping           6.9%    9.7%    6.6%
spring            0.01%   0.31%   0.64%
friction          10.0%   20.7%   24.8%

So, at least in this simulation, barring programming errors, the addition of noise does not seem to help.

@moorepants
Member Author

If you read the paper, do you think I'm implementing it correctly? It seems too simple to be true.

@moorepants
Member Author

I just thought that maybe this simple system only has one minimum for (c, k, friction), so Wang's method cannot help it jump out of a local minimum to a lower minimum.

Without the bounds and good guesses, I was regularly landing in local minima when we originally were playing around with the problem.

@Peter230655
Contributor

Peter230655 commented Sep 24, 2024

If you read the paper, do you think I'm implementing it correctly? It seems too simple to be true.

I will. I implemented it the same way you did, in my slightly more complicated model with friction, and my maybe 15–20 test runs showed no advantage; it actually got worse with a larger noise_scale.
Eq. (5) on page 11 looks exactly like what you did, except I have to find out what is meant by episode s. Is this the s-th measurement of the same thing, as I guess you assumed? Also, it says that $\theta$ is part of x(t), and he only optimizes that part, like you optimize x(t) only, not $\dot x(t)$.
Anyway, let me read it; hopefully I will understand it better.

@moorepants
Member Author

Here is Huawei's code: https://github.com/Huawei-ClevelandStateUniv/Stochastic_Paper/blob/master/controller_identification/StandingModel_PD_Eps_Noise_SameInit.py#L306 It seems like the noise is added just like we are doing.

@Peter230655
Contributor

Peter230655 commented Sep 24, 2024

From figure 1 on page 7 it is clear what $\theta_a, \theta_h$ are.
The program also has omega_a, omega_h; I am not sure what they are.

We are adding the noise as if it were a random_force(t) acting on the system.
Is he doing the same? I am not very clear about his eoms.

@moorepants
Member Author

I would guess that omega_a = theta_a.diff(t).

@Peter230655
Contributor

Peter230655 commented Sep 24, 2024

If I understood correctly, it is not exactly the same as what you are doing:

  • You want to find the values of parameters, just like he wants to find the $K_{i,j}$ parameters of eq. (1) or eq. (2). However, he has only one set of measurements ($\theta_m$), while we assume we have different sets of measurements.
  • He creates M sets of eoms by adding noise to each one. If his $f_i(x(t), \dot x(t), K_{i,j}, a) = fr + fr^{\star}$, then it is the same as what you did. (It is unclear what a is; page 10, where it may be explained, is unreadable.)

If I understood correctly, it is a different setup.
I have forgotten too much control theory to remember how to find out whether a controller like eq. (1) or eq. (2) is stable.

@moorepants
Member Author

Are you saying he fits multiple models to one set of measurements, with each model having different noise? I think I was interpreting it as multiple measurements (his episodes being multiple measurements).

@Peter230655
Contributor

Peter230655 commented Sep 24, 2024

Are you saying he fits multiple models to one set of measurements, with each model having different noise? I think I was interpreting it as multiple measurements (his episodes being multiple measurements).

This is how I would read the very top of page 11. I guess by process noise signal he means the $n_s(0, \sigma)$.
Also, $\theta_m$ has no index to indicate different measurements.
In his objective function he sums over s, and s occurs only in $n_s(0, \sigma)$. It is hard to believe that $\theta_m$ has an index s hidden somewhere.

@Peter230655
Contributor

I would guess that omega_a = theta_a.diff(t).

I wonder why he did not use sympy mechanics to set up the eoms.

@moorepants
Member Author

moorepants commented Sep 24, 2024

I think he likely did.

@Peter230655
Contributor

I think he likely did.

Makes sense. From the function you sent, I could not tell.

@moorepants
Member Author

In equation 5:
[screenshot of equation (5) from the paper]

I would think that the sum over M in the objective would somehow indicate that this is a sum over the same measurements, i.e. the argument of the sum would not vary with s.

@moorepants
Member Author

Ah, I see what you are saying. The $\theta_m(t)$ in the objective does not have an $s$ subscript. So it is the same for all M episodes.

@Peter230655
Contributor

Ah, I see what you are saying. The $\theta_m(t)$ in the objective does not have an $s$ subscript. So it is the same for all M episodes.

Yes, this is what I think.

@Peter230655
Contributor

Peter230655 commented Sep 24, 2024

If his $f_i(.....) = fr + fr^{\star}$, then he subjects the controller to random forces in addition to the measured one.
Intuitively this should help to get a controller which can handle such disturbances better, i.e. one that is more stable.

@tvdbogert
Member

Yes, each episode s uses the same measurements in the objective, but each episode has its own trajectory $\theta(t)$ which is indexed with s. All M trajectories are optimized simultaneously.
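Writing this reading out explicitly (our interpretation assembled from the discussion above, not a verbatim copy of the paper's eq. (5)):

```latex
\min_{K,\; x_1(t), \ldots, x_M(t)}\;
\sum_{s=1}^{M} \int_0^T \bigl(\theta_s(t) - \theta_m(t)\bigr)^2 \, \mathrm{d}t
\qquad \text{subject to} \qquad
f\bigl(x_s(t), \dot{x}_s(t), K\bigr) = n_s(0, \sigma), \quad s = 1, \ldots, M
```

where $\theta_m(t)$ is the single measurement shared by all $M$ episodes, $\theta_s(t)$ is the part of episode $s$'s state $x_s(t)$ compared against it, $K$ are the shared controller gains, and $n_s(0, \sigma)$ is the fixed Gaussian process noise of episode $s$.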

@moorepants
Member Author

I just pushed another change that creates two independent measurements and three episodes for each of the two measurements. This also works.

@Peter230655
Contributor

If I understand correctly, these $n_s(0, \sigma)$ are the next best thing to having different measurements.
That would make sense intuitively.
Still one question: I tried to estimate 3 parameters with 6 measurements. When I added $n_s(0, \sigma)$, the results got worse.

@moorepants
Member Author

moorepants commented Sep 24, 2024

On this problem, with a different random seed on each execution of the script, about 1 in 6 runs does not converge to the global optimum.

@moorepants
Member Author

But that is with an initial guess of all zeros.

@moorepants
Member Author

Nice thing is that opty supports this as is! The only negative is the repetition in the equations of motion, which may cause some slowdown for very large systems of equations of motion.

@Peter230655
Contributor

Nice thing is that opty supports this as is! The only negative is the repetition in the equations of motion, which may cause some slowdown for very large systems of equations of motion.

This repetition of eoms is the same with Wang's method, is it not?
Broadly speaking: is this adding of noise the next best thing to having various measurements?

@moorepants
Member Author

@tvdbogert thanks for the tip! Nice idea you all came up with. We are doing some bicycle control experiments right now and amassing a bunch of experimental data to identify the rider's heading feedback control for use in multi-actor traffic simulations. We are hitting the same issues you and Huawei came upon (unstable solutions, not always global optimum). This may help us home in on consistent controller parameter identification.

@moorepants
Member Author

@Peter230655 you've got sharper eyes than I do! I blazed over that missing subscript s!

@moorepants
Member Author

Broadly speaking: is this adding of noise the next best thing to having various measurements?

I think there are two separable concepts. Firstly, at least for the experiments we do in my lab, we collect data from many trials of the same person controlling a bicycle. So I have lots of noncontiguous measurements where the rider is roughly using the same internal control algorithm. I want to know the best-fitting control parameters for all of the noncontiguous measurements we take. Your example solved that.

Secondly, we need to get reliable global optima from real measurements where we don't know what the optimal parameters are. Wang's paper helps with that.

But if you are stuck with only a single (hopefully long) measurement, then Wang's method helps you get to the global optimum too.

So I don't think the addition of process noise is exactly equivalent, but it does seem to have some similar effects. I suppose having many noncontiguous measurements could be sufficient for getting to a global optimum, and the process noise may not be needed. But using both might give the best of both solutions.

@Peter230655
Contributor

@Peter230655 you've got sharper eyes than I do! I blazed over that missing subscript s!

They are old, but they have time! :-)

@Peter230655
Contributor

Peter230655 commented Sep 24, 2024

Broadly speaking: is this adding of noise the next best thing to having various measurements?

I think there two separable concepts. [...]

Makes sense.
I made the mistake this morning that, due to np.random.seed(...), I added the same $n_s(0, \sigma)$ to all equations.
I have now corrected this, but in my example (as many measurements as $n_s(0, \sigma)$) it still makes things worse. My objective function likely was also not correct.
I will change the code tomorrow so that these numbers are picked independently and see.
This was a REALLY interesting day for me! :-)

Dumb question: say I have x measurements and I want to add y noises.
So I set up y eoms, and each of the y eoms is minimized against all x measurements?
Wang set up all y noisy equations against his one measurement, so this would just be an extension to x measurements (?)

@moorepants
Member Author

This was a REALLY interesting day for me! :-)

Yes, fun!

Dumb question: say I have x measurements and I want to add y noises.
So I set up y eoms, and each of the y eoms is minimized against all x measurements?
Wang set up all y noisy equations against his one measurement, so this would just be an extension to x measurements (?)

The way I have it now is that I have a dynamical differential equation for each measurement and for each "episode" of that measurement. So if I have 10 noncontiguous measurements (where a measurement is a subset of all, or all, state trajectories) and I want to do 5 episodes, then I need 50 equations of motion.
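That replication could be sketched like this (a toy mass-spring-damper stand-in with made-up symbol names, not the actual example): fresh state and noise functions for every (measurement, episode) pair, with the unknown parameters shared across all of them.

```python
import sympy as sm

t = sm.symbols('t')
m, c, k = sm.symbols('m, c, k')  # shared unknown parameters

n_measurements = 10  # noncontiguous trials
n_episodes = 5       # noise realizations per trial

eoms = []
noise_functions = []
for i in range(n_measurements):
    for j in range(n_episodes):
        # Fresh state variables and a fresh noise input for every
        # (measurement, episode) pair; the parameters stay shared.
        x = sm.Function(f'x_{i}_{j}')(t)
        v = sm.Function(f'v_{i}_{j}')(t)
        e = sm.Function(f'e_{i}_{j}')(t)
        eoms.extend([x.diff(t) - v,
                     m*v.diff(t) + c*v + k*x - e])
        noise_functions.append(e)

# 10 measurements x 5 episodes = 50 copies of the second-order system,
# i.e. 100 first-order equations.
full_eom = sm.Matrix(eoms)
```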

@Peter230655
Contributor


The way I now have this one is that I have a dynamical differential equation for each measurement and for each "episode" of that measurement. So if I have 10 noncontiguous measurements (where measurements = subset of all or all state trajectories) and then I want to do 5 episodes then I need 50 equations of motion.

I will change my simulation accordingly tomorrow.
I was afraid, that this would be the answer...! :-)

@Peter230655
Contributor

Peter230655 commented Sep 25, 2024

I am trying my simulation with a variable number of measurements and a variable number of repeats per measurement.
Hopefully I will have some results by this evening.

@Peter230655
Contributor

I made PR 253, but in examples-gallery; I did not know where else to put it.

@Peter230655
Contributor

Yes, each episode s uses the same measurements in the objective, but each episode has its own trajectory \theta(t) which is indexed with s. All M trajectories are optimized simultaneously.

Question:
Are these measurements publicly available?
If yes, is there a way I could get them? Just something to play around with in my retirement.
Thanks!

@moorepants
Member Author

Huawei released the data publicly for his paper. Is that the data you are asking about?

@Peter230655
Contributor

Huawei released the data publicly for his paper. Is that the data you are asking about?

Yes, this is what I meant. I just want to play around, using opty to find the $K_{i,j}$. As always with me, nothing serious, important, or urgent. How would I get these data?
Thanks!

@moorepants
Member Author

You'll have to search for the data. I think I saw that it was available (maybe in his dissertation). Also, the Park2004 model in this repository already implements his entire model except for the process noise part. That's why I opened #252. It uses generated data.

@moorepants
Member Author

I also don't intend to merge this example or mine, because it is just a duplicate of what is there. We should have a separate different example to show how to do Wang's method.

@Peter230655
Contributor

I also don't intend to merge this example or mine, because it is just a duplicate of what is there. We should have a separate different example to show how to do Wang's method.

I fully agree! I enjoy doing these simulations, but there is no point in adding something which is already there. I am not the least bit offended!
Should I close the PR to avoid too much 'dead wood' in examples-gallery?

@moorepants
Member Author

You can leave it open while you work on it. Eventually I'll close both.

@Peter230655
Contributor

Peter230655 commented Sep 26, 2024

You can leave it open while you work on it. Eventually I'll close both.

I am finished with both in their present stage. I am thinking about a simple model of that person standing on a treadmill, but at an initial stage only. I think 249 serves no practical purpose: it converges even with very large (unrealistic) biases, but very slowly.
I'd like to use 253 to play around with a bit longer.

@moorepants
Member Author

I am thinking about a simple model of that person standing on a treadmill

This is already implemented in examples/Park2004.

@Peter230655
Contributor

I am thinking about a simple model of that person standing on a treadmill

This is already implemented in examples/Park2004.

I saw it, but it did not look 'simple' to me; surely it is realistic, though.

@chrismo-schmidt

Hi all,

@tvdbogert thanks for the tip! Nice idea you all came up with. We are doing some bicycle control experiments right now and amassing a bunch of experimental data to identify the rider's heading feedback control for use in multi-actor traffic simulations. We are hitting the same issues you and Huawei came upon (unstable solutions, not always global optimum). This may help us home in on consistent controller parameter identification

As @moorepants already mentioned, I am working on a system id problem that occasionally has issues finding stable results. So I gave this approach a try with my more complex problem and real data.

Here is what I did:

  • Repeat the eoms, introducing new state variables for each eom repetition while keeping the unknown system parameters shared. Similarly, repeat all bounds and constraints that correspond to the repeated states.
  • Add a noise symbol to each highest derivative. For every repetition I introduced new noise symbols.
  • For every noise symbol, sample a noise trajectory and include them in the known_trajectory_map. Each sample is i.i.d. from a normal distribution, with sigma set as a fraction of the difference between the data-derived upper and lower bounds of the measurements of the corresponding state.
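The third step could be sketched like this (the function and argument names are my own, not from the actual code): per-repetition i.i.d. Gaussian noise with sigma scaled to the data-derived range of each measured state.

```python
import numpy as np

def sample_noise_trajectories(measurements, n_repetitions, level, seed=0):
    """Sample i.i.d. Gaussian noise per repeated state, with sigma set to
    `level` times the data-derived range of that state's measurement.

    measurements: dict mapping state name -> 1D array of measured values.
    Returns a dict mapping (state name, repetition index) -> noise array,
    ready to be turned into a known_trajectory_map.
    """
    rng = np.random.default_rng(seed)
    noise = {}
    for name, data in measurements.items():
        # sigma as a fraction of the measured upper-lower bound difference
        sigma = level*(data.max() - data.min())
        for rep in range(n_repetitions):
            noise[(name, rep)] = sigma*rng.standard_normal(len(data))
    return noise
```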

As was discussed here previously, I believe that this corresponds to what Wang and van den Bogert do in their paper. With this setup, I selected a data sample and compared the results of optimization with and without process noise. For each case, I ran the same 100 different initial guesses, which were sampled from a wide spectrum of system parameters that result in a stable system. I tried 3 repetitions. Here are the results:

| process_noise_level | # stable sols. | obj. val. of best stable sol. | # sols. converging to best stable sol. |
| ------------------- | -------------- | ----------------------------- | -------------------------------------- |
| 0.0                 | 60 (60.0 %)    | 3.4541E-05                    | 60 (60.0 %)                            |
| 0.05                | 86 (86.0 %)    | 9.1467E-05                    | 28 (28.0 %)                            |

So indeed, the optimisation is more successful at finding stable results. However, the added process noise seems to make it more difficult to find what appears to be the global optimum. The best solutions of the two scenarios are not identical but very close. Dividing the objective value of the process-noise case by the number of repetitions, it is actually even a little better than the case without process noise. However, adding process noise seems to make it much harder to reach the global optimum from any given initial guess: only 28 initial guesses end up at the optimum under process noise, while 60 end up there without it.

I don't think one can draw too many conclusions from this one sample, but it definitely shows a case where adding process noise significantly increases the success at finding stable solutions. I find it quite interesting, though, that fewer initial guesses find the best optimum. It seems that the noise somehow makes it harder for the optimization to perform large jumps through the parameter space, in my case.

The slightly better objective value of the process-noise case seems too marginal to draw any generalizing conclusions from.

@moorepants
Member Author

@chrismo-schmidt once you find one of these optima that are close to global with the noise model, can you then solve once more with the non-noise model using that as the initial guess, thus converging to the true optimum?
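A toy illustration of that two-stage idea (a scipy double-well stand-in, nothing to do with the actual bicycle identification problem): first minimize a noise-averaged version of a multimodal objective, then warm-start the clean objective from its solution.

```python
import numpy as np
from scipy.optimize import minimize

def clean_obj(p):
    # Illustrative multimodal objective with two wells.
    x = p[0]
    return (x**2 - 1.0)**2 + 0.3*x

def noisy_obj(p, noise):
    # Average the objective over several fixed perturbations of the
    # decision variable: a crude analogue of per-episode process noise.
    return np.mean([clean_obj(p + d) for d in noise])

rng = np.random.default_rng(1)
noise = 0.3*rng.standard_normal((5, 1))  # fixed noise realizations

# Stage 1: optimize the noise-averaged problem from some initial guess.
stage1 = minimize(noisy_obj, x0=[0.9], args=(noise,))

# Stage 2: warm-start the clean problem from the stage-1 solution.
stage2 = minimize(clean_obj, x0=stage1.x)
```

The stage-2 solve can only improve (or keep) the clean objective value relative to its warm-start point, which is the appeal of the two-stage scheme.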
