
Code generated DDP with free flyer model in contact gives random result #917

Open
edantec opened this issue Feb 26, 2021 · 15 comments
Labels
bug Something isn't working


@edantec

edantec commented Feb 26, 2021

I am trying to build a code-generated DDP solver for the Talos model in contact, with a very simple cost function composed of a state regularization and a control regularization. This cost function is sufficient to stabilize Talos on its feet with the classical (non-code-generated) model, so I expect the code-generated version to behave the same way.

However, when I run one DDP iteration from a random initial guess, the results (cost decrease, gradient, etc.) are completely random and not repeatable. I compared the derivatives of the code-generated action model with those of the classical action model, and they are identical, so the error does not come from the derivatives.
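A derivative check like the one described above can be sketched as an entry-wise comparison with a tolerance. This is a minimal sketch, assuming the derivative blocks (e.g. Fx, Lxx) of each model have already been copied out of their Eigen matrices into plain row-major arrays; `DenseBlock`, `maxAbsDiff` and `blocksMatch` are hypothetical names, not Crocoddyl API. Note that it compares one model in isolation, so it cannot detect problems that only appear when many nodes are evaluated together.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical dense block in row-major order; in Crocoddyl the real blocks
// (Fx, Lxx, ...) are Eigen matrices, assumed here to be copied into vectors.
struct DenseBlock {
  std::size_t rows;
  std::size_t cols;
  std::vector<double> data;  // rows * cols entries
};

// Largest absolute entry-wise difference between two blocks.
// Returns +infinity if the shapes differ, so a mismatch never passes.
double maxAbsDiff(const DenseBlock& a, const DenseBlock& b) {
  if (a.rows != b.rows || a.cols != b.cols || a.data.size() != b.data.size()) {
    return std::numeric_limits<double>::infinity();
  }
  double worst = 0.0;
  for (std::size_t i = 0; i < a.data.size(); ++i) {
    worst = std::max(worst, std::fabs(a.data[i] - b.data[i]));
  }
  return worst;
}

// True when the code-generated block matches the classical one within tol.
bool blocksMatch(const DenseBlock& classical, const DenseBlock& codegen, double tol) {
  return maxAbsDiff(classical, codegen) <= tol;
}
```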

An example of how I build my DDP object is presented here: https://gitlab.laas.fr/proyan/mpc_controller/-/blob/master/src/hand_tracking_problem_full_codegen.cpp

@cmastalli
Member

> However when executing one iteration of the DDP over a random guess, the result is completely random (cost decrease, gradient, etc... all random) and not at all repeatable.

This is not clear. If the initial guess is random, then the solver iterations will differ from one run to the next, right?

You also need to be aware that the DDP solver is much less stable than the FDDP solver, which means it requires a "better" initial guess.

> An example of how I build my DDP object is presented here: https://gitlab.laas.fr/proyan/mpc_controller/-/blob/master/src/hand_tracking_problem_full_codegen.cpp

This code is closed source; if you want me to have a look, you need to give me read access :)

@cmastalli
Member

Have you tried the benchmark code? Perhaps you can reproduce your issue there.

@edantec
Author

edantec commented Feb 26, 2021

What I meant by random guess is that, for a fixed (randomly generated) initial guess, two repetitions of ddp.solve do not give the same cost decrease, whereas they do with the classical (non-code-generated) DDP.
Rohan will give you read access as soon as possible.
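The repeatability test described above can be sketched as follows. This assumes a hypothetical callback `solveOnce` that resets the solver to the given initial guess, runs it, and returns the final cost; it stands in for a call to ddp.solve, whose real signature is not reproduced here.

```cpp
#include <cmath>
#include <functional>
#include <vector>

// Hypothetical signature: resets the solver to `guess`, runs it, and returns
// the final cost. It stands in for a call to ddp.solve(...).
using SolveFn = std::function<double(const std::vector<double>&)>;

// Runs the solver twice from the same initial guess and reports whether both
// runs reached the same cost within `tol`. A deterministic solver evaluated
// on the same problem and guess should pass this check.
bool isRepeatable(const SolveFn& solveOnce, const std::vector<double>& guess, double tol) {
  const double first = solveOnce(guess);
  const double second = solveOnce(guess);
  return std::fabs(first - second) <= tol;
}
```

Running this once on the classical problem and once on the code-generated problem, with the same guess, separates "the guess is random" from "the solver output is not repeatable".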

@cmastalli
Member

> What I meant by random guess is that for a fixed (random) guess, two repetitions of a ddp.solve won't give the same decrease in cost as it is expected with classically generated DDP.

Thanks for the clarification.

Have you observed this issue in other solvers? Could you quickly try with FDDP and BoxFDDP?

@edantec
Author

edantec commented Feb 26, 2021

I have the same issue with DDP, FDDP, and BoxFDDP.

@edantec
Author

edantec commented Mar 1, 2021

Here is an example of how I write my code-generated problem: https://github.com/edantec/crocoddyl/blob/benchmark/benchmark/talos_cppad.cpp

However, I couldn't compile this branch on my machine because of a linker error: undefined reference to symbol 'dlclose@@GLIBC_2.2.5' / error adding symbols: DSO missing from command line

@proyan
Member

proyan commented Mar 1, 2021

You need to link against CMAKE_DL_LIBS in the benchmark. See benchmark/CMakeLists.txt for reference.

@edantec
Author

edantec commented Mar 2, 2021

I have added talos_cppad.cpp to SET(${PROJECT_NAME}_CODEGEN_BENCHMARK ) in benchmark/CMakeLists.txt, but I still get the same error.
On closer inspection, one iteration of the code-generated ddp.solve produces cost and dynamics derivatives (Lxx, Fx, etc.) equal to zero at seemingly random knots (usually 3 or 5 of them). I've tracked the error down to this line: https://github.com/edantec/crocoddyl/blob/b58ef3d7f661f0bd2f36bb5d55491fc8cad3bc06/include/crocoddyl/core/optctrl/shooting.hxx#L181
where calcDiff returns zero derivatives (Lxx, Fx, and so on) for some random knots.
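One way to locate the affected knots after calcDiff is to scan each knot's derivative blocks for entries that are all exactly zero. This is a sketch under assumptions: `KnotDerivatives`, `isAllZero` and `findZeroKnots` are hypothetical names, the struct holds flattened copies of Fx and Lxx, and it assumes a state-regularization cost with non-zero weights, for which neither block should vanish. An empty block is also reported as zero.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical per-knot record holding flattened copies of two derivative
// blocks; the names mirror Fx and Lxx, but the layout is assumed.
struct KnotDerivatives {
  std::vector<double> Fx;
  std::vector<double> Lxx;
};

// True if every entry of the block is exactly zero (also true for an empty block).
bool isAllZero(const std::vector<double>& block) {
  return std::all_of(block.begin(), block.end(), [](double v) { return v == 0.0; });
}

// Returns the indices of knots where Fx or Lxx is identically zero.
// Under the assumptions above, any index returned is worth inspecting.
std::vector<std::size_t> findZeroKnots(const std::vector<KnotDerivatives>& knots) {
  std::vector<std::size_t> suspicious;
  for (std::size_t k = 0; k < knots.size(); ++k) {
    if (isAllZero(knots[k].Fx) || isAllZero(knots[k].Lxx)) {
      suspicious.push_back(k);
    }
  }
  return suspicious;
}
```

Logging the returned indices over several runs shows whether the same knots are hit each time or whether they change between runs.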

@cmastalli
Member

Is there any update regarding this topic?

@edantec
Author

edantec commented Mar 17, 2021

Yes, there is.
I've simplified the problem to the 4-DoF Talos arm without contact, and I get the same randomness issue.
I was able to trace the error back to: https://github.com/edantec/crocoddyl/blob/2d62f18b06c233133f45062918f99385b6fd8438/include/crocoddyl/core/codegen/action-base.hpp#L212
After this line, d->calcDiffout contains some Fx, Lxx, etc. equal to 0, in what looks like a completely random pattern. I didn't dig further than that.

@proyan
Member

proyan commented Mar 17, 2021

Sorry @edantec, but I have not yet worked on your issue.

Code-gen bugs take time to solve, and unfortunately I haven't yet found the courage to dive into your implementation. I may get around to it, but I can't give you any timeline for my support.

@edantec
Author

edantec commented Mar 17, 2021

Just for illustration, I've written a simple example highlighting the problem here on a side branch: https://github.com/edantec/crocoddyl/blob/benchmark/benchmark/talos_cppad.cpp
When I run this benchmark several times, the cost computed by the classical DDP stays the same, whereas the CG DDP cost is random.

@cmastalli
Member

cmastalli commented Mar 17, 2021

You are sharing the same CG model across all the running models: https://github.com/edantec/crocoddyl/blob/benchmark/benchmark/talos_cppad.cpp#L129

Although I am not sure this is the source of the issue, could you create a separate CG model for each node?
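Why sharing one model instance across nodes can corrupt results can be sketched without Crocoddyl, assuming the model keeps its output in a buffer owned by the model object rather than in per-node data. `BufferedModel`, `makeNodeModels` and `firstEntries` are hypothetical; this illustrates the general aliasing hazard, not the actual implementation of Crocoddyl's code-generated model.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical model that writes its derivatives into a buffer owned by the
// model object itself, instead of into separate per-node data.
struct BufferedModel {
  std::vector<double> buffer;
  explicit BufferedModel(std::size_t n) : buffer(n, 0.0) {}

  // Overwrites the internal buffer with "derivatives" for input x and
  // returns a pointer to that buffer (not a copy).
  const std::vector<double>* calcDiff(double x) {
    for (std::size_t i = 0; i < buffer.size(); ++i) {
      buffer[i] = x * static_cast<double>(i + 1);
    }
    return &buffer;
  }
};

// Builds one model per node: either a single instance shared by all nodes,
// or a separate instance for each node.
std::vector<std::shared_ptr<BufferedModel>> makeNodeModels(std::size_t nodes, bool share) {
  std::vector<std::shared_ptr<BufferedModel>> models;
  auto common = std::make_shared<BufferedModel>(3);
  for (std::size_t k = 0; k < nodes; ++k) {
    models.push_back(share ? common : std::make_shared<BufferedModel>(3));
  }
  return models;
}

// Evaluates every node first (as a sweep over the horizon would), then reads
// back the first derivative entry that each node ends up seeing.
std::vector<double> firstEntries(const std::vector<std::shared_ptr<BufferedModel>>& models,
                                 const std::vector<double>& xs) {
  std::vector<const std::vector<double>*> results;
  for (std::size_t k = 0; k < xs.size(); ++k) {
    results.push_back(models[k]->calcDiff(xs[k]));
  }
  std::vector<double> out;
  for (const std::vector<double>* r : results) {
    out.push_back((*r)[0]);
  }
  return out;
}
```

With a shared instance, every node reads the values written by the last node evaluated; with one instance per node, each node keeps its own values. If the nodes were evaluated concurrently, which node writes last could also change from run to run, which would show up as non-repeatable results.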

@edantec
Author

edantec commented Mar 18, 2021

I've created a separate CG model for each node:
https://github.com/edantec/crocoddyl/blob/9fd6dbe2d87a8327efd7cc1441134c48e273b741/benchmark/talos_cppad.cpp#L176
Now the result of the codegen DDP is no longer random, but it still differs from the result of the classical DDP.
This is progress, but something is still off.

@cmastalli
Member

@edantec, I wonder if this issue still occurs in the latest version of Crocoddyl. I have made a few fixes that could have sorted this out, e.g., #1165.

Could you try this again? If the results are different, then please share logs of both cases: with and without codegen.

@cmastalli added the bug (Something isn't working) label on Dec 10, 2023

3 participants