
GPU implementation failing for simple constrained norm approximation problem #9

Open
Osburg opened this issue Mar 1, 2023 · 2 comments


Osburg commented Mar 1, 2023

Hi @nepluno,

First, thanks a lot for sharing your code with the world!
I would like to use your GPU implementation of L-BFGS-B for my master's thesis project. One of the applications will be a large-scale (L2-)norm approximation with non-negativity constraints on the optimization variables, i.e.

$$ \text{min } \frac{1}{m} ||A x - b||_2^2 $$

$$ \text{subject to } x \geq 0 $$

where $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$ and $x \in \mathbb{R}^n$.
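For reference, this problem class can be reproduced on the CPU with SciPy's L-BFGS-B implementation. This is a sketch independent of the repository's CUDA code; the small size $m = n = 50$ and the random data are illustrative assumptions, not the exact setup from the example below:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
m = n = 50  # small instance; the issue uses m = n = 1000
A = rng.standard_normal((m, n))
x_true = np.maximum(rng.standard_normal(n), 0.0)  # non-negative ground truth
b = A @ x_true                                    # so the minimum value is 0

def fun(x):
    # f(x) = (1/m) * ||A x - b||_2^2
    r = A @ x - b
    return (r @ r) / m

def grad(x):
    # gradient: (2/m) * A^T (A x - b)
    return (2.0 / m) * (A.T @ (A @ x - b))

res = minimize(fun, np.zeros(n), jac=grad, method="L-BFGS-B",
               bounds=[(0.0, None)] * n)  # x >= 0 elementwise
print(res.fun)  # near zero, since b = A @ x_true is attainable
```

With the analytic gradient and box bounds, the CPU solver drives the objective close to zero, matching the constrained-CPU behavior reported below.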
To get familiar with the code I created an example for such a problem with $m=n=1000$. While everything works very well with the CPU version, the GPU version often jumps to very large or NaN values and terminates after only a few iterations without having reached a value close to a minimum. Removing the box constraints leads to both versions doing the job very well.
Similar results (very fast termination without getting close to the minimum for the GPU version) can be observed for other problems of this type.

My question is: Do you think this is a limitation of your GPU algorithm? (I thought this could be possible, because I observed the problem only when box bounds are enabled. So maybe your adaptations for finding the GCP do not work for this problem?) Is it a known issue?
If not: Do you have an idea what to change in my example to make things work? (See the examples folder in my fork.)

My CPU is an Intel(R) Core(TM) i7-7820X, the GPU is a GeForce GTX 1080 Ti with CUDA 11.7. I get the following output:

problem size: 1000, constrained, CPU, single precision

Summary: 
Total time: 981.328 ms
Iteration time: 7.85062 ms / iteration
residual f: 4.29088e-09
residual g: 0.000298678
residual x: 1.18977e-05
iterations: 125
info: 1
CPU result 8.92826e-07

problem size: 1000, constrained, GPU, single precision

Summary: 
Total time: 17.6567 ms
Iteration time: 1.35821 ms / iteration
residual f: nan
residual g: 0
residual x: nan
iterations: 13
info: 0
GPU result 0.639691

problem size: 1000, unconstrained, CPU, single precision

Summary: 
Total time: 1507.06 ms
Iteration time: 7.49779 ms / iteration
residual f: 5.06273e-09
residual g: 9.37757e-05
residual x: 0.00013773
iterations: 201
info: 1
CPU result 2.74595e-07

problem size: 1000, unconstrained, GPU, single precision

Summary: 
Total time: 138.902 ms
Iteration time: 0.775991 ms / iteration
residual f: 7.42523e-09
residual g: 0.000114142
residual x: 0.00048579
iterations: 179
info: 0
GPU result 1.28257e-06

Process finished with exit code 0

It would be really nice if you could take a look at this. Thanks!

@raymondyfei (Owner) commented

Indeed, we haven't tested this on many optimization problems besides the ones in the paper, so it's very possible the simplified version of the GCP computation causes divergence in the solver. Why and where exactly it produces a NaN may need more investigation, though.


Osburg commented May 13, 2024

Hi @nepluno,
This is good to know! Thanks for your answer :)

cheers
Aaron
