Hi @nepluno,

First, thanks a lot for sharing your code with the world!
I would like to use your GPU implementation of L-BFGS-B for my master thesis project. One of the applications will be a large-scale (L2-)norm approximation with non-negativity constraints on the optimization variables, i.e.
$$
\min_{x} \ \frac{1}{m} \|A x - b\|_2^2 \quad \text{subject to } x \geq 0
$$
where $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$ and $x \in \mathbb{R}^n$.
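For reference, here is a minimal CPU-side sketch of the objective and gradient such an example would evaluate; the gradient is $\nabla f(x) = \frac{2}{m} A^\top (Ax - b)$. Eigen is used purely for illustration and is not part of this repository, and the actual example in the fork may be structured differently.

```cpp
#include <Eigen/Dense>

// Objective f(x) = (1/m) * ||A x - b||_2^2 and its gradient
// grad f(x) = (2/m) * A^T (A x - b).
// Illustrative sketch only; not the repository's callback interface.
double objective_and_gradient(const Eigen::MatrixXd& A,
                              const Eigen::VectorXd& b,
                              const Eigen::VectorXd& x,
                              Eigen::VectorXd& grad) {
  const double m = static_cast<double>(A.rows());
  const Eigen::VectorXd r = A * x - b;      // residual A x - b
  grad = (2.0 / m) * (A.transpose() * r);   // gradient
  return r.squaredNorm() / m;               // objective value
}
```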
To get familiar with the code I created an example for such a problem with $m=n=1000$. While everything works very well with the CPU version, the GPU version often jumps to very large or NaN values and terminates after only a few iterations without having reached a value close to a minimum. Removing the box constraints leads to both versions doing the job very well.
I observed similar results (very fast termination of the GPU version without getting close to the minimum) for other problems of this type.
My question is: Do you think this is a limitation of your GPU algorithm? (I thought this could be possible because I observed the problem only when box bounds are enabled, so maybe your adaptations for finding the GCP do not work for this problem.) Is it a known issue?
If not: do you have an idea what to change in my example to make things work? (See the examples folder in my fork.)
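One way to quantify "not close to a minimum" under box constraints is the projected-gradient residual $\|P(x - \nabla f(x)) - x\|_\infty$, where $P$ clips to the feasible box. The sketch below (again plain C++/Eigen, assumed names, not the solver's own convergence test) shows how that check could be applied to the point the GPU solver returns:

```cpp
#include <Eigen/Dense>

// Projected-gradient residual || P(x - grad) - x ||_inf for the box l <= x <= u.
// At an exact constrained minimizer this is zero; a large value means the
// returned point is far from satisfying the first-order optimality conditions.
// Illustrative check only, not the termination criterion used by the solver.
double projected_gradient_residual(const Eigen::VectorXd& x,
                                   const Eigen::VectorXd& grad,
                                   const Eigen::VectorXd& l,
                                   const Eigen::VectorXd& u) {
  const Eigen::VectorXd step = x - grad;
  const Eigen::VectorXd projected = step.cwiseMax(l).cwiseMin(u);  // clip to the box
  return (projected - x).lpNorm<Eigen::Infinity>();
}
```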
My CPU is an Intel(R) Core(TM) i7-7820X, and the GPU is a GeForce GTX 1080 Ti with CUDA 11.7. I get the following output:
```
problem size: 1000, constrained, CPU, single precision
Summary:
Total time: 981.328 ms
Iteration time: 7.85062 ms / iteration
residual f: 4.29088e-09
residual g: 0.000298678
residual x: 1.18977e-05
iterations: 125
info: 1
CPU result 8.92826e-07

problem size: 1000, constrained, GPU, single precision
Summary:
Total time: 17.6567 ms
Iteration time: 1.35821 ms / iteration
residual f: nan
residual g: 0
residual x: nan
iterations: 13
info: 0
GPU result 0.639691

problem size: 1000, unconstrained, CPU, single precision
Summary:
Total time: 1507.06 ms
Iteration time: 7.49779 ms / iteration
residual f: 5.06273e-09
residual g: 9.37757e-05
residual x: 0.00013773
iterations: 201
info: 1
CPU result 2.74595e-07

problem size: 1000, unconstrained, GPU, single precision
Summary:
Total time: 138.902 ms
Iteration time: 0.775991 ms / iteration
residual f: 7.42523e-09
residual g: 0.000114142
residual x: 0.00048579
iterations: 179
info: 0
GPU result 1.28257e-06

Process finished with exit code 0
```
It would be really nice if you could take a look at this. Thanks!
Indeed, we haven't tested this on many optimization problems besides the ones in the paper, so it's very possible that the simplified version of the GCP causes divergence for the solver. Why and where exactly it produces a NaN may need more investigation, though.
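For context, the exact generalized Cauchy point (GCP) search of the reference L-BFGS-B algorithm (Byrd, Lu, Nocedal, Zhu) walks through the breakpoints of the projected-gradient path in increasing order. The sketch below shows only the breakpoint computation, to clarify what a simplified GPU variant would be approximating; it is an illustrative sketch under assumed names, not the repository's implementation.

```cpp
#include <Eigen/Dense>
#include <limits>

// Breakpoints of the projected-gradient path used by the exact GCP search:
// variable i hits its bound at step length t_i along the direction d = -g.
// The full algorithm then examines the resulting path segments in increasing
// order of t_i. Illustrative sketch only.
Eigen::VectorXd gcp_breakpoints(const Eigen::VectorXd& x,
                                const Eigen::VectorXd& g,
                                const Eigen::VectorXd& l,
                                const Eigen::VectorXd& u) {
  const double inf = std::numeric_limits<double>::infinity();
  Eigen::VectorXd t(x.size());
  for (Eigen::Index i = 0; i < x.size(); ++i) {
    if (g(i) < 0.0 && u(i) < inf)
      t(i) = (x(i) - u(i)) / g(i);   // moving up along -g, hits the upper bound
    else if (g(i) > 0.0 && l(i) > -inf)
      t(i) = (x(i) - l(i)) / g(i);   // moving down along -g, hits the lower bound
    else
      t(i) = inf;                    // never hits a bound along -g
  }
  return t;
}
```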