FPGA utilization? #13
I've read some papers about FPGA-based sparse direct solvers. As far as I know, these are all research works, and I've never seen FPGAs used successfully in a practical sparse direct solver. Even for mature GPUs with mature CUDA, there are very few uses in practical sparse direct solvers. I believe there are unsolved challenges in applying FPGAs to practical sparse LU factorization. Another important question is whether FPGAs can really be faster than CPUs for sparse LU factorization. From my experience with GPU-based sparse LU factorization, the answer is pessimistic. The results reported in papers are usually misleading: they are not end-to-end comparisons, and the baselines are not the fastest CPU implementations. FPGAs and GPUs are not good at handling irregular problems, and sparse LU factorization is a representative irregular problem. Though CKTSO has a GPU module, it is faster than the CPU module only for relatively dense matrices. I believe the situation is similar for FPGAs, or even worse. In fact, the performance bottleneck is memory access, not computation, and FPGAs have no special mechanism for handling irregular memory access. From this point of view, I believe CPUs with large caches are still the best choice for sparse LU factorization of circuit matrices.
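To illustrate the irregularity being described (this is a minimal sketch with hypothetical data, not code from CKTSO or any real solver): in a left-looking sparse LU, each column update scatters through data-dependent row indices, so the hardware sees no regular stride to prefetch.

```python
# Hypothetical CSC-style storage of one factored column L(:, j):
# the nonzero row indices are data-dependent, not a fixed stride.
l_rowidx = [2, 5, 9]
l_values = [0.5, -1.25, 2.0]

# Dense work vector holding the column currently being factored.
x = [0.0] * 12
x[2], x[5], x[9] = 4.0, 8.0, -2.0
pivot = 2.0

# Left-looking update: x[i] -= L[i, j] * pivot for each nonzero row i.
# Which cache lines get touched is decided by the index list at
# runtime -- this is the irregular memory access pattern that large
# CPU caches absorb better than FPGA or GPU memory systems.
for i, lij in zip(l_rowidx, l_values):
    x[i] -= lij * pivot

print(x[2], x[5], x[9])  # 3.0 10.5 -6.0
```

The arithmetic per nonzero is trivial (one multiply-subtract); nearly all the cost is in the indexed loads and stores, which is why the bottleneck is memory access rather than computation.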
Thank you for the answer. My colleague has now directed me to an earlier paper that includes you as a co-author: FPGA Accelerated Parallel Sparse Matrix Factorization for Circuit Simulations.
Yes, I was involved in that early paper; I contributed some parallelism ideas. You can see that we only tested very few cases, and they showed only a 2-3X speedup against KLU. Current CKTSO can easily achieve this speedup even using a single thread. From an academic research point of view, trying new hardware architectures to show how to optimize algorithm implementations for reconfigurable or massively parallel hardware has scientific significance. But for practical usage, the only issue is absolute performance: if a GPU or FPGA solver cannot be faster than a CPU, it has no practical significance. I believe there are also many other practical issues, rather than scientific problems, that need to be solved to really achieve higher performance than CPU solvers. If FPGAs can be faster for relatively dense matrices, that is also good.
I know this may sound like too much to hope for in the near future, but have you considered utilizing cloud FPGA services to achieve more parallel speedup? Do you have any experience in this field?
I've recently read a paper by Tarek Nechma who claims to have had success with it, though on local FPGA hardware.
Thank you for any answer or hint.