Skip to content

redpanda3/rethink-about-nvdla

Folders and files

NameName
Last commit message
Last commit date

Latest commit

5878026 · Jul 17, 2024

History

3 Commits
Jul 17, 2024

Repository files navigation

It's been 7 years since NVDLA was published, here are some common issues collected from NVDLA IP users in the industry:

Hard to Use from the Software Perspective -- from Car Factory Side

NVDLA requires the CPU side to prepare the data(a shaped image, a layer to be sent) and send it to memory, most of the work needs to be done by the CPU side. Although NVDLA has the advantage of DSA low power, engineers in the car industry prefer to use shader cores or SIMT, rather than NPU.

Low Performance on the Server Side in Scalability -- from Internet Company

NVDLA Sequence Controller, which is the CSC module, 'locks up' every step to calculate. There is no way to escalate the power of calculation, maybe in the NVDLA private version, switching NHWC to NCHW might help, but none of us from the open-source world wants to modify it from cmod to rtl to dv, that costs too much.

Costly to Build Up an NVDLA-based SOC -- from Startup Company

Although in theory, you can build up an NVDLA-based SOC without a CPU to maintain your business in some specific detection. But in reality, you need a CPU to configure each of the registers in NVDLA, and also, you will need a memory controller IP(DDR, etc...) to store the data. Mostly, I saw people use NVDLA in FPGA for demo use, but there is still a huge gap between FPGA and ASIC(peripheral IP, CPU).

Adios to the NVDLA

I tried to expand NVDLA to common algorithms in self-driving car applications in 2020 with my company, NProcessor(not existing anymore) and failed(in software stack, in scalability, in ASIC proof, more importantly, in my awful CEO strategy). During those years, I saw people still interested in this design, so I continued to maintain the soDLA(a chisel version of NVDLA) during my leisure time, adding more integration and verifications to it. Maybe many years later, people will say, "I like its ping-pong buffer" or "I like the completeness of cmodel, vmodel and virtual platform", but you'll admit that NVDLA's glory is gone.

What's Next

Will it be possible for NVDLA to switch to a software-friendly, low-end-process-friendly, scalability-enabled IP?

Software-friendly enough, so it can run mainstream framework.

Small enough, so it can be easily integrated into an M2 type of FPGA or into the efabless/openihp shuttle.

The final one is scalability-enabled, I don't know how to achieve that without an NOC-based design, welcome to discuss.

Finally

I'm glad to hear your opinion, submit an issue so that I can merge your thoughts.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published