Feature Request: Use direct_io for model load and inference #11912

jagusztinl · 2025-02-16T17:01:40Z

I am running the latest code. Mention the version if possible as well.
I carefully followed the README.md.
I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
I reviewed the Discussions, and have a new and useful enhancement to share.

According to this article, using unbuffered reads could speed up model load and large-model inference on memory-constrained servers: https://forum.level1techs.com/t/deepseek-deep-dive-r1-at-home/225826

Model load times can be significant for big models like R1.

Use the O_DIRECT flag when opening the model file for loading.
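To illustrate what I mean by an unbuffered read, here is a minimal Python sketch (not llama.cpp code, which is C/C++). It assumes Linux, where `os.O_DIRECT` exists, and assumes a 4096-byte alignment is sufficient for the underlying device. The names `read_file_direct`, `_read_all`, and `BLOCK` are hypothetical and only for illustration. O_DIRECT needs an aligned buffer, an aligned file offset, and an aligned read length, so the sketch uses an anonymous `mmap` (page-aligned) as the buffer and falls back to ordinary buffered reads if the filesystem rejects O_DIRECT.

```python
import mmap
import os

# Assumed alignment; real code should query the device's logical block size.
BLOCK = 4096


def _read_all(fd, chunk_size):
    """Read fd to EOF in chunk_size pieces through a page-aligned buffer."""
    buf = mmap.mmap(-1, chunk_size)  # anonymous mapping, page-aligned
    out = bytearray()
    try:
        while True:
            n = os.readv(fd, [buf])
            out += buf[:n]
            # For a regular file a short read means EOF; stopping here also
            # avoids issuing another read at an unaligned offset.
            if n < chunk_size:
                return bytes(out)
    finally:
        buf.close()


def read_file_direct(path, chunk_size=1 << 20):
    """Read a whole file, bypassing the page cache when O_DIRECT works."""
    if chunk_size <= 0 or chunk_size % BLOCK:
        raise ValueError("chunk_size must be a positive multiple of BLOCK")
    direct = getattr(os, "O_DIRECT", 0)  # 0 on platforms without O_DIRECT
    # Try O_DIRECT first, then fall back to a normal buffered read.
    for flags in (os.O_RDONLY | direct, os.O_RDONLY):
        try:
            fd = os.open(path, flags)
        except OSError:
            continue
        try:
            return _read_all(fd, chunk_size)
        except OSError:
            continue  # e.g. EINVAL from a filesystem without direct I/O
        finally:
            os.close(fd)
    raise OSError("could not read " + path)
```

In a real loader the data would be read straight into the destination tensor buffers rather than accumulated in a `bytearray`; the sketch only shows the open flags and the alignment handling.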

jagusztinl added the enhancement New feature or request label Feb 16, 2025
