My use case is processing many files in a CPU-heavy workload. I glob the files and par_bridge the result into a CPU-intensive task, but that task first has to load the file, which is IO-heavy rather than CPU-heavy.
Is it possible to use another thread to read the files into memory first, and only then par_iter over them?
The idea is that the buffering part reads files into memory and keeps an extra set of items ready for the CPU-intensive function, polars_process, so that it never wastes CPU time doing IO.
If I understand you correctly, you would like to read in data with twice as many threads as you use to process it? What do you think about using two separate pools to do that, i.e. roughly
(I have not even compiled this; the code is only meant to illustrate using two thread pools to explicitly implement the two-IO-threads-per-CPU-thread approach.)
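For readers who want something concrete, here is a minimal sketch of the two-stage idea. It is not the code from the comment above: it swaps the two rayon pools for plain `std::thread` workers and a bounded `std::sync::mpsc::sync_channel`, so that it builds without external crates. The function name `process_files`, its parameters, and the `process` closure (standing in for `polars_process`) are all hypothetical.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;
use std::sync::mpsc::sync_channel;
use std::sync::Mutex;
use std::thread;

/// Hypothetical two-stage pipeline: `io_threads` readers load whole files
/// into memory and hand them to `cpu_threads` workers via a bounded channel.
/// Results come back in completion order, not input order.
pub fn process_files<F, R>(
    paths: Vec<PathBuf>,
    io_threads: usize,
    cpu_threads: usize,
    process: F, // stand-in for the CPU-heavy step (e.g. polars_process)
) -> Vec<(PathBuf, io::Result<R>)>
where
    F: Fn(&[u8]) -> R + Send + Sync,
    R: Send,
{
    let io_threads = io_threads.max(1);
    let cpu_threads = cpu_threads.max(1);

    // Shared queue of paths that the IO threads pull from.
    let paths = Mutex::new(paths.into_iter());
    // Bounded channel: readers block once `io_threads` files are buffered,
    // which caps how much file data sits in memory waiting for a CPU worker.
    let (tx, rx) = sync_channel::<(PathBuf, io::Result<Vec<u8>>)>(io_threads);
    let rx = Mutex::new(rx);
    let results = Mutex::new(Vec::new());

    thread::scope(|s| {
        // IO stage: read files and push their contents into the channel.
        for _ in 0..io_threads {
            let tx = tx.clone();
            let paths = &paths;
            s.spawn(move || loop {
                let next = paths.lock().unwrap().next();
                let path = match next {
                    Some(p) => p,
                    None => break,
                };
                let data = fs::read(&path);
                if tx.send((path, data)).is_err() {
                    break;
                }
            });
        }
        // Drop the original sender so the channel closes once all readers finish.
        drop(tx);

        // CPU stage: take loaded buffers and run the expensive function on them.
        for _ in 0..cpu_threads {
            let (rx, results, process) = (&rx, &results, &process);
            s.spawn(move || loop {
                // The lock guard is released at the end of this statement.
                let msg = rx.lock().unwrap().recv();
                let (path, data) = match msg {
                    Ok(m) => m,
                    Err(_) => break, // channel closed and drained
                };
                let out = data.map(|bytes| process(&bytes));
                results.lock().unwrap().push((path, out));
            });
        }
    });

    results.into_inner().unwrap()
}
```

Calling `process_files(paths, 2 * n, n, |bytes| ...)` would give roughly the two-IO-threads-per-CPU-thread ratio discussed here. With rayon, the same shape could presumably be built from two pools created with `ThreadPoolBuilder`, but that variant is not shown or tested here.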
Interesting, something like this could indeed work. But it seems like a lot of work given that I only want to glob files. I guess I can close this issue now.