Par bridge with optional buffering IO handling #1173

pickfire · 2024-06-10T00:36:34Z

My use case is processing many files in a CPU-heavy workload. I glob the files and par_bridge the result into a CPU-intensive task, but that task first has to load the file, which is IO-heavy rather than CPU-heavy.

Would it be possible to use another thread to read the files into memory first, and only then par_iter over them?

glob(...).par_buffer(polars_read).par_iter(polars_process);

The buffering stage reads files into memory and prepares an extra set of items ahead of time, so that each CPU-intensive call to polars_process can run without wasting CPU time on IO.
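To make the idea concrete, here is a minimal sketch of such a buffering stage using only the standard library rather than rayon. Everything in it is an assumption for illustration: the name `buffered_process`, its `buffer` and `workers` parameters, and the use of `fs::read` as a stand-in for polars_read. One IO thread loads files into a bounded channel, and a fixed number of CPU threads take loaded files from it.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;
use std::sync::mpsc::sync_channel;
use std::sync::Mutex;
use std::thread;

/// Hypothetical helper: reads files on one dedicated IO thread and runs
/// `process` on their contents using `workers` CPU threads.
/// At most `buffer` loaded files wait in the channel at any time.
/// Results come back in completion order, not input order.
pub fn buffered_process<R, F>(
    paths: Vec<PathBuf>,
    buffer: usize,
    workers: usize,
    process: F,
) -> Vec<(PathBuf, io::Result<R>)>
where
    R: Send,
    F: Fn(&[u8]) -> R + Sync,
{
    // With zero workers nothing would drain the channel, so use at least one.
    let workers = workers.max(1);
    let (tx, rx) = sync_channel::<(PathBuf, io::Result<Vec<u8>>)>(buffer);
    let rx = Mutex::new(rx);
    let results = Mutex::new(Vec::new());

    thread::scope(|s| {
        // IO thread: `send` blocks while the buffer is full, which bounds
        // how much file data is held in memory ahead of the workers.
        s.spawn(move || {
            for path in paths {
                let data = fs::read(&path);
                if tx.send((path, data)).is_err() {
                    break; // receiver is gone, nothing left to feed
                }
            }
            // `tx` is dropped here, which closes the channel.
        });

        // CPU threads: take the next loaded file and process it.
        for _ in 0..workers {
            s.spawn(|| loop {
                // The lock is held only while waiting for the next item,
                // not while processing it.
                let next = rx.lock().unwrap().recv();
                let Ok((path, data)) = next else { break };
                let out = data.map(|bytes| process(&bytes));
                results.lock().unwrap().push((path, out));
            });
        }
    });

    results.into_inner().unwrap()
}
```

A call such as `buffered_process(paths, 4, 8, |bytes| bytes.len())` would keep up to four files preloaded while eight threads process them. With rayon, a similar effect can probably be had by keeping the reader thread and passing the receiving end of the channel to par_bridge, though that variant is not shown or tested here.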


adamreichold · 2024-06-10T06:16:38Z

If I understand you correctly, you would like to read in data with twice as many threads as you use to process it? What do you think about using two separate pools to do that, i.e. roughly

let cpu_pool = ThreadPoolBuilder::new().build().unwrap();

let io_pool = ThreadPoolBuilder::new().num_threads(cpu_pool.current_num_threads() * 2).build().unwrap();

// Run the reads on the IO pool; glob yields a sequential iterator, so bridge it.
io_pool.install(|| {
   glob(...).par_bridge().map(polars_read).for_each(|item| {
       // Hand each loaded item to the CPU pool and wait for it to finish.
       cpu_pool.in_place_scope(move |cpu_scope| {
           cpu_scope.spawn(move |_| polars_process(item));
       });
   });
});

(I have not even compiled this; the code is only meant to illustrate using two thread pools to explicitly implement the two-IO-threads-per-CPU-thread approach.)

pickfire · 2024-07-28T13:58:35Z

Interesting, something like this could indeed work, but it seems like a lot of work given that I only want to glob files. I guess I can close this issue now.

pickfire closed this as completed Jul 28, 2024
