Parquet reader crashes with "range end index 16 out of range for slice" error #15020

Closed
2 tasks done
mkysylov opened this issue Mar 12, 2024 · 0 comments · Fixed by #15021
Labels
bug Something isn't working needs triage Awaiting prioritization by a maintainer rust Related to Rust Polars


mkysylov commented Mar 12, 2024

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

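use polars::prelude::*;

fn main() -> PolarsResult<()> {
    // Stream the first 850,000 rows and count occurrences of each title.
    let lf = LazyFrame::scan_parquet("wikipedia-train.parquet", ScanArgsParquet::default())?
        .with_streaming(true)
        .limit(850000)
        .select([col("title")])
        .group_by(["title"])
        .agg([len().alias("count")]);

    println!("{}", lf.explain(true)?);

    // The panic occurs while collecting the streaming plan.
    let df = lf
        .sort("count", SortOptions { descending: true, ..Default::default() })
        .collect()?;
    println!("{df}");

    Ok(())
}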

Here, wikipedia-train.parquet is produced by downloading wikipedia-train.arrow from https://huggingface.co/datasets/wikipedia and converting it with the following Python code:

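import pyarrow as pa
import pyarrow.parquet as pq

# Memory-map the Arrow IPC stream file so it is not copied into RAM up front.
memory_mapped_stream = pa.memory_map("wikipedia-train.arrow")
stream_reader = pa.ipc.open_stream(memory_mapped_stream)
# Read every record batch into a single table.
pa_table = stream_reader.read_all()
# Write the table as a zstd-compressed Parquet file.
pq.write_table(pa_table, "wikipedia-train.parquet", compression='zstd')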

Log output

--- STREAMING
AGGREGATE
        [len().alias("count")] BY [col("title")] FROM
  FAST_PROJECT: [title]

      Parquet SCAN D:\wikipedia-train.parquet
      PROJECT 1/4 COLUMNS
      N_ROWS: 850000  --- END STREAMING

  DF []; PROJECT */0 COLUMNS; SELECTION: "None"
thread '<unnamed>' panicked at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce\library\alloc\src\collections\vec_deque\mod.rs:1401:36:
range end index 16 out of range for slice of length 1
stack backtrace:
   0: std::panicking::begin_panic_handler
             at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library\std\src\panicking.rs:645
   1: core::panicking::panic_fmt
             at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library\core\src\panicking.rs:72
   2: core::slice::index::slice_end_index_len_fail_rt
             at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library\core\src\slice\index.rs:76
   3: core::slice::index::slice_end_index_len_fail
             at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library\core\src\slice\index.rs:68
   4: core::slice::index::range<core::ops::range::RangeTo<usize> >
             at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce\library\core\src\slice\index.rs:706
   5: alloc::collections::vec_deque::VecDeque<polars_core::frame::DataFrame,alloc::alloc::Global>::drain<polars_core::frame::DataFrame,alloc::alloc::Global,core::ops::range::RangeTo<usize> >
             at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce\library\alloc\src\collections\vec_deque\mod.rs:1401
   6: polars_io::parquet::read_impl::impl$3::next_batches::async_fn$0
             at C:\Users\maksi\.cargo\git\checkouts\polars-1b4124aa9ec38670\f2a18cd\crates\polars-io\src\parquet\read_impl.rs:612
   7: tokio::runtime::park::impl$4::block_on::closure$0<enum2$<polars_io::parquet::read_impl::impl$3::next_batches::async_fn_env$0> >
             at C:\Users\maksi\.cargo\registry\src\index.crates.io-6f17d22bba15001f\tokio-1.33.0\src\runtime\park.rs:282
   8: tokio::runtime::coop::with_budget
             at C:\Users\maksi\.cargo\registry\src\index.crates.io-6f17d22bba15001f\tokio-1.33.0\src\runtime\coop.rs:107
   9: tokio::runtime::coop::budget
             at C:\Users\maksi\.cargo\registry\src\index.crates.io-6f17d22bba15001f\tokio-1.33.0\src\runtime\coop.rs:73
  10: tokio::runtime::park::CachedParkThread::block_on<enum2$<polars_io::parquet::read_impl::impl$3::next_batches::async_fn_env$0> >
             at C:\Users\maksi\.cargo\registry\src\index.crates.io-6f17d22bba15001f\tokio-1.33.0\src\runtime\park.rs:282
  11: tokio::runtime::context::blocking::BlockingRegionGuard::block_on<enum2$<polars_io::parquet::read_impl::impl$3::next_batches::async_fn_env$0> >
             at C:\Users\maksi\.cargo\registry\src\index.crates.io-6f17d22bba15001f\tokio-1.33.0\src\runtime\context\blocking.rs:66
  12: tokio::runtime::scheduler::multi_thread::impl$0::block_on::closure$0<enum2$<polars_io::parquet::read_impl::impl$3::next_batches::async_fn_env$
0> >
             at C:\Users\maksi\.cargo\registry\src\index.crates.io-6f17d22bba15001f\tokio-1.33.0\src\runtime\scheduler\multi_thread\mod.rs:87       
  13: tokio::runtime::context::runtime::enter_runtime<tokio::runtime::scheduler::multi_thread::impl$0::block_on::closure_env$0<enum2$<polars_io::par
quet::read_impl::impl$3::next_batches::async_fn_env$0> >,enum2$<core::result::Result<enum2$<core::option::Option<
             at C:\Users\maksi\.cargo\registry\src\index.crates.io-6f17d22bba15001f\tokio-1.33.0\src\runtime\context\runtime.rs:65
  14: tokio::runtime::scheduler::multi_thread::MultiThread::block_on<enum2$<polars_io::parquet::read_impl::impl$3::next_batches::async_fn_env$0> >  
             at C:\Users\maksi\.cargo\registry\src\index.crates.io-6f17d22bba15001f\tokio-1.33.0\src\runtime\scheduler\multi_thread\mod.rs:86       
  15: tokio::runtime::runtime::Runtime::block_on<enum2$<polars_io::parquet::read_impl::impl$3::next_batches::async_fn_env$0> >
             at C:\Users\maksi\.cargo\registry\src\index.crates.io-6f17d22bba15001f\tokio-1.33.0\src\runtime\runtime.rs:350
  16: polars_io::pl_async::RuntimeManager::block_on_potential_spawn<enum2$<polars_io::parquet::read_impl::impl$3::next_batches::async_fn_env$0> >   
             at C:\Users\maksi\.cargo\git\checkouts\polars-1b4124aa9ec38670\f2a18cd\crates\polars-io\src\pl_async.rs:253
  17: polars_pipe::executors::sources::parquet::impl$1::get_batches
             at C:\Users\maksi\.cargo\git\checkouts\polars-1b4124aa9ec38670\f2a18cd\crates\polars-pipe\src\executors\sources\parquet.rs:283
  18: polars_pipe::pipeline::dispatcher::drive_operator::par_process_chunks::closure$0::closure$1
             at C:\Users\maksi\.cargo\git\checkouts\polars-1b4124aa9ec38670\f2a18cd\crates\polars-pipe\src\pipeline\dispatcher\drive_operator.rs:65
  19: rayon_core::scope::impl$0::spawn::closure$0::closure$0<polars_pipe::pipeline::dispatcher::drive_operator::par_process_chunks::closure$0::closu
re_env$1>
             at C:\Users\maksi\.cargo\registry\src\index.crates.io-6f17d22bba15001f\rayon-core-1.12.1\src\scope\mod.rs:526
  20: core::panic::unwind_safe::impl$23::call_once<tuple$<>,rayon_core::scope::impl$0::spawn::closure$0::closure_env$0<polars_pipe::pipeline::dispat
cher::drive_operator::par_process_chunks::closure$0::closure_env$1> >
             at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce\library\core\src\panic\unwind_safe.rs:272
  21: std::panicking::try::do_call<core::panic::unwind_safe::AssertUnwindSafe<rayon_core::scope::impl$0::spawn::closure$0::closure_env$0<polars_pipe
::pipeline::dispatcher::drive_operator::par_process_chunks::closure$0::closure_env$1> >,tuple$<> >
             at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce\library\std\src\panicking.rs:552
  22: polars_pipe::executors::sinks::group_by::generic::ooc_state::impl$2::clone
  23: std::panicking::try<tuple$<>,core::panic::unwind_safe::AssertUnwindSafe<rayon_core::scope::impl$0::spawn::closure$0::closure_env$0<polars_pipe
::pipeline::dispatcher::drive_operator::par_process_chunks::closure$0::closure_env$1> > >
             at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce\library\std\src\panicking.rs:516
  24: std::panic::catch_unwind<core::panic::unwind_safe::AssertUnwindSafe<rayon_core::scope::impl$0::spawn::closure$0::closure_env$0<polars_pipe::pi
peline::dispatcher::drive_operator::par_process_chunks::closure$0::closure_env$1> >,tuple$<> >
             at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce\library\std\src\panic.rs:142
  25: rayon_core::unwind::halt_unwinding<rayon_core::scope::impl$0::spawn::closure$0::closure_env$0<polars_pipe::pipeline::dispatcher::drive_operato
r::par_process_chunks::closure$0::closure_env$1>,tuple$<> >
             at C:\Users\maksi\.cargo\registry\src\index.crates.io-6f17d22bba15001f\rayon-core-1.12.1\src\unwind.rs:17
  26: rayon_core::scope::ScopeBase::execute_job_closure<rayon_core::scope::impl$0::spawn::closure$0::closure_env$0<polars_pipe::pipeline::dispatcher
::drive_operator::par_process_chunks::closure$0::closure_env$1>,tuple$<> >
             at C:\Users\maksi\.cargo\registry\src\index.crates.io-6f17d22bba15001f\rayon-core-1.12.1\src\scope\mod.rs:689
  27: rayon_core::scope::ScopeBase::execute_job<rayon_core::scope::impl$0::spawn::closure$0::closure_env$0<polars_pipe::pipeline::dispatcher::drive_
operator::par_process_chunks::closure$0::closure_env$1> >
             at C:\Users\maksi\.cargo\registry\src\index.crates.io-6f17d22bba15001f\rayon-core-1.12.1\src\scope\mod.rs:679
  28: rayon_core::scope::impl$0::spawn::closure$0<polars_pipe::pipeline::dispatcher::drive_operator::par_process_chunks::closure$0::closure_env$1>  
             at C:\Users\maksi\.cargo\registry\src\index.crates.io-6f17d22bba15001f\rayon-core-1.12.1\src\scope\mod.rs:526
  29: rayon_core::job::impl$6::execute<rayon_core::scope::impl$0::spawn::closure_env$0<polars_pipe::pipeline::dispatcher::drive_operator::par_proces
s_chunks::closure$0::closure_env$1> >
             at C:\Users\maksi\.cargo\registry\src\index.crates.io-6f17d22bba15001f\rayon-core-1.12.1\src\job.rs:169
  30: rayon_core::job::JobRef::execute
             at C:\Users\maksi\.cargo\registry\src\index.crates.io-6f17d22bba15001f\rayon-core-1.12.1\src\job.rs:64
  31: rayon_core::registry::WorkerThread::execute
             at C:\Users\maksi\.cargo\registry\src\index.crates.io-6f17d22bba15001f\rayon-core-1.12.1\src\registry.rs:860
  32: rayon_core::registry::WorkerThread::wait_until_cold
             at C:\Users\maksi\.cargo\registry\src\index.crates.io-6f17d22bba15001f\rayon-core-1.12.1\src\registry.rs:786
  33: rayon_core::registry::WorkerThread::wait_until<rayon_core::latch::CoreLatch>
             at C:\Users\maksi\.cargo\registry\src\index.crates.io-6f17d22bba15001f\rayon-core-1.12.1\src\registry.rs:769
  34: rayon_core::latch::CountLatch::wait
             at C:\Users\maksi\.cargo\registry\src\index.crates.io-6f17d22bba15001f\rayon-core-1.12.1\src\latch.rs:400
  35: rayon_core::scope::ScopeBase::complete<rayon_core::scope::scope::closure$0::closure_env$0<polars_pipe::pipeline::dispatcher::drive_operator::p
ar_process_chunks::closure_env$0,tuple$<> >,tuple$<> >
             at C:\Users\maksi\.cargo\registry\src\index.crates.io-6f17d22bba15001f\rayon-core-1.12.1\src\scope\mod.rs:668
  36: rayon_core::scope::scope::closure$0<polars_pipe::pipeline::dispatcher::drive_operator::par_process_chunks::closure_env$0,tuple$<> >
             at C:\Users\maksi\.cargo\registry\src\index.crates.io-6f17d22bba15001f\rayon-core-1.12.1\src\scope\mod.rs:291
  37: rayon_core::registry::in_worker<rayon_core::scope::scope::closure_env$0<polars_pipe::pipeline::dispatcher::drive_operator::par_process_chunks:
:closure_env$0,tuple$<> >,tuple$<> >
             at C:\Users\maksi\.cargo\registry\src\index.crates.io-6f17d22bba15001f\rayon-core-1.12.1\src\registry.rs:951
  38: rayon_core::scope::scope<polars_pipe::pipeline::dispatcher::drive_operator::par_process_chunks::closure_env$0,tuple$<> >
             at C:\Users\maksi\.cargo\registry\src\index.crates.io-6f17d22bba15001f\rayon-core-1.12.1\src\scope\mod.rs:289
  39: rayon_core::thread_pool::impl$0::scope::closure$0<polars_pipe::pipeline::dispatcher::drive_operator::par_process_chunks::closure_env$0,tuple$<
> >
             at C:\Users\maksi\.cargo\registry\src\index.crates.io-6f17d22bba15001f\rayon-core-1.12.1\src\thread_pool\mod.rs:294
  40: rayon_core::thread_pool::impl$0::install::closure$0<rayon_core::thread_pool::impl$0::scope::closure_env$0<polars_pipe::pipeline::dispatcher::d
rive_operator::par_process_chunks::closure_env$0,tuple$<> >,tuple$<> >
             at C:\Users\maksi\.cargo\registry\src\index.crates.io-6f17d22bba15001f\rayon-core-1.12.1\src\thread_pool\mod.rs:147
  41: rayon_core::registry::impl$6::in_worker_cold::closure$0::closure$0<rayon_core::thread_pool::impl$0::install::closure_env$0<rayon_core::thread_
pool::impl$0::scope::closure_env$0<polars_pipe::pipeline::dispatcher::drive_operator::par_process_chunks::closure
             at C:\Users\maksi\.cargo\registry\src\index.crates.io-6f17d22bba15001f\rayon-core-1.12.1\src\registry.rs:522
  42: rayon_core::job::impl$9::call::closure$0<tuple$<>,rayon_core::registry::impl$6::in_worker_cold::closure$0::closure_env$0<rayon_core::thread_po
ol::impl$0::install::closure_env$0<rayon_core::thread_pool::impl$0::scope::closure_env$0<polars_pipe::pipeline::d
             at C:\Users\maksi\.cargo\registry\src\index.crates.io-6f17d22bba15001f\rayon-core-1.12.1\src\job.rs:218
  43: core::panic::unwind_safe::impl$23::call_once<tuple$<>,rayon_core::job::impl$9::call::closure_env$0<tuple$<>,rayon_core::registry::impl$6::in_w
orker_cold::closure$0::closure_env$0<rayon_core::thread_pool::impl$0::install::closure_env$0<rayon_core::thread_p
             at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce\library\core\src\panic\unwind_safe.rs:272
  44: std::panicking::try::do_call<core::panic::unwind_safe::AssertUnwindSafe<rayon_core::job::impl$9::call::closure_env$0<tuple$<>,rayon_core::regi
stry::impl$6::in_worker_cold::closure$0::closure_env$0<rayon_core::thread_pool::impl$0::install::closure_env$0<ra
             at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce\library\std\src\panicking.rs:552
  45: polars_pipe::executors::sinks::group_by::generic::ooc_state::impl$2::clone
  46: std::panicking::try<tuple$<>,core::panic::unwind_safe::AssertUnwindSafe<rayon_core::job::impl$9::call::closure_env$0<tuple$<>,rayon_core::regi
stry::impl$6::in_worker_cold::closure$0::closure_env$0<rayon_core::thread_pool::impl$0::install::closure_env$0<ra
             at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce\library\std\src\panicking.rs:516
  47: std::panic::catch_unwind<core::panic::unwind_safe::AssertUnwindSafe<rayon_core::job::impl$9::call::closure_env$0<tuple$<>,rayon_core::registry
::impl$6::in_worker_cold::closure$0::closure_env$0<rayon_core::thread_pool::impl$0::install::closure_env$0<rayon_
             at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce\library\std\src\panic.rs:142
  48: rayon_core::unwind::halt_unwinding<rayon_core::job::impl$9::call::closure_env$0<tuple$<>,rayon_core::registry::impl$6::in_worker_cold::closure
$0::closure_env$0<rayon_core::thread_pool::impl$0::install::closure_env$0<rayon_core::thread_pool::impl$0::scope:
             at C:\Users\maksi\.cargo\registry\src\index.crates.io-6f17d22bba15001f\rayon-core-1.12.1\src\unwind.rs:17
  49: enum2$<rayon_core::job::JobResult<tuple$<> > >::call<tuple$<>,rayon_core::registry::impl$6::in_worker_cold::closure$0::closure_env$0<rayon_cor
e::thread_pool::impl$0::install::closure_env$0<rayon_core::thread_pool::impl$0::scope::closure_env$0<polars_pipe:
             at C:\Users\maksi\.cargo\registry\src\index.crates.io-6f17d22bba15001f\rayon-core-1.12.1\src\job.rs:218
  50: rayon_core::job::impl$4::execute<rayon_core::latch::LatchRef<rayon_core::latch::LockLatch>,rayon_core::registry::impl$6::in_worker_cold::closu
re$0::closure_env$0<rayon_core::thread_pool::impl$0::install::closure_env$0<rayon_core::thread_pool::impl$0::scop
             at C:\Users\maksi\.cargo\registry\src\index.crates.io-6f17d22bba15001f\rayon-core-1.12.1\src\job.rs:120
  51: rayon_core::job::JobRef::execute
             at C:\Users\maksi\.cargo\registry\src\index.crates.io-6f17d22bba15001f\rayon-core-1.12.1\src\job.rs:64
  52: rayon_core::registry::WorkerThread::execute
             at C:\Users\maksi\.cargo\registry\src\index.crates.io-6f17d22bba15001f\rayon-core-1.12.1\src\registry.rs:860
  53: rayon_core::registry::WorkerThread::wait_until_cold
             at C:\Users\maksi\.cargo\registry\src\index.crates.io-6f17d22bba15001f\rayon-core-1.12.1\src\registry.rs:794
  54: rayon_core::registry::WorkerThread::wait_until<rayon_core::latch::OnceLatch>
             at C:\Users\maksi\.cargo\registry\src\index.crates.io-6f17d22bba15001f\rayon-core-1.12.1\src\registry.rs:769
  55: rayon_core::registry::WorkerThread::wait_until_out_of_work
             at C:\Users\maksi\.cargo\registry\src\index.crates.io-6f17d22bba15001f\rayon-core-1.12.1\src\registry.rs:818
  56: rayon_core::registry::main_loop
             at C:\Users\maksi\.cargo\registry\src\index.crates.io-6f17d22bba15001f\rayon-core-1.12.1\src\registry.rs:923
  57: rayon_core::registry::ThreadBuilder::run
             at C:\Users\maksi\.cargo\registry\src\index.crates.io-6f17d22bba15001f\rayon-core-1.12.1\src\registry.rs:53
  58: rayon_core::registry::impl$2::spawn::closure$0
             at C:\Users\maksi\.cargo\registry\src\index.crates.io-6f17d22bba15001f\rayon-core-1.12.1\src\registry.rs:98
  59: core::hint::black_box
             at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce\library\core\src\hint.rs:286
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
error: process didn't exit successfully: `target\debug\bpm.exe` (exit code: 101)

Process finished with exit code 101

Issue description

The Parquet reader tries to drain n items from self.chunks_fifo when fewer than n items are available, which causes a panic. In the log above, the requested range end is 16 while the deque holds only 1 DataFrame.
The documentation for VecDeque::drain states that it panics "if the end point is greater than the length of the deque", which seems to be the problem here.
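
For illustration, here is a minimal standalone sketch of the failure mode using only the standard library. It is not the Polars code and not necessarily how the linked fix works; `drain_up_to` is a hypothetical helper and the `chunks_fifo` variable only borrows the name of the field mentioned above. The idea is to clamp the requested count to the current length before calling `drain`:

```rust
use std::collections::VecDeque;

/// Hypothetical helper: drain at most `n` items from the front of the queue.
/// Clamping to `queue.len()` keeps the range end within bounds, so `drain`
/// cannot panic even when fewer than `n` items are buffered.
fn drain_up_to<T>(queue: &mut VecDeque<T>, n: usize) -> Vec<T> {
    let take = n.min(queue.len());
    queue.drain(..take).collect()
}

fn main() {
    // One buffered item, as in the backtrace (slice of length 1).
    let mut chunks_fifo: VecDeque<&str> = VecDeque::from(["batch-0"]);

    // `chunks_fifo.drain(..16)` would panic here because 16 > len() == 1.
    let batches = drain_up_to(&mut chunks_fifo, 16);

    assert_eq!(batches, vec!["batch-0"]);
    assert!(chunks_fifo.is_empty());
    println!("drained {} batch(es) without panicking", batches.len());
}
```

With the clamp in place, the caller receives whatever is currently buffered (possibly fewer than n items) instead of triggering the out-of-range panic.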

Expected behavior

No panic

Installed versions

polars = { version = "0.38.2", features = [ "lazy", "streaming", "strings", "parquet", ]}
@mkysylov mkysylov added bug Something isn't working needs triage Awaiting prioritization by a maintainer rust Related to Rust Polars labels Mar 12, 2024