[Go] 32-bit panic: utils.GetMinMaxInt32() #40672

Closed
powersj opened this issue Mar 19, 2024 · 1 comment

powersj commented Mar 19, 2024

Describe the bug, including details regarding any error messages, version, and platform.

While working on a Parquet file parser, I ran our tests on a 32-bit system and hit a panic. This is reproducible with the in-tree parquet_reader, using a Parquet file generated by the script at the end of this report. First, on my 64-bit system it works as expected:

$ git clone git@github.com:apache/arrow
$ cd arrow/go/parquet/cmd/parquet_reader
$ cp ~/input.parquet .
$ go run . input.parquet 
File name: input.parquet
Version: v2.6
Created By: parquet-cpp-arrow version 15.0.1
Num Rows: 1
Number of RowGroups: 1
Number of Real Columns: 2
Number of Columns: 2
Number of Selected Columns: 2
Column 0: value (INT64)
Column 1: timestamp (BYTE_ARRAY/UTF8)
--- Row Group: 0  ---
--- Total Bytes: 201  ---
--- Rows: 1  ---
Column 0
 Values: 1, Min: 42, Max: 42, Null Values: 0, Distinct Values: 0
 Compression: SNAPPY, Encodings: PLAIN RLE RLE_DICTIONARY
 Uncompressed Size: 92, Compressed Size: 96
Column 1
 Values: 1, Min: [49 55 49 48 54 56 51 54 48 56 49 52 51 50 50 56 54 57 50], Max: [49 55 49 48 54 56 51 54 48 56 49 52 51 50 50 56 54 57 50], Null Values: 0, Distinct Values: 0
 Compression: SNAPPY, Encodings: PLAIN RLE RLE_DICTIONARY
 Uncompressed Size: 109, Compressed Size: 113
--- Values ---
value             |timestamp         |
42                |1710683608143228692|

Once I force a 32-bit architecture with GOARCH=386, it crashes:

$ GOARCH=386 go run . input.parquet 
File name: input.parquet
Version: v2.6
Created By: parquet-cpp-arrow version 15.0.1
Num Rows: 1
Number of RowGroups: 1
Number of Real Columns: 2
Number of Columns: 2
Number of Selected Columns: 2
Column 0: value (INT64)
Column 1: timestamp (BYTE_ARRAY/UTF8)
--- Row Group: 0  ---
--- Total Bytes: 201  ---
--- Rows: 1  ---
Column 0
 Values: 1, Min: 42, Max: 42, Null Values: 0, Distinct Values: 0
 Compression: SNAPPY, Encodings: PLAIN RLE RLE_DICTIONARY
 Uncompressed Size: 92, Compressed Size: 96
Column 1
 Values: 1, Min: [49 55 49 48 54 56 51 54 48 56 49 52 51 50 50 56 54 57 50], Max: [49 55 49 48 54 56 51 54 48 56 49 52 51 50 50 56 54 57 50], Null Values: 0, Distinct Values: 0
 Compression: SNAPPY, Encodings: PLAIN RLE RLE_DICTIONARY
 Uncompressed Size: 109, Compressed Size: 113
--- Values ---
value             |timestamp         |
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x83502c7]

goroutine 1 [running]:
github.com/apache/arrow/go/v16/internal/utils.GetMinMaxInt32(...)
	/home/powersj/test/arrow/go/internal/utils/min_max.go:190
github.com/apache/arrow/go/v16/parquet/internal/encoding.(*Int64DictConverter).IsValid(0x9492fe0, {0x9413270, 0x1, 0x1})
	/home/powersj/test/arrow/go/parquet/internal/encoding/typed_encoder.gen.go:495 +0x27
github.com/apache/arrow/go/v16/parquet/internal/utils.(*RleDecoder).GetBatchWithDictInt64(0x94e0708, {0x8488974, 0x9492fe0}, {0x95b4c00, 0x1, 0x80})
	/home/powersj/test/arrow/go/parquet/internal/utils/typed_rle_dict.gen.go:378 +0x228
github.com/apache/arrow/go/v16/parquet/internal/utils.(*RleDecoder).GetBatchWithDict(0x94e0708, {0x8488974, 0x9492fe0}, {0x839e4e0, 0x9511a84})
	/home/powersj/test/arrow/go/parquet/internal/utils/rle.go:417 +0x14e
github.com/apache/arrow/go/v16/parquet/internal/encoding.(*dictDecoder).decode(...)
	/home/powersj/test/arrow/go/parquet/internal/encoding/decoder.go:146
github.com/apache/arrow/go/v16/parquet/internal/encoding.(*DictInt64Decoder).Decode(0x95ae700, {0x95b4c00, 0x1, 0x80})
	/home/powersj/test/arrow/go/parquet/internal/encoding/typed_encoder.gen.go:436 +0x75
github.com/apache/arrow/go/v16/parquet/file.(*Int64ColumnChunkReader).ReadBatch.func1(0x0, 0x1)
	/home/powersj/test/arrow/go/parquet/file/column_reader_types.gen.go:93 +0xc3
github.com/apache/arrow/go/v16/parquet/file.(*columnChunkReader).readBatch(0x959c428, 0x80, {0x94f2300, 0x80, 0x80}, {0x94f2400, 0x80, 0x80}, 0x9511b6c)
	/home/powersj/test/arrow/go/parquet/file/column_reader.go:514 +0x2ab
github.com/apache/arrow/go/v16/parquet/file.(*Int64ColumnChunkReader).ReadBatch(0x959c428, 0x80, {0x95b4c00, 0x80, 0x80}, {0x94f2300, 0x80, 0x80}, {0x94f2400, 0x80, ...})
	/home/powersj/test/arrow/go/parquet/file/column_reader_types.gen.go:92 +0xa3
main.(*Dumper).readNextBatch(0x94a0820)
	/home/powersj/test/arrow/go/parquet/cmd/parquet_reader/dumper.go:88 +0x27f
main.(*Dumper).Next(0x94a0820)
	/home/powersj/test/arrow/go/parquet/cmd/parquet_reader/dumper.go:163 +0x61
main.main()
	/home/powersj/test/arrow/go/parquet/cmd/parquet_reader/main.go:359 +0x2a69
exit status 2

The parquet file I used was generated via the following:

#!/usr/bin/env python
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

df = pd.DataFrame({
    'value': [42],
    'timestamp': ["1710683608143228692"]
})

pq.write_table(pa.Table.from_pandas(df), "input.parquet")

Component(s)

Go

zeroshade added a commit to zeroshade/arrow that referenced this issue Mar 19, 2024
zeroshade added a commit that referenced this issue Mar 19, 2024
### Rationale for this change
Calling `MinMaxInt32` on a 32-bit architecture crashed because the build did not fall back to the noasm (pure Go) implementation and instead called assembly written for a 64-bit architecture.

### What changes are included in this PR?
Adds the same build constraints to `min_max_noasm.go` that the other files already use, so that it is compiled on 32-bit architectures and they fall back to the pure Go implementation.

* GitHub Issue: #40672

Authored-by: Matt Topol <zotthewizard@gmail.com>
Signed-off-by: Matt Topol <zotthewizard@gmail.com>
zeroshade added this to the 16.0.0 milestone Mar 19, 2024

zeroshade (Member) commented:

Issue resolved by pull request #40676
