Write encoding of type DELTA_BINARY_PACKED corrupts file #190
Comments
Hi @jaredmessenger, this was a bug and I have fixed it. Please test again with the latest code on the master branch.

```go
package main

import (
	"log"
	"time"

	"github.com/xitongsys/parquet-go-source/local"
	"github.com/xitongsys/parquet-go/parquet"
	"github.com/xitongsys/parquet-go/reader"
	"github.com/xitongsys/parquet-go/writer"
)

type Delta struct {
	Timestamp int64  `parquet:"name=timestamp, type=INT64, encoding=DELTA_BINARY_PACKED"`
	Value     string `parquet:"name=value, type=UTF8, encoding=DELTA_LENGTH_BYTE_ARRAY"`
}

func main() {
	var err error
	fw, err := local.NewLocalFileWriter("flat.parquet")
	if err != nil {
		log.Println("Can't create local file", err)
		return
	}

	// write
	pw, err := writer.NewParquetWriter(fw, new(Delta), 4)
	if err != nil {
		log.Println("Can't create parquet writer", err)
		return
	}
	pw.RowGroupSize = 128 * 1024 * 1024 // 128M
	pw.CompressionType = parquet.CompressionCodec_SNAPPY
	num := 2
	for i := 0; i < num; i++ {
		stu := Delta{
			Timestamp: time.Now().UnixNano() / 1e6, // milliseconds since epoch
			Value:     "SomeString",
		}
		if err = pw.Write(stu); err != nil {
			log.Println("Write error", err)
		}
	}
	if err = pw.WriteStop(); err != nil {
		log.Println("WriteStop error", err)
		return
	}
	log.Println("Write Finished")
	fw.Close()

	// read
	fr, err := local.NewLocalFileReader("flat.parquet")
	if err != nil {
		log.Println("Can't open file", err)
		return
	}
	pr, err := reader.NewParquetReader(fr, new(Delta), 4)
	if err != nil {
		log.Println("Can't create parquet reader", err)
		return
	}
	num = int(pr.GetNumRows())
	stus := make([]Delta, num)
	if err = pr.Read(&stus); err != nil {
		log.Println("Read error", err)
	}
	log.Println(stus)
	pr.ReadStop()
	fr.Close()
}
```

Result:

```
go run b.go
2019/12/17 10:23:53 Write Finished
2019/12/17 10:23:53 [{1576549433357 SomeString} {1576549433357 SomeString}]
```

Using Apache parquet-tools.jar
zolstein pushed a commit to zolstein/parquet-go that referenced this issue on Jun 23, 2023
Trying to use `DELTA_BINARY_PACKED` for timestamps corrupts the file.

Local flat example, but with a timestamp.
Reading from parquet-tool
Using Apache Brew parquet-tools
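
For background on why a delta encoding is attractive for timestamp columns, here is a minimal sketch of the underlying idea: store the first value, then only the differences between neighbours, which are small for closely spaced timestamps. This is a simplified illustration and not Parquet's actual `DELTA_BINARY_PACKED` layout, which additionally groups deltas into blocks and bit-packs them. The helpers `deltaEncode` and `deltaDecode` are hypothetical names and are not part of parquet-go; the third sample timestamp is made up for the example.

```go
package main

import "fmt"

// deltaEncode returns the first value followed by successive differences.
// Hypothetical helper for illustration only; not a parquet-go function and
// not the on-disk DELTA_BINARY_PACKED format.
func deltaEncode(values []int64) []int64 {
	if len(values) == 0 {
		return nil
	}
	out := make([]int64, len(values))
	out[0] = values[0]
	for i := 1; i < len(values); i++ {
		out[i] = values[i] - values[i-1]
	}
	return out
}

// deltaDecode reverses deltaEncode by accumulating the differences.
func deltaDecode(encoded []int64) []int64 {
	if len(encoded) == 0 {
		return nil
	}
	out := make([]int64, len(encoded))
	out[0] = encoded[0]
	for i := 1; i < len(encoded); i++ {
		out[i] = out[i-1] + encoded[i]
	}
	return out
}

func main() {
	// The first two values match the output shown in this thread;
	// the third is an assumed value 3 ms later.
	ts := []int64{1576549433357, 1576549433357, 1576549433360}

	enc := deltaEncode(ts)
	dec := deltaDecode(enc)

	fmt.Println(enc) // [1576549433357 0 3]
	fmt.Println(dec) // [1576549433357 1576549433357 1576549433360]
}
```

A reader that decodes these deltas incorrectly reconstructs wrong values for every row after the first, which is why an encoding bug of this kind shows up as a corrupted column rather than a single bad value.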