ARROW-5123: [Rust] Parquet derive for simple structs #4140

Closed
wants to merge 36 commits into from
36 commits
7c50980
Migrating from strings to smarter enum derive
xrl Apr 9, 2019
90cd1ee
Aggressive reformat
xrl Apr 9, 2019
7b54759
Refactor to pull out leaf type, comments of old code still present
xrl Apr 9, 2019
09fb15e
Comments removed and code formatted
xrl Apr 9, 2019
73317f6
ColumnWriter syn type from custom Type
xrl Apr 10, 2019
cff71c9
Tests pass
xrl Apr 11, 2019
ff626ca
No parquet data files in git
xrl Apr 11, 2019
e7fdbc4
Licenses
xrl Apr 11, 2019
74d112f
rustfmt pass
xrl Apr 11, 2019
5208f93
NaiveDateTime support behind a feature flag
xrl Apr 12, 2019
b14cda3
test optional timestamps, add panic for schema/struct field mismatch
xrl Apr 13, 2019
e844133
WIP
xrl Apr 16, 2019
700dea4
Feedback from PR
xrl Apr 17, 2019
f964a1c
More feedback changes
xrl Apr 17, 2019
13453ac
More feedback changes
xrl Apr 17, 2019
98b3321
cargo fmt pass
xrl Apr 17, 2019
021ec6f
reintroducing the records alias with a comment
xrl Apr 17, 2019
d93e84a
Release script uses wildcards for READMEs
xrl Apr 17, 2019
934aa8f
Cast num_days to i32
xrl Apr 17, 2019
e8e3381
UUID feature and tests
xrl Apr 22, 2019
8d557e5
Propagate errors
xrl Oct 23, 2019
ec7086c
Switch to parse_quote
xrl Oct 23, 2019
4766b53
Less unwrap
xrl Oct 23, 2019
e91f9e5
Switch to simpler recursive definition
xrl Oct 23, 2019
c98cef7
nit
xrl Oct 23, 2019
7bfa155
Some comments for a tricky bit
xrl Oct 23, 2019
57a05b0
No more generic x variable, use better names
xrl Oct 23, 2019
001ae1b
cargo fmt
xrl Oct 24, 2019
63de331
another rust fmt
xrl Oct 24, 2019
bf0a468
Sample program for parquet derive in the rust doc
xrl Oct 24, 2019
4ca84ad
Simplify checking length of generic args, skip using an iterator
xrl Oct 24, 2019
d04b112
- Update versions to latest
bryantbiggs Jan 21, 2020
76c65ef
Update error messages
bryantbiggs Jan 25, 2020
90a8f12
fix version in README
nevi-me Sep 14, 2020
abb8fe2
fix release script and failing test
nevi-me Sep 14, 2020
d1d7261
Update parquet_derive version
kou Sep 14, 2020
2 changes: 2 additions & 0 deletions .dockerignore
@@ -55,6 +55,8 @@
!rust/arrow-flight/Cargo.toml
!rust/parquet/Cargo.toml
!rust/parquet/build.rs
!rust/parquet_derive/Cargo.toml
!rust/parquet_derive_test/Cargo.toml
!rust/datafusion/Cargo.toml
!rust/datafusion/benches
!rust/integration-testing/Cargo.toml
8 changes: 6 additions & 2 deletions ci/docker/debian-10-rust.dockerfile
@@ -58,14 +58,18 @@ RUN mkdir \
/arrow/rust/benchmarks/src \
/arrow/rust/datafusion/src \
/arrow/rust/integration-testing/src \
-/arrow/rust/parquet/src && \
+/arrow/rust/parquet/src \
+/arrow/rust/parquet_derive/src \
+/arrow/rust/parquet_derive_test/src && \
touch \
/arrow/rust/arrow-flight/src/lib.rs \
/arrow/rust/arrow/src/lib.rs \
/arrow/rust/benchmarks/src/lib.rs \
/arrow/rust/datafusion/src/lib.rs \
/arrow/rust/integration-testing/src/lib.rs \
-/arrow/rust/parquet/src/lib.rs
+/arrow/rust/parquet/src/lib.rs \
+/arrow/rust/parquet_derive/src/lib.rs \
+/arrow/rust/parquet_derive_test/src/lib.rs

# Compile dependencies for the whole workspace
RUN cd /arrow/rust && cargo build --workspace --lib --all-features
58 changes: 58 additions & 0 deletions dev/release/00-prepare-test.rb
@@ -330,6 +330,35 @@ def test_version_pre_tag
"+See [crate documentation](https://docs.rs/crate/parquet/#{@release_version}) on available API."],
],
},
{
path: "rust/parquet_derive/Cargo.toml",
hunks: [
["-version = \"#{@snapshot_version}\"",
"+version = \"#{@release_version}\""],
["-parquet = { path = \"../parquet\", version = \"#{@snapshot_version}\" }",
"+parquet = { path = \"../parquet\", version = \"#{@release_version}\" }"],
],
},
{
path: "rust/parquet_derive/README.md",
hunks: [
["-parquet = \"#{@snapshot_version}\"",
"-parquet_derive = \"#{@snapshot_version}\"",
"+parquet = \"#{@release_version}\"",
"+parquet_derive = \"#{@release_version}\""],
],
},
{
path: "rust/parquet_derive_test/Cargo.toml",
hunks: [
["-version = \"#{@snapshot_version}\"",
"+version = \"#{@release_version}\"",
"-parquet = { path = \"../parquet\", version = \"#{@snapshot_version}\" }",
"-parquet_derive = { path = \"../parquet_derive\", version = \"#{@snapshot_version}\" }",
"+parquet = { path = \"../parquet\", version = \"#{@release_version}\" }",
"+parquet_derive = { path = \"../parquet_derive\", version = \"#{@release_version}\" }"],
],
},
],
parse_patch(git("log", "-n", "1", "-p")))
end
@@ -537,6 +566,35 @@ def test_version_post_tag
"+See [crate documentation](https://docs.rs/crate/parquet/#{@next_snapshot_version}) on available API."],
],
},
{
path: "rust/parquet_derive/Cargo.toml",
hunks: [
["-version = \"#{@release_version}\"",
"+version = \"#{@next_snapshot_version}\""],
["-parquet = { path = \"../parquet\", version = \"#{@release_version}\" }",
"+parquet = { path = \"../parquet\", version = \"#{@next_snapshot_version}\" }"],
],
},
{
path: "rust/parquet_derive/README.md",
hunks: [
["-parquet = \"#{@release_version}\"",
"-parquet_derive = \"#{@release_version}\"",
"+parquet = \"#{@next_snapshot_version}\"",
"+parquet_derive = \"#{@next_snapshot_version}\""],
],
},
{
path: "rust/parquet_derive_test/Cargo.toml",
hunks: [
["-version = \"#{@release_version}\"",
"+version = \"#{@next_snapshot_version}\"",
"-parquet = { path = \"../parquet\", version = \"#{@release_version}\" }",
"-parquet_derive = { path = \"../parquet_derive\", version = \"#{@release_version}\" }",
"+parquet = { path = \"../parquet\", version = \"#{@next_snapshot_version}\" }",
"+parquet_derive = { path = \"../parquet_derive\", version = \"#{@next_snapshot_version}\" }"],
],
},
],
parse_patch(git("log", "-n", "1", "-p")))
end
26 changes: 7 additions & 19 deletions dev/release/00-prepare.sh
@@ -151,29 +151,17 @@ update_versions() {
-e "s/^(arrow = .* version = )\".*\"(( .*)|(, features = .*))$/\\1\"${version}\"\\2/g" \
-e "s/^(arrow-flight = .* version = )\".+\"( .*)/\\1\"${version}\"\\2/g" \
-e "s/^(parquet = .* version = )\".*\"(( .*)|(, features = .*))$/\\1\"${version}\"\\2/g" \
+-e "s/^(parquet_derive = .* version = )\".*\"(( .*)|(, features = .*))$/\\1\"${version}\"\\2/g" \
*/Cargo.toml
rm -f */Cargo.toml.bak
git add */Cargo.toml

-# Update version number for parquet README
-sed -i.bak -E -e \
-"s/^parquet = \".+\"/parquet = \"${version}\"/g" \
-parquet/README.md
-sed -i.bak -E -e \
-"s/docs.rs\/crate\/parquet\/.+\)/docs.rs\/crate\/parquet\/${version}\)/g" \
-parquet/README.md
-rm -f parquet/README.md.bak
-git add parquet/README.md
-
-# Update version number for datafusion README
-sed -i.bak -E -e \
-"s/^datafusion = \".+\"/datafusion = \"${version}\"/g" \
-datafusion/README.md
-sed -i.bak -E -e \
-"s/docs.rs\/crate\/datafusion\/.+\)/docs.rs\/crate\/datafusion\/${version}\)/g" \
-datafusion/README.md
-rm -f datafusion/README.md.bak
-git add datafusion/README.md
+sed -i.bak -E \
+-e "s/^([^ ]+) = \".+\"/\\1 = \"${version}\"/g" \
+-e "s,docs\.rs/crate/([^/]+)/[^)]+,docs.rs/crate/\\1/${version},g" \
+*/README.md
+rm -f */README.md.bak
+git add */README.md
cd -
}

2 changes: 2 additions & 0 deletions rust/Cargo.toml
@@ -19,6 +19,8 @@
members = [
"arrow",
"parquet",
"parquet_derive",
"parquet_derive_test",
"datafusion",
"arrow-flight",
"integration-testing",
6 changes: 4 additions & 2 deletions rust/parquet/src/record/mod.rs
@@ -19,8 +19,10 @@
mod api;
pub mod reader;
+mod record_writer;
mod triplet;

-pub use self::api::{
-List, ListAccessor, Map, MapAccessor, Row, RowAccessor, RowFormatter,
+pub use self::{
+api::{List, ListAccessor, Map, MapAccessor, Row, RowAccessor},
+record_writer::RecordWriter,
};
26 changes: 26 additions & 0 deletions rust/parquet/src/record/record_writer.rs
@@ -0,0 +1,26 @@
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.

use super::super::errors::ParquetError;
use super::super::file::writer::RowGroupWriter;

pub trait RecordWriter<T> {
fn write_to_row_group(
Member review comment:

I'm wondering whether we can have a higher-level API, so that users do not need to manipulate row groups directly. Instead, could we pass a file writer, like the following?

fn write(&self, file: &mut dyn FileWriter) -> Result<()>;

Internally, this would write a row group for each call.

&self,
row_group_writer: &mut Box<RowGroupWriter>,
) -> Result<(), ParquetError>;
Member review comment:

nit: we can import and use the result type from parquet_rs so this can just be Result<()>.

}
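
The higher-level API suggested in the first review comment above can be sketched with simplified stand-in traits. This is only an illustration under stated assumptions, not the parquet crate's API: `SketchResult`, the two-method `RowGroupWriter`, `MemoryRowGroup`, `MemoryFile`, and `Reading` are all hypothetical. Only the shape of `write` and the "one row group per call" behaviour come from the comment.

```rust
// Stand-in result type (hypothetical); the trait in this diff returns
// `Result<(), ParquetError>` instead.
pub type SketchResult<T> = std::result::Result<T, String>;

// Hypothetical, heavily simplified row group writer: it only counts rows.
pub trait RowGroupWriter {
    fn write_rows(&mut self, rows: usize) -> SketchResult<()>;
    fn rows_written(&self) -> usize;
}

// Hypothetical, heavily simplified file writer that hands out row groups.
pub trait FileWriter {
    fn next_row_group(&mut self) -> SketchResult<Box<dyn RowGroupWriter>>;
    fn close_row_group(&mut self, row_group: Box<dyn RowGroupWriter>) -> SketchResult<()>;
}

pub trait RecordWriter<T> {
    // Lower-level entry point, shaped like the method in this diff.
    fn write_to_row_group(
        &self,
        row_group_writer: &mut Box<dyn RowGroupWriter>,
    ) -> SketchResult<()>;

    // Higher-level entry point: opens, fills, and closes one row group per call.
    fn write(&self, file: &mut dyn FileWriter) -> SketchResult<()> {
        let mut row_group = file.next_row_group()?;
        self.write_to_row_group(&mut row_group)?;
        file.close_row_group(row_group)
    }
}

// In-memory stand-ins used to exercise the sketch.
#[derive(Default)]
pub struct MemoryRowGroup {
    rows: usize,
}

impl RowGroupWriter for MemoryRowGroup {
    fn write_rows(&mut self, rows: usize) -> SketchResult<()> {
        self.rows += rows;
        Ok(())
    }

    fn rows_written(&self) -> usize {
        self.rows
    }
}

#[derive(Default)]
pub struct MemoryFile {
    // Row count of every row group that has been closed, in order.
    pub closed_row_groups: Vec<usize>,
}

impl FileWriter for MemoryFile {
    fn next_row_group(&mut self) -> SketchResult<Box<dyn RowGroupWriter>> {
        Ok(Box::new(MemoryRowGroup::default()))
    }

    fn close_row_group(&mut self, row_group: Box<dyn RowGroupWriter>) -> SketchResult<()> {
        self.closed_row_groups.push(row_group.rows_written());
        Ok(())
    }
}

// Hypothetical record type; a derive macro would generate the impl below.
pub struct Reading {
    pub value: i32,
}

impl<'a> RecordWriter<Reading> for &'a [Reading] {
    fn write_to_row_group(
        &self,
        row_group_writer: &mut Box<dyn RowGroupWriter>,
    ) -> SketchResult<()> {
        row_group_writer.write_rows(self.len())
    }
}
```

With these stand-ins, calling `(&readings[..]).write(&mut file)` twice produces two closed row groups, so callers never touch a row group directly. The trade-off is that callers also lose control over how many records go into each row group.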
37 changes: 37 additions & 0 deletions rust/parquet_derive/Cargo.toml
@@ -0,0 +1,37 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

[package]
name = "parquet_derive"
version = "2.0.0-SNAPSHOT"
authors = ["Apache Arrow <dev@arrow.apache.org>"]
keywords = [ "parquet" ]
edition = "2018"

[lib]
proc-macro = true

[features]
chrono = []
bigdecimal = []
uuid = []

[dependencies]
proc-macro2 = "1.0.8"
quote = "1.0.2"
syn = { version = "1.0.14", features = ["full", "extra-traits"] }
parquet = { path = "../parquet", version = "2.0.0-SNAPSHOT" }
98 changes: 98 additions & 0 deletions rust/parquet_derive/README.md
@@ -0,0 +1,98 @@
<!---
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

# Parquet Derive

A crate for deriving `RecordWriter` for arbitrary, _simple_ structs. This does not generate writers for arbitrarily nested
structures. It only works for primitives, a few generic structures, and
various levels of reference. Please see the features checklist for what is currently
supported.

Derive also has some support for the chrono time library. You must enable the `chrono` feature to get this support.

## Usage
Add this to your Cargo.toml:
```toml
[dependencies]
parquet = "2.0.0-SNAPSHOT"
parquet_derive = "2.0.0-SNAPSHOT"
```

and this to your crate root:
```rust
extern crate parquet;
#[macro_use] extern crate parquet_derive;
```

Example usage of deriving a `RecordWriter` for your struct:

```rust
use parquet::file::writer::{FileWriter, SerializedFileWriter};
use parquet::record::RecordWriter;

#[derive(ParquetRecordWriter)]
struct ACompleteRecord<'a> {
    pub a_bool: bool,
    pub a_str: &'a str,
    pub a_string: String,
    pub a_borrowed_string: &'a String,
    pub maybe_a_str: Option<&'a str>,
    pub magic_number: i32,
    pub low_quality_pi: f32,
    pub high_quality_pi: f64,
    pub maybe_pi: Option<f32>,
    pub maybe_best_pi: Option<f64>,
}

// Initialize your parquet file (`file`, `schema`, and `props` must already exist)
let mut writer = SerializedFileWriter::new(file, schema, props).unwrap();
let mut row_group = writer.next_row_group().unwrap();

// Build up your records
let chunks = vec![ACompleteRecord{...}];

// The derived `RecordWriter` takes over here; it returns a `Result`, so check it
(&chunks[..]).write_to_row_group(&mut row_group).unwrap();

writer.close_row_group(row_group).unwrap();
writer.close().unwrap();
```

## Features
- [X] Support writing `String`, `&str`, `bool`, `i32`, `f32`, `f64`, `Vec<u8>`
- [ ] Support writing dictionaries
- [X] Support writing logical types like timestamp
- [X] Derive definition_levels for `Option`
- [ ] Derive definition levels for nested structures
- [ ] Derive writing tuple struct
- [ ] Derive writing `tuple` container types
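
As a rough illustration of the `Option` item in the checklist: in Parquet, a top-level optional column carries one definition level per row (1 when the value is present, 0 when it is null), and only the present values are stored. The sketch below shows that split in plain Rust. `split_optional` is a hypothetical helper written for this example, not part of `parquet_derive`, and the code the macro actually generates may look different.

```rust
/// Splits an optional column into definition levels and the densely
/// packed non-null values. Hypothetical helper, for illustration only.
pub fn split_optional<T: Clone>(column: &[Option<T>]) -> (Vec<i16>, Vec<T>) {
    // One level per row: 1 if the value is defined, 0 if it is null.
    let definition_levels = column
        .iter()
        .map(|value| if value.is_some() { 1 } else { 0 })
        .collect();

    // Only the defined values are kept, in their original order.
    let values = column.iter().flatten().cloned().collect();

    (definition_levels, values)
}
```

For a field such as `maybe_pi: Option<f32>`, the levels vector has one entry per record, while the values vector has one entry per `Some`.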

## Requirements
- Same as `parquet-rs`

## Test
Testing a `*_derive` crate requires an intermediate crate. Go to `parquet_derive_test` and run `cargo test` for
unit tests.

## Docs
To build documentation, run `cargo doc --no-deps`.
To compile and view in the browser, run `cargo doc --no-deps --open`.

## License
Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0.