Add a Parquet file with column chunk key-value metadata (#49)
* Add a Parquet file with column chunk key-value metadata

This file has a single row group with 0 rows and 1 column. The column
chunk has key-value metadata, with a key "foo" mapped to a value "bar".

Created with this code:

```c++
PARQUET_ASSIGN_OR_THROW(
    auto sink, arrow::io::FileOutputStream::Open(
                   "column-chunk-key-value-metadata.parquet"));
parquet::ParquetFileWriter::Open(
    sink, std::static_pointer_cast<parquet::schema::GroupNode>(
              parquet::schema::GroupNode::Make(
                  "schema", parquet::Repetition::REQUIRED,
                  {parquet::schema::PrimitiveNode::Make(
                      "column1", parquet::Repetition::OPTIONAL,
                      parquet::Type::INT32)})))
    ->AppendRowGroup()
    ->NextColumn()
    ->key_value_metadata()
    .Append("foo", "bar");
```

* Rename to match the prevalent style

* Make it 2 columns

* Update data/README.md

* Add a KeyValue entry without Value

* Update data/README.md

Co-authored-by: mwish <maplewish117@gmail.com>

* Update README.md

* Update README.md

---------

Co-authored-by: mwish <maplewish117@gmail.com>
clee704 and mapleFU authored Jul 21, 2024
1 parent 1bf4bd3 commit 9b48ff4
Showing 2 changed files with 2 additions and 1 deletion.
3 changes: 2 additions & 1 deletion data/README.md
@@ -51,6 +51,7 @@
| concatenated_gzip_members.parquet | 513 UINT64 numbers compressed using 2 concatenated gzip members in a single data page |
| byte_stream_split.zstd.parquet | Standard normals with `BYTE_STREAM_SPLIT` encoding. See [note](#byte-stream-split) below |
| incorrect_map_schema.parquet | Contains a Map schema without explicitly required keys, produced by Presto. See [note](#incorrect-map-schema) |
| column_chunk_key_value_metadata.parquet | Two INT32 columns, one with column chunk key-value metadata {"foo": "bar", "thisiskeywithoutvalue": null}. The second key, "thisiskeywithoutvalue", has no value; depending on the client, it may be read back as an empty string "" |

TODO: Document what each file is in the table above.

@@ -425,4 +426,4 @@ message hive_schema {
}
}
}
```
```
Binary file added data/column_chunk_key_value_metadata.parquet
Binary file not shown.
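
The README row above notes that the second key has no value and that some clients may surface it as an empty string. The following is a minimal, library-free sketch of that distinction. It assumes the Parquet Thrift `KeyValue` shape (required key, optional value); the `KeyValue` struct and the `ExampleMetadata`, `Find`, and `ValueOrEmpty` helpers are hypothetical names for illustration and are not part of the Arrow or Parquet C++ API.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a Parquet key-value entry:
// the key is always present, the value may be absent.
struct KeyValue {
  std::string key;
  std::optional<std::string> value;
};

// The two entries the README describes for
// column_chunk_key_value_metadata.parquet.
inline std::vector<KeyValue> ExampleMetadata() {
  std::vector<KeyValue> entries;
  entries.push_back(KeyValue{"foo", std::string("bar")});
  entries.push_back(KeyValue{"thisiskeywithoutvalue", std::nullopt});
  return entries;
}

// Returns a pointer to the first entry with the given key,
// or nullptr if no such key exists.
inline const KeyValue* Find(const std::vector<KeyValue>& entries,
                            const std::string& key) {
  for (const KeyValue& kv : entries) {
    if (kv.key == key) return &kv;
  }
  return nullptr;
}

// Models a client that maps a missing value to "".
// After this conversion, "no value" and "empty value" are indistinguishable.
inline std::string ValueOrEmpty(const KeyValue& kv) {
  return kv.value.value_or("");
}
```

A reader that needs to tell a missing value from an empty one would inspect `value.has_value()` directly; a reader that calls something like `ValueOrEmpty` loses that distinction, which is the client-dependent behavior the README row describes.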