forked from apache/arrow
PARQUET-442: Nested schema conversion, Thrift struct decoupling, dump-schema utility

Several inter-related things here:

* Added SchemaDescriptor and ColumnDescriptor types to hold computed structure information (e.g. max repetition/definition levels) about the file schema. These are now used in the FileReader and ColumnReader.
* Added a logical schema node class structure, very similar to the one in parquet-mr (though slimmed down), which can be used for both file reading and writing.
* Added FlatSchemaConverter to convert Parquet flat schema metadata into a nested logical schema.
* Added a SchemaPrinter tool and a parquet-dump-schema CLI tool to visit a nested schema and print it to the console.
* Per PARQUET-446 and related work in parquet-mr, it's important for both the public API of this project and internal development to limit our coupling to the compiled Thrift headers. I added `Type`, `Repetition`, and `LogicalType` enums to the `parquet_cpp` namespace and inverted the dependency between the column readers, scanners, and encoders to use these enums.
* A bunch of unit tests.

Author: Wes McKinney <wes@cloudera.com>

Closes apache#38 from wesm/PARQUET-442 and squashes the following commits:

9ca0219 [Wes McKinney] Add a unit test for SchemaPrinter
fdd37cd [Wes McKinney] Comment re: FLBA node ctor
3a15c0c [Wes McKinney] Add some SchemaDescriptor and ColumnDescriptor tests
27e1805 [Wes McKinney] Don't squash supplied CMAKE_CXX_FLAGS
76dd283 [Wes McKinney] Refactor Make* methods as static member functions
2fae8cd [Wes McKinney] Trim some includes
b2e2661 [Wes McKinney] More doc about the parquet_cpp enums
bd78d7c [Wes McKinney] Move metadata enums to parquet/types.h and add rest of parquet:: enums. Add NONE value to Compression
415305b [Wes McKinney] cpplint
4ac84aa [Wes McKinney] Refactor to make PrimitiveNode and GroupNode ctors private. Add MakePrimitive and MakeGroup factory functions. Move parquet::SchemaElement function into static FromParquet ctors so can set private members
3169b24 [Wes McKinney] NewPrimitive should set num_children = 0 always
954658e [Wes McKinney] Add a comment for TestSchemaConverter.InvalidRoot and uncomment tests for root nodes of other repetition types
55d21b0 [Wes McKinney] Remove schema-builder-test.cc
71c1eab [Wes McKinney] Remove crufty builder.h, will revisit
7ef2dee [Wes McKinney] Fix list encoding comment
8c5af4e [Wes McKinney] Remove old comment, unneeded cast
6b041c5 [Wes McKinney] First draft SchemaDescriptor::Init. Refactor to use ColumnDescriptor. Standardize on parquet_cpp enums instead of Thrift metadata structs. Limit #include from Thrift
841ae7f [Wes McKinney] Don't export SchemaPrinter for now
834389a [Wes McKinney] Add Node::Visitor API and implement a simple schema dump CLI tool
a8bf5c8 [Wes McKinney] Catch and throw exception (instead of core dump) if run out of schema children. Add a Node::Visitor abstract API
bde8b18 [Wes McKinney] Can compare FLBA type metadata in logical schemas
f0df0ba [Wes McKinney] Finish a nested schema conversion test
0af0161 [Wes McKinney] Check that root schema node is repeated
5df00aa [Wes McKinney] Expose GroupConverter API, add test for invalid root
beaa99f [Wes McKinney] Refactor slightly and add an FLBA test
6e248b8 [Wes McKinney] Schema tree conversion first cut, add a couple primitive tests
9685c90 [Wes McKinney] Rename Schema -> RootSchema and add another unit test
f7d0487 [Wes McKinney] Schema types test coverage, move more methods into compilation unit
d746352 [Wes McKinney] Better isolate thrift dependency. Move schema/column descriptor into its own header
a8e5a0a [Wes McKinney] Tweaks
fb9d7ad [Wes McKinney] Draft of flat to nested schema conversion. No tests yet
3015063 [Wes McKinney] More prototyping. Rename Type -> Node. PrimitiveNode factory functions
a8a7a01 [Wes McKinney] Start drafting schema types

Change-Id: I484f0a6f02d17d3905f2a40e3b0f17a01554a413

1 parent 9c0066c, commit 94257f8
Showing 31 changed files with 2,093 additions and 236 deletions.
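
The flat-to-nested conversion and the max repetition/definition level computation described in the commit message can be illustrated with a small standalone sketch. This is not the code from this commit: `FlatElement`, `Node`, `LeafLevels`, `FromFlat`, `ComputeLevels`, and `PrintNode` are hypothetical names invented for illustration, and the sketch assumes only the general Parquet convention that the schema is stored as a depth-first list with a child count per element, that any non-required field adds a definition level, and that a repeated field also adds a repetition level.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <ostream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; not the parquet_cpp types.
enum class Repetition { REQUIRED, OPTIONAL, REPEATED };

// One entry of the flat, depth-first schema list kept in file metadata.
struct FlatElement {
  std::string name;
  Repetition repetition;
  int num_children;  // assumed 0 for a primitive (leaf) column
};

struct Node {
  std::string name;
  Repetition repetition;
  std::vector<std::unique_ptr<Node>> children;
  bool is_leaf() const { return children.empty(); }
};

struct LeafLevels {
  std::string path;
  int max_definition_level;
  int max_repetition_level;
};

// Rebuild one subtree by consuming flat elements in depth-first order.
// Throws instead of reading past the end when children are missing.
std::unique_ptr<Node> BuildNode(const std::vector<FlatElement>& flat,
                                std::size_t* pos) {
  assert(pos != nullptr);
  if (*pos >= flat.size()) throw std::runtime_error("ran out of schema elements");
  const FlatElement& e = flat[(*pos)++];
  auto node = std::make_unique<Node>();
  node->name = e.name;
  node->repetition = e.repetition;
  for (int i = 0; i < e.num_children; ++i) {
    node->children.push_back(BuildNode(flat, pos));
  }
  return node;
}

std::unique_ptr<Node> FromFlat(const std::vector<FlatElement>& flat) {
  std::size_t pos = 0;
  auto root = BuildNode(flat, &pos);
  if (pos != flat.size()) throw std::runtime_error("unused schema elements");
  return root;
}

// Walk from a field down to its leaves, accumulating the maximum levels.
void CollectLevels(const Node& node, const std::string& prefix, int def,
                   int rep, std::vector<LeafLevels>* out) {
  if (node.repetition != Repetition::REQUIRED) ++def;  // optional or repeated
  if (node.repetition == Repetition::REPEATED) ++rep;
  const std::string path = prefix.empty() ? node.name : prefix + "." + node.name;
  if (node.is_leaf()) {
    out->push_back({path, def, rep});
    return;
  }
  for (const auto& child : node.children) {
    CollectLevels(*child, path, def, rep, out);
  }
}

// The root node itself does not contribute to the levels of its columns.
std::vector<LeafLevels> ComputeLevels(const Node& root) {
  std::vector<LeafLevels> out;
  for (const auto& child : root.children) CollectLevels(*child, "", 0, 0, &out);
  return out;
}

const char* RepetitionName(Repetition r) {
  switch (r) {
    case Repetition::REQUIRED: return "required";
    case Repetition::OPTIONAL: return "optional";
    case Repetition::REPEATED: return "repeated";
  }
  return "unknown";
}

// Recursive dump of the nested schema, indented by depth.
void PrintNode(const Node& node, int indent, std::ostream& os) {
  os << std::string(indent, ' ') << RepetitionName(node.repetition) << ' '
     << node.name;
  if (node.is_leaf()) {
    os << ";\n";
    return;
  }
  os << " {\n";
  for (const auto& child : node.children) PrintNode(*child, indent + 2, os);
  os << std::string(indent, ' ') << "}\n";
}
```

To try it, build a `std::vector<FlatElement>` in depth-first order, pass it to `FromFlat`, then call `ComputeLevels(*root)` or `PrintNode(*root, 0, std::cout)` (the latter needs `<iostream>`). The sketch treats a group with zero children as a leaf and omits physical types entirely, both simplifications that a real converter would not make.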