Potential conflict for non alphabetic character leading schema #294

chris920820 · 2020-07-08T01:35:24Z

Hey @xitongsys!
As of a7314a1, it seems we add a prefix P_ to any schema name that starts with a non-alphabetic character.
However, if a parquet file has both P__x and _x in its schema, this causes a conflict, since we can no longer tell whether data came from P__x or _x. It is also a problem for consumers who don't know about this convention. For example, if a consumer expects the column _x to exist and tries to read data using the name _x, the read fails because the name has been converted internally to P__x.
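To make the collision concrete, here is a minimal sketch. `prefixNonAlpha` is a hypothetical stand-in for the renaming convention described above, written only for illustration; it is not the library's actual code, and the real conversion may do more than this.

```go
package main

import (
	"fmt"
	"unicode"
)

// prefixNonAlpha is a hypothetical helper that mimics the convention
// discussed in this issue: names whose first character is not a letter
// get a "P_" prefix. It is NOT parquet-go's actual implementation.
func prefixNonAlpha(name string) string {
	if name == "" {
		return name
	}
	first := []rune(name)[0]
	if !unicode.IsLetter(first) {
		return "P_" + name
	}
	return name
}

func main() {
	// Maps the internal (converted) name back to the original column name.
	seen := map[string]string{}
	for _, col := range []string{"P__x", "_x"} {
		internal := prefixNonAlpha(col)
		if prev, ok := seen[internal]; ok {
			fmt.Printf("conflict: %q and %q both map to %q\n", prev, col, internal)
			continue
		}
		seen[internal] = col
	}
	// Prints: conflict: "P__x" and "_x" both map to "P__x"
}
```

Because the prefix is only applied to some names, the mapping is not one-to-one, which is why the two columns above end up with the same internal name.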

Is there anywhere that enforces this naming convention (no leading non-alphabetic character)? Does the Go compiler require it somewhere?

Is there a better way to handle this more gracefully? To avoid using a leading non-alphabetic character in a variable name, could we add a global prefix to every column instead of adding a prefix only to certain columns?
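As a rough sketch of the global-prefix idea: if every column name gets the same prefix, the conversion can always be reversed by stripping it, so P__x and _x stay distinct. The names `globalPrefix`, `encodeName`, and `decodeName` are made up for this example and do not exist in the library. The sketch also ignores characters that are not valid in a Go identifier at all.

```go
package main

import (
	"fmt"
	"strings"
)

// globalPrefix is a hypothetical prefix applied to every column name.
const globalPrefix = "P_"

// encodeName prefixes every name unconditionally, so two different
// column names can never produce the same internal name.
func encodeName(name string) string {
	return globalPrefix + name
}

// decodeName strips the prefix again; ok is false if the prefix is missing.
func decodeName(internal string) (name string, ok bool) {
	if !strings.HasPrefix(internal, globalPrefix) {
		return "", false
	}
	return strings.TrimPrefix(internal, globalPrefix), true
}

func main() {
	for _, col := range []string{"P__x", "_x"} {
		internal := encodeName(col)
		original, _ := decodeName(internal)
		fmt.Printf("%q -> %q -> %q\n", col, internal, original)
	}
}
```

With this scheme P__x becomes P_P__x and _x becomes P__x, and both decode back to their original names. The consumer still has to know the convention, but it is the same for every column.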


mitigate #294

xitongsys · 2020-09-10T12:55:57Z

Hi @chris920820,
Sorry for the late response.
For now I have only mitigated this issue in pull request #310, and I also added a note about it to the README.


xitongsys closed this as completed in 27b6877 Sep 10, 2020

xitongsys added a commit that referenced this issue Sep 10, 2020

Merge pull request #310 from xitongsys/dev

c4bb6b3

mitigate #294

xitongsys reopened this Sep 10, 2020

xitongsys closed this as completed Sep 21, 2020

durango pushed a commit to edms/parquet-go that referenced this issue Apr 14, 2021

fix xitongsys#294

e17b417

durango pushed a commit to edms/parquet-go that referenced this issue Apr 14, 2021

fix xitongsys#294

bd6fa37

hangxie mentioned this issue Jun 11, 2021

[BUG] Can't use numbers in parquet column names #387

Closed
