feat(parsers.parquet): Add Apache Parquet Parser #15008

Merged: 8 commits merged into influxdata:master from feat/parquet-parser on Apr 10, 2024

powersj
Contributor

@powersj powersj commented Mar 17, 2024

Summary

Checklist

  • No AI generated code was used in this PR

Related issues

fixes: #14785

@telegraf-tiger telegraf-tiger bot added the feat and plugin/parser labels Mar 17, 2024
@powersj powersj changed the title from "feat(parsers.parquet): Add Apache Parquet Serializer" to "feat(parsers.parquet): Add Apache Parquet Parser" Mar 17, 2024
@powersj powersj self-assigned this Mar 18, 2024
@powersj powersj force-pushed the feat/parquet-parser branch 2 times, most recently from df2c071 to db951d5 March 19, 2024 02:05
@powersj
Contributor Author

powersj commented Mar 19, 2024

@srebhan thoughts on the 386 failure? Do I need to be checking the float64 values against math.MaxUint32?

@srebhan
Member

srebhan commented Mar 19, 2024

@powersj looks like an Apache Arrow issue. They have had those issues before... I would report it to them...

@powersj powersj force-pushed the feat/parquet-parser branch from db951d5 to 88d6dcc March 19, 2024 14:37
@powersj
Contributor Author

powersj commented Mar 19, 2024

I can either put this on hold until the upstream issue (apache/arrow#40672) is resolved, or we could limit the tests and usage to 64-bit. Apart from that, I believe the PR could get an initial review.

@powersj powersj marked this pull request as ready for review March 19, 2024 16:23
@powersj
Contributor Author

powersj commented Mar 19, 2024

Upstream issue resolved!

@powersj powersj added the ready for final review label Mar 19, 2024
@powersj powersj assigned srebhan and DStrand1 and unassigned powersj Mar 19, 2024
Member

@DStrand1 DStrand1 left a comment

Looks great! I have just a couple of questions.

plugins/parsers/parquet/parser.go (outdated, resolved)
plugins/parsers/parquet/columns.go (resolved)
@srebhan srebhan removed their assignment Mar 20, 2024
Member

@DStrand1 DStrand1 left a comment

LGTM! @srebhan can you take a look at this?

@DStrand1 DStrand1 assigned srebhan and unassigned DStrand1 Mar 26, 2024
Member

@srebhan srebhan left a comment

Thanks @powersj for the awesome parser and the exemplary test coverage! Some comments from my side...

plugins/parsers/parquet/README.md (resolved)
plugins/parsers/parquet/README.md (outdated, resolved)
plugins/parsers/parquet/README.md (outdated, resolved)
plugins/parsers/parquet/columns.go (outdated, resolved)
plugins/parsers/parquet/columns.go (outdated, resolved)
plugins/parsers/parquet/parser.go (outdated, resolved)
plugins/parsers/parquet/parser.go (outdated, resolved)
plugins/parsers/parquet/parser_test.go (outdated, resolved)
plugins/parsers/parquet/testcases/benchmark/README.md (outdated, resolved)
plugins/parsers/parquet/parser.go (outdated, resolved)
@powersj powersj force-pushed the feat/parquet-parser branch from 8956f21 to 1d69fe1 April 2, 2024 14:20
Member

@srebhan srebhan left a comment

Awesome! Thanks @powersj! Just one question left from my side... In case the code is as it should be, it would be nice to add a small comment to the code explaining why you can use the length of the first column.

plugins/parsers/parquet/parser.go (outdated, resolved)
@powersj powersj force-pushed the feat/parquet-parser branch from 9f85b1b to 552bced April 4, 2024 22:27
Member

@srebhan srebhan left a comment

Awesome!

@srebhan
Member

srebhan commented Apr 8, 2024

@DStrand1 assigning it back to you in case you want to take another look after the code changes...

@srebhan srebhan assigned DStrand1 and unassigned srebhan Apr 8, 2024
Member

@DStrand1 DStrand1 left a comment

LGTM! @powersj can you check the "No AI generated code was used" button?

@DStrand1 DStrand1 assigned powersj and unassigned DStrand1 Apr 9, 2024
@powersj powersj force-pushed the feat/parquet-parser branch from 6dac14a to d7422d4 April 10, 2024 13:24
@powersj
Contributor Author

powersj commented Apr 10, 2024

@DStrand1 button clicked and rebased on master. Will let tests run before hitting the button.

@telegraf-tiger
Contributor

Download PR build artifacts for linux_amd64.tar.gz, darwin_arm64.tar.gz, and windows_amd64.zip.
Downloads for additional architectures and packages are available below.

⚠️ This pull request increases the Telegraf binary size by 1.88% for linux amd64 (new size: 236.7 MB, nightly size: 232.3 MB)

📦 Click here to get additional PR build artifacts

Artifact URLs

DEB: amd64.deb, arm64.deb, armel.deb, armhf.deb, i386.deb, mips.deb, mipsel.deb, ppc64el.deb, riscv64.deb, s390x.deb
RPM: aarch64.rpm, armel.rpm, armv6hl.rpm, i386.rpm, ppc64le.rpm, riscv64.rpm, s390x.rpm, x86_64.rpm
TAR GZ: darwin_amd64.tar.gz, darwin_arm64.tar.gz, freebsd_amd64.tar.gz, freebsd_armv7.tar.gz, freebsd_i386.tar.gz, linux_amd64.tar.gz, linux_arm64.tar.gz, linux_armel.tar.gz, linux_armhf.tar.gz, linux_i386.tar.gz, linux_mips.tar.gz, linux_mipsel.tar.gz, linux_ppc64le.tar.gz, linux_riscv64.tar.gz, linux_s390x.tar.gz
ZIP: windows_amd64.zip, windows_arm64.zip, windows_i386.zip

@powersj powersj merged commit ba9cbee into influxdata:master Apr 10, 2024
26 checks passed
@powersj powersj deleted the feat/parquet-parser branch April 10, 2024 13:54
@github-actions github-actions bot added this to the v1.31.0 milestone Apr 10, 2024
Labels

  • feat: Improvement on an existing feature such as adding a new setting/mode to an existing plugin
  • plugin/parser: 1. Request for new parser plugins 2. Issues/PRs that are related to parser plugins
  • ready for final review: This pull request has been reviewed and/or tested by multiple users and is ready for a final review.

Development

Successfully merging this pull request may close these issues.

Parquet Parser
3 participants