Add shims for Spark 3.4.0 #5472

Merged
merged 10 commits into from
May 27, 2022

Conversation

firestarman
Collaborator

@firestarman firestarman commented May 12, 2022

This PR adds shims for Spark 3.4.0.

It mainly:

  • created the necessary classes for the 340 Spark shims, the Rapids shuffle manager and the service provider.
  • created shims for the Parquet-reading changes; their names start with Parquet.
  • created a new class, ShimFilePartitionReaderFactory, as the parent of the Rapids PERFILE readers, to hide the changes in Spark 3.4.0.
  • added 340 to the build scripts.
  • fixed some build errors by adding shims.
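
The parent-class approach in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: only the name ShimFilePartitionReaderFactory comes from the description, while Spark330Like, Spark340Like, Shims330, Shims340, extraOptions and PerFileReaderFactory are hypothetical stand-ins, and the "extra member in the newer version" is an assumed example of an API change.

```scala
// Hypothetical stand-ins for a Spark base type whose abstract members differ
// between versions. These are NOT real Spark classes.
object Spark330Like {
  trait ReaderFactory {
    def buildReader(path: String): Iterator[String]
  }
}

object Spark340Like {
  trait ReaderFactory {
    // Assumed for illustration: the newer version requires one extra member.
    def extraOptions: Map[String, String]
    def buildReader(path: String): Iterator[String]
  }
}

// One parent with the same name per shim. In a real build each copy would
// live in its own version-specific source tree and only one is compiled in.
object Shims330 {
  trait ShimFilePartitionReaderFactory extends Spark330Like.ReaderFactory
}

object Shims340 {
  trait ShimFilePartitionReaderFactory extends Spark340Like.ReaderFactory {
    // The version-specific detail is absorbed here, so common code never sees it.
    override def extraOptions: Map[String, String] = Map.empty
  }
}

// In this sketch the import plays the role of the build selecting a shim.
import Shims340.ShimFilePartitionReaderFactory

// Common reader code: written once against the shim parent only.
class PerFileReaderFactory extends ShimFilePartitionReaderFactory {
  override def buildReader(path: String): Iterator[String] =
    Iterator(s"rows-from:$path")
}
```

Switching the import to Shims330.ShimFilePartitionReaderFactory compiles the same PerFileReaderFactory unchanged, which is the point of routing the version difference through the parent.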

closes #5128
closes #5495

Signed-off-by: Firestarman <firestarmanllc@gmail.com>

Signed-off-by: Firestarman <firestarmanllc@gmail.com>
@firestarman
Collaborator Author

firestarman commented May 12, 2022

For early review.
It is a draft because I am still figuring out the failing ITs; I will file follow-up issues accordingly.
[Updated] The failing ITs are tracked by #5480.

There are also 21 failing unit tests, which fail on 330 as well; they are tracked by #5457.

AnsiCastOpSuite:
- Write bytes to string *** FAILED ***
- Write shorts to string *** FAILED ***
- Write ints to string *** FAILED ***
- Write longs to string *** FAILED ***
- Write ints to long *** FAILED ***
- Write longs to int (values within range) *** FAILED ***
- Write longs to short (values within range) *** FAILED ***
- Write longs to byte (values within range) *** FAILED ***
- Write ints to short (values within range) *** FAILED ***
- Write ints to byte (values within range) *** FAILED ***
- Write shorts to byte (values within range) *** FAILED ***
- Write floats to long (values within range) *** FAILED ***
- Write floats to int (values within range) *** FAILED ***
- Write floats to short (values within range) *** FAILED ***
- Write floats to byte (values within range) *** FAILED ***
- Write doubles to long (values within range) *** FAILED ***
- Write doubles to int (values within range) *** FAILED ***
- Write doubles to short (values within range) *** FAILED ***
- Write doubles to byte (values within range) *** FAILED ***
- Copy ints to long *** FAILED ***
- Copy long to float *** FAILED ***

Signed-off-by: Firestarman <firestarmanllc@gmail.com>
@firestarman firestarman marked this pull request as draft May 12, 2022 07:15
@jlowe jlowe added this to the May 2 - May 20 milestone May 12, 2022
@sameerz sameerz added the build Related to CI / CD or cleanly building label May 12, 2022
Signed-off-by: Firestarman <firestarmanllc@gmail.com>
@firestarman
Collaborator Author

This PR is blocked by #5457. Keeping it as a draft.

@firestarman firestarman requested a review from gerashegalov May 13, 2022 02:13
@firestarman firestarman marked this pull request as ready for review May 16, 2022 01:45
@firestarman
Collaborator Author

A follow-up issue #5495

@firestarman
Collaborator Author

build

@firestarman
Collaborator Author

CI failed because the premerge build needs corresponding updates. Waiting for 22.08.

Signed-off-by: Firestarman <firestarmanllc@gmail.com>
Signed-off-by: Firestarman <firestarmanllc@gmail.com>
@firestarman firestarman changed the base branch from branch-22.06 to branch-21.08 May 23, 2022 11:49
@firestarman firestarman changed the base branch from branch-21.08 to branch-22.08 May 23, 2022 11:50
@firestarman firestarman requested a review from jlowe May 23, 2022 11:58
@firestarman
Collaborator Author

A new follow-up #5589

@firestarman
Collaborator Author

build

Signed-off-by: Firestarman <firestarmanllc@gmail.com>
@firestarman firestarman requested a review from jlowe May 24, 2022 02:33
@firestarman
Collaborator Author

build

Signed-off-by: Firestarman <firestarmanllc@gmail.com>
@firestarman
Collaborator Author

build

Contributor

@jlowe jlowe left a comment

There should be tests on 340+ to verify we're properly falling back on the limit-with-offset scenarios, as it's silent data corruption if we don't get that correct.
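
To make the concern above concrete, here is a minimal sketch of why a missed fallback is silent. It is not the plugin's code: LimitSpec, cpuLimit, acceleratedLimit, canAccelerate and run are hypothetical names, and the semantics "skip offset rows, then return up to limit rows" is an assumption made for the illustration.

```scala
object LimitFallbackSketch {
  // Hypothetical description of a limit node that may carry an offset.
  final case class LimitSpec(limit: Int, offset: Int)

  // Reference (CPU) path: honors both offset and limit.
  def cpuLimit(rows: List[Int], spec: LimitSpec): List[Int] =
    rows.drop(spec.offset).take(spec.limit)

  // Accelerated path in this sketch only knows about the limit.
  def acceleratedLimit(rows: List[Int], limit: Int): List[Int] =
    rows.take(limit)

  // Fallback rule: anything with a non-zero offset must stay on the CPU path.
  def canAccelerate(spec: LimitSpec): Boolean = spec.offset == 0

  def run(rows: List[Int], spec: LimitSpec): List[Int] =
    if (canAccelerate(spec)) acceleratedLimit(rows, spec.limit)
    else cpuLimit(rows, spec)
}
```

With rows 1 to 10 and LimitSpec(3, 2), run returns List(3, 4, 5), while calling acceleratedLimit directly returns List(1, 2, 3): the same number of rows, no error, but the wrong data. A test that only checks row counts would not catch it, so the check has to compare results and confirm that the fallback actually happened.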

Signed-off-by: Firestarman <firestarmanllc@gmail.com>
@firestarman
Collaborator Author

Added tests for it

@firestarman
Collaborator Author

build

@firestarman firestarman requested a review from jlowe May 26, 2022 03:22
@firestarman firestarman merged commit 7b743c8 into NVIDIA:branch-22.08 May 27, 2022
@firestarman firestarman deleted the 340-shim branch May 27, 2022 02:04
HaoYang670 pushed a commit to HaoYang670/spark-rapids that referenced this pull request Jun 6, 2022
* Add shims for Spark 340

Signed-off-by: Firestarman <firestarmanllc@gmail.com>
Development

Successfully merging this pull request may close these issues.

[FEA] create Spark 3.4 shim
3 participants