feat(pyspark): implement new experimental read/write directory methods #9272
Conversation
Force-pushed from b008b1f to c9ef6df
@@ -360,6 +325,7 @@ def con(data_dir, tmp_path_factory, worker_id):
@pytest.fixture(scope="session")
def con_streaming(data_dir, tmp_path_factory, worker_id):
    backend_test = TestConfForStreaming.load_data(data_dir, tmp_path_factory, worker_id)
    backend_test._load_data()
Not 100% sure why this is needed; let me investigate what's going on.
thanks!
I figured out what the problem is! The PySpark backend loads data statefully, and the streaming conf ended up reusing the same temp directory as the batch conf. That directory already existed once the batch conf had loaded its data, so the streaming conf skipped data loading. Because tmpdir is passed as a fixture, I added a line of code that changes the directory naming for the streaming conf.
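The fix described above can be sketched in plain Python. This is a minimal, hypothetical illustration (the function and directory names are assumptions, not the PR's actual code): giving each conf its own subdirectory means the batch conf's "already exists, skip loading" check can never trigger for the streaming conf.

```python
from pathlib import Path
import tempfile

def data_directory(base: Path, conf_name: str) -> Path:
    """Return a per-conf data directory, creating it if needed.

    Deriving the path from the conf name keeps the batch and streaming
    confs from sharing (and therefore skipping) the same temp directory.
    """
    d = base / conf_name
    d.mkdir(parents=True, exist_ok=True)
    return d

base = Path(tempfile.mkdtemp())
batch_dir = data_directory(base, "batch")
streaming_dir = data_directory(base, "streaming")

# Distinct directories, so each conf performs its own data load.
assert batch_dir != streaming_dir
```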
Force-pushed from 76f613b to 206cd9c
Force-pushed from 206cd9c to 3093145
Force-pushed from d51bd42 to f933779
Force-pushed from 85317b3 to 4f03594
Force-pushed from 3237871 to 6bc9364
Let's merge this! Since we're using the experimental, we are free to break this across non-major versions based on user feedback/bugs/etc.
Description of changes
Implement new experimental read/write directory methods in the PySpark backend to support streaming reads and writes.
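The motivation for directory-based methods is that streaming writers (such as Spark) emit a dataset as a directory of part files rather than a single file. A minimal, PySpark-free sketch of the idea follows; the function name `read_csv_dir` and the `part-*` naming convention are illustrative assumptions here, not a statement of the PR's actual API.

```python
from pathlib import Path
import tempfile

def read_csv_dir(path: Path) -> list[str]:
    """Read a directory of part files as one dataset.

    Sorting the parts gives a deterministic row order; streaming sinks
    append new part files to the same directory over time.
    """
    rows: list[str] = []
    for part in sorted(path.glob("part-*.csv")):
        rows.extend(part.read_text().splitlines())
    return rows

# Simulate a directory produced by a streaming writer.
d = Path(tempfile.mkdtemp())
(d / "part-00000.csv").write_text("a\nb\n")
(d / "part-00001.csv").write_text("c\n")

assert read_csv_dir(d) == ["a", "b", "c"]
```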
Issues closed
#8984