[SPARK-44705][PYTHON] Make PythonRunner single-threaded #42385
Conversation
Already reviewed this actually. LGTM if tests pass. This is a nice fix to have.
Hi, @utkarsh39. Could you address @ueshin and @HyukjinKwon's review comments?
Merged to master.
### What changes were proposed in this pull request?
PythonRunner, a utility that executes Python UDFs in Spark, uses two threads in a producer-consumer model today. This multi-threading model is problematic and confusing, as Spark's execution model within a task is commonly understood to be single-threaded. More importantly, this departure into double-threaded execution resulted in a series of customer issues involving [race conditions](https://issues.apache.org/jira/browse/SPARK-33277) and [deadlocks](https://issues.apache.org/jira/browse/SPARK-38677) between threads, as the code was hard to reason about. There have been multiple attempts to rein in these issues, viz., [fix 1](https://issues.apache.org/jira/browse/SPARK-22535), [fix 2](apache#30177), [fix 3](apache@243c321). Moreover, the fixes have made the code base somewhat abstruse by introducing multiple daemon [monitor threads](https://github.com/apache/spark/blob/a3a32912be04d3760cb34eb4b79d6d481bbec502/core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala#L579) to detect deadlocks. This PR makes PythonRunner single-threaded, making it easier to reason about and improving code health.

#### Current Execution Model in Spark for Python UDFs
For queries containing Python UDFs, the main Java task thread spins up a new writer thread to pipe data from the child Spark plan into the Python worker evaluating the UDF. The writer thread runs in a tight loop: it evaluates the child Spark plan and feeds the resulting output to the Python worker. The main task thread simultaneously consumes the Python UDF's output and evaluates the parent Spark plan to produce the final result.

The I/O to/from the Python worker uses blocking Java sockets, necessitating the use of two threads, one responsible for input to the Python worker and the other for output. Without two threads, it is easy to run into a deadlock. For example, the task can block forever waiting for output from the Python worker. The output will never arrive until the input is supplied to the Python worker, which is not possible as the task thread is blocked while waiting on output.

#### Proposed Fix
The proposed fix is to move to the standard single-threaded execution model within a task, i.e., to do away with the writer thread. In addition to mitigating the crashes, the fix reduces the complexity of the existing code by doing away with many safety checks in place to track deadlocks in the double-threaded execution model.

In the new model, the main task thread alternates between consuming and feeding data to the Python worker using asynchronous I/O through Java's [SocketChannel](https://docs.oracle.com/javase/7/docs/api/java/nio/channels/SocketChannel.html). See the `read()` method in the code below for approximately how this is achieved.

```scala
class PythonUDFRunner {

  private var nextRow: Row = _
  private var endOfStream = false
  private var childHasNext = true
  private var buffer: ByteBuffer = _

  def hasNext(): Boolean = nextRow != null || {
    if (!endOfStream) {
      read(buffer)
      nextRow = deserialize(buffer)
      hasNext()
    } else {
      false
    }
  }

  def next(): Row = {
    if (hasNext()) {
      val outputRow = nextRow
      nextRow = null
      outputRow
    } else {
      null
    }
  }

  def read(buf: ByteBuffer): Unit = {
    var n = 0
    while (n == 0) {
      // Alternate between reading from and writing to the Python worker using async I/O.
      if (pythonWorker.isReadable) {
        n = pythonWorker.read(buf)
      }
      if (pythonWorker.isWritable) {
        consumeChildPlanAndWriteDataToPythonWorker()
      }
    }
  }

  def consumeChildPlanAndWriteDataToPythonWorker(): Unit = {
    // Tracks whether the connection to the Python worker can be written to.
    var socketAcceptsInput = true
    while (socketAcceptsInput && (childHasNext || buffer.hasRemaining)) {
      if (!buffer.hasRemaining && childHasNext) {
        // Consume data from the child and buffer it.
        writeToBuffer(childPlan.next(), buffer)
        childHasNext = childPlan.hasNext()
        if (!childHasNext) {
          // Exhausted the child plan's output. Write a keyword to the Python worker
          // signaling the end of data input.
          writeToBuffer(endOfStream)
        }
      }
      // Try to write as much buffered data as possible to the Python worker.
      while (buffer.hasRemaining && socketAcceptsInput) {
        val n = writeToPythonWorker(buffer)
        // `writeToPythonWorker()` returns 0 when the socket cannot accept more data right now.
        socketAcceptsInput = n > 0
      }
    }
  }
}
```

### Why are the changes needed?
This PR makes PythonRunner single-threaded, making it easier to reason about and improving code health.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing tests.

Closes apache#42385 from utkarsh39/SPARK-44705.

Authored-by: Utkarsh <utkarsh.agarwal@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
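As a rough, self-contained illustration of the non-blocking alternation the pseudocode above relies on (the class and method names here are hypothetical, not Spark code), the Java sketch below drives both ends of a loopback `SocketChannel` pair from a single thread. In non-blocking mode, `write()` returns 0 instead of parking the thread when the peer cannot accept more data, so the one thread can switch to draining reads and never deadlocks:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;

public class SingleThreadedPipe {

  // Round-trips a small payload through a loopback "worker" without a writer thread.
  static String roundTrip() throws IOException {
    ServerSocketChannel server = ServerSocketChannel.open();
    server.bind(new InetSocketAddress("127.0.0.1", 0));
    SocketChannel taskSide = SocketChannel.open(server.getLocalAddress());
    SocketChannel workerSide = server.accept();
    // Non-blocking mode: read()/write() return immediately instead of blocking.
    taskSide.configureBlocking(false);
    workerSide.configureBlocking(false);

    ByteBuffer out = StandardCharsets.UTF_8.encode("row-1|row-2|row-3");
    int expected = out.remaining();
    ByteBuffer echo = ByteBuffer.allocate(64);
    ByteBuffer in = ByteBuffer.allocate(64);

    // One loop alternates between feeding input and draining output, the same
    // shape as the PR's read() / consumeChildPlanAndWriteDataToPythonWorker() pair.
    while (in.position() < expected) {
      if (out.hasRemaining()) {
        taskSide.write(out);            // returns 0, does not block, if the socket is full
      }
      echo.clear();
      if (workerSide.read(echo) > 0) {  // the stand-in worker consumes the input...
        echo.flip();
        workerSide.write(echo);         // ...and echoes it back as "UDF output"
      }
      taskSide.read(in);                // the task drains the worker's output, non-blocking
    }
    taskSide.close();
    workerSide.close();
    server.close();
    in.flip();
    return StandardCharsets.UTF_8.decode(in).toString();
  }

  public static void main(String[] args) throws IOException {
    System.out.println(roundTrip());
  }
}
```

This sketch busy-polls for brevity; real code (including Spark's) would multiplex on readiness rather than spin, and the actual PythonRunner implementation differs in its framing and error handling.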
This PR caused the Scala 2.13 MiMa check to fail: #42479
I found that this PR may have caused some PySpark test cases to fail in the Java 17 daily tests (pyspark-sql and pyspark-connect modules):
To verify this, I conducted some local testing using Java 17.
The tests in
There are 34 test failures after this one was merged. @utkarsh39 Do you have time to fix these test cases? For this, I have created SPARK-44797. Or should we revert this PR to restore the Java 17 daily tests first? @HyukjinKwon @ueshin @dongjoon-hyun
I will try to get these tests fixed ASAP.
I think #42422 includes the fix. Could you take a look?
I merged #42422. Let's see the next daily tests. Thanks.
Thanks ~
```scala
@DeveloperApi
@deprecated("Only usage for Python evaluation is now extinct", "3.5.0")
```
@utkarsh39 This should be `4.0.0`.
PR to fix it: #42494
https://github.com/apache/spark/actions/runs/5861115482/job/15890643041 Some tests still failed; I can't determine the reason for now, further investigation is needed.
@LuciferYang The error is different from the previous one, which seems to be fixed.
I re-ran the failed ones; there are 26 left.
The tests passed after retrying, thanks for your work ~ @ueshin
### What changes were proposed in this pull request?
#42385 deprecated `ContextAwareIterator`, but the deprecation version was incorrectly set to 3.5. This PR fixes it to be 4.0.

### Why are the changes needed?
Fix the deprecation version.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Not needed.

Closes #42494 from utkarsh39/SPARK-44705-fix-deprecation-version.

Authored-by: Utkarsh <utkarsh.agarwal@databricks.com>
Signed-off-by: Takuya UESHIN <ueshin@databricks.com>
```scala
val workerFactory =
  new PythonWorkerFactory(pythonExec, workerModule, envVars.asScala.toMap)
val (worker: PythonWorker, _) = workerFactory.createSimpleWorker(blockingMode = true)
```
What is this change about?
It broke the `stop()` method below.
cc: @WweiL, @HyukjinKwon
Yes, it breaks the `stop()` method below. It should be updated like this:
https://github.com/apache/spark/blob/branch-3.5/core/src/main/scala/org/apache/spark/api/python/StreamingPythonRunner.scala#L109-L113
@utkarsh39 We will create a follow-up ticket to fix this.
Thanks @WweiL
Thanks guys.