The SparkConfActor updates Spark configurations at runtime. If a change is applied to the global SparkSession, it alters the behavior of all actors (across all jobs and actions) in the pipeline. A good practice is to split a data-processing pipeline into multiple work units, each defined as a job in the pipeline, and to turn off singleSparkSession at the pipeline level; that way, any Spark-conf change made in one job only affects the actors associated with that job.
- the configs property specifies key-value pairs for the Spark configurations.
- the hadoopConfigs property specifies key-value pairs for the Hadoop configuration of the Spark context.
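For context, the following is a minimal sketch of where these two property maps land at runtime. It is not the actual SparkConfActor source (the class and method names here are illustrative); it assumes only the standard Spark APIs:

```scala
import org.apache.spark.sql.SparkSession

// Illustrative sketch only -- not the actual SparkConfActor implementation.
// Entries under `configs` become runtime settings on the SparkSession;
// entries under `hadoopConfigs` go to the SparkContext's Hadoop configuration.
class SparkConfSketch(configs: Map[String, String], hadoopConfigs: Map[String, String]) {
  def run(session: SparkSession): Unit = {
    configs.foreach { case (k, v) => session.conf.set(k, v) }
    hadoopConfigs.foreach { case (k, v) =>
      session.sparkContext.hadoopConfiguration.set(k, v)
    }
  }
}
```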
Actor Class: com.qwshen.etl.common.SparkConfActor
The definition of the SparkConfActor:
- in YAML
```yaml
actor:
  type: com.qwshen.etl.common.SparkConfActor
  properties:
    configs:
      spark.sql.shuffle.partitions: 300
      spark.files.overwrite: true
      spark.sql.adaptive.enabled: true
    hadoopConfigs:
      mapreduce.fileoutputcommitter.marksuccessfuljobs: false
```
- in JSON
```json
{
  "actor": {
    "type": "com.qwshen.etl.common.SparkConfActor",
    "properties": {
      "configs": {
        "spark.sql.shuffle.partitions": 300,
        "spark.files.overwrite": true,
        "spark.sql.adaptive.enabled": true
      },
      "hadoopConfigs": {
        "mapreduce.fileoutputcommitter.marksuccessfuljobs": false
      }
    }
  }
}
```
- in XML
```xml
<actor type="com.qwshen.etl.common.SparkConfActor">
  <properties>
    <configs>
      <spark.sql.shuffle.partitions>300</spark.sql.shuffle.partitions>
      <spark.files.overwrite>true</spark.files.overwrite>
      <spark.sql.adaptive.enabled>true</spark.sql.adaptive.enabled>
    </configs>
    <hadoopConfigs>
      <mapreduce.fileoutputcommitter.marksuccessfuljobs>false</mapreduce.fileoutputcommitter.marksuccessfuljobs>
    </hadoopConfigs>
  </properties>
</actor>
```
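Once the actor has run, the effect can be verified by reading the values back: runtime Spark settings from the session conf, Hadoop settings from the SparkContext's Hadoop configuration. A minimal check under assumed conditions (a local session set up by hand, with only two of the keys above shown):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("conf-check").getOrCreate()

// Apply two of the settings from the definition above, the way the actor would.
spark.conf.set("spark.sql.shuffle.partitions", "300")
spark.sparkContext.hadoopConfiguration
  .set("mapreduce.fileoutputcommitter.marksuccessfuljobs", "false")

// Read them back from their respective scopes.
assert(spark.conf.get("spark.sql.shuffle.partitions") == "300")
assert(!spark.sparkContext.hadoopConfiguration
  .getBoolean("mapreduce.fileoutputcommitter.marksuccessfuljobs", true))
```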