The SparkConfActor updates Spark configurations at runtime. If a change is applied on the global Spark-Session, it changes the behavior of all actors (from all jobs & actions) in the pipeline. A good practice is to break a data-process pipeline into multiple work units, each defined as a job in the pipeline (and to turn off singleSparkSession at the pipeline level); this way, any change to the Spark-Conf in one job only affects the behavior of the actors associated with that job.
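
As a rough illustration of the isolation idea (assuming separate Spark sessions per job; how the framework actually achieves this when singleSparkSession is off is not shown here), the Scala sketch below uses plain Spark APIs: `SparkSession.newSession()` shares the underlying SparkContext but keeps its own runtime SQL configuration, so a conf change in one session does not leak into another.

```scala
import org.apache.spark.sql.SparkSession

object ConfIsolationSketch {
  def main(args: Array[String]): Unit = {
    val global = SparkSession.builder()
      .master("local[*]")
      .appName("conf-isolation-sketch")
      .getOrCreate()

    // A new session shares the SparkContext but has an isolated runtime SQL conf.
    val jobSession = global.newSession()
    jobSession.conf.set("spark.sql.shuffle.partitions", "300")

    // The change is visible only in the job's session, not in the global one.
    println(jobSession.conf.get("spark.sql.shuffle.partitions")) // 300
    println(global.conf.get("spark.sql.shuffle.partitions"))     // 200 (Spark's default)

    // Note: Hadoop settings live on the shared SparkContext, so they are
    // not isolated per session the way runtime SQL confs are.
    global.stop()
  }
}
```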

- the `configs` property specifies key-value pairs for the Spark configurations
- the `hadoopConfigs` property specifies key-value pairs for the Hadoop configuration of the Spark-Context (see the sketch after this list)
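
In plain Spark terms, applying these two property groups roughly amounts to the following. This is a minimal sketch for illustration, not the actor's actual code; the keys and values are taken from the example definitions below.

```scala
import org.apache.spark.sql.SparkSession

// Sketch: apply "configs" to the session's runtime conf, and "hadoopConfigs"
// to the Hadoop configuration of the underlying SparkContext.
def applySparkConf(spark: SparkSession,
                   configs: Map[String, String],
                   hadoopConfigs: Map[String, String]): Unit = {
  // Runtime Spark settings, e.g. spark.sql.shuffle.partitions = 300
  configs.foreach { case (k, v) => spark.conf.set(k, v) }
  // Hadoop settings, e.g. mapreduce.fileoutputcommitter.marksuccessfuljobs = false
  hadoopConfigs.foreach { case (k, v) =>
    spark.sparkContext.hadoopConfiguration.set(k, v)
  }
}

// Example call with the values from the definitions below:
// applySparkConf(spark,
//   Map("spark.sql.shuffle.partitions" -> "300"),
//   Map("mapreduce.fileoutputcommitter.marksuccessfuljobs" -> "false"))
```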

Actor Class: `com.qwshen.etl.common.SparkConfActor`

The definition of the SparkConfActor:

- in YAML
  ```yaml
  actor:
    type: com.qwshen.etl.common.SparkConfActor
    properties:
      configs:
        spark.sql.shuffle.partitions: 300
        spark.files.overwrite: true
        spark.sql.adaptive.enabled: true
      hadoopConfigs:
        mapreduce.fileoutputcommitter.marksuccessfuljobs: false
  ```
- in JSON
  ```json
  {
    "actor": {
      "type": "com.qwshen.etl.common.SparkConfActor",
      "properties": {
        "configs": {
          "spark.sql.shuffle.partitions": 300,
          "spark.files.overwrite": true,
          "spark.sql.adaptive.enabled": true
        },
        "hadoopConfigs": {
          "mapreduce.fileoutputcommitter.marksuccessfuljobs": false
        }
      }
    }
  }
  ```
- in XML
  ```xml
  <actor type="com.qwshen.etl.common.SparkConfActor">
    <properties>
      <configs>
        <spark.sql.shuffle.partitions>300</spark.sql.shuffle.partitions>
        <spark.files.overwrite>true</spark.files.overwrite>
        <spark.sql.adaptive.enabled>true</spark.sql.adaptive.enabled>
      </configs>
      <hadoopConfigs>
        <mapreduce.fileoutputcommitter.marksuccessfuljobs>false</mapreduce.fileoutputcommitter.marksuccessfuljobs>
      </hadoopConfigs>
    </properties>
  </actor>
  ```