This repository has been archived by the owner on Dec 20, 2022. It is now read-only.

Configuration Properties

Peter Rudenko edited this page Oct 3, 2018 · 14 revisions

SparkRDMA has several runtime properties that can be set alongside other Spark properties:
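As a minimal sketch, assuming the properties are passed on the spark-submit command line through `--conf key=value` (they can equally go in spark-defaults.conf or be set with `SparkConf.set` in application code), the helper below turns a dict of SparkRDMA properties into those arguments. The helper name `to_spark_submit_args` and the example values are illustrative only, not recommendations.

```python
# Hypothetical helper: build "--conf key=value" arguments for spark-submit
# from a dict of SparkRDMA properties. Values below are examples only.

def to_spark_submit_args(props: dict) -> list:
    """Return a flat list like ['--conf', 'k=v', '--conf', 'k2=v2']."""
    args = []
    for key, value in props.items():
        args.extend(["--conf", f"{key}={value}"])
    return args


rdma_props = {
    "spark.shuffle.rdma.cpuList": "1-4,10-12",
    "spark.shuffle.rdma.shuffleReadBlockSize": "256k",
    "spark.shuffle.rdma.maxBytesInFlight": "64m",
}

if __name__ == "__main__":
    # Print the resulting command line (application jar/class omitted).
    print(" ".join(["spark-submit"] + to_spark_submit_args(rdma_props)))
```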

General Properties

Property Name Default/Min/Max Description
spark.shuffle.rdma.driverPort random/1025/65535 Port the RDMA driver instance will listen on.
spark.shuffle.rdma.executorPort random/1025/65535 Port the RDMA executor instances will listen on.
spark.shuffle.rdma.portMaxRetries 16/1/1000 Maximum number of attempts to bind to an RDMA port before failing. Each retry will increment the previously attempted port number by 1. This value applies to both the RDMA driverPort and RDMA executorPort.
spark.shuffle.rdma.cpuList All CPUs/--/-- The list of CPUs that the RDMA services should use for thread creation and event processing. It is recommended to use only the CPU cores of the NUMA node that the Mellanox NIC is attached to. Specify the value as a comma-separated list; hyphenated ranges are also accepted. Invalid syntax causes the default value to be used. Examples: 1,3,5 or 1-5 or 1-4,10-12
spark.shuffle.rdma.useOdp false Enables On-Demand-Paging (ODP), a technique that eases memory registration. Applications do not need to pin down the underlying physical pages of the address space or track the validity of the mappings. Instead, the HCA (Host Channel Adapter) requests the latest translations from the OS when pages are not present, and the OS invalidates translations that are no longer valid due to either non-present pages or mapping changes.
spark.shuffle.rdma.collectOdpStats true Whether to collect and report ODP statistics.
spark.shuffle.rdma.device.num 0 The device number whose ODP statistics are read from sysfs (/sys/class/infiniband_verbs/uverbs$DEVICE_NUMBER/). Used only if spark.shuffle.rdma.useOdp=true and spark.shuffle.rdma.collectOdpStats=true.
spark.shuffle.rdma.preAllocateBuffers Buffers to pre-allocate, given as a comma-separated list of buffer size:buffer count pairs, e.g. 4k:1000,16k:500.
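The "random" default and the retry rule of spark.shuffle.rdma.portMaxRetries can be pictured with a small sketch. It assumes portMaxRetries counts the total number of bind attempts (as the description says) and simply stops at 65535; what SparkRDMA actually does at the top of the port range is not documented here. `candidate_ports` is a hypothetical helper, not part of SparkRDMA.

```python
import random


def candidate_ports(max_retries=16, start=None, lo=1025, hi=65535):
    """Ports tried in order: start, start+1, ... (max_retries attempts in total).

    When start is None, a random port in [lo, hi] is chosen, mirroring the
    'random' default. Ports above hi are dropped (an assumption of this sketch).
    """
    if start is None:
        start = random.randint(lo, hi)
    return [p for p in range(start, start + max_retries) if p <= hi]


if __name__ == "__main__":
    print(candidate_ports(3, start=40000))  # [40000, 40001, 40002]
```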
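The cpuList syntax (single CPUs, hyphenated ranges, comma-separated, falling back to all CPUs on invalid input) can be illustrated with the parser below. It is a sketch of the documented rules, not SparkRDMA's own parser; `parse_cpu_list` and its `all_cpus` parameter are hypothetical, and treating a reversed range such as "5-1" as invalid is an assumption.

```python
import os


def parse_cpu_list(spec, all_cpus=None):
    """Parse a cpuList value such as "1,3,5", "1-5" or "1-4,10-12".

    Invalid syntax falls back to the default, i.e. every CPU on the host
    (or the explicit all_cpus list, which makes the function testable).
    """
    if all_cpus is None:
        all_cpus = list(range(os.cpu_count() or 1))
    cpus = []
    try:
        for part in spec.split(","):
            part = part.strip()
            if "-" in part:
                # A range "lo-hi"; malformed pieces raise ValueError.
                lo, hi = (int(x) for x in part.split("-"))
                if lo > hi:
                    raise ValueError(part)  # reversed range treated as invalid
                cpus.extend(range(lo, hi + 1))
            else:
                cpus.append(int(part))
    except ValueError:
        return list(all_cpus)
    return sorted(set(cpus))


if __name__ == "__main__":
    print(parse_cpu_list("1-4,10-12"))  # [1, 2, 3, 4, 10, 11, 12]
```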
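To estimate how much memory a preAllocateBuffers value reserves, the pairs can be parsed and multiplied out. This sketch assumes Spark-style binary size suffixes (1k = 1024 bytes) and, if a size appears twice, adds the counts; both are assumptions, and `parse_size`, `parse_prealloc` and `total_bytes` are hypothetical helpers.

```python
import re

_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_size(text):
    """Parse a size such as "4k", "16K" or "8mb" into bytes (binary units)."""
    m = re.fullmatch(r"\s*(\d+)\s*([kmg]?)b?\s*", text.lower())
    if not m:
        raise ValueError(f"bad size: {text!r}")
    return int(m.group(1)) * _UNITS[m.group(2)]


def parse_prealloc(spec):
    """Parse "4k:1000,16k:500" into {4096: 1000, 16384: 500}."""
    result = {}
    for pair in spec.split(","):
        size_text, count_text = pair.split(":")
        size = parse_size(size_text)
        # Repeated sizes are summed (an assumption of this sketch).
        result[size] = result.get(size, 0) + int(count_text)
    return result


def total_bytes(buffers):
    """Total memory reserved by the parsed buffer pools."""
    return sum(size * count for size, count in buffers.items())


if __name__ == "__main__":
    pools = parse_prealloc("4k:1000,16k:500")
    print(pools, total_bytes(pools))  # {4096: 1000, 16384: 500} 12288000
```

For the example value from the table this comes to 12,288,000 bytes (about 11.7 MiB).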

RDMA Queue Pair (QP) Properties

Property Name Default/Min/Max Description
spark.shuffle.rdma.recvQueueDepth 1024/256/65535 The maximum number of outstanding receive work requests that can be posted to the QP.
spark.shuffle.rdma.sendQueueDepth 4096/256/65535 The maximum number of outstanding send work requests that can be posted to the QP.
spark.shuffle.rdma.recvWrSize 4k/2k/1m The size (in bytes) of the buffers used to receive data from a SEND operation.
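A rough sizing check for the QP properties, assuming every posted receive work request owns one buffer of recvWrSize bytes and each QP posts its own receive buffers (whether SparkRDMA shares receive buffers across QPs is not stated on this page). The range checks use the Min/Max values from the table; the function names are hypothetical.

```python
KIB = 1024
MIB = 1024 * KIB


def check_range(name, value, lo, hi):
    """Reject values outside the documented Min/Max bounds."""
    if not lo <= value <= hi:
        raise ValueError(f"{name}={value} outside [{lo}, {hi}]")
    return value


def recv_buffer_bytes_per_qp(recv_queue_depth=1024, recv_wr_size=4 * KIB):
    """Worst-case receive-buffer memory per QP under the stated assumption."""
    check_range("recvQueueDepth", recv_queue_depth, 256, 65535)
    check_range("recvWrSize", recv_wr_size, 2 * KIB, 1 * MIB)
    return recv_queue_depth * recv_wr_size


if __name__ == "__main__":
    print(recv_buffer_bytes_per_qp() // MIB, "MiB per QP")  # 4 MiB per QP
```

Under this assumption the defaults (1024 x 4 KiB) need 4 MiB per QP, while raising recvWrSize to its 1m maximum at the default depth would need 1 GiB per QP.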

RDMA Connection Management Properties

Property Name Default/Min/Max Description
spark.shuffle.rdma.rdmaCmEventTimeout 20000/-1/60000 The amount of time to wait (in milliseconds) for RDMA CM events before failing. A value of -1 means to wait forever.
spark.shuffle.rdma.teardownListenTimeout 50/-1/60000 The amount of time to wait (in milliseconds) for RDMA disconnect events before failing. A value of -1 means to wait forever.
spark.shuffle.rdma.resolvePathTimeout 2000/-1/60000 The amount of time to wait (in milliseconds) for RDMA resolve address and resolve route events before failing. A value of -1 means to wait forever.
spark.shuffle.rdma.maxConnectionAttempts 5/1/100 Maximum number of attempts to set up a remote connection before failing a task.
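The timeout convention above (milliseconds, with -1 meaning wait forever) and the bounded connection retries can be modelled with Python's `threading.Event`. This is only an analogy: the event stands in for an RDMA CM event, and `timeout_seconds`, `wait_for_event` and `connect_with_retries` are hypothetical helpers, not SparkRDMA code.

```python
import threading


def timeout_seconds(ms):
    """Convert a millisecond timeout where -1 means 'wait forever'."""
    if ms == -1:
        return None  # Event.wait(None) blocks until the event is set
    if ms < 0:
        raise ValueError(f"invalid timeout: {ms}")
    return ms / 1000.0


def wait_for_event(event, timeout_ms):
    """Wait for an event and fail if it does not arrive in time."""
    if not event.wait(timeout_seconds(timeout_ms)):
        raise TimeoutError(f"no event within {timeout_ms} ms")


def connect_with_retries(connect, max_attempts=5):
    """Call connect() up to max_attempts times; re-raise the last error."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for _ in range(max_attempts):
        try:
            return connect()
        except (ConnectionError, TimeoutError) as err:
            last_error = err
    raise last_error
```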

Shuffle Writer (Mapper) Properties

Property Name Default/Min/Max Description
spark.shuffle.rdma.shuffleWriteBlockSize 8m/4k/512m The storage block size used by the shuffle writer. With the "ChunkedPartitionAgg" writer method, it is the size of each memory buffer used to store shuffle write data. In "Wrapper" mode, it is the size of each file mapping; e.g. with the default of 8m, a 120MB file is split into fifteen 8MB file mappings.
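The "Wrapper" mode arithmetic is a ceiling division: a file of size S split into mappings of size B needs ceil(S / B) mappings, the last one possibly shorter. The sketch below computes the (offset, length) of each mapping; `num_blocks` and `block_ranges` are hypothetical names used only for illustration.

```python
MIB = 1024 * 1024


def num_blocks(file_size, block_size=8 * MIB):
    """ceil(file_size / block_size) using integer arithmetic."""
    if block_size <= 0:
        raise ValueError("block size must be positive")
    return -(-file_size // block_size)


def block_ranges(file_size, block_size=8 * MIB):
    """(offset, length) of each mapping; the final one may be shorter."""
    return [(off, min(block_size, file_size - off))
            for off in range(0, file_size, block_size)]


if __name__ == "__main__":
    print(num_blocks(120 * MIB))  # 15
```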

Shuffle Reader (Reducer) Properties

Property Name Default/Min/Max Description
spark.shuffle.rdma.shuffleReadBlockSize 256k/0/512m The transfer size to be used for block fetches on shuffle read operations. The SparkRDMA layer will aggregate the blocks into a single buffer until it reaches this size. When set to "0", no aggregation will be performed on the reader side.
spark.shuffle.rdma.maxBytesInFlight 64m/128k/100g Maximum bytes that shuffle read operations will attempt to fetch at any given moment. If this threshold is reached, then fetches will resume only once outstanding requests complete.
spark.shuffle.rdma.partitionLocationFetchTimeout 30000/1000/MAX_INT The amount of time to wait (in milliseconds) for fetching shuffle metadata.
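How the two reader knobs interact can be sketched as follows: blocks are packed greedily into fetch groups of at most shuffleReadBlockSize bytes (0 disables aggregation), and groups are then issued while the bytes in flight stay within maxBytesInFlight. This is a simplified model, not SparkRDMA's implementation: it assumes an oversized block or group is still fetched on its own, and it treats each "wave" as completing entirely before the next starts, whereas real requests complete individually. All names are hypothetical.

```python
KIB = 1024
MIB = 1024 * KIB


def aggregate_blocks(block_sizes, read_block_size=256 * KIB):
    """Greedily pack consecutive block sizes into groups <= read_block_size.

    read_block_size == 0 disables aggregation (one block per group). A block
    larger than the limit gets a group of its own.
    """
    groups, current, current_bytes = [], [], 0
    for size in block_sizes:
        if read_block_size == 0:
            groups.append([size])
            continue
        if current and current_bytes + size > read_block_size:
            groups.append(current)
            current, current_bytes = [], 0
        current.append(size)
        current_bytes += size
    if current:
        groups.append(current)
    return groups


def schedule_fetches(groups, max_bytes_in_flight=64 * MIB):
    """Split groups into waves whose total size stays within the limit.

    A group that would exceed the limit waits for the current wave to finish;
    an oversized group is still issued when nothing else is in flight.
    """
    waves, wave, in_flight = [], [], 0
    for group in groups:
        size = sum(group)
        if wave and in_flight + size > max_bytes_in_flight:
            waves.append(wave)
            wave, in_flight = [], 0
        wave.append(group)
        in_flight += size
    if wave:
        waves.append(wave)
    return waves
```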