-
Notifications
You must be signed in to change notification settings - Fork 29
Configuration
Ondřej Moravčík edited this page May 4, 2015
·
16 revisions
There are 3 way to configure ruby-spark and spark. Configuration can be changed before creating context. After that config is read-only.
SPARK_RUBY_SERIALIZER="oj" ruby-spark shell
Content of property file:
# This is just a comment
spark.ruby.serializer oj
spark.ruby.batch_size 4096
spark.ruby.executor.options -W1
# For shell
ruby-spark shell --properties-file conf.conf
# In ruby
Spark.config.from_file(FILE_PATH)
This muts be done before starting.
Spark.config do
set_app_name "RubySpark"
set_master "local[*]"
set 'spark.ruby.batch_size', 100
set 'spark.ruby.serializer', 'oj'
end
Spark.config.set('spark.ruby.batch_size', 100)
$sc.parallelize(1..10, 3, serializer: "oj")
Key | Default value | Description |
---|---|---|
spark.ruby.parallelize_strategy | inplace |
What happen with Array during parallelize method
inplace: an array is converted using choosen serializer deep_copy: and array is cloned first to prevent changin the original data |
spark.ruby.serializer | marshal |
Default serializer
marshal: ruby's default (slowest) oj: faster than marshal but doesn't work on jruby message_pack: fastest but cannot serialize large numbers and some objects |
spark.ruby.batch_size | 1024 | Number of items which will be serialized and send as one item |
spark.ruby.worker.type | process |
Type of workers.
process: new workers are created by fork function thread: worker is represented as thread |
spark.ruby.executor.uri | Where is located ruby-spark gem. Is field is empty - executor will use installed gem. | |
spark.ruby.executor.command | %s |
Command template for ruby script execution.
Can be useful if you are using some ruby version manager.
Rbenv: bash --norc -i -c "export HOME=/home/user; cd; source .bashrc; %s" Template must contain '%s' which represent origin ruby command. |
spark.ruby.executor.options | Ruby options for scripts. | |
spark.ruby.executor.env.[VARIABLE] | Environment variables for ruby scripts. |