
How do I configure spark-shell on HDInsight clusters?

Issue: You need to configure how much memory and how many cores spark-shell can use.

Resolution Steps:

  1. Refer to the topic "Why did my Spark application fail with OutOfMemoryError?" to determine which Spark configurations need to be set and to what values.

  2. Launch spark-shell with a command similar to the following (adjust the values to suit your cluster and workload):

spark-shell --num-executors 4 --executor-memory 4g --executor-cores 2 --driver-memory 8g --driver-cores 4
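
The same settings can also be passed as Spark properties with --conf, which may be convenient when the values come from variables or a script. The following is a minimal sketch, assuming the shorthand flags above correspond to the standard properties spark.executor.instances, spark.executor.memory, spark.executor.cores, spark.driver.memory, and spark.driver.cores. The variable names and values are illustrative only and are not part of the original steps. The script builds and prints the command rather than running it, so it can be reviewed first.

```shell
# Illustrative values only; size them for your own cluster and workload.
NUM_EXECUTORS=4
EXECUTOR_MEMORY=4g
EXECUTOR_CORES=2
DRIVER_MEMORY=8g
DRIVER_CORES=4

# Build the launch command using --conf properties instead of the shorthand flags.
# Assumption: each property below is the counterpart of the matching flag.
SPARK_SHELL_CMD="spark-shell \
--conf spark.executor.instances=${NUM_EXECUTORS} \
--conf spark.executor.memory=${EXECUTOR_MEMORY} \
--conf spark.executor.cores=${EXECUTOR_CORES} \
--conf spark.driver.memory=${DRIVER_MEMORY} \
--conf spark.driver.cores=${DRIVER_CORES}"

# Print the command for review; run the printed command on the cluster to start the shell.
echo "${SPARK_SHELL_CMD}"
```

Once the shell is running, the effective values can be checked from the Scala prompt, for example with sc.getConf.get("spark.executor.memory").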

Further Reading:

Spark job submission on HDInsight clusters