
How do I configure Spark application through spark-submit on HDInsight clusters?

Issue:

You need to specify, at submission time through spark-submit, the amount of memory and the number of cores that a Spark application can use on HDInsight clusters.

Resolution Steps:

  1. Refer to the topic Why did my Spark application fail with OutOfMemoryError? to determine which Spark configurations need to be set, and to what values.

  2. Launch spark-submit with a command similar to the following (adjust the configuration values as appropriate for your application):

spark-submit --master yarn-cluster \
  --class com.microsoft.spark.application \
  --num-executors 4 \
  --executor-memory 4g \
  --executor-cores 2 \
  --driver-memory 8g \
  --driver-cores 4 \
  /home/user/spark/sparkapplication.jar
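
The same resource settings can also be passed as generic configuration properties with the --conf flag, which additionally covers settings that have no dedicated command-line option. A minimal sketch, reusing the same application class and JAR path as above (the memory overhead value of 1024 MB is illustrative, not a recommendation; on Spark 2.3 and later the property is named spark.executor.memoryOverhead instead of spark.yarn.executor.memoryOverhead):

# Equivalent resource settings expressed as --conf key=value pairs,
# plus an illustrative executor memory overhead setting (in MB)
spark-submit --master yarn-cluster \
  --class com.microsoft.spark.application \
  --conf spark.executor.instances=4 \
  --conf spark.executor.memory=4g \
  --conf spark.executor.cores=2 \
  --conf spark.driver.memory=8g \
  --conf spark.driver.cores=4 \
  --conf spark.yarn.executor.memoryOverhead=1024 \
  /home/user/spark/sparkapplication.jar

Note that --master yarn-cluster is deprecated as of Spark 2.0; on clusters running newer Spark versions, the equivalent is --master yarn --deploy-mode cluster.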

Further Reading:

Spark job submission on HDInsight clusters