hdinsight.github.io

How do I configure a Spark application through Livy on HDInsight clusters?

Issue:

You need to configure, at submit time through Livy, the amount of memory and the number of cores that a Spark application can use on an HDInsight cluster.

Resolution Steps:

  1. Refer to the topic "Why did my Spark application fail with OutOfMemoryError?" to determine which Spark configurations need to be set and to what values.

  2. Submit the Spark application to Livy using a REST client such as curl, with a command similar to the following (replace the placeholder values with your own):

curl -k --user 'username:password' -v -H 'Content-Type: application/json' -X POST -d '{ "file":"wasb://container@storageaccountname.blob.core.windows.net/example/jars/sparkapplication.jar", "className":"com.microsoft.spark.application", "numExecutors":4, "executorMemory":"4g", "executorCores":2, "driverMemory":"8g", "driverCores":4}' 'https://clustername.azurehdinsight.net/livy/batches'
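A long single-line JSON payload is easy to misquote, so the following is a minimal sketch of the same submission with the resource settings passed as arguments. It rests on several assumptions: the cluster URL, the credentials, and the function names build_payload and submit_batch are hypothetical placeholders. The "conf" entry only illustrates how Livy's batch API accepts additional Spark properties; the property name and value shown may not suit your Spark version. The X-Requested-By header is included because Livy deployments with CSRF protection enabled expect it on POST requests; drop it if your cluster does not need it.

```shell
# Hypothetical placeholders: replace with your own cluster and credentials.
CLUSTER_URL="https://clustername.azurehdinsight.net"
LIVY_USER="username:password"

# Print a Livy batch payload.
# Arguments: numExecutors executorMemory executorCores driverMemory driverCores
build_payload() {
  cat <<EOF
{
  "file": "wasb://container@storageaccountname.blob.core.windows.net/example/jars/sparkapplication.jar",
  "className": "com.microsoft.spark.application",
  "numExecutors": $1,
  "executorMemory": "$2",
  "executorCores": $3,
  "driverMemory": "$4",
  "driverCores": $5,
  "conf": { "spark.yarn.executor.memoryOverhead": "512" }
}
EOF
}

# POST the payload to the Livy batches endpoint.
# Defined here but not called, so nothing is sent until you invoke it.
submit_batch() {
  build_payload "$@" | curl -k --user "$LIVY_USER" \
    -H 'Content-Type: application/json' \
    -H 'X-Requested-By: admin' \
    -X POST -d @- "$CLUSTER_URL/livy/batches"
}

# Preview the payload locally without contacting the cluster.
build_payload 4 4g 2 8g 4
```

Once the previewed payload looks right, running submit_batch 4 4g 2 8g 4 would send it. The five fields correspond to the spark-submit options --num-executors, --executor-memory, --executor-cores, --driver-memory, and --driver-cores.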

Further Reading:

Spark job submission on HDInsight clusters