For example, suppose you want to create a job that requires a class available only in a specific jar file (mssql-jdbc-6.2.2.jre8.jar). This jar file isn't part of the default JDBC jar installed on the cluster.
The following three parameters are used to load an external jar file:
spark.driver.extraClassPath
spark.yarn.user.classpath.first
spark.executor.extraClassPath
The following curl command submits a Spark application through Livy. In addition to spark.driver.extraClassPath, it sets “spark.yarn.user.classpath.first”, which is required to force the user classpath to be used first.
[!Note] Jars can be loaded from storage accounts; they don't need to be on local disks. If you use multiple jar files, list all of them, separated by commas.
curl -k --user "admin:Pass@word1" -v -H 'Content-Type: application/json' -X POST -d '{ "file":"wasbs:///sparkhbase/sparkhbase.jar", "className":"HBaseTest", "jars":["wasbs:///sparkhbase/mssql-jdbc-6.2.2.jre8.jar"],"conf":{ "spark.driver.extraClassPath":"wasbs:///sparkhbase/mssql-jdbc-6.2.2.jre8.jar","spark.yarn.user.classpath.first":"true"} }' "https://{clustername}.azurehdinsight.net/livy/batches"
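As a minimal sketch, the same request body can also be assembled programmatically, which keeps the JSON well-formed as more jars are added to the list. The helper name `build_livy_batch_payload` is hypothetical; the field names (`file`, `className`, `jars`, `conf`) and all values are taken from the curl command above. The snippet only builds and prints the JSON body and doesn't contact the cluster.

```python
import json


def build_livy_batch_payload(app_jar, class_name, extra_jars, class_path):
    """Build the JSON body for a POST to the Livy /batches endpoint.

    Hypothetical helper: it mirrors the fields used in the curl command.
    """
    return {
        "file": app_jar,
        "className": class_name,
        # Additional jars are passed as a JSON array, one entry per jar.
        "jars": list(extra_jars),
        "conf": {
            "spark.driver.extraClassPath": class_path,
            "spark.yarn.user.classpath.first": "true",
        },
    }


payload = build_livy_batch_payload(
    app_jar="wasbs:///sparkhbase/sparkhbase.jar",
    class_name="HBaseTest",
    extra_jars=["wasbs:///sparkhbase/mssql-jdbc-6.2.2.jre8.jar"],
    class_path="wasbs:///sparkhbase/mssql-jdbc-6.2.2.jre8.jar",
)

# Serialize to the string that would be passed to curl with -d.
body = json.dumps(payload)
print(body)
```

To submit more than one jar, append further paths to `extra_jars`; each one becomes another element of the `jars` array.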
The following spark-submit command is equivalent to the curl command used previously:
spark-submit --class HBaseTest --conf spark.yarn.user.classpath.first=true --conf spark.yarn.submit.waitAppCompletion=false --conf spark.jars=wasbs:///sparkhbase/mssql-jdbc-6.2.2.jre8.jar --conf spark.master=yarn-cluster --conf spark.driver.extraClassPath=wasbs:///sparkhbase/mssql-jdbc-6.2.2.jre8.jar wasbs:///sparkhbase/sparkhbase.jar
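To make the structure of that command line explicit, the sketch below assembles the same spark-submit invocation from a dictionary of settings, where every entry becomes one `--conf key=value` pair. The helper name `build_spark_submit_command` is hypothetical; the class name, jar paths, and settings are the ones from the command above. It only prints the command and doesn't run it.

```python
import shlex


def build_spark_submit_command(class_name, app_jar, conf):
    """Return the spark-submit argument list for the given settings.

    Hypothetical helper: each conf entry becomes one `--conf key=value` pair.
    """
    args = ["spark-submit", "--class", class_name]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    args.append(app_jar)  # the application jar goes last
    return args


jdbc_jar = "wasbs:///sparkhbase/mssql-jdbc-6.2.2.jre8.jar"
command = build_spark_submit_command(
    class_name="HBaseTest",
    app_jar="wasbs:///sparkhbase/sparkhbase.jar",
    conf={
        "spark.yarn.user.classpath.first": "true",
        "spark.yarn.submit.waitAppCompletion": "false",
        "spark.jars": jdbc_jar,
        "spark.master": "yarn-cluster",
        "spark.driver.extraClassPath": jdbc_jar,
    },
)

# shlex.join (Python 3.8+) quotes arguments where a shell would need it.
print(shlex.join(command))
```

Keeping the settings in one dictionary means the same keys and values can be reused for the `conf` object of a Livy request.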
The same parameters can be forwarded from Azure Data Factory (ADF) by using typeProperties. If your application requires multiple jar files, you can place all of them under the jars folder when submitting from ADF.
"typeProperties": {
    "rootPath": "sparktestadhoc/JarRoot",
    "entryFilePath": "sparkhbase.jar",
    "className": "HBaseTest",
    "sparkConfig": {
        "spark.driver.extraClassPath": "wasbs:///JarRoot/Jars",
        "spark.yarn.user.classpath.first": "true"
    }
}
For more information, see Spark runtime environment configuration properties.