hdinsight.github.io

Unable to launch a Spark application as part of an Oozie workflow using the Oozie Shell Action.

Scenario:

Spark applications that are part of an Oozie workflow and are launched using the Oozie Shell Action fail with one of the following exceptions on Spark 2.1+ clusters without Spark 1.6. The same Spark application completes successfully when launched directly with spark-submit.

Exceptions: One of these exceptions will be shown.

Analysis of the logs shows that Oozie was trying to launch the Spark 1.6 jars instead of the Spark 2.1 jars. The following line is from the YARN logs under the “directory.info” section:

lrwxrwxrwx 1 yarn hadoop  118 Sep 13 15:36 __spark__.jar -> //mnt/resource/hadoop/yarn/local/usercache/yarn/filecache/12/spark-assembly-1.6.3.2.6.1.10-4-hadoop2.7.3.2.6.1.10-4.jar

Oozie triggers shell actions on any one of the worker nodes, and it appears that when Oozie ran “spark-submit” there, it linked the Spark 1.6.3 jar. Spark applications submitted using the Oozie Shell Action do not honor the SPARK_HOME=/usr/hdp/current/spark2-client and SPARK_MAJOR_VERSION=2 environment variables.

To keep the workflow self-contained, include the full path to spark-submit and specify the Spark version to use:

<exec>/usr/hdp/current/spark2-client/bin/spark-submit</exec>

Instead of

<exec>$SPARK_HOME/bin/spark-submit</exec>

And

<env-var>SPARK_MAJOR_VERSION=2</env-var>
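
For context, the two settings above might sit inside a complete shell action roughly as follows. This is a minimal sketch, not a tested workflow: the workflow and action names, the schema versions, the ${jobTracker} and ${nameNode} parameters, and the application class and jar (com.example.MyApp, my-app.jar) are all hypothetical placeholders to be replaced with the values your cluster and application use.

```xml
<!-- Sketch only: names, schema versions, and the application jar/class are assumptions. -->
<workflow-app xmlns="uri:oozie:workflow:0.4" name="spark-shell-wf">
    <start to="spark-shell-action"/>

    <action name="spark-shell-action">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>

            <!-- Full path to the Spark 2 client instead of $SPARK_HOME/bin/spark-submit -->
            <exec>/usr/hdp/current/spark2-client/bin/spark-submit</exec>

            <!-- Hypothetical application arguments; one <argument> per token -->
            <argument>--master</argument>
            <argument>yarn</argument>
            <argument>--class</argument>
            <argument>com.example.MyApp</argument>
            <argument>my-app.jar</argument>

            <!-- Explicitly select Spark 2 for the launched process -->
            <env-var>SPARK_MAJOR_VERSION=2</env-var>

            <!-- Ship the hypothetical application jar with the action -->
            <file>my-app.jar</file>
        </shell>
        <ok to="end"/>
        <error to="fail"/>
    </action>

    <kill name="fail">
        <message>Shell action failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

Because the executable path and the Spark version are both set in the action itself, the workflow should not depend on whatever environment the chosen worker node happens to provide.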