
How do I access local HDFS from inside the HDInsight cluster?

Issue:

You need to access local HDFS instead of WASB or ADLS storage from inside the HDInsight cluster.

Resolution Steps:

  1. From the command line, use hdfs dfs -D "fs.default.name=hdfs://mycluster/" followed by the rest of your command, as in the following example:
hdiuser@hn0-spark2:~$ hdfs dfs -D "fs.default.name=hdfs://mycluster/" -ls /
Found 3 items
drwxr-xr-x   - hdiuser hdfs          0 2017-03-24 14:12 /EventCheckpoint-30-8-24-11102016-01
drwx-wx-wx   - hive    hdfs          0 2016-11-10 18:42 /tmp
drwx------   - hdiuser hdfs          0 2016-11-10 22:22 /user
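
Note: fs.default.name is the deprecated name for this configuration key; current Hadoop versions use fs.defaultFS, which works identically:

hdiuser@hn0-spark2:~$ hdfs dfs -D "fs.defaultFS=hdfs://mycluster/" -ls /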
  2. From source code, use the URI hdfs://mycluster/ verbatim as the default file system, as in the following sample application:
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;

public class JavaUnitTests {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Point the default file system at local HDFS instead of WASB or ADLS.
        String hdfsUri = "hdfs://mycluster/";
        conf.set("fs.defaultFS", hdfsUri);

        FileSystem fileSystem = FileSystem.get(URI.create(hdfsUri), conf);

        // Recursively list all files under /tmp on local HDFS.
        RemoteIterator<LocatedFileStatus> fileStatusIterator = fileSystem.listFiles(new Path("/tmp"), true);
        while (fileStatusIterator.hasNext()) {
            System.out.println(fileStatusIterator.next().getPath().toString());
        }
    }
}

Compile the application, then run the resulting JAR (for example, java-unit-tests-1.0.jar) on the HDInsight cluster with the following command:

hdiuser@hn0-spark2:~$ hadoop jar java-unit-tests-1.0.jar JavaUnitTests
hdfs://mycluster/tmp/hive/hive/5d9cf301-2503-48c7-9963-923fb5ef79a7/inuse.info
hdfs://mycluster/tmp/hive/hive/5d9cf301-2503-48c7-9963-923fb5ef79a7/inuse.lck
hdfs://mycluster/tmp/hive/hive/a0be04ea-ae01-4cc4-b56d-f263baf2e314/inuse.info
hdfs://mycluster/tmp/hive/hive/a0be04ea-ae01-4cc4-b56d-f263baf2e314/inuse.lck
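
The same hdfs://mycluster/ URI works for writing to local HDFS. The following is a minimal sketch (the class name HdfsWriteSample and the path /tmp/hello.txt are illustrative, not part of the original sample) that creates a small file using the same configuration pattern:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;

public class HdfsWriteSample {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Same pattern as above: point the default file system at local HDFS.
        String hdfsUri = "hdfs://mycluster/";
        conf.set("fs.defaultFS", hdfsUri);

        FileSystem fileSystem = FileSystem.get(URI.create(hdfsUri), conf);

        // Create (or overwrite) an illustrative file on local HDFS and write one line to it.
        Path path = new Path("/tmp/hello.txt");
        try (FSDataOutputStream out = fileSystem.create(path, true)) {
            out.writeBytes("hello from local HDFS\n");
        }
    }
}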