Using the Hadoop HDFS Command Line

Introduction

The Hadoop File System (HDFS) includes an easy to use command line interface which you can use to manage the distributed file system.You will need access to a working Hadoop cluster in you want to follow along. If you don’t have direct access to a cluster you can create a local Hortonworks or Cloudera single-node cluster, as specified in my earlier post: getting started with Enterprise  Hadoop

Logging in to the cluster

Since we will be using the command line interface here, we need to be able to log into our cluster. In my case I am using the Hortonworks Sandbox VM, running in VirtualBox. On my local machine, I created the hostname hw_sandbox by adding the following entry to my C:\Windows\System32\drivers\etc\hosts file:

The generic syntax for using ssh is as follows:

This is what I use to log into my Hortonworks sandbox:

Note that, unlike me, you should not be using the root user  name if you are on a real cluster! Winking smile

Generic Command Structure

When you look for the right command line keywords you us, you might get different answers. Three different command lines as supported. They appear to be very similar but they have minute differences:

  1. hadoop fs <arguments>
  2. hadoop dfs <arguments>
  3. hdfs dfs <arguments>

Let’s take a look at the command structure for each one. First the hadoop fs command line:


FS relates to any generic file system which could be local, HDFS, DFTP, S3 FS etc.. Most of the Hadoop distributions are configured to default to HDFS  if you specify hadoop fs. You can explicitly specify HDFS in the hadoop command line, like so:


This explicitly specifies HDFS, and would work for any HDFS operations.However, this command has been deprecated. You should use the hdfs dfs syntax instead, as is shown below:


This syntax works for all operations against HDFS in it is the recommend command to use instead of hadoop fs or hadoop dfs. Indeed, if you use the hadoop dfs syntax, it will locate hdfs and delegate the command to hdfs dfs.

Creating a directory in HDFS

The syntax for creating a directory is shown below:

You can specify a path using two different syntax notations, as is show below:

  • You can use a standard local path (for example /user/bennie)
  • You can use the hdfs:// URL syntax (for example hdfs://sandbox.hortonworks.com/user/bennie).

Note: depending on the configuration of your cluster, you might have to specify a port in addition to the host name.

Below is an example of these commands running in the Hortonworks sandbox bash shell:

Uploading a file and Listing Contents

Let’s create a local file, we’ll use echo in this case:


You have a variety of commands available to upload a file to HDFS, I am using the put command here:


Next, we can list the contents of our directory with the ls command. Again, we can use the standard file system notation,  or we can use the hdfs:// notation, as is shown below:


Below is an example of these commands running in the shell:

Downloading a File from HDFS

We can use the get command to download a file from HDFS to our local file system:


Below is an example of these command running in the shell:

1 Comment

Leave a Reply

Your email address will not be published. Required fields are marked *