Accessing the Hadoop filesystem (HDFS)

Hadoop has its own distributed file system, HDFS, which you access with the hadoop utility. The subcommand for working with HDFS is the file system user client, "fs".

Type "hadoop fs" on the command line to print the generic options and the commands it supports. Here are a few steps to upload a file to HDFS, check that it arrived, and download it again.

Type hadoop fs -ls to get a listing of your default directory on HDFS. It should be /user/<username>. Relative HDFS paths in the commands below are resolved against this directory.
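To make the default-directory rule concrete, here is a small sketch of how a relative HDFS path maps to an absolute one. The helper names hdfs_home and hdfs_abs are hypothetical (they are not part of Hadoop), and the sketch assumes your HDFS user name is the same as your local login name and that the cluster uses the usual /user prefix.

```shell
# Hypothetical helpers, not part of Hadoop.
# Assumption: the HDFS user name equals the local login name
# and home directories live under /user.
hdfs_home() {
    printf '/user/%s\n' "$(id -un)"
}

# Turn a relative HDFS path into an absolute one, the way the
# default directory is applied; absolute paths pass through unchanged.
hdfs_abs() {
    case "$1" in
        /*) printf '%s\n' "$1" ;;
        *)  printf '%s/%s\n' "$(hdfs_home)" "$1" ;;
    esac
}
```

Under those assumptions, hadoop fs -ls "$(hdfs_abs grep_input)" should list the same directory as hadoop fs -ls grep_input.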

Create an input directory in your default HDFS directory by using

hadoop fs -mkdir grep_input

Upload a file to the input directory. I selected the MapReduce HTML tutorial file that ships with the Hadoop distribution, but you can pick any file from your local file system.

hadoop fs -put /opt/hadoop/docs/mapred_tutorial.html grep_input

Check with:

hadoop fs -ls grep_input

You can download the file from HDFS onto your local system by using the get command. To download it into the current directory:

hadoop fs -get grep_input/mapred_tutorial.html .

Check with:

ls -lt
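The upload and download steps above can be combined into one check that the file survives the round trip unchanged. This is only a sketch: hdfs_round_trip is a hypothetical helper name, and it assumes the hadoop command is on your PATH and that the target file does not already exist in the HDFS directory (put refuses to overwrite an existing file).

```shell
# Sketch: upload a local file to HDFS, download it again, and compare.
# hdfs_round_trip is a hypothetical helper, not part of Hadoop.
# Usage: hdfs_round_trip <local-file> <hdfs-dir>
hdfs_round_trip() {
    local_file=$1
    hdfs_dir=$2
    name=$(basename "$local_file")
    tmp_dir=$(mktemp -d) || return 1

    # Create the HDFS directory only if it does not exist yet.
    hadoop fs -test -d "$hdfs_dir" || hadoop fs -mkdir "$hdfs_dir" || return 1

    # Upload, then fetch the copy back into a scratch directory.
    hadoop fs -put "$local_file" "$hdfs_dir" || return 1
    hadoop fs -get "$hdfs_dir/$name" "$tmp_dir" || return 1

    # cmp -s exits 0 only when the two files are byte-identical.
    if cmp -s "$local_file" "$tmp_dir/$name"; then
        echo "round trip OK: $name"
        status=0
    else
        echo "round trip MISMATCH: $name" >&2
        status=1
    fi
    rm -rf "$tmp_dir"
    return $status
}
```

For the file used in this post, the call would be hdfs_round_trip /opt/hadoop/docs/mapred_tutorial.html grep_input. Comparing bytes with cmp is a stronger check than eyeballing the ls -lt output, which only shows that a file with that name and size exists.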

In the next posting we will discuss how to run the standard MapReduce programs distributed with the Hadoop installation, and also cover job management and tracking.
