Hadoop has its own distributed file system, HDFS, which you access through the hadoop utility. The command for working with HDFS is the file system user client command “fs”.
Type “hadoop fs” on the command line to list the generic options and commands it supports. Here are a few steps to upload a file, run some MapReduce code on it, and download the results from HDFS.
Type hadoop fs -ls to get a listing of your default directory on HDFS. That directory should be /user/<username>.
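If you want to list the default directory by its full path instead, a small helper like the one below can build it. This is only a sketch: the function name hdfs_home is made up for illustration, and it assumes your HDFS user name matches your local login name, which may not hold on every cluster.

```shell
# Hypothetical helper: print the expected HDFS home directory.
# Assumption: the HDFS user name is the same as the local login name.
hdfs_home() {
    printf '/user/%s\n' "$(whoami)"
}
```

With that in place, hadoop fs -ls "$(hdfs_home)" should show the same listing as a plain hadoop fs -ls.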
Create an input directory in your default HDFS directory with:
hadoop fs -mkdir grep_input
Upload a file to the input directory. I used the MapReduce HTML tutorial that ships with the Hadoop distribution, but you can pick any file from your local file system.
hadoop fs -put /opt/hadoop/docs/mapred_tutorial.html grep_input
Check with:
hadoop fs -ls grep_input
You can download the file from HDFS to your local system with the get command. To download it to the current directory:
hadoop fs -get grep_input/mapred_tutorial.html .
Check with:
ls -lt
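The steps above can also be collected into one shell function. This is a rough sketch rather than a polished tool: the function name hdfs_round_trip and its two arguments are hypothetical, and it assumes the hadoop command is on your PATH. It stops at the first step that fails.

```shell
# Hypothetical wrapper around the upload / check / download steps.
# Assumes `hadoop` is on PATH; the name and arguments are illustrative only.
hdfs_round_trip() {
    local local_file="$1"   # file on the local file system
    local hdfs_dir="$2"     # HDFS directory, relative to /user/<username>
    local name
    name="$(basename "$local_file")"

    hadoop fs -mkdir "$hdfs_dir" || return 1              # create the input directory
    hadoop fs -put "$local_file" "$hdfs_dir" || return 1  # upload the file
    hadoop fs -ls "$hdfs_dir" || return 1                 # check the upload
    hadoop fs -get "$hdfs_dir/$name" . || return 1        # download to current directory
    ls -lt "$name"                                        # check the local copy
}
```

For the example in this post you would call it as hdfs_round_trip /opt/hadoop/docs/mapred_tutorial.html grep_input. Note that the mkdir step fails if the HDFS directory already exists, and the get step fails if a file with the same name is already in the current directory.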
In the next posting we will discuss how to run standard MapReduce programs distributed with the standard Hadoop installation and also discuss job management and tracking.