Local filesystem vs HDFS
HDFS and Linux commands have a lot in common. If you are familiar with Linux commands, HDFS commands will be easy to grasp. We will look at some of the well-known commands for working with the local filesystem in Linux and with HDFS, such as mkdir to create a directory, cp to copy files, and ls to list the contents of a directory.
If you have not already done so, first connect to the main node of the cluster.
All HDFS commands start with hadoop fs. The regular ls command on the root directory lists the files from the root directory of the local filesystem, whereas hadoop fs -ls / lists the files in the root directory of HDFS.
In the terminal, type in both commands and see what happens:
ls /
hadoop fs -ls /
While the normal ls command lists the directories and files in your local filesystem, the HDFS command gives an overview of the directories and files stored in the Hadoop cluster across all nodes.
Now let's download two files to the home directory in the local filesystem using the following commands:
wget https://s3-us-west-1.amazonaws.com/julienheck/hadoop/datasets/ml-100k/u.data
wget https://s3-us-west-1.amazonaws.com/julienheck/hadoop/datasets/ml-100k/u.item
Note that the home directory in your local filesystem will be /home/&lt;username&gt; whereas the home directory in HDFS will be /user/&lt;username&gt;.
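To make the distinction concrete, here is a minimal sketch (assuming the ubuntu user from this tutorial) of how the two home paths differ for the same user:

```shell
# The same user has two distinct home directories: one on the local
# filesystem and one in HDFS, following different naming conventions.
username=ubuntu

echo "Local home: /home/$username"
echo "HDFS home:  /user/$username"
```

This matters because hadoop fs commands given a relative path (such as hadoop fs -ls with no argument) resolve it against the HDFS home directory, not the local one.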
Since we are logged in as the ubuntu user, the files will be saved under /home/ubuntu in the local filesystem.
Type ls to check that the files have been downloaded successfully.
Now let's connect to another node of our cluster while keeping our initial connection open. Type in both listing commands again:
ls
hadoop fs -ls /
The /home/ubuntu directory is empty: there is no trace of the files we just downloaded onto the local filesystem of the first node. On the other hand, the output of the HDFS command is the same on both nodes.
No matter which node you are connected to in your cluster, the view of your HDFS filesystem will be exactly the same. HDFS gives a global view of your cluster.
Let's go back to our first node and create a new directory in HDFS using the following command:
hadoop fs -mkdir myNewDir
Note that this new folder will be created in the home directory of the ubuntu user in HDFS (not in the local filesystem): /user/ubuntu
We can check that the new directory has been created successfully with:
hadoop fs -ls
If we go back to the second node, we can see the newly created directory as well:
hadoop fs -ls
If we look for this newly created directory in the local filesystem (on either node), we won't find it, since the directory was created in HDFS.
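As a quick sanity check, this short sketch confirms the directory is absent from the local filesystem (the HDFS side requires a running cluster, so it is shown only as a comment):

```shell
# myNewDir was created with "hadoop fs -mkdir myNewDir", so it lives in
# HDFS under /user/ubuntu, not in the local filesystem.
if [ -d "$HOME/myNewDir" ]; then
  echo "myNewDir found in local filesystem"
else
  echo "myNewDir not found in local filesystem"
fi

# On the cluster, the HDFS side can be verified with:
#   hadoop fs -ls    # lists /user/ubuntu, which contains myNewDir
```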