Replication factor in HDFS
We have seen that HDFS supports block replication to provide fault tolerance and availability. By default HDFS replicates a block to 3 nodes.
Open the terminal, connect to the main node and list the contents of the myNewDir2 directory using:
hadoop fs -ls myNewDir2
The number 3 that follows the permissions is the replication factor of the file u.data. It means that each block of this file is stored as 3 copies in the cluster.
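In the output of hadoop fs -ls, the replication factor is always the second column, and directories show a dash there because they have no replication factor. As a minimal sketch, the column can be extracted with awk. The helper name replication_of and the sample line are hypothetical: the owner, group, size and date are made up for illustration and do not come from a real cluster.

```shell
# Print the replication column (2nd field) of each `hadoop fs -ls` line read from stdin.
# Directories show "-" in this column.
replication_of() {
  awk '{ print $2 }'
}

# Hypothetical sample line: owner, group, size and date are invented for illustration.
sample='-rw-r--r--   3 hadoop supergroup    1048576 2024-01-01 12:00 myNewDir2/u.data'

printf '%s\n' "$sample" | replication_of   # prints 3
```

On the cluster, the same helper could be fed real output, for example with hadoop fs -ls myNewDir2 | replication_of. Note that the "Found N items" header line would then print "N", since the helper does not filter it.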
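We can change the replication factor using the dfs.replication property. In the terminal, type the following commands, which copy u.data with a replication factor of 2 and then list the target directory, where u.data.rep2 should appear with a replication factor of 2: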
hadoop fs -Ddfs.replication=2 -cp myNewDir2/u.data myNewDir/u.data.rep2
hadoop fs -ls myNewDir
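Instead of reading the listing by eye, the replication factor of a single file can be printed with hadoop fs -stat and the %r format. The sketch below wraps that call in a small check. The function name check_replication is hypothetical, and it assumes the hadoop command is on the PATH of the node you are connected to.

```shell
# Compare a file's actual replication factor with the expected one.
# Usage: check_replication <expected_factor> <hdfs_path>
check_replication() {
  expected="$1"
  path="$2"
  # %r makes `hadoop fs -stat` print only the replication factor.
  actual="$(hadoop fs -stat %r "$path")" || return 1
  if [ "$actual" = "$expected" ]; then
    echo "OK: $path has replication factor $actual"
  else
    echo "MISMATCH: $path has replication factor $actual, expected $expected"
    return 1
  fi
}
```

For the copy made above, check_replication 2 myNewDir/u.data.rep2 should report OK. The replication factor of a file that already exists can also be changed in place with hadoop fs -setrep, for example hadoop fs -setrep -w 3 myNewDir/u.data.rep2, where -w waits until the change has completed.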