

Note: All posts take a practical approach and avoid lengthy theory. Everything has been tested on development servers. Please don’t try any post on a production server until you are sure.

Friday, June 23, 2017

Recover the deleted file/folder in HDFS


By default, Hadoop deletes files and directories permanently, so an accidental delete cannot be undone. To be able to get them back, you have to enable the Trash feature. Two properties in core-site.xml control it: fs.trash.interval and fs.trash.checkpoint.interval. Once trash is enabled, deleted files and directories are moved to the .Trash folder in the user's HDFS home directory, /user/$USER/.Trash, instead of being removed immediately.
Configuration file: core-site.xml


[hdpsysuser@hdpmaster hadoop]$ vi core-site.xml

<configuration>
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://hdpmaster:9000/</value>
        </property>

        <property>
                <name>fs.trash.interval</name>
                <value>1440</value>
        </property>

        <property>
                <name>fs.trash.checkpoint.interval</name>
                <value>60</value>
        </property>
</configuration>


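fs.trash.interval is the number of minutes a deleted file or directory is kept in the .Trash folder before it is removed for good. 1440 minutes is 24 hours. A value of 0 disables trash.
fs.trash.checkpoint.interval is the number of minutes between trash checkpoints. With a value of 60, a checkpoint is created every 60 minutes, and checkpoints older than fs.trash.interval (24 hours here) are deleted. This value should be less than or equal to fs.trash.interval.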
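
Both properties are given in minutes, which is easy to misread. The following is a small sketch for converting them to hours and minutes. The helper name minutes_to_hm is made up for this post. If the hdfs client is on your PATH, the sketch also asks it which values it actually reads, using hdfs getconf.

```shell
# Hypothetical helper: convert an interval in minutes to "Xh Ym".
minutes_to_hm() {
  echo "$(( $1 / 60 ))h $(( $1 % 60 ))m"
}

minutes_to_hm 1440   # fs.trash.interval from the config above
minutes_to_hm 60     # fs.trash.checkpoint.interval from the config above

# Only when the hdfs client is installed: print the values it reads
# from the configuration files on this machine.
if command -v hdfs >/dev/null 2>&1; then
  hdfs getconf -confKey fs.trash.interval
  hdfs getconf -confKey fs.trash.checkpoint.interval
fi
```

For the values in this post, the first two calls print 24h 0m and 1h 0m.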

After changing the configuration, restart the Hadoop services for the change to take effect.

[hdpsysuser@hdpmaster hadoop]$ stop-yarn.sh
stopping yarn daemons
stopping resourcemanager
nn01: stopping nodemanager
nn01: stopping nodemanager
nn01: stopping nodemanager
no proxyserver to stop
[hdpsysuser@hdpmaster hadoop]$ stop-dfs.sh
Stopping namenodes on [nn01]
hdpmaster: stopping namenode
dn03: stopping datanode
dn02: stopping datanode
dn01: stopping datanode
Stopping secondary namenodes [0.0.0.0]
0.0.0.0: stopping secondarynamenode

Once the services are back up (start-dfs.sh, then start-yarn.sh), delete a file to test the trash.

[hdpsysuser@nn01 ~]$ hdfs dfs -rm /userdata/inam/test2.txt
17/05/07 13:57:46 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 120 minutes, Emptier interval = 45 minutes.
Moved: 'hdfs://nn01:9000/userdata/inam/test2.txt' to trash at: hdfs://nn01:9000/user/hdpsysuser/.Trash/Current

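The output tells you where the file has been moved. The INFO line reports the trash intervals currently in effect on the NameNode, so it is a quick way to confirm that your settings were picked up (the sample output above shows different values than the configuration shown earlier).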

List the files in .Trash

[hdpsysuser@nn01 ~]$ hdfs dfs -ls /user/hdpsysuser/.Trash/Current/userdata/inam
Found 1 items
-rw-r--r--   3 hdpsysuser supergroup         24 2017-04-19 12:23 /user/hdpsysuser/.Trash/Current/userdata/inam/test2.txt

Restore the file to its original location

[hdpsysuser@nn01 ~]$ hdfs dfs -mv /user/hdpsysuser/.Trash/Current/userdata/inam/test2.txt /userdata/inam
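
The trash keeps the full original path underneath .Trash/Current, so the restore target can be derived from the trash path itself. Below is a minimal sketch of that idea. The function name trash_to_original is hypothetical, and it assumes the path has the form /user/<name>/.Trash/<dir>/<original path>, where <dir> is Current or a checkpoint directory. It only prints the hdfs dfs -mv command so you can review it before running it yourself.

```shell
# Hypothetical helper: map a path under .Trash back to the location
# the file was deleted from.
trash_to_original() {
  rest=${1#*/.Trash/}   # drop everything up to and including ".Trash/"
  echo "/${rest#*/}"    # drop the first component (Current or a checkpoint)
}

src=/user/hdpsysuser/.Trash/Current/userdata/inam/test2.txt
dst=$(trash_to_original "$src")

# Print the restore command instead of running it.
echo "hdfs dfs -mv $src $dst"
```

Note that the destination here is the full original file path, while the command above passes only the parent directory /userdata/inam. Both forms put the file back in the same place, as long as that directory still exists.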

