
Note: All posts take a practical approach and avoid lengthy theory. Everything has been tested on development servers. Please don't try any post on production servers until you are sure.

Thursday, November 09, 2017

Diagnostics: Fix Under replicated blocks [Ambari Dashboard]


I see the following warning in the Ambari dashboard under the HDFS Summary:

[Screenshot: Ambari dashboard HDFS Summary showing under-replicated blocks]
First, get the full details of the files and blocks that are causing the problem:

[hdfs@te1-hdp-rp-nn01 root]$ hdfs fsck / -files -blocks -locations

...
...
/user/zeppelin/.sparkStaging/application_1503219907931_0037/pyspark.zip 455033 bytes, 1 block(s):  OK
0. BP-1135333773-192.168.44.133-1501079051032:blk_1073843531_102745 len=455033 repl=3 [DatanodeInfoWithStorage[192.168.44.137:50010,DS-6cee9ae8-5113-40df-bd7c-982974892993,DISK], DatanodeInfoWithStorage[192.168.44.135:50010,DS-e9e9bfd5-f4b5-4829-b6d6-b54065b25275,DISK], DatanodeInfoWithStorage[192.168.44.136:50010,DS-5a3944bc-417f-4fc9-8f51-189aba424bc0,DISK]]

/user/zeppelin/.sparkStaging/application_1503219907931_0037/sparkr.zip 682117 bytes, 1 block(s):  OK
0. BP-1135333773-192.168.44.133-1501079051032:blk_1073843533_102747 len=682117 repl=3 [DatanodeInfoWithStorage[192.168.44.137:50010,DS-6cee9ae8-5113-40df-bd7c-982974892993,DISK], DatanodeInfoWithStorage[192.168.44.135:50010,DS-e9e9bfd5-f4b5-4829-b6d6-b54065b25275,DISK], DatanodeInfoWithStorage[192.168.44.136:50010,DS-5a3944bc-417f-4fc9-8f51-189aba424bc0,DISK]]

/user/zeppelin/test <dir>
Status: HEALTHY
 Total size:    3093146787 B (Total open files size: 742 B)
 Total dirs:    469
 Total files:   22098
 Total symlinks:                0 (Files currently being written: 6)
 Total blocks (validated):      22095 (avg. block size 139993 B) (Total open file blocks (not validated): 5)
 Minimally replicated blocks:   22095 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       9 (0.040733196 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    3
 Average block replication:     3.0
 Corrupt blocks:                0
 Missing replicas:              63 (0.09495388 %)
 Number of data-nodes:          3
 Number of racks:               1
FSCK ended at Thu Nov 09 12:14:57 AST 2017 in 1186 milliseconds
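As a sanity check on these numbers: the 9 under-replicated blocks here are job staging files whose target replication is 10 (as the fsck detail further down shows), while the cluster only has 3 datanodes holding 3 replicas each, so every such block is short 7 copies:

```shell
# 9 blocks, each holding 3 of a target 10 replicas:
echo $(( 9 * (10 - 3) ))   # 63 -- matching the "Missing replicas" line above
```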


Regarding under-replicated blocks, HDFS is supposed to recover them automatically by creating the missing copies until the replication factor is met. If after a few days it hasn't, you can trigger the recovery manually by running the commands below.


[hdfs@te1-hdp-rp-nn01 root]$ hdfs fsck / | grep 'Under replicated'
Connecting to namenode via http://te1-hdp-rp-nn01:50070/fsck?ugi=hdfs&path=%2F
/user/hive/.staging/job_1501079109321_0010/job.jar:  Under replicated BP-1135333773-192.168.44.133-1501079051032:blk_1073741932_1108. Target Replicas is 10 but found 3 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
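Why is the target 10? MapReduce writes job staging files (job.jar, job.split, libjars) with their own replication setting, mapreduce.client.submit.file.replication, which defaults to 10. On a 3-datanode cluster that target can never be satisfied. If these warnings keep recurring, one option (verify against your distribution's documentation before changing it) is to lower the setting in mapred-site.xml to match the cluster size:

```xml
<!-- mapred-site.xml: replication factor for job staging files
     (job.jar, job.split). Default is 10; on a 3-node cluster,
     a value matching the number of datanodes avoids the warning. -->
<property>
  <name>mapreduce.client.submit.file.replication</name>
  <value>3</value>
</property>
```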


[hdfs@te1-hdp-rp-nn01 root]$ hdfs fsck / | grep 'Under replicated' | awk -F':' '{print $1}'
Connecting to namenode via http://te1-hdp-rp-nn01:50070/fsck?ugi=hdfs&path=%2F
/user/hive/.staging/job_1501079109321_0010/job.jar
/user/hive/.staging/job_1501079109321_0010/job.split
/user/hive/.staging/job_1501079109321_0010/libjars/hive-hcatalog-core.jar
/user/hive/.staging/job_1501079109321_0011/job.jar
/user/hive/.staging/job_1501079109321_0011/job.split
/user/hive/.staging/job_1501079109321_0011/libjars/hive-hcatalog-core.jar
/user/hive/.staging/job_1501079109321_0018/job.jar
/user/hive/.staging/job_1501079109321_0018/job.split
/user/hive/.staging/job_1501079109321_0018/libjars/hive-hcatalog-core.jar


[hdfs@te1-hdp-rp-nn01 root]$ hdfs fsck / | grep 'Under replicated' | awk -F':' '{print $1}' >> /tmp/under_replicated_files

[hdfs@te1-hdp-rp-nn01 root]$ for hdfsfile in `cat /tmp/under_replicated_files`; do echo "Fixing $hdfsfile :" ;  hadoop fs -setrep 3 $hdfsfile; done

Fixing /user/hive/.staging/job_1501079109321_0010/job.jar :
Replication 3 set: /user/hive/.staging/job_1501079109321_0010/job.jar
Fixing /user/hive/.staging/job_1501079109321_0010/job.split :
Replication 3 set: /user/hive/.staging/job_1501079109321_0010/job.split
Fixing /user/hive/.staging/job_1501079109321_0010/libjars/hive-hcatalog-core.jar :
Replication 3 set: /user/hive/.staging/job_1501079109321_0010/libjars/hive-hcatalog-core.jar
Fixing /user/hive/.staging/job_1501079109321_0011/job.jar :
Replication 3 set: /user/hive/.staging/job_1501079109321_0011/job.jar
Fixing /user/hive/.staging/job_1501079109321_0011/job.split :
Replication 3 set: /user/hive/.staging/job_1501079109321_0011/job.split
Fixing /user/hive/.staging/job_1501079109321_0011/libjars/hive-hcatalog-core.jar :
Replication 3 set: /user/hive/.staging/job_1501079109321_0011/libjars/hive-hcatalog-core.jar
Fixing /user/hive/.staging/job_1501079109321_0018/job.jar :
Replication 3 set: /user/hive/.staging/job_1501079109321_0018/job.jar
Fixing /user/hive/.staging/job_1501079109321_0018/job.split :
Replication 3 set: /user/hive/.staging/job_1501079109321_0018/job.split
Fixing /user/hive/.staging/job_1501079109321_0018/libjars/hive-hcatalog-core.jar :
Replication 3 set: /user/hive/.staging/job_1501079109321_0018/libjars/hive-hcatalog-core.jar
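After the setrep commands finish and HDFS has re-replicated the blocks, re-running fsck should show the under-replicated count back at zero. A small sketch of pulling just that count out of the summary (demonstrated here on a sample line so it runs anywhere; on the cluster, pipe `hdfs fsck /` into the awk instead):

```shell
# Extract the under-replicated block count from an fsck summary line.
# On a live cluster: hdfs fsck / | awk '/Under-replicated blocks/ {print $3}'
sample=' Under-replicated blocks:       9 (0.040733196 %)'
echo "$sample" | awk '/Under-replicated blocks/ {print $3}'
# prints 9
```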
