Note: All posts take a practical approach and avoid lengthy theory. All have been tested on development servers. Please don't try any post on production servers until you are sure.

Monday, December 25, 2017

Configuring NFS Gateway for HDFS [HDP]



The NFS Gateway for HDFS allows clients to mount HDFS and interact with it through NFS, as if it were part of their local file system. The gateway supports NFSv3.
After mounting HDFS, a user can:


• Browse the HDFS file system through their local file system on NFSv3 client-compatible operating systems.

• Upload and download files between the HDFS file system and their local file system.

• Stream data directly to HDFS through the mount point.

Prerequisites
Hadoop Client already installed

Configure the HDFS NFS Gateway

1- Configuration on NN

The user running the NFS gateway must be able to proxy all users that use NFS mounts. For example, if user "hdpclient" runs the gateway and the NFS users belong to groups "nfsgrp1" and "nfsgrp2", set the following values in the core-site.xml file on the NameNode. On HDP, all of these configuration files are in /etc/hadoop/conf/.

In Ambari, use the Custom core-site link in the Advanced tab of the HDFS configuration, where these properties are added as key/value pairs.


<property>
  <name>hadoop.proxyuser.hdpclient.groups</name>
  <value>nfsgrp1,nfsgrp2</value>
  <description>
    The 'hdpclient' user is allowed to proxy all members of the 'nfsgrp1' and 'nfsgrp2' groups. Set this to '*' to allow hdpclient to proxy any group.
  </description>
</property>
<property>
  <name>hadoop.proxyuser.hdpclient.hosts</name>
  <value>en01</value>
  <description>
    This is the host where the NFS gateway is running. Set this to '*' to allow requests from any host to be proxied.
  </description>
</property>

The preceding properties are the only configuration settings required for the NFS gateway in non-secure mode. Making this change in Ambari requires a restart of all affected services; it updates the core-site.xml located at /etc/hadoop/2.6.1.0-129/0 on the NameNode.
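After the restart, you can sanity-check the effective values straight from the file. A minimal sketch (assuming the HDP default path /etc/hadoop/conf/core-site.xml, and that each name/value pair sits on adjacent lines as shown above):

```shell
# Print the <value> that follows a given <name> in core-site.xml.
CONF=${HADOOP_CONF_DIR:-/etc/hadoop/conf}/core-site.xml
get_prop() {
  grep -A1 "<name>$1</name>" "$CONF" | sed -n 's:.*<value>\(.*\)</value>.*:\1:p'
}
if [ -f "$CONF" ]; then
  get_prop hadoop.proxyuser.hdpclient.groups   # expect: nfsgrp1,nfsgrp2
  get_prop hadoop.proxyuser.hdpclient.hosts    # expect: en01
fi
```

On a live cluster you can also ask for the effective value directly with `hdfs getconf -confKey hadoop.proxyuser.hdpclient.groups`.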

Keep in mind that you also need to set dfs.namenode.accesstime.precision to 3600000 in the NameNode configuration (use the Ambari UI); otherwise you will get the error below.

[oracle@te1-hdp-rp-en01 oraclenfs]$ cp -p /data/mydata/emp.csv .
cp: cannot create regular file ‘./emp.csv’: Input/output error

2- Configuration on HDFS NFS gateway (Edge Node)

The NFS gateway uses the same settings that are used by the NameNode and DataNode. 
Configure the following properties based on your application's requirements:

a) Edit the hdfs-site.xml file on your NFS gateway machine. Modify the following property:

<property>
<name>dfs.namenode.accesstime.precision</name>
<value>3600000</value>
<description>
The access time for an HDFS file is precise up to this value. The default value is 1 hour. Setting it to 0 disables access times for HDFS.
</description>
</property>

[I copied hdfs-site.xml from a DataNode (/etc/hadoop/conf) to the Edge Node (en01) at /usr/hadoopsw/hadoop-2.7.3; change this as per your environment]


b) Add the following property to the hdfs-site.xml file:

<property>
<name>dfs.nfs3.dump.dir</name>
<value>/tmp/.hdfs-nfs</value>
</property>

The NFS client often reorders writes, so sequential writes can arrive at the NFS gateway out of order. This directory is used to temporarily save out-of-order writes before they are written to HDFS. Make sure the directory has enough space: for example, if an application uploads 10 files of 100 MB each, this directory should have 1 GB of space in case a worst-case write reorder happens to every file.
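The sizing rule above is simple arithmetic: in the worst case, every in-flight file may be buffered in full in the dump directory. A quick sketch (the file count and size are the example's assumptions):

```shell
# Worst-case space for dfs.nfs3.dump.dir: every file uploaded
# concurrently may be buffered in full while writes are reassembled.
num_files=10       # concurrent uploads (example assumption)
file_size_mb=100   # size of each file in MB (example assumption)
required_mb=$((num_files * file_size_mb))
echo "dfs.nfs3.dump.dir needs at least ${required_mb} MB free"
```

Check free space on the chosen directory (e.g. `df -h /tmp`) before starting the gateway.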


c) Update the following property in the hdfs-site.xml file:

<property>
<name>dfs.nfs.exports.allowed.hosts</name>
<value>* rw</value>
</property>

By default, the export can be mounted by any client. You must update this property to control access. The value string contains a machine name and an access privilege, separated by whitespace. The machine name can be a single host, a wildcard, or an IPv4 network. The access privilege is rw (read-write) or ro (read-only); if you do not specify one, the default access is read-only. Separate machine entries with a semicolon, for example: 192.168.0.0/22 rw ; host*.example.com ; host1.test.org ro;.

Restart the NFS gateway after this property is updated.

d) Specify JVM heap space 

Specify the JVM heap space (HADOOP_NFS3_OPTS) for the NFS gateway. You can increase the JVM heap allocation for the gateway using this option. For example, set it in the gateway user's profile:

vi /home/hdpclient/.bash_profile
##NFS related
export HADOOP_NFS3_OPTS="-Xms1024m -Xmx2048m"

The above example specifies a 1 GB starting heap size and a 2 GB maximum.


3- Start and Verify the NFS Gateway Service

Three daemons are required to provide NFS service: rpcbind (or portmap), mountd, and nfsd. The NFS gateway process includes both nfsd and mountd, and shares the HDFS root "/" as the only export. We recommend using the portmap included in the NFS gateway package, as shown below. The included portmap must be used on some Linux systems, for example SLES 11 and RHEL 6.2.

a) Stop nfs/rpcbind/portmap services provided by the platform if running:   

[root@en01 ~]# service nfs stop

Redirecting to /bin/systemctl stop  nfs.service

[root@en01 ~]# service rpcbind stop

Redirecting to /bin/systemctl stop  rpcbind.service
Warning: Stopping rpcbind.service, but it can still be activated by:
  rpcbind.socket

b) Start the included portmap (this needs root privileges), using one of the following commands:

[root@en01 ~]# hadoop-daemon.sh start portmap

starting portmap, logging to /usr/hadoopsw/hadoop-2.7.3/logs/hadoop-root-portmap-en01.out
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

[root@en01 ~]# hadoop-daemon.sh stop portmap
stopping portmap

[root@en01 ~]# hadoop portmap

DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
.....
^C17/12/24 11:50:40 ERROR portmap.Portmap: RECEIVED SIGNAL 2: SIGINT
17/12/24 11:50:40 INFO portmap.Portmap: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down Portmap at en01/192.168.44.134
************************************************************/


[root@en01 ~]# hdfs portmap

...
...
STARTUP_MSG:   build = https://git-wip-us.apache.org/repos/asf/hadoop.git -r baa91f7c6bc9cb92be5982de4719c1c8af91ccff; compiled by 'root' on 2016-08-18T01:41Z
STARTUP_MSG:   java = 1.8.0_121
************************************************************/
17/12/24 12:04:42 INFO portmap.Portmap: registered UNIX signal handlers for [TERM, HUP, INT]
17/12/24 12:04:42 INFO portmap.Portmap: Portmap server started at tcp:///0.0.0.0:111, udp:///0.0.0.0:111


c) Start mountd and nfsd.

No root privileges are required for this command. However, verify that the user starting the Hadoop cluster and the user starting the NFS gateway are the same.

[root@en01 ~]# hdfs nfs3

17/12/24 12:06:19 INFO nfs3.Nfs3Base: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting Nfs3
STARTUP_MSG:   host = en01/192.168.44.134
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 2.7.3
...
...
17/12/24 12:06:20 INFO http.HttpServer2: Jetty bound to port 50079
17/12/24 12:06:20 INFO mortbay.log: jetty-6.1.26
17/12/24 12:06:20 INFO mortbay.log: Started HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:50079
17/12/24 12:06:20 INFO oncrpc.SimpleTcpServer: Started listening to TCP requests at port 2049 for Rpc program: NFS3 at localhost:2049 with workerCount 0



Verify the validity of NFS-related services

a) Execute the following command to verify that all the services are up and running:

[root@en01 ~]# rpcinfo -p en01
   program vers proto   port  service
    100005    3   udp   4242  mountd
    100005    1   tcp   4242  mountd
    100000    2   udp    111  portmapper
    100000    2   tcp    111  portmapper
    100005    3   tcp   4242  mountd
    100005    2   tcp   4242  mountd
    100003    3   tcp   2049  nfs
    100005    2   udp   4242  mountd
    100005    1   udp   4242  mountd
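If you script this verification, you only need to confirm that mountd, portmapper, and nfs all appear in the rpcinfo output. A small sketch (the helper function is illustrative; it just greps the output piped into it):

```shell
# Read `rpcinfo -p <host>` output on stdin and report whether the
# mountd, portmapper and nfs programs are all registered.
check_rpc_services() {
  local out missing=""
  out=$(cat)
  for svc in mountd portmapper nfs; do
    echo "$out" | grep -qw "$svc" || missing="$missing $svc"
  done
  if [ -z "$missing" ]; then
    echo "all NFS services registered"
  else
    echo "missing:$missing"
  fi
}
```

Usage: `rpcinfo -p en01 | check_rpc_services`.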



b) Verify that the HDFS namespace is exported and can be mounted:

[root@en01 ~]# showmount -e en01

Export list for en01:
/ *


Access HDFS

To access HDFS, first mount the export "/". Currently only NFSv3 is supported, and it uses TCP as the transport protocol.


a) Mount the HDFS namespace as follows:


Create a folder to be used as the NFS mount point:

[root@en01 ~]# mkdir -p /data/hdfsloc

The following syntax is used to mount:

mount -t nfs -o vers=3,proto=tcp,nolock,sync,rsize=1048576,wsize=1048576 $server:/ $mount_point

Because NLM is not supported, the nolock mount option is needed.
Use the sync option when writing large files: it improves the performance and reliability of writing large files to HDFS through the NFS gateway. If sync is specified, the NFS client flushes write operations to the NFS gateway before returning control to the application. A useful side effect of sync is that the client does not issue reordered writes, which reduces buffering requirements on the NFS gateway. sync is specified on the client machine when mounting the NFS share.

[root@en01 ~]# mount -t nfs -o vers=3,proto=tcp,nolock,sync,rsize=1048576,wsize=1048576 en01:/ /data/hdfsloc
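If you want the mount to survive reboots, the equivalent /etc/fstab entry (same host, options, and mount point as above) would look like this; note the gateway must already be running at boot time for the mount to succeed:

```
en01:/   /data/hdfsloc   nfs   vers=3,proto=tcp,nolock,sync,rsize=1048576,wsize=1048576   0 0
```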

Check the mount point

[root@en01 ~]# df
Filesystem             1K-blocks      Used  Available Use% Mounted on
.....
....
tmpfs                    2038280         0    2038280   0% /run/user/1005
en01:/     4264275968 188922880 4075353088   5% /data/hdfsloc


[root@en01 hdfsloc]# df -h
Filesystem             Size  Used Avail Use% Mounted on
...
tmpfs                  2.0G     0  2.0G   0% /run/user/1005
en01:/      4.0T  181G  3.8T   5% /data/hdfsloc

List folder/files on HDFS 

[root@en01 ~]# cd /data/hdfsloc/
[root@en01 hdfsloc]# ll
total 7
drwxrwxrwx 10    3701572 3070102565 320 Dec 20 14:57 app-logs
drwxr-xr-x  5 hdfs       hdfs       160 Nov  8 09:18 apps
drwxr-xr-x  4    3701572 3070102565 128 Jul 26 17:24 ats
drwxr-xr-x  3 hdfs       hdfs        96 Dec 21 12:28 catalog
drwxrwxrwx  6 hdfs       hdfs       192 Dec 21 14:47 data
drwxrwxrwx  6 hdfs       hdfs       192 Aug  9 12:50 flume
drwxr-xr-x  3 hdfs       hdfs        96 Jul 26 17:24 hdp
drwxr-xr-x  3 3213608373 hdfs        96 Jul 26 17:24 mapred
drwxrwxrwx  4 3213608373 3070102565 128 Jul 26 17:24 mr-history
drwxrwxrwx 17  109638365 3070102565 544 Dec 24  2017 spark2-history
drwxrwxrwx 20 hdfs       hdfs       640 Nov 21 16:03 tmp
drwxr-xr-x 14 hdfs       hdfs       448 Dec 21 17:02 user


[root@en01 10]# pwd

/data/hdfsloc/data/flume/syslogs2/2017/10


Cat any file on HDFS

[root@en01 10]# cat /data/hdfsloc/data/flume/syslogs2/2017/10/syslog.1509483515603

<37>Oct 31 23:58:33 nn01 su: (to ambari-qa) root on none
<86>Oct 31 23:58:33 nn01 su: pam_unix(su-l:session): session opened for user ambari-qa by (uid=0)
<86>Oct 31 23:58:43 nn01 su: pam_unix(su-l:session): session closed for user ambari-qa
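With the mount in place, ordinary file tools now read and write HDFS. A minimal round-trip sketch (the helper function and the /tmp target are illustrative; HDFS /tmp must be writable by your user):

```shell
# Copy a local file into HDFS through the NFS mount and read it back.
nfs_roundtrip() {
  local mount=$1 src=$2
  cp "$src" "$mount/tmp/nfs_test.txt" && cat "$mount/tmp/nfs_test.txt"
}
# Against the mount created above:
#   nfs_roundtrip /data/hdfsloc /etc/hostname
# The same file is then visible to normal HDFS clients:
#   hdfs dfs -cat /tmp/nfs_test.txt
```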


Congratulations, your HDFS is now available over NFS.
