

Note: All posts are based on a practical approach, avoiding lengthy theory. All have been tested on development servers. Please don't test any post on a production server until you are sure.

Friday, June 23, 2017

Forward syslog to Flume with rsyslog


Introduction


Syslog  


In computing, syslog is a standard for message logging. It allows separation of the software that generates messages, the system that stores them, and the software that reports on and analyzes them. Each message is labeled with a facility code, indicating the type of software that generated it, and is assigned a severity level.

Computer system designers may use syslog for system management and security auditing as well as general informational, analysis, and debugging messages. A wide variety of devices, such as printers, routers, and message receivers across many platforms use the syslog standard. This permits the consolidation of logging data from different types of systems in a central repository. Implementations of syslog exist for many operating systems.

Benefits of syslog

  • Helps analyze the root cause of any trouble or problem
  • Reduces overall downtime by making issues faster to troubleshoot, with all logs in one place
  • Improves incident management through proactive detection of issues
  • Enables automated detection and resolution of incidents
  • Simplifies the architecture with standard severity levels such as error, info, and warning


In this post, I'll be using HDFS as the central repository for syslog messages and Hive as the analytical platform.

Rsyslog  


Rsyslog is an open-source software utility used on Unix and Unix-like computer systems for forwarding log messages in an IP network. It implements the basic syslog protocol and extends it with content-based filtering, rich filtering capabilities, and flexible configuration options, and adds features such as TCP transport.

Note:
Please review the post Streaming Twitter Data using Apache Flume before this one for an introduction to Flume and its architecture.

Flume's syslog TCP source  


The syslog TCP source provides an endpoint for messages over TCP, allowing a larger payload size and the TCP retry semantics that should be used for any reliable inter-server communication.

To create a Syslog TCP source, set the type property to syslogtcp.

vi /usr/hadoopsw/apache-flume-1.7.0-bin/conf/syslog.conf

# Naming the components on the current agent.
agent.sources=SourceSyslog
agent.channels=ChannelMem
agent.sinks=SinkHDFS

# Describing/Configuring the source
#agent.sources.SourceSyslog.type=syslogudp
agent.sources.SourceSyslog.type=syslogtcp
agent.sources.SourceSyslog.host=0.0.0.0
agent.sources.SourceSyslog.port=12345
agent.sources.SourceSyslog.keepFields=true


# Describing/Configuring the channel
agent.channels.ChannelMem.type=memory
agent.channels.ChannelMem.capacity = 10000
agent.channels.ChannelMem.transactionCapacity = 1000

# Describing/Configuring the sink  
agent.sinks.SinkHDFS.type=hdfs
agent.sinks.SinkHDFS.hdfs.path = /flume/syslogs/
agent.sinks.SinkHDFS.hdfs.fileType = DataStream
agent.sinks.SinkHDFS.hdfs.writeFormat = Text
agent.sinks.SinkHDFS.hdfs.batchSize = 1000
agent.sinks.SinkHDFS.hdfs.rollSize = 0
agent.sinks.SinkHDFS.hdfs.rollCount = 10000

# Binding the source and sink to the channel
agent.sources.SourceSyslog.channels = ChannelMem
agent.sinks.SinkHDFS.channel = ChannelMem




The keepFields property tells the source to include the syslog fields as part of the body. By default, these are simply removed, as they become Flume header values.

For the memory channel, the capacity property sets the maximum number of events the channel can hold (10,000 here). transactionCapacity sets the maximum number of events that can be written (a put) in a single transaction by a source's ChannelProcessor, the component responsible for moving data from the source to the channel. It is also the maximum number of events that can be read (a take) in a single transaction by the SinkProcessor, the component responsible for moving data from the channel to the sink.

Remember that if you increase the capacity value, you will most likely also have to increase the Java heap space using the -Xmx, and optionally -Xms, parameters.
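One way to do that is through $FLUME_HOME/conf/flume-env.sh, which flume-ng sources when you start the agent with -c conf. A minimal sketch (the heap sizes are illustrative; size them to your channel capacity and event size):

# $FLUME_HOME/conf/flume-env.sh
# 512 MB initial / 1 GB maximum heap for the agent JVM (illustrative values)
export JAVA_OPTS="-Xms512m -Xmx1g"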

Run Flume agent 

Create the relevant folders in HDFS as specified in the Flume configuration:

[hdpsysuser@te1-hdp-rp-nn01 ~]$ hdfs dfs -mkdir -p /flume/syslogs
[hdpsysuser@te1-hdp-rp-nn01 ~]$ hdfs dfs -chmod -R 777 /flume



Now you can run the Flume agent using the command below:


flume-ng agent -n agent -c conf -f $FLUME_HOME/conf/syslog.conf -Dflume.root.logger=INFO,console
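If you want the agent to keep running after you log out, you can background it in the usual way (nohup is just one option; the log file name is arbitrary):

nohup flume-ng agent -n agent -c conf -f $FLUME_HOME/conf/syslog.conf -Dflume.root.logger=INFO,console > flume-syslog.log 2>&1 &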

Test with nc


Now test the connection to your Flume agent using nc. The nc command runs netcat, a simple Unix utility that reads and writes data across network connections using the TCP or UDP protocol.


[hdpclient@en01 ~]$ nc localhost 12345  
Event-1

Type the line above and press Enter; the line should be transported to the Flume agent and written to the HDFS location specified in the configuration file.

Test whether the data reached its final destination (HDFS):

[hdpclient@en01 ~]$ hdfs dfs -cat /flume/syslogs/FlumeData.1497263656231
Event-1


Use Hive to analyze (Optional)

create database flume;
use flume;

create external table syslog(log_line string) location '/flume/syslogs';

select log_line
from syslog
--where log_line like '%user root%'
--where log_line like '%Invalid user%'
where lower(log_line) like '%authentication failure%'
limit 5;
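Because keepFields=true preserves the syslog header in the event body, you can also break the lines apart inside the query. Below is a hypothetical example using Hive's regexp_extract, assuming the default RFC 3164 layout ("Jun 12 14:19:39 host program: message"); adjust the pattern if your rsyslog template differs.

-- Count events per originating host; the regex skips an optional <PRI>
-- prefix and the "Mon DD HH:MM:SS" timestamp, then captures the hostname
select regexp_extract(log_line, '^(?:<\\d+>)?\\w{3}\\s+\\d+\\s+\\S+\\s+(\\S+)', 1) as host,
       count(*) as events
from syslog
group by regexp_extract(log_line, '^(?:<\\d+>)?\\w{3}\\s+\\d+\\s+\\S+\\s+(\\S+)', 1)
order by events desc;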

Configure remote logging with rsyslog (Unix Client) 

OS: Redhat 7.x


Configure rsyslog to send its log events to another server over TCP.

1- Add the following line to the RULES section of /etc/rsyslog.conf. A single @ forwards over UDP; a double @@ forwards over TCP.

# remote host is: name/ip:port, e.g. 192.168.0.1:514, port optional
#*.* @remote-host:514
*.*         @@10.10.10.1:514

vi /etc/rsyslog.conf

#### RULES ####

# Log all kernel messages to the console.
# Logging much else clutters up the screen.
#kern.*                                                 /dev/console

# Log anything (except mail) of level info or higher.
# Don't log private authentication messages!
*.info;mail.none;authpriv.none;cron.none                /var/log/messages

# The authpriv file has restricted access.
authpriv.*                                              /var/log/secure

# Log all the mail messages in one place.
mail.*                                                  -/var/log/maillog

# Log cron stuff
cron.*                                                  /var/log/cron

# Everybody gets emergency messages
*.emerg                                                 :omusrmsg:*

# Save news errors of level crit and higher in a special file.
uucp,news.crit                                          /var/log/spooler

# Save boot messages also to boot.log
local7.*                                                /var/log/boot.log

# remote host is: name/ip:port, e.g. 192.168.0.1:514, port optional
#*.* @remote-host:514
*.*         @@192.168.44.134:12345


Note: You can add more than one host/agent in rsyslog.conf; the message will be pushed to all servers in the configuration.


*.*         @@192.168.44.138:12346
*.*         @@192.168.44.138:12345
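If you don't want to ship everything, the same selector syntax lets you forward only what you care about. An illustrative line (the facilities and destination are examples, not part of this setup):

# Forward only authentication-related messages over TCP
auth,authpriv.*     @@192.168.44.134:12345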

2- Restart rsyslog.

[hdpsysuser@te1-hdp-rp-dn04 ~]$ service rsyslog restart

Redirecting to /bin/systemctl restart  rsyslog.service
==== AUTHENTICATING FOR org.freedesktop.systemd1.manage-units ===
Authentication is required to manage system services or units.
Authenticating as: hdpsysuser
Password:
==== AUTHENTICATION COMPLETE ===
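Optionally, confirm that rsyslog has opened a TCP connection to the Flume agent. A quick sanity check, assuming the ss utility from iproute2 is available:

[root@te1-hdp-rp-dn04 ~]# ss -tn | grep 12345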

3- Test the configuration using the logger command, which makes entries in the system log.


[root@te1-hdp-rp-dn04 ~]# logger Test from Data Node - 4
Check that your message was logged locally:

[root@te1-hdp-rp-dn04 ~]# tail /var/log/messages
Jun 12 14:18:59 te1-hdp-rp-dn04 fprintd: ** Message: entering main loop
Jun 12 14:19:19 te1-hdp-rp-dn04 su: (to root) hdpsysuser on pts/1
Jun 12 14:19:19 te1-hdp-rp-dn04 dbus[950]: [system] Activating service name='org.freedesktop.problems' (using servicehelper)
Jun 12 14:19:19 te1-hdp-rp-dn04 dbus-daemon: dbus[950]: [system] Activating service name='org.freedesktop.problems' (using servicehelper)
Jun 12 14:19:19 te1-hdp-rp-dn04 dbus[950]: [system] Successfully activated service 'org.freedesktop.problems'
Jun 12 14:19:19 te1-hdp-rp-dn04 dbus-daemon: dbus[950]: [system] Successfully activated service 'org.freedesktop.problems'
Jun 12 14:19:29 te1-hdp-rp-dn04 fprintd: ** Message: No devices in use, exit
Jun 12 14:19:39 te1-hdp-rp-dn04 hdpsysuser: Test from Data Node - 4
Jun 12 14:20:01 te1-hdp-rp-dn04 systemd: Started Session 11254 of user root.
Jun 12 14:20:01 te1-hdp-rp-dn04 systemd: Starting Session 11254 of user root.
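logger can also stamp a test message with an explicit facility, priority, and program tag, which is handy for exercising the Hive filters shown earlier (-p and -t are standard logger flags; the message text here is made up):

[root@te1-hdp-rp-dn04 ~]# logger -p authpriv.notice -t sshd "authentication failure; test message"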


4- Verify the Flume agent's HDFS location

[hdpclient@en01 ~]$ hdfs dfs -ls /flume/syslogs
Found 8 items
-rw-r--r--   3 hdpclient supergroup         10 2017-06-12 13:46 /flume/syslogs/FlumeData.1497264415313
-rw-r--r--   3 hdpclient supergroup         17 2017-06-12 13:50 /flume/syslogs/FlumeData.1497264697909
-rw-r--r--   3 hdpclient supergroup          9 2017-06-12 13:51 /flume/syslogs/FlumeData.1497264761727
-rw-r--r--   3 hdpclient supergroup         10 2017-06-12 13:55 /flume/syslogs/FlumeData.1497264970862
-rw-r--r--   3 hdpclient supergroup        497 2017-06-12 14:14 /flume/syslogs/FlumeData.1497266092851
-rw-r--r--   3 hdpclient supergroup         36 2017-06-12 14:16 /flume/syslogs/FlumeData.1497266217106
-rw-r--r--   3 hdpclient supergroup       1272 2017-06-12 14:16 /flume/syslogs/FlumeData.1497266252522
-rw-r--r--   3 hdpclient supergroup        176 2017-06-12 14:17 /flume/syslogs/FlumeData.1497266292306
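To confirm that the test message itself arrived, you can grep across the collected files (output omitted here):

[hdpclient@en01 ~]$ hdfs dfs -cat /flume/syslogs/* | grep 'Test from Data Node'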

Verify using the browser

Check using the Hive table
Query the Hive table created earlier.

Monitor Flume metrics

You can configure the Flume agent to start an HTTP server that outputs metrics as JSON, which can then be queried by outside mechanisms.

Start the Flume agent with these properties:

-Dflume.monitoring.type=http
-Dflume.monitoring.port=44444

flume-ng agent -n agent -c conf -f $FLUME_HOME/conf/syslog.conf -Dflume.root.logger=INFO,console -Dflume.monitoring.type=http -Dflume.monitoring.port=44444



Now, when you go to http://SERVER_OR_IP:44444/metrics, you will see something like the following:


{"CHANNEL.ChannelMem":{"ChannelCapacity":"1000000","ChannelFillPercentage":"0.0","Type":"CHANNEL","EventTakeSuccessCount":"14","ChannelSize":"0","EventTakeAttemptCount":"47","StartTime":"1497273770141","EventPutAttemptCount":"14","EventPutSuccessCount":"14","StopTime":"0"},"SINK.SinkHDFS":{"ConnectionCreatedCount":"2","ConnectionClosedCount":"2","Type":"SINK","BatchCompleteCount":"0","BatchEmptyCount":"31","EventDrainAttemptCount":"14","StartTime":"1497273770143","EventDrainSuccessCount":"14","BatchUnderflowCount":"2","StopTime":"0","ConnectionFailedCount":"0"},"SOURCE.SourceSyslog":{"EventReceivedCount":"14","AppendBatchAcceptedCount":"0","Type":"SOURCE","EventAcceptedCount":"14","AppendReceivedCount":"0","StartTime":"1497273770202","AppendAcceptedCount":"0","OpenConnectionCount":"0","AppendBatchReceivedCount":"0","StopTime":"0"}}




The channel's ChannelSize and ChannelFillPercentage metrics will give you a good idea of whether data is coming in faster than it is going out. They will also tell you whether the channel is sized large enough to ride out maintenance windows or outages at your data volume.

Looking at the sink, comparing EventDrainSuccessCount with EventDrainAttemptCount tells you how often output succeeds relative to how often it is attempted. The ConnectionFailedCount metric is a good indicator of persistent connection problems, and a growing ConnectionCreatedCount can indicate that connections are dropping and reopening too often.
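If you'd rather watch these numbers from a shell than a browser, curl plus Python's built-in json.tool pretty-printer does the job (assuming both are installed on the machine):

[hdpclient@en01 ~]$ curl -s http://localhost:44444/metrics | python -m json.tool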

