Apache Flume basics – Configuring a simple agent

This post shows how to configure a Flume agent and start it.

  1. Create the flume.conf file. It is best to create it under Flume's default conf folder, i.e., /etc/flume/conf
  2. Edit the file (vi /etc/flume/conf/flume.conf) and add the following configuration:

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Describe the sink
#a1.sinks.k1.type = logger
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://sandbox.hortonworks.com:8020/user/test/temp/
a1.sinks.k1.hdfs.fileType = DataStream

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
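
Every source and sink has to be bound to a channel that is listed in a1.channels, so a quick check before starting the agent can catch a mistyped channel name. The following is only a sketch: check_channels is a hypothetical helper, not part of Flume, and it assumes the simple one-property-per-line layout used above.

```shell
# check_channels CONF AGENT
# Report source/sink bindings that point at a channel not declared in
# "<agent>.channels". Hypothetical helper, not a Flume tool.
check_channels() {
  conf="$1"
  agent="$2"
  # Channels declared on the "<agent>.channels = ..." line
  declared=$(sed -n "s/^$agent\.channels[[:space:]]*=[[:space:]]*//p" "$conf")
  # Channels referenced by sources (".channels") and sinks (".channel")
  used=$(sed -n \
    -e "s/^$agent\.sources\.[^.]*\.channels[[:space:]]*=[[:space:]]*//p" \
    -e "s/^$agent\.sinks\.[^.]*\.channel[[:space:]]*=[[:space:]]*//p" "$conf")
  rc=0
  for ch in $used; do
    case " $declared " in
      *" $ch "*) ;;                                  # binding is fine
      *) echo "undeclared channel: $ch"; rc=1 ;;
    esac
  done
  return $rc
}
```

For the file above, check_channels /etc/flume/conf/flume.conf a1 should print nothing and return 0.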

3. Start the agent using the line below. To keep it running after you log out, run it in the background using nohup and &.

flume-ng agent --conf conf --conf-file /etc/flume/conf/flume.conf --name a1
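
To keep the agent running after the terminal closes, the same command can be wrapped in nohup and sent to the background, as step 3 suggests. This is a minimal sketch: the start_agent name and the /tmp/flume-a1.log log location are illustrative choices, not Flume defaults.

```shell
# Illustrative log location; change it to any writable path.
FLUME_LOG="${FLUME_LOG:-/tmp/flume-a1.log}"

# Start agent a1 in the background and print its process ID.
start_agent() {
  nohup flume-ng agent --conf conf \
    --conf-file /etc/flume/conf/flume.conf --name a1 \
    > "$FLUME_LOG" 2>&1 &
  echo "$!"
}
```

Calling start_agent prints the PID of the background process; tail -f "$FLUME_LOG" then shows the startup messages that would otherwise appear on the terminal.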

When the agent starts, the last messages will look something like this, showing that the source, channel and sink have started.

16/02/04 12:56:56 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: c1 started
16/02/04 12:56:56 INFO node.Application: Starting Sink k1
16/02/04 12:56:56 INFO node.Application: Starting Source r1
16/02/04 12:56:56 INFO source.NetcatSource: Source starting
16/02/04 12:56:57 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SINK, name: k1: Successfully registered new MBean.
16/02/04 12:56:57 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: k1 started
16/02/04 12:56:57 INFO source.NetcatSource: Created serverSocket:sun.nio.ch.ServerSocketChannelImpl[/127.0.0.1:44444]

4. Test the agent

4.1 Check that the Flume process is running with the agent name started in step 3

ps -ef|grep flume
flume    26401 17895 12 12:56 pts/1    00:00:19 /usr/lib/jvm/java-1.7.0-openjdk.x86_64/bin/java -Xmx20m -cp conf:/usr/hdp/2.3.0.0-2130/flume/lib/*:/usr/hdp/2.3.0.0-2130/hadoop/conf:/usr/hdp/2.3.0.0-2130/hadoop/lib/activation-1.1.jar:/usr/hdp/2.3.0.0-2130/hadoop/lib/apacheds-i18n-2.0.0-M15.jar:…………..very lengthy output

4.2 The netcat source is configured to listen on port 44444; verify the port's listen status

A LISTEN status indicates that the source started properly.

netstat -na|grep 44444
tcp        0      0 ::ffff:127.0.0.1:44444      :::*                        LISTEN   

4.3 Now run a telnet test to verify the source, channel and sink together: when you telnet to the port and type a message, it should pass through the memory channel to HDFS and create a file there, since HDFS is configured as the sink.

Terminal 1:

telnet localhost 44444
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
how are you doing. we are testing basic flume configuration
OK

Quit by pressing Ctrl+] and then typing quit at the telnet> prompt.
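
The same test can be run without an interactive telnet session by piping a line into nc (netcat). This is a sketch that assumes nc is installed on the sandbox; send_event is a hypothetical helper name, and the 5-second -w timeout is an arbitrary choice. How nc behaves after its input ends differs between nc variants, so the reply may not always be printed.

```shell
# Send one line to the Flume netcat source and print whatever it replies.
# Assumes the source from flume.conf is listening on localhost:44444.
send_event() {
  printf '%s\n' "$1" | nc -w 5 localhost 44444
}
```

For example, send_event "how are you doing. we are testing basic flume configuration" should come back with the same OK acknowledgement seen in the telnet session.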

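Terminal 2: The terminal where you started the Flume agent will show this output once the sink closes the file (with the default hdfs.rollInterval, about 30 seconds after the file is opened).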

16/02/04 13:04:07 INFO hdfs.BucketWriter: Closing hdfs://sandbox.hortonworks.com:8020/user/test/temp//FlumeData.1454591015290.tmp
16/02/04 13:04:07 INFO hdfs.BucketWriter: Renaming hdfs://sandbox.hortonworks.com:8020/user/test/temp/FlumeData.1454591015290.tmp to hdfs://sandbox.hortonworks.com:8020/user/test/temp/FlumeData.1454591015290
16/02/04 13:04:07 INFO hdfs.HDFSEventSink: Writer callback called.

Terminal 1: After quitting telnet, cat the HDFS file to see its content.

hdfs dfs -cat hdfs://sandbox.hortonworks.com:8020/user/test/temp/FlumeData.1454591015290
how are you doing. we are testing basic flume configuration
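
Because the file name contains a timestamp that changes on every run, it can be easier to list the sink directory first and then print everything Flume has written there. The sketch below uses the standard hdfs dfs -ls and -cat commands; show_flume_output is a hypothetical helper, and the FlumeData prefix is taken from the log output above.

```shell
# List the HDFS sink directory, then print the contents of all FlumeData files.
show_flume_output() {
  dir="${1:-hdfs://sandbox.hortonworks.com:8020/user/test/temp}"
  # The glob is quoted so that HDFS, not the local shell, expands it.
  hdfs dfs -ls "$dir" &&
    hdfs dfs -cat "$dir/FlumeData.*"
}
```

Note that the glob also matches files still being written (those with a .tmp suffix), so the output may include events from a file the sink has not yet closed.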