Apache Flume | Configuration File Creation - Datacloudy

 

Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of streaming event data.

It collects data flows from their sources and aggregates them at the location where they need to be processed.

It was originally designed to handle only log data, but it was later extended to process event data in general.

The three main components in a Flume flow are:

  • Source
  • Channel
  • Sink
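The basic task when working with Flume is creating the Flume configuration file, which is what we will look at here.
Before that, let us see how data flows through Flume.

[Diagram: data flow from the streaming source through the Flume source and channel to the sink]
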
From the diagram, we can see that data arrives at the Flume source from a streaming source through an API connection. The data then moves to the channel, and finally it reaches the sink.

The channel is the first point in the system where the data is stored. We can think of it like RAM in a computer: a temporary buffer. Because the data is held in the channel for some time before it is delivered, the transfer is near real time rather than real time.

The sink delivers the data to its final destination, which in this post is HDFS.

Now let us come to creating the Flume configuration file. There are six points to consider.

Six Steps of Flume Configuration file creation:

  1. The first step is to decide the name of the agent.

  2. The second step is to set up the source.

  3. The third step is to set up the channel.

  4. The fourth step is to set up the sink.

  5. The fifth step is to connect the source to the channel and the sink to the channel.

  6. The sixth and final step is to execute the configuration.

The advice here is that there is no need to memorize everything; always refer to the Flume documentation when creating a configuration file.

Let us see an example.
        The use case here is to grab real-time data from a Netcat source and store it in HDFS.

Step 1:

    To create the config file, we can use the vi command:
vi configfilename.conf


Step 2:

We need to write the configuration below; the property names can be looked up in the Flume manual.

  # Name the three components:

AgentName.sources=getFromNetcat
AgentName.channels=goToRAM
AgentName.sinks=writetoHDFS

  # Configure the source:

AgentName.sources.getFromNetcat.type=netcat
AgentName.sources.getFromNetcat.bind={Ipaddress}
AgentName.sources.getFromNetcat.port={portnumber}

  # Configure the channel:

AgentName.channels.goToRAM.type=memory

  # Configure the sink:

AgentName.sinks.writetoHDFS.type=hdfs
AgentName.sinks.writetoHDFS.hdfs.path={path}
AgentName.sinks.writetoHDFS.hdfs.writeFormat=Text
AgentName.sinks.writetoHDFS.hdfs.fileType=DataStream

  # Connect the source to the channel (note the plural "channels"):

AgentName.sources.getFromNetcat.channels=goToRAM

  # Connect the sink to the channel (note the singular "channel"):

AgentName.sinks.writetoHDFS.channel=goToRAM


Note that in the above example, the values shown in { } are placeholders that must be supplied by the user.
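
As an illustration, the template can be filled in and written to a file in one step with a shell here-document. This is only a sketch: the bind address `localhost`, the port `44444`, the HDFS path `/user/flume/netcat_events`, the memory channel capacity values, and the file name `netcat-to-hdfs.conf` are all assumed values chosen for the example, not values from this post.

```shell
#!/bin/sh
# Sketch: write a filled-in version of the config to a file.
# All concrete values below (address, port, HDFS path, capacities,
# file name) are assumptions for illustration; replace them with your own.
CONF_FILE="netcat-to-hdfs.conf"

cat > "$CONF_FILE" <<'EOF'
# Name the three components
AgentName.sources=getFromNetcat
AgentName.channels=goToRAM
AgentName.sinks=writetoHDFS

# Source: listen for lines of text on a TCP port
AgentName.sources.getFromNetcat.type=netcat
AgentName.sources.getFromNetcat.bind=localhost
AgentName.sources.getFromNetcat.port=44444

# Channel: buffer events in memory (capacity values are assumed)
AgentName.channels.goToRAM.type=memory
AgentName.channels.goToRAM.capacity=1000
AgentName.channels.goToRAM.transactionCapacity=100

# Sink: write events to HDFS as plain text
AgentName.sinks.writetoHDFS.type=hdfs
AgentName.sinks.writetoHDFS.hdfs.path=/user/flume/netcat_events
AgentName.sinks.writetoHDFS.hdfs.writeFormat=Text
AgentName.sinks.writetoHDFS.hdfs.fileType=DataStream

# Wiring: a source takes "channels" (plural), a sink takes "channel"
AgentName.sources.getFromNetcat.channels=goToRAM
AgentName.sinks.writetoHDFS.channel=goToRAM
EOF

echo "Wrote $CONF_FILE"
```

Property keys in a Flume configuration are case-sensitive, so `sources`, `channels`, and `sinks` are written in lowercase, and the channel name must be spelled identically everywhere it appears.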

Step 3:

We have created the config file as shown above. Now we need to run it with the command below:

flume-ng agent --name AgentName -f configfilename.conf    
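
The same command can also be written with the long option names, which makes the role of each argument easier to see. The sketch below only builds the command and runs it if `flume-ng` is found on the `PATH`; the variable names are hypothetical, and the agent name must match the prefix used in the config file.

```shell
#!/bin/sh
# Sketch: start the agent using the long forms of the options.
# AGENT_NAME must match the prefix used for every key in the config file.
AGENT_NAME="AgentName"
CONF_FILE="configfilename.conf"

# --name is the long form of -n, --conf-file is the long form of -f.
FLUME_CMD="flume-ng agent --name $AGENT_NAME --conf-file $CONF_FILE"

if command -v flume-ng >/dev/null 2>&1; then
    # Runs in the foreground until interrupted (for example with Ctrl+C).
    $FLUME_CMD
else
    echo "flume-ng not found on PATH; would run: $FLUME_CMD"
fi
```

Once the agent is running, one way to try it out, assuming a netcat client such as `nc` is installed and the source was bound to `localhost` on port `44444` (both assumed values), is to run `nc localhost 44444` in another terminal and type a few lines. The Netcat source turns each line of text into one Flume event.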


Thus, we have seen an overview of Flume and a simple example of creating a Flume configuration file. Hope it is helpful. You can follow this page by clicking the follow button at the top right of the screen, or follow us on social media via the plugin above the follow button.


Thank You !!! 

    
