Map Reduce Program explanation with a example java program:

M S Dillibabu
4 min readMar 3, 2020

--

Lets take the example of word count program. Below program will find the number of each distinct words in file.txt(500MB size) using map reduce concept. We will be dealing with three java files. They are driver code, mapper code, reducer code. Before going into the program lets have discussion about Box classes which is used in allmmapreduce program.All key value pairs used in mapper and reducer class should be in objective type. In order to overcome that Box classes came into existence.Likewise, in java collection frameworks, we will be using the wrapper classes instead of primitive data types example map interface has key value pair in wrapper classes.

We can convert the primitive types into Box classes by — new IntWritable(int), new LongWritable(long) etc. We can convert Box classes into primitve types also by — get() method and for Text Box class we will be using toString() method to convert into primitive type as you can find above image

As we discussed earlier file.txt is splitted into four blocks of each block has 128 MB and last block has 116 MB of data(total 500 MB). Before going into mapper class, record reader need to input the key value pair to mapper class. We will not be writing any code for recordreader, it will automatically change the data into (byteoffset, entire line) format. This key value pair will be in TextInputFormat only by default. If we want to input the key value pair in KeyValueTextInputFormat or any other format listed earlier we need to write explicitly in driver code. Lets deal with Mapper class

Flow chart

Lets deal with mapper code.

we will be extending the MapReduceBase class inorder to pass the data from mapper to reducer class and implementing mapper interface. As i said, the mapper class will be inputted with (byteoffset, entire line ) by record reader, this will be in (LongWritable, Text) format. example (0, hi how are you) -> (LongWritable, Text) format and the output key value pairs from mapper class will be in (Text, IntWritable)-> As per Flow chart image the output of mapper class is (hi, 1) which is obviously (Text, IntWritable). So the mapper interface is given as Mapper<LongWritable,Text,Text,IntWritable>. map method is implemented because of mapper interface implemented, it contains (input key, input value, output collector and reporter parameters). Output Collector is used to collect the output from mapper class and take it to input for reducer class. Reporter is used to report any issues while performing map method and it will report to driver code. Next we need to process each word in a line and assign it as count 1, for that operation we need to change the Text box class type to String primitve types. Since methods will work on primitive types only. so the value.toString() is used.Now the OutputCollector will collect each word count as 1 and input to reducer code.

Reducer code:

Here we will be implementing the reducer interface and the reduce method is defined. It has Iterator<IntWritable> values as a parameter — since we need to count the each word occurance, it will be used to process every word. Here values are in box class type so it needs to be changed to primitive types by get() method to sum and atlast it is collected through OutputCollector.

Driver Code:-

Steps to execute the above program:-

  • Create a text file in your local machine and write some text into it.
    $ nano file.txt
  • Create a directory in HDFS, where to keep this file.
    $ hdfs dfs -mkdir /test
  • Upload the file.txt file on HDFS in the specific directory.
    $ hdfs dfs -put /home/msd/file.txt /test
  • Create the jar file of this 3 program (mapper,reducer,driver code) and name it countworddemo.jar.
  • Run the jar file
    hadoop jar /home/msd/wordcountdemo.jar WC_Runner /test/file.txt /r_output
  • The output is stored in /r_output/part-00000

--

--

No responses yet