Hadoop – Mapper & Reducer (Mar '16)


Data file & Description

For this project, I used a large enrollment data set from my university. The CSV file has over 660,000 rows, far too many to look over by hand. So I used a Hadoop mapper and reducer to pull out just the data I needed. The columns are:

  • A: Semester ID
  • B: Semester
  • C: Location
  • D: Days of the week
  • E: Time
  • F: Course ID
  • G: Course Name
  • H: Actual Enrollment
  • I: Max Enrollment
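
As a small illustration of the column layout above, the sketch below names the nine columns so a row can be read by name instead of by index. The `Columns` class, the enum constant names, and the sample row are all made up for this example; they are not part of the original project.

```java
/** Sketch only: names for the nine CSV columns A..I, in file order. */
class Columns {
    enum Column {
        SEMESTER_ID, SEMESTER, LOCATION, DAYS_OF_WEEK, TIME,
        COURSE, COURSE_NAME, ACTUAL_ENROLLMENT, MAX_ENROLLMENT;

        /** Spreadsheet-style letter: A for the first column, I for the last. */
        char letter() {
            return (char) ('A' + ordinal());
        }

        /** Reads this column from a row that was already split on commas. */
        String of(String[] row) {
            return row[ordinal()];
        }
    }

    public static void main(String[] args) {
        // Invented sample row, not taken from the real enrollment file.
        String[] row = "1,Fall,Hall 101,MWF,9:00,C1,Intro,35,40".split(",");
        for (Column c : Column.values()) {
            System.out.println(c.letter() + " " + c + " = " + c.of(row));
        }
    }
}
```

A plain `split(",")` like this assumes no field contains a comma of its own; a quoted course name with a comma in it would shift the later columns.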


From this CSV, I produced records of the form [Location(w/o room#)_Semester  ActualEnrolled].

For each row, the mapper takes the location (without the room number) and the actual enrollment count. It also filters out unusable rows, since some rows have an Unknown or Arr value.

The reducer then adds up the enrollment counts of all records for the same building.
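
The per-row logic of the mapper can be tried out without Hadoop. The following is a minimal sketch under a few assumptions: the class name `RowToPair`, the method `toPair`, and the sample rows are invented for illustration, and the key is assumed to be the building name joined to the semester with an underscore, as in the record format above.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;
import java.util.Optional;

/** Sketch of the mapper's per-row logic, with no Hadoop dependency. */
class RowToPair {
    // Zero-based positions of columns B, C and H from the column list.
    static final int SEMESTER = 1;
    static final int LOCATION = 2;
    static final int ACTUAL_ENROLLMENT = 7;

    /** Returns (building_semester, enrollment), or empty if the row is filtered out. */
    static Optional<Map.Entry<String, Integer>> toPair(String line) {
        String[] cols = line.split(",");
        if (cols.length <= ACTUAL_ENROLLMENT) {
            return Optional.empty();              // too few columns
        }
        String semester = cols[SEMESTER].trim();
        // Keep only the first token of the location, which drops the room number.
        String building = cols[LOCATION].trim().split(" ")[0];
        if (semester.equals("Unknown") || building.equals("Unknown") || building.equals("Arr")) {
            return Optional.empty();              // unusable semester or building
        }
        int enrolled;
        try {
            enrolled = Integer.parseInt(cols[ACTUAL_ENROLLMENT].trim());
        } catch (NumberFormatException e) {
            return Optional.empty();              // enrollment is not a number
        }
        return Optional.of(new SimpleEntry<>(building + "_" + semester, enrolled));
    }

    public static void main(String[] args) {
        // Both rows are invented; they are not from the real data set.
        System.out.println(toPair("1,Fall,Hall 101,MWF,9:00,C1,Intro,35,40"));
        System.out.println(toPair("2,Fall,Arr,MWF,9:00,C2,Lab,12,20"));
    }
}
```

Returning an empty `Optional` for a filtered row mirrors a mapper that simply emits nothing for that input line.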
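
The summing step can likewise be sketched in plain Java. This is only an illustration of what the reducer computes, not how Hadoop schedules it: `SumByKey` and the sample pairs are invented, and an in-memory map stands in for Hadoop's grouping of values by key.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Sketch of the reducer's job: add up all values that share a key. */
class SumByKey {
    static Map<String, Integer> sumByKey(List<Map.Entry<String, Integer>> pairs) {
        // TreeMap keeps the keys sorted, so the printout is deterministic.
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            // Start at the pair's value, or add it to the running total.
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Invented (key, enrollment) pairs, as a mapper might emit them.
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        pairs.add(new SimpleEntry<>("Hall_Fall", 35));
        pairs.add(new SimpleEntry<>("Lab_Fall", 12));
        pairs.add(new SimpleEntry<>("Hall_Fall", 20));
        System.out.println(sumByKey(pairs));
    }
}
```

Because addition is associative and commutative, the same summing logic can also run as a combiner on partial results before the final reduce.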

Actual code


/*
 * Chang Min Park
 *
 * This code filters out rows whose semester name is "Unknown",
 * rows whose building name is "Arr" or "Unknown" (the room number doesn't matter),
 * and rows whose enrolled number is not an actual number.
 */


import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
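public class WordCount {
   // Returns true if the string parses as an integer.
   public static boolean isNumber(String string){
      try{
         Integer.parseInt(string);
      }
      catch(Exception e){
         return false;
      }
      return true;
   }

   public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable>{

      private Text word = new Text ();

      @Override
      public void map (Object key, Text value, Context context
         )throws IOException, InterruptedException{
         String[] str = value.toString().split(",");
         // Skip rows too short to hold column H (Actual Enrollment).
         if(str.length < 8){
            return;
         }
         // Column C is the location; its first token is the building name.
         String[] building = str[2].trim().split(" ");

         // Assumed filter, following the description above: skip an "Unknown"
         // semester, an "Arr" or "Unknown" building, or a non-numeric enrollment.
         if(str[1].equals("Unknown") || building[0].equals("Arr")
               || building[0].equals("Unknown") || !isNumber(str[7])){
            return;
         } else{
            // Assumed key format: building_semester.
            word.set(building[0] + "_" + str[1]);
            context.write(word, new IntWritable(Integer.parseInt(str[7])));
         }
      }
   }

   public static class IntSumReducer
      extends Reducer <Text, IntWritable, Text, IntWritable> {
      private IntWritable result = new IntWritable();

      @Override
      public void reduce (Text key, Iterable<IntWritable> values,
         Context context) throws IOException, InterruptedException {
         int sum = 0;
         for(IntWritable val : values) {
            sum += val.get();
         }
         result.set(sum);
         context.write(key, result);
      }
   }

   public static void main (String[] args) throws Exception {
      Configuration conf = new Configuration();
      Job job = Job.getInstance(conf, "word count");
      // Job wiring assumed to follow the stock Hadoop WordCount driver.
      job.setJarByClass(WordCount.class);
      job.setMapperClass(TokenizerMapper.class);
      job.setCombinerClass(IntSumReducer.class);
      job.setReducerClass(IntSumReducer.class);
      job.setOutputKeyClass(Text.class);
      job.setOutputValueClass(IntWritable.class);
      FileInputFormat.addInputPath(job, new Path(args [0]));
      FileOutputFormat.setOutputPath(job, new Path(args [1]));
      System.exit(job.waitForCompletion(true) ? 0 : 1 );
   }
}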


The result lists each location's total actual enrollment, summed over all years.
This can help university technicians a lot when assigning lecture halls.