Lab session 1

Due Tuesday, 1 October 2019, 11:59 PM
Late submissions: Only allowed for participants who have been granted an extension
Word adjacency networks

First steps with Hadoop.

I have updated the

  - fixed typos

  - included a comment to describe how to use multiple inputs 

  - gave a pattern to parse movies.csv

        // Groups: 1 = movie id, 2 = title (double-quoted when it contains commas), 3 = remaining fields
        private static final Pattern p = Pattern.compile("^([0-9]+),([^,\"]+|\"[^\"]+\"),(.+)$");

        public void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
            Matcher m = p.matcher(line.toString());
            if (m.find()) {
                String movieId = m.group(1);
                String movieName = m.group(2);
                // Do something here
            }
        }
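To check how the pattern behaves outside Hadoop, a minimal standalone sketch such as the following may help. The class name `MovieLineParser`, the helper `parse`, and the sample lines are illustrative assumptions, not part of the lab skeleton. The sample lines assume a header row and a quoted title, which may differ from the actual movies.csv.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, used only to try the pattern on a few lines.
public class MovieLineParser {
    // Same pattern as in the lab text: id, title (bare or double-quoted), remaining fields.
    private static final Pattern P =
        Pattern.compile("^([0-9]+),([^,\"]+|\"[^\"]+\"),(.+)$");

    /** Returns {movieId, title, rest}, or empty when the line does not match. */
    public static Optional<String[]> parse(String line) {
        Matcher m = P.matcher(line);
        if (!m.find()) {
            return Optional.empty();
        }
        String title = m.group(2);
        // Quoted titles may contain commas; drop the surrounding quotes.
        if (title.length() >= 2 && title.startsWith("\"") && title.endsWith("\"")) {
            title = title.substring(1, title.length() - 1);
        }
        return Optional.of(new String[] { m.group(1), title, m.group(3) });
    }

    public static void main(String[] args) {
        // Assumed sample lines: a header, a plain title, a quoted title.
        String[] samples = {
            "movieId,title,genres",
            "1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy",
            "11,\"American President, The (1995)\",Comedy|Drama|Romance"
        };
        for (String s : samples) {
            System.out.println(
                parse(s).map(f -> String.join(" | ", f)).orElse("(skipped) " + s));
        }
    }
}
```

A line that does not start with a numeric id, such as a header row, is simply skipped. This is why the mapper above only acts inside the `if (m.find())` branch.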

How to define a job with multiple inputs:

        Job job5 = Job.getInstance(getConf(), "MovieRecommendation 5");
        // Each input path is read by its own mapper class
        MultipleInputs.addInputPath(job5, new Path(args[1] + ".t4"),
                                    TextInputFormat.class, NameMovie.Map1.class);
        MultipleInputs.addInputPath(job5, new Path(args[0] + "/movies.csv"),
                                    TextInputFormat.class, NameMovie.Map2.class);
        FileOutputFormat.setOutputPath(job5, new Path(args[1] + ".t5"));
        // Run the job; a non-zero return value signals failure
        if (!job5.waitForCompletion(true))
            return 1;
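With two mappers feeding one job, a common approach is to have each mapper tag its output so that the reducer can tell the two sources apart. The sketch below simulates that idea in plain Java, without Hadoop. It is a guess at one possible design, not the expected solution. The tags `N` and `V`, the class `TaggedJoinSketch`, and the sample records are all hypothetical. It assumes that both mappers use the movie id as the key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of a reduce-side join; no Hadoop classes are used.
public class TaggedJoinSketch {

    /**
     * Stand-in for a reducer: receives every tagged value emitted for one movie id.
     * "N\t..." is assumed to carry the movie name, "V\t..." any other payload.
     */
    public static List<String> reduce(String movieId, List<String> taggedValues) {
        String name = null;
        List<String> payloads = new ArrayList<>();
        for (String v : taggedValues) {
            if (v.startsWith("N\t")) {
                name = v.substring(2);
            } else if (v.startsWith("V\t")) {
                payloads.add(v.substring(2));
            }
        }
        List<String> out = new ArrayList<>();
        if (name != null) {
            // Replace the id by the name in every payload record.
            for (String p : payloads) {
                out.add(name + "\t" + p);
            }
        }
        return out; // ids without a name record produce nothing in this sketch
    }

    // Stand-in for the shuffle phase: group emitted values by key.
    private static void emit(Map<String, List<String>> shuffled, String key, String value) {
        shuffled.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    public static void main(String[] args) {
        Map<String, List<String>> shuffled = new TreeMap<>();
        emit(shuffled, "1", "V\t4.5");              // as a Map1-like mapper might emit
        emit(shuffled, "1", "N\tToy Story (1995)"); // as a Map2-like mapper might emit
        emit(shuffled, "2", "V\t3.0");              // no name record for this id
        shuffled.forEach((id, values) -> reduce(id, values).forEach(System.out::println));
    }
}
```

The reducer cannot rely on the order in which values arrive, so this sketch buffers the payloads until it has seen all values for the key.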