Lab session 1

Due Tuesday, 1 October 2019, 11:59 PM
Late submissions: Only allowed for participants who have been granted an extension
Word adjacency networks

First steps with Hadoop.



I have updated Template.zip:

  - fixed typos

  - added a comment describing how to use multiple inputs

  - added a pattern to parse movies.csv

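        // Matches one line of movies.csv: numeric id, title (plain or double-quoted), remaining fields.
        private static final Pattern p = Pattern.compile("^([0-9]+),([^,\"]+|\"[^\"]+\"),(.+)$");

        public void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
            Matcher m = p.matcher(line.toString());
            // Lines that do not start with a numeric id (such as a header line) do not match and are skipped.
            if (m.find()) {
                String movieId = m.group(1);
                // For a quoted title, group(2) still includes the surrounding double quotes.
                String movieName = m.group(2);
                // Do something here
            }
        }
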
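To see what the pattern captures, it can help to try it outside Hadoop first. The following is a minimal sketch in plain Java; the class name MovieLineParser, the helper parse and the sample lines are assumptions made for illustration and are not part of Template.zip.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MovieLineParser {
    // Same pattern as in the template.
    private static final Pattern P =
        Pattern.compile("^([0-9]+),([^,\"]+|\"[^\"]+\"),(.+)$");

    // Hypothetical helper: returns {id, title, rest}, or null if the line does not match.
    static String[] parse(String line) {
        Matcher m = P.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        // Illustrative sample lines (assumed format: id,title,genres).
        String[] samples = {
            "movieId,title,genres",
            "1,Toy Story (1995),Adventure|Animation",
            "11,\"American President, The (1995)\",Comedy|Drama"
        };
        for (String s : samples) {
            String[] fields = parse(s);
            if (fields == null) {
                System.out.println("no match: " + s);
            } else {
                System.out.println(fields[0] + " -> " + fields[1]);
            }
        }
    }
}
```

Note that a quoted title is returned with its double quotes, so you may want to strip them before emitting the name.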
How to define a job with multiple inputs:

        // MultipleInputs and TextInputFormat come from org.apache.hadoop.mapreduce.lib.input.
        Job job5 = Job.getInstance(getConf(), "MovieRecommendation 5");
        job5.setJarByClass(this.getClass());
        // Each input path gets its own mapper class; both mappers feed the same reducer.
        MultipleInputs.addInputPath(job5, new Path(args[1] + ".t4"),
                                    TextInputFormat.class, NameMovie.Map1.class);
        MultipleInputs.addInputPath(job5, new Path(args[0] + "/movies.csv"),
                                    TextInputFormat.class, NameMovie.Map2.class);
        FileOutputFormat.setOutputPath(job5, new Path(args[1] + ".t5"));
        job5.setReducerClass(NameMovie.Reduce.class);
        job5.setOutputKeyClass(Text.class);
        job5.setOutputValueClass(Text.class);
        if (!job5.waitForCompletion(true))
            return 1;
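
With two mappers feeding one reducer, the reducer receives values from both inputs under the same key and has to tell them apart. One common way to do this is to have each mapper prefix its values with a tag. The sketch below simulates that idea in plain Java without Hadoop; the tags "N:" and "R:", the class ReduceSideJoinSketch and the sample values are all hypothetical, and the template's NameMovie classes may work differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ReduceSideJoinSketch {
    // Hypothetical tags: one per input, so the reducer can tell the sources apart.
    static final String NAME_TAG = "N:"; // value comes from movies.csv
    static final String REC_TAG = "R:";  // value comes from the other input

    // Stand-in for the shuffle phase: groups tagged values by key.
    static void emit(Map<String, List<String>> shuffle, String key, String value) {
        shuffle.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    // Stand-in for reduce(): pairs the movie name with every other record of the same id.
    static List<String> reduce(String movieId, List<String> values) {
        String name = null;
        List<String> records = new ArrayList<>();
        for (String v : values) {
            if (v.startsWith(NAME_TAG)) {
                name = v.substring(NAME_TAG.length());
            } else if (v.startsWith(REC_TAG)) {
                records.add(v.substring(REC_TAG.length()));
            }
        }
        List<String> out = new ArrayList<>();
        if (name == null) {
            return out; // no name for this id: nothing to join
        }
        for (String r : records) {
            out.add(name + "\t" + r);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> shuffle = new TreeMap<>();
        emit(shuffle, "1", REC_TAG + "0.9");               // as if emitted by the first mapper
        emit(shuffle, "1", NAME_TAG + "Toy Story (1995)"); // as if emitted by the second mapper
        emit(shuffle, "2", REC_TAG + "0.5");               // id without a name
        for (Map.Entry<String, List<String>> e : shuffle.entrySet()) {
            for (String line : reduce(e.getKey(), e.getValue())) {
                System.out.println(line);
            }
        }
    }
}
```

Values for a key arrive in no guaranteed order, which is why the sketch collects all records before pairing them with the name.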