Map-Reduce

File transfer to the VM over SFTP (already copied in cs4417-lab-5)

sftp -i cs4417-lab-5.pem [email protected]

Transfer the input files first (already copied in cs4417-lab-5)

put starbucks-locations-sort.csv

put movies.dat

Transfer the code files

put Part1.zip

put Part2.zip

put Part3.zip

exit

Log in to the VM

ssh -i cs4417-lab-5.pem [email protected]

Create the input directory in HDFS (already created in cs4417-lab-5)

hadoop fs -mkdir /user/cloudera/inputAssignment1

Create sub-directories for the input files

hadoop fs -mkdir /user/cloudera/inputAssignment1/Starbucks

hadoop fs -mkdir /user/cloudera/inputAssignment1/Movies

Copy the input files to HDFS (already copied in cs4417-lab-5)

hadoop fs -copyFromLocal starbucks-locations-sort.csv /user/cloudera/inputAssignment1/Starbucks/starbucks-locations-sort.csv

hadoop fs -copyFromLocal movies.dat /user/cloudera/inputAssignment1/Movies/movies.dat

List the files in HDFS

hadoop fs -ls /user/cloudera/inputAssignment1/Starbucks/

hadoop fs -ls /user/cloudera/inputAssignment1/Movies/

Unzip all the code files

unzip Part1.zip

unzip Part2.zip

unzip Part3.zip

Execute Part1

cd Part1

Run the Hadoop Streaming job for Part1

hadoop jar /usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.6.0-mr1-cdh5.12.0.jar -mapper mapper.py -file mapper.py -reducer reducer.py -file reducer.py -input /user/cloudera/inputAssignment1/Starbucks -output /user/cloudera/outputAssignment1/Starbucks
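
The Part1 mapper.py and reducer.py are not shown in this README. As a rough sketch only, assuming Part1 counts Starbucks locations per city (which would fit the cityInformation output name), a Hadoop Streaming job of this shape could look like the Python below. CITY_COLUMN and the function names mapper, reducer and simulate are hypothetical; simulate mimics the framework's sort between the map and reduce phases so the logic can be tried without a cluster.

```python
import csv
import io
import sys
from itertools import groupby

# HYPOTHETICAL: zero-based position of the city field in
# starbucks-locations-sort.csv. Check the real header before relying on it.
CITY_COLUMN = 2


def mapper(lines):
    """Map side: emit 'city<TAB>1' for each CSV record that has a city."""
    for row in csv.reader(lines):
        if len(row) > CITY_COLUMN and row[CITY_COLUMN].strip():
            yield f"{row[CITY_COLUMN].strip()}\t1"


def reducer(sorted_lines):
    """Reduce side: sum the counts for each city.

    Hadoop Streaming sorts mapper output by key before the reducer sees it,
    so all lines for one city arrive consecutively and groupby() can fold them.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for city, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{city}\t{sum(int(count) for _, count in group)}"


def simulate(text):
    """Run map -> sort -> reduce locally, standing in for the cluster's shuffle."""
    return list(reducer(sorted(mapper(io.StringIO(text)))))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: 'city_count.py map' or 'city_count.py reduce',
    # reading lines from stdin and writing key/value lines to stdout.
    stage = mapper if sys.argv[1] == "map" else reducer
    for out in stage(sys.stdin):
        print(out)
```

Grouping with itertools.groupby depends on the sorted input; fed unsorted lines, the same reducer would emit a city more than once with partial counts.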

Display Reducer output

hadoop fs -cat /user/cloudera/outputAssignment1/Starbucks/*

Merge the reducer output into a local file (required)

hadoop fs -getmerge /user/cloudera/outputAssignment1/Starbucks/* cityInformation

Execute the query (reads the cityInformation file)

python query.py
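
query.py is not reproduced here. A minimal, hypothetical sketch of reading a getmerge'd file, assuming each line is the reducer's default key<TAB>value output and that the query looks up a city by name (parse_key_values, load_key_values and lookup are illustrative names, not the project's):

```python
def parse_key_values(lines):
    """Parse 'key<TAB>value' lines (Hadoop Streaming's default output layout).

    Keys are lower-cased so lookups are case-insensitive; lines without a
    tab are skipped.
    """
    table = {}
    for line in lines:
        line = line.rstrip("\n")
        if "\t" in line:
            key, value = line.split("\t", 1)
            table[key.strip().lower()] = value.strip()
    return table


def load_key_values(path):
    """Read a getmerge'd file such as cityInformation from the local disk."""
    with open(path, encoding="utf-8") as handle:
        return parse_key_values(handle)


def lookup(table, city):
    """Return the stored value for a city, or None if it is not present."""
    return table.get(city.strip().lower())
```

For example, lookup(load_key_values("cityInformation"), "Toronto") would return the value stored for Toronto as a string, or None if that city never appeared in the reducer output.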

Remove the output folder (if needed; Hadoop will not start a job whose output directory already exists)

hadoop fs -rm /user/cloudera/outputAssignment1/Starbucks/*

hadoop fs -rmdir /user/cloudera/outputAssignment1/Starbucks

Execute Part2

cd ..

cd Part2

Execute the indexer (reads the input file from the /home/cloudera/ path)

python indexer.py

Execute the query

python query.py
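
The indexer itself is not shown. One plausible reading, sketched under the assumption that movies.dat uses the MovieLens layout MovieID::Title::Genres and that the index maps title words to movie IDs, is an in-memory inverted index queried with AND semantics. build_index, query and the tokenizing regex are hypothetical:

```python
import re
from collections import defaultdict

# HYPOTHETICAL tokenizer: lowercase runs of letters and digits.
TOKEN = re.compile(r"[a-z0-9]+")


def build_index(lines):
    """Map each lowercase title word to the set of movie IDs containing it.

    ASSUMPTION: each line looks like 'MovieID::Title::Genres'.
    """
    index = defaultdict(set)
    for line in lines:
        parts = line.rstrip("\n").split("::")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        movie_id, title = parts[0], parts[1]
        for word in TOKEN.findall(title.lower()):
            index[word].add(movie_id)
    return index


def query(index, text):
    """Return IDs of movies whose titles contain every query word (AND)."""
    words = TOKEN.findall(text.lower())
    if not words:
        return set()
    # dict.get does not create entries, so unknown words stay out of the index.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```

Because build_index only iterates over lines, an open file object for /home/cloudera/movies.dat could be passed in directly; the file's encoding is not stated in this README.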

Execute Part3

cd ..

cd Part3

Run the Hadoop Streaming job for Part3

hadoop jar /usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.6.0-mr1-cdh5.12.0.jar -mapper mapper.py -file mapper.py -reducer reducer.py -file reducer.py -input /user/cloudera/inputAssignment1/Movies -output /user/cloudera/outputAssignment1/Movies
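
Part3 builds an index with MapReduce instead of locally. A hypothetical Hadoop Streaming version, assuming MovieID::Title::Genres input, emits one word<TAB>movieID pair per distinct title word and lets the reducer join the IDs for each word. The function names and the comma-separated output layout are illustrative, not taken from the Part3 code:

```python
import re
import sys
from itertools import groupby

# HYPOTHETICAL tokenizer: lowercase runs of letters and digits.
TOKEN = re.compile(r"[a-z0-9]+")


def mapper(lines):
    """Emit 'word<TAB>movieID' once per distinct word in each title.

    ASSUMPTION: input lines look like 'MovieID::Title::Genres'.
    """
    for line in lines:
        parts = line.rstrip("\n").split("::")
        if len(parts) < 2:
            continue
        for word in sorted(set(TOKEN.findall(parts[1].lower()))):
            yield f"{word}\t{parts[0]}"


def reducer(sorted_lines):
    """Collapse key-sorted 'word<TAB>movieID' lines into 'word<TAB>id1,id2,...'."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        ids = sorted({movie_id for _, movie_id in group})  # string order
        yield f"{word}\t{','.join(ids)}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: 'index_job.py map' or 'index_job.py reduce'.
    stage = mapper if sys.argv[1] == "map" else reducer
    for out in stage(sys.stdin):
        print(out)
```

Sorting the IDs as strings keeps the reducer independent of the ID format, at the cost of an order such as 10 before 2.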

Display Reducer output

hadoop fs -cat /user/cloudera/outputAssignment1/Movies/*

Merge the reducer output into a local file (required)

hadoop fs -getmerge /user/cloudera/outputAssignment1/Movies/* invertedIndex

Execute the query (reads the invertedIndex file)

python query.py
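
A query over the merged invertedIndex file could parse each line into a set of IDs and intersect the sets for the query words. This sketch assumes each line has the form word<TAB>id1,id2,... (an assumed layout; the actual Part3 output format is not documented here), and parse_index and search are illustrative names:

```python
import re

# HYPOTHETICAL tokenizer; it should match whatever the index builder used.
TOKEN = re.compile(r"[a-z0-9]+")


def parse_index(lines):
    """Turn 'word<TAB>id1,id2,...' lines into {word: {id1, id2, ...}}."""
    index = {}
    for line in lines:
        line = line.rstrip("\n")
        if "\t" not in line:
            continue  # ignore malformed lines
        word, ids = line.split("\t", 1)
        index[word] = set(filter(None, ids.split(",")))
    return index


def search(index, text):
    """Return the IDs that appear in the posting list of every query word."""
    postings = [index.get(word, set()) for word in TOKEN.findall(text.lower())]
    return set.intersection(*postings) if postings else set()
```

Intersecting sets gives AND semantics; an OR query would use set.union instead.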

Remove the output folder (if needed; Hadoop will not start a job whose output directory already exists)

hadoop fs -rm /user/cloudera/outputAssignment1/Movies/*

hadoop fs -rmdir /user/cloudera/outputAssignment1/Movies

Contributors

sbasak3
