- git clone [our repo]
- Go to the repository directory (332project).
- To run the master: sbt "run master {number of workers}"
- To run a worker: sbt "run worker {masterIp:masterPort} -I {path of input data} -O {output directory}" (the master port is initialized to 50051)
- Ex) If the input directory is 332project/data1/input, {path of input data} is /data1/input/.
- Ex) master: sbt "run master 4"; worker: sbt "run worker {masterIp:masterPort} -I /data/input1/ /data/input2/ -O /data/output". You can pass several input directories, such as /data1/input/ /data2/input/.
- The sorted result is written to {output directory} as several files.
- Test environment: VM servers (e.g. 2.2.2.101:50051)
- Test with blocks of 10 lines, 1 MB, and 32 MB. The total size of the blocks will be increased as testing proceeds.
- Give one directory, several directories, and repeated directories as input.
- Verify with valsort.
General setup
- Set up the gRPC communication server and client
- Implement the Master and Worker classes
Implement communication phase
- Implement the server and client classes
Implement Sampling phase (Master)
- Decide how to set key range
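
One possible way to set the key ranges, sketched under assumptions: the master pools the sample keys sent by the workers, sorts them, and takes evenly spaced keys as range boundaries. The names `KeyRanges` and `splitters` are hypothetical, and keys are modeled as plain strings for brevity; this is not necessarily the approach the project settles on.

```scala
object KeyRanges {
  /** Pick numWorkers - 1 boundary keys from the pooled samples.
    * Worker 0 gets keys below the first boundary, the last worker
    * gets keys at or above the last boundary.
    */
  def splitters(samples: Seq[String], numWorkers: Int): Vector[String] = {
    require(numWorkers >= 1, "need at least one worker")
    val sorted = samples.sorted.toVector
    if (sorted.isEmpty) Vector.empty
    else
      (1 until numWorkers).map { i =>
        // Evenly spaced positions in the sorted sample (Long avoids overflow).
        sorted((i.toLong * sorted.length / numWorkers).toInt)
      }.toVector
  }
}
```

With 4 workers this yields 3 boundary keys, so each worker receives roughly a quarter of the sampled key space.
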
Implement Sorting phase (Worker) (Distributed/Parallel phase)
- Decide how to prevent collapse (parallel programming)
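
If the concern here is parallel tasks interfering with each other, one option is to give every task its own block and share no mutable state. The sketch below assumes each input block fits in memory and is modeled as a vector of string keys; `ParallelSort` and `sortBlocks` are hypothetical names.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

object ParallelSort {
  /** Sort each block in its own Future. Blocks are immutable and no task
    * touches another task's data, so no locking is needed.
    */
  def sortBlocks(blocks: Seq[Vector[String]])(
      implicit ec: ExecutionContext): Seq[Vector[String]] = {
    val futures = blocks.map(block => Future(block.sorted))
    // Wait for every block; results come back in the original block order.
    Await.result(Future.sequence(futures), Duration.Inf)
  }
}
```

A caller could pass `scala.concurrent.ExecutionContext.global` or a fixed-size pool sized to the worker's core count.
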
Implement Partitioning phase (Worker) (Distributed/Parallel phase)
- Decide how to prevent collapse (parallel programming)
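
As an illustration of the partitioning phase, a worker could map each key to its destination worker with a binary search over the boundary keys chosen during sampling. This is a sketch under assumptions: keys are plain strings, and `Partitioner`, `workerFor`, and `partition` are hypothetical names.

```scala
object Partitioner {
  /** Index of the worker that owns `key`: the number of boundary keys
    * that are less than or equal to `key`. `splitters` must be sorted.
    */
  def workerFor(key: String, splitters: Vector[String]): Int = {
    var lo = 0
    var hi = splitters.length
    while (lo < hi) {
      val mid = (lo + hi) >>> 1
      if (splitters(mid).compareTo(key) <= 0) lo = mid + 1 else hi = mid
    }
    lo
  }

  /** Group keys by destination worker, keeping their relative order. */
  def partition(keys: Seq[String], splitters: Vector[String]): Map[Int, Seq[String]] =
    keys.groupBy(workerFor(_, splitters))
}
```

Because the function is pure, partitions for different blocks can be computed in parallel without coordination.
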
Implement Shuffle & Merge phase (Worker & Master)
- Decide on the sorting (merging) algorithm
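
One candidate for the merge step is a k-way merge over already-sorted runs using a min-heap. The sketch below is only an illustration of that technique, not a decided design; keys are modeled as strings, and `KWayMerge` and `merge` are hypothetical names.

```scala
import scala.collection.mutable

object KWayMerge {
  /** Lazily merge sorted runs into one sorted iterator. */
  def merge(runs: Seq[Iterator[String]]): Iterator[String] = {
    // PriorityQueue is a max-heap, so reverse the ordering to pop the smallest key.
    val heap = mutable.PriorityQueue.empty[(String, Int)](
      Ordering.by[(String, Int), String](_._1).reverse
    )
    val its = runs.toVector
    // Seed the heap with the first key of every non-empty run.
    for ((it, i) <- its.zipWithIndex if it.hasNext) heap.enqueue((it.next(), i))

    new Iterator[String] {
      def hasNext: Boolean = heap.nonEmpty
      def next(): String = {
        val (key, i) = heap.dequeue()
        // Refill from the run that just supplied the smallest key.
        if (its(i).hasNext) heap.enqueue((its(i).next(), i))
        key
      }
    }
  }
}
```

Only one key per run is held in memory at a time, which suits runs that are streamed from disk.
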