Copy Strategy Benchmarking

The scripts presented here explore the problem of copying a large number of files from one directory to another using a single thread, multiple threads, and multiple processes. Each script is described and the test results are reported below. The testing strategy is to copy 1, 2, 3, and 5 GB of files from one directory to another. The execution times reported here will differ from those on your machine, since the results depend heavily on the underlying hardware. Our initial guess is that the parallel thread/process implementations will have a better running time.
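
As a point of reference, a single-threaded baseline can be as simple as the sketch below. This is an illustration only, not the exact code in regularCopy.py; the from/to folder names follow the convention described in the next section.

```python
import shutil
import time
from pathlib import Path

def copy_all(src: Path, dst: Path) -> float:
    """Copy every file in src into dst sequentially and return the elapsed time."""
    dst.mkdir(exist_ok=True)
    start = time.perf_counter()
    for f in sorted(src.iterdir()):
        shutil.copy(f, dst / f.name)
    return time.perf_counter() - start

if __name__ == "__main__":
    print(f"single-threaded copy took {copy_all(Path('from'), Path('to')):.3f} s")
```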

Script structure

Each copy strategy lives in its own module and is imported by the main test script. The multithreaded and multiprocess variants use 4 threads/processes to copy the files. Generated files follow the naming convention file***.bin; they are seeded into the folder named from and then copied to the folder named to. The file list is segmented into 4 parts, and each thread/process copies one of those parts. For example, with 100 generated files (which is the case here), the parts are 0–24, 25–49, 50–74, and 75–99: the first thread copies the first 25 files, the second thread the next 25, and so on (a sketch of this chunking appears after the run instructions below). To run a test, execute the fileCopyBenchmark.py script with:

python ./fileCopyBenchmark.py

Alternatively, make the script executable and run it directly as ./fileCopyBenchmark.py. To make it executable you may need to run:

chmod +x fileCopyBenchmark.py
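
A minimal sketch of the chunking approach described above, assuming concurrent.futures is used for both the thread and process pools (the actual threadCopy.py and processCopy.py modules may be structured differently):

```python
import shutil
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from pathlib import Path

WORKERS = 4

def copy_chunk(files, dst):
    """Copy one slice of the file list into the destination folder."""
    for f in files:
        shutil.copy(f, dst / f.name)

def parallel_copy(src: Path, dst: Path, executor_cls=ThreadPoolExecutor):
    """Split the file list into WORKERS roughly equal slices and copy each in its own worker."""
    dst.mkdir(exist_ok=True)
    files = sorted(src.iterdir())
    size = -(-len(files) // WORKERS)  # ceiling division: 100 files -> slices of 25
    chunks = [files[i:i + size] for i in range(0, len(files), size)]
    with executor_cls(max_workers=WORKERS) as pool:
        list(pool.map(copy_chunk, chunks, [dst] * len(chunks)))

if __name__ == "__main__":
    parallel_copy(Path("from"), Path("to"))                       # 4 threads
    parallel_copy(Path("from"), Path("to"), ProcessPoolExecutor)  # 4 processes
```
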
| File | Description |
| --- | --- |
| fileCopyBenchmark.py | The main script that runs all tests and prints the results |
| regularCopy.py | Module for copying files with a single thread |
| threadCopy.py | Module for copying files with multiple threads |
| processCopy.py | Module for copying files with multiple processes |
| genFiles.sh | Shell script for generating the data to be copied |

Testing results

Four situations were tested, differing in the total size of the files to be copied. The performance for each case is shown in the table below. Every test was executed 20 times, giving a sample of 20 execution times per case, and the sample mean is used as the estimator.
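
A sketch of how such a sample could be collected (the actual harness in fileCopyBenchmark.py may differ, for example in how the destination folder is reset between runs):

```python
import statistics
import time

def benchmark(copy_fn, runs=20):
    """Call copy_fn `runs` times and return the mean of the measured execution times."""
    samples = []
    for _ in range(runs):
        # The destination folder is assumed to be cleared before each run.
        start = time.perf_counter()
        copy_fn()
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples)
```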

| Size | Single Thread | 4 Threads | 4 Processes |
| --- | --- | --- | --- |
| 1 GB | 1.880059161150001 | 1.2220340130999994 | 1.1681056276499995 |
| 2 GB | 3.744077671799997 | 2.1066945315500023 | 2.2774515955555555 |
| 3 GB | 6.689873728549995 | 4.0677183790499966 | 4.0242131774499966 |
| 5 GB | 12.44549148619999 | 10.202657355899984 | 8.3506162059000022 |

Conclusion

Our initial guess turned out to be right: both parallelized methods outperformed the single-threaded one. The gap between the single-threaded and parallelized implementations becomes more pronounced as the total size of the transferred files increases. The last row of the table also suggests that processes start to gain an advantage over threads as the size grows.
