Git Product home page Git Product logo

cluster-scheduler-simulator's Introduction

Cluster scheduler simulator overview

This simulator can be used to prototype and compare different cluster scheduling strategies and policies. It generates synthetic cluster workloads from empirical parameter distributions (thus generating unique workloads even from a small amount of input data), simulates their scheduling and execution using a discrete event simulator, and finally permits analysis of scheduling performance metrics.

The simulator was originally written as part of research on the "Omega" shared-state cluster scheduling architecture at Google. A paper on Omega, published at EuroSys 2013, uses of this simulator for the comparative evaluation of Omega and other alternative architectures (referred to as a "lightweight" simulator there) [1]. As such, the simulators design is somewhat geared towards the comparative evaluation needs of this paper, but it does also permit more general experimentation with:

  • scheduling policies and logics (i.e. "what machine should a task be bound to?"),
  • resource models (i.e. "how are machines represented for scheduling, and how are they shared between tasks?"),
  • shared-cluster scheduling architectures (i.e. "how can multiple independent schedulers be supported for a large shared, multi-framework cluster?").

While the simulator will simulate job arrival, scheduler decision making and task placement, it does not simulate the actual execution of the tasks or variation in their runtime due to shared resources.

Downloading, building, and running

The source code for the simulator is available in a Git repository hosted on Google Code. Instructions for downloading can be found at at https://code.google.com/p/cluster-scheduler-simulator/source/checkout.

The simulator is written in Scala, and requires the Simple Build Tool (sbt) to run. A copy of sbt is package with the source code, but you will need the following prerequisites in order to run the simulator:

  • a working JVM (openjdk-6-jre and openjdk-6-jdk packages in mid-2013 Ubuntu packages),
  • a working installation of Scala (scala Ubuntu package),
  • Python 2.6 or above and matplotlib 1.0 or above for generation of graphs (python-2.7 and python-matplotlib Ubuntu packages).

Once you have ensured that all of these exist, simply type bin/sbt run from the project home directory in order to run the simulator:

$ bin/sbt run
[...]
[info] Compiling 9 Scala sources and 1 Java source to ${WORKING_DIR}/target/scala-2.9.1/classes...
[...]
[info] Running Simulation 

RUNNING CLUSTER SIMULATOR EXPERIMENTS
------------------------
[...]

Using command line flags

The simulator can be passed some command-line arugments via configuration flags, such as --thread-pool-size NUM_THREADS_INT and --random-seed SEED_VAL_INT. To view all options run:

$ bin/sbt "run --help"

Note that when passing command line options to the sbt run command you need to include the word run and all of the options that follow it within a single set of quotes. sbt can also be used via the sbt console by simply running bin/sbt which will drop you at a prompt. If you are using this sbt console option, you do not need to put quotes around the run command and any flags you pass.

Configuration file

If a file conf/cluster-sim-env.sh exists, it will be sourced in the shell before the simulator is run. This was added as a way of setting up the JVM (e.g. heap size) for simulator runs. Check out conf/cluster-sim-env.sh.template as a starting point; you will need to uncomment and possibly modify the example configuration value set in that template file (and, of course, you will need to create a copy of the file removing the ".template" suffix).

Configuring experiments

The simulation is controlled by the experiments configured in the src/main/scala/Simulation.scala setup file. Comments in the file explain how to set up different workloads, workload-to-scheduler mappings and simulated cluster and machine sizes.

Most of the workload setup happens in src/main/scala/Workloads.scala, so read through that file and make modifications there to have the simulator read from a trace file of your own (see more below about the type of trace files the simulator uses, and the example files included).

Workloads in the simulator are generated from empirical parameter distributions. These are typically based on cluster snapshots (at a point in time) or traces (sequences of events over time). We unfortunately cannot provide the full input data used for our experiments with the simulator, but we do provide example input files in the traces subdirectory, illustrating the expected data format (further explained in the local README file in traces). The following inputs are required:

  • initial cluster state: when the simulation starts, the simulated cluster obviously cannot start off empty. Instead, we pre-load it with a set of running jobs (and tasks) at this point in time. These jobs start before the beginning of simulation, and may end during the simulation or after. The example file traces/example-init-cluster-state.log shows the input format for the jobs in the initial cluster state, as well as the departure events of those of them which end during the simulation. The resource footprints of tasks generated at simulation runtime will also be sampled from the distribution of resource footprints of tasks in the initial cluster state.
  • job parameters: the simulator samples three key parameters for each job from empirical distributions (i.e. randomly picks values from a large set):
    1. Job sizes (traces/job-distribution-traces/example_csizes_cmb.log): the number of tasks in the generated job. We assume for simplicity that all tasks in a job have the same resource footprint.
    2. Job inter-arrival times (traces/job-distribution-traces/example_interarrival_cmb.log): the time in between job arrivals for each workload (in seconds). The value drawn from this distribution indicates how many seconds elapse until another job arrives, i.e. the "gaps" in between jobs.
    3. Job runtimes (traces/job-distribution-traces/example_runtimes_cmb.log): total job runtime. For simplicity, we assume that all tasks in a job run for exactly this long (although if a task gets scheduled later, it will also finish later).

For further details, see traces/README.txt and traces/job-distribution-traces/README.txt.

Please note that the resource amounts specified in the example data files, and the example cluster machines configured in Simulation.scala do not reflect Google configurations. They are made-up numbers, so please do not quote them or try to interpret them!

A possible starting point for generating realistic input data is the public Google cluster trace [2, 3]. It should be straightforward to write scripts that extract the relevant data from the public trace's event logs. Although we do not provide such scripts, it is worth noting that the "cluster C" workload in the EuroSys paper [1] represents the same workload as the public trace. (If you do write scripts for converting the public trace into simulator format, please let us know, and we will happily include them in the simulator code release!)

Experimental results: post-processing

Experimental results are stored in serialized Protocol Buffers in the experiment_results directory at the root of the source tree by default: one subdirectory for each experiment, and with a unique name identifying the experimental setup as well as the start time. The schemas for the .protobuf files are stored in src/main/protocolbuffers.

A script for post-processing and graphing experimental results is located in src/main/python/graphing-scripts, and src/main/python also contains scripts for converting the protobuf-encoded results into ASCII CSV files. See the README file in the graphing-scripts directory for detailed explanation.

NOTES

Changing and compiling the protocol buffers

If you make changes to the protocol buffer file (in src/main/protocolbuffers), you will need to recompile them, which will generate updated Java files in src/main/java. To do so, you must install the protcol buffer compiler and run src/main/protocolbuffers/compile_protobufs.sh, which itself calls protoc (which it assumes is on your PATH).

Known issues

  • The schedulePartialJobs option is used in the current implementation of the MesosScheduler class. Partial jobs are always scheduled (even if this flag is set to false). Hence the mesosSimulatorSingleSchedulerZeroResourceJobsTest currently fails to pass.

Contributing, Development Status, and Contact Info

Please use the Google Code project issue tracker for all bug reports, pull requests and patches, although we are unlikely to be able to respond to feature requests. You can also send any kind of feedback to the developers, Andy Konwinski and Malte Schwarzkopf.

References

[1] Malte Schwarzkopf, Andy Konwinski, Michael Abd-El-Malek and John Wilkes. Omega: flexible, scalable schedulers for large compute clusters. In Proceedings of the 8th European Conference on Computer Systems (EuroSys 2013).

[2] Charles Reiss, Alexey Tumanov, Gregory Ganger, Randy Katz and Michael Kotzuch. Heterogeneity and Dynamicity of Clouds at Scale: Google Trace Analysis. In Proceedings of the 3rd ACM Symposium on Cloud Computing (SoCC 2012).

[3] Google public cluster workload traces. https://code.google.com/p/googleclusterdata/.

cluster-scheduler-simulator's People

Contributors

johnwilkes avatar ms705 avatar viirya avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

cluster-scheduler-simulator's Issues

In CoreClusterSimulation you can replace "to" by "until"

You can replace "to" by "until" in a for-comprehension at 
CoreClusterSimulation. The effect is that it can stop looping just before 
'length' (achieving the same effect of 'for (i<- exp to list.length -1)'), but 
taking advantage of a Scala's built-in construction and cleaner syntax.

Original issue reported on code.google.com by [email protected] on 9 Jun 2013 at 5:07

Attachments:

An problem while trying to run simlator

Hi,

I am learning about the simulator and trying to compare with mesos, omega via this simulator,
after i completed installing the related software like openjdk6, scala, python,sbt, and then
run by ./sbt run, but encountered below error:

i am not sure if this is a problem of the simulator tool or my environment, would you please

help me to resolve this error, thanks so much!

root@ubuntu:~/mySimu/cluster-scheduler-simulator/bin# ./sbt run
Getting net.java.dev.jna jna 3.2.3 ...

:: problems summary ::
:::: WARNINGS
Your proxy requires authentication.

    Your proxy requires authentication.

            module not found: net.java.dev.jna#jna;3.2.3

    ==== local: tried

      /root/.ivy2/local/net.java.dev.jna/jna/3.2.3/ivys/ivy.xml

    ==== typesafe-ivy-releases: tried

      http://repo.typesafe.com/typesafe/ivy-releases/net.java.dev.jna/jna/3.2.3/ivys/ivy.xml

    ==== Maven Central: tried

      http://repo1.maven.org/maven2/net/java/dev/jna/jna/3.2.3/jna-3.2.3.pom

    ==== sonatype-snapshots: tried

      https://oss.sonatype.org/content/repositories/snapshots/net/java/dev/jna/jna/3.2.3/jna-3.2.3.pom

            ::::::::::::::::::::::::::::::::::::::::::::::

            ::          UNRESOLVED DEPENDENCIES         ::

            ::::::::::::::::::::::::::::::::::::::::::::::

            :: net.java.dev.jna#jna;3.2.3: not found

            ::::::::::::::::::::::::::::::::::::::::::::::

:::: ERRORS
Server access Error: Connection timed out url=https://oss.sonatype.org/content/repositories/snapshots/net/java/dev/jna/jna/3.2.3/jna-3.2.3.pom

:: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
unresolved dependency: net.java.dev.jna#jna;3.2.3: not found
Error during sbt execution: Error retrieving required libraries
(see /root/.sbt/boot/update.log for complete log)
Error: Could not retrieve jna 3.2.3

Security Policy violation Binary Artifacts

This issue was automatically created by Allstar.

Security Policy Violation
Project is out of compliance with Binary Artifacts policy: binaries present in source code

Rule Description
Binary Artifacts are an increased security risk in your repository. Binary artifacts cannot be reviewed, allowing the introduction of possibly obsolete or maliciously subverted executables. For more information see the Security Scorecards Documentation for Binary Artifacts.

Remediation Steps
To remediate, remove the generated executable artifacts from the repository.

Artifacts Found

  • bin/sbt-launch-0.11.3-2.jar
  • sbt/sbt-launch-0.13.7.jar

Additional Information
This policy is drawn from Security Scorecards, which is a tool that scores a project's adherence to security best practices. You may wish to run a Scorecards scan directly on this repository for more details.


Allstar has been installed on all Google managed GitHub orgs. Policies are gradually being rolled out and enforced by the GOSST and OSPO teams. Learn more at http://go/allstar

This issue will auto resolve when the policy is in compliance.

Issue created by Allstar. See https://github.com/ossf/allstar/ for more information. For questions specific to the repository, please contact the owner or maintainer.

Issue right at the start

Hi,

I'm hitting a roadblock even attempting to initially run 'bin/sbt run'. I get the following message:

error: error while loading AnnotatedElement, class file '/Library/Java/JavaVirtualMachines/jdk1.8.0_91.jdk/Contents/Home/jre/lib/rt.jar(java/lang/reflect/AnnotatedElement.class)' is broken
(bad constant pool tag 18 at byte 76)
error: error while loading CharSequence, class file '/Library/Java/JavaVirtualMachines/jdk1.8.0_91.jdk/Contents/Home/jre/lib/rt.jar(java/lang/CharSequence.class)' is broken
(bad constant pool tag 18 at byte 10)
[error] Type error in expression
Project loading failed: (r)etry, (q)uit, (l)ast, or (i)gnore? r

I searched the problem and found that Java 8 may be incompatible, however, upon downgrading to 7, I had even more errors.

Any clues as to what is wrong?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.