peelframework / peel Goto Github PK

Peel is a framework that helps you to define, execute, analyze, and share experiments for distributed systems and algorithms.

Home Page: http://peel-framework.org

License: Apache License 2.0

Scala 75.13% HTML 20.01% Shell 4.12% Java 0.74%

peel's Introduction

Peel Experiments Execution Framework

Peel is a framework that helps you to define, execute, analyze, and share experiments for distributed systems and algorithms.

For more information and technical documentation about the project, please visit peel-framework.org.

Check the Motivation section on our website to understand the problems Peel will solve for you.

Check the Getting Started guide and the Bundle Basics section.

Main Features

Peel offers the following features for your experiments.

Unified Design. Specify and maintain collections of experiments using a simple, DI-based configuration.
Automated Execution. Automate the experiment execution lifecycle.
Automated Analysis. Extracts, transforms, and loads results into an RDBMS.
Result Sharing. Share your bundles and migrating to other evaluation environments without additional effort.

Supported Systems

System	Version	System bean ID
HDFS	1.2.1	`hdfs-1.2.1`
HDFS	2.4.1	`hdfs-2.4.1`
HDFS	2.7.1	`hdfs-2.7.1`
HDFS	2.7.2	`hdfs-2.7.2`
HDFS	2.7.3	`hdfs-2.7.3`
HDFS	2.8.0	`hdfs-2.8.0`
Flink	0.8.0	`flink-0.8.0`
Flink	0.8.1	`flink-0.8.1`
Flink	0.9.0	`flink-0.9.0`
Flink	0.10.0	`flink-0.10.0`
Flink	0.10.1	`flink-0.10.1`
Flink	0.10.2	`flink-0.10.2`
Flink	1.0.0	`flink-1.0.0`
Flink	1.0.1	`flink-1.0.1`
Flink	1.0.2	`flink-1.0.2`
Flink	1.0.3	`flink-1.0.3`
Flink	1.1.0	`flink-1.1.0`
Flink	1.1.1	`flink-1.1.1`
Flink	1.1.2	`flink-1.1.2`
Flink	1.1.3	`flink-1.1.3`
Flink	1.1.4	`flink-1.1.4`
Flink	1.2.0	`flink-1.2.0`
Flink	1.2.1	`flink-1.2.1`
Flink	1.3.0	`flink-1.3.0`
Flink	1.3.1	`flink-1.3.1`
Flink	1.3.2	`flink-1.3.2`
Flink	1.4.0	`flink-1.4.0`
MapReduce	1.2.1	`mapred-1.2.1`
MapReduce	2.4.1	`mapred-2.4.1`
Spark	1.3.1	`spark-1.3.1`
Spark	1.4.0	`spark-1.4.0`
Spark	1.4.1	`spark-1.4.1`
Spark	1.5.1	`spark-1.5.1`
Spark	1.5.2	`spark-1.5.2`
Spark	1.6.0	`spark-1.6.0`
Spark	1.6.2	`spark-1.6.2`
Spark	2.0.0	`spark-2.0.0`
Spark	2.0.1	`spark-2.0.1`
Spark	2.0.2	`spark-2.0.2`
Spark	2.1.0	`spark-2.1.0`
Spark	2.1.1	`spark-2.1.1`
Spark	2.2.0	`spark-2.2.0`
Spark	2.2.1	`spark-2.2.1`
Zookeeper	3.4.5	`zookeeper-3.4.5`
Dstat	0.7.2	`dstat-0.7.2`
Dstat	0.7.3	`dstat-0.7.3`

Supported Commands

Command	Description
`db:import`	import suite results into an initialized database
`db:initialize`	initialize results database
`exp:run`	execute a specific experiment
`exp:setup`	set up systems for a specific experiment
`exp:teardown`	tear down systems for a specific experiment
`exp:config`	list the configuration of a specific experiment
`hosts:generate`	generate a hosts.conf file
`res:archive`	archive suite results to a tar.gz
`res:extract`	extract suite results from a tar.gz
`rsync:pull`	pull bundle from a remote location
`rsync:push`	push bundle to a remote location
`suite:run`	execute all experiments in a suite
`sys:setup`	set up a system
`sys:teardown`	tear down a system
`val:hosts`	validates correct hosts setup

peel's People

Contributors

Stargazers

Watchers

peel's Issues

Create Jekyll website

Create a static Jekyll website for the Peel 1.0.0 release.

Check the work in progress here.

Design

We use a cutstomized Foundation download based on ~~this Paletton palette~~ this Paletton palette. The colors are mapped as follows:

Paletton primary color ↦ Foundation primary color
Paletton primary color ↦ Foundation success color
Paletton secondary color ↦ Foundation secondary color
Paletton complement color ↦ Foundation alert color

Structure & Layout

The structure and contents are based on the Peel Manual.

Development & Local Preview

See the GitHub Jekyll documentation for instructions how to run and preview the website locally.

Fun with Linux

You'll need Ruby 2.0+ (bin + dev packages) as well as the bundler package in order to run as the GitHub documentation suggests. If you (like @aalexandrov) are running Ubuntu 14.04 LTS, you will find out that merely executing

sudo apt-get install ruby2.0 ruby2.0-dev bundler

doesn't really help due to a bug in the ruby2.0 packaging. To solve the problem do as suggested by agent-8131 after that.

You might also be asked to require a JavaScript runtime from this list, e.g. with

sudo apt-get install nodejs

Upon that, you should be able to install the required Ruby packages for local preview by running

sudo bundle install

You also need to install some gems:

gem install jekyll-redirect-from

from the gh-pages root folder.

Local Preview

Once you've managed to install the required Ruby bundles, you can preview the website locally with

bundle exec jekyll serve

Backlog

Fallback the main navigation to off-canvas for mobile (@andi3);
Fix the position of the manual navigation for browser layout (@andi3);
Add prev / next navigation to the manual (@andi3);
Add the missing content in the examples sections (~~@carabolic~~ @aalexandrov);
Create a blog post with a release announcement (@asteriosk, @aalexandrov);
Create a blog post with parallel data generation basics (@aalexandrov, @tilmannrabl ?);
Create social media accounts;
Add Google Analytics tracking (@andi3);
Impressum / Legal / Sponsors (@asteriosk);

Guidelines for collaborators:

Base your changes on the gh-pages branch;
One commit per TODO;
Push to your fork and post a link here;
I will be merging on a daily basis;

Async/bulk copy for distributed log collections

Currently, checking the current offset and transferring the logs is issued sequentially with independent ssh commands. As ssh commands are pretty heavy weight, we should investigate if we replace the current implementations with:

pssh command, but I'm not sure if the implementation is more efficient than establishing the independent ssh commands separately.
Asynchronously issue the ssh commands (as Futures). This -- maybe -- can also be seen as an addition to option 1, if it turns out pssh is more efficient.

Typo in peel example code on motivation page

Under solution on http://peel-framework.org/manual/motivation.html

-DarchetypeVersion=1.0-SNAPSHOT
needs to be
-DarchetypeVersion=1.0.0-rc1

Spark archive can't be extracted

So far the Spark archive (spark-1.0.0-bin-hadoop1.tgz) can't be extracted automatically and has to be extracted manually before running peel.

All Flink jobs not finishing successfully

With the current Flink version all jobs that are running from peel are not finishing successfully.
The reason is that Flink does not support the printing of the plan from the CLI anymore.

The method

override def isSuccessful = state.plnExitCode.getOrElse(-1) == 0 && state.runExitCode.getOrElse(-1) == 0

in FlinkExperiment will always return false.

A possible solution would be to change isSuccessful to only check for correct runExitCode and remove the check for plnExitCode.

Add file copy between two HDFS instances

This enables us to load data sets directly from static HDFS instances to the one used in the experiments. Currently we only have facilities to copy files from local, which is especially limiting in cases the data is to big to fit on a single machine.

As I see it there are basically two options:

Introduce a new method to HDFSFileSystem and new HDFSCopiedDataset
Rewrite copy on local for HDFSFileSystem to use the appropriate cmd depending on the provided URI

I would prefer the second solution.

Versioning Issue in archetype pom and maven central

Follwoing the instructions on the website does not work and leads to the following error when running mvn clean deploy

[ERROR] Failed to execute goal on project peel-bundle-peelextensions: 
Could not resolve dependencies for project com.acme.peel:peel-bundle-peelextensions:jar:1.0-SNAPSHOT: 
Could not find artifact org.peelframework:peel-core:jar:1.0-SNAPSHOT

The version of the module that is deployed at maven central is 1.0.0-beta.

Integrate Apache REEF into Peel

Apache REEF aims to be a standard library for developing distributed systems. Running experiments with REEF is on the critical path of at least two DIMA projects. So we may want to integrate it into Peel to ease experimenting.

Sanity check for password-less ssh

Virtually all systems depend on a password-less ssh connection, and this is also one of the most common causes for error. A sanity check for password-less ssh with proper log output can be either integrated in the beginning of most commands or as a special command.

Update Peel Documentation

A work in progress draft of the documentation is available here.

HDFS to HDFS copy does not work for different block sizes

When moving data from one HDFS instance to the other, the following distcp command

bin/hadoop distcp $src $dst

will fail due to bad CRC checks if the source and destination HDFS installations have different block size.

To fix the issue one has to do

bin/hadoop distcp -update -skipcrccheck  $src $dst

I think we should fix this in the HDFS2 system.

Fix error when running ./peel analyse without --skipInstances argument

Currently ./peel analyse results in an error if the argument --skipInstances is not used when analysing the results of Flink experiments. This is most likely due to the way the Flink logs are being parsed.

Improvements for the "Getting Started" page

I tried to follow the steps on the Getting Started page but couldn't get it to run. I think for potential new users it is important that all these steps work out of the box.
What I have observed so far:

Binary archive is not there (http://peel-framework.org/peel-empty-bundle-1.0-SNAPSHOT.tar.gz).
Version in the archetype pom.xml file does not fit the dependencies that are available in the maven repo (#60 should fix that).
Following the archetype setup, the command ./peel.sh suite:run wordcount.default fails with

No configuration setting found for key 'system.spark.config.defaults.spark.local'

Additionally, when I first tried to issue the command using the described steps (and variables for groupid, artifactid and root-package) it failed because it couldn't find a fitting flink-constructur. After using the archetype command from the Motivation page (bottom) it worked...

Add Contributer/Developer guide/help page to website

I think it would be helpful to add a contributers/developers page to the docs on the website.
This should explain how to setup a development environment for Peel. how to add Systems etc. and how to setup debugging for example in Intellij.

Distribution sampling for Zipf and Binomial is too slow

The commons-math3 distributions used in the reference data generator in the archetypes are really slow.

During a local test of an experiment suite on which I am working with @ggevay I am observing the following numbers for generating dataset.A with key cardinality 100000:

with Uniform key distribution, the job takes ~ 5 seconds
with Binomial key distribution, the job takes ~ 25 seconds
with Zipfian key distribution, the datagen job exceeded the allowed limit of 600 seconds.

The fix should be pushed to the peel-wordcount repository (see peelframework/peel-wordcount#1).

Remove unexpected files that appear in peel-analyser folder of peel-bundle

When peel is built, a peel-bundle is created in the target of peel-dist. The bundle contains a folder peel-analyser which will contain the database that is created when calling ./peel analyse. However, when building peel, this folder (which should probably be empty) contains the following files:

ExperimentRuns.jrxml
TaskInstancesHistogramm.jrxml
TestReport.jrxml
TwitterMingliangAllExperiments.jrxml
ReportExternalClass.jrxml
TwitterMingliang.jrxml

mvn clean deploy fails for archetype-generated project

I followed the instructions from the getting started guide.
After creating the archetype, mvn deploy failed with

[ERROR] /home/peel/pserver/src/peel-bundle/peel-bundle-peelextensions/src/main/scala/com/acme/benchmarks/example/cli/command/QueryRuntimes.scala:78: No implicit view available from String => anorm.ParameterValue.
[ERROR]         .on("suite" -> suite)
[ERROR]                     ^

(I am on wally with open jdk java 1.7)

Auto-download systems

For system archives that are publicly accessible, there should be a way to specify their location as a URL (together with an MD5 hash) and auto-download them as part of the setup phase of the execution lifecycle.

Prepare pom.xml files for Maven Central

In order to push peel-core and peel-extensions to the Maven Central repository, we need to follow the guidelines from the following Sonatype guides:

I already opened a JIRA issue requesting access to the eu.stratosphere group according to the instructions in (1).

I expect this to be resolved later this week, but we can already start aligning the pom.xml files according to the guidelines in (2).

Delete dataset folder if datageneration job fails

Keep ExperimentsDefinitions.scala in source form

Currently, ExperimentsDefinitions.scala is and packaged as part of the extensions package.

I suggest to move it as a resource and keep it in source format when packaging the bundle binary.

We can use twitter-util or Scala reflection to compile and load the class programatically at runtime. This should give the user more flexibility to do some last moment changes on the server side after running rsync:push (which is already possible using the XML syntax).

Integrate arithmetic expressions for experiments

This can be done in the following steps:

Define a "bracket" syntax for the complex expressions, e. g.

<% ${foo.bar} * ${foo.baz} %>

Wrap the cb.resolve() method in the config package in a private def resolve() in the same package.
Extend the resolve method - use the ScriptEngineManager to evaluate <% ... %> expressions on config resolve.

The issue is up for grabs.

JOB Lifespan

What do you think about a JOB lifespan? For my experiments on fault tolerance I need Flink to be set up freshly before every job run, since I am killing nodes during job execution. Is that something other people might need as well?

Add factory bean for experiments

Suites of scale-out experiments contain entries with similar structure:

...
<bean id="foobar.exp" class="eu.stratosphere.peel.core.beans.experiment.ExperimentSuite">
    <constructor-arg name="experiments">
        <list>
            <!-- 10 nodes -->
            <bean parent="experiment.foobar">
                <constructor-arg name="name" value="dop80"/>
                <constructor-arg name="config">
                    <value>
                            experiment.timeout = 3600
                            system.default.config.slaves = ${system.default.config.slaves.010}
                            system.default.config.parallelism.total = 80
                    </value>
                </constructor-arg>
            </bean>
            <!-- 20 nodes -->
            <bean parent="experiment.foobar">
                <constructor-arg name="name" value="dop160"/>
                <constructor-arg name="config">
                    <value>
                            experiment.timeout = 3600
                            system.default.config.slaves = ${system.default.config.slaves.020}
                            system.default.config.parallelism.total = 160
                    </value>
                </constructor-arg>
            </bean>
            <!-- 40 nodes -->
            <bean parent="experiment.foobar">
                <constructor-arg name="name" value="dop320"/>
                <constructor-arg name="config">
                    <value>
                            experiment.timeout = 3600
                            system.default.config.slaves = ${system.default.config.slaves.040}
                            system.default.config.parallelism.total = 320
                    </value>
                </constructor-arg>
            </bean>
            <!-- 80 nodes -->
            <bean parent="experiment.foobar">
                <constructor-arg name="name" value="dop640"/>
                <constructor-arg name="config">
                    <value>
                            experiment.timeout = 3600
                            system.default.config.slaves = ${system.default.config.slaves.080}
                            system.default.config.parallelism.total = 640
                    </value>
                </constructor-arg>
            </bean>
        </list>
    </constructor-arg>
</bean>
...

We can minimize the XML code required to specify such experiments and improve readability using a dedicated experiment sequence bean. The above code will then look like:

...
<bean id="foobar.exp" class="eu.stratosphere.peel.core.beans.experiment.ExperimentSuite">
    <constructor-arg name="experiments">
        <list>
            <bean class="eu.stratosphere.peel.core.beans.experiment.ExperimentSequence">
                <constructor-arg name="parameters">
                    <map>
                        <entry key="dop"/>
                            <list>
                                <value>10</value>
                                <value>20</value>
                                <value>40</value>
                                <value>80</value>
                            </list>
                        </entry>
                    </map>
                </constructor-arg>
                <constructor-arg name="name" value="dop<% ${parameter.dop} * ${runtime.cpu.cores} %>"/>
                <constructor-arg name="config">
                    <value>
                            experiment.timeout = 3600
                            system.default.config.slaves = ${system.default.config.slaves.${parameter.dop}}
                            system.default.config.parallelism.total = ${parameter.dop} * ${runtime.cpu.cores}
                    </value>
                </constructor-arg>
            </bean>
        </list>
    </constructor-arg>
</bean>
...

This issue depends on #15.

Bag 'fat-jar' profile for datagen module in archetypes

The fat-jar profile in the *-datagen modules in all archetypes is missing the following entries:

    <profiles>
        <profile>
            <id>fat-jar</id>
            <activation>
            </activation>
            <dependencies>
                <!--Scala -->
                <dependency>
                    <groupId>org.scala-lang</groupId>
                    <artifactId>scala-library</artifactId>
                    <version>${scala.version}</version>
                    <scope>compile</scope>
                </dependency>
                <dependency>
                    <groupId>org.scala-lang</groupId>
                    <artifactId>scala-compiler</artifactId>
                    <version>${scala.version}</version>
                    <scope>compile</scope>
                </dependency>
                <dependency>
                    <groupId>org.scala-lang</groupId>
                    <artifactId>scala-reflect</artifactId>
                    <version>${scala.version}</version>
                    <scope>compile</scope>
                </dependency>

                <!-- current dependencies (Flink) -->
            </dependencies>
        </profile>
    </profiles>

Without that, running the WordCountGenerator main method from the IDE throws an error.

Peel Release Planning

Zookeeper does not initialize myid file

Currently zookeeper does not initialize the myid file on each node and therefore setup fails on cluster setup.

Problems with wc:query-runtimes

When I try to use the wc:query-runtimes function peel provides, after initializing and updating it with the data like mentioned here: http://peel-framework.org/getting-started.html
I get the following error, which basicly states, that the function "MEDIAN" is not supported:

Unexpected error: Function "MEDIAN" not found; SQL statement:
SELECT e.suite as suite,
e.name as name,
MEDIAN(er.time) as median_time,
MIN(er.time) as min_time,
MAX(er.time) as max_time
FROM experiment e,
experiment_run er
WHERE e.id = er.experiment_id
AND e.suite = ?
GROUP BY e.suite, e.name
ORDER BY e.suite, e.name [90022-187]
org.h2.jdbc.JdbcSQLException: Function "MEDIAN" not found; SQL statement:
SELECT e.suite as suite,
e.name as name,
MEDIAN(er.time) as median_time,
MIN(er.time) as min_time,
MAX(er.time) as max_time
FROM experiment e,
experiment_run er
WHERE e.id = er.experiment_id
AND e.suite = ?
GROUP BY e.suite, e.name
ORDER BY e.suite, e.name [90022-187]
at org.h2.message.DbException.getJdbcSQLException(DbException.java:345)
at org.h2.message.DbException.get(DbException.java:179)
at org.h2.message.DbException.get(DbException.java:155)
at org.h2.command.Parser.readJavaFunction(Parser.java:2351)
at org.h2.command.Parser.readFunction(Parser.java:2403)
at org.h2.command.Parser.readTerm(Parser.java:2737)
at org.h2.command.Parser.readFactor(Parser.java:2259)
at org.h2.command.Parser.readSum(Parser.java:2246)
at org.h2.command.Parser.readConcat(Parser.java:2216)
at org.h2.command.Parser.readCondition(Parser.java:2066)
at org.h2.command.Parser.readAnd(Parser.java:2038)
at org.h2.command.Parser.readExpression(Parser.java:2030)
at org.h2.command.Parser.parseSelectSimpleSelectPart(Parser.java:1942)
at org.h2.command.Parser.parseSelectSimple(Parser.java:1974)
at org.h2.command.Parser.parseSelectSub(Parser.java:1868)
at org.h2.command.Parser.parseSelectUnion(Parser.java:1689)
at org.h2.command.Parser.parseSelect(Parser.java:1677)
at org.h2.command.Parser.parsePrepared(Parser.java:433)
at org.h2.command.Parser.parse(Parser.java:305)
at org.h2.command.Parser.parse(Parser.java:277)
at org.h2.command.Parser.prepareCommand(Parser.java:242)
at org.h2.engine.Session.prepareLocal(Session.java:461)
at org.h2.engine.Session.prepareCommand(Session.java:403)
at org.h2.jdbc.JdbcConnection.prepareCommand(JdbcConnection.java:1189)
at org.h2.jdbc.JdbcPreparedStatement.(JdbcPreparedStatement.java:72)
at org.h2.jdbc.JdbcConnection.prepareStatement(JdbcConnection.java:277)
at anorm.SimpleSql.getFilledStatement(SimpleSql.scala:59)
at anorm.SimpleSql$$anonfun$preparedStatement$1.apply(SimpleSql.scala:70)
at anorm.SimpleSql$$anonfun$preparedStatement$1.apply(SimpleSql.scala:70)
at resource.DefaultManagedResource.open(AbstractManagedResource.scala:106)
at resource.AbstractManagedResource.acquireFor(AbstractManagedResource.scala:85)
at resource.ManagedResourceOperations$$anon$2.acquireFor(ManagedResourceOperations.scala:40)
at resource.DeferredExtractableManagedResource.acquireFor(AbstractManagedResource.scala:27)
at resource.ManagedResourceOperations$class.acquireAndGet(ManagedResourceOperations.scala:25)
at resource.DeferredExtractableManagedResource.acquireAndGet(AbstractManagedResource.scala:24)
at anorm.Sql$$anonfun$asTry$1.apply(Anorm.scala:302)
at anorm.Sql$$anonfun$asTry$1.apply(Anorm.scala:302)
at scala.util.Try$.apply(Try.scala:161)
at anorm.Sql$.asTry(Anorm.scala:302)
at anorm.WithResult$class.as(SqlResult.scala:120)
at anorm.SimpleSql.as(SimpleSql.scala:6)
at de.tuberlin.cit.experiments.iterations.cli.command.QueryRuntimes.run(QueryRuntimes.scala:87)
at org.peelframework.core.cli.Peel$.main(Peel.scala:104)
at org.peelframework.core.cli.Peel.main(Peel.scala)

Migrate to Java 8

We have to make sure that we consistently ask for Java 8 in all documentation and configuration for the project.

Create archetype for versioned bundles

Make this aligned with the command listed in the "Getting Started" guide.

The archetype should bootstrap the running example from the manual. You can use the tamara-msc-thesis bundle as a basis.

React earlier for parsing errors

Currently when the command can't be parsed, peel simple throw an Exception. The Exception should be catched earlier. Otherwise it will produce meaningless error message.

Introduce init/delete for blog storage in flink-0.9.0

Similar to the logic for the tmp directories used for spilling data, we need to integrate the logic to create/delete the directories used for the blob storage introduced in flink 0.9.0.

Add System and SingleJobExperiment implementations for Spark.

For first, focus only on Spark-1.0.0 and Hadoop 1.x.

Add system implementations

Add System and ExperimentRunner implementations for the following systems:

Local FileSystem
HDFS (v1, v2)
Hadoop MapReduce (v1, v2)
Stratosphere (v0.4, v0.5)

Zookeeper teardown doesn't tear down the system (probably isRunning() not working correctly)

When tearing down zookeeper, peel thinks that the system is already down.

Better error handling for HDFS data dir initialization

We're seeing a lot of 255 exit code errors on that. These are usually caused by some rights restrictions and logged properly in the HDFS logs. A try-catch block with a logger.error hint might be helpful.

Add 'hosts:generate' command

Add a Peel command that generates hosts.conf files.

The command should take the following parameters

master - the master URL
For the slaves URL generation (with Arparse4J option groups):
- either
  - slaves-pattern a URL pattern for the hosts (required), e.g. "cloud-%d.dima.tu-berlin.de
  - slaves-include a [min, max] range of numbers (required, can be more than one)
  - slaves-exlucde a [min, max] range of excludes (optional, can be more than one)
- or
  - slaves-file a file containing a pre-defined list of hosts
unit-size For deciding the unit size a in the geometric progression an^i of subhosts
A parallelism parameter indicating the default degree of parallelism per node
A memory parameter indicating the total available memory per node (in kB)

The generated file should look along these lines


###############################################################################
# Auto-generated hosts.conf
# Input command was:
#
# hosts:generate                           \
#  --master "master.acme.org"              \
#  --slaves-pattern "slave-%03d.acme.org"  \
#  --slaves-include [32,54]                \
#  --slaves-exclude [37,37]                \
#  --slaves-exclude [41,42]                \
#  --parallelism 8                         \
#  --memory 16426568                       \
#  --unit-size 5
#
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#

env {
    masters = ["master.acme.org"]
    parallelism.per-node = 8
    memory.per-node = 16426568
    slaves = {
        # all slaves
        all = {
            total = {
                hosts = 20
                parallelism = 160
                memory = 328531360
            }
            hosts = [
                "slave-032.acme.org",
                "slave-033.acme.org",
                "slave-034.acme.org",
                "slave-035.acme.org",
                "slave-036.acme.org",
                "slave-038.acme.org",
                "slave-039.acme.org",
                "slave-040.acme.org",
                "slave-043.acme.org",
                "slave-044.acme.org",
                "slave-045.acme.org",
                "slave-046.acme.org",
                "slave-047.acme.org",
                "slave-048.acme.org",
                "slave-049.acme.org",
                "slave-050.acme.org",
                "slave-051.acme.org",
                "slave-052.acme.org",
                "slave-053.acme.org",
                "slave-054.acme.org",
            ]
        }
        # only the top 5 hosts
        top005 = {
            total = {
                hosts = 5
                parallelism = 40
                memory = 82132840
            }
            hosts = [
                "slave-032.acme.org",
                "slave-033.acme.org",
                "slave-034.acme.org",
                "slave-035.acme.org",
                "slave-036.acme.org",
            ]
        }
        # only the top 10 hosts
        top010 = {
            total = {
                hosts = 10
                parallelism = 80
                memory = 164265680
            }
            hosts = [
                "slave-032.acme.org",
                "slave-033.acme.org",
                "slave-034.acme.org",
                "slave-035.acme.org",
                "slave-036.acme.org",
                "slave-032.acme.org",
                "slave-033.acme.org",
                "slave-034.acme.org",
                "slave-035.acme.org",
                "slave-036.acme.org",
                "slave-038.acme.org",
                "slave-039.acme.org",
                "slave-040.acme.org",
                "slave-043.acme.org",
                "slave-044.acme.org",
            ]
        }
        # only the top 20 hosts
        top020 = {
            total = {
                hosts = 20
                parallelism = 160
                memory = 328531360
            }
            hosts = [
                "slave-032.acme.org",
                "slave-033.acme.org",
                "slave-034.acme.org",
                "slave-035.acme.org",
                "slave-036.acme.org",
                "slave-038.acme.org",
                "slave-039.acme.org",
                "slave-040.acme.org",
                "slave-043.acme.org",
                "slave-044.acme.org",
                "slave-045.acme.org",
                "slave-046.acme.org",
                "slave-047.acme.org",
                "slave-048.acme.org",
                "slave-049.acme.org",
                "slave-050.acme.org",
                "slave-051.acme.org",
                "slave-052.acme.org",
                "slave-053.acme.org",
                "slave-054.acme.org",
            ]
        }
    }
}

Compiler Error When Using zinc

Error

When using zinc server for incremental builds Peel does not compile due some problem with org/peelframework/core/cli/command/rsync/package$FolderEntry with the following error message:

how can getCommonSuperclass() do its job if different class symbols get the same bytecode-level internal name: org/peelframework/core/cli/command/rsync/package$FolderEntry

This seems to be a know issue of the Scala compiler.

Reproduce

To reproduce this bug, simply start a zinc server with zinc start followed by mvn compile. The compilation might be started twice to trigger this error.

Fix

To fix this we might do one of the following

use a newer scala compiler (>= 2.11.0) or
move the FolderEntry case class from the package object to a utility module.

Full error message:

carabolic ~/D/r/peel$ mvn compile
[INFO] Scanning for projects...
[INFO] Inspecting build with total of 8 modules...
[INFO] Installing Nexus Staging features:
[INFO]   ... total of 8 executions of maven-deploy-plugin replaced with nexus-staging-maven-plugin
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Build Order:
[INFO] 
[INFO] peel
[INFO] peel-archetypes
[INFO] peel-flinkspark-bundle
[INFO] peel-flink-bundle
[INFO] peel-spark-bundle
[INFO] peel-core
[INFO] peel-extensions
[INFO] peel-empty-bundle
[INFO]                                                                         
[INFO] ------------------------------------------------------------------------
[INFO] Building peel 1.0-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO] 
[INFO] --- build-helper-maven-plugin:1.7:add-source (add-source) @ peel-parent ---
[INFO] Source directory: /Users/carabolic/Development/repos/peel/src/main/scala added.
[INFO] 
[INFO] --- scala-maven-plugin:3.1.6:compile (scala-compile-first) @ peel-parent ---
[INFO] No sources to compile
[INFO]                                                                         
[INFO] ------------------------------------------------------------------------
[INFO] Building peel-archetypes 1.0-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO] 
[INFO] --- build-helper-maven-plugin:1.7:add-source (add-source) @ peel-archetypes ---
[INFO] Source directory: /Users/carabolic/Development/repos/peel/peel-archetypes/src/main/scala added.
[INFO] 
[INFO] --- scala-maven-plugin:3.1.6:compile (scala-compile-first) @ peel-archetypes ---
[INFO] No sources to compile
[INFO]                                                                         
[INFO] ------------------------------------------------------------------------
[INFO] Building peel-flinkspark-bundle 1.0-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO] 
[INFO] --- build-helper-maven-plugin:1.7:add-source (add-source) @ peel-flinkspark-bundle ---
[INFO] Source directory: /Users/carabolic/Development/repos/peel/peel-archetypes/peel-flinkspark-bundle/src/main/scala added.
[INFO] 
[INFO] --- maven-resources-plugin:2.7:resources (default-resources) @ peel-flinkspark-bundle ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 32 resources
[INFO] 
[INFO] --- scala-maven-plugin:3.1.6:compile (scala-compile-first) @ peel-flinkspark-bundle ---
[INFO] No sources to compile
[INFO]                                                                         
[INFO] ------------------------------------------------------------------------
[INFO] Building peel-flink-bundle 1.0-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO] 
[INFO] --- build-helper-maven-plugin:1.7:add-source (add-source) @ peel-flink-bundle ---
[INFO] Source directory: /Users/carabolic/Development/repos/peel/peel-archetypes/peel-flink-bundle/src/main/scala added.
[INFO] 
[INFO] --- maven-resources-plugin:2.7:resources (default-resources) @ peel-flink-bundle ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 30 resources
[INFO] 
[INFO] --- scala-maven-plugin:3.1.6:compile (scala-compile-first) @ peel-flink-bundle ---
[INFO] No sources to compile
[INFO]                                                                         
[INFO] ------------------------------------------------------------------------
[INFO] Building peel-spark-bundle 1.0-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO] 
[INFO] --- build-helper-maven-plugin:1.7:add-source (add-source) @ peel-spark-bundle ---
[INFO] Source directory: /Users/carabolic/Development/repos/peel/peel-archetypes/peel-spark-bundle/src/main/scala added.
[INFO] 
[INFO] --- maven-resources-plugin:2.7:resources (default-resources) @ peel-spark-bundle ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 30 resources
[INFO] 
[INFO] --- scala-maven-plugin:3.1.6:compile (scala-compile-first) @ peel-spark-bundle ---
[INFO] No sources to compile
[INFO]                                                                         
[INFO] ------------------------------------------------------------------------
[INFO] Building peel-core 1.0-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO] 
[INFO] --- build-helper-maven-plugin:1.7:add-source (add-source) @ peel-core ---
[INFO] Source directory: /Users/carabolic/Development/repos/peel/peel-core/src/main/scala added.
[INFO] 
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ peel-core ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 3 resources
[INFO] 
[INFO] --- maven-compiler-plugin:2.3.2:compile (default-compile) @ peel-core ---
[INFO] Nothing to compile - all classes are up to date
[INFO] 
[INFO] --- scala-maven-plugin:3.1.6:compile (scala-compile-first) @ peel-core ---
[INFO] Using zinc server for incremental compilation
[info] Compiling 60 Scala sources to /Users/carabolic/Development/repos/peel/peel-core/target/classes...
[error] uncaught exception during compilation: java.lang.AssertionError
java.lang.AssertionError: assertion failed: 
     while compiling: /Users/carabolic/Development/repos/peel/peel-core/src/main/scala/org/peelframework/core/util/shell.scala
        during phase: jvm
     library version: version 2.10.5
    compiler version: version 2.10.5
  reconstructed args: -classpath [!!REDACTED!!] -feature

  last tree to typer: Literal(Constant(java.nio.channels.WritableByteChannel))
              symbol: null
   symbol definition: null
                 tpe: Class(classOf[java.nio.channels.WritableByteChannel])
       symbol owners: 
      context owners: anonymous class anonfun$1 -> package util

== Enclosing template or block ==

Template( // val <local $anonfun>: <notype>, tree.tpe=org.peelframework.core.util.anonfun$1
  "scala.runtime.AbstractFunction0", "scala.Serializable" // parents
  ValDef(
    private
    "_"
    <tpt>
    <empty>
  )
  // 4 statements
  DefDef( // final def apply(): String
    <method> final <triedcooking>
    "apply"
    []
    List(Nil)
    <tpt> // tree.tpe=String
    Apply( // def s(args: Seq): String in class StringContext, tree.tpe=String
      new StringContext(scala.this.Predef.wrapRefArray(Array[String]{"# ", ""}.$asInstanceOf[Array[Object]]()))."s" // def s(args: Seq): String in class StringContext, tree.tpe=(args: Seq)String
      Apply( // implicit def genericWrapArray(xs: Object): collection.mutable.WrappedArray in class LowPriorityImplicits, tree.tpe=collection.mutable.WrappedArray
        scala.this."Predef"."genericWrapArray" // implicit def genericWrapArray(xs: Object): collection.mutable.WrappedArray in class LowPriorityImplicits, tree.tpe=(xs: Object)collection.mutable.WrappedArray
        ArrayValue( // tree.tpe=Array[Object]
          <tpt> // tree.tpe=Object
          List(
            Apply( // final def format(x$1: java.util.Date): String in class DateFormat, tree.tpe=String
              TimeStamps$$anonfun$1.this."dateFormat$1"."format" // final def format(x$1: java.util.Date): String in class DateFormat, tree.tpe=(x$1: java.util.Date)String
              Apply( // def <init>(): java.util.Date in class Date, tree.tpe=java.util.Date
                new java.util.Date."<init>" // def <init>(): java.util.Date in class Date, tree.tpe=()java.util.Date
                Nil
              )
            )
          )
        )
      )
    )
  )
  DefDef( // final def apply(): Object
    <method> final <bridge>
    "apply"
    []
    List(Nil)
    <tpt> // tree.tpe=Object
    Apply( // final def apply(): String, tree.tpe=String
      TimeStamps$$anonfun$1.this."apply" // final def apply(): String, tree.tpe=()String
      Nil
    )
  )
  ValDef( // private[this] val dateFormat$1: java.text.SimpleDateFormat
    private <local> <synthetic> <paramaccessor> <triedcooking>
    "dateFormat$1"
    <tpt> // tree.tpe=java.text.SimpleDateFormat
    <empty>
  )
  DefDef( // def <init>(arg$outer: org.peelframework.core.util.OutputStreamProcessLogger,dateFormat$1: java.text.SimpleDateFormat): org.peelframework.core.util.anonfun$1
    <method> <triedcooking>
    "<init>"
    []
    // 1 parameter list
    ValDef( // $outer: org.peelframework.core.util.OutputStreamProcessLogger
      <param>
      "$outer"
      <tpt> // tree.tpe=org.peelframework.core.util.OutputStreamProcessLogger
      <empty>
    )
    ValDef( // dateFormat$1: java.text.SimpleDateFormat
      <param> <synthetic> <triedcooking>
      "dateFormat$1"
      <tpt> // tree.tpe=java.text.SimpleDateFormat
      <empty>
    )
    <tpt> // tree.tpe=org.peelframework.core.util.anonfun$1
    Block( // tree.tpe=Unit
      // 2 statements
      Assign( // tree.tpe=Unit
        TimeStamps$$anonfun$1.this."dateFormat$1" // private[this] val dateFormat$1: java.text.SimpleDateFormat, tree.tpe=java.text.SimpleDateFormat
        "dateFormat$1" // dateFormat$1: java.text.SimpleDateFormat, tree.tpe=java.text.SimpleDateFormat
      )
      Apply( // def <init>(): scala.runtime.AbstractFunction0 in class AbstractFunction0, tree.tpe=scala.runtime.AbstractFunction0
        TimeStamps$$anonfun$1.super."<init>" // def <init>(): scala.runtime.AbstractFunction0 in class AbstractFunction0, tree.tpe=()scala.runtime.AbstractFunction0
        Nil
      )
      ()
    )
  )
)

== Expanded type of tree ==

ConstantType(value = Constant(java.nio.channels.WritableByteChannel))

how can getCommonSuperclass() do its job if different class symbols get the same bytecode-level internal name: org/peelframework/core/cli/command/rsync/package$FolderEntry
    at scala.tools.nsc.backend.jvm.GenASM$JBuilder.javaName(GenASM.scala:548)
    at scala.tools.nsc.backend.jvm.GenASM$JBuilder.addInnerClasses(GenASM.scala:637)
    at scala.tools.nsc.backend.jvm.GenASM$JMirrorBuilder.genMirrorClass(GenASM.scala:2978)
    at scala.tools.nsc.backend.jvm.GenASM$AsmPhase.run(GenASM.scala:114)
    at scala.tools.nsc.Global$Run.compileUnitsInternal(Global.scala:1583)
    at scala.tools.nsc.Global$Run.compileUnits(Global.scala:1557)
    at scala.tools.nsc.Global$Run.compileSources(Global.scala:1553)
    at scala.tools.nsc.Global$Run.compile(Global.scala:1662)
    at xsbt.CachedCompiler0.run(CompilerInterface.scala:116)
    at xsbt.CachedCompiler0.run(CompilerInterface.scala:95)
    at xsbt.CompilerInterface.run(CompilerInterface.scala:26)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at sbt.compiler.AnalyzingCompiler.call(AnalyzingCompiler.scala:101)
    at sbt.compiler.AnalyzingCompiler.compile(AnalyzingCompiler.scala:47)
    at sbt.compiler.AnalyzingCompiler.compile(AnalyzingCompiler.scala:41)
    at sbt.compiler.AggressiveCompile$$anonfun$3$$anonfun$compileScala$1$1.apply$mcV$sp(AggressiveCompile.scala:97)
    at sbt.compiler.AggressiveCompile$$anonfun$3$$anonfun$compileScala$1$1.apply(AggressiveCompile.scala:97)
    at sbt.compiler.AggressiveCompile$$anonfun$3$$anonfun$compileScala$1$1.apply(AggressiveCompile.scala:97)
    at sbt.compiler.AggressiveCompile.sbt$compiler$AggressiveCompile$$timed(AggressiveCompile.scala:162)
    at sbt.compiler.AggressiveCompile$$anonfun$3.compileScala$1(AggressiveCompile.scala:96)
    at sbt.compiler.AggressiveCompile$$anonfun$3.apply(AggressiveCompile.scala:139)
    at sbt.compiler.AggressiveCompile$$anonfun$3.apply(AggressiveCompile.scala:86)
    at sbt.inc.IncrementalCompile$$anonfun$doCompile$1.apply(Compile.scala:38)
    at sbt.inc.IncrementalCompile$$anonfun$doCompile$1.apply(Compile.scala:36)
    at sbt.inc.IncrementalCommon.cycle(IncrementalCommon.scala:31)
    at sbt.inc.Incremental$$anonfun$1.apply(Incremental.scala:39)
    at sbt.inc.Incremental$$anonfun$1.apply(Incremental.scala:38)
    at sbt.inc.Incremental$.manageClassfiles(Incremental.scala:66)
    at sbt.inc.Incremental$.compile(Incremental.scala:38)
    at sbt.inc.IncrementalCompile$.apply(Compile.scala:26)
    at sbt.compiler.AggressiveCompile.compile2(AggressiveCompile.scala:153)
    at sbt.compiler.AggressiveCompile.compile1(AggressiveCompile.scala:70)
    at com.typesafe.zinc.Compiler.compile(Compiler.scala:201)
    at com.typesafe.zinc.Compiler.compile(Compiler.scala:183)
    at com.typesafe.zinc.Compiler.compile(Compiler.scala:174)
    at com.typesafe.zinc.Main$.run(Main.scala:98)
    at com.typesafe.zinc.Nailgun$.zinc(Nailgun.scala:93)
    at com.typesafe.zinc.Nailgun$.nailMain(Nailgun.scala:82)
    at com.typesafe.zinc.Nailgun.nailMain(Nailgun.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at com.martiansoftware.nailgun.NGSession.run(NGSession.java:280)
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] 
[INFO] peel ............................................... SUCCESS [  2.091 s]
[INFO] peel-archetypes .................................... SUCCESS [  0.067 s]
[INFO] peel-flinkspark-bundle ............................. SUCCESS [  1.033 s]
[INFO] peel-flink-bundle .................................. SUCCESS [  0.514 s]
[INFO] peel-spark-bundle .................................. SUCCESS [  0.274 s]
[INFO] peel-core .......................................... FAILURE [ 31.706 s]
[INFO] peel-extensions .................................... SKIPPED
[INFO] peel-empty-bundle .................................. SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 37.475 s
[INFO] Finished at: 2016-02-23T14:28:41+01:00
[INFO] Final Memory: 16M/107M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal net.alchim31.maven:scala-maven-plugin:3.1.6:compile (scala-compile-first) on project peel-core: Execution scala-compile-first of goal net.alchim31.maven:scala-maven-plugin:3.1.6:compile failed. CompileFailed -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/PluginExecutionException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :peel-core

Error when flink.conf file does not contain variable 'system.flink.config.yaml.taskmanager.tmp.dirs'

The flink.reference.conf file does not contain this variable and the Flink.scala class tries to access it (line 35), resulting in an error when it is not defined by the user in the flink.conf file.

Implement cancelJob() methods

The Experiment class provides a cancelJob() method to force job cancellation if the experiment run exceeds the granted time limit.

The method is yet to be implemented in all of the following systems:

Flink
Spark
MapReduce

Wait on system shutdown until every process is down.

The stop() method in most system beans currently just executes a system stop command through the shell and returns.

When multiple experiments configured with the Experiment lifespan use the same system, re-starting might be too fast. The stop() method should therefore explicitly block and wait until all system services are down.

wordcount.default suite fails on second attempt

Following the instruction on the getting started page I could get peel to work and run the wordcount.default suite once (except for the spark part that will fail as mentioned in #61) but when I try it a second time the following error is thrown:

...
15-11-11 15:29:48 [INFO] +-- Loading experiment configuration
15-11-11 15:29:48 [INFO] +-- Loading current runtime values as configuration
15-11-11 15:29:48 [INFO] +-- Loading system properties as configuration
15-11-11 15:29:48 [INFO] `-- Resolving configuration
15-11-11 15:29:48 [INFO] Executing experiments in suite
15-11-11 15:29:48 [INFO] ############################################################
15-11-11 15:29:48 [INFO] Tearing down systems with SUITE lifespan
Unexpected error: No configuration setting found for key 'system'
com.typesafe.config.ConfigException$Missing: No configuration setting found for key 'system'
    at com.typesafe.config.impl.SimpleConfig.findKey(SimpleConfig.java:124)
    at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:147)
    at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:159)
    at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:164)
    at com.typesafe.config.impl.SimpleConfig.getString(SimpleConfig.java:206)
    at org.peelframework.hadoop.beans.system.HDFS2.isRunning(HDFS2.scala:132)
    at org.peelframework.core.beans.system.System.tearDown(System.scala:84)
    at org.peelframework.core.cli.command.suite.Run$$anonfun$run$3.apply(Run.scala:187)
    at org.peelframework.core.cli.command.suite.Run$$anonfun$run$3.apply(Run.scala:186)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at org.peelframework.core.cli.command.suite.Run.run(Run.scala:186)
    at org.peelframework.core.cli.Peel$.main(Peel.scala:104)
    at org.peelframework.core.cli.Peel.main(Peel.scala)

The issue is reproducible and has been confirmed on a mac and linux machine.

Add controller logic to the experiment / system run() methods.

The Archetype org.peelframework:peel-flink-bundle generates empty directory

When creating a new bundle using mvn archetype:generate as follow:

mvn archetype:generate -B -DarchetypeCatalog=local -Dpackage="org.peelframework.carabolic" -DgroupId="org.peelframework.carabolic" -DartifactId="example" -DarchetypeGroupId=org.peelframework -DarchetypeArtifactId="peel-flink-bundle" -DarchetypeVersion="1.0-SNAPSHOT"

creates an empty peel-wordcount-spark-jobs directory.

├── LICENSE
├── README.md
├── example-bundle
│   ├── pom.xml
│   └── src
├── example-datagens
│   ├── pom.xml
│   └── src
├── example-flink-jobs
│   ├── pom.xml
│   └── src
├── example-peelextensions
│   ├── pom.xml
│   └── src
├── peel-wordcount-spark-jobs
└── pom.xml

Add support for Spark 1.5.2 and 1.6.0

Add Spark bean definitions and reference configs for versions 1.5.2 and 1.6.0.

@harrygav: maybe you can take over this issue with some assistance from @carabolic.

The PRs should look quite similar to this one:

a338539

Peel's LogCollection messed up by NFS?

Peel's LogCollection is responsible for copying the part of the log (of a system) corresponding to a specific run into the results folder. However, when a run is short enough (in my case ~30s) it happens that only 0 bytes are copied for some workers. I could verify it for a custom peel extension as well as for the HDFS2 system running on a cluster of 11 nodes (wally006-017).

I assume that the problem is that the underlying NFS does not keep up with the synchronization of the logs so that the master node has some old state for a log file (maybe client write caching?).

I know that the problem is not Peel specific. However, since Peel's LogCollection assumes a setup with some kind of file synchronization between nodes, this behaviour should (at least) be considered.

Possible fix: Create a similar LogCollection (maybe in parallel to the current one) which relies on copying the log files from workers to master (via scp f.e.). Such a LogCollection would also be useful for system with a lot of log files (since they may otherwise create too much pressure on the NFS).

Migrate systems to DistributedLogCollection

The regular LogCollection logic does not work due to large latencies when reading remotely via NFS.

We already migrated Flink and DStat to the DistributedLogCollection trait, and I suggest to do so with all other systems which produce logs on multiple hosts.