simianarmy's Introduction

PROJECT STATUS: RETIRED

The Simian Army project is no longer actively maintained. Some of the Simian Army functionality has been moved to other Netflix projects:

  • A newer version of Chaos Monkey is available as a standalone service.
  • Swabbie is a new standalone service that will replace the functionality provided by Janitor Monkey.
  • Conformity Monkey functionality will be rolled into other Spinnaker backend services.

DESCRIPTION

The Simian Army is a suite of tools for keeping your cloud operating in top form. Chaos Monkey, the first member, is a resiliency tool that helps ensure that your applications can tolerate random instance failures.

DETAILS

Please see the wiki.

SUPPORT

Simian Army Google group

Because the project is no longer maintained, there is a good chance that nobody will be able to answer a support question.

LICENSE

Copyright 2012-2016 Netflix, Inc.

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

simianarmy's People

Contributors

abasiri, alfasin, algra, archthegit, benjchristensen, boldfield, codiverse, coryb, dipthegeezer, dmourati, drgranit, ebukoski, elegantmush, ervansetiawan, gnethercutt, gorzell, ingmarkrusch, jeyrschabu, justinsb, lorin, mgeis, michaelnflx, mlafeldt, nicktgr15, pebo, radonsky, robfletcher, robzienert, rspieldenner, tmack8001

simianarmy's Issues

java.lang.NullPointerException on BasicJanitorRuleEngine.java:76 with a leashed janitor monkey

I've been running Simian Army, specifically Janitor Monkey, in leashed mode.

I get the following exception:

java.lang.NullPointerException
    at java.util.Date.getMillisOf(Date.java:956)
    at java.util.Date.after(Date.java:929)
    at com.netflix.simianarmy.basic.janitor.BasicJanitorRuleEngine.isValid(BasicJanitorRuleEngine.java:76)
    at com.netflix.simianarmy.janitor.AbstractJanitor.markResources(AbstractJanitor.java:209)
    at com.netflix.simianarmy.basic.janitor.BasicJanitorMonkey.doMonkeyBusiness(BasicJanitorMonkey.java:95)
    at com.netflix.simianarmy.Monkey.run(Monkey.java:134)
    at com.netflix.simianarmy.Monkey$1.run(Monkey.java:155)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

From what I understand, the NullPointerException at BasicJanitorRuleEngine.java:76 occurs because clone.getExpectedTerminationTime() returns null, since there have been no successful unleashed runs of Janitor Monkey.

This code needs to handle null values better, as java.util.Date.after cannot compare against a null date.
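A minimal sketch of the kind of null guard described here. The class and method names (NullSafeDates, isAfter) are hypothetical and not part of SimianArmy, and treating a missing date as "not after" is an assumption; the actual fix may need a different default.

```java
import java.util.Date;

/** Hypothetical helper, not part of SimianArmy. */
final class NullSafeDates {
    private NullSafeDates() { }

    /**
     * Returns true only when both dates are present and candidate is
     * strictly later than reference. Date.after(null) throws a
     * NullPointerException, so null is handled here before delegating.
     */
    static boolean isAfter(Date candidate, Date reference) {
        if (candidate == null || reference == null) {
            return false; // assumption: a missing date never counts as "after"
        }
        return candidate.after(reference);
    }
}
```

A caller would use NullSafeDates.isAfter(a, b) in place of a.after(b) wherever either date may be unset.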

SimianArmy build fails on :test

I cloned SimianArmy from git://github.com/Netflix/SimianArmy.git today and tried to build it by running ".\gradlew build --info".

The build failed: 261 tests completed, 5 failed.

Here is the output on the console:
:test (Thread[main,5,main]) started.
:test
Executing task ':test' (up-to-date check took 0.952 secs) due to:
No history is available.
Starting process 'Gradle Test Executor 1'. Working directory: D:\SimianArmy\SimianArmy Command: C:\Program Files\Java\jdk1.7.0_71\bin\java.exe -Djava.security.manager=jarjar.org.gradle.process.internal.child.BootstrapSecurityManager -Dfile.encoding=windows-1252 -ea -cp C:\Users\fengqh.gradle\caches\1.12\workerMain\gradle-worker.jar jarjar.org.gradle.process.internal.launcher.GradleWorkerMain
Successfully started process 'Gradle Test Executor 1'
Gradle Test Executor 1 started executing tests.
2014-10-29 20:46:18.262 - ERROR AbstractJanitor - [AbstractJanitor.java:298] Failed to clean up the resource 11.
java.lang.RuntimeException: Magic number of id.
at com.netflix.simianarmy.janitor.TestAbstractJanitor.cleanup(TestAbstractJanitor.java:81)
at com.netflix.simianarmy.janitor.AbstractJanitor.cleanupResources(AbstractJanitor.java:293)
at com.netflix.simianarmy.janitor.TestAbstractJanitor.testJanitorWithCleanupFailure(TestAbstractJanitor.java:223)

KillProcess doesn't kill the processes in the instance

Hi All,

I am trying to test the "KillProcess" failure in my setup. The Killprocess.sh file has the following code:

#!/bin/bash
# Script for KillProcesses Chaos Monkey

cat << EOF > /tmp/kill_loop.sh
#!/bin/bash
while true;
do
    pkill -KILL -f java
    pkill -KILL -f python
    sleep 1
done
EOF

nohup /bin/bash /tmp/kill_loop.sh &

I can still see the Java and Python processes running in the instance even after this failure is injected.
I tried adding an additional process, such as a calculator, but it doesn't work as expected either. When I execute the file manually, I can see the processes getting killed.

Could anyone suggest whether I need to make any code changes?

Thanks

Add support for Amazon API client proxy configuration

Currently there is no facility for passing proxy configuration to the various Amazon clients utilized by SimianArmy. Support for using proxies would be greatly appreciated by those of us stuck behind them... (pull request to follow).

Build fails

The build fails with the latest code:

C:\GitHub\SimianArmy [master]> ./gradlew jettyRun
Downloading http://services.gradle.org/distributions/gradle-1.5-bin.zip
.......................................................................
.......................................................................
.......................................................................
.......................................................................
.......................................................................
.......................................................................
.......................................................................
.......................................................................
.......................................................................
.......................................................................
.......................................................................
.......................................................................

C:\GitHub\SimianArmy\src\main\java\com\netflix\simianarmy\basic\LocalDbRecorder.java:35: cannot find symbol
symbol : class Utils
location: package org.mapdb
import org.mapdb.Utils;
^
C:\GitHub\SimianArmy\src\main\java\com\netflix\simianarmy\basic\LocalDbRecorder.java:79: cannot find symbol
symbol : variable Utils
location: class com.netflix.simianarmy.basic.LocalDbRecorder
dbFile = (dbFilename == null)? Utils.tempDbFile() : new File(dbFilename);
^
2 errors
:compileJava FAILED

FAILURE: Build failed with an exception.

jettyRun execution stops at 75% with AmazonHttpClient - [AmazonHttpClient.java:448] Unable to execute HTTP request: connect timed out java.net.SocketTimeoutException: connect timed out

jettyRun does not complete successfully; it fails with the following error:

Building 75% > :jettyRun > Starting
INFO AmazonHttpClient - [AmazonHttpClient.java:448] Unable to execute HTTP request: connect timed out
java.net.SocketTimeoutException: connect timed out

INFO c.n.s.basic.BasicMonkeyServer - Adding Janitor Monkey.
unavailable
java.lang.NullPointerException
at com.netflix.simianarmy.MonkeyRunner.replaceMonkey(MonkeyRunner.java:140)
at com.netflix.simianarmy.basic.BasicMonkeyServer.addMonkeysToRun(BasicMonkeyServer.java:57)
at com.netflix.simianarmy.basic.BasicMonkeyServer.init(BasicMonkeyServer.java:78)
at javax.servlet.GenericServlet.init(GenericServlet.java:241)
at org.mortbay.jetty.servlet.ServletHolder.initServlet(ServletHolder.java:440)
at org.mortbay.jetty.servlet.ServletHolder.doStart(ServletHolder.java:263)
at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:685)
at org.mortbay.jetty.servlet.Context.startContext(Context.java:140)
at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1272)
at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:517)
at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:489)
at org.gradle.api.plugins.jetty.internal.JettyPluginWebAppContext.doStart(JettyPluginWebAppContext.java:112)
at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:156)
at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:130)
at org.mortbay.jetty.Server.doStart(Server.java:224)

The SimianArmy build itself was successful.

build.gradle cleanup

I think log4j should be a runtime dependency rather than a compile dependency, and servlet-api should be a compile dependency:

// needed for findbug Missing Class: org.slf4j.impl.StaticLoggerBinder
providedCompile 'org.slf4j:slf4j-log4j12:1.6.1'
providedCompile 'javax.servlet:servlet-api:2.5'

Difficulty in performing chaos monkey burn CPU strategy

I am using the Netflix Simian Army tool and am trying to run the Chaos Monkey Burn CPU strategy on an Ubuntu AWS EC2 instance, but it does not work. What could be the reason? The output is given below.

These are the lines I have set in the chaos properties configuration file:

simianarmy.chaos.ssh.user= ubuntu

simianarmy.chaos.ssh.key = D:\Intern.pem

simianarmy.chaos.burncpu.enabled = true

2015-04-01 23:14:00.911 - INFO MonkeyRunner - [MonkeyRunner.java:56] Starting CHAOS Monkey
2015-04-01 23:14:01.943 - INFO Monkey - [Monkey.java:132] CHAOS Monkey Running ...
2015-04-01 23:14:01.951 - INFO MonkeyRunner - [MonkeyRunner.java:56] Starting VOLUME_TAGGING Monkey
2015-04-01 23:14:01.964 - INFO AWSClient - [AWSClient.java:266] Getting all auto-scaling groups in region ap-northeast-1.
2015-04-01 23:14:02.185 - INFO Monkey - [Monkey.java:132] VOLUME_TAGGING Monkey Running ...
2015-04-01 23:14:02.185 - INFO VolumeTaggingMonkey - [VolumeTaggingMonkey.java:138] Volume tagging monkey is not enabled. You can set simianarmy.volumeTagging.enabled to true to enable it.
2015-04-01 23:14:02.185 - INFO Monkey - [Monkey.java:138] Reporting what I did...

2015-04-01 23:14:02.211 - INFO MonkeyRunner - [MonkeyRunner.java:56] Starting JANITOR Monkey
2015-04-01 23:14:02.412 - INFO Monkey - [Monkey.java:132] JANITOR Monkey Running ...
2015-04-01 23:14:02.415 - INFO BasicJanitorMonkey - [BasicJanitorMonkey.java:218] JanitorMonkey disabled, set simianarmy.janitor.enabled=true
2015-04-01 23:14:02.415 - INFO Monkey - [Monkey.java:138] Reporting what I did...

2015-04-01 23:14:02.440 - INFO MonkeyRunner - [MonkeyRunner.java:56] Starting CONFORMITY Monkey
2015-04-01 23:14:02.643 - INFO Monkey - [Monkey.java:132] CONFORMITY Monkey Running ...
2015-04-01 23:14:02.644 - INFO BasicConformityMonkey - [BasicConformityMonkey.java:244] Conformity Monkey is disabled, set simianarmy.conformity.enabled=true
2015-04-01 23:14:02.651 - INFO Monkey - [Monkey.java:138] Reporting what I did...

2015-04-01 23:14:03.291 - INFO AWSClient - [AWSClient.java:287] Got 1 auto-scaling groups in region ap-northeast-1.
2015-04-01 23:14:05.039 - INFO BasicChaosMonkey - [BasicChaosMonkey.java:276] Group monkey [type ASG] enabled [prob 6.0]
2015-04-01 23:14:05.051 - INFO BasicChaosInstanceSelector - [BasicChaosInstanceSelector.java:83] Group monkey [type ASG] has disabled probability: 0.0
2015-04-01 23:14:05.051 - INFO BasicChaosInstanceSelector - [BasicChaosInstanceSelector.java:65] Randomly selecting 1 from 1 instances, excluding null
2015-04-01 23:14:16.862 - INFO Monkey - [Monkey.java:138] Reporting what I did...

After this line I do not get any further output.
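One detail in the properties quoted in this report may be worth checking: the key path is written with a single backslash (D:\Intern.pem). Assuming the configuration file is read with java.util.Properties (an assumption about how it is loaded here), a backslash in a value is an escape character, and a backslash before an ordinary letter is silently dropped. The sketch below uses a hypothetical demo class to show what Properties returns for three ways of writing the path; it does not touch SimianArmy itself.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical demo class; only java.util.Properties behavior is shown. */
final class PropertiesBackslashDemo {
    private PropertiesBackslashDemo() { }

    /** Parses one "key = value" line as a properties file and returns the value. */
    static String loadValue(String line, String key) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(line));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        String key = "simianarmy.chaos.ssh.key";
        // In Java source, "\\" stands for one backslash in the file text.
        System.out.println(loadValue(key + " = D:\\Intern.pem", key));   // file has D:\Intern.pem
        System.out.println(loadValue(key + " = D:\\\\Intern.pem", key)); // file has D:\\Intern.pem
        System.out.println(loadValue(key + " = D:/Intern.pem", key));    // file has D:/Intern.pem
    }
}
```

Run as written, this prints D:Intern.pem, then D:\Intern.pem, then D:/Intern.pem. If the file is loaded this way, doubling the backslash or using a forward slash keeps the path intact.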

Publish Artifact

Is it possible to publish the WAR artifact the way some of the other Netflix projects do? This would allow for things like building a Dockerfile around it.

Is opting out of the janitor persistent across SimianArmy restart?

I thought I had opted out of an ASG being cleaned up:

curl http://localhost:8080/simianarmy/api/v1/janitor -d '{"eventType":"OPTOUT","resourceId":"ct_engine-CTE-133-20130711124528"}'

but at some point the ASG was removed. I stop and start SimianArmy frequently (we turn off most of our dev & qa apps during non-work hours), and I was wondering if the problem was that such an opt out is not persistent across restarts?

thanks, Mitchell

Unable to create SimpleDB

I've set the ARN in client.properties to: simianarmy.client.aws.assumeRoleArn = arn:aws:iam::116738426468:user/simianarmy. This user is in a group that has Allow * privileges.

WARN  SimpleDBRecorder - [SimpleDBRecorder.java:287] Error while trying to auto-create SimpleDB domain
com.amazonaws.AmazonClientException: Unable to load AWS credentials from any provider in the chain
    at com.amazonaws.auth.AWSCredentialsProviderChain.getCredentials(AWSCredentialsProviderChain.java:117)
    at com.amazonaws.services.simpledb.AmazonSimpleDBClient.invoke(AmazonSimpleDBClient.java:950)
    at com.amazonaws.services.simpledb.AmazonSimpleDBClient.listDomains(AmazonSimpleDBClient.java:660)
    at com.amazonaws.services.simpledb.AmazonSimpleDBClient.listDomains(AmazonSimpleDBClient.java:912)
    at com.netflix.simianarmy.aws.SimpleDBRecorder.init(SimpleDBRecorder.java:275)
    at com.netflix.simianarmy.basic.BasicSimianArmyContext.createRecorder(BasicSimianArmyContext.java:159)
    at com.netflix.simianarmy.basic.BasicSimianArmyContext.<init>(BasicSimianArmyContext.java:122)
    at com.netflix.simianarmy.basic.BasicChaosMonkeyContext.<init>(BasicChaosMonkeyContext.java:50)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:534)
    at java.lang.Class.newInstance(Class.java:374)
    at com.netflix.simianarmy.MonkeyRunner.factory(MonkeyRunner.java:229)
    at com.netflix.simianarmy.MonkeyRunner.replaceMonkey(MonkeyRunner.java:145)
    at com.netflix.simianarmy.basic.BasicMonkeyServer.addMonkeysToRun(BasicMonkeyServer.java:53)
    at com.netflix.simianarmy.basic.BasicMonkeyServer.init(BasicMonkeyServer.java:78)
    at javax.servlet.GenericServlet.init(GenericServlet.java:241)
    at org.mortbay.jetty.servlet.ServletHolder.initServlet(ServletHolder.java:440)
    at org.mortbay.jetty.servlet.ServletHolder.doStart(ServletHolder.java:263)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
    at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:685)
    at org.mortbay.jetty.servlet.Context.startContext(Context.java:140)
    at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1272)
    at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:517)
    at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:489)
    at org.gradle.api.plugins.jetty.internal.JettyPluginWebAppContext.doStart(JettyPluginWebAppContext.java:112)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
    at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
    at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:156)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
    at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
    at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:130)
    at org.mortbay.jetty.Server.doStart(Server.java:224)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
    at org.gradle.api.plugins.jetty.internal.Jetty6PluginServer.start(Jetty6PluginServer.java:111)
    at org.gradle.api.plugins.jetty.AbstractJettyRunTask.startJettyInternal(AbstractJettyRunTask.java:238)
    at org.gradle.api.plugins.jetty.AbstractJettyRunTask.startJetty(AbstractJettyRunTask.java:191)
    at org.gradle.api.plugins.jetty.AbstractJettyRunTask.start(AbstractJettyRunTask.java:162)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:622)
    at org.gradle.internal.reflect.JavaMethod.invoke(JavaMethod.java:63)
    at org.gradle.api.internal.project.taskfactory.AnnotationProcessingTaskFactory$StandardTaskAction.doExecute(AnnotationProcessingTaskFactory.java:219)
    at org.gradle.api.internal.project.taskfactory.AnnotationProcessingTaskFactory$StandardTaskAction.execute(AnnotationProcessingTaskFactory.java:212)
    at org.gradle.api.internal.project.taskfactory.AnnotationProcessingTaskFactory$StandardTaskAction.execute(AnnotationProcessingTaskFactory.java:201)
    at org.gradle.api.internal.AbstractTask$TaskActionWrapper.execute(AbstractTask.java:533)
    at org.gradle.api.internal.AbstractTask$TaskActionWrapper.execute(AbstractTask.java:516)
    at org.gradle.api.internal.tasks.execution.ExecuteActionsTaskExecuter.executeAction(ExecuteActionsTaskExecuter.java:80)
    at org.gradle.api.internal.tasks.execution.ExecuteActionsTaskExecuter.executeActions(ExecuteActionsTaskExecuter.java:61)
    at org.gradle.api.internal.tasks.execution.ExecuteActionsTaskExecuter.execute(ExecuteActionsTaskExecuter.java:46)
    at org.gradle.api.internal.tasks.execution.PostExecutionAnalysisTaskExecuter.execute(PostExecutionAnalysisTaskExecuter.java:35)
    at org.gradle.api.internal.tasks.execution.SkipUpToDateTaskExecuter.execute(SkipUpToDateTaskExecuter.java:64)
    at org.gradle.api.internal.tasks.execution.ValidatingTaskExecuter.execute(ValidatingTaskExecuter.java:58)
    at org.gradle.api.internal.tasks.execution.SkipEmptySourceFilesTaskExecuter.execute(SkipEmptySourceFilesTaskExecuter.java:42)
    at org.gradle.api.internal.tasks.execution.SkipTaskWithNoActionsExecuter.execute(SkipTaskWithNoActionsExecuter.java:52)
    at org.gradle.api.internal.tasks.execution.SkipOnlyIfTaskExecuter.execute(SkipOnlyIfTaskExecuter.java:53)
    at org.gradle.api.internal.tasks.execution.ExecuteAtMostOnceTaskExecuter.execute(ExecuteAtMostOnceTaskExecuter.java:43)
    at org.gradle.api.internal.AbstractTask.executeWithoutThrowingTaskFailure(AbstractTask.java:289)
    at org.gradle.execution.taskgraph.AbstractTaskPlanExecutor$TaskExecutorWorker.executeTask(AbstractTaskPlanExecutor.java:79)
    at org.gradle.execution.taskgraph.AbstractTaskPlanExecutor$TaskExecutorWorker.processTask(AbstractTaskPlanExecutor.java:63)
    at org.gradle.execution.taskgraph.AbstractTaskPlanExecutor$TaskExecutorWorker.run(AbstractTaskPlanExecutor.java:51)
    at org.gradle.execution.taskgraph.DefaultTaskPlanExecutor.process(DefaultTaskPlanExecutor.java:23)
    at org.gradle.execution.taskgraph.DefaultTaskGraphExecuter.execute(DefaultTaskGraphExecuter.java:86)
    at org.gradle.execution.SelectedTaskExecutionAction.execute(SelectedTaskExecutionAction.java:29)
    at org.gradle.execution.DefaultBuildExecuter.execute(DefaultBuildExecuter.java:61)
    at org.gradle.execution.DefaultBuildExecuter.access$200(DefaultBuildExecuter.java:23)
    at org.gradle.execution.DefaultBuildExecuter$2.proceed(DefaultBuildExecuter.java:67)
    at org.gradle.execution.DryRunBuildExecutionAction.execute(DryRunBuildExecutionAction.java:32)
    at org.gradle.execution.DefaultBuildExecuter.execute(DefaultBuildExecuter.java:61)
    at org.gradle.execution.DefaultBuildExecuter.execute(DefaultBuildExecuter.java:54)
    at org.gradle.initialization.DefaultGradleLauncher.doBuildStages(DefaultGradleLauncher.java:166)
    at org.gradle.initialization.DefaultGradleLauncher.doBuild(DefaultGradleLauncher.java:113)
    at org.gradle.initialization.DefaultGradleLauncher.run(DefaultGradleLauncher.java:81)
    at org.gradle.launcher.exec.InProcessBuildActionExecuter$DefaultBuildController.run(InProcessBuildActionExecuter.java:64)
    at org.gradle.launcher.cli.ExecuteBuildAction.run(ExecuteBuildAction.java:33)
    at org.gradle.launcher.cli.ExecuteBuildAction.run(ExecuteBuildAction.java:24)
    at org.gradle.launcher.exec.InProcessBuildActionExecuter.execute(InProcessBuildActionExecuter.java:35)
    at org.gradle.launcher.exec.InProcessBuildActionExecuter.execute(InProcessBuildActionExecuter.java:26)
    at org.gradle.launcher.cli.RunBuildAction.run(RunBuildAction.java:50)
    at org.gradle.internal.Actions$RunnableActionAdapter.execute(Actions.java:171)
    at org.gradle.launcher.cli.CommandLineActionFactory$ParseAndBuildAction.execute(CommandLineActionFactory.java:201)
    at org.gradle.launcher.cli.CommandLineActionFactory$ParseAndBuildAction.execute(CommandLineActionFactory.java:174)
    at org.gradle.launcher.cli.CommandLineActionFactory$WithLogging.execute(CommandLineActionFactory.java:170)
    at org.gradle.launcher.cli.CommandLineActionFactory$WithLogging.execute(CommandLineActionFactory.java:139)
    at org.gradle.launcher.cli.ExceptionReportingAction.execute(ExceptionReportingAction.java:33)
    at org.gradle.launcher.cli.ExceptionReportingAction.execute(ExceptionReportingAction.java:22)
    at org.gradle.launcher.Main.doAction(Main.java:46)
    at org.gradle.launcher.bootstrap.EntryPoint.run(EntryPoint.java:45)
    at org.gradle.launcher.Main.main(Main.java:37)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:622)
    at org.gradle.launcher.bootstrap.ProcessBootstrap.runNoExit(ProcessBootstrap.java:50)
    at org.gradle.launcher.bootstrap.ProcessBootstrap.run(ProcessBootstrap.java:32)
    at org.gradle.launcher.GradleMain.main(GradleMain.java:23)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:622)
    at org.gradle.wrapper.BootstrapMainStarter.start(BootstrapMainStarter.java:30)
    at org.gradle.wrapper.WrapperExecutor.execute(WrapperExecutor.java:127)
    at org.gradle.wrapper.GradleWrapperMain.main(GradleWrapperMain.java:55)

Problems building the army

Trying to build the army just now but the build script stumbles on this pesky error that comes up while building "findbugs". This breaks the build and leads to high CPU and MEM utilization. I have captured the error message here:


06:49:34.469 [DEBUG] [system.out] [DEBUG] [org.gradle.process.internal.child.ActionExecutionWorker] Starting Gradle Worker 1.
06:49:34.477 [DEBUG] [system.out] 06:49:34.477 [DEBUG] [org.gradle.api.plugins.quality.internal.findbugs.FindBugsWorkerServer] Executing findbugs worker.
06:49:34.502 [DEBUG] [system.out] 06:49:34.501 [DEBUG] [org.gradle.messaging.remote.internal.Router] Received route available. Message: [ConsumerAvailable id: 4ef27df-d6ca-4a87-b878-34462ca228fb, displayName: message server, channel: org.gradle.api.plugins.quality.internal.findbugs.FindBugsWorkerClientProtocol]
Scanning archives (139 / 169)out] 06:49:41.423 [QUIET] [system.out] Scanning archives (169 / 169)
2 analysis passes to perform.out] 06:49:42.208 [QUIET] [system.out]
Pass 1: Analyzing classes (38 / 348) - 10% completeET] [system.out] lPass 1: Analyzing classes (77 / 348) - 22% complete nPass 1: Analyzing classes (116 / 348) - 33% complete cPass 1: Analyzing classes (194 / 348) - 55% complete 3Pass 1: Analyzing classes (232 / 348) - 66% complete iPass 1: Analyzing classes (271 / 348) - 77% complete lPass 1: Analyzing classes (319 / 348) - 88% complete 0Pass 1: Analyzing classes (348 / 348) - 100% complete
06:51:51.733 [DEBUG] [org.gradle.process.internal.DefaultExecHandle] Changing state to: FAILED
06:51:51.733 [INFO] [org.gradle.process.internal.DefaultExecHandle] Process 'Gradle Worker 1' finished with exit value 137 (state: FAILED)


I had the software build and run fine early last week so this could be related to the latest commits.

My setup is: Amazon Linux 64-bit on a micro instance.
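The exit value 137 in the log can be decoded. By common Unix convention, an exit value above 128 means the process was ended by a signal whose number is the value minus 128, so 137 corresponds to signal 9 (SIGKILL). On a small instance, one frequent source of SIGKILL is the kernel's out-of-memory killer, although the log alone does not confirm that this is what happened here. A minimal sketch of the arithmetic, using a hypothetical helper class:

```java
/** Hypothetical helper for reading shell-style exit values. */
final class ExitValues {
    private ExitValues() { }

    /**
     * Returns the signal number implied by an exit value in the range
     * 129..255 (value - 128), or -1 when no signal is implied.
     */
    static int signalNumber(int exitValue) {
        return (exitValue > 128 && exitValue < 256) ? exitValue - 128 : -1;
    }

    public static void main(String[] args) {
        System.out.println(signalNumber(137)); // prints 9, the number of SIGKILL
    }
}
```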

Error while doing gradle-1.6 build for simian army

Hi,

We are working on a Simian Army POC project and have taken the source code from Netflix.
We have set up the environment on our local machine and are trying to build using Gradle.
We set the gradle-wrapper version to 1.6. When we run the gradle build command, it downloads
http://dist.codehaus.org/gradle/gradle-0.9-rc-3-bin.zip and then gives the error below.

A problem occurred evaluating script.
Cause: No signature of method: org.gradle.api.internal.initialization.DefaultScriptHandler.maven() is applicable for argument types: (buildscript_gradle_8e144301b5a9a410ba0da9dfa59bd03b$_run_closure1_closure3) values: [buildscript_gradle_8e144301b5a9a410ba0da9dfa59bd03b$_run_closure1_closure3@159d510]
Possible solutions: wait(), any(), every(), wait(long), any(groovy.lang.Closure), grep(java.lang.Object)

Please help us in building code using gradle.

HTTP/1.1 401 Unauthorized error while connecting to ec2.eu-central via chaos monkey.

2015-01-19 05:37:05.028 - DEBUG SLF4JLogger - [SLF4JLogger.java:61] Sending request -1804412292: POST https://ec2.eu-central-1.amazonaws.com/ HTTP/1.1
2015-01-19 05:37:05.029 - DEBUG SLF4JLogger - [SLF4JLogger.java:61] >> "Action=DescribeInstances&Signature=Ao5fLfM%2B/rOcbdll0LF0K2F9U8NBlgd%2BAwuFk83GOxo%3D&SignatureMethod=HmacSHA256&SignatureVersion=2&Timestamp=2015-01-19T10%3A37%3A05.024Z&Version=2010-06-15&AWSAccessKeyId=xxx"
2015-01-19 05:37:05.029 - DEBUG SLF4JLogger - [SLF4JLogger.java:61] >> POST https://ec2.eu-central-1.amazonaws.com/ HTTP/1.1
2015-01-19 05:37:05.029 - DEBUG SLF4JLogger - [SLF4JLogger.java:61] >> Host: ec2.eu-central-1.amazonaws.com
2015-01-19 05:37:05.030 - DEBUG SLF4JLogger - [SLF4JLogger.java:61] >> Content-Type: application/x-www-form-urlencoded
2015-01-19 05:37:05.030 - DEBUG SLF4JLogger - [SLF4JLogger.java:61] >> Content-Length: 225
2015-01-19 05:37:05.609 - DEBUG SLF4JLogger - [SLF4JLogger.java:61] Receiving response -1804412292: HTTP/1.1 401 Unauthorized
2015-01-19 05:37:05.610 - DEBUG SLF4JLogger - [SLF4JLogger.java:61] << HTTP/1.1 401 Unauthorized

The request uses signature version 2, but eu-central-1 requires version 4.
How can we set the signature version to v4, or how can we exclude this region to avoid the error? Has anyone resolved this issue? Any help would be greatly appreciated.

Writing Conformity Rule Unit Test

Hi

I'm in the process of writing my own conformity rule, which I may contribute back. However, I've noticed that there doesn't appear to be any unit testing in this area. I'm curious how I would go about testing the rule and whether there is a framework for it.

Thanks

Dip

Chaos Monkey Error for sending emails.

Hi,

When I enable the properties below in chaos.properties, Chaos Monkey fails to terminate my instances. When they are commented out, my instances are terminated. Is there something else I'm doing wrong?

"# Set the source email that sends the termination notification"
simianarmy.chaos.notification.sourceEmail = [email protected]

"# Enable notification for Chaos termination for all instance groups"
simianarmy.chaos.notification.global.enabled = true

"# Set the destination email the termination notification is sent to for all instance groups"
simianarmy.chaos.notification.global.receiverEmail = [email protected]
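In the log that follows, the notification subject spans two lines ("... terminated by Chaos monkey." then "Chaos type: ShutdownInstance.") and the email service rejects it with "Invalid message subject". One possible cause, assuming the service does not accept line breaks in a subject, is that embedded newline. A minimal sketch of flattening a subject to one line before sending; the class and method names are hypothetical and not part of SimianArmy.

```java
/** Hypothetical helper, not part of SimianArmy. */
final class EmailSubjects {
    private EmailSubjects() { }

    /** Replaces each run of CR/LF characters with one space and trims the result. */
    static String toSingleLine(String subject) {
        return subject.replaceAll("[\\r\\n]+", " ").trim();
    }
}
```

Applied to the subject in the log, this would yield a single line: "Instance i-b0aa2f78 of ASG chroma is being terminated by Chaos monkey. Chaos type: ShutdownInstance."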

10:58:43.540 [pool-1-thread-1] INFO com.netflix.simianarmy.Monkey - CHAOS Monkey Running ...
10:58:43.540 [pool-1-thread-1] INFO c.n.simianarmy.client.aws.AWSClient - Getting all auto-scaling groups in region us-west-2.
10:58:43.885 [pool-1-thread-1] INFO c.n.simianarmy.client.aws.AWSClient - Got 2 auto-scaling groups in region us-west-2.
10:58:44.150 [pool-1-thread-1] INFO c.n.s.basic.chaos.BasicChaosMonkey - Group chroma [type ASG] enabled [prob 100.0]
10:58:44.151 [pool-1-thread-1] INFO c.n.s.b.c.BasicChaosInstanceSelector - Group chroma [type ASG] got lucky: 0.7186039819429166 > 0.5454545454545459
10:58:44.151 [pool-1-thread-1] INFO c.n.s.b.c.BasicChaosInstanceSelector - Randomly selecting 4 from 3 instances, excluding null
10:58:44.287 [pool-1-thread-1] INFO c.n.s.b.c.BasicChaosEmailNotifier - sending termination notification to global email address [email protected]
10:58:44.287 [pool-1-thread-1] DEBUG c.n.simianarmy.aws.AWSEmailNotifier - Sending email with subject 'Instance i-b0aa2f78 of ASG chroma is being terminated by Chaos monkey.
Chaos type: ShutdownInstance.' to [email protected]
10:58:44.930 [pool-1-thread-1] ERROR c.n.s.basic.chaos.BasicChaosMonkey - failed to terminate instance i-b0aa2f78
java.lang.RuntimeException: Failed to send email to [email protected]
at com.netflix.simianarmy.aws.AWSEmailNotifier.sendEmail(AWSEmailNotifier.java:86) ~[main/:na]
at com.netflix.simianarmy.basic.chaos.BasicChaosEmailNotifier.buildAndSendEmail(BasicChaosEmailNotifier.java:133) ~[main/:na]
at com.netflix.simianarmy.basic.chaos.BasicChaosEmailNotifier.sendTerminationGlobalNotification(BasicChaosEmailNotifier.java:76) ~[main/:na]
at com.netflix.simianarmy.basic.chaos.BasicChaosMonkey.sendTerminationNotification(BasicChaosMonkey.java:445) [main/:na]
at com.netflix.simianarmy.basic.chaos.BasicChaosMonkey.terminateInstance(BasicChaosMonkey.java:374) [main/:na]
at com.netflix.simianarmy.basic.chaos.BasicChaosMonkey.doMonkeyBusiness(BasicChaosMonkey.java:127) [main/:na]
at com.netflix.simianarmy.Monkey.run(Monkey.java:134) [main/:na]
at com.netflix.simianarmy.Monkey$1.run(Monkey.java:155) [main/:na]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_79]
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304) [na:1.7.0_79]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) [na:1.7.0_79]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [na:1.7.0_79]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_79]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_79]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_79]
Caused by: com.amazonaws.AmazonServiceException: Invalid message subject: Instance i-b0aa2f78 of ASG chroma is being terminated by Chaos monkey.
Chaos type: ShutdownInstance. (Service: AmazonSimpleEmailService; Status Code: 400; Error Code: InvalidParameterValue; Request ID: 29d7d55d-2b1b-11e5-932b-0f2aae868986)
at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:1032) ~[aws-java-sdk-core-1.8.11.jar:na]
at com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:687) ~[aws-java-sdk-core-1.8.11.jar:na]
at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:441) ~[aws-java-sdk-core-1.8.11.jar:na]
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:292) ~[aws-java-sdk-core-1.8.11.jar:na]
at com.amazonaws.services.simpleemail.AmazonSimpleEmailServiceClient.invoke(AmazonSimpleEmailServiceClient.java:1390) ~[aws-java-sdk-1.8.11.jar:na]
at com.amazonaws.services.simpleemail.AmazonSimpleEmailServiceClient.sendEmail(AmazonSimpleEmailServiceClient.java:1129) ~[aws-java-sdk-1.8.11.jar:na]
at com.netflix.simianarmy.aws.AWSEmailNotifier.sendEmail(AWSEmailNotifier.java:84) ~[main/:na]
... 14 common frames omitted
10:58:44.931 [pool-1-thread-1] INFO com.netflix.simianarmy.Monkey - Reporting what I did...

10:58:44.932 [pool-1-thread-1] ERROR com.netflix.simianarmy.Monkey - CHAOS Monkey Error:
java.lang.RuntimeException: failed to terminate instance i-b0aa2f78
at com.netflix.simianarmy.basic.chaos.BasicChaosMonkey.handleTerminationError(BasicChaosMonkey.java:202) ~[main/:na]
at com.netflix.simianarmy.basic.chaos.BasicChaosMonkey.terminateInstance(BasicChaosMonkey.java:387) ~[main/:na]
at com.netflix.simianarmy.basic.chaos.BasicChaosMonkey.doMonkeyBusiness(BasicChaosMonkey.java:127) ~[main/:na]
at com.netflix.simianarmy.Monkey.run(Monkey.java:134) ~[main/:na]
at com.netflix.simianarmy.Monkey$1.run(Monkey.java:155) ~[main/:na]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_79]
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304) [na:1.7.0_79]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) [na:1.7.0_79]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [na:1.7.0_79]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_79]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_79]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_79]
Caused by: java.lang.RuntimeException: Failed to send email to [email protected]

at com.netflix.simianarmy.aws.AWSEmailNotifier.sendEmail(AWSEmailNotifier.java:86) ~[main/:na]
at com.netflix.simianarmy.basic.chaos.BasicChaosEmailNotifier.buildAndSendEmail(BasicChaosEmailNotifier.java:133) ~[main/:na]
at com.netflix.simianarmy.basic.chaos.BasicChaosEmailNotifier.sendTerminationGlobalNotification(BasicChaosEmailNotifier.java:76) ~[main/:na]
at com.netflix.simianarmy.basic.chaos.BasicChaosMonkey.sendTerminationNotification(BasicChaosMonkey.java:445) ~[main/:na]
at com.netflix.simianarmy.basic.chaos.BasicChaosMonkey.terminateInstance(BasicChaosMonkey.java:374) ~[main/:na]
... 10 common frames omitted

Caused by: com.amazonaws.AmazonServiceException: Invalid message subject: Instance i-b0aa2f78 of ASG chroma is being terminated by Chaos monkey.
Chaos type: ShutdownInstance. (Service: AmazonSimpleEmailService; Status Code: 400; Error Code: InvalidParameterValue; Request ID: 29d7d55d-2b1b-11e5-932b-0f2aae868986)
at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:1032) ~[aws-java-sdk-core-1.8.11.jar:na]
at com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:687) ~[aws-java-sdk-core-1.8.11.jar:na]
at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:441) ~[aws-java-sdk-core-1.8.11.jar:na]
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:292) ~[aws-java-sdk-core-1.8.11.jar:na]
at com.amazonaws.services.simpleemail.AmazonSimpleEmailServiceClient.invoke(AmazonSimpleEmailServiceClient.java:1390) ~[aws-java-sdk-1.8.11.jar:na]
at com.amazonaws.services.simpleemail.AmazonSimpleEmailServiceClient.sendEmail(AmazonSimpleEmailServiceClient.java:1129) ~[aws-java-sdk-1.8.11.jar:na]
at com.netflix.simianarmy.aws.AWSEmailNotifier.sendEmail(AWSEmailNotifier.java:84) ~[main/:na]
... 14 common frames omitted

11:02:09.982 [Shutdown] INFO com.netflix.simianarmy.MonkeyRunner - Stopping CHAOS Monkey
11:02:09.983 [Shutdown] INFO com.netflix.simianarmy.MonkeyRunner - Stopping VOLUME_TAGGING Monkey
11:02:09.983 [Shutdown] INFO com.netflix.simianarmy.MonkeyRunner - Stopping JANITOR Monkey
11:02:09.983 [Shutdown] INFO com.netflix.simianarmy.MonkeyRunner - Stopping CONFORMITY Monkey
11:02:09.983 [Shutdown] INFO c.n.s.basic.BasicMonkeyServer - Stopping Chaos Monkey.
11:02:09.983 [Shutdown] INFO c.n.s.basic.BasicMonkeyServer - Stopping volume tagging Monkey.
11:02:09.983 [Shutdown] INFO c.n.s.basic.BasicMonkeyServer - Stopping Janitor Monkey.

BUILD SUCCESSFUL

Total time: 8 mins 55.868 secs
tly@tly-VirtualBox:~/SimianArmy$ ./gradlew jettyRun

Configuring > 1/1 projects

Thanks in advance,

Tony
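
The innermost cause in the trace above is SES rejecting the subject with InvalidParameterValue, and the logged subject spans two lines ("... terminated by Chaos monkey." followed by "Chaos type: ShutdownInstance."). One plausible reading is that the line break inside the subject is what SES objects to. A minimal sketch of collapsing line breaks before sending, where SubjectSanitizer and singleLine are hypothetical names and not part of Simian Army:

```java
class SubjectSanitizer {
    /** Replaces any run of CR/LF characters with a single space (hypothetical helper). */
    static String singleLine(String subject) {
        return subject.replaceAll("[\\r\\n]+", " ").trim();
    }

    public static void main(String[] args) {
        // Subject text copied from the log in this post, including its line break.
        String raw = "Instance i-b0aa2f78 of ASG chroma is being terminated by Chaos monkey.\n"
                + "Chaos type: ShutdownInstance.";
        System.out.println(singleLine(raw));
    }
}
```

Whether this is the right place to fix it (the notifier versus the code that builds the subject) is a judgment call for whoever patches it.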

Broken Cloudbees build

Hi there,
it seems the Cloudbees build has been failing without any code changes since build 318 (https://netflixoss.ci.cloudbees.com/job/SimianArmy-master/318/console). It looks like there was a fresh fetch of all dependencies, which makes me believe a newer version of one of the dependencies is breaking the tests. It also breaks when I try it locally, but I couldn't figure out the exact differences from the last successful build. Any ideas?

Running Chaos Monkey across multiple AWS accounts

Hi, we're looking at how best to set up Chaos Monkey to run across many AWS accounts. Do you have a suggested pattern for this at the moment? Our immediate thoughts were to either

  • provide chaos monkey with a list of accounts and have it assume a role with temporary credentials in each of those accounts (where the accounts allow it to).
  • run a chaos monkey install in every account we maintain

Of these, my immediate preference is for the first, as we already run a fair number of accounts and the second option would leave us running many independent monkeys, but I'm interested in anyone's thoughts on whether this has been done or proposed in the past.
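
The first option could be sketched roughly as below. This is only an outline under stated assumptions: RoleAssumer is a hypothetical hook that in practice would wrap an STS AssumeRole call, the role name is made up, and none of these names exist in Simian Army. Accounts that refuse the role are skipped, matching the "where the accounts allow it to" condition.

```java
import java.util.ArrayList;
import java.util.List;

class MultiAccountRunner {
    /** Hypothetical hook: returns temporary credentials for a role, or throws if the account refuses. */
    interface RoleAssumer {
        String assume(String roleArn) throws Exception;
    }

    /** Builds an IAM role ARN for the given account. */
    static String roleArn(String accountId, String roleName) {
        return "arn:aws:iam::" + accountId + ":role/" + roleName;
    }

    /** Tries each account in turn and returns the ones where the role could be assumed. */
    static List<String> accountsReached(List<String> accountIds, String roleName, RoleAssumer assumer) {
        List<String> reached = new ArrayList<>();
        for (String id : accountIds) {
            try {
                assumer.assume(roleArn(id, roleName));
                reached.add(id); // a real run would start the monkey with these credentials here
            } catch (Exception denied) {
                // The account does not allow the role; skip it and carry on with the rest.
            }
        }
        return reached;
    }
}
```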

Support for notification via SNS

It seems odd that the only notification of terminations is via SES, not SNS.

Suggested feature: for each termination, send a JSON message to some configured SNS topic, including at least the instance id terminated, and the arn of the Autoscaling Group of which it was a member.
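
To make the suggestion concrete, the payload could look something like the sketch below. The field names instanceId and autoScalingGroupArn are assumptions chosen for illustration, the ARN in main is a made-up placeholder, and publishing to the topic is left out entirely.

```java
class TerminationMessage {
    /** Minimal escaping for backslashes and double quotes only; control characters are not handled. */
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** Builds the JSON body for one termination event (hypothetical field names). */
    static String toJson(String instanceId, String asgArn) {
        return "{\"instanceId\":\"" + escape(instanceId)
                + "\",\"autoScalingGroupArn\":\"" + escape(asgArn) + "\"}";
    }

    public static void main(String[] args) {
        // Placeholder values, not taken from a real account.
        System.out.println(toJson("i-b0aa2f78", "arn:aws:autoscaling:us-west-2:123456789012:example"));
    }
}
```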

isolate AWS logic from rest of system

Although there's a specific aws package, the enum of target VMs and other bits of the BasicMonkey do use AWS-logic to enumerate target VMs, and catch AWS-specific exceptions.

If these could be isolated into infrastructure specific packages, it'd be easier to direct chaos at virtualbox, VMWare, power-fencing devices on the LAN, etc.
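
One way to picture the separation is a small infrastructure-neutral interface that the monkey talks to, with AWS as just one implementation. Everything below is hypothetical (InstanceProvider, InMemoryProvider and terminateOne are not existing Simian Army types); it is a sketch of the shape, not a proposed patch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

class ProviderSketch {
    /** Hypothetical infrastructure-neutral contract. */
    interface InstanceProvider {
        List<String> listInstances(String group);
        void terminate(String instanceId);
    }

    /** Stand-in backend; a VirtualBox or VMware backend would implement the same interface. */
    static class InMemoryProvider implements InstanceProvider {
        private final Map<String, List<String>> groups = new HashMap<>();

        void add(String group, String instanceId) {
            groups.computeIfAbsent(group, g -> new ArrayList<>()).add(instanceId);
        }

        @Override
        public List<String> listInstances(String group) {
            return new ArrayList<>(groups.getOrDefault(group, Collections.<String>emptyList()));
        }

        @Override
        public void terminate(String instanceId) {
            for (List<String> members : groups.values()) {
                members.remove(instanceId);
            }
        }
    }

    /** Picks one instance at random and terminates it; returns null if the group is empty. */
    static String terminateOne(InstanceProvider provider, String group, Random rng) {
        List<String> candidates = provider.listInstances(group);
        if (candidates.isEmpty()) {
            return null;
        }
        String victim = candidates.get(rng.nextInt(candidates.size()));
        provider.terminate(victim);
        return victim;
    }
}
```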

Gradle build Failed

I have just cloned a copy of SimianArmy from GitHub and ran ./gradlew build -x test. The build failed with:

Inferred version: 2.5.0-SNAPSHOT

FAILURE: Build failed with an exception.

  • What went wrong:
    A problem occurred configuring root project 'simianarmy'.

    Failed to notify project evaluation listener.

  • Try:
    Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output.

BUILD FAILED

Any help would be appreciated.

build error

I've changed my JDK to 6. When running ./gradlew build, I got the following errors:
/SimianArmy/src/main/java/com/netflix/simianarmy/client/aws/chaos/ASGChaosCrawler.java:61: error: incompatible types: inference variable E has incompatible bounds
return EnumSet.allOf(Types.class);
^
equality constraints: Types
upper bounds: Enum<CAP#1>,Enum
where E is a type-variable:
E extends Enum declared in method allOf(Class)
where CAP#1 is a fresh type-variable:
CAP#1 extends Enum<CAP#1> from capture of ?
1 error
:compileJava FAILED
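
The "from capture of ?" part of the message suggests the enclosing method returns a wildcard type such as EnumSet<?>, and that the compiler is trying to infer E against that wildcard. Assuming that is the case, a common workaround is to pin the type in a local variable first. The sketch below uses a stand-in Types enum and a hypothetical groupTypes method; it is not the actual ASGChaosCrawler source.

```java
import java.util.EnumSet;

class GroupTypesSketch {
    /** Stand-in for the crawler's enum. */
    enum Types { ASG }

    /** Assumed shape of the failing method: a wildcard return type. */
    static EnumSet<?> groupTypes() {
        // Inference now targets EnumSet<Types> rather than the wildcard return type.
        EnumSet<Types> all = EnumSet.allOf(Types.class);
        return all;
    }

    public static void main(String[] args) {
        System.out.println(groupTypes());
    }
}
```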

SimpleDB?

Quick Start document mentions using SimpleDB - something that's not available anymore within AWS.

Compiling Java 6 bytecode using Java 7

Hi Michael

We've just rolled out Java 7 across most of our instances and are aiming to move towards it, but have hit a snag when working with Simian Army. It's more related to Gradle, but we wondered if you might have a few ideas on resolving it. We get the following error when we try to compile:

:compileJavawarning: [options] bootstrap class path not set in conjunction with -source 1.6

We have a few workarounds:

  • Disabling warnings-as-errors seems to work.
  • Setting JAVA_HOME explicitly to Java 6.
  • There is also some talk online about setting the bootstrap class path.

If you have some suggestions we'd appreciate it.

Thanks.

Dip

Gradle Build Failure - Installation failure

If I follow the installation guide, the installation fails on an Ubuntu 12.04 LTS server.

./gradlew build

warning: [options] bootstrap class path not set in conjunction with -source 1.6
error: warnings found and -Werror specified
1 error
1 warning

FAILURE: Build failed with an exception.

Running ./gradlew build --debug:

14:51:26.467 [INFO] [org.gradle.api.internal.tasks.compile.jdk6.Jdk6JavaCompiler] Compiling with JDK 6 Java compiler API.
14:51:27.587 [ERROR] [system.err] warning: [options] bootstrap class path not set in conjunction with -source 1.6
14:51:27.652 [ERROR] [system.err] error: warnings found and -Werror specified
14:51:27.655 [ERROR] [system.err] 1 error
14:51:27.657 [ERROR] [system.err] 1 warning
14:51:27.684 [DEBUG] [org.gradle.logging.internal.DefaultLoggingConfigurer] Finished configuring with level: DEBUG, configurers: [org.gradle.logging.internal.OutputEventRenderer@47ef303b, org.gradle.logging.internal.logback.LogbackLoggingConfigurer@28479662, org.gradle.logging.internal.JavaUtilLoggingConfigurer@73e21096]

$JAVA_HOME is set correctly. Can you help me here?

Bug in instance selection when frequency == DAYS (divide by zero)

Hi -- I was testing out a modification to the chaos monkey, and got an exception against the following line of code when I set the frequency unit to DAYS (with probability set to 1.0 and frequency set to 1).

In BasicChaosInstanceSelector.selectOneInstance

Validate.isTrue(probability < 1)

The probability can actually end up being Double.POSITIVE_INFINITY.

Here's how....

Even if you have monkeyTime turned on, if you run the monkey with DAYS as its unit of frequency, the BasicChaosMonkey gets an open and close hour for the day in its constructor. It then computes the millisecond diff between the two, in its constructor.

long units = freqUnit.convert(close.getTimeInMillis() - open.getTimeInMillis(), TimeUnit.MILLISECONDS);
runsPerDay = units / ctx.scheduler().frequency();

Even though I'm using DAYS, this works out to a 6 hour block. The conversion from MILLISECONDS to DAYS == 0.25, but as this computation returns a long, the fractional part is dropped, and units now equals zero.

This means (see next line) that it will run zero times per day (0/1 == 0).

Now, when doMonkeyBusiness() runs, it comes up with a selection coefficient by dividing the supplied probability by the runsPerDay.

Collection<String> instances = context().chaosInstanceSelector().select(group, prob / runsPerDay);

1.0 / 0 == Double.POSITIVE_INFINITY.

The validation check now fails, because POSITIVE_INFINITY > 1, but the code is looking for a value < 1.
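
The truncation described above can be reproduced with plain JDK calls. The sketch below mirrors the two quoted lines; RunsPerDayDemo is a hypothetical helper, and the six-hour window is the value reported in this post.

```java
import java.util.concurrent.TimeUnit;

class RunsPerDayDemo {
    /** Same arithmetic as the quoted constructor code: convert the window, then integer-divide. */
    static long runsPerDay(TimeUnit freqUnit, long windowMillis, int frequency) {
        long units = freqUnit.convert(windowMillis, TimeUnit.MILLISECONDS);
        return units / frequency;
    }

    public static void main(String[] args) {
        long sixHours = TimeUnit.HOURS.toMillis(6);

        // DAYS: 0.25 days truncates to 0, so runsPerDay is 0.
        long daily = runsPerDay(TimeUnit.DAYS, sixHours, 1);
        System.out.println(daily);          // prints 0

        // HOURS: the same window is 6 whole units, so the division behaves.
        System.out.println(runsPerDay(TimeUnit.HOURS, sixHours, 1)); // prints 6

        // A double divided by a zero long is not an ArithmeticException; it is Infinity.
        System.out.println(1.0 / daily);    // prints Infinity
    }
}
```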

To get it running, I allowed POSITIVE_INFINITY as well as a probability of 1.

So, to fix this, do we change the probability check to

Validate.isTrue(probability < 1 || probability == Double.POSITIVE_INFINITY)

or do we do something else?

As it stands, running the schedule using DAYS does not work. I can prove this with a unit test.

@Test
    public void testFullProbability() {
        TestChaosMonkeyContext ctx = new TestChaosMonkeyContext("fullProbability.properties") {
            @Override
            public MonkeyScheduler scheduler() {
                return new MonkeyScheduler() {
                    @Override
                    public int frequency() {
                        return 1;
                    }

                    @Override
                    public TimeUnit frequencyUnit() {
                        return TimeUnit.DAYS;
                    }

                    @Override
                    public void start(Monkey monkey, Runnable run) {
                        Assert.assertEquals(monkey.type().name(), monkey.type().name(), "starting monkey");
                        run.run();
                    }

                    @Override
                    public void stop(Monkey monkey) {
                        Assert.assertEquals(monkey.type().name(), monkey.type().name(), "stopping monkey");
                    }
                };
            }
        };
        ChaosMonkey chaos = new BasicChaosMonkey(ctx);
        chaos.start();
        chaos.stop();
        List<InstanceGroup> selectedOn = ctx.selectedOn();
        List<String> terminated = ctx.terminated();
        Assert.assertEquals(selectedOn.size(), 4);
        Assert.assertEquals(selectedOn.get(0).type(), TestChaosMonkeyContext.CrawlerTypes.TYPE_A);
        Assert.assertEquals(selectedOn.get(0).name(), "name0");
        Assert.assertEquals(selectedOn.get(1).type(), TestChaosMonkeyContext.CrawlerTypes.TYPE_A);
        Assert.assertEquals(selectedOn.get(1).name(), "name1");
        Assert.assertEquals(selectedOn.get(2).type(), TestChaosMonkeyContext.CrawlerTypes.TYPE_B);
        Assert.assertEquals(selectedOn.get(2).name(), "name2");
        Assert.assertEquals(selectedOn.get(3).type(), TestChaosMonkeyContext.CrawlerTypes.TYPE_B);
        Assert.assertEquals(selectedOn.get(3).name(), "name3");
        Assert.assertEquals(terminated.size(), 4);
    }

Note that the test report shows a failure because 4 instances were not terminated, but the reason termination stops is the validation error against infinity (just look at the console).

I verified in the debugger that runs per day is zero, and probability for that run is infinity (both in the test case and actual server execution).

Probably a better fix is this:

if (TimeUnit.DAYS == ctx.scheduler().frequencyUnit()) {
    runsPerDay = ctx.scheduler().frequency();
} else {
    TimeUnit freqUnit = ctx.scheduler().frequencyUnit();
    long units = freqUnit.convert(close.getTimeInMillis() - open.getTimeInMillis(), TimeUnit.MILLISECONDS);
    runsPerDay = units / ctx.scheduler().frequency();
}

I'm going to submit a PR to apply the second fix.

Matt

Rate Exceeded / Throttling Janitor Monkey

One of our AWS accounts is running Janitor Monkey daily and almost every day we get the following error:
com.amazonaws.AmazonServiceException: Rate exceeded (Service: AmazonCloudWatch; Status Code: 400; Error Code: Throttling; Request ID: c7248d1f-c116-11e4-a9a1-7fc9f8d26e2c)

Could there be a better way of handling this error in Simian Army? Perhaps a sleep/retry interval could be configured so that, when this error is encountered, the monkey sleeps for that interval and tries again until the rate limit clears, rather than failing and exiting.

Anyone else run into this and/or have suggestions?
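
The sleep-and-retry idea could be sketched as a small wrapper with exponential backoff. This is an illustration only: BackoffRetry and ThrottledException are hypothetical names, and in real code the catch would presumably inspect the AmazonServiceException for the "Throttling" error code shown in the log above instead.

```java
import java.util.concurrent.Callable;

class BackoffRetry {
    /** Hypothetical marker for a rate-limit failure. */
    static class ThrottledException extends RuntimeException {
        ThrottledException(String message) {
            super(message);
        }
    }

    /**
     * Runs the call, sleeping and doubling the delay after each throttled attempt.
     * Gives up and rethrows once maxAttempts has been reached.
     */
    static <T> T callWithBackoff(Callable<T> call, int maxAttempts, long initialDelayMillis)
            throws Exception {
        long delay = initialDelayMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return call.call();
            } catch (ThrottledException e) {
                if (attempt >= maxAttempts) {
                    throw e;
                }
                Thread.sleep(delay);
                delay *= 2;
            }
        }
    }
}
```

Capping the number of attempts keeps a persistent throttle from stalling the whole run indefinitely; the cap and the starting delay would be natural candidates for properties.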

SimianArmy war in Tomcat6 - SLF4J error (missing jar)

Hi,

I followed the quickstart guide and managed to build SimianArmy war without problems. However when I'm trying to run simianarmy in tomcat6 I get the following error in catalina.out.

SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.

So, I can't get any logs. If I manually add slf4j-log4j12-1.6.0.jar to the lib folder of the webapp then it's OK, but I'd like it to be there in the first place.

I tried to modify build.gradle in order to include slf4j-log4j12-1.6.0.jar as a compile dependency but then one of the tests is failing (although the war file seems to be generated)

This is the error that I get when I build with slf4j as a compile dependency:

:pmdTest
:test
SLF4J: This version of SLF4J requires log4j version 1.2.12 or later. See also http://www.slf4j.org/codes.html#log4j_version
ERROR cleanupResources, Failed to clean up the resource 11.
java.lang.RuntimeException: Magic number of id.
at com.netflix.simianarmy.janitor.TestAbstractJanitor.cleanup(TestAbstractJanitor.java:80)
at com.netflix.simianarmy.janitor.AbstractJanitor.cleanupResources(AbstractJanitor.java:292)
at com.netflix.simianarmy.janitor.TestAbstractJanitor.testJanitorWithCleanupFailure(TestAbstractJanitor.java:222)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:80)
at org.testng.internal.Invoker.invokeMethod(Invoker.java:691)
at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:883)
at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1208)
at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:127)
at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:111)
at org.testng.TestRunner.privateRun(TestRunner.java:758)
at org.testng.TestRunner.run(TestRunner.java:613)
at org.testng.SuiteRunner.runTest(SuiteRunner.java:334)
at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:329)
at org.testng.SuiteRunner.privateRun(SuiteRunner.java:291)
at org.testng.SuiteRunner.run(SuiteRunner.java:240)
at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:53)
at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:87)
at org.testng.TestNG.runSuitesSequentially(TestNG.java:1137)
at org.testng.TestNG.runSuitesLocally(TestNG.java:1062)
at org.testng.TestNG.run(TestNG.java:974)
at org.gradle.api.internal.tasks.testing.testng.TestNGTestClassProcessor.stop(TestNGTestClassProcessor.java:105)
at org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.stop(SuiteTestClassProcessor.java:58)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.gradle.messaging.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
at org.gradle.messaging.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
at org.gradle.messaging.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
at org.gradle.messaging.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
at $Proxy2.stop(Unknown Source)
at org.gradle.api.internal.tasks.testing.worker.TestWorker.stop(TestWorker.java:113)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.gradle.messaging.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
at org.gradle.messaging.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
at org.gradle.messaging.remote.internal.TypeCastDispatch.dispatch(TypeCastDispatch.java:30)
at org.gradle.messaging.remote.internal.WorkerProtocol.handleIncoming(WorkerProtocol.java:53)
at org.gradle.messaging.remote.internal.WorkerProtocol.handleIncoming(WorkerProtocol.java:31)
at org.gradle.messaging.remote.internal.ProtocolStack$ProtocolStage.handleIncoming(ProtocolStack.java:167)
at org.gradle.messaging.remote.internal.ProtocolStack$BottomStage.handleIncoming(ProtocolStack.java:277)
at org.gradle.messaging.remote.internal.ProtocolStack$BottomConnection$1.run(ProtocolStack.java:299)
at org.gradle.messaging.remote.internal.ProtocolStack$ExecuteRunnable.dispatch(ProtocolStack.java:120)
at org.gradle.messaging.remote.internal.ProtocolStack$ExecuteRunnable.dispatch(ProtocolStack.java:116)
at org.gradle.messaging.dispatch.AsyncDispatch.dispatchMessages(AsyncDispatch.java:132)
at org.gradle.messaging.dispatch.AsyncDispatch.access$000(AsyncDispatch.java:33)
at org.gradle.messaging.dispatch.AsyncDispatch$1.run(AsyncDispatch.java:72)
at org.gradle.internal.concurrent.DefaultExecutorFactory$StoppableExecutorImpl$1.run(DefaultExecutorFactory.java:66)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

Thanks,
Nick

Build Error

Build error on Windows:

:test
2014-12-17 21:57:03.329 - ERROR AbstractJanitor - [AbstractJanitor.java:298] Failed to clean up the resource 11.
java.lang.RuntimeException: Magic number of id.
at com.netflix.simianarmy.janitor.TestAbstractJanitor.cleanup(TestAbstractJanitor.java:81)
at com.netflix.simianarmy.janitor.AbstractJanitor.cleanupResources(AbstractJanitor.java:293)
at com.netflix.simianarmy.janitor.TestAbstractJanitor.testJanitorWithCleanupFailure(TestAbstractJanitor.java:223)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)

Gradle test > com.netflix.simianarmy.basic.TestBasicCalendar.testGetBusinessDayWihHoliday FAILED
java.lang.AssertionError at TestBasicCalendar.java:185

Gradle test > com.netflix.simianarmy.basic.TestBasicCalendar.testGetBusinessDayWihHolidayNextYear FAILED
java.lang.AssertionError at TestBasicCalendar.java:204

Gradle test > com.netflix.simianarmy.basic.TestBasicCalendar.testGetBusinessDayWihWeekend FAILED
java.lang.AssertionError at TestBasicCalendar.java:165

Gradle test > com.netflix.simianarmy.basic.TestBasicCalendar.testGetBusinessDayWihoutGap FAILED
java.lang.AssertionError at TestBasicCalendar.java:145

gradle build bug

When I run ./gradlew build, it shows some errors during the step "compileJava"
Such as:
warning: [deprecation] org.apache.http.impl.client.AutoRetryHttpClient in org.apache.http.impl.client has been deprecated

warning: [deprecation] org.apache.http.impl.client.DefaultHttpClient in org.apache.http.impl.client has been deprecated

warning: [deprecation] org.apache.http.params.BasicHttpParams in org.apache.http.params has been deprecated

....

10 warnings
:compileJava FAILED
FAILURE: Build failed with an exception.

  • What went wrong:
    Execution failed for task ':compileJava'.

    Compilation failed; see the compiler error output for details.

After investigation, the issue is related to the version of the "httpclient" library that is downloaded.
In file build.gradle, it shows
compile 'org.apache.httpcomponents:httpclient:4.2.1'

But from the log, it is actually downloading version 4.3
Download http://repo1.maven.org/maven2/org/apache/httpcomponents/httpclient/4.3/httpclient-4.3.jar

Is there any solution to fix this problem?

Thanks!

Using simianarmy war file with Tomcat (instead of jetty)

Hi,

I'm trying to use chaos monkey with tomcat and I'm having some problems.

When jetty is used I don't get any errors and the catalina.out looks like this:

INFO: Scanning for root resource and provider classes in the packages:
com.netflix.simianarmy.resources
May 2, 2013 4:34:09 PM com.sun.jersey.api.core.ScanningResourceConfig logClasses
INFO: Root resource classes found:
class com.netflix.simianarmy.resources.chaos.ChaosMonkeyResource
class com.netflix.simianarmy.resources.janitor.JanitorMonkeyResource
May 2, 2013 4:34:09 PM com.sun.jersey.api.core.ScanningResourceConfig init
INFO: No provider classes found.
May 2, 2013 4:34:10 PM com.sun.jersey.server.impl.application.WebApplicationImpl _initiate
INFO: Initiating Jersey application, version 'Jersey: 1.9.1 09/14/2011 02:36 PM'
2013-05-02 16:34:12.118 - INFO BasicMonkeyServer - [BasicMonkeyServer.java:115] using client properties /client.properties
2013-05-02 16:34:12.126 - INFO BasicMonkeyServer - [BasicMonkeyServer.java:96] using standard client com.netflix.simianarmy.basic.BasicChaosMonkeyContext
2013-05-02 16:34:12.130 - INFO BasicMonkeyServer - [BasicMonkeyServer.java:51] Adding Chaos Monkey.

However when I use tomcat I get the following error and the whole initialization procedure looks different.

INFO: Scanning for root resource and provider classes in the packages:
com.netflix.simianarmy.resources
May 8, 2013 3:21:11 PM com.sun.jersey.api.core.ScanningResourceConfig logClasses
INFO: Root resource classes found:
class com.netflix.simianarmy.resources.chaos.ChaosMonkeyResource
class com.netflix.simianarmy.resources.janitor.JanitorMonkeyResource
May 8, 2013 3:21:11 PM com.sun.jersey.api.core.ScanningResourceConfig init
INFO: No provider classes found.
May 8, 2013 3:21:11 PM com.sun.jersey.server.impl.application.WebApplicationImpl _initiate
INFO: Initiating Jersey application, version 'Jersey: 1.9.1 09/14/2011 02:36 PM'
2013-05-08 15:21:17.443 - INFO ChaosMonkeyResource - [ChaosMonkeyResource.java:90] Creating a new Chaos monkey instance for the resource.
2013-05-08 15:21:17.451 - ERROR MonkeyRunner - [MonkeyRunner.java:234] monkeyFactory error, cannot make monkey from com.netflix.simianarmy.chaos.ChaosMonkey with null
java.lang.InstantiationException: com.netflix.simianarmy.chaos.ChaosMonkey
at java.lang.Class.newInstance0(Class.java:359)
at java.lang.Class.newInstance(Class.java:327)
at com.netflix.simianarmy.MonkeyRunner.factory(MonkeyRunner.java:218)
at com.netflix.simianarmy.MonkeyRunner.factory(MonkeyRunner.java:199)
at com.netflix.simianarmy.resources.chaos.ChaosMonkeyResource.<init>(ChaosMonkeyResource.java:91)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:532)
at com.sun.jersey.server.spi.component.ResourceComponentConstructor._construct(ResourceComponentConstructor.java:191)
at com.sun.jersey.server.spi.component.ResourceComponentConstructor.construct(ResourceComponentConstructor.java:179)
at com.sun.jersey.server.impl.resource.SingletonFactory$Singleton.init(SingletonFactory.java:137)
at com.sun.jersey.server.impl.application.WebApplicationImpl$10.f(WebApplicationImpl.java:584)
at com.sun.jersey.server.impl.application.WebApplicationImpl$10.f(WebApplicationImpl.java:581)
at com.sun.jersey.spi.inject.Errors.processWithErrors(Errors.java:193)
at com.sun.jersey.server.impl.application.WebApplicationImpl.getResourceComponentProvider(WebApplicationImpl.java:581)
at com.sun.jersey.server.impl.application.WebApplicationImpl.initiateResource(WebApplicationImpl.java:658)
at com.sun.jersey.server.impl.application.WebApplicationImpl.initiateResource(WebApplicationImpl.java:653)
at com.sun.jersey.server.impl.application.RootResourceUriRules.<init>(RootResourceUriRules.java:124)
at com.sun.jersey.server.impl.application.WebApplicationImpl._initiate(WebApplicationImpl.java:1298)
at com.sun.jersey.server.impl.application.WebApplicationImpl.access$700(WebApplicationImpl.java:169)
at com.sun.jersey.server.impl.application.WebApplicationImpl$13.f(WebApplicationImpl.java:775)
at com.sun.jersey.server.impl.application.WebApplicationImpl$13.f(WebApplicationImpl.java:771)
at com.sun.jersey.spi.inject.Errors.processWithErrors(Errors.java:193)
at com.sun.jersey.server.impl.application.WebApplicationImpl.initiate(WebApplicationImpl.java:771)
at com.sun.jersey.server.impl.application.WebApplicationImpl.initiate(WebApplicationImpl.java:766)
at com.sun.jersey.spi.container.servlet.ServletContainer.initiate(ServletContainer.java:488)
at com.sun.jersey.spi.container.servlet.ServletContainer$InternalWebComponent.initiate(ServletContainer.java:318)
at com.sun.jersey.spi.container.servlet.WebComponent.load(WebComponent.java:609)
at com.sun.jersey.spi.container.servlet.WebComponent.init(WebComponent.java:210)
at com.sun.jersey.spi.container.servlet.ServletContainer.init(ServletContainer.java:373)
at com.sun.jersey.spi.container.servlet.ServletContainer.init(ServletContainer.java:556)
at javax.servlet.GenericServlet.init(GenericServlet.java:212)
at org.apache.catalina.core.StandardWrapper.loadServlet(StandardWrapper.java:1173)
at org.apache.catalina.core.StandardWrapper.load(StandardWrapper.java:993)
at org.apache.catalina.core.StandardContext.loadOnStartup(StandardContext.java:4187)
at org.apache.catalina.core.StandardContext.start(StandardContext.java:4496)
at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:791)
at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:771)
at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:526)
at org.apache.catalina.startup.HostConfig.deployDirectory(HostConfig.java:1041)
at org.apache.catalina.startup.HostConfig.deployDirectories(HostConfig.java:964)
at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:502)
at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1277)
at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:321)
at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:119)
at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1053)
at org.apache.catalina.core.StandardHost.start(StandardHost.java:722)
at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1045)
at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:443)
at org.apache.catalina.core.StandardService.start(StandardService.java:516)
at org.apache.catalina.core.StandardServer.start(StandardServer.java:710)
at org.apache.catalina.startup.Catalina.start(Catalina.java:593)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:289)
at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:414)
2013-05-08 15:21:17.544 - INFO BasicMonkeyServer - [BasicMonkeyServer.java:115] using client properties /client.properties

Allow SimianArmy to operate across AWS accounts

There is a requirement for the Simian Army to operate across accounts - starting from running with Instance Profile credentials and then assuming a role in another AWS account.

This doesn't seem to be implemented at present. Do you know whether it has been considered in the past, or whether there is already some supporting functionality?

I have started work and realised that the Monkeys differ in how they deal with AWS clients. This has led to some concerns about the implementation and whether a pull request would be accepted.

In particular, the Conformity rules create their own clients and don't have access to the Context configuration (although this can be passed in the constructor), and the SDB clients are long-lived, which could lead to complications with credential rotation (not an issue with the AWSClient methods, as they create new clients on each use).

I will continue working on an implementation; this is a heads-up to get a feel for whether you'd accept a pull request. Any suggestions or direction appreciated.

simian army build fails on :test

I'm using the Gradle wrapper that comes with Simian Army and it just gets stuck here:

:test FAILED
FAILURE: Build failed with an exception.

  • What went wrong:
    Execution failed for task ':test'.

    There were failing tests. See the report at: file:///root/SimianArmy/build/reports/tests/index.html

  • Try:
    Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output.
    BUILD FAILED

It's weird that index.html says 0 tests, 0 failures ...
If I run with --debug, the event logger shows:

18:19:16.575 [DEBUG] [TestEventLogger] Gradle Worker 1 FAILED
18:19:16.577 [DEBUG] [TestEventLogger] org.gradle.api.internal.tasks.testing.TestSuiteExecutionException: Could not complete execution for test process 'Gradle Worker 1'.
18:19:16.583 [DEBUG] [TestEventLogger] at org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.stop(SuiteTestClassProcessor.java:60)
18:19:16.583 [DEBUG] [TestEventLogger] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
18:19:16.584 [DEBUG] [TestEventLogger] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
18:19:16.584 [DEBUG] [TestEventLogger] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
18:19:16.585 [DEBUG] [TestEventLogger] at java.lang.reflect.Method.invoke(Method.java:606)
18:19:16.585 [DEBUG] [TestEventLogger] at org.gradle.messaging.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
18:19:16.585 [DEBUG] [TestEventLogger] at org.gradle.messaging.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
18:19:16.586 [DEBUG] [TestEventLogger] at org.gradle.messaging.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
18:19:16.586 [DEBUG] [TestEventLogger] at org.gradle.messaging.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
18:19:16.587 [DEBUG] [TestEventLogger] at com.sun.proxy.$Proxy2.stop(Unknown Source)
18:19:16.587 [DEBUG] [TestEventLogger] at org.gradle.api.internal.tasks.testing.worker.TestWorker.stop(TestWorker.java:113)
18:19:16.588 [DEBUG] [TestEventLogger] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
18:19:16.588 [DEBUG] [TestEventLogger] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
18:19:16.588 [DEBUG] [TestEventLogger] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
18:19:16.589 [DEBUG] [TestEventLogger] at java.lang.reflect.Method.invoke(Method.java:606)
18:19:16.589 [DEBUG] [TestEventLogger] at org.gradle.messaging.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
18:19:16.589 [DEBUG] [TestEventLogger] at org.gradle.messaging.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
18:19:16.590 [DEBUG] [TestEventLogger] at org.gradle.messaging.remote.internal.hub.MessageHub$Handler.run(MessageHub.java:355)
18:19:16.590 [DEBUG] [TestEventLogger] at org.gradle.internal.concurrent.DefaultExecutorFactory$StoppableExecutorImpl$1.run(DefaultExecutorFactory.java:66)
18:19:16.628 [DEBUG] [TestEventLogger] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
18:19:16.628 [DEBUG] [TestEventLogger] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
18:19:16.628 [DEBUG] [TestEventLogger] at java.lang.Thread.run(Thread.java:744)
18:19:16.629 [DEBUG] [TestEventLogger]
18:19:16.629 [DEBUG] [TestEventLogger] Caused by:
18:19:16.629 [DEBUG] [TestEventLogger] java.lang.VerifyError: Expecting a stackmap frame at branch target 55
18:19:16.630 [DEBUG] [TestEventLogger] Exception Details:
18:19:16.630 [DEBUG] [TestEventLogger] Location:
18:19:16.631 [DEBUG] [TestEventLogger] com/netflix/simianarmy/aws/conformity/rule/SameZonesInElbAndAsg.haveSameZones(Ljava/util/List;Ljava/util/List;)Z @25: ifnull
18:19:16.631 [DEBUG] [TestEventLogger] Reason:
18:19:16.631 [DEBUG] [TestEventLogger] Expected stackmap frame at this location.

This is gradle -d:


Gradle 1.0-milestone-3

Gradle build time: Thursday, September 8, 2011 4:06:52 PM UTC
Groovy: 1.8.6
Ant: Apache Ant(TM) version 1.8.2 compiled on December 3 2011
Ivy: non official version
JVM: 1.6.0_27 (Sun Microsystems Inc. 20.0-b12)
OS: Linux 3.2.0-55-virtual amd64

I've run out of ideas...

Many thanks,
Hao

exception on empty instance group

Exception: n must be positive
java.lang.IllegalArgumentException: n must be positive
at java.util.Random.nextInt(Random.java:265)
at com.netflix.simianarmy.basic.chaos.BasicChaosInstanceSelector.select(BasicChaosInstanceSelector.java:61)
at com.netflix.simianarmy.basic.chaos.BasicChaosMonkey.doMonkeyBusiness(BasicChaosMonkey.java:89)

If the instance group size is 0, we call Random.nextInt(0), which throws an exception because the argument must be > 0.

We should return null when the instance group size is 0.
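
A minimal sketch of the proposed guard, assuming the selector picks one instance id from a list. The class and method names here are hypothetical stand-ins, not the project's actual code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical stand-in for the selection step; not the project's actual class. */
class EmptyGroupGuard {
    private static final Random RANDOM = new Random();

    /** Returns a randomly chosen instance id, or null when the group has no instances. */
    static String selectOne(List<String> instances) {
        if (instances == null || instances.isEmpty()) {
            // Random.nextInt(0) throws IllegalArgumentException, so return before calling it.
            return null;
        }
        return instances.get(RANDOM.nextInt(instances.size()));
    }

    public static void main(String[] args) {
        // Empty group: prints "null" instead of throwing.
        System.out.println(selectOne(new ArrayList<String>()));

        List<String> group = new ArrayList<String>();
        group.add("i-0000001"); // made-up instance id for illustration
        System.out.println(selectOne(group));
    }
}
```

Callers would then need to treat a null result as "nothing to terminate".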

How do I enable email notification?

Hi, I'm successfully running Chaos Monkey and terminating instances. However, I'm not receiving email notifications. I've set myself and my group up as authenticated AWS email recipients, and modified chaos.properties with those emails.

I don't see anything in the documentation, and after perusing the code, it's not obvious what is missing.

thanks, Mitchell

Janitor Monkey Using Edda - Snapshot Issue

When using Edda with Janitor Monkey, Janitor Monkey is flagging Snapshots that are not mine, but also ones that have the following properties: "ownerAlias":"amazon","ownerId":"947081328633".

Janitor monkey flagged about 5000 Snapshots that are not even in my account.

Archaius Support?

Is there some reason the monkeys aren't using archaius instead of java.util.Properties?

We now have to come up with some alternate method to configure per-environment rather than doing it just like every other app we have set up.

terminate on demand configuration problem

Has anyone tried on-demand termination? I think there is a mismatch between the configuration property and BasicChaosMonkey.java.

In the chaos.properties configuration file, the sample property is
//## on-demand termination of chaos monkey
//##simianarmy.chaos.terminationOndemand.enabled = false

while in the Java code, it is
Line 123: String prop = NS + "terminateOndemand.enabled";

Compare the words "termination" and "terminate".
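
To illustrate why this kind of mismatch is easy to miss, here is a small sketch using plain java.util.Properties. The namespace value and the helper method are assumptions for illustration, not the project's configuration code:

```java
import java.util.Properties;

/** Illustration only: a one-word key mismatch silently falls back to the default. */
class PropertyKeyMismatch {
    // Namespace assumed from the sample property quoted above.
    static final String NS = "simianarmy.chaos.";

    /** Reads a boolean property, returning false when the key is absent. */
    static boolean isEnabled(Properties props, String key) {
        return Boolean.parseBoolean(props.getProperty(key, "false"));
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // Key as spelled in the sample chaos.properties.
        props.setProperty(NS + "terminationOndemand.enabled", "true");

        // Key as spelled in the quoted Java line: not found, so the default applies.
        System.out.println(isEnabled(props, NS + "terminateOndemand.enabled"));   // false
        // Key as spelled in the properties file: found.
        System.out.println(isEnabled(props, NS + "terminationOndemand.enabled")); // true
    }
}
```

No error is raised for the unknown key, so the setting simply appears to have no effect.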

Gradle Build Failure

[DEBUG] [org.gradle.configuration.project.BuildScriptProcessor] Timing: Running the build script took 3.716 secs
[ERROR] [org.gradle.BuildExceptionReporter]
[ERROR] [org.gradle.BuildExceptionReporter] FAILURE: Build failed with an exception.
[ERROR] [org.gradle.BuildExceptionReporter]
[ERROR] [org.gradle.BuildExceptionReporter] * What went wrong:
[ERROR] [org.gradle.BuildExceptionReporter] nebula/plugin/netflixossproject/NetflixOssProjectPlugin : Unsupported major.minor version 51.0
[ERROR] [org.gradle.BuildExceptionReporter]
[ERROR] [org.gradle.BuildExceptionReporter] * Try:
[ERROR] [org.gradle.BuildExceptionReporter] Run with --stacktrace option to get the stack trace.
[LIFECYCLE] [org.gradle.BuildResultLogger]
[LIFECYCLE] [org.gradle.BuildResultLogger] BUILD FAILED
[LIFECYCLE] [org.gradle.BuildResultLogger]
[LIFECYCLE] [org.gradle.BuildResultLogger] Total time: 6.445 secs

java version "1.6.0_45"
Java(TM) SE Runtime Environment (build 1.6.0_45-b06)
Java HotSpot(TM) 64-Bit Server VM (build 20.45-b01, mixed mode)
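
For context, the 51.0 in the error is a class-file major version. Major version 51 corresponds to Java 7, while the Java 6 runtime listed above accepts class files only up to major version 50, which suggests the plugin class was compiled for a newer Java than the one running Gradle. A small sketch of that mapping (the helper name is hypothetical):

```java
/** Illustration only: maps a class-file major version to a Java release number. */
class ClassFileVersion {
    /**
     * From Java 5 (major version 49) onward, the release number is the major version minus 44.
     * Older class files are not handled in this sketch.
     */
    static int javaRelease(int major) {
        if (major < 49) {
            throw new IllegalArgumentException("pre-Java 5 class files are not handled here");
        }
        return major - 44;
    }

    public static void main(String[] args) {
        System.out.println("major 51 -> Java " + javaRelease(51)); // Java 7
        System.out.println("major 50 -> Java " + javaRelease(50)); // Java 6
    }
}
```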

CPU Burn Issue

Below is my property set:

simianarmy.chaos.burnmoney = false

simianarmy.chaos.ssh.user=
simianarmy.chaos.ssh.key=

Note: this key works with manual SSH from the command line.

I am getting the exception below:

2015-04-02 12:00:19.523 - WARN ChaosInstance - [ChaosInstance.java:105] Error making SSH connection to instance
com.google.inject.CreationException: Guice creation errors:

  1. org.jclouds.rest.config.SyncToAsyncHttpApiProvider<org.jclouds.rest.HttpClient, A> cannot be used as a key; It is not fully specified.

  2. org.jclouds.rest.config.SyncToAsyncHttpApiProvider<org.jclouds.ec2.EC2Client, A> cannot be used as a key; It is not fully specified.

  3. org.jclouds.rest.RestContext<org.jclouds.ec2.EC2Client, A> cannot be used as a key; It is not fully specified.

  4. No implementation for org.jclouds.rest.HttpClient was bound.
    at org.jclouds.rest.config.BinderUtils.bindHttpApiProvider(BinderUtils.java:109)

  5. No implementation for org.jclouds.ec2.EC2Client was bound.
    at org.jclouds.rest.config.BinderUtils.bindHttpApiProvider(BinderUtils.java:109)

5 errors
at com.google.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:435)
at com.google.inject.internal.InternalInjectorCreator.initializeStatically(InternalInjectorCreator.java:154)
at com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:106)
at com.google.inject.Guice.createInjector(Guice.java:95)
at org.jclouds.ContextBuilder.buildInjector(ContextBuilder.java:407)
at org.jclouds.ContextBuilder.buildInjector(ContextBuilder.java:331)
at org.jclouds.ContextBuilder.buildView(ContextBuilder.java:622)
at org.jclouds.ContextBuilder.buildView(ContextBuilder.java:602)
at com.netflix.simianarmy.client.aws.AWSClient.getJcloudsComputeService(AWSClient.java:728)
at com.netflix.simianarmy.client.aws.AWSClient.connectSsh(AWSClient.java:746)
at com.netflix.simianarmy.chaos.ChaosInstance.connectSsh(ChaosInstance.java:123)
at com.netflix.simianarmy.chaos.ChaosInstance.canConnectSsh(ChaosInstance.java:101)
at com.netflix.simianarmy.chaos.ScriptChaosType.canApply(ScriptChaosType.java:61)
at com.netflix.simianarmy.basic.chaos.BasicChaosMonkey.pickChaosType(BasicChaosMonkey.java:141)
at com.netflix.simianarmy.basic.chaos.BasicChaosMonkey.doMonkeyBusiness(BasicChaosMonkey.java:121)
at com.netflix.simianarmy.Monkey.run(Monkey.java:134)
at com.netflix.simianarmy.Monkey$1.run(Monkey.java:155)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
2015-04-02 12:00:19.539 - WARN ScriptChaosType - [ScriptChaosType.java:62] Strategy disabled because SSH credentials failed
2015-04-02 12:00:19.539 - WARN ScriptChaosType - [ScriptChaosType.java:62] Strategy disabled because SSH credentials failed

Chaos Monkey Hourly Probability bug

I discovered an interesting issue involving the way BasicChaosMonkey determines probability.

My simianarmy.properties file looks like:

simianarmy.scheduler.frequency = 1
simianarmy.scheduler.frequencyUnit = HOURS
simianarmy.scheduler.threads = 1
simianarmy.calendar.openHour = 0
simianarmy.calendar.closeHour = 24
simianarmy.calendar.timezone = America/Los_Angeles
# override to force monkey time, useful for debugging off hours
simianarmy.calendar.isMonkeyTime = true

Basically, I'm running the monkeys every hour of the day, although Chaos Monkey is the only Monkey I have running.

Now in my chaos.properties file I have:

simianarmy.chaos.ASG.probability = 1.0

This means that Chaos Monkey should shut down an instance at least once per day.

When I ran this setup, I encountered the following output:

INFO  BasicChaosMonkey - [BasicChaosMonkey.java:306] Group testing-XXXXXXX [type ASG] enabled [prob 1.1]
INFO  BasicChaosInstanceSelector - [BasicChaosInstanceSelector.java:89] Group testing-XXXXX-XXXXXXX [type ASG] got lucky: 0.518069134797909 > 0.04583333333333334
INFO  BasicChaosInstanceSelector - [BasicChaosInstanceSelector.java:65] Randomly selecting 0 from 2 instances, excluding null

So even though a termination was triggered, it tried to select 0 instances to terminate.

At this point I went to the source code to try to find why it would be trying to select 0 instances.

I found the print statement in the selectNInstances function located in:

src/main/java/com/netflix/simianarmy/basic/chaos/BasicChaosInstanceSelector.java

64  private Collection<String> selectNInstances(Collection<String> instances, int n, String selected) {
65      logger().info("Randomly selecting {} from {} instances, excluding {}",
66          new Object[] {n, instances.size(), selected});
67      List<String> copy = Lists.newArrayList();
68      for (String instance : instances) {
69          if (!instance.equals(selected)) {
70              copy.add(instance);
71          }
72      }
73      if (n >= copy.size()) {
74          return copy;
75      }
76      Collections.shuffle(copy);
77      return copy.subList(0, n);
78  }

The function was somehow getting passed 0 for the value of int n

I found that the select function from the same file was the one calling selectNInstances:

54    public Collection<String> select(InstanceGroup group, double probability) {
55        int n = ((int) probability);
56        String selected = selectOneInstance(group, probability - n);
57        Collection<String> result = selectNInstances(group.instances(), n, selected);
58        if (selected != null) {
59            result.add(selected);
60        }
61       return result;
62     }

The value of int n is just probability cast from a double to an int, which drops the fractional part.

Tracing back deeper into the code I found that select was being called by the doMonkeyBusiness() function found in:
src/main/java/com/netflix/simianarmy/basic/chaos/BasicChaosMonkey.java

135    public void doMonkeyBusiness() {
136        context().resetEventReport();
137        cfg.reload();
138        if (!isChaosMonkeyEnabled()) {
139            return;
140        }
141        for (InstanceGroup group : context().chaosCrawler().groups()) {
142            if (isGroupEnabled(group)) {
143                if (isMaxTerminationCountExceeded(group)) {
144                    continue;
145                }
146                double prob = getEffectiveProbability(group);
147                Collection<String> instances = context().chaosInstanceSelector().select(group, prob / runsPerDay);
148                for (String inst : instances) {
149                    ChaosType chaosType = pickChaosType(context().cloudClient(), inst);
150                    if (chaosType == null) {
151                        // This is surprising ... normally we can always just terminate it
152                        LOGGER.warn("No chaos type was applicable to the instance: {}", inst);
153                        continue;
154                    }
155                    terminateInstance(group, inst, chaosType);
156                }
157            }
158        }
159    }

Notice in particular line 147:

Collection<String> instances = context().chaosInstanceSelector().select(group, prob / runsPerDay);

It passes probability as prob / runsPerDay.

In my situation, runsPerDay is 24 and prob is 1.0, meaning that probability becomes 1/24, or about 0.04.

The problem is then back in the select function of src/main/java/com/netflix/simianarmy/basic/chaos/BasicChaosInstanceSelector.java at line 55:

int n = ((int) probability);

When the double probability is cast to an int, it is truncated from about 0.04 down to 0.

To confirm that this was indeed the case, I changed my base probability to 24.0 and chaos monkey worked as intended.
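
The arithmetic above can be reproduced in isolation. This is only a sketch of the two values the quoted select() method derives from its probability argument: the whole part from the (int) cast, and the remainder that line 56 passes to selectOneInstance. The class and method names are hypothetical:

```java
/** Illustration only: reproduces the arithmetic from the quoted select() method. */
class ProbabilitySplit {
    /** Whole part of the per-run probability, as produced by the (int) cast. */
    static int wholePart(double probability) {
        return (int) probability;
    }

    /** Fractional remainder, i.e. probability - n in the quoted code. */
    static double fractionalPart(double probability) {
        return probability - (int) probability;
    }

    public static void main(String[] args) {
        int runsPerDay = 24;

        // Configured probability 1.0: the cast yields 0, the remainder is about 0.04.
        double perRun = 1.0 / runsPerDay;
        System.out.println(wholePart(perRun) + " whole, " + fractionalPart(perRun) + " remainder");

        // Configured probability 24.0: the cast yields 1, the remainder is 0.
        double raised = 24.0 / runsPerDay;
        System.out.println(wholePart(raised) + " whole, " + fractionalPart(raised) + " remainder");
    }
}
```

With a per-run value below 1, n is always 0, so any selection in that run depends entirely on the remainder passed to selectOneInstance.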

No reporting from most monkeys?

Hi, I finally got the proper credentials and everything seems to be working, except for one thing: while I get a lot of "Conformity monkey execution summary" emails, I don't get any notifications from the other monkeys. In particular, I've been waiting for a Janitor Monkey report notification, but so far none has come. I've gone through the logs and there doesn't seem to be an exception anywhere, apart from the log4j "multiple bindings" one, and I just confirmed that my config files have the other monkeys enabled.

Again this is on Amazon Linux (small instance). Any idea what could be going on?

Janitor Monkey not cleaning up volumes

Hi, I have quite a large number of detached volumes (> 1k), but the janitor monkey is not cleaning them up. Looking at the logs, for each of the detached volumes it shows something like:

2013-12-18 17:17:29.479 - INFO OldDetachedVolumeRule - [OldDetachedVolumeRule.java:107] Volume vol-43f5846a is not tagged with the Janitor meta information, ignore.

I've been unable to find out what this means or how to rectify it. I would like the Janitor to clean up volumes like this. Does anyone know how?

thanks, Mitchell

Error making SSH connection to instance

When I enable other options such as burncpu, which require SSH access to the instance, they give the error below:
2015-01-16 04:54:55.177 - WARN ChaosInstance - [ChaosInstance.java:105] Error making SSH connection to instance
com.google.inject.CreationException: Guice creation errors:

  1. org.jclouds.rest.config.SyncToAsyncHttpApiProvider<org.jclouds.rest.HttpClient, A> cannot be used as a key; It is not fully specified.

  2. org.jclouds.rest.config.SyncToAsyncHttpApiProvider<org.jclouds.ec2.EC2Client, A> cannot be used as a key; It is not fully specified.

  3. org.jclouds.rest.RestContext<org.jclouds.ec2.EC2Client, A> cannot be used as a key; It is not fully specified.

  4. No implementation for org.jclouds.rest.HttpClient was bound.
    at org.jclouds.rest.config.BinderUtils.bindHttpApiProvider(BinderUtils.java:109)

  5. No implementation for org.jclouds.ec2.EC2Client was bound.
    at org.jclouds.rest.config.BinderUtils.bindHttpApiProvider(BinderUtils.java:109)

5 errors
at com.google.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:435)
at com.google.inject.internal.InternalInjectorCreator.initializeStatically(InternalInjectorCreator.java:154)
at com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:106)
at com.google.inject.Guice.createInjector(Guice.java:95)
at org.jclouds.ContextBuilder.buildInjector(ContextBuilder.java:407)
at org.jclouds.ContextBuilder.buildInjector(ContextBuilder.java:331)
at org.jclouds.ContextBuilder.buildView(ContextBuilder.java:622)
at org.jclouds.ContextBuilder.buildView(ContextBuilder.java:602)
at com.netflix.simianarmy.client.aws.AWSClient.getJcloudsComputeService(AWSClient.java:728)
at com.netflix.simianarmy.client.aws.AWSClient.connectSsh(AWSClient.java:746)
at com.netflix.simianarmy.chaos.ChaosInstance.connectSsh(ChaosInstance.java:123)
at com.netflix.simianarmy.chaos.ChaosInstance.canConnectSsh(ChaosInstance.java:101)
at com.netflix.simianarmy.chaos.ScriptChaosType.canApply(ScriptChaosType.java:61)
at com.netflix.simianarmy.basic.chaos.BasicChaosMonkey.pickChaosType(BasicChaosMonkey.java:141)
at com.netflix.simianarmy.basic.chaos.BasicChaosMonkey.doMonkeyBusiness(BasicChaosMonkey.java:121)
at com.netflix.simianarmy.Monkey.run(Monkey.java:134)
at com.netflix.simianarmy.Monkey$1.run(Monkey.java:155)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

  • Can you please help? I have already set the SSH key path and SSH user in the chaos.properties file:

    simianarmy.chaos.ssh.user = root
    simianarmy.chaos.ssh.key = ~/Mridul.pem

Thanks in Advance
