Hi, I was testing out a modification to the chaos monkey and got an exception on the following line of code when I set the frequency unit to DAYS (with probability set to 1.0 and frequency set to 1).
In BasicChaosInstanceSelector.selectOneInstance:
Validate.isTrue(probability < 1)
The probability can actually end up being Double.POSITIVE_INFINITY.
Here's how.
Even with monkeyTime turned on, if you run the monkey with DAYS as its frequency unit, the BasicChaosMonkey constructor gets the open and close hours for the day and then computes the millisecond difference between them:
long units = freqUnit.convert(close.getTimeInMillis() - open.getTimeInMillis(), TimeUnit.MILLISECONDS);
runsPerDay = units / ctx.scheduler().frequency();
Even though the unit is DAYS, the open/close window works out to a 6-hour block. Converted from MILLISECONDS to DAYS that is 0.25, but TimeUnit.convert returns a long, so the fractional part is dropped and units ends up as zero.
That in turn makes runsPerDay = 0 / 1 == 0 in the line above, i.e. the monkey believes it runs zero times per day.
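The truncation can be reproduced in isolation. This is a minimal sketch, assuming a 6-hour open/close window as described above; the class TruncationDemo and its runsPerDay method are hypothetical and only mirror the constructor's arithmetic:

```java
import java.util.concurrent.TimeUnit;

public class TruncationDemo {
    // Hypothetical helper mirroring the constructor: convert the open/close
    // window into the scheduler's unit, then divide by the frequency.
    static long runsPerDay(long windowMillis, int frequency, TimeUnit freqUnit) {
        // TimeUnit.convert truncates when converting to a coarser unit,
        // so 6 hours (0.25 days) becomes 0 days.
        long units = freqUnit.convert(windowMillis, TimeUnit.MILLISECONDS);
        return units / frequency;
    }

    public static void main(String[] args) {
        long sixHours = TimeUnit.HOURS.toMillis(6); // assumed 6-hour window
        System.out.println(runsPerDay(sixHours, 1, TimeUnit.DAYS));  // prints 0
        System.out.println(runsPerDay(sixHours, 1, TimeUnit.HOURS)); // prints 6
    }
}
```

The same window yields 6 runs per day with HOURS, which is why the problem only shows up with DAYS.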
Now, when doMonkeyBusiness() runs, it comes up with a selection coefficient by dividing the supplied probability by runsPerDay:
Collection<String> instances = context().chaosInstanceSelector().select(group, prob / runsPerDay);
1.0 / 0 == Double.POSITIVE_INFINITY (dividing a double by zero does not throw in Java; it yields infinity).
The validation check then fails, because POSITIVE_INFINITY is greater than 1 and the code requires a value < 1.
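The division and the failing check can be reproduced without the rest of the monkey. This is a minimal sketch, assuming prob is a double and runsPerDay a long as in the snippets above; the class InfinityDemo and the passesCheck method (a plain stand-in for Validate.isTrue(probability < 1)) are hypothetical:

```java
public class InfinityDemo {
    // Hypothetical stand-in for the selector's guard: Validate.isTrue(probability < 1).
    static boolean passesCheck(double probability) {
        return probability < 1;
    }

    public static void main(String[] args) {
        double prob = 1.0;
        long runsPerDay = 0; // value the constructor computes for DAYS

        // double / long: runsPerDay is widened to 0.0, and 1.0 / 0.0 is +Infinity.
        double perRun = prob / runsPerDay;

        System.out.println(perRun);                             // prints Infinity
        System.out.println(perRun == Double.POSITIVE_INFINITY); // prints true
        System.out.println(passesCheck(perRun));                // prints false
    }
}
```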
To get it running, I changed the check to also allow POSITIVE_INFINITY, as well as a probability of 1.
So, to fix this, do we change the probability check to
Validate.isTrue(probability < 1 || probability == Double.POSITIVE_INFINITY)
or do we do something else?
As it stands, running the schedule with DAYS does not work. I can demonstrate this with a unit test:
@Test
public void testFullProbability() {
    TestChaosMonkeyContext ctx = new TestChaosMonkeyContext("fullProbability.properties") {
        @Override
        public MonkeyScheduler scheduler() {
            return new MonkeyScheduler() {
                @Override
                public int frequency() {
                    return 1;
                }
                @Override
                public TimeUnit frequencyUnit() {
                    return TimeUnit.DAYS;
                }
                @Override
                public void start(Monkey monkey, Runnable run) {
                    Assert.assertEquals(monkey.type().name(), monkey.type().name(), "starting monkey");
                    run.run();
                }
                @Override
                public void stop(Monkey monkey) {
                    Assert.assertEquals(monkey.type().name(), monkey.type().name(), "stopping monkey");
                }
            };
        }
    };
    ChaosMonkey chaos = new BasicChaosMonkey(ctx);
    chaos.start();
    chaos.stop();
    List<InstanceGroup> selectedOn = ctx.selectedOn();
    List<String> terminated = ctx.terminated();
    Assert.assertEquals(selectedOn.size(), 4);
    Assert.assertEquals(selectedOn.get(0).type(), TestChaosMonkeyContext.CrawlerTypes.TYPE_A);
    Assert.assertEquals(selectedOn.get(0).name(), "name0");
    Assert.assertEquals(selectedOn.get(1).type(), TestChaosMonkeyContext.CrawlerTypes.TYPE_A);
    Assert.assertEquals(selectedOn.get(1).name(), "name1");
    Assert.assertEquals(selectedOn.get(2).type(), TestChaosMonkeyContext.CrawlerTypes.TYPE_B);
    Assert.assertEquals(selectedOn.get(2).name(), "name2");
    Assert.assertEquals(selectedOn.get(3).type(), TestChaosMonkeyContext.CrawlerTypes.TYPE_B);
    Assert.assertEquals(selectedOn.get(3).name(), "name3");
    Assert.assertEquals(terminated.size(), 4);
}
Note that the test report shows a failure because 4 instances were not terminated, but termination actually stops because of the validation error on infinity (check the console output).
I verified in the debugger that runsPerDay is zero and that the probability for that run is infinity, both in the test case and in actual server execution.
Probably a better fix is this:
if (TimeUnit.DAYS == ctx.scheduler().frequencyUnit()) {
    runsPerDay = ctx.scheduler().frequency();
} else {
    TimeUnit freqUnit = ctx.scheduler().frequencyUnit();
    long units = freqUnit.convert(close.getTimeInMillis() - open.getTimeInMillis(), TimeUnit.MILLISECONDS);
    runsPerDay = units / ctx.scheduler().frequency();
}
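To check the proposed computation in isolation, here is a self-contained sketch of it. The class RunsPerDayFix and the computeRunsPerDay method are hypothetical, and passing the window as milliseconds is an assumption; in BasicChaosMonkey the values come from ctx.scheduler() and the open/close calendars:

```java
import java.util.concurrent.TimeUnit;

public class RunsPerDayFix {
    // Hypothetical helper mirroring the proposed change to the constructor.
    static long computeRunsPerDay(long windowMillis, int frequency, TimeUnit freqUnit) {
        if (TimeUnit.DAYS == freqUnit) {
            // For DAYS, use the configured frequency directly instead of
            // converting the sub-day open/close window, which truncates to 0.
            return frequency;
        }
        long units = freqUnit.convert(windowMillis, TimeUnit.MILLISECONDS);
        return units / frequency;
    }

    public static void main(String[] args) {
        long sixHours = TimeUnit.HOURS.toMillis(6); // assumed 6-hour window
        System.out.println(computeRunsPerDay(sixHours, 1, TimeUnit.DAYS));  // prints 1
        System.out.println(computeRunsPerDay(sixHours, 2, TimeUnit.HOURS)); // prints 3
    }
}
```

With DAYS and frequency 1, runsPerDay is now 1 instead of 0, so prob / runsPerDay in doMonkeyBusiness() no longer divides by zero.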
I'm going to submit a PR to apply the second fix.
Matt