evvo-labs / evvo
Solve multi-objective optimization problems with distributed evolutionary algorithms.
License: Apache License 2.0
Is your documentation request related to a problem? Please describe.
Describe the documentation you'd like
An explanation of how runAsync doesn't block, and how the returned future resolves when the underlying call to run completes.
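Something like this minimal, self-contained sketch could anchor the docs. The `Manager` class and its signatures here are stand-ins for illustration, not evvo's real API:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

object RunAsyncSketch extends App {
  class Manager {
    def run(): Unit = Thread.sleep(1000) // stands in for the optimization run
    // runAsync returns immediately; the future resolves when run() finishes.
    def runAsync(): Future[Unit] = Future(run())
  }

  val finished = new Manager().runAsync()
  println("runAsync returned immediately; we can do other work here")
  finished.foreach(_ => println("run() has completed, so the future resolved"))
  Await.ready(finished, Duration.Inf) // only to keep the demo process alive
}
```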
Creator, mutator, deletor, and objective functions are closures, and therefore not serializable without effort.
SIP 21 "Spores" solves this, and we should use it behind-the-scenes to convert user-provided closures to Spores. Without Spores, we have to demand more of end users, which is unacceptable.
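For context, a minimal illustration of the underlying problem (all names here are made up for the demo): serializing a closure drags its captured environment along with it.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

object ClosureSerializationSketch extends App {
  class Config { val weights = Array(1.0, 2.0) } // does not extend Serializable
  val config = new Config

  // This mutator-like closure captures `config` from the enclosing scope.
  val mutate: Double => Double = x => x * config.weights.length

  val out = new ObjectOutputStream(new ByteArrayOutputStream())
  // Throws java.io.NotSerializableException: the serialized closure would
  // have to include `config`, which is not Serializable.
  out.writeObject(mutate)
}
```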
We added scalastyle, but magic numbers should be allowed in test/ and not in src/. Configure the style checker differently for different folders.
We need to test the Akka cluster deployment system on real networked servers, not just locally. This may require deeper integration with AWS/GCP for an automated, replicatable environment.
Maybe we want to suggest, but not require, an organization based on:
In the 1980s, there was an idea for an AI architecture going around called Blackboard Systems: you imagine a problem being worked on "at the blackboard," with different students (knowledge sources) coming up to the board, making a contribution, then sitting back down. Multi-agent systems in an evolutionary context are a little bit like blackboard systems, except
I personally think that building intelligence into agents is an interesting idea, but maybe overrated. The power of evolutionary systems stems not from making sure agents pick the right things to work on, but from all agents collectively driving the population forward by (usually) trying to improve the best solutions available - the Pareto frontier. We ensure this simply by regularly culling dominated solutions. But who knows - it's an exciting area of research to imagine agents that can learn when to run and which solutions they are most likely to improve. Maybe this will make applications based on our framework run faster and produce better results.
We made some initial choices about default behaviors: pick N solutions, apply a mutation function to all N solutions, output N modified solutions, wait S seconds. I think it's reasonable to think of a mutator as a function Soln(N) => Soln(N). But N might be 1, or some fixed value C, or maybe a range, or maybe ALL. Maybe an agent wants random solutions, or maybe it wants solutions that satisfy some filter function Soln => Boolean.
So we picked N=32 and decided that the mutator function should run on each of the 32 retrieved solutions and output 32 modified solutions. This is so random! Here are some other possibilities:
And having now produced one or more solutions as output, maybe we let the agents decide what to actually submit so as to avoid, for example, filling up the population with dominated or near-duplicate solutions. The default filtering strategy is just NoFilter - submit everything.
So maybe in the end an agent is defined by
Deleters might have a selection strategy and a function Soln => Boolean (keep or discard?), and maybe an activation strategy as well (run when the population > 300 and it's been > 5 seconds since my last run).
Creators might have activation and multiplexing (?) on a function that takes the problem instance and produces a solution.
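To make the discussion concrete, here is a hedged sketch of what the pluggable template might look like - every name in it is hypothetical, not evvo's actual API:

```scala
object AgentTemplateSketch {
  // How an agent picks solutions from the population.
  trait SelectionStrategy[Sol] { def select(population: Seq[Sol]): Seq[Sol] }

  final case class RandomSample[Sol](n: Int) extends SelectionStrategy[Sol] {
    def select(population: Seq[Sol]): Seq[Sol] =
      scala.util.Random.shuffle(population).take(n)
  }

  // How an agent decides which of its outputs to actually submit.
  trait SubmissionFilter[Sol] { def keep(candidates: Seq[Sol]): Seq[Sol] }

  final case class NoFilter[Sol]() extends SubmissionFilter[Sol] {
    def keep(candidates: Seq[Sol]): Seq[Sol] = candidates
  }

  // When an agent should run at all.
  trait ActivationStrategy {
    def shouldRun(populationSize: Int, millisSinceLastRun: Long): Boolean
  }

  // A mutator agent is then just the composition of these parts.
  final case class MutatorAgent[Sol](
      selection: SelectionStrategy[Sol],
      mutate: Seq[Sol] => Seq[Sol],
      filter: SubmissionFilter[Sol],
      activation: ActivationStrategy)
}
```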
An outstanding issue: if we provide this sort of pluggable template, do we still want to allow a power user the option of complete control over the agent's behavior? If so, we might want the notion of a "Custom" agent whose actions are entirely dictated by the programmer, who implements some sort of run() method that is completely open-ended as far as what the agent can do.
After all, they've committed code to the codebase!
A good sample problem that highlights the need for multi-objective optimization is optimizing machine learning models for accuracy as well as multiple definitions of fairness (see AIF360 for notions of fairness).
Describe the bug
The SLF4J loggers defined in logback.xml are commented out (I believe because they weren't working).
To Reproduce
Run the program; no log file is created.
Expected behavior
The log file specified in logback.xml should be created and filled with logs when the program is run with loggers enabled.
We found that running multiple islands locally is faster than one island. We don't have explicit evidence that emigration is happening.
We should write a test for emigration that ensures it is happening and that solutions are actually being transferred between islands.
Maybe we need an integration testing framework that will run a specific problem and then search the logs for specific keywords, so we can assert that emigration logs are being generated.
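A rough sketch of that idea, assuming ScalaTest, a hypothetical log path, and a hypothetical message format (the "run a problem" step is elided):

```scala
import org.scalatest.FunSuite
import scala.io.Source

class EmigrationLogTest extends FunSuite {
  test("emigration log lines appear after a multi-island run") {
    // ... run a small two-island problem to completion here ...
    val logText = Source.fromFile("logs/evvo.log").getLines().mkString("\n")
    // "emigrat" matches both "emigrating" and "emigration".
    assert(logText.contains("emigrat"), "expected at least one emigration log line")
  }
}
```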
Co-authored-by @drassaby
The title says it all. Should we also update CONTRIBUTING.md with a message saying that you should add your name to CONTRIBUTORS.md on your first commit?
This is required for people to use the library.
When this happens, the README needs to be updated with instructions on how to download it; currently there is only a TODO.
We should implement bitstrings with basic creators, mutators, and deletors built in, so that users only have to provide a fitness function. I think the calling API at https://github.com/jenetics/jenetics in the README is a good starting point.
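A hedged sketch of what the built-in bitstring pieces might look like - names invented, loosely inspired by jenetics' calling style:

```scala
import scala.util.Random

object BitstringSketch {
  type Bitstring = Vector[Boolean]

  // Built-in creator: a random bitstring of the requested length.
  def creator(length: Int): () => Bitstring =
    () => Vector.fill(length)(Random.nextBoolean())

  // Built-in mutator: flip one random bit (assumes a non-empty bitstring).
  def mutator: Bitstring => Bitstring = { bits =>
    val i = Random.nextInt(bits.length)
    bits.updated(i, !bits(i))
  }

  // With those built in, a user would only supply a fitness function, e.g.:
  def countOnes(bits: Bitstring): Double = bits.count(identity)
}
```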
It is easier to create new issues (both for current maintainers and new ones) if we provide issue templates. Some that I think would be useful:
StopAfter would read well, and convenience constructors would help it read even more smoothly. E.g.:
islandManager.run(StopAfter.duration(1.second))
or
islandManager.run(StopAfter.pointIsDominated(Map(
  "ObjectiveA" -> 100,
  "ObjectiveB" -> 777,
  …)))
Whatever the format of our examples, whether a GUI or just running on the command line, we'll need to publish executables that run the example. This is necessary for v0, so that new visitors have something to download and play around with.
It's also required for professor matching - running from IntelliJ won't cut it, especially if we want to use the Discovery cluster.
This will make it easier to ensure all commits pass scalastyle, to apply the scalariform formatter to all changed files on commit, etc. Also, because overcommit reads its configuration from a file that will be in this repo, onboarding new contributors will be easy - this is why I put this task in the "Open Source" project.
Overcommit docs are here: https://github.com/sds/overcommit
Is your documentation request related to a problem? Please describe.
Our README.md is quickly falling out of date. While some of this is related to old code examples that will be easier to maintain when #67 is completed (for example, currently in master (1001baf) the call to construct an IslandManager should be a call to LocalIslandManager), some of it reflects more conceptual changes in the framework (local vs. remote code execution).
Describe the documentation you'd like
Please review the whole README for these inconsistencies. All code examples should be runnable, too.
Additional context
This should definitely be done before v0.
Co-authored with @julian-zucker
Is your feature request related to a problem? Please describe.
It's difficult to construct a Scored, even if you have the Objectives and Solution you want to score.
Describe the solution you'd like
A convenience constructor (or a factory method on a companion object) that will take multiple Objectives and a Solution and produce the Scored for me.
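A sketch of the requested factory, using stand-ins for Scored and Objective (the real evvo signatures may differ):

```scala
object ScoredSketch {
  // Stand-in types: an objective names and scores a solution.
  final case class Objective[Sol](name: String, score: Sol => Double)
  final case class Scored[Sol](scores: Map[String, Double], solution: Sol)

  object Scored {
    // The requested convenience: evaluate every objective against a solution.
    def from[Sol](objectives: Seq[Objective[Sol]], solution: Sol): Scored[Sol] =
      Scored(objectives.map(o => o.name -> o.score(solution)).toMap, solution)
  }
}
```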
Our model as shown in docs/ARCHITECTURE.md is out of date. The IslandManager no longer hosts the actor system but instead acts as an Akka router. Also, there is no text describing the IslandManager in ARCHITECTURE.md.
Describe the bug
Travis isn't running tests on builds. (I think we disabled this when we had to package the jar 100 times while setting up dockerization/remoting).
To Reproduce
Look at any Travis build:
DiscoverySuite:
Run completed in 289 milliseconds.
Total number of tests run: 0
Suites: completed 1, aborted 0
Tests: succeeded 0, failed 0, canceled 0, ignored 0, pending 0
No tests were executed.
[INFO]
[INFO] --- scalatest-maven-plugin:2.0.0:test (alltests) @ evvo ---
Discovery starting.
Discovery completed in 191 milliseconds.
Run starting. Expected test count is: 0
DiscoverySuite:
Run completed in 302 milliseconds.
Total number of tests run: 0
Suites: completed 1, aborted 0
Tests: succeeded 0, failed 0, canceled 0, ignored 0, pending 0
No tests were executed.
Expected behavior
All tests should be run on mvn build, and therefore be run by Travis before approval.
Is your documentation request related to a problem? Please describe.
I want to know how to deploy Evvo RemoteActorSystems to GCP Compute Engine instances.
Describe the documentation you'd like
Please add a document in doc/ on how to set up these machines.
Is your feature request related to a problem? Please describe.
Right now, StopAfter can only wait a pre-specified time, which is not helpful if you just need one solution that scores better than a specific point.
Describe the solution you'd like
Add a new method (named pointDominated?) on the StopAfter companion object that takes a point (a Map[(String, OptimizationDirection), Double]) and produces a StopAfter that will stop when that point is dominated.
(Passing it like this is a bad API, but until we fix #76, we won't have a way to create just the score part, so let's hold off.)
Describe alternatives you've considered
We could also pass a ParetoFrontier and stop when any point in it is dominated, but that feels less usable.
Additional context
It's okay to break the old calling API! We don't need to create StopAfter by constructor; having methods on the companion object for making StopAfters with each of duration, pointDominated, etc. works.
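A hedged sketch of how duration and pointDominated could live together on a companion object; every type here is a stand-in for the real evvo ones:

```scala
import scala.concurrent.duration._

object StopAfterSketch {
  sealed trait OptimizationDirection
  case object Minimize extends OptimizationDirection
  case object Maximize extends OptimizationDirection

  // A score maps (objective name, direction) to the achieved value.
  type Score = Map[(String, OptimizationDirection), Double]

  private def atLeastAsGood(dir: OptimizationDirection, a: Double, b: Double): Boolean =
    dir match {
      case Minimize => a <= b
      case Maximize => a >= b
    }

  // `a` dominates `b`: at least as good on every objective, strictly better on one.
  def dominates(a: Score, b: Score): Boolean =
    b.forall { case (key, bv) => a.get(key).exists(atLeastAsGood(key._2, _, bv)) } &&
      b.exists { case (key, bv) =>
        a.get(key).exists(av => atLeastAsGood(key._2, av, bv) && av != bv)
      }

  // A stopping condition is a predicate over the current frontier and elapsed time.
  final case class StopAfter(shouldStop: (Set[Score], FiniteDuration) => Boolean)

  object StopAfter {
    def duration(d: FiniteDuration): StopAfter =
      StopAfter((_, elapsed) => elapsed >= d)

    def pointDominated(point: Score): StopAfter =
      StopAfter((frontier, _) => frontier.exists(dominates(_, point)))
  }
}
```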
Change to objective
Is your feature request related to a problem? Please describe.
We aren't sure whether Evvo is actually faster than simple single-threaded solutions, or single-machine thread-parallel ones. We should test a cluster against common benchmarks, and against other libraries that aren't network-parallel.
Describe the solution you'd like
We should run against common multi-objective benchmarks, publish the results in our documentation, and brag more generally!
We need performance benchmarks better than the existing list-sorting tests - ideally something like TSP. We will also want to record how long it takes to find an acceptable solution: our current benchmarking strategy of "how long until a perfect solution is found in 100% of runs" doesn't reflect the nature of optimization problems.
This deletor will take a function from two points in Euclidean space (the score space) to a single double, and a cutoff. It will remove any points closer than the cutoff to another point in the sample it chooses.
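A hedged sketch of the idea, with invented names; one reasonable reading is greedy thinning, which keeps a single representative per dense cluster:

```scala
object CrowdingDeletorSketch {
  // Scores are treated as vectors in objective space.
  type Score = Vector[Double]

  def euclidean(a: Score, b: Score): Double =
    math.sqrt(a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum)

  // Keep a point only if it is at least `cutoff` away from every point kept
  // so far, so each dense cluster retains a single representative.
  def survivors(sample: Seq[Score],
                distance: (Score, Score) => Double,
                cutoff: Double): Seq[Score] =
    sample.foldLeft(Vector.empty[Score]) { (kept, p) =>
      if (kept.exists(q => distance(p, q) < cutoff)) kept else kept :+ p
    }

  // Everything that isn't a survivor gets deleted.
  def toDelete(sample: Seq[Score],
               distance: (Score, Score) => Double,
               cutoff: Double): Seq[Score] =
    sample.diff(survivors(sample, distance, cutoff))
}
```

Note that naively deleting every point that has a close neighbor would wipe out entire clusters; the greedy version above leaves one survivor each.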
Is your feature request related to a problem? Please describe.
Right now, we have to manually build and deploy Docker images, so they will drift out of date pretty fast.
Describe the solution you'd like
We should have Travis build a Docker image (tagged with latest) on each merge into master, and tag Docker images for our releases.
Describe alternatives you've considered
I don't know of other ways to build Docker images automatically, so I haven't really considered alternatives. Centralizing this in Travis feels intuitive to me, though.
Right now, all logging is pre-configured in logback.xml, and we don't have a way for end users to modify their logging configuration. We should provide one.
Structured logging will allow us to visualize and troubleshoot more easily.
We need the actual data for professor matching, not just the data we generate. Producing a better schedule (according to our fitness functions) than the schedule that CCIS actually produced will show the viability of this system for generating professor matchings.
Not the parts that include fame and fortune, but the idea for evvo going forward. This will solidify our priorities and give new contributors an idea of where their time would be best spent.
Running this remotely on GCP was painful. Let's make it less painful, by dockerizing! This will make a cluster management tool much, much easier to write.
We should use test data to validate our solution, before we pester the administrators for data any further.
General idea on how to generate this data:
We need to add logging for:
- numInvocations hitting a milestone
- the size of the sets being deleted by deletors and created by creators and mutators
There are others to be sure. Anything that could be used to visualize agent performance goes here - average fitness increase, etc.
We need to provide a tool for running remote actor systems on multiple machines and then running islands on them.
We'll create a basic version for v0, and when that is complete make new issues with the enhancements we want.
IslandManagers are actors so that they can respond to messages about the emigration of solutions. However, this means that you cannot construct them explicitly; you need an implicit ActorSystem to call IslandManager.from(…).
This is a confusing API - even we forget exactly what steps are required to construct an IslandManager. We should make the constructor directly callable by having the IslandManager hold an actor instead of being one.
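A sketch of the composition-over-inheritance version (names invented; the real evvo internals will differ):

```scala
import akka.actor.{Actor, ActorRef, ActorSystem, Props}

// The manager is a plain class that owns an actor rather than being one.
class IslandManagerSketch(system: ActorSystem) {
  // The actor handling emigration messages lives inside the manager.
  private val emigrationHandler: ActorRef = system.actorOf(Props(new Actor {
    def receive: Receive = {
      case _ => // merge emigrated solutions into the local population
    }
  }))
}

object IslandManagerSketch {
  def main(args: Array[String]): Unit = {
    val system = ActorSystem("evvo-sketch")
    // Construction is now an ordinary `new`, with no hidden implicits.
    val manager = new IslandManagerSketch(system)
    system.terminate()
  }
}
```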
Is your documentation request related to a problem? Please describe.
In the clustering setup and Docker docs, we use the word "Cluster" to describe what is really a "Node", as in "CLUSTER_IP".
Describe the documentation you'd like
Update all the documentation and code comments to be consistent and correct.
Additional context
There are probably similar mistakes in comments in IslandManager.
The -w true flag will do this for us from the command line / in Travis; <failOnWarning>true</failOnWarning> will do it in Maven.
There are many issues:
- groupId is com.diatom
- artifactId is diatom
We need a sane default, something like "always send the Pareto frontier, and merge the incoming Pareto frontier with your population". But we will also want different strategies for different problems, so it should be configurable within a few defaults - send the Pareto frontier or a random subset, how often to send it, etc.
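A hedged sketch of what configurable emigration could look like - all names invented for illustration:

```scala
import scala.concurrent.duration._
import scala.util.Random

object EmigrationSketch {
  trait EmigrationStrategy[Sol] {
    /** Which solutions to send to other islands. */
    def choose(population: Seq[Sol], paretoFrontier: Seq[Sol]): Seq[Sol]
    /** How often to send them. */
    def interval: FiniteDuration
  }

  /** The proposed sane default: always send the whole Pareto frontier. */
  final case class SendParetoFrontier[Sol](interval: FiniteDuration = 1.second)
      extends EmigrationStrategy[Sol] {
    def choose(population: Seq[Sol], paretoFrontier: Seq[Sol]): Seq[Sol] =
      paretoFrontier
  }

  /** Alternative: send a random subset of the population. */
  final case class SendRandomSubset[Sol](n: Int, interval: FiniteDuration)
      extends EmigrationStrategy[Sol] {
    def choose(population: Seq[Sol], paretoFrontier: Seq[Sol]): Seq[Sol] =
      Random.shuffle(population).take(n)
  }
}
```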
We have the basics of nodes communicating with each other, but we need islands to send each other solutions and merge those solutions into their populations to prove it's possible.
Is your documentation request related to a problem? Please describe.
Code examples in our documentation can fall behind the actual code. If an API changes, the example may no longer run. This is confusing for users.
Describe the documentation you'd like
Not really "documentation I'd like", but I want a tool that will run all scala code examples in our docs, like doctest
for python. This tool should be bundled in overcommit and travis.
Additional context
In PR #64, we had to update documentation, and could easily have forgotten. Automating the checking of documentation is good, and will make us less afraid to put code examples in documentation!
We will add the Apache license, pending @rachlin's approval.
The title pretty much says it all
Create an optimizer for arranging guests at tables. I've already done this in Java for Diatom's Pareto framework, so the task here is just to translate it to Scala and evvo.
Inputs: Guest list with attribute tags (Single, Groom's Friends, Democrat), avoidance / next-to constraints, table specs (# tables, # chairs per table).
Output: An assignment of guest to table and specific seat.
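A hedged sketch of the data model these inputs and outputs imply (all names illustrative):

```scala
object SeatingSketch {
  final case class Guest(name: String, tags: Set[String])
  final case class TableSpec(numTables: Int, chairsPerTable: Int)
  // Pairs of guests who must not / must sit together.
  final case class Constraints(avoid: Set[(Guest, Guest)],
                               nextTo: Set[(Guest, Guest)])
  // A solution assigns each guest a (table, seat) pair.
  type Assignment = Map[Guest, (Int, Int)]
}
```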
This would make a great example for Evvo: The Definitive Guide
Currently, in order to future-proof our code and allow for multiple implementations of classes, we borrowed a Java-ism and use traits everywhere, with case classes that extend those traits.
For example, in this file, we have a trait TCreatorFunc, and a case class CreatorFunc:
trait TCreatorFunc[Sol] extends TAgentFunc with CreatorFunctionType[Sol] {
  val create: CreatorFunctionType[Sol]
  override def apply(): TraversableOnce[Sol] = create()
}

case class CreatorFunc[Sol](create: CreatorFunctionType[Sol],
                            name: String)
  extends TCreatorFunc[Sol] with CreatorFunctionType[Sol]
The trait doesn't actually help. For now, we can use the case class everywhere, and if we want to use multiple implementations of CreatorFunc in the future (which seems unlikely), we can replace it with a trait (or abstract base class) then.
This will enable future-proofed case-class matching as well. If this works before the change:
scala> case class A(x: Int)

scala> A(3) match {
     |   case A(i) => println(i)
     | }
3
We can define a trait B that A can extend (although let's call the new class C, to indicate that it has changed from A, which does not extend B).
scala> trait B {
     |   def x: Int
     | }

scala> object B {
     |   def unapply(b: B): Option[Int] = Some(b.x)
     | }
And the new case class, which is only different from A in that it extends B:
scala> case class C(x: Int) extends B
Now, we can still use this case class in match expressions contingent on B.
scala> C(3) match {
     |   case B(x) => println(x)
     | }
3
So, we can drop the trait in as a replacement if we have to. This would significantly reduce our total lines of code, make new contributors less confused (using traits like this is not idiomatic Scala, and it's so easy to do this drop-in replacement that we don't need to be as afraid of future changes as we would be in Java), and make it clear where there are multiple implementations of a class/trait/interface and where case classes are just operating as data objects.
A classic scheduling problem for process manufacturing: batch orders on one or more machines by grade. Maximize on-time delivery and machine load balancing while minimizing grade transitions. This would be a good example for the book.
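A hedged sketch of a data model and one of the three objectives, with invented names:

```scala
object BatchSchedulingSketch {
  final case class Order(id: String, grade: String, dueTime: Long)
  final case class Machine(id: String)
  // A solution: an ordered run of orders per machine.
  type Schedule = Map[Machine, Vector[Order]]

  // One objective to minimize: grade transitions summed across all machines.
  def gradeTransitions(schedule: Schedule): Int =
    schedule.values.map { orders =>
      orders.map(_.grade).sliding(2).count {
        case Seq(a, b) => a != b
        case _         => false
      }
    }.sum
}
```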
Describe the bug
https://travis-ci.org/evvo-labs/evvo/builds/539364416
overcommit failed during this Travis build, but didn't fail locally.
To Reproduce
Flaky, haven't reproduced.
Expected behavior
If overcommit --run passes locally, the overcommit stage in Travis should also pass.