morinim / vita
Vita - Genetic Programming Framework
License: Mozilla Public License 2.0
C++17 has the standard std::filesystem::path class to represent paths on a filesystem. We should use this class to store paths.
A list of affected functions / structures:
environment.statistics.dir
merge_path in the utility file
dataframe::read_xrff(const std::string &), dataframe::read_csv(const std::string &), dataframe::read(const std::string &) (in the current form they seem to describe functions parsing an XML document contained in a string)
src_problem::read, src_problem::setup_symbols, src_problem::src_problem
examples/forex/trade_simulator.h
The merge_path utility function is now obsolete.
The current signature is misleading: it seems to describe a function parsing an XML document contained in a string.
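A minimal sketch of what a path-based interface could look like. The names stat_file and is_xrff are hypothetical and are not part of Vita; they only illustrate how operator/ can stand in for a merge_path-style helper and how a std::filesystem::path parameter makes the meaning of the argument explicit.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical stand-in for a merge_path-style helper: operator/ inserts the
// directory separator only when it is needed.
inline fs::path stat_file(const fs::path &dir, const fs::path &name)
{
  return dir / name;
}

// Hypothetical helper: a path parameter (instead of std::string) signals that
// the argument names a file and does not hold the document itself.
inline bool is_xrff(const fs::path &p)
{
  return p.extension() == ".xrff";
}
```

A call such as stat_file("results", "summary.txt") accepts string literals unchanged, because std::filesystem::path is implicitly constructible from them.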
With C++17, we get class template argument deduction. It is based on template argument deduction for function templates and lets us drop the clumsy make_XXX functions.
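As a small illustration (the function names old_style and new_style are made up for this sketch), both forms below produce the same type, but the second one needs no helper function:

```cpp
#include <cassert>
#include <tuple>
#include <type_traits>
#include <utility>

// Before C++17: a helper function deduces the template arguments.
inline auto old_style() { return std::make_pair(1, 2.5); }

// C++17: the class template arguments are deduced from the constructor call.
inline auto new_style() { return std::pair(1, 2.5); }

// Both expressions have type std::pair<int, double>.
static_assert(std::is_same_v<decltype(old_style()), decltype(new_style())>);
static_assert(std::is_same_v<decltype(std::tuple(1, 'a')), std::tuple<int, char>>);
```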
The typical implementation should be performance equivalent.
Check small objects optimization:
Implementations are encouraged to avoid dynamic allocations for small objects, but such an optimization may only be applied to types for which std::is_nothrow_move_constructible returns true.
Kahan summation algorithm significantly reduces the numerical error in the total obtained by adding a sequence of finite precision floating point numbers, compared to the obvious approach. This is done by keeping a separate running compensation (a variable to accumulate small errors).
This could be a good addition to the distribution class.
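A minimal sketch of the algorithm as free functions (how it would be wired into the distribution class is left open; kahan_sum and naive_sum are names chosen for this example). Note that aggressive floating point flags such as -ffast-math may optimize the compensation away.

```cpp
#include <cassert>
#include <vector>

// Compensated (Kahan) summation.
inline double kahan_sum(const std::vector<double> &v)
{
  double sum = 0.0;
  double c = 0.0;  // running compensation: low-order bits lost so far

  for (double x : v)
  {
    const double y = x - c;    // apply the correction to the next term
    const double t = sum + y;  // low-order bits of y may be lost here...
    c = (t - sum) - y;         // ...and are recovered (with opposite sign) here
    sum = t;
  }

  return sum;
}

// The obvious approach, for comparison.
inline double naive_sum(const std::vector<double> &v)
{
  double sum = 0.0;
  for (double x : v)
    sum += x;
  return sum;
}
```

With the input 1.0 followed by ten terms equal to 1e-16, the naive loop returns exactly 1.0 because every small term is rounded away, while the compensated version keeps their contribution.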
Hello,
I want to use this library in my own project, but I don't want to place all the source files in my project. Therefore, I compiled a static library (libvita.a) as described in the README. But when I try to use it in my project, I get errors because the header files are missing (the namespaces are not declared).
So, is there a single header file that includes all the interfaces, or do I need to include all the .h files in my project together with the library file?
Thanks.
An individual that has been replaced may, depending on its fitness, be sent to Valhalla, from whence it may return to the population in some later generation.
A similar idea, when performing an N-run evolution, is to send to Valhalla the best individuals of the first N-m runs, reusing them in the last m runs (the Ragnarök? :-)
C++17 simplifies nested namespace definitions:
namespace A::B::C {}
is equivalent to
namespace A { namespace B { namespace C {
} } }
docopt is clearer and simpler.
Moreover, it's a small library that can be embedded, removing another Boost dependency.
The wiki references a Visual Studio guide as TBA. However, it doesn't seem like Vita compiles at all when generating a solution from CMake (VS19 & C++14). If it does compile, could a guide be added? If not, it should be documented as non-functioning.
C++17 introduces the [[maybe_unused]] attribute, which serves the same purpose as our UNUSED macro (defined in compatibility_patch.h).
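A short sketch of the attribute in use (the function twice is hypothetical). The parameter limit is only read inside assert(), so a release build compiled with NDEBUG would otherwise report it as unused:

```cpp
#include <cassert>

// `limit` is referenced only by assert(); when NDEBUG is defined the assertion
// disappears and the attribute silences the unused-parameter warning.
inline int twice(int x, [[maybe_unused]] int limit)
{
  assert(x <= limit);
  return 2 * x;
}
```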
Classes should specify their invariants: what is true before and after executing any public method.
The function used to check the consistency of an object (debug()) should be renamed is_valid() (the most frequently used name).
Eliminate clutter and redundant checks.
A common pattern to implement invariants in classes is for the constructor of the class to throw an exception if the invariant is not satisfied. Since methods preserve the invariants, they can assume the validity of the invariant and need not explicitly check for it.
int A::method(B &b, int v)
{
  Expects(b.debug());  // REDUNDANT CHECK
  Expects(v > 0);      // REQUIRED

  // ...
  // embarrassing code
  // ...

  Ensures(b.debug());  // REDUNDANT CHECK
  Ensures(v > 0);      // REQUIRED
  return v;
}
The debug() / is_valid() method should exist only in the debug build, thus it does not add to code bloat.
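The pattern can be sketched with a small, purely hypothetical class (interval is not a Vita type): the constructor establishes the invariant or throws, public methods preserve it, and only preconditions on arguments are still checked.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical class used only to illustrate the pattern.
// Class invariant: lo_ <= hi_.
class interval
{
public:
  interval(double lo, double hi) : lo_(lo), hi_(hi)
  {
    // The constructor is the single place where the invariant is enforced.
    if (!is_valid())
      throw std::invalid_argument("interval: lo must not exceed hi");
  }

  // The invariant holds on entry, so no check is needed here.
  double width() const { return hi_ - lo_; }

  void widen(double delta)
  {
    assert(delta >= 0.0);  // precondition on the argument: still required
    lo_ -= delta;
    hi_ += delta;
  }

  bool is_valid() const { return lo_ <= hi_; }

private:
  double lo_;
  double hi_;
};
```

In this sketch is_valid() is always compiled; restricting it to debug builds, as proposed above, would be a separate step.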
One crucial use case is error codes. For other guidelines about where to apply it, see: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0600r0.pdf
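Assuming the attribute under discussion is [[nodiscard]] (the subject of the linked paper P0600R0), the error code use case could look like the sketch below. The load_status type and the load function are hypothetical.

```cpp
#include <cassert>
#include <string>

// Marking the type means every function returning it is covered: discarding
// the result makes the compiler emit a warning.
enum class [[nodiscard]] load_status { ok, file_not_found, bad_format };

// Hypothetical loader that only distinguishes an empty file name.
inline load_status load(const std::string &filename)
{
  return filename.empty() ? load_status::file_not_found : load_status::ok;
}
```

A bare statement such as load("data.csv"); would then be diagnosed, while if (load("data.csv") != load_status::ok) { ... } compiles cleanly.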
Before evaluating an individual, we could check if identical individuals (clones) are already present in the population.
When the number of clones (n) is greater than zero, the actual fitness assigned to the individual is multiplied by S (the parameter is called the clone scaling factor). While a continuous range of values is possible, in many programs S is set either to 1 (no clone scaling) or to 0 (clone extermination).
(from "Evolving Assembly Programs: How Games Help Microprocessor Validation". Corno, Sanchez, Squillero)
Because of the hash table based fitness scoring, in Vita we cannot assign different fitness values to syntactically equivalent individuals.
However, the hash table can be augmented with the information needed to calculate an approximation of n, and the evaluator_proxy can be modified to use this information.
std::byte is a distinct type implementing the concept of byte. A byte is not an integer or a character: std::byte supports only bitwise operations, so it is less open to programmer errors such as accidental arithmetic.
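A brief sketch of what this means in practice (low_nibble is a hypothetical helper): bitwise operators are available, while converting to an integer has to be spelled out with std::to_integer.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: keeps only the low four bits of a byte.
inline std::byte low_nibble(std::byte b)
{
  return b & std::byte{0x0F};  // bitwise operators are defined for std::byte
}

// An expression such as `b + 1` does not compile: std::byte is not an
// arithmetic type. The conversion to an integer must be explicit:
inline int as_int(std::byte b)
{
  return std::to_integer<int>(b);
}
```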
The EUPL is the European Union Public Licence, published by the European Commission. It
has been studied and drafted since 2005 and was launched in January 2007. The current
version is the multilingual EUPL v1.1 (January 2009).
The EUPL is also a "share alike" (or copyleft) licence, resulting from the aim to avoid exclusive appropriation of the covered software. The EUPL is share alike on both source and object code.
Add something like the Apache Software Foundation Corporate Contributor License Agreement.
I ran Vita on my dataset and the formula it output was:
[000] FADD 5.0 [033]
[033] FSUB [051] [042]
[042] FADD X78 X122
[051] FSUB [052] X78
[052] FMUL X122 X37
What is the best way for me to get predictions from that? Do you have a prediction function, or a function to convert that output to a standard formula format? I do understand that when it says
[000] FADD 5.0 [033]
it is saying to add 5 to formula #33, and formula #33 is shown on the next line of the results.
But I would need to write my own script to convert your formula format to a formula I can use to make actual predictions, wouldn't I?
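For illustration only, a listing in this format can be evaluated with a small recursive interpreter. The sketch below is not part of Vita's API: the names gene, program, parse and eval are hypothetical, only the three opcodes shown above are handled, and FSUB a b is assumed to mean a - b.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

struct gene { std::string op; std::vector<std::string> args; };
using program = std::map<std::string, gene>;  // keyed by labels such as "[033]"

// Reads lines of the form "[000] FADD 5.0 [033]".
inline program parse(const std::string &listing)
{
  program p;
  std::istringstream lines(listing);
  for (std::string line; std::getline(lines, line);)
  {
    std::istringstream in(line);
    std::string label;
    gene g;
    if (!(in >> label >> g.op))
      continue;  // skip blank or incomplete lines
    for (std::string a; in >> a;)
      g.args.push_back(a);
    p[label] = g;
  }
  return p;
}

// A token is a label ("[033]"), an input variable ("X78") or a constant.
inline double eval(const program &p, const std::string &tok,
                   const std::map<std::string, double> &vars)
{
  if (tok.front() == '[')
  {
    const gene &g = p.at(tok);
    const double a = eval(p, g.args.at(0), vars);
    const double b = eval(p, g.args.at(1), vars);
    if (g.op == "FADD") return a + b;
    if (g.op == "FSUB") return a - b;  // assumed argument order
    if (g.op == "FMUL") return a * b;
    throw std::invalid_argument("unsupported opcode: " + g.op);
  }
  if (tok.front() == 'X')
    return vars.at(tok);
  return std::stod(tok);
}
```

Under these assumptions the listing corresponds to the infix formula 5.0 + (((X122 * X37) - X78) - (X78 + X122)), and eval(p, "[000]", vars) computes its value for the given inputs.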
std::sample is computationally simpler and more direct than the current approach (std::shuffle + std::move).
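A minimal sketch of the std::sample approach (pick is a hypothetical name, and int stands in for whatever element type the real code uses). The source range is left untouched, so no shuffle of the whole container is needed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <random>
#include <vector>

// Selects k distinct elements (or all of them if there are fewer than k).
// With forward iterators the relative order of the selected elements is kept.
inline std::vector<int> pick(const std::vector<int> &population, std::size_t k,
                             std::mt19937 &rng)
{
  std::vector<int> out;
  out.reserve(k);
  std::sample(population.begin(), population.end(), std::back_inserter(out),
              k, rng);
  return out;
}
```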
E.g. is_floating_point_v<T> instead of is_floating_point<T>::value
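Side by side, the two spellings look like this (is_real_old and is_real_new are names invented for the comparison):

```cpp
#include <cassert>
#include <type_traits>

// C++11 / C++14 spelling.
template<class T>
constexpr bool is_real_old() { return std::is_floating_point<T>::value; }

// C++17 variable template: same result, less noise.
template<class T>
constexpr bool is_real_new() { return std::is_floating_point_v<T>; }

static_assert(is_real_old<double>() == is_real_new<double>());
static_assert(!is_real_new<int>());
```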
Probably the best way is via boost::serialization (but we have to give more careful consideration to Google Protocol Buffers).
CMake v3.9 finally supports LTO.
Here's an example showing how it works:
cmake_minimum_required(VERSION 3.9.4)

include(CheckIPOSupported)
check_ipo_supported(RESULT supported OUTPUT error)

add_executable(example Example.cpp)

if (supported)
  message(STATUS "IPO / LTO enabled")
  set_property(TARGET example PROPERTY INTERPROCEDURAL_OPTIMIZATION TRUE)
else ()
  message(STATUS "IPO / LTO not supported: <${error}>")
endif()
Once you split up the data into train, validation and test set, chances are close to 100% that your already skewed data becomes even more unbalanced for at least one of the three resulting sets.
Think about it: let’s say your data set contains 1000 records and of those 20 are labelled as “fraud”. As soon as you split up the data set into train, validation and test set, it is possible that the train set contains the majority of the “fraud”-records or maybe even all of them. Although not as likely, the same could happen for the validation or test set, which is even worse because then the machine learning algorithm has no chance at all to learn the hidden patterns of “fraud”-records.
This is why you shouldn’t use random sampling when constructing the three different data sets, but stratified sampling instead. It ensures that the train, validation and test sets are well balanced. This way the already existing problem of skewed classes is not intensified, which is what you want when creating high quality models.
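A rough sketch of stratified sampling, assuming a hypothetical record type example with an integer class label (neither the type nor stratified_split is taken from Vita). For brevity it performs a two-way split; applying it again to one of the parts yields the third set.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <random>
#include <utility>
#include <vector>

struct example { int id; int label; };  // hypothetical record type

// Returns (train, test). Each class is shuffled and split separately, so both
// parts keep roughly the class proportions of the original data.
inline std::pair<std::vector<example>, std::vector<example>>
stratified_split(const std::vector<example> &data, double train_fraction,
                 std::mt19937 &rng)
{
  std::map<int, std::vector<example>> by_label;
  for (const auto &e : data)
    by_label[e.label].push_back(e);

  std::vector<example> train, test;
  for (auto &entry : by_label)
  {
    auto &group = entry.second;
    std::shuffle(group.begin(), group.end(), rng);

    // Number of records of this class assigned to the train set (rounded).
    const auto n_train =
      static_cast<std::size_t>(train_fraction * group.size() + 0.5);

    for (std::size_t i = 0; i < group.size(); ++i)
      (i < n_train ? train : test).push_back(group[i]);
  }

  return {train, test};
}
```

With the numbers used above (1000 records, 20 labelled as fraud) and an 80% train fraction, the train set receives 16 fraud records and the test set 4, whatever the random seed.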
The behavior of std::advance is undefined if the specified sequence of increments would require that a non-incrementable iterator (such as the past-the-end iterator) is incremented.
This may happen when n > 1.
Also, when n is greater than 1 the arithmetic mean is wrong (considering all the elements of the dataset instead of the correct subset).
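One generic way to avoid the undefined behaviour is to bound the advance by the end of the range. The helper below is a sketch with a hypothetical name (bounded_advance); it does not reproduce the affected Vita code.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>

// Advances `it` by at most `n` steps and never moves past `last`.
// Returns the number of steps actually taken.
template<class It>
std::ptrdiff_t bounded_advance(It &it, It last, std::ptrdiff_t n)
{
  std::ptrdiff_t done = 0;
  while (done < n && it != last)
  {
    ++it;
    ++done;
  }
  return done;
}
```

The return value also tells the caller how many elements were really covered, which is the count an average over the subset should use.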
What is the best way to run Vita multicore?
xorshift64* / xorshift1024* (http://vigna.di.unimi.it/ftp/papers/xorshift.pdf) are very fast PRNGs with interesting properties and could replace Mersenne Twister.
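A minimal sketch of xorshift64* written as a class that can be plugged into the standard distributions (the class name and the default seed are choices made for this example; the shift triple 12, 25, 27 and the multiplier are the ones commonly published for this generator).

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <random>

class xorshift64star
{
public:
  using result_type = std::uint64_t;

  // The state must never be zero, so a zero seed is replaced.
  explicit xorshift64star(result_type seed = 0x9E3779B97F4A7C15ULL)
    : state_(seed ? seed : 1)
  {
  }

  // A nonzero state times an odd constant (mod 2^64) is never zero.
  static constexpr result_type min() { return 1; }
  static constexpr result_type max()
  {
    return std::numeric_limits<result_type>::max();
  }

  result_type operator()()
  {
    state_ ^= state_ >> 12;
    state_ ^= state_ << 25;
    state_ ^= state_ >> 27;
    return state_ * 0x2545F4914F6CDD1DULL;
  }

private:
  result_type state_;
};
```

Because it provides result_type, min(), max() and operator(), an instance can be passed to std::uniform_int_distribution and similar facilities in place of std::mt19937.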
Accuracy is not useful when the two classes are of very different sizes (for example, if there were 95 cats and only 5 dogs in the data set, the classifier could easily be biased into classifying all the samples as cats. The overall accuracy would be 95%, but in practice the classifier would have a 100% recognition rate for the cat class but a 0% recognition rate for the dog class).
The Matthews correlation coefficient is used in machine learning as a measure of the quality of binary (two-class) classifications. It takes into account true and false positives and negatives and is generally regarded as a balanced measure which can be used even if the classes are of very different sizes. The MCC is in essence a correlation coefficient between the observed and predicted binary classifications; it returns a value between −1 and +1. A coefficient of +1 represents a perfect prediction, 0 no better than random prediction and −1 indicates total disagreement between prediction and observation.
We can use the averaged table of confusion for multiclass problems (see http://en.wikipedia.org/wiki/Confusion_matrix).
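For the binary case the coefficient can be computed directly from the four confusion matrix counts. In the sketch below the confusion struct and the function names are hypothetical; returning 0 when the denominator vanishes is a common convention, adopted here as an assumption.

```cpp
#include <cassert>
#include <cmath>

// Counts of a binary confusion matrix.
struct confusion { double tp, tn, fp, fn; };

inline double accuracy(const confusion &c)
{
  return (c.tp + c.tn) / (c.tp + c.tn + c.fp + c.fn);
}

// MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
inline double mcc(const confusion &c)
{
  const double den = std::sqrt((c.tp + c.fp) * (c.tp + c.fn)
                               * (c.tn + c.fp) * (c.tn + c.fn));

  // Convention assumed here: a zero denominator yields a coefficient of 0.
  return den == 0.0 ? 0.0 : (c.tp * c.tn - c.fp * c.fn) / den;
}
```

For the cats and dogs example above (every sample classified as a cat: TP = 95, FP = 5, TN = FN = 0) the accuracy is 0.95 while the coefficient is 0, i.e. no better than random prediction.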
I was wondering if anyone has tried Ji Ni's replacement for x/y with x/sqrt(1+y*y)?
This seems like a neat idea and I am interested to learn if it works well for others too.
Bill Langdon
[From Yahoo Groups - Genetic Programming]
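The replacement operator is a one-liner; a sketch follows (the function name smooth_div is made up for this example). Unlike x/y, it is defined for every finite y, including y = 0, so no special protected division case is needed.

```cpp
#include <cassert>
#include <cmath>

// x / sqrt(1 + y*y): the denominator is always >= 1, so the result is finite
// for finite arguments and approaches x / |y| when |y| is large.
inline double smooth_div(double x, double y)
{
  return x / std::sqrt(1.0 + y * y);
}
```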
gcc 7 has added a default fallthrough warning (-Wimplicit-fallthrough enabled by -Wextra).
To suppress this warning C++17 provides a standard way: the [[fallthrough]]; attribute.
In C++11 / C++14, with gcc, it's also possible to add a //-fallthrough comment to silence the warning. This is the current approach but the C++17 attribute specifier sequence is a better option.
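A small sketch of the attribute in a switch (stages_run is a hypothetical function). Each intentional fall through is marked explicitly, so -Wimplicit-fallthrough stays quiet and an accidental one would still be reported.

```cpp
#include <cassert>

// Hypothetical function: counts how many of the stages 0..2 are executed when
// starting from stage `start`.
inline int stages_run(int start)
{
  int n = 0;

  switch (start)
  {
  case 0:
    ++n;
    [[fallthrough]];  // intentional: stage 0 continues into stage 1
  case 1:
    ++n;
    [[fallthrough]];  // intentional: stage 1 continues into stage 2
  case 2:
    ++n;
    break;
  default:
    break;
  }

  return n;
}
```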
What is the optimal age_gap / layers / individuals / aging scheme combination for the ALPS algorithm?
We should probably make it a function of dataset size / generation...
CMake can generate pkg-config .pc files for packages. The .pc file serves as a Rosetta stone for many build systems.
A good basic reference for pkg-config .pc syntax is helpful.
Relevant code:
# this fragment generates build/my_package.pc
#
# cmake -B build -DCMAKE_INSTALL_PREFIX=~/mylib
# HOMEPAGE_URL in project() requires CMake 3.12 or later
cmake_minimum_required(VERSION 3.12)
project(mylib
  LANGUAGES C
  HOMEPAGE_URL https://github.invalid/username/mylib
  DESCRIPTION "example library"
  VERSION 1.1.2)
add_library(mine mine.c)
add_library(support support.c)
set(target1 mine)
set(target2 support)
set(pc_libs_private)
set(pc_req_private)
set(pc_req_public)
configure_file(my_package.pc.in my_package.pc @ONLY)
and
# this template is filled-in by CMake `configure_file(... @ONLY)`
# the `@....@` are filled in by CMake configure_file(),
# from variables set in your CMakeLists.txt or by CMake itself
#
# Good tutorial for understanding .pc files:
# https://people.freedesktop.org/~dbn/pkg-config-guide.html
prefix="@CMAKE_INSTALL_PREFIX@"
exec_prefix="${prefix}"
libdir="${prefix}/lib"
includedir="${prefix}/include"
Name: @PROJECT_NAME@
Description: @CMAKE_PROJECT_DESCRIPTION@
URL: @CMAKE_PROJECT_HOMEPAGE_URL@
Version: @PROJECT_VERSION@
Requires: @pc_req_public@
Requires.private: @pc_req_private@
Cflags: -I"${includedir}"
Libs: -L"${libdir}" -l@target1@ -l@target2@
Libs.private: -L"${libdir}" -l@target1@ -l@target2@ @pc_libs_private@
Possible problems: How to generate a .pc (pkg-config) file supporting the --prefix option of cmake --install?