morinim / vita

Vita - Genetic Programming Framework

License: Mozilla Public License 2.0

Emacs Lisp 0.14% C++ 93.54% Python 4.71% C 0.47% CMake 0.37% MQL5 0.77% Shell 0.01%
genetic-programming genetic-algorithms differential-evolution classification symbolic-regression machine-learning artificial-intelligence cpp software-agents evolutionary-algorithms

vita's People

Contributors

daemonib, morinim

vita's Issues

Use std::filesystem::path and operator/

C++17 has the standard std::filesystem::path class to represent paths on a filesystem.

We should use this class to store paths.

A list of affected functions / structures:

  • environment.statistics.dir
  • merge_path in the utility file
  • dataframe::read_xrff(const std::string &), dataframe::read_csv(const std::string &), dataframe::read(const std::string &) (in the current form they seem to describe functions parsing an XML document contained in a string)
  • src_problem::read, src_problem::setup_symbols, src_problem::src_problem
  • paths in examples/forex/trade_simulator.h
  • merge_path utility function is now obsolete
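As a rough illustration of why a hand-written path-joining helper becomes unnecessary, here is a minimal sketch. The function name `stats_file` and its parameters are hypothetical and are not part of Vita; only `std::filesystem::path` and its `operator/` come from the standard library.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

// Hypothetical helper (not a Vita function): returns the location of a
// statistics file inside an output directory.
inline std::filesystem::path stats_file(const std::filesystem::path &dir,
                                        const std::string &name)
{
  assert(!name.empty());

  // operator/ joins the two components, inserting the platform's preferred
  // directory separator only when one is needed.
  return dir / name;
}
```

A function taking `const std::filesystem::path &` also accepts string literals and `std::string` arguments through implicit conversion, so existing call sites would probably need few changes.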

Replace vita::any with std::any

The typical implementation should be performance equivalent.

Check small objects optimization:

Implementations are encouraged to avoid dynamic allocations for small objects, but such an optimization may only be applied to types for which std::is_nothrow_move_constructible returns true.

(from http://en.cppreference.com/w/cpp/utility/any)
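A minimal sketch of what the check could look like with `std::any`. The helper `as_double` is hypothetical and only illustrates the pointer form of `std::any_cast`; whether a given implementation actually stores a value without allocating is not guaranteed by the standard.

```cpp
#include <any>
#include <cassert>
#include <string>
#include <type_traits>

// A stored type has to be nothrow move constructible to be eligible for the
// small objects optimization (eligibility does not guarantee it is applied).
static_assert(std::is_nothrow_move_constructible_v<int>);
static_assert(std::is_nothrow_move_constructible_v<double>);

// Hypothetical helper: returns the contained double, or `fallback` when the
// any is empty or holds a different type.
inline double as_double(const std::any &a, double fallback)
{
  // The pointer overload of any_cast returns nullptr instead of throwing.
  if (const double *p = std::any_cast<double>(&a))
    return *p;

  return fallback;
}
```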

Kahan summation algorithm for class distribution

Kahan summation algorithm significantly reduces the numerical error in the total obtained by adding a sequence of finite precision floating point numbers, compared to the obvious approach. This is done by keeping a separate running compensation (a variable to accumulate small errors).

This could be a good addition to the distribution class.
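The algorithm can be sketched as a free function. The name `kahan_sum` is hypothetical; integrating the compensation term into the distribution class would need it stored as a data member instead of a local variable.

```cpp
#include <cassert>
#include <vector>

// Minimal sketch of Kahan (compensated) summation.
inline double kahan_sum(const std::vector<double> &values)
{
  double sum = 0.0;
  double c = 0.0;  // running compensation for lost low-order bits

  for (const double v : values)
  {
    const double y = v - c;    // apply the correction from the previous step
    const double t = sum + y;  // low-order bits of y may be lost here...
    c = (t - sum) - y;         // ...and are recovered (negated) here
    sum = t;
  }

  return sum;
}
```

Aggressive floating point optimizations (for example `-ffast-math` with gcc) may simplify the compensation away, so the code should be compiled without them.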

Usage of library file (libvita.a)

Hello,

I want to use this library in my own project, but I don't want to place all the source files in my project. Therefore, I compiled a static library (libvita.a) as described in the README. But when I tried to use it in my project, I got errors because the header files are missing (the namespaces are not declared).

So, is there a single header file that includes all the interfaces, or do I need to include all the .h files in my project together with the library file?
Thanks.

Valhalla

An individual that has been replaced may, depending on its fitness, be sent to Valhalla, from which it may return to the population in some later generation.

A similar idea, when performing an N-run evolution, is to send to Valhalla the best individuals of the first N - m runs, reusing them in the last m runs (the Ragnarök? :-)

Use C++17 nested namespaces

C++17 simplifies nested namespace definitions:

namespace A::B::C {}

is equivalent to

namespace A { namespace B { namespace C {
} } }

Windows Visual Studio + CMake support/guide

The wiki references a Visual Studio guide as TBA. However, it doesn't seem like Vita compiles at all when generating a solution from CMake (VS19 & C++14). If it does compile, could a guide be added? If not, could this be documented as non-functioning?

Remove UNUSED macro [C++17]

C++17 introduces the [[maybe_unused]] attribute, which serves the same purpose as our UNUSED macro (defined in compatibility_patch.h).
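A short sketch of the attribute on a parameter and on a local variable. The function `scaled` and its parameters are hypothetical and do not appear in Vita.

```cpp
#include <cassert>

// Hypothetical function: `debug_level` is intentionally ignored.
inline int scaled(int value, [[maybe_unused]] int debug_level)
{
  // When NDEBUG is defined the assert below expands to nothing and
  // `positive` would otherwise trigger an unused-variable warning.
  [[maybe_unused]] const bool positive = value > 0;
  assert(positive);

  return 2 * value;
}
```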

Clean up code checking class invariants

Classes should specify their invariants: what is true before and after executing any public method.

  • The function used to check the consistency of an object (debug()) should be renamed is_valid() (the most frequently used name);

  • Eliminate clutter and redundant checks.

    A common pattern to implement invariants in classes is for the constructor of the class to throw an exception if the invariant is not satisfied. Since methods preserve the invariants, they can assume the validity of the invariant and need not explicitly check for it.

    int A::method(B &b, int v)
    {
      Expects(b.debug());  // REDUNDANT CHECK
      Expects(v > 0);      // REQUIRED
    
      // ...
      // embarrassing code
      // ...
    
      Ensures(b.debug());  // REDUNDANT CHECK
      Ensures(v > 0);      // REQUIRED
    
      return v;
    }
  • The debug()/is_valid() method should exist only in the debug build, thus it does not add to code bloat.
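The pattern described in the list above can be sketched with a small class. The `interval` class is hypothetical and is only meant to show the shape of the approach: the constructor enforces the invariant, the methods rely on it, and `is_valid()` is compiled only in debug builds.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical class. Invariant: lo_ <= hi_.
class interval
{
public:
  interval(double lo, double hi) : lo_(lo), hi_(hi)
  {
    // The constructor is the only place where the invariant is enforced.
    if (!(lo <= hi))
      throw std::invalid_argument("interval: lower bound exceeds upper bound");
  }

  // No explicit check: the invariant holds for every constructed object.
  double width() const { return hi_ - lo_; }

#if !defined(NDEBUG)
  // Consistency check available in debug builds only.
  bool is_valid() const { return lo_ <= hi_; }
#endif

private:
  double lo_;
  double hi_;
};
```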


Clone scaling

Before evaluating an individual, we could check if identical individuals (clones) are already present in the population.

When the number of clones (n) is greater than zero, the actual fitness assigned to the individual is multiplied by S (the parameter is called the clone scaling factor).

While a continuous range of values is possible, in many programs S is set either to 1 (no clone scaling) or to 0 (clone extermination).

(from "Evolving Assembly Programs: How Games Help Microprocessor Validation". Corno, Sanchez, Squillero)

Because of the hash table based fitness scoring, in Vita we cannot assign different fitness values to syntactically equivalent individuals.

However, the hash table can be augmented with the information needed to calculate an approximation of n, and the evaluator_proxy can be modified to use this information.
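A very rough sketch of the idea, independent of Vita's actual hash table and evaluator_proxy. The class `clone_scaler` and the use of a string signature are assumptions made for illustration; the sketch also assumes a non-negative fitness where higher is better, otherwise multiplying by S would not act as a penalty.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical class: approximates n with the number of times the same
// signature has already been scored.
class clone_scaler
{
public:
  explicit clone_scaler(double s) : s_(s) { assert(0.0 <= s && s <= 1.0); }

  // Returns the fitness to assign: scaled by S when clones were seen before.
  double operator()(const std::string &signature, double raw_fitness)
  {
    const unsigned n = seen_[signature]++;  // clones met before this one
    return n > 0 ? raw_fitness * s_ : raw_fitness;
  }

private:
  double s_;  // clone scaling factor
  std::unordered_map<std::string, unsigned> seen_;
};
```

With S = 1 the scaler leaves every fitness untouched; with S = 0 every individual after the first of its kind receives a zero fitness.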

Use `std::byte` instead of `char`

std::byte is a distinct type implementing the concept of byte. A byte is neither an integer nor a character, so std::byte supports only bitwise operations; accidental arithmetic or character handling on raw memory is rejected at compile time.
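A small sketch of how `std::byte` behaves in practice. The function `xor_checksum` is hypothetical and only serves to show which operations are available.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: XOR of all the bytes of a raw buffer.
inline std::byte xor_checksum(const std::vector<std::byte> &buffer)
{
  std::byte acc{0};

  for (const std::byte b : buffer)
    acc ^= b;  // bitwise operators are defined for std::byte

  // acc + std::byte{1};  // would not compile: no arithmetic on std::byte

  return acc;
}
```

Converting back to a number requires an explicit `std::to_integer<T>(b)` call, which makes the intent visible at the call site.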

Relicensing the project with EUPL

The EUPL is the European Union Public Licence, published by the European Commission. It was studied and drafted from 2005 onward and launched in January 2007. The current version is the multilingual EUPL v1.1 (January 2009).

The EUPL is also a "share alike" (or copyleft) licence, resulting from the aim to avoid exclusive appropriation of the covered software. The EUPL is share alike on both source and object code.

Making Predictions

I ran Vita on my dataset and the formula it output was:
[000] FADD 5.0 [033]
[033] FSUB [051] [042]
[042] FADD X78 X122
[051] FSUB [052] X78
[052] FMUL X122 X37

What is the best way for me to get predictions from that? Do you have a prediction function, or a function to convert that output to a standard formula format? I do understand that when it says
[000] FADD 5.0 [033]
it is saying to add 5 to formula #33, and that formula #33 is shown on the next line of the results.
But I would need to write my own script to convert your formula format to a formula I can use to make actual predictions, wouldn't I?
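One possible way to evaluate such a listing outside the framework is sketched below. Everything here is an assumption based only on the output shown above: `parse_listing` and `evaluate` are hypothetical names, FADD / FSUB / FMUL are taken to mean binary addition, subtraction (first argument minus second) and multiplication, and the mapping between the `X` indices and the dataset columns has to be supplied by the caller.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

struct gene { std::string op; std::vector<std::string> args; };
using program = std::map<int, gene>;  // locus -> gene

// Reads lines of the form "[locus] OPCODE arg arg".
inline program parse_listing(const std::string &listing)
{
  program p;
  std::istringstream lines(listing);

  for (std::string line; std::getline(lines, line); )
  {
    std::istringstream ss(line);
    std::string label;
    gene g;
    if (!(ss >> label >> g.op) || label.size() < 3
        || label.front() != '[' || label.back() != ']')
      continue;  // skip lines that do not look like a gene

    for (std::string a; ss >> a; )
      g.args.push_back(a);

    p[std::stoi(label.substr(1, label.size() - 2))] = g;
  }

  return p;
}

// Evaluates the gene at `locus`; `x` maps a variable index to its value.
inline double evaluate(const program &p, int locus,
                       const std::map<int, double> &x)
{
  const gene &g = p.at(locus);
  if (g.args.size() != 2)
    throw std::runtime_error("expected two arguments for " + g.op);

  const auto value = [&](const std::string &a) -> double
  {
    if (a.front() == '[')  // reference to another gene
      return evaluate(p, std::stoi(a.substr(1, a.size() - 2)), x);
    if (a.front() == 'X')  // input variable
      return x.at(std::stoi(a.substr(1)));
    return std::stod(a);   // numeric constant
  };

  const double lhs = value(g.args[0]);
  const double rhs = value(g.args[1]);

  if (g.op == "FADD") return lhs + rhs;
  if (g.op == "FSUB") return lhs - rhs;
  if (g.op == "FMUL") return lhs * rhs;
  throw std::runtime_error("unsupported opcode: " + g.op);
}
```

Calling `evaluate(p, 0, x)` on the parsed listing computes the value of the whole formula for one example, provided the assumptions about the opcodes hold.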

CMake support for link time optimization (aka IPO)

CMake v3.9 finally supports LTO.

Here's some example code showing how it works:

cmake_minimum_required(VERSION 3.9.4)

include(CheckIPOSupported)
check_ipo_supported(RESULT supported OUTPUT error)

add_executable(example Example.cpp)

if (supported)
  message(STATUS "IPO / LTO enabled")
  set_property(TARGET example PROPERTY INTERPROCEDURAL_OPTIMIZATION TRUE)
else ()
  message(STATUS "IPO / LTO not supported: <${error}>")
endif()

Stratified sampling

Once you split up the data into train, validation and test set, chances are close to 100% that your already skewed data becomes even more unbalanced for at least one of the three resulting sets.

Think about it: let’s say your data set contains 1000 records and of those 20 are labelled as “fraud”. As soon as you split up the data set into train, validation and test set, it is possible that the train set contains the majority of the “fraud”-records or maybe even all of them. Although not as likely, the same could happen for the validation or test set, which is even worse because then the machine learning algorithm has no chance at all to learn the hidden patterns of “fraud”-records.

This is why you shouldn’t use random sampling when constructing the three different data sets, but stratified sampling instead. It ensures that each class is represented in the train, validation and test sets in roughly the same proportion as in the full data set. This way the existing problem of skewed classes is not made worse, which is what you want when creating high quality models.
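A minimal sketch of a stratified three-way split, assuming integer class labels. The names `split` and `stratified_split` are hypothetical, and rounding within each class is simply truncated, so the leftover examples end up in the test set.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <random>
#include <vector>

struct split { std::vector<std::size_t> train, validation, test; };

// Splits example indices so that every class contributes the same
// fractions to the train, validation and test sets.
inline split stratified_split(const std::vector<int> &labels,
                              double train_frac, double validation_frac,
                              std::mt19937 &rng)
{
  assert(train_frac >= 0.0 && validation_frac >= 0.0);
  assert(train_frac + validation_frac <= 1.0);

  // Group example indices by class label.
  std::map<int, std::vector<std::size_t>> by_class;
  for (std::size_t i = 0; i < labels.size(); ++i)
    by_class[labels[i]].push_back(i);

  split s;
  for (auto &entry : by_class)
  {
    auto &idx = entry.second;
    std::shuffle(idx.begin(), idx.end(), rng);  // random order inside a class

    const auto n_train = static_cast<std::size_t>(train_frac * idx.size());
    const auto n_val = static_cast<std::size_t>(validation_frac * idx.size());

    for (std::size_t k = 0; k < idx.size(); ++k)
      if (k < n_train)
        s.train.push_back(idx[k]);
      else if (k < n_train + n_val)
        s.validation.push_back(idx[k]);
      else
        s.test.push_back(idx[k]);
  }

  return s;
}
```

With the 1000-record example above and fractions 0.5 / 0.25, each set receives its proportional share of the 20 "fraud" records instead of leaving their placement to chance.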

Undefined behavior invoking `sum_of_errors_impl` with step argument greater than `1`

The behavior of std::advance is undefined if the specified sequence of increments would require that a non-incrementable iterator (such as the past-the-end iterator) is incremented.

This may happen when n > 1.

Also, when n is greater than 1 the arithmetic mean is wrong: it is computed over all the elements of the dataset instead of the subset actually visited.
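One way to avoid both problems is sketched below. The function `mean_every_nth` is a hypothetical stand-in and not the real `sum_of_errors_impl`: it never advances past the end and divides by the number of elements actually visited.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical stand-in: mean of every `step`-th element, starting from
// the first one.
inline double mean_every_nth(const std::vector<double> &errors,
                             std::size_t step)
{
  assert(step > 0);

  double total = 0.0;
  std::size_t visited = 0;

  for (auto it = errors.begin(); it != errors.end(); )
  {
    total += *it;
    ++visited;

    // Clamp the increment so the iterator never goes past end().
    const auto remaining =
      static_cast<std::size_t>(std::distance(it, errors.end()));
    std::advance(it, static_cast<std::ptrdiff_t>(std::min(step, remaining)));
  }

  // Divide by the size of the visited subset, not of the whole dataset.
  return visited ? total / static_cast<double>(visited) : 0.0;
}
```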

Matthews correlation coefficient

Accuracy is not useful when the two classes are of very different sizes. For example, if there were 95 cats and only 5 dogs in the data set, the classifier could easily be biased into classifying all the samples as cats: the overall accuracy would be 95%, but in practice the classifier would have a 100% recognition rate for the cat class and a 0% recognition rate for the dog class.

The Matthews correlation coefficient is used in machine learning as a measure of the quality of binary (two-class) classifications. It takes into account true and false positives and negatives and is generally regarded as a balanced measure which can be used even if the classes are of very different sizes. The MCC is in essence a correlation coefficient between the observed and predicted binary classifications; it returns a value between −1 and +1. A coefficient of +1 represents a perfect prediction, 0 no better than random prediction and −1 indicates total disagreement between prediction and observation.

We can use the averaged table of confusion for multiclass problems (see http://en.wikipedia.org/wiki/Confusion_matrix).
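For the binary case the coefficient can be computed directly from the four entries of the confusion matrix. The sketch below uses a hypothetical function name `mcc` and follows the common convention of returning 0 when the denominator is zero.

```cpp
#include <cassert>
#include <cmath>

// MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
inline double mcc(double tp, double tn, double fp, double fn)
{
  const double den =
    std::sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn));

  // A zero denominator means a row or column of the confusion matrix is
  // empty; by convention the coefficient is then set to 0.
  return den == 0.0 ? 0.0 : (tp * tn - fp * fn) / den;
}
```

In the cats and dogs example (taking "cat" as the positive class, so TP = 95, FP = 5, TN = 0, FN = 0) the function returns 0 despite the 95% accuracy.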

Replacement for protected division

I was wondering if anyone has tried Ji Ni's replacement for x/y with x/sqrt(1+y*y)?

This seems like a neat idea and I am interested to learn if it works well for others too.

Bill Langdon

[From Yahoo Groups - Genetic Programming]
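For reference, the proposed operator is a one-liner. The name `smooth_div` is hypothetical; the formula is the one quoted in the message above.

```cpp
#include <cassert>
#include <cmath>

// x / sqrt(1 + y*y): defined for every finite y, so no special case for
// y == 0 is required.
inline double smooth_div(double x, double y)
{
  return x / std::sqrt(1.0 + y * y);
}
```

Two properties follow directly from the formula: for y = 0 the result is x, and the sign of y does not affect the result, which differs from ordinary division.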

Use the standard [[fallthrough]] attribute

gcc 7 has added a default fallthrough warning (-Wimplicit-fallthrough enabled by -Wextra).

To suppress this warning C++17 provides a standard way: the [[fallthrough]]; attribute.

In C++11 / C++14, with gcc, it's also possible to add a //-fallthrough comment to silence the warning. This is the current approach but the C++17 attribute specifier sequence is a better option.
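A short sketch of the attribute in a switch statement. The function `remaining_steps` is hypothetical and exists only to show where the attribute goes.

```cpp
#include <cassert>

// Hypothetical function: counts the steps still to be performed, relying
// on deliberate fallthrough between the cases.
inline int remaining_steps(int stage)
{
  int steps = 0;

  switch (stage)
  {
  case 0:
    ++steps;
    [[fallthrough]];  // intentional: stage 0 also performs stage 1
  case 1:
    ++steps;
    [[fallthrough]];  // intentional: stage 1 also performs stage 2
  case 2:
    ++steps;
    break;
  default:
    break;
  }

  return steps;
}
```

The attribute must be placed in a statement of its own immediately before the next `case` label.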

CMake generate pkg-config .pc

CMake can generate pkg-config .pc files for packages. The .pc file serves as a Rosetta stone for many build systems.

A good basic reference for pkg-config .pc syntax is helpful.

Relevant code:

# this fragment generates build/my_package.pc
# 
# cmake -B build -DCMAKE_INSTALL_PREFIX=~/mylib

# the HOMEPAGE_URL option of project() requires CMake 3.12 or later
cmake_minimum_required(VERSION 3.12)

project(mylib 
LANGUAGES C
HOMEPAGE_URL https://github.invalid/username/mylib
DESCRIPTION "example library"
VERSION 1.1.2)

add_library(mine mine.c)
add_library(support support.c)

set(target1 mine)
set(target2 support)

set(pc_libs_private)
set(pc_req_private)
set(pc_req_public) 

configure_file(my_package.pc.in my_package.pc @ONLY) 

and

# this template is filled-in by CMake `configure_file(... @ONLY)`
# the `@....@` are filled in by CMake configure_file(), 
# from variables set in your CMakeLists.txt or by CMake itself
#
# Good tutorial for understanding .pc files: 
# https://people.freedesktop.org/~dbn/pkg-config-guide.html

prefix="@CMAKE_INSTALL_PREFIX@"
exec_prefix="${prefix}"
libdir="${prefix}/lib"
includedir="${prefix}/include"

Name: @PROJECT_NAME@
Description: @CMAKE_PROJECT_DESCRIPTION@
URL: @CMAKE_PROJECT_HOMEPAGE_URL@
Version: @PROJECT_VERSION@
Requires: @pc_req_public@
Requires.private: @pc_req_private@
Cflags: -I"${includedir}"
Libs: -L"${libdir}" -l@target1@ -l@target2@
Libs.private: -L"${libdir}" -l@target1@ -l@target2@ @pc_libs_private@

Possible problems: How to generate a .pc (pkg-config) file supporting the --prefix option of cmake --install?
