
htm.core


This is a Community Fork of the nupic.core C++ repository, with Python bindings. It implements the theory described in Numenta's BAMI.

Project Goals

  • Actively developed C++ core library (Numenta's NuPIC repos are in maintenance mode only)
  • Clean, lean, optimized, and modern codebase
  • Stable and well tested code
  • Open and easy involvement of new ideas across the HTM community (it's fun to contribute; we keep master stable, but are open to experiments and larger revamps of the code if they prove useful)
  • Interfaces to other programming languages, currently C++ and Python

Features

  • Implemented in C++11 through C++17
    • Static and shared lib files for use with C++ applications.
  • Interfaces to Python 3 and Python 2.7 (Only Python 3 under Windows)
  • Cross Platform Support for Windows, Linux, OSX and ARM64
  • Easy installation. Far fewer dependencies than nupic.core, all handled by CMake
  • Significant speed optimizations
  • Simplified codebase
    • Removed CapnProto serialization. It was pervasive and complicated the code considerably; it was replaced with simple binary stream serialization in the C++ library.
    • Removed sparse matrix libraries, use the optimized Connections class instead
  • New and Improved Algorithms
    • Revamped all algorithms APIs, making it easier for developers & researchers to use our codebase
    • Sparse Distributed Representation class, integration, and tools for working with them
  • API-compatibility with Numenta's code. An objective is to stay close to the NuPIC API Docs; this is a priority for the NetworkAPI. The algorithm APIs, on the other hand, have deviated from the original API (though their logic is the same as Numenta's). If you are porting your code to this codebase, please follow the API Differences and consult the API Changelog.
  • The 'NetworkAPI', as originally defined by the NuPIC library, includes a set of built-in Regions. These are described in the NetworkAPI docs.
  • REST interface for NetworkAPI with a REST server.
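
As a quick taste of the SDR class mentioned above, here is a minimal Python sketch (it assumes the Python bindings built as described under Installation; the values are illustrative only):

    from htm.bindings.sdr import SDR

    # A 100-bit SDR with a 10x10 topology.
    x = SDR(dimensions=(10, 10))
    # Set the value by listing the flat indices of the active bits.
    x.sparse = [3, 14, 59]
    # The same value, viewed as a dense numpy array.
    print(x.dense.shape)     # -> (10, 10)
    print(x.getSparsity())   # -> 0.03 (3 of 100 bits active)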

Installation

Prerequisites

For running C++ apps/examples/tests from a binary release: none. If you want to use Python, you will also need:

  • Python

    • Standard Python 3.7+ (Recommend using the latest) [Tested with 3.8, 3.11.1]
    • Standard Python 2.7
      • We recommend the latest version of 2.7 where possible, but the system version should be fine.
      • Python 2 is Not Supported on Windows, use Python 3 instead.
      • Python 2 is no longer tested by our CI. It may still work, but we don't test it, and we expect to drop Python 2 support around 2020.
    • Anaconda Python 3.7+
      • On windows you must run from within 'Anaconda Prompt' not 'Command Prompt'.
      • Anaconda Python is not tested in our CI.

    Be sure that your Python executable is in the Path environment variable. The Python found first in your default path determines which version of Python the extension library will be built for.

    • Other implementations of Python may not work.
    • Only the standard Python from python.org has been tested.
    • On Linux you will need both the Python install and the Python dev package: '$ sudo apt install python3.11' and '$ sudo apt install python3.11-dev'
    • You will probably also want to set up a venv environment (https://docs.python.org/3/library/venv.html).
  • C++ compiler: C++11/17 compatible (e.g. g++, clang++).

    • boost library (needed only if your compiler does not support C++17 std::filesystem). If the build needs boost, it will automatically download and build it with the options it needs.
    • CMake 3.7+ (MSVC 2019 needs CMake 3.14+, MSVC 2022 needs CMake 3.21+).
      Install the latest using https://cmake.org/download/

Note: Windows MSVC 2019 compiles as C++17 by default, so boost is not needed. On Linux, use the -std=c++17 compile option to avoid needing boost.

Building from Source

An advantage of htm.core is its well-tested, self-contained dependency install, so you can build HTM from source on almost any platform/system.

Fork or download the HTM-Community htm.core repository from https://github.com/htm-community/htm.core.

To clone the repo with git:

git clone https://github.com/htm-community/htm.core

Simple Python build (any platform)

  1. Prerequisites: install the following python packages:

     python -m ensurepip --upgrade
     python -m pip install setuptools packaging
     python -m pip install -r requirements.txt

  2. At a command prompt, cd to the root directory of this repository.

  3. Run: python setup.py install --user --force

    This will build and install a release version of htm.core. The --user option prevents the system installed site-packages folder from being changed and avoids the need for admin privileges. The --force option forces the package to be replaced if it already exists from a previous build. Alternatively you can type pip uninstall htm.core to remove a previous package before performing a build.

    • If you are using virtualenv you may not need the --user or --force options.

    • If you are using Anaconda Python you must run within the Anaconda Prompt on Windows. Do not use --user or --force options.

    • If you run into problems due to caching of arguments in CMake, delete the folder <path-to-repo>/build and try again. This may be only an issue when you restart a build after a failure.

  4. After that completes you are ready to import the library:

    python.exe
    >>> import htm           # Python Library
    >>> import htm.bindings  # C++ Extensions
    >>> help( htm )          # Documentation

    You can run the unit tests with

    python setup.py test

Simple C++ build

After cloning/downloading the repository, do the following:

cd path-to-repository
mkdir -p build/scripts
cd build/scripts
cmake ../..
make -j8 install

Build Artifact            File Location
Static Library            build/Release/lib/libhtm-core.a
Shared Library            build/Release/lib/libhtm-core.so
Header Files              build/Release/include/
Unit Tests                build/Release/bin/unit_tests
Hotgym Dataset Example    build/Release/bin/benchmark_hotgym
MNIST Dataset Example     build/Release/bin/mnist_sp
REST Server Example       build/Release/bin/rest_server
REST Client Example       build/Release/bin/rest_client
  • A debug library can be created by adding -DCMAKE_BUILD_TYPE=Debug to the cmake command above.

    • The debug library will be put in build/Debug rather than build/Release. Use the cmake option -DCMAKE_INSTALL_PREFIX=../Release to change this.
  • The -j option can be used with the make install command to compile with multiple threads.

  • This will not build the Python interface. Use the Python build described above to build and install the python interface.

Here is an example of a Release build of your own C++ app that links to htm.core as a shared library.

#! /bin/sh
# Using GCC on linux ...
# First build htm.core from sources.
#      cd <path-to-repo>
#      mkdir -p build/scripts
#      cd build/scripts
#      cmake ../..
#      make -j4 install
#
# Now build myapp
# We use -std=c++17 to get <filesystem> so we can avoid using the boost library.
# The -I gives the path to the includes needed to use with the htm.core library.
# The -L gives the path to the shared htm.core library location at build time.
# The LD_LIBRARY_PATH environment variable points to the htm.core library location at runtime.
g++ -o myapp -std=c++17 -I <path-to-repo>/build/Release/include myapp.cpp -L <path-to-repo>/build/Release/lib -lhtm_core -lpthread -ldl

# Run myapp 
export LD_LIBRARY_PATH=<path-to-repo>/build/Release/lib:$LD_LIBRARY_PATH
./myapp

Here is an example of a Debug build of your own C++ app that links to htm.core as a shared library.

#! /bin/sh
# Using GCC on linux ...
# First build htm.core as debug from sources.
#      cd <path-to-repo>
#      mkdir -p build/scripts
#      cd build/scripts
#      cmake ../.. -DCMAKE_BUILD_TYPE=Debug
#      make -j4 install
#
# Now build myapp
# The -g -Og flags tell the compiler to build in debug mode with debug-friendly optimization.
# We use -std=c++17 to get <filesystem> so we can avoid using the boost library.
# The -D_GLIBCXX_DEBUG setting tells the compiler to build the std:: library with debug checks.
# The -I gives the path to the includes needed to use with the htm.core library.
# The -L gives the path to the shared htm.core library location at build time.
# The LD_LIBRARY_PATH environment variable points to the htm.core library location at runtime.
g++ -g -Og -o myapp -std=c++17 -D_GLIBCXX_DEBUG -I <path-to-repo>/build/Debug/include myapp.cpp -L <path-to-repo>/build/Debug/lib -lhtm_core -lpthread -ldl

# Run myapp in the debugger
export LD_LIBRARY_PATH=<path-to-repo>/build/Debug/lib:$LD_LIBRARY_PATH
gdb ./myapp

Docker Builds

Build for Docker amd64 (x86_64)

Our Dockerfile allows easy (cross-)compilation from/to many HW platforms. It does the full build, test, and package steps, and takes quite a while to complete.

If you are on amd64 (x86_64) and would like to build a Docker image:

docker build --build-arg arch=amd64 .

Docker build for ARM64

If you are on ARM64 and would like to build a Docker image, run the command below. The automated ARM64 CI build (detailed below) uses exactly this.

docker build --build-arg arch=arm64 .

Note:

  • If you're directly on ARM64/aarch64 (running on real HW) you don't need the docker image, and can use the standard binary/source installation procedure.

Docker build for ARM64/aarch64 on AMD64/x86_64 HW

The tricky part is cross-compilation: building for a different platform (aarch64) than the one your system runs on (x86_64). A typical case is CI, where all the standard (free) services offer only x86_64 systems, but we want to build for ARM.

See our ARM release workflow.

When running locally run:

docker run --privileged --rm multiarch/qemu-user-static:register
docker build -t htm-arm64-docker --build-arg arch=arm64 -f Dockerfile-pypi .
docker run htm-arm64-docker uname -a
docker run htm-arm64-docker python setup.py test

Note:

  • The 1st line allows you to emulate another platform on your HW.
  • The 2nd line builds the docker image. The Dockerfile is a lightweight Alpine ARM64 image which does the full build, test, and package steps; it can take quite a long time. Dockerfile-pypi "just" switches you to an ARM64/aarch64 env, in which you can build and test yourself.

Automated Builds, CI

We use Github Actions to build and run multiplatform (OSX, Windows, Linux, ARM64) tests and releases.

  • pr.yml runs on each pull request (PR); it builds for Linux (Ubuntu 20.04), Windows (2019), and OSX (10.15) and checks that all tests pass. This is mandatory for a new PR to be accepted.
  • release.yml is created manually by the maintainers in the release process and creates
    • binary GitHub releases
    • PyPI wheels for htm.core
    • uploads artifacts
  • arm.yml is an ARM64 build (which takes a long time) and is therefore run only daily.

CI Build Status

Linux/OSX/Windows auto build on PR @ Github Actions

ARM64 auto build @ Github Actions

This uses Docker and QEMU to achieve an ARM64 build on Actions' x86_64/amd64 hardware.

Documentation

For Doxygen see docs README. For NetworkAPI see NetworkAPI docs.

Workflow

Using IDE (Netbeans, XCode, Eclipse, KDevelop, etc)

Generate IDE solution & build.

  • Choose the IDE that interests you (remember that IDE choice is limited by your OS).
  • Open the CMake executable in the IDE.
  • Specify the source folder ($HTM_CORE), which is the location of the root CMakeLists.txt.
  • Specify the build system folder ($HTM_CORE/build/scripts), i.e. where the IDE solution will be created.
  • Click Generate.

For MS Visual Studio 2017, 2019 or 2022 as the IDE

After downloading the repository, do the following:

  • NOTE: Visual Studio 2019 requires CMake version 3.14 or higher; Visual Studio 2022 requires CMake version 3.21 or higher.

  • cd to the top of the repository.
  • Double click on startupMSVC.bat
    • This will setup the build, create the solution file (build/scripts/htm.cpp.sln), and start MS Visual Studio.
  • Select Release or Debug as the Solution Configuration. Solution Platform must remain at x64.
  • Build everything. This will build the C++ library.
  • In the solution explorer window, right-click on 'unit_tests' and select "Set as StartUp Project" so the debugger will run the unit tests.
  • If you also want the Python extension library: delete the build folder, then in a command prompt cd to the root of the repository and run python setup.py install --user --force.

For Visual Studio Code (VSCode) as the IDE

Visual Studio Code can be used on any of our three platforms (Windows, Linux, OSx). You will need the C/C++ Tools extension by Microsoft and CMake Tools by vector-of-bool.

Start Visual Studio Code and open the folder containing your htm.core repository, which sets the workspace. Clear all of the notifications (lower right) and let it scan for a kit.

Then set your project-level settings by initializing .vscode/settings.json with the following as a starting point.

For Windows 10:

  .vscode\settings.json
    {
      "cmake.buildDirectory": "${workspaceRoot}/build/scripts",
      "cmake.generator": "Visual Studio 16 2019",
      "cmake.platform": "x64"
    }

To use Visual Studio 2017 as the tool chain, change generator to "Visual Studio 15 2017" and set the platform to "win32". Note that the ninja generator, the default, did not work very well on Windows.

For Ubuntu and OSx:

   .vscode/settings.json
    {
      "cmake.buildDirectory": "${workspaceRoot}/build/scripts",
      "cmake.generator": "gcc"
    }

For Eclipse as the IDE

  • File - new C/C++Project - Empty or Existing CMake Project
  • Location: ($HTM_CORE) - Finish
  • Project properties - C/C++ Build - set the build command to "make -C build/scripts VERBOSE=1 install -j [number of your CPU cores]"
  • There can be an issue with the indexer and the boost library, which can cause OS memory to overflow. Fix: in Project properties - Resource Filters, add an exclude filter matching all folders named boost, recursively.
  • (Eclipse IDE for C/C++ Developers, 2019-03 on Ubuntu 18.04)

For all new work, tab settings are 2 characters, with tabs replaced by spaces. The clang-format style is LLVM.

Debugging

Creating a debug build of the htm.core library and unit tests is the same as building any C++ application in Debug mode in any IDE, as long as you do not include the python bindings, i.e. do not include -DBINDING_BUILD=Python3 in the CMake command.

(on Linux)
   rm -r build
   mkdir -p build/scripts
   cd build/scripts
   cmake -DCMAKE_BUILD_TYPE=Debug ../..

However, if you need to debug the python bindings with an IDE debugger, it becomes a little more difficult. The problem is that this requires a debug version of the python library, python37_d.lib. It is possible to obtain one and link against it, but a better way to isolate the python extension is to build a special main() as explained in debugging Python.

Be aware that CMake maintains a cache of build-time arguments and will ignore some arguments passed to it if they are already in the cache. So between runs you need to clear the cache or, better, remove the build/ folder entirely (e.g. git clean -xdf).

Python development mode

When you run python setup.py install --user --force, it copies the python scripts into build/Release/distr/src and deploys the package into the user site-packages (on Linux, under ~/.local/). To avoid deploying there, use "development mode" instead:

python setup.py develop --user --force

This creates a link file in site-packages pointing to the distr folder, so you can modify the distr scripts and your changes will be reflected immediately. Note: unfortunately, calling this command again will not overwrite the distr scripts, so you need to delete the distr folder first.

To remove the link file call:

python setup.py develop --user --uninstall

Note: you can always check from where you are importing sources, by typing into python console e.g.:

import htm.bindings.sdr
print(htm.bindings.sdr.__file__)

Note 2: it should go without saying, but do not use the --user option when using a Python environment manager (Anaconda, etc.).

Dependency management

The installation scripts will automatically download and build the dependencies they need.

Once these third-party components have been downloaded and built, they will not be revisited on subsequent builds. To refresh or rebuild them, delete the folder build/ThirdParty and re-build.

If you are installing on an air-gap computer (one without Internet) then you can manually download the dependencies. On another computer, download the distribution packages as listed and rename them as indicated. Copy these to ${REPOSITORY_DIR}/build/ThirdParty/share on the target machine.

Name to give it           Where to obtain it
libyaml.zip               https://github.com/yaml/libyaml/archive/refs/tags/0.2.5.tar.gz
boost.tar.gz (*note3)     https://dl.bintray.com/boostorg/release/1.72.0/source/boost_1_72_0.tar.gz
googletest.tar.gz         https://github.com/google/googletest/archive/refs/tags/release-1.12.1.tar.gz
eigen.tar.bz2             https://gitlab.com/libeigen/eigen/-/archive/3.4.0/eigen-3.4.0.tar.gz
mnist.zip (*note4)        https://github.com/wichtounet/mnist/archive/3b65c35ede53b687376c4302eeb44fdf76e0129b.zip
pybind11.tar.gz           https://github.com/pybind/pybind11/archive/refs/tags/v2.10.1.tar.gz
cereal.tar.gz             https://github.com/USCiLab/cereal/archive/refs/tags/v1.3.2.tar.gz
sqlite3.tar.gz            https://www.sqlite.org/2022/sqlite-autoconf-3380200.tar.gz
digestpp.zip              https://github.com/kerukuro/digestpp/archive/34ff2eeae397ed744d972d86b5a20f603b029fbd.zip
cpp-httplib.zip (*note4)  https://github.com/yhirose/cpp-httplib/archive/refs/tags/v0.11.3.zip
  • note3: Boost is not required for any compiler that supports C++17 with std::filesystem (MSVC2017, gcc-8, clang-9).
  • note4: Used for examples. Not required to run but the build expects it.

Testing

We support test-driven development with reproducible builds. You should run tests locally, and tests are also run as a part of the CI.

C++ & Python Unit Tests:

There are two (somewhat duplicate) sets of tests, for C++ and Python.

  • C++ Unit tests -- to run: ./build/Release/bin/unit_tests
  • Python Unit tests -- to run: python setup.py test (runs also the C++ tests above)
    • py/tests/
    • bindings/py/tests/

Examples

Python Examples

There are a number of python examples, which are runnable from the command line. They are located in the module htm.examples.

Example Command Line Invocation: $ python -m htm.examples.sp.hello_sp

Look in:

  • py/htm/examples/
  • py/htm/advanced/examples/

Hot Gym

This is a simple example application that calls the SpatialPooler and TemporalMemory algorithms directly. It attempts to predict the electrical power consumption of a gymnasium over the course of several months.

To run python version:

python -m htm.examples.hotgym

To run C++ version: (assuming current directory is root of the repository)

./build/Release/bin/benchmark_hotgym

There is also a dynamically linked version of Hot Gym (not available on MSVC). You will need to specify the location of the shared library with LD_LIBRARY_PATH.

To run: (assuming current directory is root of the repository)

LD_LIBRARY_PATH=build/Release/lib ./build/Release/bin/dynamic_hotgym
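
For orientation, the heart of both hotgym versions is an encoder -> SpatialPooler -> TemporalMemory -> anomaly loop. Below is a stripped-down Python sketch of that loop, not the shipped example itself; the parameter values and the stand-in sine-wave data are made up for illustration:

    import numpy as np
    from htm.bindings.sdr import SDR
    from htm.bindings.encoders import RDSE, RDSE_Parameters
    from htm.bindings.algorithms import SpatialPooler, TemporalMemory

    # Encode each scalar reading into an SDR.
    params = RDSE_Parameters()
    params.size       = 1000
    params.sparsity   = 0.02
    params.resolution = 0.1
    encoder = RDSE(params)

    sp = SpatialPooler(inputDimensions=[1000],
                       columnDimensions=[2048],
                       globalInhibition=True)
    tm = TemporalMemory(columnDimensions=[2048])

    active = SDR(sp.getColumnDimensions())
    for value in np.sin(np.linspace(0, 100, 1000)):  # stand-in for the CSV data
        sp.compute(encoder.encode(value), True, active)  # True = learn
        tm.compute(active, learn=True)
        print(tm.anomaly)  # 0.0 = fully predicted, 1.0 = surprising

The real example additionally encodes the timestamp, feeds the predictions to a Predictor, and tracks anomaly likelihood; see py/htm/examples/hotgym.py.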

MNIST benchmark

The task is to recognize images of hand-written digits 0-9. This is often used as a benchmark; the spatial pooler here should score at least 95%.

To run: (assuming current directory is top of repository)

  ./build/Release/bin/mnist_sp

In Python:

python py/htm/examples/mnist.py
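
Both versions share the same shape: binarize each image into an SDR, feed it through a SpatialPooler, and train a classifier on the active columns. Here is a rough Python sketch with stand-in random data (illustrative only; see py/htm/examples/mnist.py for the real data loading and parameters):

    import numpy as np
    from htm.bindings.sdr import SDR
    from htm.bindings.algorithms import SpatialPooler, Classifier

    sp  = SpatialPooler(inputDimensions=[28 * 28],
                        columnDimensions=[6400],
                        globalInhibition=True)
    clf = Classifier()

    rng    = np.random.default_rng(0)
    images = rng.integers(0, 2, size=(100, 28 * 28), dtype=np.uint8)  # stand-in for MNIST
    labels = rng.integers(0, 10, size=100)

    inp  = SDR([28 * 28])
    cols = SDR(sp.getColumnDimensions())
    for image, label in zip(images, labels):
        inp.dense = image            # binarized pixels as a dense SDR
        sp.compute(inp, True, cols)  # True = learn
        clf.learn(cols, int(label))

    sp.compute(inp, False, cols)         # inference only
    print(np.argmax(clf.infer(cols)))    # most probable digit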

REST example

The REST interface for NetworkAPI provides a way to access the underlying htm.core library using a REST client. The examples provide a full REST web server that can process web requests, allowing the user to create a Network object resource and perform HTM operations on it. Message layout details can be found in the NetworkAPI REST docs. To run:

   ./build/Release/bin/server [port [network_interface]]

A REST client, implemented in C++, is also provided as an example of how to use the REST web server. To run it, first start the server, then:

   ./build/Release/bin/client [host [port]]

The default host is 127.0.0.1 (the local host) and the port is 8050.

License

The htm.core library is distributed under the GNU Affero General Public License version 3 (AGPLv3). The full text of the license can be found at http://www.gnu.org/licenses.

Libraries that are incorporated into htm.core have the following licenses:

Library        Source Location                          License
libyaml        https://github.com/yaml/libyaml          https://github.com/yaml/libyaml/blob/master/LICENSE
boost (*note3) https://www.boost.org/                   https://www.boost.org/LICENSE_1_0.txt
eigen          http://eigen.tuxfamily.org/              https://www.mozilla.org/en-US/MPL/2.0/
pybind11       https://github.com/pybind/pybind11       https://github.com/pybind/pybind11/blob/master/LICENSE
cereal         https://uscilab.github.io/cereal/        https://opensource.org/licenses/BSD-3-Clause
digestpp       https://github.com/kerukuro/digestpp     released into the public domain
cpp-httplib    https://github.com/yhirose/cpp-httplib   https://github.com/yhirose/cpp-httplib/blob/master/LICENSE
  • note3: Boost is not used if built with any compiler that supports C++17 with std::filesystem (MSVC2017, gcc-8, clang-9).

Cite us

We're happy for you to use the community work in this repository, or even join the development! Please give us attribution by linking to us as htm.core at https://github.com/htm-community/htm.core/, and for papers we suggest the following BibTeX citation:

@misc{htmcore2019,
	abstract = "Implementation of cortical algorithms based on HTM theory in C++ \& Python. Research \& development library.",
	author = "M. Otahal and D. Keeney and D. McDougall and others",
	commit = "bf6a2b2b0e04a1d439bb0492ea115b6bc254ce18",
	howpublished = "\url{https://github.com/htm-community/htm.core/}",
	journal = "Github repository",
	keywords = "HTM; Hierarchical Temporal Memory; NuPIC; Numenta; cortical algorithm; sparse distributed representation; anomaly; prediction; bioinspired; neuromorphic",
	publisher = "Github",
	series = "{Community edition}",
	title = "{HTM.core implementation of Hierarchical Temporal Memory}",
	year = "2019"
}

Note: you can update the commit to reflect the latest version you have been working with, to help make the research reproducible.

Helps

Numenta's BAMI is the formal theory behind it all. Also consider Numenta's Papers.

HTM School is a set of videos that explains the concepts.

Indy's Blog

Questions regarding the theory can be posted to the HTM Forum.

Questions and bug reports regarding the library code can be posted in htm.core Issues.

Related community work

Community projects for working with HTM.

Visualization

HTMPandaVis

This project aspires to create a tool that helps visualize HTM systems in 3D, using the open-source 3D rendering framework Panda3D (https://www.panda3d.org/).

NetworkAPI has a region called "DatabaseRegion". This region can be used to generate an SQLite file, which can later be read by the PandaVis "DashVis" feature to show interactive plots in a web browser on localhost. See napi_hello_database for basic usage.

For more info, visit the pandaVis project repository.

htm.core's Issues

Remove Dimensions from the Link object.

As a placeholder for things to do.
Dimensions are known to be obsolete and unused. By removing dimensions we can simplify the Link (also removing LinkPolicy) and provide zero copy for some links and automatic type conversion for those that need it.

As part of this I suggest replacing the "isSparse" in the Input and Output objects with a new NTA_BasicType so that data can be converted between dense and sparse automatically as dictated by the specs of the source and destination plugins.

Similar: #139

Reduce dependencies

@dkeeney @chhenning I was reviewing your interesting work on refactoring/reducing nupic.core, and I'd like to get it merged back here to some extent.

Features that I like:

  • using c++ instead of boost, where possible. Seen in @dkeeney's base branch? Although I'm interested only in changes with c++11 (not c++17 as you target). OK, let's go c++17. #55 #106
  • removing apr* , boost.filesystem is the replacement? Also in the base branch? #74
  • getting rid of zlib / capnp for compression. I think one of you sticks with either. I'd for sure remove zlib, as users can zip themselves; capnp adds lots of mess, but works quite ok, do you think vanilla plain-text backups are ok/better? #48
  • removing Swig, replacing with pybind11 (@chhenning ?) does it work well? which branch can I start cherry-picking and merging from? Or David has gone the pure C++ way.
    ...this bullet will be quite complex, I'll leave it for now for further discussion. #81
    • Do I still need/benefit from your SP/TMRegion classes? #54

Refactor: create SDR class

I'd like to create a class for dealing with SDRs. The primary function of this class is to convert SDR values between the various SDR formats, including dense, flat-index, and index-w/-dimensions. This is something I've done with my own HTM implementation (which is python3). This should benefit the quality & readability of the code. This should also benefit the performance because code can be easily rewritten to use the optimal data format for whatever task.

My python3 implementation of this class is at: https://github.com/ctrl-z-9000-times/sdr_algorithms/blob/master/sdr.py

Example API:

SparseDistributedRepresentation X(vector<UInt> dimensions)
X.size() -> UInt // Total number of bits in the SDR.
// Calling any of the setters will replace the SDRs value.
X.setDense(UInt dense[])
X.setFlatIndex(UInt flatIndex[])
X.setIndex(vector<UInt[]> index)
// Calling any of the getters will convert the SDR to the correct 
// data format and save it inside of this class for future use.  In 
// this way, all calculations are performed lazy and then cached.
X.getDense() -> UInt[]
X.getFlatIndex() -> UInt[]
X.getIndex() -> vector<UInt[]>

// Example usage
X.setDense( myData );
// SDR class converts from dense array to indices of non-zero values.
myIndex = X.getFlatIndex();  // SDR does work here.
myIndexAgain = X.getFlatIndex();  // SDR reuses result of previous line.
// Setting a new value will clear out the saved copies of the old value.
X.setFlatIndex( myNewData );

This could eventually be extended/subclassed to use bitvectors, like proposed by #12 (comment)

So what's everybody think of this? If this or something like it were available as part of Nupic would you use it?

Spatial Pooler: investigate using Connections as backend

Connections is the optimized structure used in TM; if we use 1 cell per column, it should be able to mimic the SP.

Then we can delete SparseBinaryMatrix (used as the backend in SP). This PR is mutually exclusive with #105.

  • implement testing benchmark for SP #30
  • see if SP can be implemented using Connections -> yes!
  • benchmark SparseBinaryMatrix vs Connections performance -> about 10x faster (11s for 1000 iters on SparseBinaryMatrix vs 7s for 5000 iters on Connections in the HelloSPTP benchmark)
  • remove SparseBinaryMatrix -> #104

After #92

New Spatial Pooler with Macro columns for faster Local inhibition

Performance improvements for Spatial Pooler topology / local inhibition. I mentioned this on the numenta.org forum, and here I'm hoping to flesh the idea out more and communicate it with you all.

The spatial pooler with global inhibition works great as is; however, local inhibition does not scale well because of the algorithms used. The differences between local and global inhibition appear at a large scale, but within a small (topological) area local and global inhibition do the same thing. Poor man's topology uses global inhibition to approximate local inhibition by making a spatial pooler with global inhibition for each area of local inhibition. In other words: macro columns can use global inhibition and still have a large-scale topology, by simulating a macro column for each topological area.

Pros:

  • Speed, this should run as fast as the underlying spatial poolers with global inhibition
  • API, should be similar to the underlying spatial pooler's API

Cons:

  • Spatial resolution. The current local inhibition spreads the mini-columns across the input space, but this proposal would cluster many mini-columns into a point, with many clusters spread across the input space. This can be mitigated by using many clusters of mini-columns, which allows for an evenly spread blanket of mini-columns.

Implementation:
A new C++ class which will create and maintain a list of SpatialPooler instances, one for each macro column. Macro columns are arranged in a uniform grid over the input space. Macro columns inputs are rectangular slices of the input space.

Example:
The MNIST dataset would be a good example. Its fast, easy to solve, widely recognized, and its visual data which is pretty.

API: Similar to SpatialPooler class ...

  • I thought that I'd replace references to "columns" with either "macroColumns" or "miniColumns" throughout this class.
  • initialize() - has the same parameters as the SP class except:
    • remove param columnDimensions
    • add param macroColumnDimentions of type vector<UInt> This must have the same length as the inputDimensions.
    • add param miniColumnsPerMacro of type UInt
    • change type of potentialRadius from UInt to vector<Real>
    • change type of wrapAround from bool to vector<bool>
  • compute() - no change to public facing API. This method will deal with dividing up the inputs, running SP.compute(), and concatenating the results.
  • Add method getMacroColumns() -> vector<*SpatialPooler> Use this method to access the underlying SP instances.
  • Replace method getColumnDimensions() with:
    • getMacroColumnDimensions() -> vector<UInt>
    • getMiniColumns() -> UInt

...


Started a separate issue from #3 (comment)
Related #84
Author @ctrl-z-9000-times
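
A rough Python sketch of the proposal (the MacroColumnPooler class and its parameter names are hypothetical; only the 1-D case is shown): one global-inhibition SpatialPooler per macro column, each fed a rectangular slice of the input, with the outputs concatenated.

    import numpy as np
    from htm.bindings.sdr import SDR
    from htm.bindings.algorithms import SpatialPooler

    class MacroColumnPooler:
        """Hypothetical sketch: one global-inhibition SP per macro column (1-D)."""
        def __init__(self, inputSize, nMacro, miniColumnsPerMacro):
            self.sliceSize = inputSize // nMacro   # each macro column's input slice
            self.sps = [SpatialPooler(inputDimensions=[self.sliceSize],
                                      columnDimensions=[miniColumnsPerMacro],
                                      globalInhibition=True)
                        for _ in range(nMacro)]

        def compute(self, inputSDR, learn, outputSDR):
            # outputSDR must have nMacro * miniColumnsPerMacro bits.
            dense, out = inputSDR.dense, []
            for i, sp in enumerate(self.sps):
                part = SDR([self.sliceSize])
                part.dense = dense[i * self.sliceSize:(i + 1) * self.sliceSize].copy()
                act = SDR(sp.getColumnDimensions())
                sp.compute(part, learn, act)
                # Offset each macro column's active mini-columns into the
                # concatenated output space.
                out.append(act.sparse + i * sp.getNumColumns())
            outputSDR.sparse = np.concatenate(out).astype(np.uint32)

Because each inner SP uses global inhibition, the whole structure runs at global-inhibition speed while still providing a coarse topology, which is exactly the trade-off described above.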

Implement Serializable interface

for proper, clean serialization

API:

  • void save()
  • static T load()
  • struct params {}
    • here are all constructor parameters that need to be serialized (will also be used for Comparable)

TODO:
discuss details of the API
Part of #58
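
For the shape being proposed, here is a Python analogue (names are hypothetical and the real interface would be C++; pickle merely stands in for whatever serialization format is chosen):

    import abc
    import pickle

    class Serializable(abc.ABC):
        """Hypothetical sketch of the proposed interface."""

        @abc.abstractmethod
        def params(self) -> dict:
            """Constructor parameters to serialize (also usable by Comparable)."""

        def save(self, path):
            # void save() from the API sketch above.
            with open(path, "wb") as f:
                pickle.dump(self.params(), f)

        @classmethod
        def load(cls, path):
            # static T load() from the API sketch above.
            with open(path, "rb") as f:
                return cls(**pickle.load(f))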

Switch to c++17

from modern C++ features support, reducing dependencies.

  • enable c++17 in CMake, compile code with it locally
    • fix additional c++17 nuances
  • enable in all CI and make sure all compile
    • update build scripts
    • make sure recent version of compiler installs in the CI
    • Linux
    • Windows
    • OSX
  • replace suitable boost:: with std:: -- boost now optional with c++17
  • integrate c++17 enhancements in codebase (may not need all at once)
    • replace all *T, Real*, etc with std::array (for fixed size), std::vector for variable respectively; no speed impact
    • use RAII; remove pointers, make local variable where possible; elsewhere use smart_pointers
    • use const where possible, cleaner, safer, faster
    • use auto for cleaner code
    • use for-each for(auto item : list)
    • replace loops with <algorithms> where possible; ie fill(), etc

This is a great readup for any c++ programmer:
https://github.com/isocpp/CppCoreGuidelines/blob/master/CppCoreGuidelines.md

Port integration tests to unit-tests

from src/test/integration/

  • PyRegion
  • CppRegion
  • ConnectionPerformance-Test
  • move other src/examples/ to unit tests

this will

  • simplify cmake
  • better coverage
  • easier to run tests in CI

Setup CI

  • for Windows -- AppVeyor
  • Linux -- Travis
  • OSX -- CircleCI
  • add binary Releases

Road map for migrating to PyBind11 and Python3.x support.

Here is what I propose:
I am going to abandon issue #74 (Replace APR) and PR #79. I was trying to do this in too large a chunk. I also messed up and ended up with a massive whitespace change due to line-ending changes (working on Windows, trying to get MinGW to work). Let's break the effort up into smaller parts.

  1. New PR to replace existing header only Boost with Boost with filesystem and system modules.

  2. A new PR for each file, or very small groups of files, to replace the APR calls to std:: or boost:: calls.
    This will be a lot of PRs, so I will need help getting them reviewed. The work is already done; I just need to open each PR and copy my changes over on a file-by-file basis.

  3. A new PR to remove the APR libraries...delete only

Then we can look at removing SWIG and replacing it with PyBind11:

  4. Change some coupling in RegionImplFactory so that the Python-related code can be isolated.

  5. Build the PyBind11 interface module (based on work by @chhenning). We might not be able to run both PyBind11 and SWIG in the same module, but maybe we could just disable SWIG for this PR.

  6. Remove SWIG.

At this point we can look at porting more modules from Python into the core so that it can be used as a stand-alone C++ library, specifically SPRegion and TMRegion, as well as lots of encoders.

  • There is a major problem with the version of MinGW that we are currently using. It is a special version hacked up to work with Python 2.7 and SWIG, which is no longer supported. I propose that we not support that platform during the transition, working only with the Linux and OSx platforms. Then, after step 6, we add the Visual Studio 2017 platform and perhaps re-introduce the real MinGW platform.

Wishlist

Unordered, confusing, just ideas-drop list.

  • top down compute (so we can get input for given SDR, without Classifier)
  • Performance optimizations #3
  • Re-Organize repo structure into more isolated, modular APIs
  • move to Python 3
  • idealize goals, directions of this fork, publish

Moving Python interface to a separate repository

This is going to be a big PR. During the move we can switch from SWIG to Bind11 as the Python interface. This would mean we could support both Python 2.7 and Python 3.6+.

Start by ripping out the SWIG build. @chhenning already has a working interface for Bind11 that we can drop in. We just need to package it as a separate repository and set up a process for building in parallel with nupic.cpp.

Only the RegionImplFactory currently has calls into the Python world that would need to be resolved. The way to do this is to have a subclass of RegisteredRegionImpl for each language interface that is passed to the factory to create a custom plugin. RegisteredRegionImplCpp for plugins written in C++ and RegisteredRegionImplPy for plugins written in Python. The logic for allocating and calling a python object via (PyRegion.cpp) would reside in this subclass rather than in the factory. RegisteredRegionImplPy and PyRegion.cpp/hpp would reside in the Python repository. All of the Python helper classes and the Python tests would also be in the Python repository.

When we implement the CSharp interface it can do the same thing...have its own subclass of RegisteredRegionImpl for allocating custom plugins. The nupic.cpp code does not need to have any knowledge of Python or CSharp.

Problems with ConnectionsPerformanceTest

When running unit_test I got this:
[ RUN ] ConnectionsPerformanceTest.testTMLarge
0.013655 in temporal memory (large): initialize
24.1005 in temporal memory (large): initialize + learn
28.7102 in temporal memory (large): initialize + learn + test
/home/dave/cpp/src/test/unit/algorithms/ConnectionsPerformanceTest.cpp:277: Failure
Expected: (tim) <= (28.0f), actual: 28.7102 vs 28
[ FAILED ] ConnectionsPerformanceTest.testTMLarge (28723 ms)

This was compiled in debug mode on Ubuntu.
Two problems:

  1. that actual time it takes to run (28 seconds)
  2. The actual error which may be in the test facility.

Investigate removal of custom math/*Matrix;

There is a huge amount of code in math/*Matrix that is old, unmaintained, and quite untested. Proposal: removal and replacement:

  • replace with Connections
  • or Eigen #42
  • or keep only the SparseBinaryMatrix

Look at the problem and proposed order of progress:

grep -R 'Matrix' src/nupic/math/ | cut -d: -f1 |sort -u

  • src/nupic/math/Math.hpp
  • src/nupic/math/ArrayAlgo.hpp -- maybe most of its methods will be removed?
  • src/nupic/math/DenseMatrix.hpp -- only one user SDRClassifier, replace with sparse & rm first #170 #169
  • src/nupic/math/NearestNeighbor.hpp -- NN does not have to be in a HTM codebase
  • src/nupic/math/SparseBinaryMatrix.hpp -- heavily used in SP #93 , #169
  • algorithms/CondProbTable -- rm too? Uses:
    • src/nupic/math/SparseMatrix01.hpp -- not used, rm
  • src/nupic/math/SparseMatrix.hpp -- huge, untested, surprisingly not a base of SparseBinaryM #169
    • used in ArrayAlgo, DenseMatrix, SpatialPooler
  • src/nupic/math/SparseMatrixAlgorithms.hpp -- not used, rm
  • src/nupic/math/SparseMatrixConnections.hpp -- not used, rm
    • src/nupic/math/SegmentMatrixAdapter.hpp -- used only in SMConn
      • py_SegmentSparseMatrix
  • src/nupic/math/SparseRLEMatrix.hpp -- not used, rm
  • src/nupic/math/SparseTensor.hpp -- not sure how needed for Py - implemented (but reverted) here: bfb58df , #169
    • Domain
    • Index
    • PyBindSparseTensor
  • math/Math.hpp:Gaussian_2D -- not used, further cleanup Math.hpp

Related:

  • #93 SP on Connections

Prerequisities:

  • c++ speed tests #30
  • validate Python nupic still compatible with this nupic.core #137

Optimization for performance

Steps:

  • set baseline benchmarking tests, the more, the better
    • micro benchmarks
    • IDE profiling
    • real-life benchmark (hotgym) #30
  • refactor code to use shared, encapsulated class for passing around data, "SDR type"
    • for now it could be typedef UInt*,
    • later wrap vector, add some methods,
    • even later wrap opt-Matrix type,...
  • identify bottlenecks
  • vectorize
    • almost all the optimization libraries work on vectors
    • replace use cases where we have setPermanence(newValue) called in a loop with a vectorized version (a scalar can be a vector with 1 item)
  • compare math library toolkits
    • the libraries have their own data types (EigenMatrix, etc.)
    • converting to/from them will kill the (gained) performance -> "SDR type"
  • iterative optimizations

Requirements:

  • what we want from the library?
  • speed
  • multi-platform
  • sparse (memory efficient)
  • big user-base, popular
  • low code "intrusiveness"
  • CPU backend (SSE, openMP)
  • nVidia GPU backend (CUDA)
  • AMD GPU backend (openCL)
  • open source
  • clean & lean API (ease of use)
  • bindings/support for other languages (python,...)
  • I don't need no optimizations

Considered toolkits:

Links:

Removal/rewrite of TP(=Cells4), BacktrackingTM in favor of TM

  • TP/Cells4 & related classes account for a huge amount of old/ugly code
  • TM is currently being used instead of TP for most(all?) use-cases
  • find out on forums etc. whether TP still has a valid use-case. Where does it perform better than TM?
    --- #327 (comment)
    • BackTM is slightly better at Anomaly (NAB)
  • remove TP/Cells4 from our codebase
    • remove BacktrackingTM

EDIT: more discussion on why those should be removed here #327

Publish bindings on pypi

#1
Depends on binary releases #361
Use this as an example RedFT/Hexy@ddc9d01

  • fix format so whl is published on PYPI
  • fix whl content so we have all needed files there
    • people should be able to install by pip install my.whl
  • get token to publish to "real" PYPI
  • update Readme with instructions

ArrayBase pull request.

@breznak I need your help. I am trying to submit a pull request but having a bit of a problem. Here is what I did:

  1. Cloned the htm-community/nupic.cpp repository.
  2. Opened a new local branch "Array"
  3. copied my source files for ArrayBase, Array, ArrayRef to the new branch.
  4. Made a few changes so that it would work. built and ran the unit tests.
  5. committed the changes and pushed them to github.com/dkeeney/nupic.cpp Array branch
  6. On htm-community/nupic.cpp repository, clicked the pull request button. Tried to specify the pull request coming from github.com/dkeeney/nupic.cpp Array branch to htm-community/nupic.cpp Master branch. But it will not let me select that combination.

Am I doing this all wrong?

Replace APR libraries with std:: libraries (abandoned)

Another placeholder for things to do.
Delete the APR library dependency and see what breaks. Fix it by replacing it with calls to std:: libraries.
In the OS folder, the Directories and Path objects can be implemented with #include <filesystem>. This of course requires switching to C++17.

Edit: part of #47

Real-life benchmark: Hotgym example using C++ algorithms

Implement a pipeline running a full real-world HTM task.

Currently implemented using the raw HTM classes (TM, SP, ...),
not NetworkAPI (needs TM/SPRegion) and not as Python code using the C++ bindings (which would be possible).

Pipeline:

  • compile as standalone executable (for profiling)
  • load CSV from file
    • use our classical "hotgym" dataset
    • parse command-line for optional filename and num runs
  • encode CSV data
    • MultiEncoder for more fields than 1
  • run SpatialPooler to get SDR
    • global
    • local inhibition
  • run TP to get temporal predictions
    • use more modern TM as alternative
    • TP (old) obsoleted
    • BacktrackingTM (TP based) obsoleted
    • show SDR output computed by these TM flavours
      • also checks deterministic algorithms' outputs #194
  • compute Anomaly score
    • test AnomalyLikelihood
  • add SDR Classifier
    • needs encoder topDownCompute (SDR -> Real) decided WONTFIX
  • measure execution time
    • more fine-grained separate timers for each part of pipeline (SP, Encoder, TM,..)
      • fine grained timer checks for each part
  • implement as a class to make more reusable
  • use SDR for all layers
    • SDR Metrics
    • enforce common Compute/Serializable/... interface
  • implement using core algorithms (SP, TM)
    • encoder
    • SP
    • TM
    • AN
    • classifier
      • predictor
    • CP (later, when implemented) #285
    • implement using NetworkAPI
  • test parallelization #255
  • test interfaces
    • serialization
  • optimize parameters #433

We are looking for a real-life benchmark we can use as a base for our performance optimizations #3.
In Python there is a "Hotgym anomaly example" (stresses encoder, SP, TM, Anomaly); implement a similar example in C++ and add it to the integration tests with timing.

  • suggested with NAPI

I have ported SPRegion and TMRegion and they run under Windows, but I was waiting until after PyBind was implemented to merge them in a later PR.

Waiting for #54 SPRegion & TMRegion in C++

Introduce Interfaces

so that code can be used more cleanly, and for code reduction.
Interfaces enforce a stable API; names take the form *able.

  • Computable
    • feed forward compute input
    • void compute(T* input, T* output)
    • will be used for all classes: encoder, SP, TM*, Anomaly, ...
  • Serializable
    • can serialize to file
    • void save(..) throws;
    • static T load(..)
  • Printable
    • nice way to display the object in human readable form
    • std::string toString() const
    • override << ..?
  • Comparable
    • implements equals comparison
    • override ==

Build Windows CI with native MSVC c++ compiler

As of now we build with some MinGW; MSVC is capable and available as OSS now, so let's see if we can get it into the AppVeyor CI.
It would be better to build with the native compiler.

EDIT:
MSVC on Win put off only after SWIG removal:
see
#55 (comment)

I suggest that we hold off on Visual Studio until after you get rid of Swig. The numenta documentation says that it will not work unless you install a very old compiler. I was not able to get the bindings to build under Visual Studio with C++11 although I have not tried since we removed capnproto. And I agree MingW is not really a viable build environment.
