htm-community / htm.core

This project forked from numenta/nupic.core-legacy


Actively developed Hierarchical Temporal Memory (HTM) community fork (continuation) of NuPIC, implemented in C++ and Python.

Home Page: http://numenta.org

License: GNU Affero General Public License v3.0

CMake 1.44% C 0.01% Python 23.28% PowerShell 0.12% Shell 0.41% C++ 42.54% Dockerfile 0.05% Batchfile 0.12% HTML 27.36% JavaScript 4.69%
nupic cpp htm ai hierarchical-temporal-memory neuromorphic-computing neural-networks anomaly-detection prediction cortical-learning sparse-distributed-memory neuroscience neuroscience-inspired-ai reproducible-research

htm.core's Issues

Refactor: create SDR class

I'd like to create a class for dealing with SDRs. The primary function of this class is to convert SDR values between the various SDR formats: dense, flat-index, and index-with-dimensions. This is something I've done in my own HTM implementation (which is Python 3). It should improve the quality and readability of the code, and it should also help performance, because code can easily be rewritten to use the optimal data format for a given task.

My python3 implementation of this class is at: https://github.com/ctrl-z-9000-times/sdr_algorithms/blob/master/sdr.py

Example API:

SparseDistributedRepresentation X(vector<UInt> dimensions)
X.size() -> UInt // Total number of bits in the SDR.
// Calling any of the setters will replace the SDR's value.
X.setDense(UInt dense[])
X.setFlatIndex(UInt flatIndex[])
X.setIndex(vector<UInt[]> index)
// Calling any of the getters will convert the SDR to the requested
// data format and save it inside of this class for future use. In
// this way, all conversions are performed lazily and then cached.
X.getDense() -> UInt[]
X.getFlatIndex() -> UInt[]
X.getIndex() -> vector<UInt[]>

// Example usage
X.setDense( myData );
// SDR class converts from dense array to indices of non-zero values.
myIndex = X.getFlatIndex();  // SDR does work here.
myIndexAgain = X.getFlatIndex();  // SDR reuses result of previous line.
// Setting a new value will clear out the saved copies of the old value.
X.setFlatIndex( myNewData );
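
A minimal C++ sketch of the lazy conversion and caching scheme described above (hypothetical member names; only the dense and flat-index formats are shown, and a plain size is used instead of a dimensions vector):

#include <utility>
#include <vector>

using UInt = unsigned int;

// Sketch only: lazily converts between the dense and flat-index formats,
// caching each computed format until a setter invalidates the caches.
class SparseDistributedRepresentation {
public:
  explicit SparseDistributedRepresentation(UInt size) : size_(size) {}

  UInt size() const { return size_; }   // total number of bits in the SDR

  void setDense(std::vector<UInt> dense) {
    invalidate();                       // drop cached copies of the old value
    dense_ = std::move(dense);
    hasDense_ = true;
  }

  void setFlatIndex(std::vector<UInt> flatIndex) {
    invalidate();
    flatIndex_ = std::move(flatIndex);
    hasFlatIndex_ = true;
  }

  const std::vector<UInt>& getFlatIndex() {
    if (!hasFlatIndex_) {               // compute once from the dense form...
      flatIndex_.clear();
      for (UInt i = 0; i < size_; ++i)
        if (dense_[i] != 0) flatIndex_.push_back(i);
      hasFlatIndex_ = true;
    }
    return flatIndex_;                  // ...and reuse the cached result later
  }

  const std::vector<UInt>& getDense() {
    if (!hasDense_) {
      dense_.assign(size_, 0);
      for (UInt i : flatIndex_) dense_[i] = 1;
      hasDense_ = true;
    }
    return dense_;
  }

private:
  void invalidate() { hasDense_ = hasFlatIndex_ = false; }

  UInt size_;
  std::vector<UInt> dense_, flatIndex_;
  bool hasDense_ = false, hasFlatIndex_ = false;
};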

This could eventually be extended/subclassed to use bitvectors, as proposed in #12 (comment).

So, what does everybody think of this? If this or something like it were available as part of NuPIC, would you use it?

Road map for migrating to PyBind11 and Python3.x support.

Here is what I propose:
I am going to abandon issue #74 (Replace APR) and PR #79. I was trying to do this in too large a chunk, and I also ended up with a massive whitespace change due to line-ending changes (working on Windows, trying to get MinGW to work). Let's break the effort up into smaller parts:

  1. A new PR to replace the existing header-only Boost with Boost including the filesystem and system modules.

  2. A new PR for each file, or very small group of files, replacing the APR calls with std:: or boost:: calls.
     This will be a lot of PRs, so I will need help getting them reviewed. The work is already done; I just need to open the PRs and copy my changes over on a file-by-file basis.

  3. A new PR to remove the APR libraries...delete only.

Then we can look at removing SWIG and replacing it with PyBind11:

  4. Change some coupling in RegionImplFactory so that the Python-related code can be isolated.

  5. Build the PyBind11 interface module (based on work by @chhenning). We might not be able to run both PyBind11 and SWIG in the same module, but we could just disable SWIG for this PR.

  6. Remove SWIG.

At this point we can look at porting more modules from Python code into the core so that it can be used as a stand-alone C++ library, specifically SPRegion and TMRegion as well as lots of encoders.

  • There is a major problem with the version of MinGW that we are currently using: it is a special version hacked up to work with Python 2.7 and SWIG, which is no longer supported. I propose that we not support that platform during the transition, working only with the Linux and OSX platforms. Then, after step 6, we add the Visual Studio 2017 platform and perhaps re-introduce the real MinGW platform.

Implement Serializable interface

For proper, clean serialization.

API:

  • void save()
  • static T load()
  • struct params {}
    • holds all constructor parameters that need to be serialized (will also be used for Comparable)
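
A minimal sketch of how the proposed API could look in C++ (hypothetical class and field names; the serialization medium is just a stream here):

#include <iostream>

// Sketch only: the constructor parameters live in a params struct so that
// save()/load() can round-trip them (and Comparable can later compare them).
class Example {
public:
  struct params {
    unsigned size = 0;
    float threshold = 0.5f;
  };

  explicit Example(params p) : p_(p) {}

  void save(std::ostream& out) const {     // may throw on stream failure
    out << p_.size << ' ' << p_.threshold << '\n';
  }

  static Example load(std::istream& in) {  // static factory, as proposed
    params p;
    in >> p.size >> p.threshold;
    return Example(p);
  }

private:
  params p_;
};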

TODO:
discuss details of the API
Part of #58

Setup CI

  • Windows -- AppVeyor
  • Linux -- Travis
  • OSX -- CircleCI
  • add binary Releases

Moving Python interface to a separate repository

This is going to be a big PR. During the move we can switch from SWIG to PyBind11 as the Python interface, which would let us support both Python 2.7 and Python 3.6+.

Start by ripping out the SWIG build. @chhenning already has a working PyBind11 interface that we can drop in. We just need to package it as a separate repository and set up a process for building it in parallel with nupic.cpp.

Only the RegionImplFactory currently has calls into the Python world that would need to be resolved. The way to do this is to have a subclass of RegisteredRegionImpl for each language interface, which is passed to the factory to create a custom plugin: RegisteredRegionImplCpp for plugins written in C++ and RegisteredRegionImplPy for plugins written in Python. The logic for allocating and calling a Python object (via PyRegion.cpp) would reside in this subclass rather than in the factory. RegisteredRegionImplPy and PyRegion.cpp/hpp would live in the Python repository, as would all of the Python helper classes and the Python tests.

When we implement the CSharp interface it can do the same thing: have its own subclass of RegisteredRegionImpl for allocating custom plugins. The nupic.cpp code does not need to have any knowledge of Python or CSharp.
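
A rough sketch of that factory decoupling (hypothetical signatures; the real classes would live in nupic.cpp and the Python repository respectively):

#include <memory>
#include <string>

class RegionImpl;  // base class of all region plugins

// Sketch only: the factory keeps one of these per language interface and
// never needs to know which language a plugin is written in.
class RegisteredRegionImpl {
public:
  virtual std::unique_ptr<RegionImpl> createRegionImpl(const std::string& nodeType) = 0;
  virtual ~RegisteredRegionImpl() = default;
};

// Lives in nupic.cpp: allocates plugins written in C++.
class RegisteredRegionImplCpp : public RegisteredRegionImpl {
public:
  std::unique_ptr<RegionImpl> createRegionImpl(const std::string& nodeType) override;
};

// Lives in the Python repository: wraps a Python object via PyRegion.
class RegisteredRegionImplPy : public RegisteredRegionImpl {
public:
  std::unique_ptr<RegionImpl> createRegionImpl(const std::string& nodeType) override;
};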

Publish bindings on PyPI

#1
Depends on binary releases #361
Use this as an example RedFT/Hexy@ddc9d01

  • fix the format so the wheel (.whl) is published on PyPI
  • fix the wheel content so all needed files are there
    • people should be able to install via pip install my.whl
  • get a token to publish to the "real" PyPI
  • update the README with instructions

Introduce Interfaces

So that code can be used more cleanly, and reduced.
Interfaces enforce a stable API; names take the form *able.

  • Computable
    • feed forward compute input
    • void compute(T* input, T* output)
    • will be used for all classes: encoder, SP, TM*, Anomaly, ...
  • Serializable
    • can serialize to file
    • void save(..) throws;
    • static T load(..)
  • Printable
    • nice way to display the object in human readable form
    • std::string toString() const
    • override << ..?
  • Comparable
    • implements equals comparison
    • override ==
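
A compact C++ sketch of the four proposed interfaces (signatures assumed from the bullets above, not final):

#include <iostream>
#include <string>

template <typename T>
struct Computable {                       // feed-forward compute input
  virtual void compute(T* input, T* output) = 0;
  virtual ~Computable() = default;
};

struct Serializable {                     // can serialize to a stream/file
  virtual void save(std::ostream& out) const = 0;  // may throw
  virtual ~Serializable() = default;
};

struct Printable {                        // human-readable display
  virtual std::string toString() const = 0;
  virtual ~Printable() = default;
};

inline std::ostream& operator<<(std::ostream& os, const Printable& p) {
  return os << p.toString();              // the proposed << override
}

template <typename T>
struct Comparable {                       // equality comparison
  virtual bool operator==(const T& other) const = 0;
  virtual ~Comparable() = default;
};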

Remove Dimensions from the Link object.

As a placeholder for things to do.
Dimensions are known to be obsolete and unused. By removing them we can simplify the Link (also removing LinkPolicy), provide zero-copy for some links, and provide automatic type conversion for those that need it.

As part of this I suggest replacing the "isSparse" in the Input and Output objects with a new NTA_BasicType so that data can be converted between dense and sparse automatically as dictated by the specs of the source and destination plugins.

Similar: #139

Real-life benchmark: Hotgym example using C++ algorithms

Implement a pipeline running a full real-world HTM task.

Currently this is implemented using the raw HTM classes (TM, SP, ...),
not NetworkAPI (which needs TM/SPRegion), and not as Python code using the C++ bindings (which would be possible).

Pipeline:

  • compile as standalone executable (for profiling)
  • load CSV from file
    • use our classical "hotgym" dataset
    • parse command-line for optional filename and num runs
  • encode CSV data
    • MultiEncoder for more than one field
  • run SpatialPooler to get SDR
    • global
    • local inhibition
  • run TP to get temporal predictions
    • use more modern TM as alternative
    • TP (old) obsoleted
    • BacktrackingTM (TP based) obsoleted
    • show SDR output computed by these TM flavours
      • also checks deterministic algorithms' outputs #194
  • compute Anomaly score
    • test AnomalyLikelihood
  • add SDR Classifier
    • needs encoder topDownCompute (SDR -> Real); decided WONTFIX
  • measure execution time
    • more fine-grained separate timers for each part of pipeline (SP, Encoder, TM,..)
      • fine grained timer checks for each part
  • implement as a class to make more reusable
  • use SDR for all layers
    • SDR Metrics
    • enforce common Compute/Serializable/... interface
  • implement using core algorithms (SP, TM)
    • encoder
    • SP
    • TM
    • AN
    • classifier
      • predictor
    • CP (later, when implemented) #285
    • implement using NetworkAPI
  • test parallelization #255
  • test interfaces
    • serialization
  • optimize parameters #433

We are looking for a real-life benchmark we can use as a base for our performance optimizations #3.
In Python there is a "Hotgym anomaly example" (stressing encoder, SP, TM, Anomaly); implement a similar example in C++ and add it to the integration tests with timing.

  • suggested with NAPI
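
A self-contained sketch of the pipeline shape with the per-stage timing proposed above. The stage functions are trivial stand-ins for the real encoder/SP/TM classes (hypothetical names throughout, not the actual library API):

#include <chrono>
#include <cstdio>
#include <vector>

using SDR = std::vector<unsigned>;  // active-bit indices

// Trivial stand-ins so the sketch compiles on its own; the real pipeline
// would call the encoder, SpatialPooler, and TemporalMemory classes here.
SDR encodeScalar(float v) { return { unsigned(v) % 1024u }; }
SDR spatialPool(const SDR& in, bool /*learn*/) { return in; }
float temporalStep(const SDR& cols, bool /*learn*/) { return cols.empty() ? 1.0f : 0.0f; }

int main() {
  // CSV loading and command-line parsing omitted; hotgym is one
  // power-consumption value per row.
  const std::vector<float> consumption = {5.0f, 5.3f, 80.1f};

  using clock = std::chrono::steady_clock;
  double tEnc = 0, tSp = 0, tTm = 0;  // fine-grained per-stage timers

  for (float value : consumption) {
    const auto t0 = clock::now();
    const SDR input = encodeScalar(value);              // encode CSV data
    const auto t1 = clock::now();
    const SDR columns = spatialPool(input, true);       // SP -> SDR of columns
    const auto t2 = clock::now();
    const float anomaly = temporalStep(columns, true);  // TM + anomaly score
    const auto t3 = clock::now();

    tEnc += std::chrono::duration<double>(t1 - t0).count();
    tSp  += std::chrono::duration<double>(t2 - t1).count();
    tTm  += std::chrono::duration<double>(t3 - t2).count();
    std::printf("value=%.1f anomaly=%.2f\n", value, anomaly);
  }
  std::printf("enc=%.6fs sp=%.6fs tm=%.6fs\n", tEnc, tSp, tTm);
  return 0;
}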

I have ported SPRegion and TMRegion and they run under Windows, but I was waiting until after PyBind11 is implemented to merge them in with a later PR.

Waiting for #54 SPRegion & TMRegion in C++

Switch to c++17

For modern C++ feature support and to reduce dependencies.

  • enable c++17 in CMake, compile code with it locally
    • fix additional c++17 nuances
  • enable in all CI and make sure all compile
    • update build scripts
    • make sure recent version of compiler installs in the CI
    • Linux
    • Windows
    • OSX
  • replace suitable boost:: with std:: -- Boost is now optional with c++17
  • integrate c++17 enhancements into the codebase (not all needed at once; see the sketch after this list)
    • replace raw *T, Real*, etc. with std::array (for fixed size) or std::vector (for variable size); no speed impact
    • use RAII; remove raw pointers and make local variables where possible; elsewhere use smart pointers
    • use const where possible: cleaner, safer, faster
    • use auto for cleaner code
    • use range-for: for (auto item : list)
    • replace loops with <algorithm> where possible, e.g. fill()
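
For illustration, a before/after sketch of the kind of modernization meant above (invented example, not actual codebase code):

#include <algorithm>
#include <cstddef>
#include <vector>

// Before (illustrative): raw owning pointer, index loop, manual delete[].
float* makePermanencesOld(std::size_t n) {
  float* perms = new float[n];
  for (std::size_t i = 0; i < n; ++i) perms[i] = 0.0f;
  return perms;  // caller must remember to delete[]
}

// After: RAII std::vector plus <algorithm>; ownership travels with the value.
std::vector<float> makePermanences(std::size_t n) {
  std::vector<float> perms(n);
  std::fill(perms.begin(), perms.end(), 0.0f);  // or simply vector(n, 0.0f)
  return perms;
}

// const, auto, and range-for instead of an index loop.
float sumPermanences(const std::vector<float>& perms) {
  float total = 0.0f;
  for (const auto p : perms) total += p;
  return total;
}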

This is a great read for any c++ programmer:
https://github.com/isocpp/CppCoreGuidelines/blob/master/CppCoreGuidelines.md

Build Windows CI with native MSVC c++ compiler

We currently build with a patched MinGW. MSVC is capable and freely available now; let's see if we can get it into the AppVeyor CI.
It would be better to build with the native compiler.

EDIT:
MSVC on Windows is put off until after SWIG removal:
see
#55 (comment)

I suggest that we hold off on Visual Studio until after you get rid of SWIG. The Numenta documentation says that it will not work unless you install a very old compiler. I was not able to get the bindings to build under Visual Studio with C++11, although I have not tried since we removed capnproto. And I agree, MinGW is not really a viable build environment.

ArrayBase pull request.

@breznak I need your help. I am trying to submit a pull request but am having a bit of a problem. Here is what I did:

  1. Cloned the htm-community/nupic.cpp repository.
  2. Opened a new local branch "Array".
  3. Copied my source files for ArrayBase, Array, and ArrayRef to the new branch.
  4. Made a few changes so that it would work, then built and ran the unit tests.
  5. Committed the changes and pushed them to the github.com/dkeeney/nupic.cpp Array branch.
  6. On the htm-community/nupic.cpp repository, clicked the pull request button and tried to specify a pull request from the github.com/dkeeney/nupic.cpp Array branch to the htm-community/nupic.cpp master branch. But it will not let me select that combination.

Am I doing this all wrong?

Investigate removal of custom math/*Matrix;

There is a huge amount of code in math/*Matrix that is old, unmaintained, and largely untested. I propose removal and replacement:

  • replace with Connections
  • or Eigen #42
  • or keep only the SparseBinaryMatrix

Look at the problem and proposed order of progress:

grep -R 'Matrix' src/nupic/math/ | cut -d: -f1 |sort -u

  • src/nupic/math/Math.hpp
  • src/nupic/math/ArrayAlgo.hpp -- maybe most of its methods will be removed?
  • src/nupic/math/DenseMatrix.hpp -- only one user SDRClassifier, replace with sparse & rm first #170 #169
  • src/nupic/math/NearestNeighbor.hpp -- NN does not have to be in a HTM codebase
  • src/nupic/math/SparseBinaryMatrix.hpp -- heavily used in SP #93 , #169
  • algorithms/CondProbTable -- rm too? Uses:
    • src/nupic/math/SparseMatrix01.hpp -- not used, rm
  • src/nupic/math/SparseMatrix.hpp -- huge, untested, surprisingly not a base of SparseBinaryMatrix #169
    • used in ArrayAlgo, DenseMatrix, SpatialPooler
  • src/nupic/math/SparseMatrixAlgorithms.hpp -- not used, rm
  • src/nupic/math/SparseMatrixConnections.hpp -- not used, rm
    • src/nupic/math/SegmentMatrixAdapter.hpp -- used only in SMConn
      • py_SegmentSparseMatrix
  • src/nupic/math/SparseRLEMatrix.hpp -- not used, rm
  • src/nupic/math/SparseTensor.hpp -- not sure how needed for Py - implemented (but reverted) here: bfb58df , #169
    • Domain
    • Index
    • PyBindSparseTensor
  • math/Math.hpp:Gaussian_2D -- not used, further cleanup Math.hpp

Related:

  • #93 SP on Connections

Prerequisites:

  • c++ speed tests #30
  • validate Python nupic still compatible with this nupic.core #137

New Spatial Pooler with Macro columns for faster Local inhibition

Performance improvements for Spatial Pooler topology / local inhibition. I mentioned this on the numenta.org forum, and here I'm hoping to flesh out the idea more and communicate it to you all.

The spatial pooler with global inhibition works great as is; however, local inhibition does not scale well because of the algorithms used. The differences between local and global inhibition appear at large scale, but within a small (topological) area local and global inhibition do the same thing. Poor man's topology uses global inhibition to approximate local inhibition by making a spatial pooler with global inhibition for each area of local inhibition. In other words: macro columns can use global inhibition and still have a large-scale topology, by simulating a macro column for each topological area.

Pros:

  • Speed: this should run as fast as the underlying spatial poolers with global inhibition
  • API: should be similar to the underlying spatial pooler's API

Cons:

  • Spatial resolution. The current local inhibition spreads the mini-columns across the input space, but this proposal would cluster many mini-columns into a point, with many such clusters spread across the input space. This can be mitigated by using many clusters of mini-columns, which allows for an evenly spread blanket of mini-columns.

Implementation:
A new C++ class will create and maintain a list of SpatialPooler instances, one for each macro column. Macro columns are arranged in a uniform grid over the input space; each macro column's input is a rectangular slice of the input space.

Example:
The MNIST dataset would be a good example. It's fast, easy to solve, and widely recognized, and it's visual data, which is pretty.

API: Similar to SpatialPooler class ...

  • I thought I'd replace references to "columns" with either "macroColumns" or "miniColumns" throughout this class.
  • initialize() - has the same parameters as the SP class except:
    • remove param columnDimensions
    • add param macroColumnDimensions of type vector<UInt>. This must have the same length as inputDimensions.
    • add param miniColumnsPerMacro of type UInt
    • change type of potentialRadius from UInt to vector<Real>
    • change type of wrapAround from bool to vector<bool>
  • compute() - no change to public facing API. This method will deal with dividing up the inputs, running SP.compute(), and concatenating the results.
  • Add method getMacroColumns() -> vector<SpatialPooler*>. Use this method to access the underlying SP instances.
  • Replace method getColumnDimensions() with:
    • getMacroColumnDimensions() -> vector<UInt>
    • getMiniColumns() -> UInt

...
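
A header-style sketch of the wrapper class described above (hypothetical names, following the proposed API):

#include <vector>

using UInt = unsigned int;

class SpatialPooler;  // the existing SP class

// Sketch only: hypothetical wrapper holding one SpatialPooler per macro
// column, arranged in a uniform grid over the input space.
class MacroColumnPooler {
public:
  MacroColumnPooler(const std::vector<UInt>& inputDimensions,
                    const std::vector<UInt>& macroColumnDimensions,
                    UInt miniColumnsPerMacro);

  // Same public shape as SpatialPooler::compute(): divides the input into
  // rectangular slices, runs each macro column's SP, concatenates results.
  void compute(const UInt input[], bool learn, UInt output[]);

  std::vector<SpatialPooler*> getMacroColumns();        // underlying SPs
  std::vector<UInt> getMacroColumnDimensions() const;
  UInt getMiniColumns() const;                          // mini-columns per macro

private:
  std::vector<SpatialPooler*> macroColumns_;  // one SP per grid cell
  std::vector<UInt> inputDims_, macroDims_;
  UInt miniColumnsPerMacro_ = 0;
};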


Split out as a separate issue from #3 (comment).
Related: #84
Author: @ctrl-z-9000-times

Problems with ConnectionsPerformanceTest

When running unit_test I got this:
[ RUN ] ConnectionsPerformanceTest.testTMLarge
0.013655 in temporal memory (large): initialize
24.1005 in temporal memory (large): initialize + learn
28.7102 in temporal memory (large): initialize + learn + test
/home/dave/cpp/src/test/unit/algorithms/ConnectionsPerformanceTest.cpp:277: Failure
Expected: (tim) <= (28.0f), actual: 28.7102 vs 28
[ FAILED ] ConnectionsPerformanceTest.testTMLarge (28723 ms)

This was compiled in debug mode on Ubuntu.
Two problems:

  1. The actual time it takes to run (28 seconds).
  2. The test failure itself, which may be a problem in the test facility.

Removal/rewrite of TP(=Cells4), BacktrackingTM in favor of TM

  • TP/Cells4 and related classes account for a huge amount of old, ugly code
  • TM is currently used instead of TP for most (all?) use-cases
  • find out on the forums, etc., whether TP still has a valid use-case. Where does it perform better than TM?
    --- #327 (comment)
    • BacktrackingTM is slightly better at anomaly detection (NAB)
  • remove TP/Cells4 from our codebase
    • remove BacktrackingTM

EDIT: more discussion on why those should be removed here #327

Reduce dependencies

@dkeeney @chhenning I was reviewing your interesting work on refactoring/reducing nupic.core, and I'd like to get it merged back here to some extent.

Features that I like:

  • using C++ instead of Boost, where possible. Seen in @dkeeney's base branch? Although I'm interested only in changes with c++11 (not c++17 as you target). OK, let's go c++17. #55 #106
  • removing apr*; boost.filesystem is the replacement? Also in the base branch? #74
  • getting rid of zlib / capnp for compression. I think one of you sticks with either. I'd for sure remove zlib, as users can zip themselves; capnp adds lots of mess but works quite OK. Do you think vanilla plain-text backups are OK/better? #48
  • removing SWIG and replacing it with pybind11 (@chhenning?) Does it work well? Which branch can I start cherry-picking and merging from? Or has David gone the pure C++ way?
    ...this bullet will be quite complex; I'll leave it for now for further discussion. #81
    • Do I still need/benefit from your SP/TMRegion classes? #54

Port integration tests to unit-tests

from src/test/integration/

  • PyRegion
  • CppRegion
  • ConnectionsPerformanceTest
  • move other src/examples/ to unit tests

This will:

  • simplify CMake
  • improve coverage
  • make tests easier to run in CI

Optimization for performance

Steps:

  • set baseline benchmarking tests; the more, the better
    • micro benchmarks
    • IDE profiling
    • real-life benchmark (hotgym) #30
  • refactor code to use a shared, encapsulated class for passing data around: an "SDR type"
    • for now it could be a typedef of UInt*,
    • later wrap a vector and add some methods,
    • even later wrap an optimized Matrix type, ...
  • identify bottlenecks
  • vectorize (see the sketch after this list)
    • almost all optimization libraries work on vectors
    • replace use-cases where setPermanence(newValue) is called in a loop with a vectorized version (a scalar is just a vector with one item)
  • compare math library toolkits
    • each library has its own data type (Eigen::Matrix, etc.)
    • converting to/from it would kill the gained performance -> hence the "SDR type"
  • iterative optimizations
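
A small sketch of the vectorization idea (invented function names; the point is the bulk API shape, which an optimization library or the compiler's auto-vectorizer can exploit):

#include <cstddef>
#include <vector>

// Before: a scalar setter called once per synapse inside a loop.
void updatePermanencesLoop(std::vector<float>& perms, float delta) {
  for (std::size_t i = 0; i < perms.size(); ++i)
    perms[i] += delta;  // one scalar update per call site
}

// After: one bulk operation over the whole vector. A scalar update is
// just the one-element case of this.
void updatePermanencesVectorized(std::vector<float>& perms,
                                 const std::vector<float>& deltas) {
  for (std::size_t i = 0; i < perms.size(); ++i)
    perms[i] += deltas[i];
}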

Requirements:

  • What do we want from the library?
  • speed
  • multi-platform
  • sparse (memory efficient)
  • big user-base, popular
  • low code "intrusiveness"
  • CPU backend (SSE, openMP)
  • nVidia GPU backend (CUDA)
  • AMD GPU backend (openCL)
  • open source
  • clean & lean API (ease of use)
  • bindings/support for other languages (python,...)
  • I don't need no optimizations

Considered toolkits:

Links:

Wishlist

Unordered, confusing, just an ideas-drop list.

  • top-down compute (so we can recover the input for a given SDR, without a Classifier)
  • Performance optimizations #3
  • Re-Organize repo structure into more isolated, modular APIs
  • move to Python 3
  • idealize goals, directions of this fork, publish

Spatial Pooler: investigate using Connections as backend

Connections is the optimized structure used in TM; if we use 1 cell per column, it should be able to mimic the SP.

Then we can delete SparseBinaryMatrix (used as the backend in SP). This PR is mutually exclusive with #105.

  • implement testing benchmark for SP #30
  • see if SP can be implemented using Connections -> yes!
  • benchmark SparseBinaryMatrix vs Connections performance -> Connections is about 10x faster (11 s for 1000 iterations on SparseBinaryMatrix vs 7 s for 5000 iterations on Connections in the HelloSPTP benchmark)
  • remove SparseBinaryMatrix -> #104

After #92

Replace APR libraries with std:: libraries (abandoned)

Another placeholder for things to do.
Delete the APR library dependency and see what breaks; fix it by replacing the calls with std:: libraries.
In the OS folder, the Directories and Path objects can be implemented with #include <filesystem>. This of course requires switching to C++17.
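
For example, a minimal std::filesystem sketch of the kind of path/directory operation the APR code provides (invented helper, not existing code):

#include <filesystem>  // C++17

namespace fs = std::filesystem;

// Create the directory (and parents) if it does not already exist.
bool ensureDirectory(const fs::path& p) {
  return fs::exists(p) || fs::create_directories(p);
}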

Edit: part of #47
