htm-community / htm.core

This project forked from numenta/nupic.core-legacy


Actively developed Hierarchical Temporal Memory (HTM) community fork (continuation) of NuPIC, implemented in C++ and Python.

Home Page: http://numenta.org

License: GNU Affero General Public License v3.0

CMake 1.44% C 0.01% Python 23.28% PowerShell 0.12% Shell 0.41% C++ 42.54% Dockerfile 0.05% Batchfile 0.12% HTML 27.36% JavaScript 4.69%
nupic cpp htm ai hierarchical-temporal-memory neuromorphic-computing neural-networks anomaly-detection prediction cortical-learning sparse-distributed-memory neuroscience neuroscience-inspired-ai reproducible-research

htm.core's Issues

Refactor: create SDR class

I'd like to create a class for dealing with SDRs. The primary function of this class is to convert SDR values between the various SDR formats: dense, flat-index, and index-with-dimensions. This is something I've done in my own HTM implementation (which is Python 3). It should improve the quality and readability of the code, and it should also help performance, because code can easily be rewritten to use the optimal data format for a given task.

My python3 implementation of this class is at: https://github.com/ctrl-z-9000-times/sdr_algorithms/blob/master/sdr.py

Example API:

SparseDistributedRepresentation X(vector<UInt> dimensions)
X.size() -> UInt // Total number of bits in the SDR.
// Calling any of the setters will replace the SDR's value.
X.setDense(UInt dense[])
X.setFlatIndex(UInt flatIndex[])
X.setIndex(vector<UInt[]> index)
// Calling any of the getters will convert the SDR to the requested
// data format and save it inside of this class for future use. In
// this way, all conversions are performed lazily and then cached.
X.getDense() -> UInt[]
X.getFlatIndex() -> UInt[]
X.getIndex() -> vector<UInt[]>

// Example usage
X.setDense( myData );
// SDR class converts from dense array to indices of non-zero values.
myIndex = X.getFlatIndex();  // SDR does work here.
myIndexAgain = X.getFlatIndex();  // SDR reuses result of previous line.
// Setting a new value will clear out the saved copies of the old value.
X.setFlatIndex( myNewData );
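
A minimal C++ sketch of the lazy conversion and caching scheme described above (hypothetical member names; only the dense and flat-index formats are shown, and a plain size is used instead of a dimensions vector):

#include <utility>
#include <vector>

using UInt = unsigned int;

// Sketch only: lazily converts between the dense and flat-index formats,
// caching each computed format until a setter invalidates the caches.
class SparseDistributedRepresentation {
public:
  explicit SparseDistributedRepresentation(UInt size) : size_(size) {}

  UInt size() const { return size_; }   // total number of bits in the SDR

  void setDense(std::vector<UInt> dense) {
    invalidate();                       // drop cached copies of the old value
    dense_ = std::move(dense);
    hasDense_ = true;
  }

  void setFlatIndex(std::vector<UInt> flatIndex) {
    invalidate();
    flatIndex_ = std::move(flatIndex);
    hasFlatIndex_ = true;
  }

  const std::vector<UInt>& getFlatIndex() {
    if (!hasFlatIndex_) {               // compute once from the dense form...
      flatIndex_.clear();
      for (UInt i = 0; i < size_; ++i)
        if (dense_[i] != 0) flatIndex_.push_back(i);
      hasFlatIndex_ = true;
    }
    return flatIndex_;                  // ...and reuse the cached result later
  }

  const std::vector<UInt>& getDense() {
    if (!hasDense_) {
      dense_.assign(size_, 0);
      for (UInt i : flatIndex_) dense_[i] = 1;
      hasDense_ = true;
    }
    return dense_;
  }

private:
  void invalidate() { hasDense_ = hasFlatIndex_ = false; }

  UInt size_;
  std::vector<UInt> dense_, flatIndex_;
  bool hasDense_ = false, hasFlatIndex_ = false;
};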

This could eventually be extended/subclassed to use bitvectors, as proposed in #12 (comment).

So, what does everybody think of this? If this or something like it were available as part of NuPIC, would you use it?

Road map for migrating to PyBind11 and Python3.x support.

Here is what I propose:
I am going to abandon issue #74 (Replace APR) and PR #79. I was trying to do this in too large a chunk, and I also ended up with a massive whitespace change due to line-ending changes (working on Windows, trying to get MinGW to work). Let's break the effort up into smaller parts:

  1. A new PR to replace the existing header-only Boost with Boost including the filesystem and system modules.

  2. A new PR for each file, or very small group of files, replacing the APR calls with std:: or boost:: calls.
     This will be a lot of PRs, so I will need help getting them reviewed. The work is already done; I just need to open the PRs and copy my changes over on a file-by-file basis.

  3. A new PR to remove the APR libraries...delete only.

Then we can look at removing SWIG and replacing it with PyBind11:

  4. Change some coupling in RegionImplFactory so that the Python-related code can be isolated.

  5. Build the PyBind11 interface module (based on work by @chhenning). We might not be able to run both PyBind11 and SWIG in the same module, but we could just disable SWIG for this PR.

  6. Remove SWIG.

At this point we can look at porting more modules from Python code into the core so that it can be used as a stand-alone C++ library, specifically SPRegion and TMRegion as well as lots of encoders.

  • There is a major problem with the version of MinGW that we are currently using: it is a special version hacked up to work with Python 2.7 and SWIG, which is no longer supported. I propose that we not support that platform during the transition, working only with the Linux and OSX platforms. Then, after step 6, we add the Visual Studio 2017 platform and perhaps re-introduce the real MinGW platform.

Implement Serializable interface

For proper, clean serialization.

API:

  • void save()
  • static T load()
  • struct params {}
    • holds all constructor parameters that need to be serialized (will also be used for Comparable)
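
A minimal sketch of how the proposed API could look in C++ (hypothetical class and field names; the serialization medium is just a stream here):

#include <iostream>

// Sketch only: the constructor parameters live in a params struct so that
// save()/load() can round-trip them (and Comparable can later compare them).
class Example {
public:
  struct params {
    unsigned size = 0;
    float threshold = 0.5f;
  };

  explicit Example(params p) : p_(p) {}

  void save(std::ostream& out) const {     // may throw on stream failure
    out << p_.size << ' ' << p_.threshold << '\n';
  }

  static Example load(std::istream& in) {  // static factory, as proposed
    params p;
    in >> p.size >> p.threshold;
    return Example(p);
  }

private:
  params p_;
};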

TODO:
discuss details of the API
Part of #58

Setup CI

  • Windows -- AppVeyor
  • Linux -- Travis
  • OSX -- CircleCI
  • add binary Releases

Moving Python interface to a separate repository

This is going to be a big PR. During the move we can switch from SWIG to PyBind11 as the Python interface, which would let us support both Python 2.7 and Python 3.6+.

Start by ripping out the SWIG build. @chhenning already has a working PyBind11 interface that we can drop in. We just need to package it as a separate repository and set up a process for building it in parallel with nupic.cpp.

Only the RegionImplFactory currently has calls into the Python world that would need to be resolved. The way to do this is to have a subclass of RegisteredRegionImpl for each language interface, which is passed to the factory to create a custom plugin: RegisteredRegionImplCpp for plugins written in C++ and RegisteredRegionImplPy for plugins written in Python. The logic for allocating and calling a Python object (via PyRegion.cpp) would reside in this subclass rather than in the factory. RegisteredRegionImplPy and PyRegion.cpp/hpp would live in the Python repository, as would all of the Python helper classes and the Python tests.

When we implement the CSharp interface it can do the same thing: have its own subclass of RegisteredRegionImpl for allocating custom plugins. The nupic.cpp code does not need to have any knowledge of Python or CSharp.
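
A rough sketch of that factory decoupling (hypothetical signatures; the real classes would live in nupic.cpp and the Python repository respectively):

#include <memory>
#include <string>

class RegionImpl;  // base class of all region plugins

// Sketch only: the factory keeps one of these per language interface and
// never needs to know which language a plugin is written in.
class RegisteredRegionImpl {
public:
  virtual std::unique_ptr<RegionImpl> createRegionImpl(const std::string& nodeType) = 0;
  virtual ~RegisteredRegionImpl() = default;
};

// Lives in nupic.cpp: allocates plugins written in C++.
class RegisteredRegionImplCpp : public RegisteredRegionImpl {
public:
  std::unique_ptr<RegionImpl> createRegionImpl(const std::string& nodeType) override;
};

// Lives in the Python repository: wraps a Python object via PyRegion.
class RegisteredRegionImplPy : public RegisteredRegionImpl {
public:
  std::unique_ptr<RegionImpl> createRegionImpl(const std::string& nodeType) override;
};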

Publish bindings on PyPI

#1
Depends on binary releases #361
Use this as an example RedFT/Hexy@ddc9d01

  • fix the format so the wheel (.whl) is published on PyPI
  • fix the wheel content so all needed files are there
    • people should be able to install via pip install my.whl
  • get a token to publish to the "real" PyPI
  • update the README with instructions

Introduce Interfaces

So that code can be used more cleanly, and reduced.
Interfaces enforce a stable API; names take the form *able.

  • Computable
    • feed forward compute input
    • void compute(T* input, T* output)
    • will be used for all classes: encoder, SP, TM*, Anomaly, ...
  • Serializable
    • can serialize to file
    • void save(..) throws;
    • static T load(..)
  • Printable
    • nice way to display the object in human readable form
    • std::string toString() const
    • override << ..?
  • Comparable
    • implements equals comparison
    • override ==
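
A compact C++ sketch of the four proposed interfaces (signatures assumed from the bullets above, not final):

#include <iostream>
#include <string>

template <typename T>
struct Computable {                       // feed-forward compute input
  virtual void compute(T* input, T* output) = 0;
  virtual ~Computable() = default;
};

struct Serializable {                     // can serialize to a stream/file
  virtual void save(std::ostream& out) const = 0;  // may throw
  virtual ~Serializable() = default;
};

struct Printable {                        // human-readable display
  virtual std::string toString() const = 0;
  virtual ~Printable() = default;
};

inline std::ostream& operator<<(std::ostream& os, const Printable& p) {
  return os << p.toString();              // the proposed << override
}

template <typename T>
struct Comparable {                       // equality comparison
  virtual bool operator==(const T& other) const = 0;
  virtual ~Comparable() = default;
};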

Remove Dimensions from the Link object.

As a placeholder for things to do.
Dimensions are known to be obsolete and unused. By removing them we can simplify the Link (also removing LinkPolicy), provide zero-copy for some links, and provide automatic type conversion for those that need it.

As part of this I suggest replacing the "isSparse" in the Input and Output objects with a new NTA_BasicType so that data can be converted between dense and sparse automatically as dictated by the specs of the source and destination plugins.

Similar: #139

Real-life benchmark: Hotgym example using C++ algorithms

Implement a pipeline running a full real-world HTM task.

Currently this is implemented using the raw HTM classes (TM, SP, ...),
not NetworkAPI (which needs TM/SPRegion), and not as Python code using the C++ bindings (which would be possible).

Pipeline:

  • compile as standalone executable (for profiling)
  • load CSV from file
    • use our classical "hotgym" dataset
    • parse command-line for optional filename and num runs
  • encode CSV data
    • MultiEncoder for more than one field
  • run SpatialPooler to get SDR
    • global
    • local inhibition
  • run TP to get temporal predictions
    • use more modern TM as alternative
    • TP (old) obsoleted
    • BacktrackingTM (TP based) obsoleted
    • show SDR output computed by these TM flavours
      • also checks deterministic algorithms' outputs #194
  • compute Anomaly score
    • test AnomalyLikelihood
  • add SDR Classifier
    • needs encoder topDownCompute (SDR -> Real); decided WONTFIX
  • measure execution time
    • more fine-grained separate timers for each part of pipeline (SP, Encoder, TM,..)
      • fine grained timer checks for each part
  • implement as a class to make more reusable
  • use SDR for all layers
    • SDR Metrics
    • enforce common Compute/Serializable/... interface
  • implement using core algorithms (SP, TM)
    • encoder
    • SP
    • TM
    • AN
    • classifier
      • predictor
    • CP (later, when implemented) #285
    • implement using NetworkAPI
  • test parallelization #255
  • test interfaces
    • serialization
  • optimize parameters #433

We are looking for a real-life benchmark we can use as a base for our performance optimizations #3.
In Python there is a "Hotgym anomaly example" (stressing encoder, SP, TM, Anomaly); implement a similar example in C++ and add it to the integration tests with timing.

  • suggested with NAPI
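
A self-contained sketch of the pipeline shape with the per-stage timing proposed above. The stage functions are trivial stand-ins for the real encoder/SP/TM classes (hypothetical names throughout, not the actual library API):

#include <chrono>
#include <cstdio>
#include <vector>

using SDR = std::vector<unsigned>;  // active-bit indices

// Trivial stand-ins so the sketch compiles on its own; the real pipeline
// would call the encoder, SpatialPooler, and TemporalMemory classes here.
SDR encodeScalar(float v) { return { unsigned(v) % 1024u }; }
SDR spatialPool(const SDR& in, bool /*learn*/) { return in; }
float temporalStep(const SDR& cols, bool /*learn*/) { return cols.empty() ? 1.0f : 0.0f; }

int main() {
  // CSV loading and command-line parsing omitted; hotgym is one
  // power-consumption value per row.
  const std::vector<float> consumption = {5.0f, 5.3f, 80.1f};

  using clock = std::chrono::steady_clock;
  double tEnc = 0, tSp = 0, tTm = 0;  // fine-grained per-stage timers

  for (float value : consumption) {
    const auto t0 = clock::now();
    const SDR input = encodeScalar(value);              // encode CSV data
    const auto t1 = clock::now();
    const SDR columns = spatialPool(input, true);       // SP -> SDR of columns
    const auto t2 = clock::now();
    const float anomaly = temporalStep(columns, true);  // TM + anomaly score
    const auto t3 = clock::now();

    tEnc += std::chrono::duration<double>(t1 - t0).count();
    tSp  += std::chrono::duration<double>(t2 - t1).count();
    tTm  += std::chrono::duration<double>(t3 - t2).count();
    std::printf("value=%.1f anomaly=%.2f\n", value, anomaly);
  }
  std::printf("enc=%.6fs sp=%.6fs tm=%.6fs\n", tEnc, tSp, tTm);
  return 0;
}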

I have ported SPRegion and TMRegion and they run under Windows, but I was waiting until after PyBind11 is implemented to merge them in with a later PR.

Waiting for #54 SPRegion & TMRegion in C++

Switch to c++17

For modern C++ feature support and to reduce dependencies.

  • enable c++17 in CMake, compile code with it locally
    • fix additional c++17 nuances
  • enable in all CI and make sure all compile
    • update build scripts
    • make sure recent version of compiler installs in the CI
    • Linux
    • Windows
    • OSX
  • replace suitable boost:: with std:: -- Boost is now optional with c++17
  • integrate c++17 enhancements into the codebase (not all needed at once; see the sketch after this list)
    • replace raw *T, Real*, etc. with std::array (for fixed size) or std::vector (for variable size); no speed impact
    • use RAII; remove raw pointers and make local variables where possible; elsewhere use smart pointers
    • use const where possible: cleaner, safer, faster
    • use auto for cleaner code
    • use range-for: for (auto item : list)
    • replace loops with <algorithm> where possible, e.g. fill()
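
For illustration, a before/after sketch of the kind of modernization meant above (invented example, not actual codebase code):

#include <algorithm>
#include <cstddef>
#include <vector>

// Before (illustrative): raw owning pointer, index loop, manual delete[].
float* makePermanencesOld(std::size_t n) {
  float* perms = new float[n];
  for (std::size_t i = 0; i < n; ++i) perms[i] = 0.0f;
  return perms;  // caller must remember to delete[]
}

// After: RAII std::vector plus <algorithm>; ownership travels with the value.
std::vector<float> makePermanences(std::size_t n) {
  std::vector<float> perms(n);
  std::fill(perms.begin(), perms.end(), 0.0f);  // or simply vector(n, 0.0f)
  return perms;
}

// const, auto, and range-for instead of an index loop.
float sumPermanences(const std::vector<float>& perms) {
  float total = 0.0f;
  for (const auto p : perms) total += p;
  return total;
}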

This is a great read for any c++ programmer:
https://github.com/isocpp/CppCoreGuidelines/blob/master/CppCoreGuidelines.md

Build Windows CI with native MSVC c++ compiler

We currently build with a patched MinGW. MSVC is capable and freely available now; let's see if we can get it into the AppVeyor CI.
It would be better to build with the native compiler.

EDIT:
MSVC on Windows is put off until after SWIG removal:
see
#55 (comment)

I suggest that we hold off on Visual Studio until after you get rid of SWIG. The Numenta documentation says that it will not work unless you install a very old compiler. I was not able to get the bindings to build under Visual Studio with C++11, although I have not tried since we removed capnproto. And I agree, MinGW is not really a viable build environment.

ArrayBase pull request.

@breznak I need your help. I am trying to submit a pull request but am having a bit of a problem. Here is what I did:

  1. Cloned the htm-community/nupic.cpp repository.
  2. Opened a new local branch "Array".
  3. Copied my source files for ArrayBase, Array, and ArrayRef to the new branch.
  4. Made a few changes so that it would work, then built and ran the unit tests.
  5. Committed the changes and pushed them to the github.com/dkeeney/nupic.cpp Array branch.
  6. On the htm-community/nupic.cpp repository, clicked the pull request button and tried to specify a pull request from the github.com/dkeeney/nupic.cpp Array branch to the htm-community/nupic.cpp master branch. But it will not let me select that combination.

Am I doing this all wrong?

Investigate removal of custom math/*Matrix;

There is a huge amount of code in math/*Matrix that is old, unmaintained, and largely untested. I propose removal and replacement:

  • replace with Connections
  • or Eigen #42
  • or keep only the SparseBinaryMatrix

Look at the problem and proposed order of progress:

grep -R 'Matrix' src/nupic/math/ | cut -d: -f1 |sort -u

  • src/nupic/math/Math.hpp
  • src/nupic/math/ArrayAlgo.hpp -- maybe most of its methods will be removed?
  • src/nupic/math/DenseMatrix.hpp -- only one user SDRClassifier, replace with sparse & rm first #170 #169
  • src/nupic/math/NearestNeighbor.hpp -- NN does not have to be in a HTM codebase
  • src/nupic/math/SparseBinaryMatrix.hpp -- heavily used in SP #93 , #169
  • algorithms/CondProbTable -- rm too? Uses:
    • src/nupic/math/SparseMatrix01.hpp -- not used, rm
  • src/nupic/math/SparseMatrix.hpp -- huge, untested, surprisingly not a base of SparseBinaryMatrix #169
    • used in ArrayAlgo, DenseMatrix, SpatialPooler
  • src/nupic/math/SparseMatrixAlgorithms.hpp -- not used, rm
  • src/nupic/math/SparseMatrixConnections.hpp -- not used, rm
    • src/nupic/math/SegmentMatrixAdapter.hpp -- used only in SMConn
      • py_SegmentSparseMatrix
  • src/nupic/math/SparseRLEMatrix.hpp -- not used, rm
  • src/nupic/math/SparseTensor.hpp -- not sure how needed for Py - implemented (but reverted) here: bfb58df , #169
    • Domain
    • Index
    • PyBindSparseTensor
  • math/Math.hpp:Gaussian_2D -- not used, further cleanup Math.hpp

Related:

  • #93 SP on Connections

Prerequisites:

  • c++ speed tests #30
  • validate Python nupic still compatible with this nupic.core #137

New Spatial Pooler with Macro columns for faster Local inhibition

Performance improvements for Spatial Pooler topology / local inhibition. I mentioned this on the numenta.org forum, and here I'm hoping to flesh out the idea more and communicate it to you all.

The spatial pooler with global inhibition works great as is; however, local inhibition does not scale well because of the algorithms used. The differences between local and global inhibition appear at large scale, but within a small (topological) area local and global inhibition do the same thing. Poor man's topology uses global inhibition to approximate local inhibition by making a spatial pooler with global inhibition for each area of local inhibition. In other words: macro columns can use global inhibition and still have a large-scale topology, by simulating a macro column for each topological area.

Pros:

  • Speed: this should run as fast as the underlying spatial poolers with global inhibition
  • API: should be similar to the underlying spatial pooler's API

Cons:

  • Spatial resolution. The current local inhibition spreads the mini-columns across the input space, but this proposal would cluster many mini-columns into a point, with many such clusters spread across the input space. This can be mitigated by using many clusters of mini-columns, which allows for an evenly spread blanket of mini-columns.

Implementation:
A new C++ class will create and maintain a list of SpatialPooler instances, one for each macro column. Macro columns are arranged in a uniform grid over the input space; each macro column's input is a rectangular slice of the input space.

Example:
The MNIST dataset would be a good example. It's fast, easy to solve, and widely recognized, and it's visual data, which is pretty.

API: Similar to SpatialPooler class ...

  • I thought I'd replace references to "columns" with either "macroColumns" or "miniColumns" throughout this class.
  • initialize() - has the same parameters as the SP class except:
    • remove param columnDimensions
    • add param macroColumnDimensions of type vector<UInt>. This must have the same length as inputDimensions.
    • add param miniColumnsPerMacro of type UInt
    • change type of potentialRadius from UInt to vector<Real>
    • change type of wrapAround from bool to vector<bool>
  • compute() - no change to public facing API. This method will deal with dividing up the inputs, running SP.compute(), and concatenating the results.
  • Add method getMacroColumns() -> vector<SpatialPooler*>. Use this method to access the underlying SP instances.
  • Replace method getColumnDimensions() with:
    • getMacroColumnDimensions() -> vector<UInt>
    • getMiniColumns() -> UInt

...
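
A header-style sketch of the wrapper class described above (hypothetical names, following the proposed API):

#include <vector>

using UInt = unsigned int;

class SpatialPooler;  // the existing SP class

// Sketch only: hypothetical wrapper holding one SpatialPooler per macro
// column, arranged in a uniform grid over the input space.
class MacroColumnPooler {
public:
  MacroColumnPooler(const std::vector<UInt>& inputDimensions,
                    const std::vector<UInt>& macroColumnDimensions,
                    UInt miniColumnsPerMacro);

  // Same public shape as SpatialPooler::compute(): divides the input into
  // rectangular slices, runs each macro column's SP, concatenates results.
  void compute(const UInt input[], bool learn, UInt output[]);

  std::vector<SpatialPooler*> getMacroColumns();        // underlying SPs
  std::vector<UInt> getMacroColumnDimensions() const;
  UInt getMiniColumns() const;                          // mini-columns per macro

private:
  std::vector<SpatialPooler*> macroColumns_;  // one SP per grid cell
  std::vector<UInt> inputDims_, macroDims_;
  UInt miniColumnsPerMacro_ = 0;
};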


Split out as a separate issue from #3 (comment).
Related: #84
Author: @ctrl-z-9000-times

Problems with ConnectionsPerformanceTest

When running unit_test I got this:
[ RUN ] ConnectionsPerformanceTest.testTMLarge
0.013655 in temporal memory (large): initialize
24.1005 in temporal memory (large): initialize + learn
28.7102 in temporal memory (large): initialize + learn + test
/home/dave/cpp/src/test/unit/algorithms/ConnectionsPerformanceTest.cpp:277: Failure
Expected: (tim) <= (28.0f), actual: 28.7102 vs 28
[ FAILED ] ConnectionsPerformanceTest.testTMLarge (28723 ms)

This was compiled in debug mode on Ubuntu.
Two problems:

  1. The actual time it takes to run (28 seconds).
  2. The test failure itself, which may be a problem in the test facility.

Removal/rewrite of TP(=Cells4), BacktrackingTM in favor of TM

  • TP/Cells4 and related classes account for a huge amount of old, ugly code
  • TM is currently used instead of TP for most (all?) use-cases
  • find out on the forums, etc., whether TP still has a valid use-case. Where does it perform better than TM?
    --- #327 (comment)
    • BacktrackingTM is slightly better at anomaly detection (NAB)
  • remove TP/Cells4 from our codebase
    • remove BacktrackingTM

EDIT: more discussion on why those should be removed here #327

Reduce dependencies

@dkeeney @chhenning I was reviewing your interesting work on refactoring/reducing nupic.core, and I'd like to get it merged back here to some extent.

Features that I like:

  • using C++ instead of Boost, where possible. Seen in @dkeeney's base branch? Although I'm interested only in changes with c++11 (not c++17 as you target). OK, let's go c++17. #55 #106
  • removing apr*; boost.filesystem is the replacement? Also in the base branch? #74
  • getting rid of zlib / capnp for compression. I think one of you sticks with either. I'd for sure remove zlib, as users can zip themselves; capnp adds lots of mess but works quite OK. Do you think vanilla plain-text backups are OK/better? #48
  • removing SWIG and replacing it with pybind11 (@chhenning?) Does it work well? Which branch can I start cherry-picking and merging from? Or has David gone the pure C++ way?
    ...this bullet will be quite complex; I'll leave it for now for further discussion. #81
    • Do I still need/benefit from your SP/TMRegion classes? #54

Port integration tests to unit-tests

from src/test/integration/

  • PyRegion
  • CppRegion
  • ConnectionsPerformanceTest
  • move other src/examples/ to unit tests

This will:

  • simplify CMake
  • improve coverage
  • make tests easier to run in CI

Optimization for performance

Steps:

  • set baseline benchmarking tests; the more, the better
    • micro benchmarks
    • IDE profiling
    • real-life benchmark (hotgym) #30
  • refactor code to use a shared, encapsulated class for passing data around: an "SDR type"
    • for now it could be a typedef of UInt*,
    • later wrap a vector and add some methods,
    • even later wrap an optimized Matrix type, ...
  • identify bottlenecks
  • vectorize (see the sketch after this list)
    • almost all optimization libraries work on vectors
    • replace use-cases where setPermanence(newValue) is called in a loop with a vectorized version (a scalar is just a vector with one item)
  • compare math library toolkits
    • each library has its own data type (Eigen::Matrix, etc.)
    • converting to/from it would kill the gained performance -> hence the "SDR type"
  • iterative optimizations
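
A small sketch of the vectorization idea (invented function names; the point is the bulk API shape, which an optimization library or the compiler's auto-vectorizer can exploit):

#include <cstddef>
#include <vector>

// Before: a scalar setter called once per synapse inside a loop.
void updatePermanencesLoop(std::vector<float>& perms, float delta) {
  for (std::size_t i = 0; i < perms.size(); ++i)
    perms[i] += delta;  // one scalar update per call site
}

// After: one bulk operation over the whole vector. A scalar update is
// just the one-element case of this.
void updatePermanencesVectorized(std::vector<float>& perms,
                                 const std::vector<float>& deltas) {
  for (std::size_t i = 0; i < perms.size(); ++i)
    perms[i] += deltas[i];
}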

Requirements:

  • What do we want from the library?
  • speed
  • multi-platform
  • sparse (memory efficient)
  • big user-base, popular
  • low code "intrusiveness"
  • CPU backend (SSE, openMP)
  • nVidia GPU backend (CUDA)
  • AMD GPU backend (openCL)
  • open source
  • clean & lean API (ease of use)
  • bindings/support for other languages (python,...)
  • I don't need no optimizations

Considered toolkits:

Links:

Wishlist

Unordered, confusing, just an ideas-drop list.

  • top-down compute (so we can recover the input for a given SDR, without a Classifier)
  • Performance optimizations #3
  • Re-Organize repo structure into more isolated, modular APIs
  • move to Python 3
  • idealize goals, directions of this fork, publish

Spatial Pooler: investigate using Connections as backend

Connections is the optimized structure used in TM; if we use 1 cell per column, it should be able to mimic the SP.

Then we can delete SparseBinaryMatrix (used as the backend in SP). This PR is mutually exclusive with #105.

  • implement testing benchmark for SP #30
  • see if SP can be implemented using Connections -> yes!
  • benchmark SparseBinaryMatrix vs Connections performance -> Connections is about 10x faster (11 s for 1000 iterations on SparseBinaryMatrix vs 7 s for 5000 iterations on Connections in the HelloSPTP benchmark)
  • remove SparseBinaryMatrix -> #104

After #92

Replace APR libraries with std:: libraries (abandoned)

Another placeholder for things to do.
Delete the APR library dependency and see what breaks; fix it by replacing the calls with std:: libraries.
In the OS folder, the Directories and Path objects can be implemented with #include <filesystem>. This of course requires switching to C++17.
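
For example, a minimal std::filesystem sketch of the kind of path/directory operation the APR code provides (invented helper, not existing code):

#include <filesystem>  // C++17

namespace fs = std::filesystem;

// Create the directory (and parents) if it does not already exist.
bool ensureDirectory(const fs::path& p) {
  return fs::exists(p) || fs::create_directories(p);
}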

Edit: part of #47
