I'd like to create a class for dealing with SDRs. The primary function of this class



Refactor: create SDR class about htm.core HOT 18 CLOSED

htm-community commented on June 3, 2024 1

Refactor: create SDR class

from htm.core.

Comments (18)

dkeeney commented on June 3, 2024 1

Yes, in fact perhaps the SDR class might replace it if we could find a way to pass other types of arrays as well.


breznak commented on June 3, 2024

@ctrl-z-9000-times I'm definitely 👍 for this idea! It is in line with #58.

Do you plan to "just introduce" the class and use it as needed, or enforce the usage of SDR_t all over the codebase? (I'd welcome the latter.)

I'd add some more helper methods, like:

bool isSparse()
Real getSparsity()
bool isDistributed()
set/getMeaning() //the real-world value

About the implementation, I'm worried about the speed/effectiveness.

Can we make it fast and clean enough? Or would it be better to use:

using sdr_t = vector<UInt>; //change to any type later
SDRHelper ... this class above, to manipulate sdr_t as needed
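
A minimal sketch of what the alias-plus-helper option could look like. Everything here is an assumption made for illustration: the `UInt` and `Real` aliases stand in for the codebase's own types, the `sdr_helper` namespace and the 5% default threshold are hypothetical, and `isSparse()` is only one possible reading of the helper proposed above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using UInt = std::uint32_t;      // assumption: stands in for the codebase's UInt
using Real = float;              // assumption: stands in for the codebase's Real
using sdr_t = std::vector<UInt>; // dense form: one element per bit, 0 or non-zero

// Hypothetical free-function helpers playing the role of "SDRHelper".
namespace sdr_helper {

// Fraction of elements that are active (non-zero); 0 for an empty SDR.
inline Real getSparsity(const sdr_t &dense) {
  if (dense.empty()) return 0.0f;
  std::size_t active = 0;
  for (UInt v : dense) {
    if (v != 0) ++active;
  }
  return static_cast<Real>(active) / static_cast<Real>(dense.size());
}

// True when the active fraction is at or below a caller-chosen threshold.
inline bool isSparse(const sdr_t &dense, Real maxSparsity = 0.05f) {
  assert(maxSparsity >= 0.0f && maxSparsity <= 1.0f);
  return getSparsity(dense) <= maxSparsity;
}

} // namespace sdr_helper
```

With this shape the data stays a plain vector, so swapping `sdr_t` for another type later only touches the alias and the helpers.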


dkeeney commented on June 3, 2024

I like this idea as well.
I think the underlying array elements should not be a UInt32 or float32 as they are now. Each element contains either 1 or 0, so it could be a byte or even a bit. It would take only 32 64-bit elements (2048 bits) to create a reasonably sized SDR without resorting to a sparse matrix. This SDR class could then handle all sorts of operations such that the calling code never has to be aware of the underlying storage.
Just a thought.
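
To make the bit-packed idea concrete, here is a rough sketch of 32 64-bit words used as a 2048-bit SDR. The class name `PackedSDR` and its methods are hypothetical and only illustrate the shift-and-mask access pattern; nothing here is taken from the codebase.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical fixed-size bit-packed SDR: 32 words x 64 bits = 2048 bits.
class PackedSDR {
public:
  static constexpr std::size_t kWords = 32;
  static constexpr std::size_t kBits = kWords * 64;

  // Set or clear bit i using a shift and a mask.
  void set(std::size_t i, bool value) {
    assert(i < kBits);
    const std::uint64_t mask = std::uint64_t{1} << (i % 64);
    if (value) {
      words_[i / 64] |= mask;
    } else {
      words_[i / 64] &= ~mask;
    }
  }

  // Read bit i.
  bool get(std::size_t i) const {
    assert(i < kBits);
    return ((words_[i / 64] >> (i % 64)) & std::uint64_t{1}) != 0;
  }

  // Count active bits by clearing the lowest set bit until the word is zero.
  std::size_t countActive() const {
    std::size_t n = 0;
    for (std::uint64_t w : words_) {
      for (; w != 0; w &= w - 1) ++n;
    }
    return n;
  }

private:
  std::array<std::uint64_t, kWords> words_{}; // zero-initialised
};
```

Callers only see `set`, `get` and `countActive`, so the storage could later change to bytes or a sparse index list without touching them.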


breznak commented on June 3, 2024

underlying array elements

Do you think we HAVE to stick to a C-style array (UInt*)? I know how it is used for Python interfacing with numpy, but from the POV of C++ algorithms, a vector or C++11 array would much improve the code.

should not be a UInt32 or float32 as it is now. Each element contains either 1 or 0 so it could be a byte or even a bit.

I'm not sure how much the CPUs/GPUs are optimized for handling float, I think byte/bit could end up slower. But it is something to try.

SDR without resorting to a sparse matrix.

I would like to replace SparseMatrix, either by using Connections for SP, or some sparse matrix library (Eigen)

This SDR class could then handle all sorts of operations such that the calling code never has to be aware of the underlying storage.

This is the key idea, make an abstraction for the raw data type 👍


dkeeney commented on June 3, 2024

should not be a UInt32 or float32 as it is now. Each element contains either 1 or 0 so it could be a byte or even a bit.
I'm not sure how much the CPUs/GPUs are optimized for handling float, I think byte/bit could end up slower. But it is something to try.

A byte element can be accessed at the same speed as an Int or float. Bit might cost a little more time due to shift and mask. Type conversions with float are costly. The real problem is being compatible with Python numpy arrays without doing a copy.


ctrl-z-9000-times commented on June 3, 2024

do you plan to "just introduce" the class and use it as-needed, or enforce the usage of SDR_t all over the codebase? (I'd welcome the latter)

I prefer incremental changes because they are easier to do. I'd also rather not break the API until the class has proven itself behind the scenes in private methods.


dkeeney commented on June 3, 2024

Agreed.


ctrl-z-9000-times commented on June 3, 2024

bool isSparse()
bool isDistributed()

I don't understand what these would do. The point of this is to switch between sparse & dense representations at will, so it's always sparse. And I don't know how you would check if it's distributed, since that's a statistical property.

would be better to use:
using sdr_t = vector<UInt>;
SDRHelper ...

I also don't understand why this would be faster other than by avoiding a method lookup. These method lookups typically happen in the outermost loop too.


breznak commented on June 3, 2024

I don't understand what these would do. The point of this is to switch between sparse & dense representations at will, so it's always sparse. And I don't know how you would check if it's distributed, since that's a statistical property.

Yes, these could be helper methods that verify the property for you. About isSparse(): I came from the idea of using sdr_t, and then your input doesn't really have to be an SDR (i.e. encoder, TM output, union... all may not be sparse at all).

Actually, check the recently merged utils/VectorHelpers::binaryToSparse(). It is a helper method for conversion from dense binary to sparse and vice versa. Is that what you wanted?
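
For readers unfamiliar with the two formats, a rough sketch of such a conversion follows. The function names `denseToSparse` and `sparseToDense` and their signatures are hypothetical and are not the actual VectorHelpers API; `UInt` is an assumed stand-in for the codebase's type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using UInt = std::uint32_t; // assumption: stands in for the codebase's UInt

// Dense binary vector -> ascending list of the indices that are active.
inline std::vector<UInt> denseToSparse(const std::vector<UInt> &dense) {
  std::vector<UInt> sparse;
  for (std::size_t i = 0; i < dense.size(); ++i) {
    if (dense[i] != 0) sparse.push_back(static_cast<UInt>(i));
  }
  return sparse;
}

// List of active indices -> dense binary vector of the given width.
inline std::vector<UInt> sparseToDense(const std::vector<UInt> &sparse,
                                       std::size_t width) {
  std::vector<UInt> dense(width, 0);
  for (UInt idx : sparse) {
    assert(idx < width); // every index must fit in the dense vector
    dense[idx] = 1;
  }
  return dense;
}
```

The dense form costs one element per bit, while the sparse form costs one element per active bit, which is why loops over active bits tend to favour the sparse form.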


ctrl-z-9000-times commented on June 3, 2024

Using vectors is a good idea.

One issue I can foresee is that you cannot convert from a raw pointer to a vector without copying, though this can be fixed by changing the calling code to also use vectors. Also, I would overload the setters to accept both C arrays & vectors because I'd rather not force the calling code to be reworked.

Also, even with vectors the SDR's values are immutable, because this class needs to keep cached copies of all the different data formats it supports. Changing the data in place can easily break this code unless you explicitly get the data, modify it, and reassign it using the public API, which will clear the caches. You can append to a vector, but that does not mean it will work.
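
The caching behaviour described above could be sketched roughly as follows. This is an illustration under assumptions, not the class that was later merged: `CachedSDR`, its method names and the two supported formats are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using UInt = std::uint32_t; // assumption: stands in for the codebase's UInt

// Hypothetical sketch: one value, two cached formats, setters clear stale caches.
class CachedSDR {
public:
  explicit CachedSDR(std::size_t size)
      : size_(size), dense_(size, 0), denseValid_(true) {}

  // Setters are the only way to change the value; each invalidates the other format.
  void setDense(std::vector<UInt> dense) {
    assert(dense.size() == size_);
    dense_ = std::move(dense);
    denseValid_ = true;
    sparseValid_ = false;
  }
  // Overload for C arrays: copies size_ elements from the raw pointer.
  void setDense(const UInt *data) {
    setDense(std::vector<UInt>(data, data + size_));
  }
  void setSparse(std::vector<UInt> sparse) {
    for (UInt i : sparse) assert(i < size_);
    sparse_ = std::move(sparse);
    sparseValid_ = true;
    denseValid_ = false;
  }

  // Getters return const references so callers cannot edit a cache in place.
  const std::vector<UInt> &getDense() const {
    if (!denseValid_) {
      dense_.assign(size_, 0);
      for (UInt i : sparse_) dense_[i] = 1;
      denseValid_ = true;
    }
    return dense_;
  }
  const std::vector<UInt> &getSparse() const {
    if (!sparseValid_) {
      sparse_.clear();
      for (std::size_t i = 0; i < size_; ++i) {
        if (dense_[i] != 0) sparse_.push_back(static_cast<UInt>(i));
      }
      sparseValid_ = true;
    }
    return sparse_;
  }

private:
  std::size_t size_;
  mutable std::vector<UInt> dense_;  // cache, rebuilt lazily from sparse_
  mutable std::vector<UInt> sparse_; // cache, rebuilt lazily from dense_
  mutable bool denseValid_ = false;
  mutable bool sparseValid_ = false;
};
```

To modify a value, a caller copies the result of a getter, edits the copy, and passes it back through a setter, which is the get/modify/reassign cycle mentioned above.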


dkeeney commented on June 3, 2024

Actually, you can get a pointer to the internal buffer of a vector. Use something like &a[0]. It is a contiguous buffer. You can also set the size of that buffer using the vector functions, and you can assign data to that buffer through that pointer as long as you are sure that the buffer is large enough. Other STL structures are not necessarily contiguous.
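
A small sketch of that pattern, under the assumption of a legacy function that writes into a caller-provided C array. The names `fillActive` and `makePattern` are made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using UInt = std::uint32_t; // assumption: stands in for the codebase's UInt

// Stand-in for legacy code that writes into a caller-provided C array.
inline void fillActive(UInt *out, std::size_t size, std::size_t every) {
  assert(out != nullptr && every > 0);
  for (std::size_t i = 0; i < size; ++i) {
    out[i] = (i % every == 0) ? 1u : 0u;
  }
}

// Size the vector first, then hand its contiguous buffer to the C-style function.
inline std::vector<UInt> makePattern(std::size_t size, std::size_t every) {
  std::vector<UInt> v;
  v.resize(size); // the buffer must be large enough before writing through the pointer
  if (!v.empty()) {
    fillActive(v.data(), v.size(), every); // v.data() is the same address as &v[0]
  }
  return v;
}
```

The pointer stays valid only until the vector reallocates, so it should be fetched after the final `resize` and not kept around.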


breznak commented on June 3, 2024

you can get a pointer to the internal buffer of a vector

vector.data() is the C++11 way.
I would like to see how (SP, ...) work with sparse vectors. I think the time saved on loops and finding indices will more than cover the 2 extra conversions at the end.

Also, I would overload the setters to accept both C arrays & vectors because I'd rather not force the calling code to be reworked.

👍

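vector<UInt> compute(const vector<UInt>& in); // new, vector-based
void compute(const vector<UInt>& in, UInt* out) { // wrapper keeping the C-array output
  const vector<UInt> result = compute(in);
  std::copy(result.begin(), result.end(), out); // out must hold result.size() elements
}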

And we can program the internals with modern vectors etc. and still have UInt* C arrays on the public API for Python.
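
Spelled out as compilable code, the idea might look like the sketch below. The body of `compute` is a placeholder that just binarises its input, and the explicit `size` parameter on the C-array overload is an assumption added so the wrapper knows how many elements to copy.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using UInt = std::uint32_t; // assumption: stands in for the codebase's UInt

// New-style entry point working on vectors; the body is only a placeholder.
inline std::vector<UInt> compute(const std::vector<UInt> &in) {
  std::vector<UInt> out(in.size());
  std::transform(in.begin(), in.end(), out.begin(),
                 [](UInt v) { return v != 0 ? 1u : 0u; });
  return out;
}

// Legacy-style overload kept for callers that pass raw C arrays.
inline void compute(const UInt *in, std::size_t size, UInt *out) {
  assert(in != nullptr && out != nullptr);
  const std::vector<UInt> result = compute(std::vector<UInt>(in, in + size));
  std::copy(result.begin(), result.end(), out); // out must hold size elements
}
```

The wrapper pays for one copy in and one copy out, which is the cost of keeping the old signature while the internals move to vectors.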


dkeeney commented on June 3, 2024

We also need to be sure this will work with my Array class (which is a container for SDR arrays being passed in the Link object).


breznak commented on June 3, 2024

this will work with my Array class

Though we have the flexibility to change that class, right?


ctrl-z-9000-times commented on June 3, 2024

Each element contains either 1 or 0 so it could be a byte or even a bit.

Computers work really well with bytes. They're fast, directly addressable, and built into C++.

Using bits packs 8x more elements into the same memory, but C++ is not set up to deal with them. C++ has a specialized class to manage bit vectors, "vector<bool>", which according to the documentation "provides a quirky interface". Quirky. Do either of you have experience with this class? Is it usable, or should we stick to the standard byte? The SDR class will work fine with either selection; it's all of the calling code that I'm concerned about.
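
One example of the quirkiness, as a short sketch: indexing a non-const `std::vector<bool>` returns a proxy object rather than a `bool&`, so `auto` captures something that still refers to the element. The two function names are made up for the demonstration.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>
#include <utility>
#include <vector>

// operator[] on a non-const vector<bool> yields a proxy type, not bool&.
static_assert(
    !std::is_same<decltype(std::declval<std::vector<bool> &>()[0]), bool &>::value,
    "vector<bool> hands out proxy objects");
// A byte vector hands out real references, so &v[0] is an ordinary pointer.
static_assert(
    std::is_same<decltype(std::declval<std::vector<std::uint8_t> &>()[0]),
                 std::uint8_t &>::value,
    "vector<uint8_t> hands out plain references");

// With vector<bool>, `auto` captures the proxy, which keeps tracking the element.
inline bool proxyTracksElement() {
  std::vector<bool> bits(8, false);
  auto first = bits[0]; // std::vector<bool>::reference, not bool
  bits[0] = true;
  return static_cast<bool>(first);
}

// With a byte vector, `auto` makes a plain copy that does not change afterwards.
inline bool byteCopyTracksElement() {
  std::vector<std::uint8_t> bytes(8); // value-initialised to zero
  auto first = bytes[0];              // std::uint8_t copy
  bytes[0] = 1;
  return first != 0;
}
```

There is also no way to get a `bool*` to the packed storage, which matters for calling code that expects a raw buffer.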


dkeeney commented on June 3, 2024

I do not have experience with vector<bool>, although I have lots of experience working with bits using shift and mask. Let's not use bool; I recommend Byte, which is defined in nupic as "signed char" for some reason. The calling routines are currently set up to handle 4-byte elements. I think SP uses UInt32 and TP uses Real (which is defined as Real32, which is float), but you should check. They just check for 0 or non-zero. As a carryover from Python, they seem to use UInt32 and Real32 interchangeably. It just happens to work for SDRs, but I think there are type conversions going on that they don't realize.


breznak commented on June 3, 2024

I have done some work with this microoptimization

IMHO "premature optimization is the root of all evil" applies here. We might focus on clean and easy code and algorithmic complexity first.
We'd need to benchmark and profile.
https://en.cppreference.com/w/cpp/utility/bitset Bitset is what you'd want to use instead of vector<bool>.
- Rather than optimizing for small size (cache) and CPU speed, I'd try vectorizing the code as much as possible (C++11/17 std algorithms such as fill, transform [C++17 only, which is why I want that revision], minmax_element, ...).
  - With vector operations already in place, offload to a data type and library that supports GPGPU (TensorFlow, Eigen #14, CUDA #50, ...).

So I'd suggest:

1. Make it with a normal vector.
2. Ensure the same data type is passed along the whole processing pipeline (encode->SP->TM...) and used everywhere internally (the role of SDR_t).
3. Try SDR_t = bitset.
4. Vectorize the code.
5. GPGPU.
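
As a rough illustration of the bitset step: `std::bitset` needs its size as a compile-time constant, so the sketch assumes a fixed SDR size of 2048. The constant `kSdrSize` and the helpers `overlap` and `unionOf` are hypothetical names.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

constexpr std::size_t kSdrSize = 2048; // assumption: size known at compile time
using SDR_t = std::bitset<kSdrSize>;

// Overlap: number of bits that are active in both SDRs.
inline std::size_t overlap(const SDR_t &a, const SDR_t &b) {
  return (a & b).count();
}

// Union: bits that are active in either SDR.
inline SDR_t unionOf(const SDR_t &a, const SDR_t &b) {
  return a | b;
}
```

The compile-time size is the main trade-off: code that needs SDRs of different sizes at run time would have to template on the size or fall back to a vector.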


breznak commented on June 3, 2024

Fixed in #113, closing.
We may reopen for the applications of the new SDR class, or create a new PR&issue.
