
Comments (18)

dkeeney commented on June 3, 2024

Yes, in fact perhaps the SDR class might replace it if we could find a way to pass other types of arrays as well.

from htm.core.

breznak commented on June 3, 2024

@ctrl-z-9000-times I'm definitely 👍 for this idea! It is in line with #58.

  • Do you plan to "just introduce" the class and use it as-needed, or enforce the usage of SDR_t all over the codebase? (I'd welcome the latter)

I'd add some more helper methods, like:

bool isSparse()
Real getSparsity()
bool isDistributed()
set/getMeaning() //the real-world value

About the implementation, I'm worried about speed/efficiency:

  • Can we make it fast and clean enough? Or would it be better to use:
using sdr_t = vector<UInt>; // change to any type later
SDRHelper ... // the class above, used to manipulate sdr_t as needed
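A minimal sketch of the `sdr_t` + helper idea, assuming `sdr_t` holds a dense binary SDR (one `UInt` per bit, 0 or 1). The `UInt`/`Real` typedefs are stand-ins for the htm.core ones, and the `SDRHelper` body and the 0.1 threshold in `isSparse()` are illustrative assumptions, not htm.core code. `isDistributed()` is left out because it is a statistical property of many SDRs, not of one.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-ins for the htm.core typedefs (assumption: UInt is 32-bit unsigned, Real is float).
using UInt = std::uint32_t;
using Real = float;

// Hypothetical alias: a dense binary SDR stored as one UInt (0 or 1) per bit.
using sdr_t = std::vector<UInt>;

// Hypothetical helper: stateless functions operating on sdr_t.
struct SDRHelper {
  // Fraction of bits that are active (non-zero); 0 for an empty SDR.
  static Real getSparsity(const sdr_t& sdr) {
    if (sdr.empty()) return 0.0f;
    std::size_t active = 0;
    for (UInt bit : sdr) active += (bit != 0);
    return static_cast<Real>(active) / static_cast<Real>(sdr.size());
  }

  // "Sparse" here means the active fraction is at most maxSparsity
  // (the 0.1 default is an arbitrary assumption).
  static bool isSparse(const sdr_t& sdr, Real maxSparsity = 0.1f) {
    return getSparsity(sdr) <= maxSparsity;
  }
};
```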


dkeeney commented on June 3, 2024

I like this idea as well.
I like this idea as well.
I think the underlying array elements should not be a UInt32 or float32 as it is now. Each element contains either 1 or 0, so it could be a byte or even a bit. It would take only 32 64-bit elements (2048 bits) to create a reasonably sized SDR without resorting to a sparse matrix. This SDR class could then handle all sorts of operations such that the calling code never has to be aware of the underlying storage.
Just a thought.
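A sketch of the bit-packed layout described above, assuming a fixed 2048-bit SDR stored as 32 `uint64_t` words. The class name `PackedSDR` and its methods are hypothetical. Every read or write needs a shift and a mask, which is the extra cost of bits compared with one byte per cell.

```cpp
#include <array>
#include <bitset>
#include <cstddef>
#include <cstdint>

// Hypothetical packed storage: 32 words x 64 bits = 2048 bits, one bit per SDR cell.
class PackedSDR {
 public:
  static constexpr std::size_t kWords = 32;
  static constexpr std::size_t kBits = kWords * 64;  // 2048

  // Set or clear bit i (i must be < kBits; not checked). Needs a shift and a mask.
  void set(std::size_t i, bool value) {
    const std::uint64_t mask = std::uint64_t{1} << (i % 64);
    if (value) {
      words_[i / 64] |= mask;
    } else {
      words_[i / 64] &= ~mask;
    }
  }

  // Read bit i: shift the word right and mask off the lowest bit.
  bool test(std::size_t i) const {
    return (words_[i / 64] >> (i % 64)) & 1u;
  }

  // Number of active bits; std::bitset<64>::count() serves as a portable popcount.
  std::size_t count() const {
    std::size_t n = 0;
    for (std::uint64_t w : words_) n += std::bitset<64>(w).count();
    return n;
  }

 private:
  std::array<std::uint64_t, kWords> words_{};  // zero-initialized
};
```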


breznak commented on June 3, 2024

underlying array elements

Do you think we HAVE to stick to C-style arrays (UInt*)? I know how they are used for Python interfacing with NumPy, but from the POV of C++ algorithms, std::vector or C++11 std::array would greatly improve the code.

should not be a UInt32 or float32 as it is now. Each element contains either 1 or 0 so it could be a byte or even a bit.

I'm not sure how well CPUs/GPUs are optimized for handling floats; I think byte/bit could end up slower. But it is something to try.

SDR without resorting to a sparse matrix.

I would like to replace SparseMatrix, either by using Connections for the SP or with a sparse matrix library (Eigen).

This SDR class could then handle all sorts of operations such that the calling code never has to be aware of the underlying storage.

This is the key idea, make an abstraction for the raw data type 👍


dkeeney commented on June 3, 2024

should not be a UInt32 or float32 as it is now. Each element contains either 1 or 0 so it could be a byte or even a bit.

I'm not sure how well CPUs/GPUs are optimized for handling floats; I think byte/bit could end up slower. But it is something to try.

A byte element can be accessed at the same speed as an Int or float. A bit might cost a little more time due to the shift and mask. Type conversions to and from float are costly. The real problem is staying compatible with Python NumPy arrays without making a copy.


ctrl-z-9000-times commented on June 3, 2024

do you plan to "just introduce" the class and use it as-needed, or enforce the usage of SDR_t all over the codebase? (I'd welcome the latter)

I prefer incremental changes because they are easier to do. I'd also rather not break the API until the class has proven itself in behind-the-scenes/private methods.


dkeeney commented on June 3, 2024

Agreed.


ctrl-z-9000-times commented on June 3, 2024

bool isSparse()
bool isDistributed()

I don't understand what these would do. The point of this class is to switch between sparse and dense representations at will, so it's always sparse. And I don't know how you would check whether it's distributed, since that's a statistical property.

would be better to use:
using sdr_t = vector<UInt>;
SDRHelper ...

I also don't understand why this would be faster, other than by avoiding a method lookup. These method lookups typically happen in the outermost loop too.


breznak commented on June 3, 2024

I don't understand what these would do. The point of this class is to switch between sparse and dense representations at will, so it's always sparse. And I don't know how you would check whether it's distributed, since that's a statistical property.

Yes, these could be helper methods that verify the property for you. About isSparse(): I came from the idea of using sdr_t, and then your input doesn't really have to be an SDR (e.g. encoder output, TM output, unions... may not be sparse at all).

Actually, check out the recently merged utils/VectorHelpers::binaryToSparse(). It is a helper method for conversion from dense binary to sparse and vice versa. Is that what you wanted?
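For reference, a minimal sketch of the two conversions, assuming "sparse" means the ascending list of active indices and "dense" means one 0/1 `UInt` per bit. The names `denseToSparse`/`sparseToDense` are hypothetical and are not the signatures of `VectorHelpers::binaryToSparse()`.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using UInt = std::uint32_t;  // stand-in for the htm.core typedef

// Hypothetical helper: dense binary -> indices of the non-zero entries (ascending).
std::vector<UInt> denseToSparse(const std::vector<UInt>& dense) {
  std::vector<UInt> sparse;
  for (std::size_t i = 0; i < dense.size(); ++i) {
    if (dense[i] != 0) sparse.push_back(static_cast<UInt>(i));
  }
  return sparse;
}

// Hypothetical helper: sparse indices -> dense binary vector of the given size.
std::vector<UInt> sparseToDense(const std::vector<UInt>& sparse, std::size_t size) {
  std::vector<UInt> dense(size, 0);
  for (UInt idx : sparse) dense.at(idx) = 1;  // at() throws on an out-of-range index
  return dense;
}
```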


ctrl-z-9000-times commented on June 3, 2024

Using vectors is a good idea.

One issue I can foresee is that you cannot convert from a raw pointer to a vector without copying, though this can be fixed by changing the calling code to also use vectors. Also, I would overload the setters to accept both C arrays and vectors, because I'd rather not force the calling code to be reworked.
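A sketch of the overloaded setters, using a hypothetical `SDR` class (not the htm.core API). The pointer overload has to copy, because a `std::vector` cannot take ownership of a buffer it did not allocate.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using UInt = std::uint32_t;  // stand-in for the htm.core typedef

// Hypothetical SDR class showing setters for both C arrays and vectors.
class SDR {
 public:
  // C-array overload (e.g. data coming from a NumPy buffer): copies size elements.
  void setDense(const UInt* data, std::size_t size) {
    dense_.assign(data, data + size);
  }

  // Vector overload for callers that already use std::vector.
  void setDense(const std::vector<UInt>& data) { dense_ = data; }

  const std::vector<UInt>& getDense() const { return dense_; }

 private:
  std::vector<UInt> dense_;
};
```

Both overloads end in the same internal vector, so the rest of the class only ever deals with one representation.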

Also, even with vectors the SDR's values are immutable, because this class needs to keep cached copies of all the different data formats it supports. Changing the data in place can easily break this code unless you explicitly get the data, modify it, and reassign it using the public API, which clears the caches. You can append to a vector, but that does not mean it will work.
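A minimal sketch of that constraint, assuming the dense vector is the source of truth and the sparse index list is a lazily built cache. `CachedSDR` is hypothetical. Getters hand out const references, and only the setter may change the data, because it is the one place that invalidates the cache.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using UInt = std::uint32_t;  // stand-in for the htm.core typedef

// Hypothetical SDR with a cached sparse representation.
class CachedSDR {
 public:
  // Assigning new data through the public setter marks the cache stale.
  void setDense(std::vector<UInt> dense) {
    dense_ = std::move(dense);
    sparseValid_ = false;
  }

  // Const reference: callers cannot edit the data behind the cache's back.
  const std::vector<UInt>& getDense() const { return dense_; }

  // Rebuild the sparse cache only when it is stale.
  const std::vector<UInt>& getSparse() const {
    if (!sparseValid_) {
      sparse_.clear();
      for (std::size_t i = 0; i < dense_.size(); ++i) {
        if (dense_[i] != 0) sparse_.push_back(static_cast<UInt>(i));
      }
      sparseValid_ = true;
    }
    return sparse_;
  }

 private:
  std::vector<UInt> dense_;
  mutable std::vector<UInt> sparse_;  // cache, rebuilt on demand
  mutable bool sparseValid_ = false;
};
```

Usage follows the "get, modify, reassign" pattern: copy `getDense()`, edit the copy, then pass it back to `setDense()`.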


dkeeney commented on June 3, 2024


breznak commented on June 3, 2024

you can get a pointer to the internal buffer of a vector

vector.data() is the C++11 way.
I would like to see how the algorithms (SP, ...) work with sparse vectors; I think the time saved on loops and on finding indices will more than cover the 2 extra conversions at the end.
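A sketch of `vector.data()` in practice: since `std::vector` storage is contiguous, its buffer can be passed to a C-style `UInt*` interface without a copy. `countActive` is a hypothetical function used only for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using UInt = std::uint32_t;  // stand-in for the htm.core typedef

// Hypothetical legacy function with a C-style interface.
std::size_t countActive(const UInt* data, std::size_t size) {
  std::size_t n = 0;
  for (std::size_t i = 0; i < size; ++i) n += (data[i] != 0);
  return n;
}

// Vector overload: data() points at the vector's contiguous buffer, so no copy is made.
std::size_t countActive(const std::vector<UInt>& v) {
  return countActive(v.data(), v.size());
}
```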

Also, I would overload the setters to accept both C arrays and vectors, because I'd rather not force the calling code to be reworked.

👍

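vector<UInt> compute(const vector<UInt>& in); // new, vector-based
void compute(const vector<UInt>& in, UInt* out) { // old, C-array API
  const vector<UInt> result = compute(in);
  std::copy(result.begin(), result.end(), out); // out must have room for result.size() values
}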

And we can program the internals with modern vectors etc. and still have UInt* C arrays on the public API for Python.


dkeeney commented on June 3, 2024

We also need to be sure this will work with my Array class (which is a container for SDR arrays being passed in the Link object).


breznak commented on June 3, 2024

this will work with my Array class

Though we have the flexibility to change that class, right?


ctrl-z-9000-times commented on June 3, 2024

Each element contains either 1 or 0 so it could be a byte or even a bit.

Computers work really well with bytes. They're fast, directly addressable, and built into C++.

Using bits can pack 8x more data into the same memory, but C++ is not set up to deal with them. C++ has a specialized class to manage bit vectors, "vector<bool>", which according to the documentation "provides a quirky interface". Quirky. Do either of you have experience with this class? Is it usable, or should we stick to the standard byte? The SDR class will work fine with either choice; it's all of the calling code that I'm concerned about.
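To make the "quirky interface" concrete, a small sketch of two `vector<bool>` behaviors that tend to bite calling code: `operator[]` returns a proxy object rather than `bool&`, and there is no `data()`, so the bits cannot be handed to code expecting a plain buffer. A `vector<uint8_t>` (one byte per cell) has neither problem. The function names are illustrative only.

```cpp
#include <cstdint>
#include <type_traits>
#include <vector>

// vector<bool>::reference is a proxy class, not bool&.
static_assert(!std::is_same<std::vector<bool>::reference, bool&>::value,
              "vector<bool>::reference is a proxy, not bool&");

// A byte-per-cell vector behaves like any other vector and exposes data().
std::uint8_t* rawBytes(std::vector<std::uint8_t>& v) { return v.data(); }

// Pitfall: 'auto' captures the proxy, which still refers into the vector.
bool proxyAliasesVector() {
  std::vector<bool> bits(8, false);
  auto b = bits[3];  // std::vector<bool>::reference, not a copy of the value
  b = true;          // writes through to bits[3]
  return bits[3];
}
```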


dkeeney commented on June 3, 2024


breznak commented on June 3, 2024

I have done some work on this kind of micro-optimization:

  • IMHO, "premature optimization is the root of all evil" applies here. We might focus on clean, easy code and on algorithmic complexity first.
  • We'd need to benchmark and profile.
  • std::bitset (https://en.cppreference.com/w/cpp/utility/bitset) is what you'd want to use instead of vector<bool>.
    • Rather than optimizing for small size (cache) and CPU speed, I'd try vectorizing the code as much as possible (C++11/17 std algorithms: fill, transform (its parallel overloads are C++17 only, which is why I want that revision), minmax_element, ...).
      • With the operations already vectorized, offload to a data type and library that supports GPGPU (TensorFlow, Eigen #14, CUDA #50, ...).
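As an illustration of the std-algorithms style suggested above, a sketch computing the overlap and intersection of two dense binary SDRs of equal length. The function names are hypothetical, and whether a given compiler actually auto-vectorizes these calls depends on the compiler and flags, so it would still need benchmarking.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

using UInt = std::uint32_t;  // stand-in for the htm.core typedef

// Overlap of two dense 0/1 SDRs of equal length: sum of a[i]*b[i],
// which for binary inputs equals the number of shared active bits.
std::size_t overlap(const std::vector<UInt>& a, const std::vector<UInt>& b) {
  return std::inner_product(a.begin(), a.end(), b.begin(), std::size_t{0});
}

// Element-wise AND of a and b into out (resized to match), via std::transform.
void intersect(const std::vector<UInt>& a, const std::vector<UInt>& b,
               std::vector<UInt>& out) {
  out.resize(a.size());
  std::transform(a.begin(), a.end(), b.begin(), out.begin(),
                 [](UInt x, UInt y) { return static_cast<UInt>(x & y); });
}
```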

So I'd suggest:

  1. Make it with a normal vector.
  2. Ensure the same data type is passed along the whole processing pipeline (encode -> SP -> TM ...) and used everywhere internally (the role of SDR_t).
  3. Try SDR_t = bitset.
  4. Vectorize the code.
  5. GPGPU.
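A sketch of step 3, assuming a fixed SDR size of 2048 bits. `SDR_t`, `overlap` and `sparsity` are illustrative names. One design caveat: the size of a `std::bitset` is a compile-time template parameter, so SDRs whose size is only known at run time would need a different type.

```cpp
#include <bitset>
#include <cstddef>

// Hypothetical: SDR_t as a fixed-size bitset (2048 bits is an assumed size).
constexpr std::size_t kSdrSize = 2048;
using SDR_t = std::bitset<kSdrSize>;

// Overlap = number of bits active in both SDRs: one AND plus a popcount.
std::size_t overlap(const SDR_t& a, const SDR_t& b) { return (a & b).count(); }

// Sparsity = fraction of active bits.
double sparsity(const SDR_t& s) {
  return static_cast<double>(s.count()) / static_cast<double>(s.size());
}
```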


breznak commented on June 3, 2024

Fixed in #113, closing.
We may reopen for the applications of the new SDR class, or create a new PR and issue.
