Bdot

Fast Matrix Multiply on Pretty Big Data

Bdot does big dot products (by making your RAM bigger on the inside). It's based on Bcolz and includes transparent disk-based storage.

Bigger on the Inside

Supports matrix . vector and matrix . matrix products for the most common numpy numeric data types (numpy.int64, numpy.int32, numpy.float64, numpy.float32).

Install

pip install bdot

or build from source (requires bcolz >= 0.9.0)

python setup.py build_ext --inplace
python setup.py install

Usage

Matrix . Vector

Multiplies a matrix (carray) by a vector (numpy.ndarray) and returns a vector (numpy.ndarray).

import bdot
import numpy as np

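# np.random.random_integers is deprecated; randint excludes its upper bound,
# so 12001 keeps the original inclusive range 0..12000
matrix = np.random.randint(0, 12001, size=(300000, 100))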
bcarray = bdot.carray(matrix, chunklen=2**13, cparams=bdot.cparams(clevel=2))

v = bcarray[0]

result = bcarray.dot(v)
expected = matrix.dot(v)

# should return True
(expected == result).all()
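
For readers who want to see the general idea behind a chunked product, here is a minimal sketch that uses only numpy. It is not bdot's implementation: the function name chunked_matvec is hypothetical, and the matrix is a plain in-memory ndarray rather than a compressed carray. It only illustrates that a matrix . vector product can be assembled one block of rows at a time, so only one block needs to be uncompressed at any moment.

```python
import numpy as np


def chunked_matvec(matrix, vector, chunklen=2**13):
    """Compute matrix . vector one block of rows at a time (illustrative only)."""
    n_rows = matrix.shape[0]
    result = np.empty(n_rows, dtype=np.result_type(matrix, vector))
    for start in range(0, n_rows, chunklen):
        stop = min(start + chunklen, n_rows)
        # each block of rows contributes an independent slice of the result
        result[start:stop] = matrix[start:stop].dot(vector)
    return result


rng = np.random.default_rng(0)
# smaller than the README example so the sketch runs quickly
matrix = rng.integers(0, 12001, size=(30000, 100), dtype=np.int64)
v = matrix[0]

result = chunked_matvec(matrix, v)
expected = matrix.dot(v)
```

Because every row block is independent, the block size (chunklen here) mainly trades memory use against per-block overhead.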

Matrix . Matrix

Multiplies a matrix (carray) by the transpose of a matrix (carray) and returns a matrix (carray).

import bdot
import numpy as np

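# np.random.random_integers is deprecated; randint excludes its upper bound,
# so 121 keeps the original inclusive range 0..120
matrix = np.random.randint(0, 121, size=(1000, 100))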
bcarray1 = bdot.carray(matrix, chunklen=2**9, cparams=bdot.cparams(clevel=2))
bcarray2 = bdot.carray(matrix, chunklen=2**9, cparams=bdot.cparams(clevel=2))

# calculates bcarray1 . bcarray2.T (transpose)
result = bcarray1.dot(bcarray2)
expected = matrix.dot(matrix.T)

# should return True
(expected == result).all()
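
The product with a transpose can be sketched the same way. The snippet below is an assumption-laden illustration in plain numpy, not bdot's code: chunked_dot_transpose is a hypothetical name, and both inputs are ordinary ndarrays. It shows that each output block depends only on one block of rows from each input, which is what makes a block-by-block computation possible.

```python
import numpy as np


def chunked_dot_transpose(a, b, chunklen=2**9):
    """Compute a . b.T block by block (illustrative only)."""
    n, m = a.shape[0], b.shape[0]
    out = np.empty((n, m), dtype=np.result_type(a, b))
    for i in range(0, n, chunklen):
        a_block = a[i:i + chunklen]
        for j in range(0, m, chunklen):
            b_block = b[j:j + chunklen]
            # output block (i, j) needs only row block i of a and row block j of b
            out[i:i + chunklen, j:j + chunklen] = a_block.dot(b_block.T)
    return out


rng = np.random.default_rng(0)
matrix = rng.integers(0, 121, size=(1000, 100), dtype=np.int64)

result = chunked_dot_transpose(matrix, matrix)
expected = matrix.dot(matrix.T)
```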

Save Result to Disk (Experimental)

Save really big results directly to disk

# create correctly sized container (helper method, not required)
output = bcarray1.empty_dot(bcarray2, rootdir='/path/to/bcolz/output')

# generate results directly on disk
bcarray1.dot(bcarray2, out=output)

# make sure the last bits get written
output.flush()

This method can also be used to get carray output for ndarray vector input: leave off the rootdir parameter in empty_dot, or create your own carray container.
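
To make the "results directly on disk" idea concrete without depending on bcolz, the sketch below uses numpy.memmap as a stand-in for an on-disk carray container. This is an assumption for illustration only: dot_transpose_to_disk is a hypothetical helper, the output file is uncompressed, and none of it reflects how bdot or bcolz store data. It only shows the pattern of filling a preallocated on-disk container block by block and flushing at the end.

```python
import os
import tempfile

import numpy as np


def dot_transpose_to_disk(a, b, path, chunklen=2**9):
    """Write a . b.T into a memory-mapped file, one block of rows at a time."""
    out = np.memmap(path, dtype=np.result_type(a, b), mode="w+",
                    shape=(a.shape[0], b.shape[0]))
    for i in range(0, a.shape[0], chunklen):
        # only this block of output rows is computed in memory at once
        out[i:i + chunklen] = a[i:i + chunklen].dot(b.T)
    # make sure the last bits get written
    out.flush()
    return out


rng = np.random.default_rng(0)
matrix = rng.integers(0, 121, size=(1000, 100), dtype=np.int64)

path = os.path.join(tempfile.mkdtemp(), "result.dat")
output = dot_transpose_to_disk(matrix, matrix, path)

# reopen the file read-only to confirm the result really is on disk
reloaded = np.memmap(path, dtype=np.int64, mode="r", shape=(1000, 1000))
```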

Test

nosetests bdot

Simple Benchmarks

The benchmarks below were run on the data structures generated by the code above. They are very informal and vary a bit across data sets.

Space

  • numpy ~229MB
  • bdot ~64MB

compression ratio: 3.5

Time

  • numpy ~33 ms
  • bdot ~48 ms

bdot speed relative to numpy: 68%
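
As a sanity check on the figures above, the arithmetic can be reproduced from the rounded numbers. This sketch assumes the 300000 x 100 matrix from the matrix . vector example holds 8-byte integers (numpy.int64), which the README does not state explicitly. Because the inputs are rounded, the computed ratios only approximate the quoted ones.

```python
# assumed: the 300000 x 100 example matrix stored as numpy.int64 (8 bytes each)
rows, cols, itemsize = 300000, 100, 8

numpy_bytes = rows * cols * itemsize   # 240,000,000 bytes
numpy_mib = numpy_bytes / 2**20        # about 228.9, matching "~229MB"

# ratios recomputed from the rounded figures quoted above
compression_ratio = 229 / 64           # about 3.58
relative_speed = 33 / 48               # 0.6875, i.e. roughly 68-69%

print(f"uncompressed size: {numpy_mib:.1f} MiB")
print(f"compression ratio: {compression_ratio:.2f}")
print(f"bdot speed relative to numpy: {relative_speed:.1%}")
```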

Goals

This project has three goals, each slightly more fantastic than the last:

  1. Allow computation on (compressed) data which is (~5-10x) larger than RAM at approximately the same speed as numpy.dot

  2. Allow computation on (slightly compressed) data at speeds that improve on numpy.dot

  3. Allow computation on (compressed) data which resides on disk at some sizable percentage (~30-50%) of the speed of numpy.dot

So far, the first goal has been met.

Acknowledgements

This library wouldn't be possible without all the talented people who worked hard to create Bcolz (and the libraries on which it's based). Initial code was also heavily influenced by Bquery.

Awesome TARDIS can be found here
