Git Product home page Git Product logo

byteslice's Introduction

ByteSlice is a main-memory data format for fixed length unsigned integers, and attributes that can be encoded as such (e.g., age, datetime). It is primarily designed for highly efficient ordinal comparison based scan and lookup in column-store databases. The basic idea is to chop column values into multiple bytes and store the bytes at different contiguous memory spaces.

The implementation heavily utilizes Single-Instruction-Multiple-Data (SIMD) instruction sets on modern CPUs to achieve bare-metal speed processing. The scan algorithms are optimized to reduce number of instructions, memory footprint, branch mis-predictions and other performance-critical factors.

Using the library

A quick glimpse:

// Create a column of two million 12-bit values in ByteSlice format
Column* column = new Column(ColumnType::kByteSlicePadRight, 12, 2*1024*1024);
// Prepare a bit vector to store scan results
BitVector* bitvector = new BitVector(column);
// Execute scan on the column with predicate value < 3
column->Scan(Comparator::kLess,
            3,
            bitvector,
            Bitwise::kSet);

Build from source

Clone

git clone --recursive https://github.com/fzqneo/ByteSlice.git

Or this after cloning without --recursive:

git submodule update --init --recursive

Build

You need CMake to generate build scripts. Makefile is tested.

To generate debug build:

mkdir debug
cd debug
cmake -DCMAKE_BUILD_TYPE=debug ..
make -j4

To generate release build:

mkdir release
cd release
cmake -DCMAKE_BUILD_TYPE=release ..
make -j4

NOTE: The default build type is debug, which may not give optimal performance.

Running examples

Example programs are in 'example/' directory.

example/example1 -s 10000000

To see a full list of options:

example/example1 -h

NOTE: The source code of example program showcases how to use the library.

Multithreading

Multithreading is controlled by OpenMP environment variables: (assume you use GCC)

OMP_NUM_THREADS=2 ./example/example1

NOTE: The default number of threads depends on the system, which is usually the number of cores. You may also want to set the thread affinity via GOMP_CPU_AFFINITY (assume you use GCC).

Running tests

make check

Build tests without running.

make check-build

Documentation (work in progress)

You need doxygen to generate documentations in html and latex.

 doxygen

File structure

  • example/ - Example programs

  • third-party/ - Third-party libraries

  • src/ - ByteSlice library source files

  • tests/ - Unit tests written in GoogleTest framework

Run examples in Docker

A compiled release-build is contained in the Docker image zf01/byteslice. You need to install Docker.

Run with default parameters:

docker run --rm zf01/byteslice

Run with custom parameters:

docker run --rm -it zf01/byteslice /bin/bash
OMP_NUM_THREADS=1 /root/ByteSlice/release/example/example1 -s 16000000 -b 17

Build Docker image from source

# Run inside the project directory
docker build -t byteslice .

Citing this work

Ziqiang Feng, Eric Lo, Ben Kao, and Wenjian Xu. "Byteslice: Pushing the envelop of main memory data processing with a new storage layout." In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pp. 31-46. ACM, 2015.

Download: http://dl.acm.org/citation.cfm?id=2747642

BibTex:

@inproceedings{Feng:2015:BPE:2723372.2747642,
 author = {Feng, Ziqiang and Lo, Eric and Kao, Ben and Xu, Wenjian},
 title = {ByteSlice: Pushing the Envelop of Main Memory Data Processing with a New Storage Layout},
 booktitle = {Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data},
 series = {SIGMOD '15},
 year = {2015},
 isbn = {978-1-4503-2758-9},
 location = {Melbourne, Victoria, Australia},
 pages = {31--46},
 numpages = {16},
 url = {http://doi.acm.org/10.1145/2723372.2747642},
 doi = {10.1145/2723372.2747642},
 acmid = {2747642},
 publisher = {ACM},
 address = {New York, NY, USA},
 keywords = {column store, main memory, olap, simd, storage layout},
} 

Contact

Ziqiang Feng ( zf at cs dot cmu dot edu )

Platform requirements

  1. C++ compiler supporting C++11, OpenMP and AVX2
  2. CPU with AVX2 instruction set extension

Tested platform

This package has been tested with the following configuration:

  • Linux 3.13.0-66-generic (64-bit)
  • Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
  • g++ 4.9.3

Known issues

  1. posix_memalign() is used in some files, causing compilation failure on Windows.

byteslice's People

Contributors

fzqneo avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.