My use case has small integer arrays, typically on the order of 32-128 ints. Compressed

small array compression about turbopfor-integer-compression HOT 3 CLOSED

patelprateek commented on June 18, 2024

small array compression

from turbopfor-integer-compression.

Comments (3)

powturbo commented on June 18, 2024

Integer compression depends heavily on the distribution of the data.
Simply run icapp; it prints the best functions for your data.

One of the efficient ways to compress ML models is quantization.
The quantized data can be compressed with TurboPFor (just try icapp).
You can also try lossy floating point compression with TurboRazor "fprazor32".
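To make the quantization step concrete, here is a minimal sketch of per-array scalar 8-bit quantization in plain Python. It does not call TurboPFor or icapp; the function names and the sample values are made up for illustration. It only shows how floats become small integers that an integer codec could then work on.

```python
def quantize_uint8(values):
    """Map floats to integers in 0..255 using a per-array offset and scale."""
    lo, hi = min(values), max(values)
    # Guard against a constant array, where hi == lo would give scale 0.
    scale = (hi - lo) / 255 if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, lo, scale


def dequantize_uint8(codes, lo, scale):
    """Approximate reconstruction; the error is at most about scale / 2."""
    return [lo + c * scale for c in codes]


embedding = [0.12, -0.53, 0.98, 0.0, -1.0, 0.47]  # made-up sample values
codes, lo, scale = quantize_uint8(embedding)
restored = dequantize_uint8(codes, lo, scale)
max_err = max(abs(a - b) for a, b in zip(embedding, restored))
print(codes, max_err)
```

The offset and scale have to be stored next to the codes, so for very short arrays that overhead is worth measuring.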

If you don't need fine-grained direct access and block-based access is enough, you can further compress the quantized data using an entropy coder such as "Asymmetric Numeral Systems".
See my proposals here (user dnd).
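A rough way to judge whether an entropy coder is worth adding is to compute the zero-order Shannon entropy of the quantized symbols. The sketch below is only an estimate under that simple model, not a measurement of any particular ANS implementation; the sample codes are made up.

```python
import math
from collections import Counter


def empirical_entropy_bits(symbols):
    """Zero-order Shannon entropy of the symbol histogram, in bits per symbol."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


codes = [128, 127, 128, 129, 128, 128, 127, 128]  # made-up quantized values
h = empirical_entropy_bits(codes)
print(f"{h:.3f} bits/symbol estimated, versus 8 bits/symbol stored")
```

If the estimate is close to the stored width, an entropy coder is unlikely to help much; if it is far below, it probably will, provided the blocks are large enough to amortize the coder's tables.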


patelprateek commented on June 18, 2024

Can you point me to an example? It seems the icapp binary would need data in a specific format too?

Regarding ML models: agreed, we do use quantization, but I am trying to compress some parameters learnt by non-quantized models to improve storage and bandwidth, so I am trying approaches that can compress this data efficiently, for example large embedding tables from pre-trained models.
Hence I wanted to see:

Whether there are lossless floating point compressors in this library that work well for embeddings with dimensions ranging from 32 to 2048 floating point numbers.
Lossy compression where I am able to tune the error rate (for now we use fp16, scalar 8-bit quantization, and brain float 16), so I would like to evaluate how the compression techniques here compare on a specific modeling or indexing task.
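As an illustration of what a tunable error rate can look like, here is a minimal sketch of mantissa truncation on float32 values using only the standard library. This is a generic technique and not necessarily what TurboRazor does; `truncate_mantissa` and `keep_bits` are names invented for this example. Keeping 7 mantissa bits matches the bfloat16 layout (with truncation instead of rounding), and each extra bit kept roughly halves the relative error bound.

```python
import struct


def truncate_mantissa(x, keep_bits):
    """Zero the low (23 - keep_bits) mantissa bits of x viewed as a float32."""
    if not 0 <= keep_bits <= 23:
        raise ValueError("keep_bits must be in 0..23")
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    mask = (0xFFFFFFFF << (23 - keep_bits)) & 0xFFFFFFFF
    (y,) = struct.unpack("<f", struct.pack("<I", bits & mask))
    return y


values = [0.12, -0.53, 0.98, 3.14159, -1e-3]  # made-up sample values
for keep in (7, 10, 16):
    # For normal floats the relative error stays below about 2**-keep.
    worst = max(abs(truncate_mantissa(v, keep) - v) / abs(v) for v in values)
    print(keep, worst)
```

The zeroed low bits tend to make the data easier for a lossless compressor, so the same downstream codec can be evaluated at several `keep_bits` settings against the modeling or indexing metric.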


powturbo commented on June 18, 2024

There is no specific format for icapp; just provide your data as raw or text (see the Readme or the benchmarks in the issues).
For raw 16-bit floats or integers use "icapp file -Fs"; for 32 bits use "icapp file" (32 bits is the default) or "icapp file -Fu".
The data will be encoded using a block size of 128 (or 256 for avx2) floats/integers.
If you have a variable block size, then you must write your own application.
You can provide a sample and I'll check that for you.
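For anyone preparing a sample, here is a minimal sketch of dumping values to raw files with Python's standard `array` module. It assumes that "raw" means a flat, headerless binary file of fixed-width values in native byte order; the file names and sample data are made up.

```python
from array import array


def write_raw(path, values, typecode):
    """Write values as a flat binary file with no header, native byte order."""
    with open(path, "wb") as f:
        array(typecode, values).tofile(f)
    # Return the number of bytes written.
    return array(typecode).itemsize * len(values)


ids = [3, 7, 8, 15, 16, 23, 42, 100]  # made-up sample data
n32 = write_raw("sample_u32.bin", ids, "I")  # 32-bit unsigned on common platforms
n16 = write_raw("sample_u16.bin", ids, "H")  # 16-bit unsigned
print(n32, n16)
```

Following the commands quoted above, the 32-bit file would be passed as "icapp sample_u32.bin" and the 16-bit file as "icapp sample_u16.bin -Fs".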

You can also try the integer or quasi-integer mode, as in this benchmark.
For lossy compression, check this.
