turbo-base64's Introduction

Turbo Base64: Fastest Base64 SSE/AVX2/AVX512/Neon/Altivec

Fastest Base64 SIMD Encoding library
  • 100% C (C++ headers), as simple as memcpy
  • No other base64 library encodes or decodes faster
  • ✨ Scalar can be faster than other SSE or ARM Neon based base64 libraries
  • SSE is faster than other SSE/AVX/AVX2(!) base64 libraries
  • Fastest AVX2 implementation
  • TurboBase64 AVX2 decoding up to ~2x faster than other AVX2 libs.
  • TurboBase64 is 3.5 to 4 times faster than other libs for short strings
  • Fastest ARM Neon base64
  • 🆕 (2023.04) avx512 - 2x faster than avx2, faster than any other implementation
  • 👍 Dynamic CPU detection and JIT scalar/sse/avx/avx2/avx512 switching
  • Base64 robust error checking, optimized for long+short strings
  • Portable library, 32/64 bits, SSE/AVX/AVX2/AVX512, ARM Neon, Power9 Altivec
  • OS: Linux amd64, arm64, Power9, macOS + Apple M1, s390x. Windows: MinGW, Visual C++
  • Big endian + Little endian
  • Ready-to-use, simple library: no armada of files, no dependency hassles
  • License: GPL 3. Commercial license available. Contact us at powturbo [AT] gmail [DOT] com


Download the Turbo-Base64 benchmark executable tb64app from the releases page, extract the files, and run "tb64app".

Benchmark incl. the best SIMD Base64 libs:

  • Single thread
  • Including base64 error checking
  • Small file + realistic and practical (no PURE cache) benchmark
  • Unlike other benchmarks, the best of the best scalar+simd libraries are included
  • All libraries are tested at their latest version

Benchmark AMD CPU: AMD Ryzen 9 7950X @ 4.50 GHz, DDR5 6000 CL30 - gcc-12.2

E Size   ratio%   E MB/s  D MB/s  Name (1,000,000 bytes - 2023.07)
1333336  133.33%  77619   76716   8:tb64v512vbmi
1333336  133.33%  41325   48783   7:_tb64v256 avx2
1333336  133.33%  45292   46665   5:tb64v256 avx2
1000000  100.00%  37047   31694   10:memcpy
1333336  133.33%  25077   28537   4:tb64v128a avx
1333336  133.33%  24375   27880   3:tb64v128
1333336  133.33%  9513    6908    2:tb64x
1333336  133.33%  9513    5975    9:_tb64x
1333336  133.33%  4914    5182    1:tb64s

E Size   ratio%   E MB/s  D MB/s  Name (10,000 bytes - 2023.07)
13336    133.36%  89079   92006   8:tb64v512vbmi
10000    100.00%  84418   85703   10:memcpy
13336    133.36%  34963   46216   7:_tb64v256 avx2
13336    133.36%  40722   44552   5:tb64v256 avx2
13336    133.36%  22601   27298   4:tb64v128a avx
13336    133.36%  21113   26930   3:tb64v128
13336    133.36%  9648    6809    2:tb64x
13336    133.36%  9626    5599    9:_tb64x
13336    133.36%  4937    5184    1:tb64s

Benchmark Intel CPU: i7-9700k 3.6GHz gcc 11.2

E Size   ratio%  E MB/s  D MB/s  Name       Description (50,000 bytes - 2022.02)
66668    133.3   32794   37837   tb64v256   Turbo Base64 avx2
66668    133.3   27789   22264   b64avx2    aklomp Base64 avx2
66668    133.3   25305   21980   fb64avx2   lemire Fastbase64 avx2
66668    133.3   17348   20686   tb64v128a  Turbo Base64 avx
66668    133.3   16035   18865   tb64v128   Turbo Base64 sse
66668    133.3   15820   13078   b64avx     aklomp Base64 avx
66668    133.3   15322   11302   b64sse     aklomp Base64 sse41
50000    100.0   47593   47623   memcpy

E Size   ratio%  E MB/s  D MB/s  Name       Description (1 MB - 2022.02)
1333336  133.3   29086   29748   tb64v256   Turbo Base64 avx2
1333336  133.3   26153   22515   b64avx2    Base64 avx2
1333336  133.3   23686   21231   fb64avx2   Fastbase64 avx2
1333336  133.3   16897   20215   tb64v128a  Turbo Base64 avx
1333336  133.3   15932   18749   tb64v128   Turbo Base64 sse
1333336  133.3   15537   12959   b64avx     Base64 avx
1333336  133.3   15135   11304   b64sse     Base64 sse41
1333336  133.3   6546    5473    TB64x      Turbo Base64 scalar
1333336  133.3   6495    4454    b64plain   Base64 plain
1333336  133.3   1908    2752    TB64s      Turbo Base64 scalar
1333336  133.3   2541    4289    chrome     Google Chrome base64
1333336  133.3   2670    2299    fb64plain  FastBase64 plain
1333334  135.4   1754    219     linux      Linux base64
1000000  100.0   28688   28656   memcpy

TurboBase64 vs. Base64 for short strings (incl. checking)

String length  E MB/s  D MB/s  Name      Description (50,000 bytes - short strings 2022.02)
4 - 16         2330    2161    TB64avx2  Turbo Base64 avx2
               891     734     b64avx2   Base64 avx2
8 - 32         3963    3570    TB64avx2  Turbo Base64 avx2
               1348    943     b64avx2   Base64 avx2
16 - 64        6881    5937    TB64avx2  Turbo Base64 avx2
               2509    1488    b64avx2   Base64 avx2
32 - 128       10946   8880    TB64avx2  Turbo Base64 avx2
               4902    2777    b64avx2   Base64 avx2

Benchmark ARM Neon: Apple M1 3.5 GHz (clang 12.0)

E MB/s    size   ratio    D MB/s    Name (50,000 bytes - 2023.08)
24012.43  66668  133.34%  15352.09  tb64v128 (turbo-base64)
19087.55  66668  133.34%  12515.17  b64neon64 (aklomp/base64)
5611.48   66668  133.34%  5092.64   tb64s
9782.45   66668  133.34%  6798.98   tb64x
6181.37   66668  133.34%  3108.54   b64plain
45566.16  50000  100.00%  45484.13  memcpy

Benchmark ARM Neon: ARMv8 A73-ODROID-N2 1.8GHz (clang 6.0)

E Size    ratio%  E MB/s  D MB/s  Name       Description (30MB binary - 2019.12)
40000000  133.3   2026    1650    TB64neon   Turbo Base64 Neon
40000000  133.3   1795    1285    b64neon64  Base64 Neon
40000000  133.3   1270    1095    TB64x      Turbo Base64 scalar
40000000  133.3   695     965     TB64s      Turbo Base64 scalar
40000000  133.3   512     782     fb64neon   Fastbase64 SIMD Neon
40000000  133.3   565     460     Chrome     Google Chrome base64
40000000  133.3   642     614     b64plain   Base64 plain
40000000  133.3   506     548     fb64plain  Fastbase64 plain
40500000  135.4   314     91      Linux      Linux base64
30000000  100.0   3820    3834    memcpy

  • (bold = pareto in category) MB = 1,000,000 bytes
  • (E/D): Encode/Decode
  • Timings are relative to the base64 side of the data: the base64 output when encoding, the base64 input when decoding.

Compile: (Download or clone Turbo Base64 SIMD)

    git clone https://github.com/powturbo/Turbo-Base64.git
    make

Usage: (Benchmark App)

    ./tb64app file 
    or  
    ./tb64app

Function usage:

static inline unsigned turbob64len(unsigned n)
Base64 output length after encoding
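
The "E Size" column in the benchmark tables follows directly from this length rule. A minimal sketch, assuming standard padded Base64 (4 output characters for every 3 input bytes, rounded up); `b64_encoded_len` is an illustrative name and not a function of this library:

```c
#include <stddef.h>

/* Padded Base64 emits 4 output characters per 3 input bytes, rounding up.
   Illustrative helper only; the library exposes its own length function. */
static size_t b64_encoded_len(size_t n) {
    return (n + 2) / 3 * 4;
}
```

For example, 1,000,000 input bytes give 1,333,336 output bytes and 50,000 input bytes give 66,668, matching the sizes listed in the tables above.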

unsigned tb64enc(const unsigned char *in, unsigned inlen, unsigned char *out)
Encode binary input 'in' buffer into base64 string 'out'
with automatic CPU detection for SIMD and switching (sse/avx2/scalar)
  in           : Input buffer to encode
  inlen        : Length in bytes of input buffer
  out          : Output buffer
  return value : Length of output buffer
  Remark       : A terminating zero byte is not written to the end of the output.
                 The caller must add it (out[outlen] = 0) for a null-terminated string.
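
The remark about the missing terminator means the output buffer needs one byte more than the encoded length if it is to be used as a C string. The sketch below illustrates that allocation pattern. To stay self-contained it uses `ref_b64enc`, a plain scalar stand-in written for this example with the same (in, inlen, out) shape as tb64enc; it is not the library's code, and `b64_encode_cstr` is likewise a hypothetical helper.

```c
#include <stddef.h>
#include <stdlib.h>

/* Stand-in scalar encoder with the same (in, inlen, out) shape as tb64enc.
   It is NOT the library's implementation; it only keeps the sketch runnable. */
static size_t ref_b64enc(const unsigned char *in, size_t inlen, unsigned char *out) {
    static const char tab[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    unsigned char *op = out;
    size_t i = 0;
    for (; i + 3 <= inlen; i += 3) {           /* whole 3-byte groups */
        unsigned v = ((unsigned)in[i] << 16) | ((unsigned)in[i + 1] << 8) | in[i + 2];
        *op++ = (unsigned char)tab[v >> 18];
        *op++ = (unsigned char)tab[(v >> 12) & 63];
        *op++ = (unsigned char)tab[(v >> 6) & 63];
        *op++ = (unsigned char)tab[v & 63];
    }
    if (i < inlen) {                           /* 1 or 2 trailing bytes */
        unsigned v = (unsigned)in[i] << 16;
        if (i + 1 < inlen) v |= (unsigned)in[i + 1] << 8;
        *op++ = (unsigned char)tab[v >> 18];
        *op++ = (unsigned char)tab[(v >> 12) & 63];
        *op++ = (i + 1 < inlen) ? (unsigned char)tab[(v >> 6) & 63] : '=';
        *op++ = '=';
    }
    return (size_t)(op - out);
}

/* Encode into a freshly allocated, NUL-terminated C string (caller frees). */
static char *b64_encode_cstr(const unsigned char *in, size_t inlen) {
    size_t cap = (inlen + 2) / 3 * 4;          /* encoded length */
    unsigned char *out = malloc(cap + 1);      /* +1 for the terminator */
    if (!out) return NULL;
    size_t n = ref_b64enc(in, inlen, out);
    out[n] = 0;                                /* the encoder does not write this */
    return (char *)out;
}
```

With the real library the call to `ref_b64enc` would be replaced by tb64enc; whether the library needs extra slack in the output buffer beyond the encoded length is not stated here, so check the header before relying on an exact-size allocation.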

unsigned tb64dec(const unsigned char *in, unsigned inlen, unsigned char *out)
Decode base64 input 'in' buffer into binary buffer 'out'
  in           : Input buffer to decode
  inlen        : Length in bytes of input buffer
  out          : Output buffer
  return value : >0 Output buffer length
                 0  Error (invalid base64 input or input length = 0)

Environment:

OS/Compiler (32 + 64 bits):
  • Windows: Visual C++ (2017)
  • Windows: MinGW-w64 makefile
  • Linux amd/intel: GNU GCC (>=4.6)
  • Linux amd/intel: Clang (>=3.2)
  • Linux arm: aarch64 ARMv8 Neon: gcc (>=6.3)
  • Linux arm: aarch64 ARMv8 Neon: clang (>=6.0)
  • macOS: Xcode (>=9), Apple M1
  • PowerPC ppc64le: gcc (>=8.0) incl. SIMD Altivec

Last update: 06 AUG 2023

turbo-base64's People

Contributors

cyfdecyf, georgthegreat, kwang-cognitiv, powturbo, qoega

turbo-base64's Issues

cannot decode base64 with length 785

Hello,
tb64declen() and tb64dec() return 0 when the Base64 input has a length of 785.
This is reproducible with the following Base64 string:

UVNUSwEGBgACQAAF//4A//0C+fb//QcF/fgJAP8GCvgEB/0JAP77/fwJBQH2APz/AAIA/vUDAAID+wIC/vwA/gIH/fr//f0CBwH8/QMBAwgAAvf9AgAMEf0A/wX3BgAC+QD/Bv0LBQEB/QT9Bg3+A/7+Bf70/PwB+PsI9vkJAwv+AP4EAwD4//r8/fUBA/wD+P8FAAQCBQEBAP/9BgMH//4F//wAAPn4/AAH/P8JAgAAAfsGAAEJ/wAC/v75BwIHA/sF+v4ABfr8AP79A/37BQgBAAD6AAYCBAcD/QL3CP72/wAAAQAABP7+AP4AAvgB/PkAAQD+DgUABAAFBfAA///7/QoEBAEEAv8DC/8BAPoB+v7/AP//AAr+AO0S+/oFAgYF/wb7BQP99QcCAQT5CPkAAAj9AAT/8/n1AQMD9/UL/wAAAgL//fgIBv/8Aw4A+/4GCPYAAQED9AkDAvkG/gP8AggC/fr/Bwb+AgD/9f3/BgIF+BH9BP3+AgMEBAkGA/4AAwv6AQUDAPz+BwQC+/L2+f75/wX/APj/AwkAAQX+AAD6//oDAgwAAAH9+AP/AP4A/QIA/An2+/n9AAAFAAMCBAYD+wH+/QD7BP//9gABA/T3CAAJAP0A/PgB/wICAAX7BAL6BQIE//z+/wcB9/z6AvUEAAIE/wEECf748P0DAv0E/AEFAAD7+Qz/9wDeCQvq+gT09/P8/AAdERf+8df57AsC+wzv+fzY+Qrt8hgbAecG7B31AgL1+Pj6+OUP8wX5Bub9+AD6CgE=

Is there any limitation on the input?

Thanks
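
One check that is independent of any particular library: a padded Base64 string always has a length that is a multiple of 4, and even an unpadded string can never have a length with remainder 1, because a single trailing character carries only 6 bits. If the reported length of 785 is accurate, 785 % 4 == 1, so the input would be rejected on length alone (a trailing newline counted into the length is one possible cause, though that is only a guess). A minimal sketch with a hypothetical helper name:

```c
#include <stddef.h>

/* Returns 1 if len can be the length of a Base64 string, 0 otherwise.
   padded != 0: require whole 4-character groups.
   padded == 0: allow a 2- or 3-character tail, but never a 1-character one.
   Hypothetical helper for illustration; not part of Turbo-Base64. */
static int b64_len_plausible(size_t len, int padded) {
    if (len == 0) return 0;
    if (padded) return len % 4 == 0;
    return len % 4 != 1;
}
```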

Low performance on short strings.

ClickHouse/ClickHouse#8397 (comment)

The library performs worse than https://github.com/aklomp/base64
on strings with an average length of 77 bytes:

:) SELECT avg(length(URL)) FROM test.hits

SELECT avg(length(URL))
FROM test.hits

┌──avg(length(URL))─┐
│ 77.54074297450794 │
└───────────────────┘

You can download the test data here:
https://clickhouse.yandex/docs/en/getting_started/example_datasets/metrica/

Remove GPL v2 License

Some files (conf.h and time_.h) have GPL v2 License. And it looks like a mistake (since all other files including LICENSE have another license).

If it's a mistake, please, update license text in these files and remove GPL v2 License. It will allow the usage of this library in projects where GPLv2 is incompatible.

Thanks.

in-situ encoding

Given enough buffer space, do you think in-situ encoding would be feasible?

Keep up the great work

MB/s for decoder is misleading

The decoder figure shows MB/s of the input data rather than of the output.

And you don't clarify that anywhere, although you do clarify that Kb means 1000.

Are you also calculating MB/s for decompressors in terms of input size? So if you compress gigabytes of zeros, then the performance of the decompressor will be around zero (because the input will be extremely small).

If we imagine a base64 modification in which encoding and decoding take the same amount of time, then your benchmark will show that decoding is 1.333 times faster.

So I think it's wrong.
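
The factor of 1.333 can be made concrete with a small calculation. The sketch below assumes a decode run that consumes 1,333,336 Base64 bytes and produces 1,000,000 binary bytes in the same wall-clock time, and reports the rate both ways; `mb_per_s` is an illustrative helper, and MB is taken as 1,000,000 bytes as in the README.

```c
#include <stddef.h>

/* Throughput in MB/s (MB = 1,000,000 bytes) for `bytes` handled in `seconds`. */
static double mb_per_s(size_t bytes, double seconds) {
    return (double)bytes / 1e6 / seconds;
}

/* Ratio between the input-relative and output-relative decode rates
   for the same run; the elapsed time cancels out. */
static double decode_rate_ratio(size_t b64_bytes, size_t bin_bytes, double seconds) {
    return mb_per_s(b64_bytes, seconds) / mb_per_s(bin_bytes, seconds);
}
```

For the 1 MB case the ratio is 1333336 / 1000000, about 1.333, regardless of how long the run took, so the two conventions differ only by this constant factor.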

clang needs stdlib included in a couple of files

Hello,

Compilation fails on Apple Clang as a couple of files need to include stdlib.h.

Please add includes to turbob64c.c and turbob64v128.c.

diff --git a/turbob64c.c b/turbob64c.c
index f52be2d..8598f1d 100644
--- a/turbob64c.c
+++ b/turbob64c.c
@@ -24,6 +24,7 @@
 // Turbo-Base64: Scalar encode
 #include "turbob64_.h"
 #include "turbob64.h"
+#include <stdlib.h>
 
 size_t tb64enclen(size_t n) { return TB64ENCLEN(n); }
  
diff --git a/turbob64v128.c b/turbob64v128.c
index b1f2ba0..d0cdc4d 100644
--- a/turbob64v128.c
+++ b/turbob64v128.c
@@ -24,6 +24,7 @@
 //  Turbo-Base64: ssse3 + arm neon functions (see also turbob64v256)
 
 #include <string.h>
+#include <stdlib.h>
 
   #if defined(__AVX__)
 #include <immintrin.h>

invalid UTF-8 bytes after "=" "=="

Hi @powturbo
Thanks for your great work!
Recently, I discovered that when using this library to encode image files, there are some strange characters appearing at the end. Printing them out shows 'NULL' or just some patterns. ( Like ”7mlbMjdKxLobZAOx6jFekoqMbHg==οΏ½#οΏ½οΏ½+ZοΏ½οΏ½8ZοΏ½s)οΏ½οΏ½k_HοΏ½οΏ½οΏ½pdοΏ½?οΏ½οΏ½οΏ½ΤΎ ” "Px/wA7sn4uWWf/AAj/AA3/ALQooor0Yg==NULLNULLNULL"

My code:

std::ifstream ifs(file_path, std::ios::binary);
if (!ifs.is_open()) {
    std::cerr << "Unable to open file: " << file_path << std::endl;
}

ifs.seekg(0, std::ios::end);
auto size = ifs.tellg();
ifs.seekg(0, std::ios::beg);

// Read the file content into a char buffer
auto buf = new unsigned char[size];
ifs.read((char *) buf, size);

//use turbobase64
auto outsize = tb64enclen(size);
auto out = new uint8_t[outsize];

size_t num_enc = tb64enc(buf, size, out); //error handle

out[num_enc] = 0;

std::string str_encode(out, out + num_enc);

std::cout << str_encode << std::endl;

I'm confused. Shouldn't the size of a string converted to Base64 be fixed? Why are there unknown characters appearing?

Sample example for Turbo-Base64

Hey, this seems like a great library. Can you please provide a simple example of how to use it? I'm having trouble figuring a few things out, such as whether I also have to allocate the memory for the output buffer when decoding. I think a simple sample would help a lot of people.

Thank you

restartable algo

Hello,

It is often not feasible to prepare the entire input buffer or the entire output buffer prior to a single call of a transformation algo.

For example, look at zlib API. User provides input chunks in succession and output chunks in succession. The entire data to be transformed does not have to be in memory at once.

https://zlib.net/zlib_how.html

Please provide this kind of multi-call API!
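
To illustrate what such a multi-call interface could look like for encoding, here is a minimal sketch. Nothing in it is part of Turbo-Base64: the `b64_stream` type and the `b64_stream_*` functions are hypothetical names, and the encoder is a plain scalar one. The idea is that every 3 input bytes map to 4 output characters independently, so the only state to carry between calls is up to 2 leftover input bytes.

```c
#include <stddef.h>
#include <string.h>

static const char B64TAB[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Hypothetical streaming state: up to 2 input bytes carried between calls. */
typedef struct { unsigned char pend[2]; size_t npend; } b64_stream;

static void b64_stream_init(b64_stream *s) { s->npend = 0; }

/* Encode exactly 3 bytes at p into 4 characters at o. */
static void enc3(const unsigned char *p, unsigned char *o) {
    unsigned v = ((unsigned)p[0] << 16) | ((unsigned)p[1] << 8) | p[2];
    o[0] = (unsigned char)B64TAB[v >> 18];
    o[1] = (unsigned char)B64TAB[(v >> 12) & 63];
    o[2] = (unsigned char)B64TAB[(v >> 6) & 63];
    o[3] = (unsigned char)B64TAB[v & 63];
}

/* Feed one chunk. Writes only whole 4-character groups and returns the
   number of bytes written; out needs room for (npend + inlen) / 3 * 4. */
static size_t b64_stream_update(b64_stream *s, const unsigned char *in,
                                size_t inlen, unsigned char *out) {
    unsigned char *op = out;
    size_t i = 0;
    if (s->npend) {                       /* complete the carried group first */
        unsigned char t[3];
        size_t k = s->npend;
        memcpy(t, s->pend, k);
        while (k < 3 && i < inlen) t[k++] = in[i++];
        if (k < 3) { memcpy(s->pend, t, k); s->npend = k; return 0; }
        enc3(t, op); op += 4; s->npend = 0;
    }
    for (; i + 3 <= inlen; i += 3, op += 4) enc3(in + i, op);
    s->npend = inlen - i;                 /* 0, 1 or 2 bytes left over */
    if (s->npend) memcpy(s->pend, in + i, s->npend);
    return (size_t)(op - out);
}

/* Flush the leftover bytes with '=' padding; writes 0 or 4 bytes. */
static size_t b64_stream_finish(b64_stream *s, unsigned char *out) {
    if (!s->npend) return 0;
    unsigned v = (unsigned)s->pend[0] << 16;
    if (s->npend == 2) v |= (unsigned)s->pend[1] << 8;
    out[0] = (unsigned char)B64TAB[v >> 18];
    out[1] = (unsigned char)B64TAB[(v >> 12) & 63];
    out[2] = s->npend == 2 ? (unsigned char)B64TAB[(v >> 6) & 63] : '=';
    out[3] = '=';
    s->npend = 0;
    return 4;
}
```

A caller that controls its chunk sizes can get the same effect without any state by feeding a one-shot encoder chunks whose length is a multiple of 3, since no padding is produced until the final chunk.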

Padding correctness

I'm not sure the output is always correct wrt the padding at the end of the generated encoding (though it is still reversible).

char kBase64Chars[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

char* out = malloc(turbob64len(64) + 1024);
memset(out, 0, turbob64len(64) + 1024);
turbob64enc(kBase64Chars, 64, out);

The output:
QUJDREVGR0hJSktMTU5PUFFSU1RVVldYWVphYmNkZWZnaGlqa2xtbm9wcXJzdHV2d3h5ejAxMjM0NTY3ODkrLwAAAA==
which is 92 characters.

The expected output is:
QUJDREVGR0hJSktMTU5PUFFSU1RVVldYWVphYmNkZWZnaGlqa2xtbm9wcXJzdHV2d3h5ejAxMjM0NTY3ODkrLw==
which is 88 characters.

Note also that

turbob64len(64) == 88

so a buffer of exactly that size will be overrun by the 92-character output. This seems to be a known issue, since the benchmark over-allocates by 1024, but I'm not seeing any documentation.
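
The expected 88 characters can be split into data characters and padding. A minimal sketch, assuming standard padded Base64; both helper names are hypothetical and not part of the library:

```c
#include <stddef.h>

/* Number of '=' characters at the end of padded Base64 for n input bytes. */
static size_t b64_pad_count(size_t n) {
    return (3 - n % 3) % 3;
}

/* Number of non-padding characters: ceil(8 * n / 6). */
static size_t b64_data_chars(size_t n) {
    return (n * 4 + 2) / 3;
}
```

For n = 64 this gives 86 data characters plus 2 padding characters, which is consistent with the expected output ending in "Lw==" rather than "LwAAAA==".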

Can it be built with VS2015?

Hello! I previously wrote my own base64 encode/decode, but it runs a little slowly. I want to use your code with VS2015. What can I do?

Decoding on Apple Mac M1 Pro fails with valid Base64-encoded input

Hi @powturbo ,

I'm using this great lib of yours for decoding BASE64 encoded image data exported from Microsoft Outlook. While things work smoothly with all my test data, I have one image failing to be decoded properly. I have attached the file base64.zip containing a file base64.dat (10880 bytes) for your reference.

When I pass the entire encoded data (of size 10880 bytes) to tb64dec(), it returns 0 (== error) from within tb64v128dec() at line 136: if(!(rc=tb64xdec(ip, inlen&(64-1), op)) || vaddvq_u8(vshrq_n_u8(xv,7))) return 0; //decode all

Reason being: tb64xdec() returns 0 immediately because the argument (inlen&(64-1)) equals 0 (as inlen == 10880) which is checked within that function. I don't think this is an intended behaviour.

If I decode the exact same data in 4-byte BASE64 chunks by repeatedly calling tb64dec() with proper offsets into the data, everything works just fine, so I conclude that a) the input data is correctly BASE64 encoded (I also checked successfully with several BASE64 validators) and b) the above return code 0 is not correct (the comment "// decode all" behind "return 0;" also suggests that some different behaviour may be intended).

I would even argue that whenever you pass data of a size which is a multiple of 64 bytes to the tb64dec() function, it will return 0 instead of the size of the decoded data.

I therefore suggest to change the code as follows:

File turbob64v128.c:
[...]
//BG DEL line 136: if(!(rc=tb64xdec(ip, inlen&(64-1), op)) || vaddvq_u8(vshrq_n_u8(xv,7))) return 0; //decode all
//BG ADD line 136:
size_t rc = 0;
if (inlen&(64-1)) { // if inlen was not a multiple of 64 bytes, there's exactly inlen-(ip-in) == inlen&(64-1) bytes left to decode, but don't call if there's nothing left!
if(!(rc=tb64xdec(ip, inlen&(64-1), op)) || vaddvq_u8(vshrq_n_u8(xv,7)))
return 0; // failure!
}
//BG ADD end
return (op-out)+rc;

Without having looked into it deeper, I think tb64v256dec() probably suffers from the same issue.

Or maybe it's all me. Any feedback or comments?

Kind regards,
Björn

base64.zip
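
The length arithmetic behind this report can be modelled without any SIMD code. The sketch below is only a model of the control flow described above, not the library's source: `tail_dec` is a hypothetical tail decoder that returns 0 both on error and on empty input, and the two `total_*` functions compare calling it unconditionally with calling it only when a tail exists. All inputs are assumed valid and unpadded, so only lengths are computed.

```c
#include <stddef.h>

/* Hypothetical tail decoder: returns 0 for an empty tail, which the caller
   cannot tell apart from a decoding error. */
static size_t tail_dec(size_t taillen) {
    if (taillen == 0) return 0;
    return taillen / 4 * 3;          /* pretend the tail is valid, unpadded */
}

/* Unguarded: a zero-length tail is treated as a failure. */
static size_t total_unguarded(size_t inlen) {
    size_t body = (inlen & ~(size_t)63) / 4 * 3;   /* whole 64-byte blocks */
    size_t rc = tail_dec(inlen & 63);
    if (!rc) return 0;
    return body + rc;
}

/* Guarded: the tail decoder is only called when there is a tail. */
static size_t total_guarded(size_t inlen) {
    size_t body = (inlen & ~(size_t)63) / 4 * 3;
    size_t rc = 0;
    if (inlen & 63) {
        rc = tail_dec(inlen & 63);
        if (!rc) return 0;
    }
    return body + rc;
}
```

Since 10880 = 170 * 64, the tail length is 0: the unguarded model returns 0, while the guarded one returns 8160 decoded bytes.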

Benchmark doesn't test the usual scenario

It's great that you can convert gigabytes of data per second, but that's if all those gigabytes are in one long line!

This is a rare scenario. The usual scenario is processing a large set of small strings (and 1kb isn't very small). Some people might think that processing a large set of strings takes the same amount of time as processing one long string, but in practice it makes a huge difference. Try processing random sizes from, say, 1 to 64 bytes until N megabytes have been reached.

Although prefetching speeds up on big data, for small sizes it fills the cache with data that the application might not need. And worst of all, if the prefetch is done on the unallocated page (which could be next), then an exception will be thrown, and although the OS will handle and ignore this exception, it takes time.

So there should also be a test where the input addresses are not consecutive but randomized (so that mis-prefetching will only make things worse).

And the best thing would be a graph showing the cost of each size, starting at 1 and continuing until the graph turns into a straight line. This is how people usually show the performance of memcpy. Like this.
