cm256's Introduction

cm256

Fast GF(256) Cauchy MDS Block Erasure Codec in C

cm256 is a simple library for erasure codes. From given data it generates redundant data that can be used to recover the originals.

It is roughly 2x faster than Longhair, and CM256 supports input data that is not a multiple of 8 bytes.

Currently only Visual Studio 2013 is supported, though other versions of MSVC may work.

The original data should be split up into equally-sized chunks. If one of these chunks is erased, the redundant data can fill in the gap through decoding.

The erasure code is parameterized by three values (OriginalCount, RecoveryCount, BlockBytes). These are:

  • The number of blocks of original data (OriginalCount), which must be less than 256.
  • The number of blocks of redundant data (RecoveryCount), which must be no more than 256 - OriginalCount.
  • The number of bytes per block (BlockBytes).

For example, if a file is split into 3 equal pieces and sent over a network, OriginalCount is 3, and if 2 additional redundant packets are generated, RecoveryCount is 2. In this case up to 256 - 3 = 253 redundant packets could be generated.
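As a minimal sketch, that 3+2 example corresponds to filling in a cm256_encoder_params structure as follows (the BlockBytes value of 1000 is just a hypothetical packet payload size):

    cm256_encoder_params params;
    params.OriginalCount = 3;   // 3 original pieces
    params.RecoveryCount = 2;   // 2 redundant packets (up to 253 would be allowed)
    params.BlockBytes = 1000;   // hypothetical payload size in bytes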

Building: Quick Setup

Include the cm256.* and gf256.* files in your project and consult the cm256.h header for usage.

Usage

Documentation is provided in the header file cm256.h.

When your application starts up it should call cm256_init() to verify that the library is linked properly:

	#include "cm256.h"

	if (cm256_init()) {
		// Wrong static library
		exit(1);
	}

To generate redundancy, use the cm256_encode function. To solve for the original data, use the cm256_decode function.

Example usage:

#include "cm256.h"

#include <cstdint>
#include <cstdlib>
#include <cstring>

bool ExampleFileUsage()
{
    if (cm256_init())
    {
        exit(1);
    }

    cm256_encoder_params params;

    // Number of bytes per file block
    params.BlockBytes = 4321;

    // Number of blocks
    params.OriginalCount = 33;

    // Number of additional recovery blocks generated by encoder
    params.RecoveryCount = 12;

    // Size of the original file
    const int OriginalFileBytes = params.OriginalCount * params.BlockBytes;

    // Allocate and fill the original file data
    uint8_t* originalFileData = new uint8_t[OriginalFileBytes];
    memset(originalFileData, 1, OriginalFileBytes);

    // Pointers to data
    cm256_block blocks[256];
    for (int i = 0; i < params.OriginalCount; ++i)
    {
        blocks[i].Block = originalFileData + i * params.BlockBytes;
    }

    // Recovery data
    uint8_t* recoveryBlocks = new uint8_t[params.RecoveryCount * params.BlockBytes];

    // Generate recovery data
    if (cm256_encode(params, blocks, recoveryBlocks))
    {
        exit(1);
    }

    // Initialize the indices
    for (int i = 0; i < params.OriginalCount; ++i)
    {
        blocks[i].Index = cm256_get_original_block_index(params, i);
    }

    //// Simulate loss of data, substituting a recovery block in its place ////
    blocks[0].Block = recoveryBlocks; // First recovery block
    blocks[0].Index = cm256_get_recovery_block_index(params, 0); // First recovery block index
    //// Simulate loss of data, substituting a recovery block in its place ////

    if (cm256_decode(params, blocks))
    {
        exit(1);
    }

    // blocks[0].Index will now be 0.

    delete[] originalFileData;
    delete[] recoveryBlocks;

    return true;
}

The example above is just one way to use the cm256_decode function.

This API was designed to be flexible enough for UDP/IP-based file transfer where the blocks arrive out of order.
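For instance, a minimal sketch of that out-of-order pattern might look like the following. It assumes each received packet carries its cm256 block index in an application-defined header byte followed by BlockBytes of payload; ReceivePacket() is a hypothetical stand-in for your socket code, not part of cm256:

#include "cm256.h"

// Sketch only: packetStorage must hold OriginalCount packets of
// (1 + BlockBytes) bytes each, filled in arrival order.
bool DecodeFromNetwork(cm256_encoder_params params, uint8_t* packetStorage)
{
    cm256_block blocks[256];

    // Accept any mix of original and recovery blocks, in any order,
    // until OriginalCount of them have arrived.
    for (int received = 0; received < params.OriginalCount; ++received)
    {
        uint8_t* packet = packetStorage + received * (1 + params.BlockBytes);
        // ReceivePacket(packet, 1 + params.BlockBytes); // hypothetical socket read

        blocks[received].Index = packet[0];  // original or recovery block index
        blocks[received].Block = packet + 1; // block payload
    }

    // Recovers the missing originals in place; afterwards every
    // blocks[i].Index refers to an original block.
    return cm256_decode(params, blocks) == 0;
}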

Benchmark

CM256 demonstrates similar encoding and (worst case) decoding performance:

Encoder: 1296 bytes k = 100 m = 1 : 5.55886 usec, 23314.1 MBps
Decoder: 1296 bytes k = 100 m = 1 : 6.72915 usec, 19259.5 MBps
Encoder: 1296 bytes k = 100 m = 2 : 17.2617 usec, 7507.93 MBps
Decoder: 1296 bytes k = 100 m = 2 : 19.6023 usec, 6611.46 MBps
Encoder: 1296 bytes k = 100 m = 3 : 30.4275 usec, 4259.31 MBps
Decoder: 1296 bytes k = 100 m = 3 : 32.4755 usec, 3990.7 MBps
Encoder: 1296 bytes k = 100 m = 4 : 40.6675 usec, 3186.82 MBps
Decoder: 1296 bytes k = 100 m = 4 : 43.5932 usec, 2972.94 MBps
Encoder: 1296 bytes k = 100 m = 5 : 51.7852 usec, 2502.64 MBps
Decoder: 1296 bytes k = 100 m = 5 : 51.4926 usec, 2516.86 MBps
Encoder: 1296 bytes k = 100 m = 6 : 62.6104 usec, 2069.94 MBps
Decoder: 1296 bytes k = 100 m = 6 : 64.9509 usec, 1995.35 MBps
Encoder: 1296 bytes k = 100 m = 7 : 76.3612 usec, 1697.2 MBps
Decoder: 1296 bytes k = 100 m = 7 : 75.191 usec, 1723.61 MBps
Encoder: 1296 bytes k = 100 m = 8 : 85.1384 usec, 1522.23 MBps
Decoder: 1296 bytes k = 100 m = 8 : 83.0904 usec, 1559.75 MBps
Encoder: 1296 bytes k = 100 m = 9 : 96.2561 usec, 1346.41 MBps
Decoder: 1296 bytes k = 100 m = 9 : 95.3784 usec, 1358.8 MBps
Encoder: 1296 bytes k = 100 m = 10 : 110.592 usec, 1171.87 MBps
Decoder: 1296 bytes k = 100 m = 10 : 109.714 usec, 1181.25 MBps

Encoder: 1296 bytes k = 100 m = 20 : 223.525 usec, 579.801 MBps
Decoder: 1296 bytes k = 100 m = 20 : 209.481 usec, 618.671 MBps

Encoder: 1296 bytes k = 100 m = 30 : 372.737 usec, 347.699 MBps
Decoder: 1296 bytes k = 100 m = 30 : 322.707 usec, 401.603 MBps

Encoder: 1296 bytes k = 100 m = 40 : 471.626 usec, 274.794 MBps
Decoder: 1296 bytes k = 100 m = 40 : 434.762 usec, 298.094 MBps

Encoder: 1296 bytes k = 100 m = 50 : 592.751 usec, 218.642 MBps
Decoder: 1296 bytes k = 100 m = 50 : 545.939 usec, 237.389 MBps

(These performance numbers are out of date and not well calibrated; decoding now takes the same time as encoding, within a few microseconds, thanks to the new matrix solver.)
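For reference, the MBps column appears to be computed as OriginalCount * BlockBytes / time: for the first encoder line, 100 * 1296 bytes / 5.55886 usec is roughly 23,314 MB/s, matching the reported 23314.1 MBps.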

Longhair Library Results:

Note that I hand-optimized the MemXOR.cpp implementation on this PC to run faster than what is available on GitHub, so this is a fair comparison.

Encoded k=100 data blocks with m=1 recovery blocks in 4.09607 usec : 31640.1 MB/s
+ Decoded 1 erasures in 5.85144 usec : 22148.4 MB/s
Encoded k=100 data blocks with m=2 recovery blocks in 41.5452 usec : 3119.5 MB/s
+ Decoded 2 erasures in 43.5931 usec : 2972.94 MB/s
Encoded k=100 data blocks with m=3 recovery blocks in 80.7498 usec : 1604.96 MB/s
+ Decoded 3 erasures in 86.6013 usec : 1496.51 MB/s
Encoded k=100 data blocks with m=4 recovery blocks in 123.465 usec : 1049.69 MB/s
+ Decoded 4 erasures in 127.854 usec : 1013.66 MB/s
Encoded k=100 data blocks with m=5 recovery blocks in 76.9464 usec : 1684.29 MB/s
+ Decoded 5 erasures in 88.6493 usec : 1461.94 MB/s
Encoded k=100 data blocks with m=6 recovery blocks in 87.7717 usec : 1476.56 MB/s
+ Decoded 6 erasures in 100.352 usec : 1291.45 MB/s
Encoded k=100 data blocks with m=7 recovery blocks in 103.863 usec : 1247.8 MB/s
+ Decoded 7 erasures in 127.269 usec : 1018.32 MB/s
Encoded k=100 data blocks with m=8 recovery blocks in 118.784 usec : 1091.05 MB/s
+ Decoded 8 erasures in 145.701 usec : 889.494 MB/s
Encoded k=100 data blocks with m=9 recovery blocks in 146.871 usec : 882.406 MB/s
+ Decoded 9 erasures in 158.574 usec : 817.284 MB/s
Encoded k=100 data blocks with m=10 recovery blocks in 156.819 usec : 826.433 MB/s
+ Decoded 10 erasures in 181.102 usec : 715.619 MB/s

Encoded k=100 data blocks with m=20 recovery blocks in 282.039 usec : 459.511 MB/s
+ Decoded 20 erasures in 370.103 usec : 350.172 MB/s

Encoded k=100 data blocks with m=30 recovery blocks in 428.618 usec : 302.367 MB/s
+ Decoded 30 erasures in 614.693 usec : 210.837 MB/s

Encoded k=100 data blocks with m=40 recovery blocks in 562.323 usec : 230.472 MB/s
+ Decoded 40 erasures in 855.188 usec : 151.546 MB/s

Encoded k=100 data blocks with m=50 recovery blocks in 727.041 usec : 178.257 MB/s
+ Decoded 50 erasures in 1181.11 usec : 109.727 MB/s

Results Discussion:

For m=1 they are both running the same kind of code, so they're basically the same.

For m=2 and m=3, CM256 is 2.5x faster.

For m=4, CM256 is 3x faster in this case. Longhair could use more tuning. Back when I wrote it, the right time to switch to the Windowed decoder was at m=5, but on my new PC it seems like m=4 is a better time to do it. CM256 only has one mode so it doesn't require any tuning for best performance.

For m=5...30, CM256 performance is not quite 2x faster, maybe 1.7x or so.

For m>30, CM256 is at least 2x faster.

Comparisons with Other Libraries

The approach taken in CM256 is similar to the Intel Storage Acceleration Library (ISA-L) available here:

https://01.org/intel%C2%AE-storage-acceleration-library-open-source-version/downloads

ISA-L more aggressively optimizes the matrix multiplication operation, which is the most expensive step of encoding.

CM256 takes better advantage of the m=1 case and the first recovery symbol, which is also possible with the Vandermonde matrices supported by ISA-L.

ISA-L uses an O(N^3) Gaussian elimination solver for decoding. The CM256 decoder solves the linear system using a fast O(N^2) LDU-decomposition algorithm from "Pivoting and Backward Stability of Fast Algorithms for Solving Cauchy Linear Equations" (T. Boros, T. Kailath, V. Olshevsky), which was hand-optimized for memory accesses.
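For background, and as a generic definition rather than the exact element choices made inside either library: a Cauchy matrix over GF(256) has entries

    A[i][j] = 1 / (x_i + y_j),   where all x_i + y_j are nonzero,

with addition being XOR and division being multiplication by the GF(256) inverse. Encoding each recovery block is then the GF(256) matrix-vector product R = A * D applied independently at every byte position. Because every square submatrix of a Cauchy matrix is invertible, any OriginalCount received blocks suffice to solve for the originals, which is the structured system the LDU solver above exploits.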

Credits

This software was written entirely by myself (Christopher A. Taylor, [email protected]). If you find it useful and would like to buy me a coffee, consider tipping.

cm256's People

Contributors

catid, lmiskiew, uweigand, yselkowitz

cm256's Issues

Project fails to build on VS 2015

Dear Catid,

Recently I tried to build this project with the latest version of VS 2015. I noted that your readme states this project only works with the 2013 edition.
I was wondering whether it still works in general, since the tests I tried running in main (in the unit tests folder, with the performance tests disabled, leaving only the example file usage and the memswap check) failed.

Note that I have not changed anything at all in the source apart from disabling the compilation of the performance tests.

This is the result of the debug output in VS

'matrix_test.exe' (Win32): Loaded 'C:\Windows\SysWOW64\ntdll.dll'. Cannot find or open the PDB file.
'matrix_test.exe' (Win32): Loaded 'C:\Windows\SysWOW64\kernel32.dll'. Cannot find or open the PDB file.
'matrix_test.exe' (Win32): Loaded 'C:\Windows\SysWOW64\KernelBase.dll'. Cannot find or open the PDB file.
'matrix_test.exe' (Win32): Loaded 'C:\Windows\SysWOW64\apphelp.dll'. Cannot find or open the PDB file.
The program '[9996] matrix_test.exe' has exited with code 3 (0x3).

Apologies if this is the wrong place to ask this; I'm new to programming in VS.

Sincerely,
TritonJak

Failing to build on ubuntu: error: inlining failed in call to always_inline

fippo@ubuntu:~$ uname -a
Linux ubuntu 5.8.0-43-generic #49~20.04.1-Ubuntu SMP Fri Feb 5 09:57:56 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
fippo@mainserver:~/cm_debug/build$ make
Scanning dependencies of target cm_debug
[ 20%] Building CXX object CMakeFiles/cm_debug.dir/main.cpp.o
[ 40%] Building CXX object CMakeFiles/cm_debug.dir/cm256.cpp.o
[ 60%] Building CXX object CMakeFiles/cm_debug.dir/gf256.cpp.o
In file included from /home/fippo/cm_debug/gf256.h:75:0,
                 from /home/fippo/cm_debug/gf256.cpp:30:
/usr/lib/gcc/x86_64-linux-gnu/5/include/tmmintrin.h: In function ‘void gf256_mul_mem(void*, const void*, uint8_t, int)’:
/usr/lib/gcc/x86_64-linux-gnu/5/include/tmmintrin.h:136:1: error: inlining failed in call to always_inline ‘__m128i _mm_shuffle_epi8(__m128i, __m128i)’: target specific option mismatch
 _mm_shuffle_epi8 (__m128i __X, __m128i __Y)
 ^
/home/fippo/cm_debug/gf256.cpp:1197:50: error: called from here
             h0 = _mm_shuffle_epi8(table_hi_y, h0);
                                                  ^
In file included from /home/fippo/cm_debug/gf256.h:75:0,
                 from /home/fippo/cm_debug/gf256.cpp:30:
/usr/lib/gcc/x86_64-linux-gnu/5/include/tmmintrin.h:136:1: error: inlining failed in call to always_inline ‘__m128i _mm_shuffle_epi8(__m128i, __m128i)’: target specific option mismatch
 _mm_shuffle_epi8 (__m128i __X, __m128i __Y)
 ^
/home/fippo/cm_debug/gf256.cpp:1196:50: error: called from here
             l0 = _mm_shuffle_epi8(table_lo_y, l0);
                                                  ^
In file included from /home/fippo/cm_debug/gf256.h:75:0,
                 from /home/fippo/cm_debug/gf256.cpp:30:
/usr/lib/gcc/x86_64-linux-gnu/5/include/tmmintrin.h:136:1: error: inlining failed in call to always_inline ‘__m128i _mm_shuffle_epi8(__m128i, __m128i)’: target specific option mismatch
 _mm_shuffle_epi8 (__m128i __X, __m128i __Y)
 ^
/home/fippo/cm_debug/gf256.cpp:1197:50: error: called from here
             h0 = _mm_shuffle_epi8(table_hi_y, h0);
                                                  ^
In file included from /home/fippo/cm_debug/gf256.h:75:0,
                 from /home/fippo/cm_debug/gf256.cpp:30:
/usr/lib/gcc/x86_64-linux-gnu/5/include/tmmintrin.h:136:1: error: inlining failed in call to always_inline ‘__m128i _mm_shuffle_epi8(__m128i, __m128i)’: target specific option mismatch
 _mm_shuffle_epi8 (__m128i __X, __m128i __Y)
 ^
/home/fippo/cm_debug/gf256.cpp:1196:50: error: called from here
             l0 = _mm_shuffle_epi8(table_lo_y, l0);
                                                  ^
CMakeFiles/cm_debug.dir/build.make:110: recipe for target 'CMakeFiles/cm_debug.dir/gf256.cpp.o' failed
make[2]: *** [CMakeFiles/cm_debug.dir/gf256.cpp.o] Error 1
CMakeFiles/Makefile2:67: recipe for target 'CMakeFiles/cm_debug.dir/all' failed
make[1]: *** [CMakeFiles/cm_debug.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2

I suggest adding Travis CI or another CI system to check builds.

compile error in linux

Hi, I am using this library for my application, which runs on Linux.
Does this library support Linux? The following is the error screenshot:

[screenshot of compiler errors]

(错误 in the screenshot means "error".)

Thanks very much.

install TARGETS given no ARCHIVE DESTINATION for static library target

When building with an old CMake, I get the following error:

fippo@nanopim4:~/koding/cm256/build$ cmake ..
-- The C compiler identification is GNU 8.3.0
-- The CXX compiler identification is GNU 8.3.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
CMake Error at CMakeLists.txt:47 (install):
  install TARGETS given no ARCHIVE DESTINATION for static library target
  "cm256".


-- Configuring incomplete, errors occurred!
See also "/home/fippo/koding/cm256/build/CMakeFiles/CMakeOutput.log".
fippo@nanopim4:~/koding/cm256/build$ cmake --version
cmake version 3.13.4

sigbus illegal alignment at gf256.cpp gf256_add_mem line 790

Hi, I am trying to run this program on the Android platform.

extern "C" void gf256_add_mem(void * GF256_RESTRICT vx,
                              const void * GF256_RESTRICT vy, int bytes)
{
    GF256_M128 * GF256_RESTRICT x16 = reinterpret_cast<GF256_M128 *>(vx);
    const GF256_M128 * GF256_RESTRICT y16 = reinterpret_cast<const GF256_M128 *>(vy);

#if defined(GF256_TARGET_MOBILE)
# if defined(GF256_TRY_NEON)
    // Handle multiples of 64 bytes
    if (CpuHasNeon)
    {
        while (bytes >= 64)
        {
            GF256_M128 x0 = vld1q_u8((uint8_t*) x16);
            GF256_M128 x1 = vld1q_u8((uint8_t*)(x16 + 1) );
            GF256_M128 x2 = vld1q_u8((uint8_t*)(x16 + 2) );
            GF256_M128 x3 = vld1q_u8((uint8_t*)(x16 + 3) );
            GF256_M128 y0 = vld1q_u8((uint8_t*)y16);
            GF256_M128 y1 = vld1q_u8((uint8_t*)(y16 + 1));
            GF256_M128 y2 = vld1q_u8((uint8_t*)(y16 + 2));
            GF256_M128 y3 = vld1q_u8((uint8_t*)(y16 + 3));

            vst1q_u8((uint8_t*)x16,     veorq_u8(x0, y0));
            vst1q_u8((uint8_t*)(x16 + 1), veorq_u8(x1, y1));
            vst1q_u8((uint8_t*)(x16 + 2), veorq_u8(x2, y2));
            vst1q_u8((uint8_t*)(x16 + 3), veorq_u8(x3, y3));

            bytes -= 64, x16 += 4, y16 += 4;
        }

        // Handle multiples of 16 bytes
        while (bytes >= 16)
        {
            GF256_M128 x0 = vld1q_u8((uint8_t*)x16);
            GF256_M128 y0 = vld1q_u8((uint8_t*)y16);

            vst1q_u8((uint8_t*)x16, veorq_u8(x0, y0));

            bytes -= 16, ++x16, ++y16;
        }
    }
    else
# endif // GF256_TRY_NEON
    {
        uint64_t * GF256_RESTRICT x8 = reinterpret_cast<uint64_t *>(x16);
        const uint64_t * GF256_RESTRICT y8 = reinterpret_cast<const uint64_t *>(y16);

        const unsigned count = (unsigned)bytes / 8;
        for (unsigned ii = 0; ii < count; ++ii)
            x8[ii] ^= y8[ii];

        x16 = reinterpret_cast<GF256_M128 *>(x8 + count);
        y16 = reinterpret_cast<const GF256_M128 *>(y8 + count);

        bytes -= (count * 8);
    }
#else // GF256_TARGET_MOBILE
# if defined(GF256_TRY_AVX2)
    if (CpuHasAVX2)
    {
        GF256_M256 * GF256_RESTRICT x32 = reinterpret_cast<GF256_M256 *>(x16);
        const GF256_M256 * GF256_RESTRICT y32 = reinterpret_cast<const GF256_M256 *>(y16);

        while (bytes >= 128)
        {
            GF256_M256 x0 = _mm256_loadu_si256(x32);
            GF256_M256 y0 = _mm256_loadu_si256(y32);
            x0 = _mm256_xor_si256(x0, y0);
            GF256_M256 x1 = _mm256_loadu_si256(x32 + 1);
            GF256_M256 y1 = _mm256_loadu_si256(y32 + 1);
            x1 = _mm256_xor_si256(x1, y1);
            GF256_M256 x2 = _mm256_loadu_si256(x32 + 2);
            GF256_M256 y2 = _mm256_loadu_si256(y32 + 2);
            x2 = _mm256_xor_si256(x2, y2);
            GF256_M256 x3 = _mm256_loadu_si256(x32 + 3);
            GF256_M256 y3 = _mm256_loadu_si256(y32 + 3);
            x3 = _mm256_xor_si256(x3, y3);

            _mm256_storeu_si256(x32, x0);
            _mm256_storeu_si256(x32 + 1, x1);
            _mm256_storeu_si256(x32 + 2, x2);
            _mm256_storeu_si256(x32 + 3, x3);

            bytes -= 128, x32 += 4, y32 += 4;
        }

        // Handle multiples of 32 bytes
        while (bytes >= 32)
        {
            // x[i] = x[i] xor y[i]
            _mm256_storeu_si256(x32,
                _mm256_xor_si256(
                    _mm256_loadu_si256(x32),
                    _mm256_loadu_si256(y32)));

            bytes -= 32, ++x32, ++y32;
        }

        x16 = reinterpret_cast<GF256_M128 *>(x32);
        y16 = reinterpret_cast<const GF256_M128 *>(y32);
    }
    else
# endif // GF256_TRY_AVX2
    {
        while (bytes >= 64)
        {
            GF256_M128 x0 = _mm_loadu_si128(x16);
            GF256_M128 y0 = _mm_loadu_si128(y16);
            x0 = _mm_xor_si128(x0, y0);
            GF256_M128 x1 = _mm_loadu_si128(x16 + 1);
            GF256_M128 y1 = _mm_loadu_si128(y16 + 1);
            x1 = _mm_xor_si128(x1, y1);
            GF256_M128 x2 = _mm_loadu_si128(x16 + 2);
            GF256_M128 y2 = _mm_loadu_si128(y16 + 2);
            x2 = _mm_xor_si128(x2, y2);
            GF256_M128 x3 = _mm_loadu_si128(x16 + 3);
            GF256_M128 y3 = _mm_loadu_si128(y16 + 3);
            x3 = _mm_xor_si128(x3, y3);

            _mm_storeu_si128(x16, x0);
            _mm_storeu_si128(x16 + 1, x1);
            _mm_storeu_si128(x16 + 2, x2);
            _mm_storeu_si128(x16 + 3, x3);

            bytes -= 64, x16 += 4, y16 += 4;
        }
    }
#endif // GF256_TARGET_MOBILE

#if !defined(GF256_TARGET_MOBILE)
    // Handle multiples of 16 bytes
    while (bytes >= 16)
    {
        // x[i] = x[i] xor y[i]
        _mm_storeu_si128(x16,
            _mm_xor_si128(
                _mm_loadu_si128(x16),
                _mm_loadu_si128(y16)));

        bytes -= 16, ++x16, ++y16;
    }
#endif

    uint8_t * GF256_RESTRICT x1 = reinterpret_cast<uint8_t *>(x16);
    const uint8_t * GF256_RESTRICT y1 = reinterpret_cast<const uint8_t *>(y16);

    // Handle a block of 8 bytes
    const int eight = bytes & 8;
    if (eight)
    {
        uint64_t * GF256_RESTRICT x8 = reinterpret_cast<uint64_t *>(x1);
        const uint64_t * GF256_RESTRICT y8 = reinterpret_cast<const uint64_t *>(y1);
        *x8 ^= *y8;
    }

    // Handle a block of 4 bytes
    const int four = bytes & 4;
    if (four)
    {
        uint32_t * GF256_RESTRICT x4 = reinterpret_cast<uint32_t *>(x1 + eight);
        const uint32_t * GF256_RESTRICT y4 = reinterpret_cast<const uint32_t *>(y1 + eight);
        *x4 ^= *y4;
    }

    // Handle final bytes
    const int offset = eight + four;
    switch (bytes & 3)
    {
    case 3: x1[offset + 2] ^= y1[offset + 2];
    case 2: x1[offset + 1] ^= y1[offset + 1];
    case 1: x1[offset] ^= y1[offset];
    default:
        break;
    }
}

We have defined GF256_TARGET_MOBILE and GF256_TRY_NEON, and CpuHasNeon is true.

    // Handle a block of 8 bytes
    const int eight = bytes & 8;
    if (eight)
    {
        uint64_t * GF256_RESTRICT x8 = reinterpret_cast<uint64_t *>(x1);
        const uint64_t * GF256_RESTRICT y8 = reinterpret_cast<const uint64_t *>(y1);
        *x8 ^= *y8;
    }

*x8 ^= *y8 causes a SIGBUS illegal alignment error. Are there some compile arguments that must be set?
Thanks very much.
