In your expert opinion, which of the methods below would be faster?

Different set of instructions for non-AVX512 `extract` method for `Vec4f` (version2, closed)

Akhil-CM commented on August 16, 2024

Different set of instructions for non-AVX512 `extract` method for `Vec4f`


Comments (3)

AgnerF commented on August 16, 2024

Please note that this is not the place to post programming questions. It is better to ask at stackoverflow.com using the tag vector-class-library.

Method 1 is better if the index changes often. Method 2 is better if the index is constant, because the branch in the switch statement is then well predicted.


AgnerF commented on August 16, 2024

This is not efficient.
`_mm_extract_ps` takes the lane index as a compile-time constant (an immediate operand), so a runtime index still needs the switch statement. It also returns the result in an integer register, so you pay the extra cost of moving data between the two register types.
See https://stackoverflow.com/questions/5526658/intel-sse-why-does-mm-extract-ps-return-int-instead-of-float


Akhil-CM commented on August 16, 2024

EDIT:
I missed a more convenient macro defined just below `_mm_extract_ps`, called `_MM_EXTRACT_FLOAT`, which does the job directly.

Here is the addition to the test program below:

```cpp
// CORRECT USAGE VERSION 3
// _MM_EXTRACT_FLOAT(dest, src, index) stores lane `index` of `src` into the float `dest`
std::cout << "Correct usage version 3\n" ;
float val_float4 ;
_MM_EXTRACT_FLOAT(val_float4, one2four, 0x01);
std::cout << "val float : " << val_float4 << '\n';
std::cout << "val float hex : " << std::hexfloat << val_float4
          << std::defaultfloat << '\n';
```

Previous:

@AgnerF
Sorry to bump this issue again, but after some testing I found that we can use the `_mm_extract_ps` intrinsic from the smmintrin.h header. It returns a 32-bit int with the same bit pattern as the 32-bit float in the lane selected by its second argument. So we need to reinterpret those bits as a float (here with `reinterpret_cast`) rather than convert the integer value.
There are two ways to do it, shown below.

Here is a test program (it needs SSE4.1, e.g. `-msse4.1` with GCC or Clang):

```cpp
// Build with SSE4.1 enabled, e.g. g++ -msse4.1 test.cpp
#include <x86intrin.h>
#include <iostream>

int main()
{
    __m128 one2four = _mm_setr_ps(1.0f, 2.0f, 3.0f, 4.0f);

    // INCORRECT USAGE VERSION 1
    // _mm_extract_ps returns the raw bits of lane 1 as an int.
    std::cout << "Incorrect usage version 1\n" ;
    int val_int = _mm_extract_ps(one2four, 0x01);
    std::cout << "val int : " << val_int << '\n';
    std::cout << "val int hex : " << std::hex << val_int << std::dec << '\n';

    // INCORRECT USAGE VERSION 2
    // Assigning the int to a float converts its numeric value
    // (1073741824); it does not reinterpret the bits.
    std::cout << "Incorrect usage version 2\n" ;
    float val_float = _mm_extract_ps(one2four, 0x01);
    std::cout << "val float : " << val_float << '\n';
    std::cout << "val float hex : " << std::hexfloat << val_float
              << std::defaultfloat << '\n';

    // CORRECT USAGE VERSION 1
    // Reads the bits of val_int as a float. Note: accessing an int object
    // through a float& breaks the strict-aliasing rule; std::memcpy
    // (or C++20 std::bit_cast) is the well-defined way to do this.
    std::cout << "Correct usage version 1\n" ;
    float val_float2 = reinterpret_cast<float&>(val_int);
    std::cout << "val float : " << val_float2 << '\n';
    std::cout << "val float hex : " << std::hexfloat << val_float2
              << std::defaultfloat << '\n';

    // CORRECT USAGE VERSION 2
    // Writes the int result straight into the float's storage
    // (subject to the same strict-aliasing caveat).
    std::cout << "Correct usage version 2\n" ;
    float val_float3;
    reinterpret_cast<int&>(val_float3) = _mm_extract_ps(one2four, 0x01);
    std::cout << "val float : " << val_float3 << '\n';
    std::cout << "val float hex : " << std::hexfloat << val_float3
              << std::defaultfloat << '\n';
}
```

The output:

```
Incorrect usage version 1
val int : 1073741824
val int hex : 40000000
Incorrect usage version 2
val float : 1.07374e+09
val float hex : 0x1p+30
Correct usage version 1
val float : 2
val float hex : 0x1p+1
Correct usage version 2
val float : 2
val float hex : 0x1p+1
```
