perftest's People

Contributors

sebbbi

perftest's Issues

RX 5700 XT Results

New Radeon, new results!

PerfTest
To select adapter, use: PerfTest.exe [ADAPTER_INDEX]

Adapters found:
0: AMD Radeon RX 5700 XT
1: NVIDIA GeForce GTX 1080
2: Microsoft Basic Render Driver
Using adapter 0

Running 30 warm-up frames and 30 benchmark frames:
.............................XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Performance compared to Buffer.Load random

Buffer.Load uniform: 5.289ms 2.385x
Buffer.Load linear: 4.874ms 2.588x
Buffer.Load random: 4.656ms 2.710x
Buffer.Load uniform: 5.986ms 2.108x
Buffer.Load linear: 6.514ms 1.937x
Buffer.Load random: 6.115ms 2.063x
Buffer.Load uniform: 12.519ms 1.008x
Buffer.Load linear: 12.985ms 0.972x
Buffer.Load random: 12.617ms 1.000x
Buffer.Load uniform: 4.769ms 2.645x
Buffer.Load linear: 4.599ms 2.744x
Buffer.Load random: 4.687ms 2.692x
Buffer.Load uniform: 6.210ms 2.032x
Buffer.Load linear: 6.164ms 2.047x
Buffer.Load random: 6.170ms 2.045x
Buffer.Load uniform: 12.838ms 0.983x
Buffer.Load linear: 13.138ms 0.960x
Buffer.Load random: 12.725ms 0.991x
Buffer.Load uniform: 4.818ms 2.619x
Buffer.Load linear: 4.697ms 2.686x
Buffer.Load random: 4.771ms 2.644x
Buffer.Load uniform: 6.295ms 2.004x
Buffer.Load linear: 6.223ms 2.027x
Buffer.Load random: 6.217ms 2.029x
Buffer.Load uniform: 13.099ms 0.963x
Buffer.Load linear: 13.312ms 0.948x
Buffer.Load random: 12.819ms 0.984x
ByteAddressBuffer.Load uniform: 7.299ms 1.728x
ByteAddressBuffer.Load linear: 6.361ms 1.983x
ByteAddressBuffer.Load random: 6.279ms 2.009x
ByteAddressBuffer.Load2 uniform: 6.913ms 1.825x
ByteAddressBuffer.Load2 linear: 9.648ms 1.308x
ByteAddressBuffer.Load2 random: 9.693ms 1.302x
ByteAddressBuffer.Load3 uniform: 9.650ms 1.307x
ByteAddressBuffer.Load3 linear: 13.069ms 0.965x
ByteAddressBuffer.Load3 random: 26.009ms 0.485x
ByteAddressBuffer.Load4 uniform: 12.956ms 0.974x
ByteAddressBuffer.Load4 linear: 16.076ms 0.785x
ByteAddressBuffer.Load4 random: 16.332ms 0.773x
ByteAddressBuffer.Load2 unaligned uniform: 7.340ms 1.719x
ByteAddressBuffer.Load2 unaligned linear: 12.697ms 0.994x
ByteAddressBuffer.Load2 unaligned random: 12.598ms 1.001x
ByteAddressBuffer.Load4 unaligned uniform: 13.019ms 0.969x
ByteAddressBuffer.Load4 unaligned linear: 19.027ms 0.663x
ByteAddressBuffer.Load4 unaligned random: 25.387ms 0.497x
StructuredBuffer.Load uniform: 9.047ms 1.395x
StructuredBuffer.Load linear: 5.461ms 2.310x
StructuredBuffer.Load random: 4.722ms 2.672x
StructuredBuffer.Load uniform: 8.770ms 1.439x
StructuredBuffer.Load linear: 6.795ms 1.857x
StructuredBuffer.Load random: 6.074ms 2.077x
StructuredBuffer.Load uniform: 9.013ms 1.400x
StructuredBuffer.Load linear: 12.948ms 0.974x
StructuredBuffer.Load random: 12.428ms 1.015x
cbuffer{float4} load uniform: 9.561ms 1.320x
cbuffer{float4} load linear: 13.446ms 0.938x
cbuffer{float4} load random: 12.445ms 1.014x
Texture2D.Load uniform: 6.537ms 1.930x
Texture2D.Load linear: 6.652ms 1.897x
Texture2D.Load random: 6.474ms 1.949x
Texture2D.Load uniform: 6.652ms 1.897x
Texture2D.Load linear: 6.606ms 1.910x
Texture2D.Load random: 6.644ms 1.899x
Texture2D.Load uniform: 12.992ms 0.971x
Texture2D.Load linear: 13.012ms 0.970x
Texture2D.Load random: 12.877ms 0.980x
Texture2D.Load uniform: 6.655ms 1.896x
Texture2D.Load linear: 6.596ms 1.913x
Texture2D.Load random: 6.476ms 1.948x
Texture2D.Load uniform: 6.612ms 1.908x
Texture2D.Load linear: 6.697ms 1.884x
Texture2D.Load random: 6.436ms 1.960x
Texture2D.Load uniform: 12.956ms 0.974x
Texture2D.Load linear: 12.988ms 0.971x
Texture2D.Load random: 12.856ms 0.981x
Texture2D.Load uniform: 6.651ms 1.897x
Texture2D.Load linear: 6.732ms 1.874x
Texture2D.Load random: 6.469ms 1.950x
Texture2D.Load uniform: 6.627ms 1.904x
Texture2D.Load linear: 12.954ms 0.974x
Texture2D.Load random: 6.450ms 1.956x
Texture2D.Load uniform: 12.949ms 0.974x
Texture2D.Load linear: 12.953ms 0.974x
Texture2D.Load random: 12.804ms 0.985x
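
The second column of each result line appears to be the baseline time ("Buffer.Load random", the row marked 1.000x) divided by that row's time, so values above 1 mean faster than the baseline. Below is a minimal Python sketch for recomputing that column from a pasted log. The function names `parse` and `relative` are hypothetical helpers, not part of PerfTest, and the baseline row is assumed to be the first one reported as 1.000x.

```python
import re

# Matches result lines such as "Buffer.Load uniform: 5.289ms 2.385x".
LINE = re.compile(r"^(?P<name>.+): (?P<ms>\d+\.\d+)ms (?P<ratio>\d+\.\d+)x$")

def parse(lines):
    """Return (name, milliseconds, reported_ratio) for each result line."""
    rows = []
    for line in lines:
        m = LINE.match(line.strip())
        if m:
            rows.append((m["name"], float(m["ms"]), float(m["ratio"])))
    return rows

def relative(ms, baseline_ms):
    """Speed relative to the baseline; above 1 means faster than the baseline."""
    return baseline_ms / ms

# Three lines copied from the RX 5700 XT log above.
sample = [
    "Buffer.Load uniform: 5.289ms 2.385x",
    "Buffer.Load random: 12.617ms 1.000x",
    "ByteAddressBuffer.Load3 random: 26.009ms 0.485x",
]
rows = parse(sample)
# Assumption: the baseline is the first row the tool reports as 1.000x.
baseline_ms = next(ms for _, ms, ratio in rows if ratio == 1.0)
for name, ms, reported in rows:
    print(f"{name}: {relative(ms, baseline_ms):.3f}x (reported {reported:.3f}x)")
```

Because the log prints times rounded to three decimals, a recomputed ratio can differ from the reported one in the last digit.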

Radeon VII results

Not sure how much value there is since you already have Vega results, but here you go just in case.

PerfTest
To select adapter, use: PerfTest.exe [ADAPTER_INDEX]

Adapters found:
0: AMD Radeon VII
1: NVIDIA GeForce GTX 1080
2: Microsoft Basic Render Driver
Using adapter 0

Running 30 warm-up frames and 30 benchmark frames:
.............................XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Performance compared to Buffer.Load random

Buffer.Load uniform: 5.214ms 3.667x
Buffer.Load linear: 5.332ms 3.586x
Buffer.Load random: 18.861ms 1.014x
Buffer.Load uniform: 18.917ms 1.011x
Buffer.Load linear: 18.904ms 1.011x
Buffer.Load random: 18.885ms 1.012x
Buffer.Load uniform: 18.882ms 1.013x
Buffer.Load linear: 19.074ms 1.002x
Buffer.Load random: 19.119ms 1.000x
Buffer.Load uniform: 5.335ms 3.584x
Buffer.Load linear: 5.547ms 3.447x
Buffer.Load random: 18.872ms 1.013x
Buffer.Load uniform: 19.080ms 1.002x
Buffer.Load linear: 18.911ms 1.011x
Buffer.Load random: 18.996ms 1.007x
Buffer.Load uniform: 18.879ms 1.013x
Buffer.Load linear: 19.340ms 0.989x
Buffer.Load random: 18.985ms 1.007x
Buffer.Load uniform: 5.337ms 3.582x
Buffer.Load linear: 5.548ms 3.446x
Buffer.Load random: 18.873ms 1.013x
Buffer.Load uniform: 19.130ms 0.999x
Buffer.Load linear: 18.934ms 1.010x
Buffer.Load random: 19.100ms 1.001x
Buffer.Load uniform: 18.880ms 1.013x
Buffer.Load linear: 19.383ms 0.986x
Buffer.Load random: 21.310ms 0.897x
ByteAddressBuffer.Load uniform: 4.285ms 4.462x
ByteAddressBuffer.Load linear: 5.542ms 3.450x
ByteAddressBuffer.Load random: 18.869ms 1.013x
ByteAddressBuffer.Load2 uniform: 5.209ms 3.671x
ByteAddressBuffer.Load2 linear: 19.266ms 0.992x
ByteAddressBuffer.Load2 random: 19.005ms 1.006x
ByteAddressBuffer.Load3 uniform: 7.454ms 2.565x
ByteAddressBuffer.Load3 linear: 19.190ms 0.996x
ByteAddressBuffer.Load3 random: 37.705ms 0.507x
ByteAddressBuffer.Load4 uniform: 9.604ms 1.991x
ByteAddressBuffer.Load4 linear: 19.455ms 0.983x
ByteAddressBuffer.Load4 random: 21.360ms 0.895x
ByteAddressBuffer.Load2 unaligned uniform: 5.083ms 3.761x
ByteAddressBuffer.Load2 unaligned linear: 19.190ms 0.996x
ByteAddressBuffer.Load2 unaligned random: 18.920ms 1.011x
ByteAddressBuffer.Load4 unaligned uniform: 9.600ms 1.992x
ByteAddressBuffer.Load4 unaligned linear: 19.234ms 0.994x
ByteAddressBuffer.Load4 unaligned random: 23.485ms 0.814x
StructuredBuffer.Load uniform: 5.360ms 3.567x
StructuredBuffer.Load linear: 5.335ms 3.584x
StructuredBuffer.Load random: 18.879ms 1.013x
StructuredBuffer.Load uniform: 5.494ms 3.480x
StructuredBuffer.Load linear: 18.943ms 1.009x
StructuredBuffer.Load random: 18.898ms 1.012x
StructuredBuffer.Load uniform: 5.576ms 3.429x
StructuredBuffer.Load linear: 19.237ms 0.994x
StructuredBuffer.Load random: 21.314ms 0.897x
cbuffer{float4} load uniform: 6.819ms 2.804x
cbuffer{float4} load linear: 19.731ms 0.969x
cbuffer{float4} load random: 21.368ms 0.895x
Texture2D.Load uniform: 18.917ms 1.011x
Texture2D.Load linear: 19.067ms 1.003x
Texture2D.Load random: 18.925ms 1.010x
Texture2D.Load uniform: 18.902ms 1.011x
Texture2D.Load linear: 18.952ms 1.009x
Texture2D.Load random: 18.888ms 1.012x
Texture2D.Load uniform: 19.000ms 1.006x
Texture2D.Load linear: 19.137ms 0.999x
Texture2D.Load random: 18.965ms 1.008x
Texture2D.Load uniform: 18.919ms 1.011x
Texture2D.Load linear: 18.936ms 1.010x
Texture2D.Load random: 19.034ms 1.004x
Texture2D.Load uniform: 18.910ms 1.011x
Texture2D.Load linear: 18.971ms 1.008x
Texture2D.Load random: 18.895ms 1.012x
Texture2D.Load uniform: 18.918ms 1.011x
Texture2D.Load linear: 19.094ms 1.001x
Texture2D.Load random: 37.952ms 0.504x
Texture2D.Load uniform: 18.904ms 1.011x
Texture2D.Load linear: 19.040ms 1.004x
Texture2D.Load random: 19.053ms 1.003x
Texture2D.Load uniform: 18.942ms 1.009x
Texture2D.Load linear: 18.999ms 1.006x
Texture2D.Load random: 37.708ms 0.507x
Texture2D.Load uniform: 18.919ms 1.011x
Texture2D.Load linear: 28.305ms 0.675x
Texture2D.Load random: 37.705ms 0.507x

Some performance bottlenecks for UAV loads

Thanks for the useful tool.
I added support for UAV loads in a fork: https://github.com/ash3D/perftest/tree/UAV_load (branch UAV_load). The results turned out to be somewhat slower than SRV on NVIDIA Kepler (GeForce GTX 760M). Previously I had obtained higher performance with UAV than with SRV under certain conditions in a similar benchmark, so I started experimenting with the shaders and eventually arrived at about a 2X speedup. The things I tried:

  • Loop unrolling.
    This improved 1d and 2d raw buffer loads but significantly worsened 3d and 4d loads on Kepler. Unrolling typed UAV buffer and texture loads resulted in crashes during benchmark execution on Kepler (Intel worked fine).
    I also tried partial loop unrolling. This eliminated the big slowdown for 3d/4d loads on Kepler, but in general partial unrolling performed close to the original loop (without unrolling), and often slower. Different unroll factors worked best under different conditions (load width, access pattern), in a somewhat unpredictable way.

  • Loop iteration count reduction.
    The big 3d/4d load slowdown on Kepler with the unrolled loop suggested reducing the iteration count. Unexpectedly, this also improved 1d/2d performance (halving the count gave more than 2X the performance). Even more unexpectedly, it improved the performance of the original loop without unrolling.
    This behavior was observed on Kepler only. I tested Intel and Fermi briefly and found mostly linear performance scaling there.

  • Removing the read start address and address mask.
    Reading the start address from a cbuffer (used for the unaligned tests) harmed UAV performance even when the value is 0. The address mask, which is intended to prevent the compiler from merging multiple narrow loads, also affected wide load performance. It seems that NVIDIA GPUs perform wide raw buffer loads sequentially anyway, so the performance gain from removing the address mask here apparently comes from something else. There are other places, though, where the address mask apparently does prevent narrow load merging on Kepler (e.g. scalar 8/16/32-bit texture SRV loads).
    Removing the address mask also fixed the big slowdown for 3d/4d loads on Kepler with the unrolled loop.
    I experimented with another way of applying the address mask: &= ~mask instead of |= mask. It unexpectedly improved performance. In some specific cases performance oddly became even better than with no mask at all.
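
To make the two mask forms from the last item concrete, here is a small Python sketch of what each one does to an address. The mask value 0x3 is a made-up illustration, not the benchmark's value, and the helper names are hypothetical; the actual shaders are HLSL and presumably read the mask from a runtime constant.

```python
# Hypothetical mask value chosen only for illustration (assumption).
MASK = 0x3

def apply_or(address, mask=MASK):
    """Original form, address |= mask: forces the masked bits to 1."""
    return address | mask

def apply_and_not(address, mask=MASK):
    """Alternative form, address &= ~mask: forces the masked bits to 0."""
    return address & ~mask

for address in (0, 16, 21, 64):
    print(address, apply_or(address), apply_and_not(address))

# With a mask of 0 both forms leave the address unchanged.
print(apply_or(21, 0), apply_and_not(21, 0))
```

One visible difference is that, when the mask covers the low bits, the AND-NOT form yields addresses aligned to mask + 1, while the OR form yields addresses just below the next multiple. Whether that accounts for the measured speedup is not established here.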

The modifications I mentioned also affected SRV performance to some extent, but UAV performance was much more sensitive.

The results ultimately came close to the expected theoretical peak rates of the Kepler architecture. NVIDIA GPUs implement SRV loads in the read-only TMU pipeline, so performance differs from CUDA, which uses the read/write LSU pipeline. It also differs significantly from AMD GCN. In GCN, all four of the 32-bit fetch units used for bilinear texture sampling can be utilized for buffer accesses (for wide loads/stores or coalesced 1d access). NVIDIA TMU fetch units have been 64-bit since Fermi (it is able to filter 64-bit RGBA16F textures at full rate), but apparently only 1 of the 4 is used for buffer reads. I observed similar behavior earlier with GT200, except that its fetch units are 32-bit.
UAV accesses are served by the LSU pipeline on NVIDIA GPUs. Kepler has a 2:1 LD/ST to TMU ratio, but UAVs are cached in L2 only. Initially UAV loads were slower than SRV in the benchmark, but after the shader modifications I described above, UAV became faster than SRV for invariant loads. The ratio is still not 2X but close to it. Linear and random UAV load performance varied over a wide range (probably due to the increased L2 access rate) and can be much faster or slower than SRV in different cases. SRV performance is very stable (it is the same for invariant/linear/random reads).
I also tested NVIDIA Fermi2 (GeForce GTX 460) a little. Fermi has an L1 cache for the LSU pipeline (combined with shared memory), so UAV performance turned out to be better. Invariant UAV reads are 2X faster than SRV ones. Linear and random UAV read performance is still not as stable as SRV's, but much better than on Kepler. Fermi is also not subject to the big performance drop for 3d/4d UAV raw loads with a long unrolled loop.

SRV Load on NVidia

Hi, I was just testing and found the results for aligned loads very strange on a Pascal GPU.

Then I looked at the code and saw that you added several operations which mean the loads are not guaranteed to be aligned in your aligned test case.

ByteAddressBuffer performance can be similar to StructuredBuffer if you add the following, which ensures that the load addresses are correctly aligned in the shader:

const uint _WIDTH = LOAD_WIDTH * 4;
address = (address / _WIDTH) * _WIDTH;
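
As a quick check of what the two lines above compute, here is the same rounding in Python. It assumes LOAD_WIDTH is the number of 32-bit components per load (1 to 4), which is why it is multiplied by 4 to get bytes; `align_down` is a hypothetical name used only for this sketch.

```python
def align_down(address, load_width):
    """Round a byte address down to a multiple of the load size in bytes.

    Assumption: load_width counts 32-bit components, so the load size is
    load_width * 4 bytes, mirroring _WIDTH = LOAD_WIDTH * 4 in the snippet.
    """
    width_bytes = load_width * 4
    # Integer division followed by multiplication, as in the shader snippet.
    return (address // width_bytes) * width_bytes

for load_width in (1, 2, 3, 4):
    print(load_width, [align_down(a, load_width) for a in (0, 4, 20, 32)])
```

For the power-of-two sizes (4, 8 and 16 bytes) this is equivalent to clearing the low address bits. For Load3 the size is 12 bytes, which is not a power of two, so the divide-and-multiply form is the one that still works.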

New Titan V Results

PerfTest
To select adapter, use: PerfTest.exe [ADAPTER_INDEX]

Adapters found:
0: NVIDIA TITAN V
1: Intel(R) HD Graphics 530
2: Microsoft Basic Render Driver
Using adapter 0

Running 30 warm-up frames and 30 benchmark frames:
.............................XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Performance compared to Buffer[RGBA8].Load random

Buffer[R8].Load uniform: 2.241ms 8.139x
Buffer[R8].Load linear: 14.806ms 1.232x
Buffer[R8].Load random: 16.514ms 1.104x
Buffer[RG8].Load uniform: 4.576ms 3.985x
Buffer[RG8].Load linear: 16.397ms 1.112x
Buffer[RG8].Load random: 16.707ms 1.092x
Buffer[RGBA8].Load uniform: 5.155ms 3.538x
Buffer[RGBA8].Load linear: 16.726ms 1.090x
Buffer[RGBA8].Load random: 18.236ms 1.000x
Buffer[R16f].Load uniform: 2.807ms 6.497x
Buffer[R16f].Load linear: 14.771ms 1.235x
Buffer[R16f].Load random: 16.857ms 1.082x
Buffer[RG16f].Load uniform: 4.128ms 4.418x
Buffer[RG16f].Load linear: 16.155ms 1.129x
Buffer[RG16f].Load random: 15.140ms 1.205x
Buffer[RGBA16f].Load uniform: 4.747ms 3.841x
Buffer[RGBA16f].Load linear: 17.517ms 1.041x
Buffer[RGBA16f].Load random: 17.727ms 1.029x
Buffer[R32f].Load uniform: 2.630ms 6.935x
Buffer[R32f].Load linear: 17.341ms 1.052x
Buffer[R32f].Load random: 15.922ms 1.145x
Buffer[RG32f].Load uniform: 4.769ms 3.824x
Buffer[RG32f].Load linear: 15.745ms 1.158x
Buffer[RG32f].Load random: 15.801ms 1.154x
Buffer[RGBA32f].Load uniform: 4.772ms 3.822x
Buffer[RGBA32f].Load linear: 29.343ms 0.621x
Buffer[RGBA32f].Load random: 29.427ms 0.620x
ByteAddressBuffer.Load uniform: 8.948ms 2.038x
ByteAddressBuffer.Load linear: 8.722ms 2.091x
ByteAddressBuffer.Load random: 10.403ms 1.753x
ByteAddressBuffer.Load2 uniform: 10.132ms 1.800x
ByteAddressBuffer.Load2 linear: 11.406ms 1.599x
ByteAddressBuffer.Load2 random: 10.999ms 1.658x
ByteAddressBuffer.Load3 uniform: 12.638ms 1.443x
ByteAddressBuffer.Load3 linear: 13.708ms 1.330x
ByteAddressBuffer.Load3 random: 14.081ms 1.295x
ByteAddressBuffer.Load4 uniform: 15.421ms 1.183x
ByteAddressBuffer.Load4 linear: 26.412ms 0.690x
ByteAddressBuffer.Load4 random: 18.078ms 1.009x
ByteAddressBuffer.Load2 unaligned uniform: 11.076ms 1.647x
ByteAddressBuffer.Load2 unaligned linear: 11.474ms 1.589x
ByteAddressBuffer.Load2 unaligned random: 12.227ms 1.492x
ByteAddressBuffer.Load4 unaligned uniform: 15.817ms 1.153x
ByteAddressBuffer.Load4 unaligned linear: 25.894ms 0.704x
ByteAddressBuffer.Load4 unaligned random: 18.138ms 1.005x
StructuredBuffer[float].Load uniform: 6.606ms 2.761x
StructuredBuffer[float].Load linear: 6.555ms 2.782x
StructuredBuffer[float].Load random: 9.063ms 2.012x
StructuredBuffer[float2].Load uniform: 8.332ms 2.189x
StructuredBuffer[float2].Load linear: 8.545ms 2.134x
StructuredBuffer[float2].Load random: 7.271ms 2.508x
StructuredBuffer[float4].Load uniform: 8.890ms 2.051x
StructuredBuffer[float4].Load linear: 9.650ms 1.890x
StructuredBuffer[float4].Load random: 9.677ms 1.885x
cbuffer{float4} load uniform: 1.381ms 13.202x
cbuffer{float4} load linear: 320.961ms 0.057x
cbuffer{float4} load random: 150.072ms 0.122x
Texture2D[R8].Load uniform: 4.481ms 4.070x
Texture2D[R8].Load linear: 15.953ms 1.143x
Texture2D[R8].Load random: 15.058ms 1.211x
Texture2D[RG8].Load uniform: 4.594ms 3.970x
Texture2D[RG8].Load linear: 14.838ms 1.229x
Texture2D[RG8].Load random: 14.938ms 1.221x
Texture2D[RGBA8].Load uniform: 5.140ms 3.548x
Texture2D[RGBA8].Load linear: 14.915ms 1.223x
Texture2D[RGBA8].Load random: 15.031ms 1.213x
Texture2D[R16F].Load uniform: 5.748ms 3.173x
Texture2D[R16F].Load linear: 15.321ms 1.190x
Texture2D[R16F].Load random: 15.044ms 1.212x
Texture2D[RG16F].Load uniform: 4.609ms 3.957x
Texture2D[RG16F].Load linear: 14.918ms 1.222x
Texture2D[RG16F].Load random: 14.851ms 1.228x
Texture2D[RGBA16F].Load uniform: 5.182ms 3.519x
Texture2D[RGBA16F].Load linear: 14.915ms 1.223x
Texture2D[RGBA16F].Load random: 29.841ms 0.611x
Texture2D[R32F].Load uniform: 4.462ms 4.087x
Texture2D[R32F].Load linear: 15.615ms 1.168x
Texture2D[R32F].Load random: 15.519ms 1.175x
Texture2D[RG32F].Load uniform: 4.585ms 3.977x
Texture2D[RG32F].Load linear: 16.651ms 1.095x
Texture2D[RG32F].Load random: 29.710ms 0.614x
Texture2D[RGBA32F].Load uniform: 5.163ms 3.532x
Texture2D[RGBA32F].Load linear: 29.970ms 0.608x
Texture2D[RGBA32F].Load random: 29.358ms 0.621x

RTX 2070 results

Adapters found:
0: NVIDIA GeForce RTX 2070
1: Microsoft Basic Render Driver
Using adapter 0

Running 30 warm-up frames and 30 benchmark frames:
.............................XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Performance compared to Buffer.Load random

Buffer.Load uniform: 0.568ms 54.832x
Buffer.Load linear: 29.206ms 1.066x
Buffer.Load random: 28.891ms 1.077x
Buffer.Load uniform: 0.651ms 47.837x
Buffer.Load linear: 29.791ms 1.045x
Buffer.Load random: 29.499ms 1.055x
Buffer.Load uniform: 0.877ms 35.482x
Buffer.Load linear: 30.076ms 1.035x
Buffer.Load random: 31.125ms 1.000x
Buffer.Load uniform: 0.563ms 55.315x
Buffer.Load linear: 29.190ms 1.066x
Buffer.Load random: 29.181ms 1.067x
Buffer.Load uniform: 0.654ms 47.616x
Buffer.Load linear: 29.798ms 1.045x
Buffer.Load random: 28.903ms 1.077x
Buffer.Load uniform: 0.875ms 35.560x
Buffer.Load linear: 30.365ms 1.025x
Buffer.Load random: 30.858ms 1.009x
Buffer.Load uniform: 0.562ms 55.394x
Buffer.Load linear: 28.885ms 1.078x
Buffer.Load random: 29.487ms 1.056x
Buffer.Load uniform: 0.653ms 47.630x
Buffer.Load linear: 29.825ms 1.044x
Buffer.Load random: 28.902ms 1.077x
Buffer.Load uniform: 0.882ms 35.277x
Buffer.Load linear: 58.835ms 0.529x
Buffer.Load random: 57.622ms 0.540x
ByteAddressBuffer.Load uniform: 12.099ms 2.572x
ByteAddressBuffer.Load linear: 11.738ms 2.652x
ByteAddressBuffer.Load random: 12.959ms 2.402x
ByteAddressBuffer.Load2 uniform: 17.736ms 1.755x
ByteAddressBuffer.Load2 linear: 31.694ms 0.982x
ByteAddressBuffer.Load2 random: 18.101ms 1.720x
ByteAddressBuffer.Load3 uniform: 25.636ms 1.214x
ByteAddressBuffer.Load3 linear: 33.305ms 0.935x
ByteAddressBuffer.Load3 random: 27.643ms 1.126x
ByteAddressBuffer.Load4 uniform: 34.129ms 0.912x
ByteAddressBuffer.Load4 linear: 77.987ms 0.399x
ByteAddressBuffer.Load4 random: 60.233ms 0.517x
ByteAddressBuffer.Load2 unaligned uniform: 17.765ms 1.752x
ByteAddressBuffer.Load2 unaligned linear: 31.707ms 0.982x
ByteAddressBuffer.Load2 unaligned random: 18.088ms 1.721x
ByteAddressBuffer.Load4 unaligned uniform: 34.121ms 0.912x
ByteAddressBuffer.Load4 unaligned linear: 77.908ms 0.400x
ByteAddressBuffer.Load4 unaligned random: 60.213ms 0.517x
StructuredBuffer.Load uniform: 12.172ms 2.557x
StructuredBuffer.Load linear: 12.500ms 2.490x
StructuredBuffer.Load random: 12.900ms 2.413x
StructuredBuffer.Load uniform: 15.039ms 2.070x
StructuredBuffer.Load linear: 31.650ms 0.983x
StructuredBuffer.Load random: 15.642ms 1.990x
StructuredBuffer.Load uniform: 29.074ms 1.071x
StructuredBuffer.Load linear: 38.141ms 0.816x
StructuredBuffer.Load random: 40.840ms 0.762x
cbuffer{float4} load uniform: 0.852ms 36.535x
cbuffer{float4} load linear: 636.771ms 0.049x
cbuffer{float4} load random: 230.501ms 0.135x
Texture2D.Load uniform: 0.604ms 51.502x
Texture2D.Load linear: 28.886ms 1.078x
Texture2D.Load random: 28.868ms 1.078x
Texture2D.Load uniform: 0.735ms 42.319x
Texture2D.Load linear: 28.890ms 1.077x
Texture2D.Load random: 28.874ms 1.078x
Texture2D.Load uniform: 1.160ms 26.822x
Texture2D.Load linear: 29.416ms 1.058x
Texture2D.Load random: 29.518ms 1.054x
Texture2D.Load uniform: 0.585ms 53.250x
Texture2D.Load linear: 28.875ms 1.078x
Texture2D.Load random: 28.870ms 1.078x
Texture2D.Load uniform: 0.735ms 42.343x
Texture2D.Load linear: 28.891ms 1.077x
Texture2D.Load random: 28.871ms 1.078x
Texture2D.Load uniform: 1.163ms 26.766x
Texture2D.Load linear: 29.425ms 1.058x
Texture2D.Load random: 57.604ms 0.540x
Texture2D.Load uniform: 0.587ms 53.061x
Texture2D.Load linear: 28.881ms 1.078x
Texture2D.Load random: 28.869ms 1.078x
Texture2D.Load uniform: 0.734ms 42.400x
Texture2D.Load linear: 28.887ms 1.077x
Texture2D.Load random: 57.585ms 0.541x
Texture2D.Load uniform: 1.167ms 26.670x
Texture2D.Load linear: 57.609ms 0.540x
Texture2D.Load random: 57.596ms 0.540x

RTX 2080 results

C:\Code\perftest\perftest> ..\x64\Release\perftest.exe
PerfTest
To select adapter, use: PerfTest.exe [ADAPTER_INDEX]

Adapters found:
0: NVIDIA GeForce RTX 2080
1: Microsoft Basic Render Driver
Using adapter 0

Running 30 warm-up frames and 30 benchmark frames:
.............................XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Performance compared to Buffer.Load random

Buffer.Load uniform: 0.584ms 40.050x
Buffer.Load linear: 22.632ms 1.033x
Buffer.Load random: 22.652ms 1.032x
Buffer.Load uniform: 0.702ms 33.313x
Buffer.Load linear: 22.656ms 1.032x
Buffer.Load random: 24.384ms 0.959x
Buffer.Load uniform: 1.021ms 22.896x
Buffer.Load linear: 23.277ms 1.004x
Buffer.Load random: 23.380ms 1.000x
Buffer.Load uniform: 0.575ms 40.642x
Buffer.Load linear: 23.506ms 0.995x
Buffer.Load random: 23.431ms 0.998x
Buffer.Load uniform: 0.701ms 33.344x
Buffer.Load linear: 23.438ms 0.998x
Buffer.Load random: 24.307ms 0.962x
Buffer.Load uniform: 0.998ms 23.432x
Buffer.Load linear: 24.054ms 0.972x
Buffer.Load random: 24.884ms 0.940x
Buffer.Load uniform: 0.575ms 40.681x
Buffer.Load linear: 22.650ms 1.032x
Buffer.Load random: 26.428ms 0.885x
Buffer.Load uniform: 0.712ms 32.833x
Buffer.Load linear: 22.647ms 1.032x
Buffer.Load random: 24.314ms 0.962x
Buffer.Load uniform: 1.004ms 23.282x
Buffer.Load linear: 48.893ms 0.478x
Buffer.Load random: 45.172ms 0.518x
ByteAddressBuffer.Load uniform: 10.761ms 2.173x
ByteAddressBuffer.Load linear: 10.822ms 2.160x
ByteAddressBuffer.Load random: 10.418ms 2.244x
ByteAddressBuffer.Load2 uniform: 13.868ms 1.686x
ByteAddressBuffer.Load2 linear: 25.584ms 0.914x
ByteAddressBuffer.Load2 random: 14.147ms 1.653x
ByteAddressBuffer.Load3 uniform: 20.011ms 1.168x
ByteAddressBuffer.Load3 linear: 26.011ms 0.899x
ByteAddressBuffer.Load3 random: 21.576ms 1.084x
ByteAddressBuffer.Load4 uniform: 27.127ms 0.862x
ByteAddressBuffer.Load4 linear: 64.374ms 0.363x
ByteAddressBuffer.Load4 random: 54.111ms 0.432x
ByteAddressBuffer.Load2 unaligned uniform: 20.559ms 1.137x
ByteAddressBuffer.Load2 unaligned linear: 32.326ms 0.723x
ByteAddressBuffer.Load2 unaligned random: 14.630ms 1.598x
ByteAddressBuffer.Load4 unaligned uniform: 30.289ms 0.772x
ByteAddressBuffer.Load4 unaligned linear: 61.641ms 0.379x
ByteAddressBuffer.Load4 unaligned random: 53.058ms 0.441x
StructuredBuffer.Load uniform: 13.206ms 1.770x
StructuredBuffer.Load linear: 12.844ms 1.820x
StructuredBuffer.Load random: 10.077ms 2.320x
StructuredBuffer.Load uniform: 11.750ms 1.990x
StructuredBuffer.Load linear: 24.699ms 0.947x
StructuredBuffer.Load random: 15.797ms 1.480x
StructuredBuffer.Load uniform: 23.440ms 0.997x
StructuredBuffer.Load linear: 36.338ms 0.643x
StructuredBuffer.Load random: 35.524ms 0.658x
cbuffer{float4} load uniform: 0.906ms 25.802x
cbuffer{float4} load linear: 541.480ms 0.043x
cbuffer{float4} load random: 195.860ms 0.119x
Texture2D.Load uniform: 0.670ms 34.905x
Texture2D.Load linear: 23.513ms 0.994x
Texture2D.Load random: 23.501ms 0.995x
Texture2D.Load uniform: 0.831ms 28.125x
Texture2D.Load linear: 22.675ms 1.031x
Texture2D.Load random: 24.535ms 0.953x
Texture2D.Load uniform: 1.374ms 17.016x
Texture2D.Load linear: 23.982ms 0.975x
Texture2D.Load random: 24.520ms 0.953x
Texture2D.Load uniform: 0.632ms 36.995x
Texture2D.Load linear: 22.657ms 1.032x
Texture2D.Load random: 24.281ms 0.963x
Texture2D.Load uniform: 0.822ms 28.430x
Texture2D.Load linear: 23.423ms 0.998x
Texture2D.Load random: 22.666ms 1.032x
Texture2D.Load uniform: 1.363ms 17.155x
Texture2D.Load linear: 23.045ms 1.015x
Texture2D.Load random: 46.803ms 0.500x
Texture2D.Load uniform: 4.187ms 5.583x
Texture2D.Load linear: 27.111ms 0.862x
Texture2D.Load random: 23.073ms 1.013x
Texture2D.Load uniform: 1.543ms 15.151x
Texture2D.Load linear: 25.248ms 0.926x
Texture2D.Load random: 45.187ms 0.517x
Texture2D.Load uniform: 1.376ms 16.986x
Texture2D.Load linear: 46.115ms 0.507x
Texture2D.Load random: 45.166ms 0.518x
PS C:\Code\perftest\perftest>

GTX 980 Ti Results

PerfTest
To select adapter, use: PerfTest.exe [ADAPTER_INDEX]

Adapters found:
0: NVIDIA GeForce GTX 980 Ti
1: Microsoft Basic Render Driver
Using adapter 0

Running 30 warm-up frames and 30 benchmark frames:
.............................XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Performance compared to Buffer.Load random

Buffer.Load uniform: 1.249ms 28.812x
Buffer.Load linear: 34.105ms 1.055x
Buffer.Load random: 34.187ms 1.053x
Buffer.Load uniform: 1.847ms 19.485x
Buffer.Load linear: 34.106ms 1.055x
Buffer.Load random: 34.477ms 1.044x
Buffer.Load uniform: 2.452ms 14.680x
Buffer.Load linear: 35.773ms 1.006x
Buffer.Load random: 35.996ms 1.000x
Buffer.Load uniform: 1.491ms 24.148x
Buffer.Load linear: 34.077ms 1.056x
Buffer.Load random: 34.463ms 1.044x
Buffer.Load uniform: 1.916ms 18.785x
Buffer.Load linear: 34.229ms 1.052x
Buffer.Load random: 34.597ms 1.040x
Buffer.Load uniform: 2.519ms 14.291x
Buffer.Load linear: 35.787ms 1.006x
Buffer.Load random: 35.996ms 1.000x
Buffer.Load uniform: 1.478ms 24.350x
Buffer.Load linear: 34.098ms 1.056x
Buffer.Load random: 34.353ms 1.048x
Buffer.Load uniform: 1.845ms 19.514x
Buffer.Load linear: 34.138ms 1.054x
Buffer.Load random: 34.495ms 1.044x
Buffer.Load uniform: 2.374ms 15.163x
Buffer.Load linear: 67.973ms 0.530x
Buffer.Load random: 68.054ms 0.529x
ByteAddressBuffer.Load uniform: 21.403ms 1.682x
ByteAddressBuffer.Load linear: 21.906ms 1.643x
ByteAddressBuffer.Load random: 24.336ms 1.479x
ByteAddressBuffer.Load2 uniform: 45.620ms 0.789x
ByteAddressBuffer.Load2 linear: 55.815ms 0.645x
ByteAddressBuffer.Load2 random: 48.744ms 0.738x
ByteAddressBuffer.Load3 uniform: 52.929ms 0.680x
ByteAddressBuffer.Load3 linear: 79.057ms 0.455x
ByteAddressBuffer.Load3 random: 93.636ms 0.384x
ByteAddressBuffer.Load4 uniform: 68.510ms 0.525x
ByteAddressBuffer.Load4 linear: 114.561ms 0.314x
ByteAddressBuffer.Load4 random: 209.280ms 0.172x
ByteAddressBuffer.Load2 unaligned uniform: 45.640ms 0.789x
ByteAddressBuffer.Load2 unaligned linear: 55.802ms 0.645x
ByteAddressBuffer.Load2 unaligned random: 48.717ms 0.739x
ByteAddressBuffer.Load4 unaligned uniform: 68.685ms 0.524x
ByteAddressBuffer.Load4 unaligned linear: 115.244ms 0.312x
ByteAddressBuffer.Load4 unaligned random: 210.358ms 0.171x
StructuredBuffer.Load uniform: 1.116ms 32.267x
StructuredBuffer.Load linear: 34.094ms 1.056x
StructuredBuffer.Load random: 34.092ms 1.056x
StructuredBuffer.Load uniform: 1.569ms 22.942x
StructuredBuffer.Load linear: 34.143ms 1.054x
StructuredBuffer.Load random: 34.125ms 1.055x
StructuredBuffer.Load uniform: 2.087ms 17.245x
StructuredBuffer.Load linear: 67.959ms 0.530x
StructuredBuffer.Load random: 67.950ms 0.530x
cbuffer{float4} load uniform: 1.298ms 27.733x
cbuffer{float4} load linear: 798.703ms 0.045x
cbuffer{float4} load random: 324.356ms 0.111x
Texture2D.Load uniform: 1.962ms 18.351x
Texture2D.Load linear: 34.027ms 1.058x
Texture2D.Load random: 34.029ms 1.058x
Texture2D.Load uniform: 1.994ms 18.054x
Texture2D.Load linear: 34.334ms 1.048x
Texture2D.Load random: 34.102ms 1.056x
Texture2D.Load uniform: 2.247ms 16.018x
Texture2D.Load linear: 36.077ms 0.998x
Texture2D.Load random: 35.930ms 1.002x
Texture2D.Load uniform: 2.021ms 17.814x
Texture2D.Load linear: 34.040ms 1.057x
Texture2D.Load random: 34.021ms 1.058x
Texture2D.Load uniform: 2.020ms 17.822x
Texture2D.Load linear: 34.308ms 1.049x
Texture2D.Load random: 34.095ms 1.056x
Texture2D.Load uniform: 2.199ms 16.372x
Texture2D.Load linear: 36.074ms 0.998x
Texture2D.Load random: 68.064ms 0.529x
Texture2D.Load uniform: 2.014ms 17.869x
Texture2D.Load linear: 34.042ms 1.057x
Texture2D.Load random: 34.028ms 1.058x
Texture2D.Load uniform: 1.981ms 18.166x
Texture2D.Load linear: 34.320ms 1.049x
Texture2D.Load random: 67.948ms 0.530x
Texture2D.Load uniform: 2.064ms 17.440x
Texture2D.Load linear: 67.974ms 0.530x
Texture2D.Load random: 68.049ms 0.529x

Titan V results..

Hi,
I read your tweet:
"Nvidia driver seems to generate awful code for DirectX ByteAddressBuffers: https://github.com/sebbbi/perftest . Never got an answer from Nvidia why, but I assume it’s because their wide raw load instructions (vec2, vec4) require alignment. At least in CUDA that’s true."
I tested on a Titan V, and Load1 raw32 and raw Load2 seem pretty fast compared with your Titan X Pascal scores.
Can you comment on whether Volta has some improvements regarding the "generate awful code" problem, given the results below,
and on Volta's improvements in general? For example, Volta seems to be the first NV architecture to greatly improve "invariant" loads relative to linear or random ones, as the Load R8 results (and others too) show.
Also, it seems that previously 1-component fp16 linear loads were about 2x slower than random ones:
Titan X Pascal:

Load R16f invariant: 0.646ms
Load R16f linear: 1.170ms
Load R16f random: 0.646ms

That is no longer the case on Titan V:

Load R16f invariant: 0.074ms
Load R16f linear: 0.496ms
Load R16f random: 0.487ms
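
The comparison above can be put into numbers with a few lines of Python. This is only a sketch: the times are the R16f values quoted above, and `ratio` is a hypothetical helper.

```python
# Times in milliseconds, copied from the R16f results quoted above.
titan_x_pascal = {"invariant": 0.646, "linear": 1.170, "random": 0.646}
titan_v = {"invariant": 0.074, "linear": 0.496, "random": 0.487}

def ratio(times, numerator, denominator):
    """How many times longer `numerator` takes than `denominator`."""
    return times[numerator] / times[denominator]

for name, times in (("Titan X Pascal", titan_x_pascal), ("Titan V", titan_v)):
    print(f"{name}: linear/random = {ratio(times, 'linear', 'random'):.2f}, "
          f"random/invariant = {ratio(times, 'random', 'invariant'):.2f}")
```

On these figures, linear R16f loads take roughly 1.8 times as long as random ones on Titan X Pascal and about the same time on Titan V, while invariant loads on Titan V are several times faster than random ones.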

output:

0: NVIDIA TITAN V
1: Microsoft Basic Render Driver
2: Microsoft Basic Render Driver
Using adapter 0
Load R8 invariant: 0.098ms
Load R8 linear: 0.512ms
Load R8 random: 0.507ms
Load RG8 invariant: 0.153ms
Load RG8 linear: 0.503ms
Load RG8 random: 0.512ms
Load RGBA8 invariant: 0.301ms
Load RGBA8 linear: 0.510ms
Load RGBA8 random: 0.517ms
Load R16f invariant: 0.076ms
Load R16f linear: 0.495ms
Load R16f random: 0.486ms
Load RG16f invariant: 0.138ms
Load RG16f linear: 0.505ms
Load RG16f random: 0.505ms
Load RGBA16f invariant: 0.278ms
Load RGBA16f linear: 0.489ms
Load RGBA16f random: 0.494ms
Load R32f invariant: 0.076ms
Load R32f linear: 0.504ms
Load R32f random: 0.496ms
Load RG32f invariant: 0.138ms
Load RG32f linear: 0.496ms
Load RG32f random: 0.496ms
Load RGBA32f invariant: 0.276ms
Load RGBA32f linear: 0.949ms
Load RGBA32f random: 0.967ms
Load1 raw32 invariant: 0.290ms
Load1 raw32 linear: 0.289ms
Load1 raw32 random: 0.288ms
Load2 raw32 invariant: 0.353ms
Load2 raw32 linear: 0.362ms
Load2 raw32 random: 0.361ms
Load3 raw32 invariant: 0.415ms
Load3 raw32 linear: 0.459ms
Load3 raw32 random: 0.428ms
Load4 raw32 invariant: 1.056ms
Load4 raw32 linear: 0.834ms
Load4 raw32 random: 0.599ms
Load2 raw32 unaligned invariant: 0.344ms
Load2 raw32 unaligned linear: 0.359ms
Load2 raw32 unaligned random: 0.362ms
Load4 raw32 unaligned invariant: 0.516ms
Load4 raw32 unaligned linear: 0.817ms
Load4 raw32 unaligned random: 0.581ms
Tex2D load R8 invariant: 0.482ms
Tex2D load R8 linear: 0.483ms
Tex2D load R8 random: 0.480ms
Tex2D load RG8 invariant: 0.606ms
Tex2D load RG8 linear: 0.511ms
Tex2D load RG8 random: 0.559ms
Tex2D load RGBA8 invariant: 0.512ms
Tex2D load RGBA8 linear: 0.514ms
Tex2D load RGBA8 random: 0.885ms
Tex2D load R16F invariant: 0.482ms
Tex2D load R16F linear: 0.483ms
Tex2D load R16F random: 0.536ms
Tex2D load RG16F invariant: 0.157ms
Tex2D load RG16F linear: 0.485ms
Tex2D load RG16F random: 0.847ms
Tex2D load RGBA16F invariant: 0.503ms
Tex2D load RGBA16F linear: 0.508ms
Tex2D load RGBA16F random: 1.015ms
Tex2D load R32F invariant: 0.481ms
Tex2D load R32F linear: 0.490ms
Tex2D load R32F random: 0.845ms
Tex2D load RG32F invariant: 0.157ms
Tex2D load RG32F linear: 0.485ms
Tex2D load RG32F random: 0.978ms
Tex2D load RGBA32F invariant: 0.950ms
Tex2D load RGBA32F linear: 0.966ms
Tex2D load RGBA32F random: 1.207ms
Load R8 invariant: 0.077ms
Load R8 linear: 0.494ms
Load R8 random: 0.485ms
Load RG8 invariant: 0.138ms
Load RG8 linear: 0.505ms
Load RG8 random: 0.497ms
Load RGBA8 invariant: 0.276ms
Load RGBA8 linear: 0.495ms
Load RGBA8 random: 0.495ms
Load R16f invariant: 0.074ms
Load R16f linear: 0.496ms
Load R16f random: 0.487ms
Load RG16f invariant: 0.136ms
Load RG16f linear: 0.487ms
Load RG16f random: 0.487ms
Load RGBA16f invariant: 0.278ms
Load RGBA16f linear: 0.492ms
Load RGBA16f random: 0.493ms
Load R32f invariant: 0.074ms
Load R32f linear: 0.497ms
Load R32f random: 0.506ms
Load RG32f invariant: 0.142ms
Load RG32f linear: 0.498ms
Load RG32f random: 0.506ms
Load RGBA32f invariant: 0.279ms
Load RGBA32f linear: 0.952ms
Load RGBA32f random: 1.452ms
Load1 raw32 invariant: 0.289ms
Load1 raw32 linear: 0.288ms
Load1 raw32 random: 0.288ms
Load2 raw32 invariant: 0.339ms
Load2 raw32 linear: 0.356ms
Load2 raw32 random: 0.357ms
Load3 raw32 invariant: 0.415ms
Load3 raw32 linear: 0.443ms
Load3 raw32 random: 0.429ms
Load4 raw32 invariant: 0.516ms
Load4 raw32 linear: 0.838ms
Load4 raw32 random: 0.587ms
Load2 raw32 unaligned invariant: 0.338ms
Load2 raw32 unaligned linear: 0.359ms
Load2 raw32 unaligned random: 0.360ms
Load4 raw32 unaligned invariant: 0.516ms
Load4 raw32 unaligned linear: 0.821ms
Load4 raw32 unaligned random: 0.584ms
Tex2D load R8 invariant: 0.482ms
Tex2D load R8 linear: 0.481ms
Tex2D load R8 random: 0.481ms
Tex2D load RG8 invariant: 0.155ms
Tex2D load RG8 linear: 0.487ms
Tex2D load RG8 random: 0.538ms
Tex2D load RGBA8 invariant: 0.506ms
Tex2D load RGBA8 linear: 0.502ms
Tex2D load RGBA8 random: 0.849ms
Tex2D load R16F invariant: 0.480ms
Tex2D load R16F linear: 0.481ms
Tex2D load R16F random: 0.538ms
Tex2D load RG16F invariant: 0.155ms
Tex2D load RG16F linear: 0.487ms
Tex2D load RG16F random: 0.849ms
Tex2D load RGBA16F invariant: 0.507ms
Tex2D load RGBA16F linear: 0.510ms
Tex2D load RGBA16F random: 1.020ms
Tex2D load R32F invariant: 0.481ms
Tex2D load R32F linear: 0.481ms
Tex2D load R32F random: 0.864ms
Tex2D load RG32F invariant: 0.155ms
Tex2D load RG32F linear: 0.487ms
Tex2D load RG32F random: 0.982ms
Tex2D load RGBA32F invariant: 0.951ms
Tex2D load RGBA32F linear: 0.951ms
Tex2D load RGBA32F random: 1.213ms
Load R8 invariant: 0.076ms
Load R8 linear: 0.497ms
Load R8 random: 1.018ms
Load RG8 invariant: 0.141ms
Load RG8 linear: 0.498ms
Load RG8 random: 0.496ms
Load RGBA8 invariant: 0.279ms
Load RGBA8 linear: 0.490ms
Load RGBA8 random: 0.493ms
Load R16f invariant: 0.073ms
Load R16f linear: 0.497ms
Load R16f random: 0.486ms
Load RG16f invariant: 0.136ms
Load RG16f linear: 0.498ms
Load RG16f random: 0.499ms
Load RGBA16f invariant: 0.279ms
Load RGBA16f linear: 0.493ms
Load RGBA16f random: 0.496ms
Load R32f invariant: 0.073ms
Load R32f linear: 0.489ms
Load R32f random: 1.004ms
Load RG32f invariant: 0.196ms
Load RG32f linear: 0.500ms
Load RG32f random: 0.509ms
Load RGBA32f invariant: 0.281ms
Load RGBA32f linear: 0.956ms
Load RGBA32f random: 0.975ms
Load1 raw32 invariant: 0.288ms
Load1 raw32 linear: 0.287ms
Load1 raw32 random: 0.286ms
Load2 raw32 invariant: 0.341ms
Load2 raw32 linear: 0.361ms
Load2 raw32 random: 0.361ms
Load3 raw32 invariant: 0.421ms
Load3 raw32 linear: 0.446ms
Load3 raw32 random: 0.429ms
Load4 raw32 invariant: 0.515ms
Load4 raw32 linear: 0.825ms
Load4 raw32 random: 0.591ms
Load2 raw32 unaligned invariant: 0.344ms
Load2 raw32 unaligned linear: 0.361ms
Load2 raw32 unaligned random: 0.360ms
Load4 raw32 unaligned invariant: 0.515ms
Load4 raw32 unaligned linear: 0.826ms
Load4 raw32 unaligned random: 0.586ms
Tex2D load R8 invariant: 0.483ms
Tex2D load R8 linear: 0.483ms
Tex2D load R8 random: 0.483ms
Tex2D load RG8 invariant: 0.155ms
Tex2D load RG8 linear: 0.490ms
Tex2D load RG8 random: 0.541ms
Tex2D load RGBA8 invariant: 0.508ms
Tex2D load RGBA8 linear: 0.513ms
Tex2D load RGBA8 random: 0.868ms
Tex2D load R16F invariant: 0.487ms
Tex2D load R16F linear: 0.486ms
Tex2D load R16F random: 0.566ms
Tex2D load RG16F invariant: 0.157ms
Tex2D load RG16F linear: 0.497ms
Tex2D load RG16F random: 0.862ms
Tex2D load RGBA16F invariant: 0.515ms
Tex2D load RGBA16F linear: 0.519ms
Tex2D load RGBA16F random: 1.018ms
Tex2D load R32F invariant: 0.488ms
Tex2D load R32F linear: 0.490ms
Tex2D load R32F random: 0.862ms
Tex2D load RG32F invariant: 0.157ms
Tex2D load RG32F linear: 0.497ms
Tex2D load RG32F random: 0.998ms
Tex2D load RGBA32F invariant: 0.967ms
Tex2D load RGBA32F linear: 1.564ms
Tex2D load RGBA32F random: 1.214ms

Ada results (4070)!

PerfTest
To select adapter, use: PerfTest.exe [ADAPTER_INDEX]

Adapters found:
0: NVIDIA GeForce RTX 4070
1: Microsoft Basic Render Driver
Using adapter 0

Running 30 warm-up frames and 30 benchmark frames:
.............................XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Performance compared to Buffer<RGBA8>.Load random

Buffer<R8>.Load uniform: 8.139ms 1.463x
Buffer<R8>.Load linear: 8.502ms 1.401x
Buffer<R8>.Load random: 10.092ms 1.180x
Buffer<RG8>.Load uniform: 10.089ms 1.180x
Buffer<RG8>.Load linear: 9.256ms 1.287x
Buffer<RG8>.Load random: 8.333ms 1.429x
Buffer<RGBA8>.Load uniform: 8.154ms 1.460x
Buffer<RGBA8>.Load linear: 8.229ms 1.447x
Buffer<RGBA8>.Load random: 11.908ms 1.000x
Buffer<R16f>.Load uniform: 7.930ms 1.502x
Buffer<R16f>.Load linear: 7.982ms 1.492x
Buffer<R16f>.Load random: 8.136ms 1.464x
Buffer<RG16f>.Load uniform: 7.984ms 1.491x
Buffer<RG16f>.Load linear: 8.150ms 1.461x
Buffer<RG16f>.Load random: 11.856ms 1.004x
Buffer<RGBA16f>.Load uniform: 8.125ms 1.466x
Buffer<RGBA16f>.Load linear: 8.248ms 1.444x
Buffer<RGBA16f>.Load random: 8.623ms 1.381x
Buffer<R32f>.Load uniform: 7.950ms 1.498x
Buffer<R32f>.Load linear: 7.956ms 1.497x
Buffer<R32f>.Load random: 11.854ms 1.005x
Buffer<RG32f>.Load uniform: 7.944ms 1.499x
Buffer<RG32f>.Load linear: 8.006ms 1.487x
Buffer<RG32f>.Load random: 7.972ms 1.494x
Buffer<RGBA32f>.Load uniform: 15.772ms 0.755x
Buffer<RGBA32f>.Load linear: 15.816ms 0.753x
Buffer<RGBA32f>.Load random: 15.837ms 0.752x
ByteAddressBuffer.Load uniform: 7.205ms 1.653x
ByteAddressBuffer.Load linear: 6.301ms 1.890x
ByteAddressBuffer.Load random: 6.112ms 1.948x
ByteAddressBuffer.Load2 uniform: 10.265ms 1.160x
ByteAddressBuffer.Load2 linear: 9.044ms 1.317x
ByteAddressBuffer.Load2 random: 9.039ms 1.317x
ByteAddressBuffer.Load3 uniform: 12.291ms 0.969x
ByteAddressBuffer.Load3 linear: 12.033ms 0.990x
ByteAddressBuffer.Load3 random: 11.978ms 0.994x
ByteAddressBuffer.Load4 uniform: 15.934ms 0.747x
ByteAddressBuffer.Load4 linear: 19.940ms 0.597x
ByteAddressBuffer.Load4 random: 15.964ms 0.746x
ByteAddressBuffer.Load2 unaligned uniform: 10.423ms 1.142x
ByteAddressBuffer.Load2 unaligned linear: 9.037ms 1.318x
ByteAddressBuffer.Load2 unaligned random: 9.016ms 1.321x
ByteAddressBuffer.Load4 unaligned uniform: 15.938ms 0.747x
ByteAddressBuffer.Load4 unaligned linear: 19.903ms 0.598x
ByteAddressBuffer.Load4 unaligned random: 15.955ms 0.746x
StructuredBuffer<float>.Load uniform: 7.030ms 1.694x
StructuredBuffer<float>.Load linear: 5.768ms 2.064x
StructuredBuffer<float>.Load random: 5.749ms 2.071x
StructuredBuffer<float2>.Load uniform: 8.017ms 1.485x
StructuredBuffer<float2>.Load linear: 8.032ms 1.483x
StructuredBuffer<float2>.Load random: 5.807ms 2.051x
StructuredBuffer<float4>.Load uniform: 8.560ms 1.391x
StructuredBuffer<float4>.Load linear: 8.521ms 1.398x
StructuredBuffer<float4>.Load random: 8.696ms 1.369x
cbuffer{float4} load uniform: 78.939ms 0.151x
cbuffer{float4} load linear: 330.084ms 0.036x
cbuffer{float4} load random: 125.805ms 0.095x
Texture2D<R8>.Load uniform: 7.969ms 1.494x
Texture2D<R8>.Load linear: 7.993ms 1.490x
Texture2D<R8>.Load random: 7.967ms 1.495x
Texture2D<RG8>.Load uniform: 8.197ms 1.453x
Texture2D<RG8>.Load linear: 8.385ms 1.420x
Texture2D<RG8>.Load random: 8.205ms 1.451x
Texture2D<RGBA8>.Load uniform: 8.318ms 1.432x
Texture2D<RGBA8>.Load linear: 11.926ms 0.999x
Texture2D<RGBA8>.Load random: 16.152ms 0.737x
Texture2D<R16F>.Load uniform: 7.970ms 1.494x
Texture2D<R16F>.Load linear: 7.970ms 1.494x
Texture2D<R16F>.Load random: 7.979ms 1.492x
Texture2D<RG16F>.Load uniform: 7.979ms 1.492x
Texture2D<RG16F>.Load linear: 12.097ms 0.984x
Texture2D<RG16F>.Load random: 16.136ms 0.738x
Texture2D<RGBA16F>.Load uniform: 8.157ms 1.460x
Texture2D<RGBA16F>.Load linear: 21.618ms 0.551x
Texture2D<RGBA16F>.Load random: 31.902ms 0.373x
Texture2D<R32F>.Load uniform: 7.944ms 1.499x
Texture2D<R32F>.Load linear: 12.044ms 0.989x
Texture2D<R32F>.Load random: 16.292ms 0.731x
Texture2D<RG32F>.Load uniform: 7.999ms 1.489x
Texture2D<RG32F>.Load linear: 21.805ms 0.546x
Texture2D<RG32F>.Load random: 31.726ms 0.375x
Texture2D<RGBA32F>.Load uniform: 15.820ms 0.753x
Texture2D<RGBA32F>.Load linear: 32.516ms 0.366x
Texture2D<RGBA32F>.Load random: 31.546ms 0.377x
Texture2D<R8>.Sample(nearest) uniform: 16.020ms 0.743x
Texture2D<R8>.Sample(nearest) linear: 15.839ms 0.752x
Texture2D<R8>.Sample(nearest) random: 16.225ms 0.734x
Texture2D<RG8>.Sample(nearest) uniform: 16.323ms 0.730x
Texture2D<RG8>.Sample(nearest) linear: 15.803ms 0.754x
Texture2D<RG8>.Sample(nearest) random: 15.788ms 0.754x
Texture2D<RGBA8>.Sample(nearest) uniform: 15.974ms 0.745x
Texture2D<RGBA8>.Sample(nearest) linear: 16.169ms 0.736x
Texture2D<RGBA8>.Sample(nearest) random: 16.185ms 0.736x
Texture2D<R16F>.Sample(nearest) uniform: 16.365ms 0.728x
Texture2D<R16F>.Sample(nearest) linear: 16.029ms 0.743x
Texture2D<R16F>.Sample(nearest) random: 15.818ms 0.753x
Texture2D<RG16F>.Sample(nearest) uniform: 15.780ms 0.755x
Texture2D<RG16F>.Sample(nearest) linear: 16.151ms 0.737x
Texture2D<RG16F>.Sample(nearest) random: 15.795ms 0.754x
Texture2D<RGBA16F>.Sample(nearest) uniform: 16.326ms 0.729x
Texture2D<RGBA16F>.Sample(nearest) linear: 16.014ms 0.744x
Texture2D<RGBA16F>.Sample(nearest) random: 31.503ms 0.378x
Texture2D<R32F>.Sample(nearest) uniform: 16.004ms 0.744x
Texture2D<R32F>.Sample(nearest) linear: 15.830ms 0.752x
Texture2D<R32F>.Sample(nearest) random: 16.198ms 0.735x
Texture2D<RG32F>.Sample(nearest) uniform: 15.928ms 0.748x
Texture2D<RG32F>.Sample(nearest) linear: 15.985ms 0.745x
Texture2D<RG32F>.Sample(nearest) random: 31.506ms 0.378x
Texture2D<RGBA32F>.Sample(bilinear) uniform: 31.343ms 0.380x
Texture2D<RGBA32F>.Sample(nearest) linear: 31.767ms 0.375x
Texture2D<RGBA32F>.Sample(nearest) random: 31.557ms 0.377x
Texture2D<R8>.Sample(bilinear) uniform: 15.994ms 0.745x
Texture2D<R8>.Sample(bilinear) linear: 16.214ms 0.734x
Texture2D<R8>.Sample(bilinear) random: 15.821ms 0.753x
Texture2D<RG8>.Sample(bilinear) uniform: 15.786ms 0.754x
Texture2D<RG8>.Sample(bilinear) linear: 15.774ms 0.755x
Texture2D<RG8>.Sample(bilinear) random: 15.800ms 0.754x
Texture2D<RGBA8>.Sample(bilinear) uniform: 15.939ms 0.747x
Texture2D<RGBA8>.Sample(bilinear) linear: 15.820ms 0.753x
Texture2D<RGBA8>.Sample(bilinear) random: 15.778ms 0.755x
Texture2D<R16F>.Sample(bilinear) uniform: 15.992ms 0.745x
Texture2D<R16F>.Sample(bilinear) linear: 15.820ms 0.753x
Texture2D<R16F>.Sample(bilinear) random: 15.821ms 0.753x
Texture2D<RG16F>.Sample(bilinear) uniform: 15.756ms 0.756x
Texture2D<RG16F>.Sample(bilinear) linear: 15.796ms 0.754x
Texture2D<RG16F>.Sample(bilinear) random: 15.760ms 0.756x
Texture2D<RGBA16F>.Sample(bilinear) uniform: 15.779ms 0.755x
Texture2D<RGBA16F>.Sample(bilinear) linear: 15.790ms 0.754x
Texture2D<RGBA16F>.Sample(bilinear) random: 31.697ms 0.376x
Texture2D<R32F>.Sample(bilinear) uniform: 15.805ms 0.753x
Texture2D<R32F>.Sample(bilinear) linear: 15.847ms 0.751x
Texture2D<R32F>.Sample(bilinear) random: 15.996ms 0.744x
Texture2D<RG32F>.Sample(bilinear) uniform: 15.761ms 0.756x
Texture2D<RG32F>.Sample(bilinear) linear: 15.770ms 0.755x
Texture2D<RG32F>.Sample(bilinear) random: 31.517ms 0.378x
Texture2D<RGBA32F>.Sample(bilinear) uniform: 62.698ms 0.190x
Texture2D<RGBA32F>.Sample(bilinear) linear: 62.823ms 0.190x
Texture2D<RGBA32F>.Sample(bilinear) random: 93.925ms 0.127x
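
For anyone post-processing these logs: the last column appears to be the reference test's time divided by the row's time (this is inferred from the numbers, not from the tool's source). A small Python sketch that parses a few of the lines above and recomputes it; the regular expression and the `parse` helper are my own assumptions about the line format:

```python
import re

# A few lines copied from the RTX 4070 output above.
sample = """\
Buffer<R8>.Load uniform: 8.139ms 1.463x
Buffer<RGBA8>.Load random: 11.908ms 1.000x
Texture2D<RGBA32F>.Sample(bilinear) random: 93.925ms 0.127x
"""

# "<test name>: <time>ms <relative>x"
LINE = re.compile(r"^(?P<name>.+): (?P<ms>\d+\.\d+)ms (?P<rel>\d+\.\d+)x$")

def parse(text):
    """Return {test name: (milliseconds, reported relative speed)}."""
    rows = {}
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m:
            rows[m["name"]] = (float(m["ms"]), float(m["rel"]))
    return rows

rows = parse(sample)
ref_ms = rows["Buffer<RGBA8>.Load random"][0]
for name, (ms, reported) in rows.items():
    print(f"{name}: recomputed {ref_ms / ms:.3f}x, reported {reported:.3f}x")
```

The recomputed values agree with the reported column to within rounding for these three rows.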

Awesome project

Thanks for making this available. Very helpful to see hard data backing up (or not) my
instinct about memory speed for different image/buffer configurations.

I have a pair of RX 470s if you're interested in data for Polaris arch.

Aaron

NVIDIA Kepler results (GTX 760M, driver version 419.67)

PerfTest
To select adapter, use: PerfTest.exe [ADAPTER_INDEX]

Adapters found:
0: NVIDIA GeForce GTX 760M
1: Intel(R) HD Graphics 4600
2: Microsoft Basic Render Driver
Using adapter 0

Running 30 warm-up frames and 30 benchmark frames:
.............................XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Performance compared to Buffer<RGBA8>.Load random

Buffer<R8>.Load uniform: 3.073ms 62.440x
Buffer<R8>.Load linear: 195.662ms 0.981x
Buffer<R8>.Load random: 197.022ms 0.974x
Buffer<RG8>.Load uniform: 3.227ms 59.465x
Buffer<RG8>.Load linear: 195.179ms 0.983x
Buffer<RG8>.Load random: 196.785ms 0.975x
Buffer<RGBA8>.Load uniform: 3.598ms 53.329x
Buffer<RGBA8>.Load linear: 193.676ms 0.991x
Buffer<RGBA8>.Load random: 191.866ms 1.000x
Buffer<R16f>.Load uniform: 3.031ms 63.308x
Buffer<R16f>.Load linear: 195.622ms 0.981x
Buffer<R16f>.Load random: 197.009ms 0.974x
Buffer<RG16f>.Load uniform: 3.025ms 63.434x
Buffer<RG16f>.Load linear: 195.135ms 0.983x
Buffer<RG16f>.Load random: 196.860ms 0.975x
Buffer<RGBA16f>.Load uniform: 3.443ms 55.728x
Buffer<RGBA16f>.Load linear: 193.744ms 0.990x
Buffer<RGBA16f>.Load random: 191.929ms 1.000x
Buffer<R32f>.Load uniform: 2.970ms 64.605x
Buffer<R32f>.Load linear: 195.751ms 0.980x
Buffer<R32f>.Load random: 197.141ms 0.973x
Buffer<RG32f>.Load uniform: 3.175ms 60.425x
Buffer<RG32f>.Load linear: 195.351ms 0.982x
Buffer<RG32f>.Load random: 196.911ms 0.974x
Buffer<RGBA32f>.Load uniform: 3.621ms 52.985x
Buffer<RGBA32f>.Load linear: 350.658ms 0.547x
Buffer<RGBA32f>.Load random: 350.633ms 0.547x
ByteAddressBuffer.Load uniform: 3.758ms 51.055x
ByteAddressBuffer.Load linear: 191.898ms 1.000x
ByteAddressBuffer.Load random: 216.928ms 0.884x
ByteAddressBuffer.Load2 uniform: 4.682ms 40.977x
ByteAddressBuffer.Load2 linear: 390.852ms 0.491x
ByteAddressBuffer.Load2 random: 442.053ms 0.434x
ByteAddressBuffer.Load3 uniform: 572.822ms 0.335x
ByteAddressBuffer.Load3 linear: 568.316ms 0.338x
ByteAddressBuffer.Load3 random: 570.361ms 0.336x
ByteAddressBuffer.Load4 uniform: 752.691ms 0.255x
ByteAddressBuffer.Load4 linear: 758.795ms 0.253x
ByteAddressBuffer.Load4 random: 763.638ms 0.251x
ByteAddressBuffer.Load2 unaligned uniform: 4.199ms 45.692x
ByteAddressBuffer.Load2 unaligned linear: 391.542ms 0.490x
ByteAddressBuffer.Load2 unaligned random: 442.574ms 0.434x
ByteAddressBuffer.Load4 unaligned uniform: 752.793ms 0.255x
ByteAddressBuffer.Load4 unaligned linear: 758.698ms 0.253x
ByteAddressBuffer.Load4 unaligned random: 763.679ms 0.251x
StructuredBuffer<float>.Load uniform: 3.103ms 61.827x
StructuredBuffer<float>.Load linear: 195.674ms 0.981x
StructuredBuffer<float>.Load random: 196.991ms 0.974x
StructuredBuffer<float2>.Load uniform: 3.301ms 58.120x
StructuredBuffer<float2>.Load linear: 195.167ms 0.983x
StructuredBuffer<float2>.Load random: 196.749ms 0.975x
StructuredBuffer<float4>.Load uniform: 3.846ms 49.882x
StructuredBuffer<float4>.Load linear: 350.461ms 0.547x
StructuredBuffer<float4>.Load random: 350.494ms 0.547x
cbuffer{float4} load uniform: 4.478ms 42.844x
cbuffer{float4} load linear: 9217.404ms 0.021x
cbuffer{float4} load random: 3333.476ms 0.058x
Texture2D<R8>.Load uniform: 3.384ms 56.695x
Texture2D<R8>.Load linear: 202.197ms 0.949x
Texture2D<R8>.Load random: 204.327ms 0.939x
Texture2D<RG8>.Load uniform: 3.731ms 51.424x
Texture2D<RG8>.Load linear: 198.542ms 0.966x
Texture2D<RG8>.Load random: 211.881ms 0.906x
Texture2D<RGBA8>.Load uniform: 4.306ms 44.558x
Texture2D<RGBA8>.Load linear: 196.088ms 0.978x
Texture2D<RGBA8>.Load random: 195.847ms 0.980x
Texture2D<R16F>.Load uniform: 3.419ms 56.118x
Texture2D<R16F>.Load linear: 202.264ms 0.949x
Texture2D<R16F>.Load random: 204.311ms 0.939x
Texture2D<RG16F>.Load uniform: 3.673ms 52.243x
Texture2D<RG16F>.Load linear: 198.553ms 0.966x
Texture2D<RG16F>.Load random: 211.917ms 0.905x
Texture2D<RGBA16F>.Load uniform: 4.115ms 46.626x
Texture2D<RGBA16F>.Load linear: 196.084ms 0.978x
Texture2D<RGBA16F>.Load random: 350.561ms 0.547x
Texture2D<R32F>.Load uniform: 3.517ms 54.547x
Texture2D<R32F>.Load linear: 202.339ms 0.948x
Texture2D<R32F>.Load random: 204.392ms 0.939x
Texture2D<RG32F>.Load uniform: 3.705ms 51.783x
Texture2D<RG32F>.Load linear: 198.537ms 0.966x
Texture2D<RG32F>.Load random: 350.591ms 0.547x
Texture2D<RGBA32F>.Load uniform: 4.028ms 47.637x
Texture2D<RGBA32F>.Load linear: 350.589ms 0.547x
Texture2D<RGBA32F>.Load random: 350.519ms 0.547x

Could you clarify the bandwidth used?

Right now I see the timings, but the problem is that an RG8 load fetches 8x less memory than an RGBA32F load.

So it's really hard to translate this into a meaningful and useful number if you're trying to decide on the format in which to store armature bone matrices for vertex skinning.
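
To illustrate the kind of number I mean, here is a rough Python sketch that normalizes two timings by bytes per element. It assumes every test issues the same number of loads, which the output does not state, so the result is only a relative figure and not an absolute bandwidth (and with cache hits it would not be DRAM bandwidth anyway). The table and the function name are my own; the two timings are the `Buffer<RG8>.Load random` and `Buffer<RGBA32f>.Load random` rows from the RTX 4070 results elsewhere on this page.

```python
# Bytes per element for the typed formats used in the benchmark.
BYTES_PER_ELEMENT = {
    "R8": 1, "RG8": 2, "RGBA8": 4,
    "R16f": 2, "RG16f": 4, "RGBA16f": 8,
    "R32f": 4, "RG32f": 8, "RGBA32f": 16,
}

def relative_bytes_per_ms(fmt, ms, ref_fmt, ref_ms):
    """Throughput of (fmt, ms) relative to (ref_fmt, ref_ms).

    Assumption: both tests perform the same number of loads.
    """
    return (BYTES_PER_ELEMENT[fmt] / ms) / (BYTES_PER_ELEMENT[ref_fmt] / ref_ms)

# RTX 4070 rows: Buffer<RGBA32f>.Load random 15.837ms, Buffer<RG8>.Load random 8.333ms
ratio = relative_bytes_per_ms("RGBA32f", 15.837, "RG8", 8.333)
print(f"RGBA32f moves about {ratio:.1f}x the bytes per millisecond of RG8")
```

So even though the RGBA32f test takes almost twice as long, it delivers several times more data per unit of time under that assumption, which is the figure that matters when picking a storage format.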

R9 390X Results (GCN2)

Buffer<R8>.Load uniform: 11.302ms 3.907x
Buffer<R8>.Load linear: 11.327ms 3.899x
Buffer<R8>.Load random: 44.150ms 1.000x
Buffer<RG8>.Load uniform: 49.611ms 0.890x
Buffer<RG8>.Load linear: 49.835ms 0.886x
Buffer<RG8>.Load random: 49.615ms 0.890x
Buffer<RGBA8>.Load uniform: 44.149ms 1.000x
Buffer<RGBA8>.Load linear: 44.806ms 0.986x
Buffer<RGBA8>.Load random: 44.164ms 1.000x
Buffer<R16f>.Load uniform: 11.131ms 3.968x
Buffer<R16f>.Load linear: 11.139ms 3.965x
Buffer<R16f>.Load random: 44.076ms 1.002x
Buffer<RG16f>.Load uniform: 49.552ms 0.891x
Buffer<RG16f>.Load linear: 49.560ms 0.891x
Buffer<RG16f>.Load random: 49.559ms 0.891x
Buffer<RGBA16f>.Load uniform: 44.066ms 1.002x
Buffer<RGBA16f>.Load linear: 44.687ms 0.988x
Buffer<RGBA16f>.Load random: 44.066ms 1.002x
Buffer<R32f>.Load uniform: 11.132ms 3.967x
Buffer<R32f>.Load linear: 11.139ms 3.965x
Buffer<R32f>.Load random: 44.071ms 1.002x
Buffer<RG32f>.Load uniform: 49.558ms 0.891x
Buffer<RG32f>.Load linear: 49.560ms 0.891x
Buffer<RG32f>.Load random: 49.559ms 0.891x
Buffer<RGBA32f>.Load uniform: 44.061ms 1.002x
Buffer<RGBA32f>.Load linear: 44.613ms 0.990x
Buffer<RGBA32f>.Load random: 49.583ms 0.891x
ByteAddressBuffer.Load uniform: 10.322ms 4.278x
ByteAddressBuffer.Load linear: 11.546ms 3.825x
ByteAddressBuffer.Load random: 44.153ms 1.000x
ByteAddressBuffer.Load2 uniform: 11.499ms 3.841x
ByteAddressBuffer.Load2 linear: 49.628ms 0.890x
ByteAddressBuffer.Load2 random: 49.651ms 0.889x
ByteAddressBuffer.Load3 uniform: 16.985ms 2.600x
ByteAddressBuffer.Load3 linear: 44.142ms 1.000x
ByteAddressBuffer.Load3 random: 88.176ms 0.501x
ByteAddressBuffer.Load4 uniform: 22.472ms 1.965x
ByteAddressBuffer.Load4 linear: 44.212ms 0.999x
ByteAddressBuffer.Load4 random: 49.346ms 0.895x
ByteAddressBuffer.Load2 unaligned uniform: 11.422ms 3.867x
ByteAddressBuffer.Load2 unaligned linear: 49.552ms 0.891x
ByteAddressBuffer.Load2 unaligned random: 49.561ms 0.891x
ByteAddressBuffer.Load4 unaligned uniform: 22.373ms 1.974x
ByteAddressBuffer.Load4 unaligned linear: 44.095ms 1.002x
ByteAddressBuffer.Load4 unaligned random: 54.464ms 0.811x
StructuredBuffer<float>.Load uniform: 12.585ms 3.509x
StructuredBuffer<float>.Load linear: 11.770ms 3.752x
StructuredBuffer<float>.Load random: 44.176ms 1.000x
StructuredBuffer<float2>.Load uniform: 13.210ms 3.343x
StructuredBuffer<float2>.Load linear: 50.217ms 0.879x
StructuredBuffer<float2>.Load random: 49.645ms 0.890x
StructuredBuffer<float4>.Load uniform: 13.818ms 3.196x
StructuredBuffer<float4>.Load linear: 44.721ms 0.988x
StructuredBuffer<float4>.Load random: 49.666ms 0.889x
cbuffer{float4} load uniform: 16.702ms 2.644x
cbuffer{float4} load linear: 44.447ms 0.994x
cbuffer{float4} load random: 49.656ms 0.889x
Texture2D<R8>.Load uniform: 44.214ms 0.999x
Texture2D<R8>.Load linear: 44.795ms 0.986x
Texture2D<R8>.Load random: 44.808ms 0.986x
Texture2D<RG8>.Load uniform: 49.706ms 0.888x
Texture2D<RG8>.Load linear: 50.231ms 0.879x
Texture2D<RG8>.Load random: 50.200ms 0.880x
Texture2D<RGBA8>.Load uniform: 44.760ms 0.987x
Texture2D<RGBA8>.Load linear: 45.339ms 0.974x
Texture2D<RGBA8>.Load random: 45.405ms 0.973x
Texture2D<R16F>.Load uniform: 44.175ms 1.000x
Texture2D<R16F>.Load linear: 44.157ms 1.000x
Texture2D<R16F>.Load random: 44.096ms 1.002x
Texture2D<RG16F>.Load uniform: 49.739ms 0.888x
Texture2D<RG16F>.Load linear: 49.661ms 0.889x
Texture2D<RG16F>.Load random: 49.622ms 0.890x
Texture2D<RGBA16F>.Load uniform: 44.257ms 0.998x
Texture2D<RGBA16F>.Load linear: 44.267ms 0.998x
Texture2D<RGBA16F>.Load random: 88.126ms 0.501x
Texture2D<R32F>.Load uniform: 44.259ms 0.998x
Texture2D<R32F>.Load linear: 44.193ms 0.999x
Texture2D<R32F>.Load random: 44.099ms 1.001x
Texture2D<RG32F>.Load uniform: 49.739ms 0.888x
Texture2D<RG32F>.Load linear: 49.667ms 0.889x
Texture2D<RG32F>.Load random: 88.110ms 0.501x
Texture2D<RGBA32F>.Load uniform: 44.288ms 0.997x
Texture2D<RGBA32F>.Load linear: 66.145ms 0.668x
Texture2D<RGBA32F>.Load random: 88.124ms 0.501x

Raw tests seem to have extra conversions

I'm seeing a lot of extra v_cvt_f32_u32 instructions when comparing the raw tests to the structured ones. They go away if I wrap the raw loads with asfloat(). Did I make a mistake?
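
For context on why the two differ: a raw load returns a uint, so using it as a float needs a numeric conversion (v_cvt_f32_u32), whereas asfloat() only reinterprets the same 32 bits. The Python sketch below mimics both operations on the CPU. It is only an illustration of the semantics; I am assuming the shader feeds the loaded value into float math, since the shader source is not quoted here.

```python
import struct

def asfloat(bits):
    """Reinterpret the bits of a 32-bit unsigned integer as a float (like HLSL asfloat)."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def cvt_f32_u32(value):
    """Convert an unsigned integer to a float by value (what v_cvt_f32_u32 does)."""
    return float(value)

bits = 0x3F800000  # IEEE 754 single-precision bit pattern of 1.0
print(asfloat(bits))      # 1.0
print(cvt_f32_u32(bits))  # 1065353216.0
```

The first is free on the GPU (no instruction is needed), the second costs one ALU instruction per component.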

R9 Fury results

R9 Fury Nitro - 56 CU (3584 sp) @1050 MHz

Adapters found:
0: AMD Radeon (TM) R9 Fury Series
1: Microsoft Basic Render Driver
Using adapter 0

Running 30 warm-up frames and 30 benchmark frames:
.............................XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Performance compared to Buffer<RGBA8>.Load random

Buffer<R8>.Load uniform: 8.963ms 3.911x
Buffer<R8>.Load linear: 8.917ms 3.931x
Buffer<R8>.Load random: 35.058ms 1.000x
Buffer<RG8>.Load uniform: 39.416ms 0.889x
Buffer<RG8>.Load linear: 39.447ms 0.889x
Buffer<RG8>.Load random: 39.413ms 0.889x
Buffer<RGBA8>.Load uniform: 35.051ms 1.000x
Buffer<RGBA8>.Load linear: 35.048ms 1.000x
Buffer<RGBA8>.Load random: 35.051ms 1.000x
Buffer<R16f>.Load uniform: 8.898ms 3.939x
Buffer<R16f>.Load linear: 8.909ms 3.934x
Buffer<R16f>.Load random: 35.050ms 1.000x
Buffer<RG16f>.Load uniform: 39.405ms 0.890x
Buffer<RG16f>.Load linear: 39.435ms 0.889x
Buffer<RG16f>.Load random: 39.407ms 0.889x
Buffer<RGBA16f>.Load uniform: 35.041ms 1.000x
Buffer<RGBA16f>.Load linear: 35.043ms 1.000x
Buffer<RGBA16f>.Load random: 35.046ms 1.000x
Buffer<R32f>.Load uniform: 8.897ms 3.940x
Buffer<R32f>.Load linear: 8.910ms 3.934x
Buffer<R32f>.Load random: 35.048ms 1.000x
Buffer<RG32f>.Load uniform: 39.407ms 0.889x
Buffer<RG32f>.Load linear: 39.433ms 0.889x
Buffer<RG32f>.Load random: 39.406ms 0.889x
Buffer<RGBA32f>.Load uniform: 35.043ms 1.000x
Buffer<RGBA32f>.Load linear: 35.045ms 1.000x
Buffer<RGBA32f>.Load random: 39.405ms 0.890x
ByteAddressBuffer.Load uniform: 10.956ms 3.199x
ByteAddressBuffer.Load linear: 9.100ms 3.852x
ByteAddressBuffer.Load random: 35.038ms 1.000x
ByteAddressBuffer.Load2 uniform: 11.070ms 3.166x
ByteAddressBuffer.Load2 linear: 39.413ms 0.889x
ByteAddressBuffer.Load2 random: 39.411ms 0.889x
ByteAddressBuffer.Load3 uniform: 13.534ms 2.590x
ByteAddressBuffer.Load3 linear: 35.047ms 1.000x
ByteAddressBuffer.Load3 random: 70.033ms 0.500x
ByteAddressBuffer.Load4 uniform: 17.944ms 1.953x
ByteAddressBuffer.Load4 linear: 35.072ms 0.999x
ByteAddressBuffer.Load4 random: 39.149ms 0.895x
ByteAddressBuffer.Load2 unaligned uniform: 11.209ms 3.127x
ByteAddressBuffer.Load2 unaligned linear: 39.408ms 0.889x
ByteAddressBuffer.Load2 unaligned random: 39.406ms 0.890x
ByteAddressBuffer.Load4 unaligned uniform: 17.933ms 1.955x
ByteAddressBuffer.Load4 unaligned linear: 35.066ms 1.000x
ByteAddressBuffer.Load4 unaligned random: 43.241ms 0.811x
StructuredBuffer<float>.Load uniform: 12.653ms 2.770x
StructuredBuffer<float>.Load linear: 8.913ms 3.932x
StructuredBuffer<float>.Load random: 35.059ms 1.000x
StructuredBuffer<float2>.Load uniform: 12.799ms 2.739x
StructuredBuffer<float2>.Load linear: 39.445ms 0.889x
StructuredBuffer<float2>.Load random: 39.413ms 0.889x
StructuredBuffer<float4>.Load uniform: 12.834ms 2.731x
StructuredBuffer<float4>.Load linear: 35.049ms 1.000x
StructuredBuffer<float4>.Load random: 39.411ms 0.889x
cbuffer{float4} load uniform: 14.861ms 2.359x
cbuffer{float4} load linear: 35.534ms 0.986x
cbuffer{float4} load random: 39.412ms 0.889x
Texture2D<R8>.Load uniform: 35.063ms 1.000x
Texture2D<R8>.Load linear: 35.038ms 1.000x
Texture2D<R8>.Load random: 35.040ms 1.000x
Texture2D<RG8>.Load uniform: 39.430ms 0.889x
Texture2D<RG8>.Load linear: 39.436ms 0.889x
Texture2D<RG8>.Load random: 39.436ms 0.889x
Texture2D<RGBA8>.Load uniform: 35.059ms 1.000x
Texture2D<RGBA8>.Load linear: 35.061ms 1.000x
Texture2D<RGBA8>.Load random: 35.055ms 1.000x
Texture2D<R16F>.Load uniform: 35.056ms 1.000x
Texture2D<R16F>.Load linear: 35.038ms 1.000x
Texture2D<R16F>.Load random: 35.040ms 1.000x
Texture2D<RG16F>.Load uniform: 39.431ms 0.889x
Texture2D<RG16F>.Load linear: 39.440ms 0.889x
Texture2D<RG16F>.Load random: 39.436ms 0.889x
Texture2D<RGBA16F>.Load uniform: 35.054ms 1.000x
Texture2D<RGBA16F>.Load linear: 35.061ms 1.000x
Texture2D<RGBA16F>.Load random: 70.037ms 0.500x
Texture2D<R32F>.Load uniform: 35.055ms 1.000x
Texture2D<R32F>.Load linear: 35.041ms 1.000x
Texture2D<R32F>.Load random: 35.041ms 1.000x
Texture2D<RG32F>.Load uniform: 39.433ms 0.889x
Texture2D<RG32F>.Load linear: 39.439ms 0.889x
Texture2D<RG32F>.Load random: 70.039ms 0.500x
Texture2D<RGBA32F>.Load uniform: 35.054ms 1.000x
Texture2D<RGBA32F>.Load linear: 52.549ms 0.667x
Texture2D<RGBA32F>.Load random: 70.037ms 0.500x

Vega results

Just in case they are useful:

Load R8 invariant: 0.212ms
Load R8 linear: 0.209ms
Load R8 random: 0.748ms
Load RG8 invariant: 0.748ms
Load RG8 linear: 0.751ms
Load RG8 random: 0.749ms
Load RGBA8 invariant: 0.750ms
Load RGBA8 linear: 0.753ms
Load RGBA8 random: 0.751ms
Load R16f invariant: 0.210ms
Load R16f linear: 0.209ms
Load R16f random: 0.748ms
Load RG16f invariant: 0.748ms
Load RG16f linear: 0.751ms
Load RG16f random: 0.749ms
Load RGBA16f invariant: 0.750ms
Load RGBA16f linear: 0.754ms
Load RGBA16f random: 0.752ms
Load R32f invariant: 0.210ms
Load R32f linear: 0.209ms
Load R32f random: 0.749ms
Load RG32f invariant: 0.748ms
Load RG32f linear: 0.752ms
Load RG32f random: 0.750ms
Load RGBA32f invariant: 0.751ms
Load RGBA32f linear: 0.754ms
Load RGBA32f random: 0.845ms
Load1 raw32 invariant: 0.175ms
Load1 raw32 linear: 0.211ms
Load1 raw32 random: 0.750ms
Load2 raw32 invariant: 0.195ms
Load2 raw32 linear: 0.753ms
Load2 raw32 random: 0.752ms
Load3 raw32 invariant: 0.292ms
Load3 raw32 linear: 0.755ms
Load3 raw32 random: 1.492ms
Load4 raw32 invariant: 0.379ms
Load4 raw32 linear: 0.756ms
Load4 raw32 random: 0.841ms
Load2 raw32 unaligned invariant: 0.196ms
Load2 raw32 unaligned linear: 0.753ms
Load2 raw32 unaligned random: 0.751ms
Load4 raw32 unaligned invariant: 0.379ms
Load4 raw32 unaligned linear: 0.755ms
Load4 raw32 unaligned random: 0.928ms
Tex2D load R8 invariant: 0.748ms
Tex2D load R8 linear: 0.749ms
Tex2D load R8 random: 1.027ms
Tex2D load RG8 invariant: 0.750ms
Tex2D load RG8 linear: 0.751ms
Tex2D load RG8 random: 1.028ms
Tex2D load RGBA8 invariant: 0.752ms
Tex2D load RGBA8 linear: 0.754ms
Tex2D load RGBA8 random: 1.031ms
Tex2D load R16F invariant: 0.749ms
Tex2D load R16F linear: 0.750ms
Tex2D load R16F random: 1.027ms
Tex2D load RG16F invariant: 0.750ms
Tex2D load RG16F linear: 0.752ms
Tex2D load RG16F random: 1.029ms
Tex2D load RGBA16F invariant: 0.753ms
Tex2D load RGBA16F linear: 0.754ms
Tex2D load RGBA16F random: 1.539ms
Tex2D load R32F invariant: 0.748ms
Tex2D load R32F linear: 0.751ms
Tex2D load R32F random: 1.028ms
Tex2D load RG32F invariant: 0.750ms
Tex2D load RG32F linear: 0.753ms
Tex2D load RG32F random: 1.537ms
Tex2D load RGBA32F invariant: 0.753ms
Tex2D load RGBA32F linear: 1.125ms
Tex2D load RGBA32F random: 1.495ms

Zero results on NVIDIA Fermi

The benchmark shows zero results on NVIDIA Fermi (GeForce GTX 460). Maybe it is a driver bug.
I inserted Flush() between the dispatch and the End for the timestamp query, and this fixed the issue. It did not affect the results on other GPUs.

GTX 1070 Ti Results

PerfTest
To select adapter, use: PerfTest.exe [ADAPTER_INDEX]

Adapters found:
0: NVIDIA GeForce GTX 1070 Ti
1: Microsoft Basic Render Driver
Using adapter 0

Running 30 warm-up frames and 30 benchmark frames:
.............................XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Performance compared to Buffer<RGBA8>.Load random

Buffer<R8>.Load uniform: 0.845ms 36.835x
Buffer<R8>.Load linear: 29.335ms 1.061x
Buffer<R8>.Load random: 28.981ms 1.074x
Buffer<RG8>.Load uniform: 1.151ms 27.036x
Buffer<RG8>.Load linear: 30.267ms 1.028x
Buffer<RG8>.Load random: 29.359ms 1.060x
Buffer<RGBA8>.Load uniform: 1.534ms 20.286x
Buffer<RGBA8>.Load linear: 31.214ms 0.997x
Buffer<RGBA8>.Load random: 31.118ms 1.000x
Buffer<R16f>.Load uniform: 0.808ms 38.516x
Buffer<R16f>.Load linear: 28.943ms 1.075x
Buffer<R16f>.Load random: 29.870ms 1.042x
Buffer<RG16f>.Load uniform: 1.119ms 27.803x
Buffer<RG16f>.Load linear: 29.458ms 1.056x
Buffer<RG16f>.Load random: 29.904ms 1.041x
Buffer<RGBA16f>.Load uniform: 1.467ms 21.207x
Buffer<RGBA16f>.Load linear: 31.222ms 0.997x
Buffer<RGBA16f>.Load random: 30.223ms 1.030x
Buffer<R32f>.Load uniform: 0.847ms 36.746x
Buffer<R32f>.Load linear: 30.240ms 1.029x
Buffer<R32f>.Load random: 28.963ms 1.074x
Buffer<RG32f>.Load uniform: 1.087ms 28.615x
Buffer<RG32f>.Load linear: 30.391ms 1.024x
Buffer<RG32f>.Load random: 29.475ms 1.056x
Buffer<RGBA32f>.Load uniform: 1.434ms 21.706x
Buffer<RGBA32f>.Load linear: 59.394ms 0.524x
Buffer<RGBA32f>.Load random: 57.593ms 0.540x
ByteAddressBuffer.Load uniform: 18.151ms 1.714x
ByteAddressBuffer.Load linear: 18.451ms 1.686x
ByteAddressBuffer.Load random: 21.305ms 1.461x
ByteAddressBuffer.Load2 uniform: 41.123ms 0.757x
ByteAddressBuffer.Load2 linear: 40.461ms 0.769x
ByteAddressBuffer.Load2 random: 49.244ms 0.632x
ByteAddressBuffer.Load3 uniform: 44.836ms 0.694x
ByteAddressBuffer.Load3 linear: 65.966ms 0.472x
ByteAddressBuffer.Load3 random: 77.712ms 0.400x
ByteAddressBuffer.Load4 uniform: 58.439ms 0.532x
ByteAddressBuffer.Load4 linear: 97.260ms 0.320x
ByteAddressBuffer.Load4 random: 174.779ms 0.178x
ByteAddressBuffer.Load2 unaligned uniform: 41.147ms 0.756x
ByteAddressBuffer.Load2 unaligned linear: 40.483ms 0.769x
ByteAddressBuffer.Load2 unaligned random: 55.911ms 0.557x
ByteAddressBuffer.Load4 unaligned uniform: 58.126ms 0.535x
ByteAddressBuffer.Load4 unaligned linear: 99.081ms 0.314x
ByteAddressBuffer.Load4 unaligned random: 179.514ms 0.173x
StructuredBuffer<float>.Load uniform: 0.887ms 35.091x
StructuredBuffer<float>.Load linear: 29.878ms 1.042x
StructuredBuffer<float>.Load random: 29.408ms 1.058x
StructuredBuffer<float2>.Load uniform: 1.141ms 27.279x
StructuredBuffer<float2>.Load linear: 30.575ms 1.018x
StructuredBuffer<float2>.Load random: 28.985ms 1.074x
StructuredBuffer<float4>.Load uniform: 1.523ms 20.436x
StructuredBuffer<float4>.Load linear: 58.493ms 0.532x
StructuredBuffer<float4>.Load random: 58.546ms 0.532x
cbuffer{float4} load uniform: 1.390ms 22.394x
cbuffer{float4} load linear: 684.120ms 0.045x
cbuffer{float4} load random: 273.085ms 0.114x
Texture2D<R8>.Load uniform: 1.627ms 19.125x
Texture2D<R8>.Load linear: 28.924ms 1.076x
Texture2D<R8>.Load random: 28.923ms 1.076x
Texture2D<RG8>.Load uniform: 1.378ms 22.577x
Texture2D<RG8>.Load linear: 29.041ms 1.072x
Texture2D<RG8>.Load random: 28.938ms 1.075x
Texture2D<RGBA8>.Load uniform: 1.563ms 19.914x
Texture2D<RGBA8>.Load linear: 30.666ms 1.015x
Texture2D<RGBA8>.Load random: 30.334ms 1.026x
Texture2D<R16F>.Load uniform: 1.313ms 23.704x
Texture2D<R16F>.Load linear: 28.961ms 1.074x
Texture2D<R16F>.Load random: 28.968ms 1.074x
Texture2D<RG16F>.Load uniform: 1.360ms 22.883x
Texture2D<RG16F>.Load linear: 29.048ms 1.071x
Texture2D<RG16F>.Load random: 28.926ms 1.076x
Texture2D<RGBA16F>.Load uniform: 1.501ms 20.729x
Texture2D<RGBA16F>.Load linear: 30.649ms 1.015x
Texture2D<RGBA16F>.Load random: 57.629ms 0.540x
Texture2D<R32F>.Load uniform: 1.384ms 22.477x
Texture2D<R32F>.Load linear: 28.955ms 1.075x
Texture2D<R32F>.Load random: 28.968ms 1.074x
Texture2D<RG32F>.Load uniform: 1.408ms 22.101x
Texture2D<RG32F>.Load linear: 29.056ms 1.071x
Texture2D<RG32F>.Load random: 57.672ms 0.540x
Texture2D<RGBA32F>.Load uniform: 1.538ms 20.232x
Texture2D<RGBA32F>.Load linear: 57.653ms 0.540x
Texture2D<RGBA32F>.Load random: 57.557ms 0.541x
