xgl's People

Contributors

alan-baker, chuang13, cme42, crystaljinamd, davidmaoamd, dstutt, flakebi, gflegar, jacobheamd, jaebaek, jasper-bekkers, jaxlinamd, jfactory07, jfthibert, joshua-ashton, jozefkucia, kai-amd, kleinerm, kuhar, linqun, marijns95, phoenixxxx, qiaojbao, s-perron, samikhawaja, sudonatalie, trenouf, vettoreldaniele, wenqingliamd

xgl's Issues

Missing VK_COMPOSITE_ALPHA_PRE_MULTIPLIED_BIT_KHR in supportedCompositeAlpha

When querying for surface capabilities via vkGetPhysicalDeviceSurfaceCapabilitiesKHR, the supportedCompositeAlpha field always reports opaque-only even if a 32-bit visual is used. Alpha surface support would be useful for Vulkan-rendered client-side window decorations and splash screens.

When an X surface is in use, the window surface alpha-composites despite the swapchain being created opaque.

When a Wayland surface is in use, the window appears opaque when the swapchain is created that way, and PAL implements VK_COMPOSITE_ALPHA_PRE_MULTIPLIED_BIT_KHR for Wayland surfaces. Unfortunately, this upsets the Vulkan validation layers because they reconcile the swapchain create flags with the results from vkGetPhysicalDeviceSurfaceCapabilitiesKHR.
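
For reference, a minimal sketch of the query where the flag is missing (physicalDevice and surface assumed to be valid handles):

    VkSurfaceCapabilitiesKHR caps = {};
    vkGetPhysicalDeviceSurfaceCapabilitiesKHR(physicalDevice, surface, &caps);

    // On the affected driver this reports only VK_COMPOSITE_ALPHA_OPAQUE_BIT_KHR,
    // even when the window uses a 32-bit visual.
    const bool hasPreMultiplied =
        (caps.supportedCompositeAlpha & VK_COMPOSITE_ALPHA_PRE_MULTIPLIED_BIT_KHR) != 0;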

Documentation for guards using `LLPC_CLIENT_INTERFACE_MAJOR_VERSION`

LLPC_CLIENT_INTERFACE_MAJOR_VERSION helps us avoid build failures when we use the old XGL with the new LLPC.

  1. In my opinion, a document briefly explaining the rule would be helpful for understanding the context.

  2. In addition, I am still confused about how to quickly check the next LLPC_CLIENT_INTERFACE_MAJOR_VERSION number. If there were a specific file somewhere containing the next number, it would be helpful. Please let me know if I missed an already existing file for it.
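
For context, the guards in question look roughly like this (the version number 41 is purely illustrative, not an actual cut-off):

    #if LLPC_CLIENT_INTERFACE_MAJOR_VERSION >= 41
        // Use the newer LLPC interface (new struct fields, entry points, ...).
    #else
        // Fall back to the older interface so an older XGL still builds.
    #endif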

cmake takes about 15 minutes to go from -- Configuring done to -- Generating done

CMake version 3.10.2 on Arch Linux.

  cmake -H. -Bbuilds/Release64 -DCMAKE_BUILD_TYPE=Release -DCMAKE_MODULE_PATH=$PWD/../pal/cmake/Modules -DXGL_PAL_PATH:PATH=$PWD/../pal \
    -DCMAKE_C_FLAGS="-DLINUX -D__x86_64__ -D__AMD64__" -DCMAKE_CXX_FLAGS="-DLINUX -D__x86_64__ -D__AMD64__" -DXGL_LLVM_SRC_PATH=$PWD/../llvm-amdvlk

I'm not sure what's happening there but this seems excessive.

GCC 7.2 vk_physical_device.h:138:9 compile error: ISO C++ forbids comparison between pointer and integer

/home/constantine/AMDVLK/pal/inc/util/palAssert.h:96:20: error: ISO C++ forbids comparison between pointer and integer [-fpermissive]
     if ((_expr) == false)                                                                         \
                    ^
/home/constantine/AMDVLK/pal/inc/util/palAssert.h:104:27: note: in expansion of macro ‘PAL_ASSERT_MSG’
 #define PAL_ASSERT(_expr) PAL_ASSERT_MSG(_expr, "%s", "Unknown")
                           ^~~~~~~~~~~~~~
/home/constantine/AMDVLK/xgl/icd/api/include/vk_utils.h:54:19: note: in expansion of macro ‘PAL_ASSERT’
 #define VK_ASSERT PAL_ASSERT
                   ^~~~~~~~~~
/home/constantine/AMDVLK/xgl/icd/api/include/vk_physical_device.h:138:9: note: in expansion of macro ‘VK_ASSERT’
         VK_ASSERT(pVkIndex);
         ^~~~~~~~~
make[2]: *** [icd/CMakeFiles/xgl.dir/api/app_profile.cpp.o] Error 1
make[1]: *** [icd/CMakeFiles/xgl.dir/all] Error 2
make: *** [all] Error 2
constantine@linux:~/AMDVLK$ gcc --version
gcc (Ubuntu 7.2.0-18ubuntu2) 7.2.0
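
The failing expansion boils down to if ((pVkIndex) == false), a pointer/bool comparison that ISO C++ forbids. A hedged sketch of the obvious fix at the assertion site:

    // Hypothetical fix at vk_physical_device.h:138 -- assert on an explicit
    // null check instead of letting the macro compare a pointer against false:
    VK_ASSERT(pVkIndex != nullptr);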

maxMemoryAllocationSize changed to 2G, why? It breaks expectations.

Hi dear AMD Vulkan developers,
I noticed that in a recent release, VkPhysicalDeviceMaintenance3Properties::maxMemoryAllocationSize changed from 4 GiB to 2 GiB. What is the rationale behind this?

In most cases, Vulkan developers expect this value to be 4 GiB (before this change, both NVIDIA and AMD actually reported that), but now it has changed to 2 GiB, which brings some inconvenience and degrades applications that need large allocations.

Thanks for any hint about this change.
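
For reference, the limit in question is queried like this (standard Vulkan 1.1 API; physicalDevice assumed valid):

    VkPhysicalDeviceMaintenance3Properties maint3 = {};
    maint3.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_MAINTENANCE_3_PROPERTIES;

    VkPhysicalDeviceProperties2 props2 = {};
    props2.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PROPERTIES_2;
    props2.pNext = &maint3;

    vkGetPhysicalDeviceProperties2(physicalDevice, &props2);
    // maint3.maxMemoryAllocationSize reported 4 GiB before the change, 2 GiB after.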

Build error: no matching function for call to 'CloneModule(llvm::Module&)'

I'm getting the following build error with 03a38de and gcc 7.3.1 20180303 (Red Hat 7.3.1-5)

[1579/1909] Building CXX object llpc/CMakeFiles/llpc.dir/context/llpcContext.cpp.o
FAILED: llpc/CMakeFiles/llpc.dir/context/llpcContext.cpp.o 
/usr/bin/c++  -DICD_VULKAN_1_1 -DLITTLEENDIAN_CPU -D_SPIRV_LLVM_API -I../icd/api/llpc/include -I../icd/api/llpc/context -I../icd/api/llpc/imported/metrohash/src -I../icd/api/llpc/lower -I../icd/api/llpc/patch -I../icd/api/llpc/patch/gfx6/chip -I../icd/api/llpc/patch/gfx9/chip -I../icd/api/llpc/patch/gfx9 -I../icd/api/llpc/patch/generate -I../icd/api/llpc/translator -I../icd/api/llpc/translator/libSPIRV -I../icd/api/llpc/translator/Mangler -I../icd/api/llpc/util -I../../pal/src/core/hw/gfxip/gfx6/chip -I../../pal/src/core/hw/gfxip/gfx9/chip -I../../pal/inc/core -I../../pal/inc/util -I/builddir/build/BUILD/amdvlk-vulkan-driver-2.19/llvm/include -Illvm/include -I../icd/api/llpc/../include/khronos -O2 -g -DNDEBUG   -Wno-unused-parameter -Wno-type-limits -Wno-switch -Wno-parentheses -Wno-maybe-uninitialized -Wno-delete-non-virtual-dtor -Wno-sign-compare -Wno-delete-incomplete -Wno-unused -Wno-ignored-qualifiers -Wno-missing-field-initializers -Wno-invalid-offsetof -fno-strict-aliasing -std=c++0x -fno-rtti -fPIC -MD -MT llpc/CMakeFiles/llpc.dir/context/llpcContext.cpp.o -MF llpc/CMakeFiles/llpc.dir/context/llpcContext.cpp.o.d -o llpc/CMakeFiles/llpc.dir/context/llpcContext.cpp.o -c ../icd/api/llpc/context/llpcContext.cpp
../icd/api/llpc/context/llpcContext.cpp: In constructor 'Llpc::Context::Context(Llpc::GfxIpVersion)':
../icd/api/llpc/context/llpcContext.cpp:159:63: error: no matching function for call to 'CloneModule(llvm::Module&)'
         m_pNativeGlslEmuLib = CloneModule(*m_pGlslEmuLib.get());
                                                               ^
In file included from ../icd/api/llpc/context/llpcContext.cpp:47:0:
/builddir/build/BUILD/amdvlk-vulkan-driver-2.19/llvm/include/llvm/Transforms/Utils/Cloning.h:52:25: note: candidate: std::unique_ptr<llvm::Module> llvm::CloneModule(const llvm::Module*)
 std::unique_ptr<Module> CloneModule(const Module *M);
                         ^~~~~~~~~~~
/builddir/build/BUILD/amdvlk-vulkan-driver-2.19/llvm/include/llvm/Transforms/Utils/Cloning.h:52:25: note:   no known conversion for argument 1 from 'llvm::Module' to 'const llvm::Module*'
/builddir/build/BUILD/amdvlk-vulkan-driver-2.19/llvm/include/llvm/Transforms/Utils/Cloning.h:53:25: note: candidate: std::unique_ptr<llvm::Module> llvm::CloneModule(const llvm::Module*, llvm::ValueToValueMapTy&)
 std::unique_ptr<Module> CloneModule(const Module *M, ValueToValueMapTy &VMap);
                         ^~~~~~~~~~~
/builddir/build/BUILD/amdvlk-vulkan-driver-2.19/llvm/include/llvm/Transforms/Utils/Cloning.h:53:25: note:   candidate expects 2 arguments, 1 provided
/builddir/build/BUILD/amdvlk-vulkan-driver-2.19/llvm/include/llvm/Transforms/Utils/Cloning.h:60:1: note: candidate: std::unique_ptr<llvm::Module> llvm::CloneModule(const llvm::Module*, llvm::ValueToValueMapTy&, llvm::function_ref<bool(const llvm::GlobalValue*)>)
 CloneModule(const Module *M, ValueToValueMapTy &VMap,
 ^~~~~~~~~~~
/builddir/build/BUILD/amdvlk-vulkan-driver-2.19/llvm/include/llvm/Transforms/Utils/Cloning.h:60:1: note:   candidate expects 3 arguments, 1 provided
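
This looks like an LLVM interface mismatch: the call site uses the newer CloneModule overload taking a const llvm::Module&, while the LLVM headers in this build still declare only the pointer-taking overloads. A hedged, version-tolerant sketch (the feature-test macro is a placeholder; the proper fix is building against the llvm-amdvlk revision this XGL commit expects):

    #if defined(LLVM_CLONEMODULE_TAKES_REFERENCE) // placeholder feature test
        m_pNativeGlslEmuLib = CloneModule(*m_pGlslEmuLib);      // newer LLVM: const Module&
    #else
        m_pNativeGlslEmuLib = CloneModule(m_pGlslEmuLib.get()); // older LLVM: const Module*
    #endif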

XGL cache creator tool

Background

There has been a lot of work in LLPC over the past 9 months to implement relocatable shaders. These are intended to provide a way to compile shaders “offline”, that is, without running a Vulkan application. However, for that to be useful, there must be a way for a Vulkan application to make use of the precompiled shaders.

To this end, we want to write a tool that will take the precompiled shaders and build a file whose contents can be passed as the initial data to vkCreatePipelineCache. This tool should live in the XGL repository because XGL controls the format of the cache when it is serialized by vkGetPipelineCacheData.

This will be a standalone tool that will have its own subdirectory in the tools directory.

Implementation details

Prerequisites

  • Game developers must be able to use amdllpc to compile shaders and get an ELF file.
  • The ELF file will contain the cache hash for the shader/pipeline in the PAL metadata.
    • The PAL metadata already contains the internal pipeline hash, but that seems to be compacted to 64 bits before it is assigned. Could this be expanded to 128 bits?
    • This entry is only added during pipeline finalization, so the PAL metadata for a relocatable shader does not currently contain it.
    • If we decide we only want this to work for relocatable shaders, then we could add something specific to relocatable shaders, but I would like something more general.

XGL cache creator

Command line interface

xgl_cache_creator [options] <input ELF files>

Options:

-o <filename>          - The filename to output the cache data to. 
                         Required.
-device_id=<device id> - The device id of the device this cache will be used on.
                         The device id can be found at
                         https://devicehunt.com/view/type/pci/vendor/1002.
                         If this option is not provided, the device id will be
                         queried from the runtime.
-uuid=<uuid>           - The UUID for the specific driver and machine.
                         <How can the uuid be found?>
                         If this option is not provided, the UUID will be
                         queried from the runtime.

Algorithm

  • Open the output file and set the write position past the header size.
  • Initialize the platform key using the UUID.
  • For each input ELF file
    • Open the file, and copy the contents to the output buffer.
    • Add the contents of the file to the hash context.
      • I would like to avoid having everything in memory at the same time.
  • Output the PipelineBinaryCachePrivateHeader using the standard malloc and free as the allocators.
  • Output the header generated by vkGetPipelineCacheData. (A sketch of this flow follows.)
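
A self-contained sketch of that flow, with a placeholder header type (the real tool would emit vk::PipelineCacheHeaderData and vk::PipelineBinaryCachePrivateHeader, and key the hash as described above):

    #include <cstdint>
    #include <fstream>
    #include <string>
    #include <vector>

    struct PlaceholderHeaders { uint8_t bytes[52]; }; // stand-in for both headers

    void BuildCacheBlob(const std::vector<std::string>& elfPaths, const std::string& outPath)
    {
        std::ofstream out(outPath, std::ios::binary);
        out.seekp(sizeof(PlaceholderHeaders)); // reserve room; headers are written last

        for (const auto& path : elfPaths)
        {
            std::ifstream in(path, std::ios::binary);
            char buf[4096];
            // Stream each ELF through a small buffer so nothing is fully resident;
            // the same bytes would also be fed to the (omitted) hash context here.
            while (in.read(buf, sizeof(buf)), in.gcount() > 0)
                out.write(buf, in.gcount());
        }

        PlaceholderHeaders hdr = {}; // fill with device id, UUID, hash id, ...
        out.seekp(0);
        out.write(reinterpret_cast<const char*>(&hdr), sizeof(hdr));
    }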

Task list

  • Modify the internal pipeline hash to be a 128-bit value for the cache hash. (Cannot do)
  • Modify LLPC to emit a new ELF section llpc_cache_hash containing the 128-bit hash for the ELF file being generated.
  • Modify the unlinked shader path in LLPC to add the internal pipeline hash to the metadata.
  • Refactor PhysicalDevice::InitializePlatformKey so:
    • The UUID is passed in as a parameter and used in place of the device properties.
    • Allocation functions are passed as parameters so the vk instance is not needed.
    • The time stamp is not used since the UUID is already the result of hashing the time stamp.
    • The platform key that is created is returned.
    • Make it available to the cache creator tool without needing a PhysicalDevice.
  • Refactor CalculateHashId:
    • Replace the pInstance parameter with the allocation and deallocation functions, so that an Instance is not needed.
    • Make it available to the cache creator tool.
  • Refactor vkGetPipelineCacheData code that writes the header into a function (WriteVkCacheHeader?) the cache creator tool can call.
  • Write the cache creator tool:
    • Uses the new InitializePlatformKey, CalculateHashId, and WriteVkCacheHeader.
    • Uses the ElfReader to read the ELF and extract the PAL metadata.
    • Uses MsgPackReader to read the PAL metadata to get the hash.

Cache info test reads unaligned memory

The Undefined Behavior Sanitizer reports a misaligned read in a CacheInfo unit test:

Script:
--
/vulkandriver/builds/ci-build/tools/test/cache-creator/unittests/./CacheCreatorUnitTests --gtest_filter=CacheInfoTest.ValidBlobOneEntry
--
Note: Google Test filter = CacheInfoTest.ValidBlobOneEntry
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from CacheInfoTest
[ RUN      ] CacheInfoTest.ValidBlobOneEntry
/vulkandriver/drivers/xgl/tools/cache_creator/unittests/cache_info_tests.cpp:160:38: runtime error: constructor call on misaligned address 0x608000000264 for type 'vk::BinaryCacheEntry', which requires 8 byte alignment
0x608000000264: note: pointer points here
  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
              ^ 
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /vulkandriver/drivers/xgl/tools/cache_creator/unittests/cache_info_tests.cpp:160:38 in 

For the current code to work, cache blob entry sizes need to be multiples of alignof(vk::BinaryCacheEntry). We should either make sure that this is the case or copy raw bytes to aligned memory before reading.
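
A sketch of the second option, assuming vk::BinaryCacheEntry is trivially copyable:

    #include <cstring>

    // Copy possibly-unaligned bytes into a properly aligned local before use;
    // std::memcpy is legal at any source alignment.
    template <typename EntryT>
    EntryT ReadEntryUnaligned(const void* pUnaligned)
    {
        EntryT entry;
        std::memcpy(&entry, pUnaligned, sizeof(EntryT));
        return entry;
    }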

vkCreateRenderPass2 crash with null pDepthStencilResolveAttachment

depthStencilResolveAttachment.Init(*(pExtInfo->pDepthStencilResolveAttachment));

This can crash, as it may dereference a NULL pointer:

https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VkSubpassDescriptionDepthStencilResolveKHR.html

pDepthStencilResolveAttachment is NULL or a pointer to a VkAttachmentReference2 structure defining the depth/stencil resolve attachment for this subpass and its layout.

If pDepthStencilResolveAttachment is NULL, or if its attachment index is VK_ATTACHMENT_UNUSED, it indicates that no depth/stencil resolve attachment will be used in the subpass.

In ANGLE, we were relying on drivers accepting that pDepthStencilResolveAttachment can be NULL. The crash was noticed on Windows with Adrenalin version 23.5.2, but I suspect the open-source amdvlk driver shares some of the same code.
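
A hedged sketch of a guard matching the spec language quoted above:

    // Hypothetical fix: skip the resolve attachment when it is absent or unused.
    if ((pExtInfo->pDepthStencilResolveAttachment != nullptr) &&
        (pExtInfo->pDepthStencilResolveAttachment->attachment != VK_ATTACHMENT_UNUSED))
    {
        depthStencilResolveAttachment.Init(*(pExtInfo->pDepthStencilResolveAttachment));
    }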

Making cache creator usable with the open-source AMDVLK

Objective

In addition to making cache creator usable for our internal partners/developers, we want to reach an agreement on how to make the open source XGL cache creator implementation usable. Open-source users/developers should be able to compile shaders offline and bundle them as a valid pipeline binary cache file on a machine without the target GPU, and then be able to load it on a machine with a compatible AMDVLK/AMDGPU installation.

We propose changing how the hashes included in pipeline cache blobs are calculated, and adding a new tool, cache-info, to enable this cache creation flow.

Background

Our prototype cache-creator implementation is available on a public XGL fork. The intended offline relocatable shader cache creation flow looks as follows:

  1. Compile shaders into .spv files.
  2. Compile all .spv files offline with amdllpc into relocatable shader ELFs:
    amdllpc -gfxip=9.0.0 -unlinked -enable-relocatable-shader-elf shader.spv -o ./elfs/shader.elf
    
  3. Bundle all ELF files into a pipeline binary cache blob file with the cache-creator tool:
    tools/cache-creator --device-id=0x6860 --uuid=6d4570f9-78b4-ef2f-fb61-f46919af88b4 -o cache.bin ./elfs/*.elf
    
  4. Run the vulkan application and provide the pipeline cache file as a blob with vkCreatePipelineCache.

Before loading the cache contents, the ICD performs a series of checks that make sure that the cache blob: (1) is not corrupted, (2) matches the host driver/GPU. Both of these are performed based on the cache headers:

(1) vk::PipelineCacheHeaderData::headerLength and vk::PipelineBinaryCachePrivateHeader::hashId.
(2) vk::PipelineCacheHeaderData::{headerVersion,vendorID,deviceID,UUID} and vk::PipelineBinaryCachePrivateHeader::hashId.

With AMDVLK, headerVersion and vendorID are known constants, while deviceID is known on a per-GPU-model basis. UUID and hashId are calculated based on the runtime host system configuration and include bits derived from build timestamps. Some unnecessary runtime system information leaking into hashId makes it difficult to build caches offline on a non-target system, which affects both usability and testability of the cache-creator tool.

Pipeline binary cache format changes for offline relocatable cache creation

We want to be able to run the cache-creator tool on machines without the target GPU. This will also allow us to set up testing infrastructure on any VM without AMD GPU. As in the original cache-creator tool proposal, we want to be able to specify all the necessary hardware/system information through command line arguments.

  1. Device ID (--device-id) and pipeline cache UUID (--uuid) will be extracted from a valid pipeline cache file created on the target machine.
    a. We propose to add a new tool, cache-info, that, given an existing pipeline cache file, outputs its header information and (optionally) a summary of the pipeline cache entries. You can find a prototype implementation here. A simple Vulkan application can create a pipeline cache on the target machine and save it to a file (prototype here), while the rest of the cache creation can run on any other machine. This can be further simplified; see the bottom of the section for details.
    b. We separately propose to drop some runtime properties from pipeline cache UUID computation, namely: CPU and memory system information, as they don't affect pipeline compatibility.

A sample cache-info tool invocation looks like this:

$ tools/cache-info cache.bin --elf-source-dir=./elfs
Read: cache.bin, 407588 B

=== Vulkan Pipeline Cache Header ===
header length:          32
header version:         1
vendor ID:              0x1002
device ID:              0x6860
pipeline cache UUID:    e333659f-a1b8-bafe-f19b-04dc792f9e99

=== Pipeline Binary Cache Private Header ===
header length:  20
hash ID:        f1a233ff dd873311 0004432 ffff0ddf 2a303455

=== Cache Blob Info ===
content size:   407536

        *** Entry 0 ***
        hash ID:        4198f242 044ad73b 02283345 4845466c
        data size:      2144
        MD5 sum:        940c7b25c14d997209f7726611c73c1c
        source elf:     ./elfs/attachmentread.frag.elf

        *** Entry 1 ***
        hash ID:        67af7c86 06930f37 7bcf4d76 57426594
        data size:      1552
        MD5 sum:        1c4d73ed2b8ee5535e876940cd710fb6
        source elf:     ./elfs/attachmentread.vert.elf

Note that the second argument, --elf-source-dir, is optional and only used for testing/debugging.

  2. vk::PipelineBinaryCachePrivateHeader::hashId is calculated with a hash algorithm and initial data provided by Util::IPlatformKey, which includes the full VkPhysicalDeviceProperties struct (including the pipeline cache UUID). We cannot read this data off a pipeline cache file, because it blends together the system information and the pipeline cache data. Some possible solutions are:
    a. Include the missing physical device properties (if any) in the UUID itself and use it as the initial data to calculate hashId.
    b. Make it possible to serialize IPlatformKey and save it in a new field in PipelineBinaryCachePrivateHeader. This way we can read it off existing pipeline binary cache files and pass it as a new argument to the cache-creator tool.

The changes in 2. could be conditionally enabled only in the relocatable compilation mode if you don't want to change the behavior of regular online full-pipeline compilation. We would prefer the first option (2a) and would like to understand which physical device properties and system info fields actually affect cache compatibility, and to use only those.

If 2a sounds like a good way forward, we could further simplify the cache creation flow by extending the vulkaninfo tool to print pipeline cache UUID, so that we can take IDs from its output instead of having to create a cache file with some Vulkan application and read those values with cache-info.

Testing

Cache creator unit tests in XGL

We propose to use a unit testing framework in XGL, e.g., Doctest or Google Test. You can find the prototype unit test implementation with Doctest here. The main difference between the two frameworks is that Doctest consists of a single header file, which makes it a very small dependency that is easy to build with any build system. Concretely, unit tests will be a new CMake executable target and will execute on any build machine, i.e., a GPU won't be required.

Unit tests will only be built when the rest of cache creator is enabled with -DXGL_ENABLE_CACHE_CREATOR=ON, but we could further hide them behind an additional CMake option. We will run these unit tests in the public LLPC CI and could also easily set up a similar CI for XGL based on GitHub Actions.

LIT tests

We further propose to add offline end-to-end tests based on the LIT testing infrastructure. LIT is used in LLPC to run the shaderdb tests. The testing flow will look as follows:

  1. Compile a few SPIR-V files with amdllpc in the relocatable compilation mode.
  2. Create a pipeline binary cache file based on the ELF files using cache-creator with a fake deviceID and UUID.
  3. Run cache-info to check that the cache header is fine and the cache contents match the ELF files from step 1.

Note that this doesn't require a GPU, so these tests can be placed either in XGL or LLPC's shaderdb.

End-to-end cache hit tests

We don't have a full proposal yet for GPU-based tests that would check whether we get the expected cache hits. These would have to run under Jenkins, as public VMs don't come with AMD GPUs.

One important change that we propose is to add an option to disable the ArchiveLayer in PipelineBinaryCache, so that we don't get unintended cache hits based on system-level persistent cache files.

Summary

We propose to make cache-creator usable and testable by using a flow where cache header IDs can be read from valid pipeline cache files using a new tool, cache-info, and then provided to the cache-creator tool. To make this possible, we need to modify how the cache header field hashId is handled.

We propose to introduce two levels of public tests: unit tests in XGL and LIT tests in XGL or LLPC, which won't require a GPU to run. To make it easier to test cache hits with Vulkan applications, we propose to add an option to disable the ArchiveLayer in PipelineBinaryCache.

The prototype implementation that allows the proposed offline flows and unit tests are available at https://github.com/kuhar/xgl/tree/cct/tools/cache_creator.

vkResetCommandPool slower than recreating, regardless of TRANSIENT_BIT

Hi, I hope this is the right repository,

I'm testing out a pattern where, instead of using RESET_COMMAND_BUFFER_BIT and freeing individual command buffers, I reset the entire pool. I currently record one command buffer with one vkCmdFillBuffer and one barrier, then bind things and record 30 vkCmdDispatches. On the next frame, I synchronize to make sure execution of that command buffer has finished, then reset the entire pool. Without the transient flag, the reset takes around 13 ms; with the transient flag enabled, it still takes 4 ms. So I ended up destroying the pool instead and recreating a brand new one to replace it, which is pretty much free by comparison; it doesn't even register when profiling. The two patterns are sketched below.
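
For clarity, the two patterns being compared (a sketch; device, pool, and queueFamilyIndex assumed valid):

    // 1) Reset in place -- measured here at ~13 ms (~4 ms with TRANSIENT_BIT):
    vkResetCommandPool(device, pool, 0);

    // 2) Destroy and recreate -- effectively free by comparison:
    vkDestroyCommandPool(device, pool, nullptr);
    VkCommandPoolCreateInfo info = {};
    info.sType = VK_STRUCTURE_TYPE_COMMAND_POOL_CREATE_INFO;
    info.flags = VK_COMMAND_POOL_CREATE_TRANSIENT_BIT;
    info.queueFamilyIndex = queueFamilyIndex;
    vkCreateCommandPool(device, &info, nullptr, &pool);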

Is that a known performance cliff?

sample mask input to fs shouldn't force persample execution

From my reading of the spec, using gl_SampleID or gl_SamplePosition should force per-sample execution; however, the spec doesn't seem to say that the gl_SampleMask input causes per-sample execution.

AMDVLK sets m_pResUsage->builtInUsage.fs.runAtSampleRate for gl_SampleMask.

Bugs in CmdBuffer::BindTransformFeedbackBuffers()

  1. At vk_cmdbuffer.cpp:5382 I believe the memset() should go into the preceding conditional.

  2. The Vulkan spec says that pSizes=NULL is a valid argument (to indicate whole buffer). But if pSizes is NULL we'll segfault at line 5401.

  3. I think the for loop at line 5392 is wrong. I think it should be for (i = 0; i < bindingCount; i++). Then, inside the loop, the m_pTransformFeedbackState->params.target[] array should be indexed with firstBinding + i instead. Also, the 1 << i expression should then be 1 << (firstBinding + i). A self-contained model of this fix follows.
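
A self-contained model of the indexing fix in item 3 (the surrounding names are simplified stand-ins, not the driver's actual types):

    #include <cstdint>

    // Bind `bindingCount` buffers starting at `firstBinding`: the destination
    // slot and the enable bit both use firstBinding + i, while the source
    // array is indexed from zero.
    void BindRange(uint32_t firstBinding, uint32_t bindingCount,
                   const uint64_t* pBufferAddrs, // bindingCount entries
                   uint64_t* pTargets,           // full target array
                   uint32_t* pEnabledMask)
    {
        for (uint32_t i = 0; i < bindingCount; ++i)
        {
            pTargets[firstBinding + i] = pBufferAddrs[i];
            *pEnabledMask |= 1u << (firstBinding + i);
        }
    }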

LLVM error when using both OpImageSampleDref* and OpImageSample* on the same image

Sampling from the same image with both an ordinary texture sampler and a depth-compare sampler causes the shader compiler to fail with the following error:

LLVM ERROR: Error: Attempt to redefine function: 
; Function Attrs: nounwind
declare spir_func %spirv.SampledImage.float_1_1_0_0_1_0_0 addrspace(1)* @_Z12SampledImagePU3AS140__spirv_SampledImage_float_1_0_0_0_1_0_0i(%spirv.SampledImage.float_1_0_0_0_1_0_0 addrspace(1)*, i32) #0
 => %spirv.SampledImage.float_1_0_0_0_1_0_0 addrspace(1)* (%spirv.SampledImage.float_1_0_0_0_1_0_0 addrspace(1)*, i32)

An example shader which triggers the issue can be found here (both disassembled SPIR-V and GLSL):
https://gist.github.com/doitsujin/bfc03c0a432c441426f10f0493c1d4f7

Changing the SPIR-V shader so that the same OpTypeSampledImage is used for both operations fixes the error, although it obviously leads to incorrect results.

Correctness of DefaultReallocFunc

I think the implementation of DefaultReallocFunc may be running into two issues:

  1. Out of bounds accesses to the source buffer when size is more than the size of the original allocation. This is because it copies up to the size of the destination buffer, which will typically be more than the original buffer.
  2. Doesn't free when size is 0.

Vulkan specs: https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/PFN_vkReallocationFunction.html.
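
A hedged sketch of a realloc callback honoring both points (the real PFN_vkReallocationFunction also carries alignment and allocation-scope parameters, omitted here; tracking the original allocation's size is the implementation's job):

    #include <algorithm>
    #include <cstdlib>
    #include <cstring>

    void* ReallocSketch(void* pOriginal, size_t originalSize, size_t newSize)
    {
        if (newSize == 0) // issue 2: a zero size must behave like a free
        {
            free(pOriginal);
            return nullptr;
        }
        void* pNew = malloc(newSize);
        if ((pNew != nullptr) && (pOriginal != nullptr))
        {
            // issue 1: copy only up to the *original* allocation's size
            memcpy(pNew, pOriginal, std::min(originalSize, newSize));
            free(pOriginal);
        }
        return pNew;
    }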

Verification error using R32ui image format

ERROR: Fails to verify module (llpc-pass-external-lib-link): Invalid bitcast
  %1 = bitcast i32 %texel to <4 x float>

Shader generated from HLSL through DXC: test.spvas.txt

The image format is R32ui, but all the library functions for image writes seem to expect four component vectors (from g_glslImageOpEmu.ll):

define void @llpc.image.write.u32.Buffer(i32 %resourceDescSet, i32 %resourceBinding, i32 %resourceIdx ,i32 %coord ,<4 x i32> %texel ,i1 %glc ,i1 %slc ,i32 %imageCallMeta) #1
{
    %resource = call <4 x i32> @llpc.descriptor.load.texelbuffer(i32 %resourceDescSet, i32 %resourceBinding, i32 %resourceIdx)
    %1 = bitcast <4 x i32> %texel to <4 x float>
    call void @llvm.amdgcn.buffer.store.format.v4f32(<4 x float> %1, <4 x i32> %resource, i32 %coord, i32 0, i1 %glc, i1 %slc)
    ret void 
}

That image write is invoked in the shader as:

  call void @llpc.image.write.u32.Buffer(i32 4, i32 69, i32 0, i32 %6, i32 64, i1 false, i1 false, i32 326) #0

The texel operand has a mismatched type.

Mismatch of OpName on entry point and OpEntryPoint name causes crash

When compiling a shader where the OpEntryPoint declares one string (matching the VkPipelineShaderStageCreateInfo name), but the entry point function has an OpName with a different string, the pipeline compile crashes.

The code that identifies the SPIR-V function that is the entry point we want is here in SPIRVToLLVM::translate.

  // Find the targeted entry-point in this translation
  for (unsigned I = 0, E = BM->getNumFunctions(); I != E; ++I) {
    auto BF = BM->getFunction(I);
    ExecutionModel ExecModel = ExecutionModelMax;
    bool IsEntry = BM->isEntryPoint(BF->getId(), &ExecModel);
    if (IsEntry && ExecModel == EntryExecModel && BF->getName() == EntryName) {
      EntryTarget = BF;
      break;
    }
  }

However, getName() returns the OpName; it should be checking only against the declared entry point name. Consider that even if there isn't a mismatch, there could be an entry point declared earlier with a different entry point name but with the OpName we are searching for, since an OpName can be anything, including a duplicate.
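
A hedged sketch of the fix (getEntryPointName is hypothetical; the point is to compare against the OpEntryPoint string rather than the OpName returned by getName()):

    // Find the targeted entry-point in this translation
    for (unsigned I = 0, E = BM->getNumFunctions(); I != E; ++I) {
      auto BF = BM->getFunction(I);
      ExecutionModel ExecModel = ExecutionModelMax;
      bool IsEntry = BM->isEntryPoint(BF->getId(), &ExecModel);
      if (IsEntry && ExecModel == EntryExecModel &&
          BM->getEntryPointName(BF->getId()) == EntryName) { // not BF->getName()
        EntryTarget = BF;
        break;
      }
    }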

Note that if there is no OpName at all for the entry point ID, then everything works for me.

Minimal repro compute shader SPIR-V that will crash when passed to vkCreateComputePipelines with appropriate boilerplate pointing to entry point "rdc":

               OpCapability Shader
               OpMemoryModel Logical GLSL450
               OpEntryPoint GLCompute %not_rdc "rdc"
               OpExecutionMode %not_rdc LocalSize 128 1 1
               OpName %main "main"
               OpName %not_rdc "not_rdc"
       %void = OpTypeVoid
          %4 = OpTypeFunction %void
       %main = OpFunction %void None %4
          %5 = OpLabel
               OpReturn
               OpFunctionEnd
    %not_rdc = OpFunction %void None %4
          %6 = OpLabel
          %7 = OpFunctionCall %void %main
               OpReturn
               OpFunctionEnd

If you remove the OpName %not_rdc "not_rdc" line, it works OK.

FYI I ran into this on amdvlk as built from latest of today (GPUOpen-Drivers/pal@91e30f1, 3e2d125, GPUOpen-Drivers/llvm@eb5eb1c).

vkQueueSubmit2 does not respect VkCommandBufferSubmitInfo::deviceMask

The deviceMask member of VkCommandBufferSubmitInfo isn't read by Queue::Submit; only the device mask from VkDeviceGroupSubmitInfo (the vkQueueSubmit path) is read. The submit instead behaves as if every device mask were 1, i.e. only the primary/first GPU executes. This includes a deviceMask of zero, which for vkQueueSubmit2 means execute on all devices, unlike the vkQueueSubmit case where the absence of VkDeviceGroupSubmitInfo means primary only.

Bugs in CmdBuffer::BeginTransformFeedback() and CmdBuffer::EndTransformFeedback()

According to the spec, "...If pCounterBufferOffsets is NULL, then it is assumed the offsets are zero." So NULL pCounterBufferOffsets should be legal.

However, in both the BeginTransformFeedback() and EndTransformFeedback() functions, pCounterBufferOffsets is accessed without checking for NULL, which causes a segfault when the application passes NULL to either function.
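
A hedged sketch of the guard, matching the spec language quoted above:

    // Hypothetical fix inside the per-buffer loop of both functions:
    const VkDeviceSize counterOffset =
        (pCounterBufferOffsets != nullptr) ? pCounterBufferOffsets[i] : 0;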

Settings folder should be configurable

In some situations, the settings folder that XGL reads its settings files from (/etc/amd) is not ideal. For instance, during development or testing when those settings should be changed, modifying files in /etc/amd would require root access. The folder used to read the settings files from should instead be made configurable (e.g., via an environment variable).

vkQueueSubmit/vkQueueSubmit2 returns VK_TIMEOUT when a non-primary GPU is asked to wait on a non-signaled timeline semaphore.

When using a timeline semaphore that will be signaled from the same thread after the submit, this line in vk_queue.cpp (and potentially others, but that's where I found it while debugging) can see PAL return a timeout, causing the function to early-out and return VK_TIMEOUT, which is not a valid return code for vkQueueSubmit/vkQueueSubmit2. The timeout also causes other spec compliance issues, but they all appear to be the result of the underlying timeout.

Reproducing requires a signal operation in the same submit, but the problem only presents itself when the wait semaphore is not yet signaled. The presence or absence of a command buffer in the submit does not appear to affect the result.

Produced using a device group of two W5700 GPUs on Ubuntu 22.04 through vkQueueSubmit2, with a timeline semaphore wait executed on device index 1 for a semaphore that is signaled after the submit. The same operation with device index 0 completes as expected after signaling on the host. When used in a single-device device group, both physical devices behave correctly.

If needed, I can provide code that reproduces the issue on my machine.

Write immutable sampler to the descriptor sets

When compiling a shader using relocatable shaders in LLPC, the shader compiler does not have access to the value of the immutable sampler. In fact, the compiler does not even know that a sampler will be an immutable sampler. This means that if a pipeline has an immutable sampler, then LLPC has to fall back to whole-pipeline compilation.

I would like to fix this by having XGL write the value of the immutable sampler to its descriptor set. This way it will look like a regular sampler to the shader. Let me know if there are any objections to this. I can work on the implementation if we decide to move forward with it.

Writing out the value can be conditional on the EnableRelocatableShaders setting if you want.

Incorrect assertion for `VkRenderPassAttachmentBeginInfo` with a non-imageless `VkFramebuffer`

When running my application with a debug build of AMDVLK, I get an assertion here:

VK_ASSERT(pRenderPassAttachmentBeginInfo->attachmentCount == attachmentCount);

vkCmdBeginRenderPass is being called with a normal non-imageless VkFramebuffer, and the VkRenderPassBeginInfo chain contains a VkRenderPassAttachmentBeginInfo structure with its attachmentCount == 0.
This is explicitly permitted by the spec for non-imageless framebuffers here:

If framebuffer was created with a VkFramebufferCreateInfo::flags value that did not include VK_FRAMEBUFFER_CREATE_IMAGELESS_BIT, and the pNext chain includes a VkRenderPassAttachmentBeginInfo structure, its attachmentCount must be zero.

It looks like the following call to SetImageViews(pRenderPassAttachmentBeginInfo) doesn't actually do anything when attachmentCount == 0, so this is probably just an incorrect assertion.
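
A hedged sketch of a corrected assertion that permits the spec-allowed zero case:

    // Hypothetical fix: attachmentCount == 0 is valid for non-imageless framebuffers.
    VK_ASSERT((pRenderPassAttachmentBeginInfo->attachmentCount == attachmentCount) ||
              (pRenderPassAttachmentBeginInfo->attachmentCount == 0));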

Invalid unit in VkPipelineCreationFeedbackEXT::duration on Windows

Hi, we've encountered that vkCreateGraphicsPipelines/vkCreateComputePipelines return the number of CPU ticks instead of nanoseconds in VkPipelineCreationFeedbackEXT::duration on Windows. By specification, VkPipelineCreationFeedbackEXT::duration is the duration spent creating a pipeline or pipeline stage, in nanoseconds.

Cleanup of CreateMemoryPoolAndSubAllocate does not handle OOM properly

The Vulkan CTS has a number of OOM tests that caused double-free errors in InternalMemMgr. CreateMemoryPoolAndSubAllocate tries to access uninitialized pointers and tries to free them.

The following snippet of code gets executed when the pool creation fails; here pInternalMemory would be uninitialized on failure.

else
    {
        auto it = pOwnerList->Begin();
        bool needEraseFromOwnerList = pOwnerList->NumElements() > 0 ?
            (it.Get()->groupMemory.PalMemory(DefaultDeviceIndex) ==
             pInternalMemory->groupMemory.PalMemory(DefaultDeviceIndex)) : false;

        // Unmap any persistently mapped memory
        pInternalMemory->groupMemory.Unmap();

        // Destroy the buddy allocator
        if (pInternalMemory->pBuddyAllocator != nullptr)
        {
            PAL_DELETE(pInternalMemory->pBuddyAllocator, m_pSysMemAllocator);
        }

        // Release this pool's base allocation
        FreeBaseGpuMem(pInternalMemory);

        // Remove this memory pool from the list if we added it
        if (needEraseFromOwnerList)
        {
            pOwnerList->Erase(&it);
        }
    }

Exposing subgroup ops for unsupported shader stages

The Vulkan spec states:

supportedStages is a bitfield of VkShaderStageFlagBits describing the shader stages that group
operations with subgroup scope are supported in. supportedStages will have the
VK_SHADER_STAGE_COMPUTE_BIT bit set if any of the physical device’s
queues support VK_QUEUE_COMPUTE_BIT.

This is exposed for a wide variety of unsupported ray-tracing-related shader stages:

                            VK_SHADER_STAGE_RAYGEN_BIT_KHR
                            VK_SHADER_STAGE_CLOSEST_HIT_BIT_KHR
                            VK_SHADER_STAGE_MISS_BIT_KHR
                            VK_SHADER_STAGE_INTERSECTION_BIT_KHR
                            VK_SHADER_STAGE_CALLABLE_BIT_KHR

despite these shader stages not being supported at all.

Therefore, these shader stages do not support these subgroup operations and they should not be exposed here.

No Support for VK_EXT_graphics_pipeline_library on windows

The VK_EXT_graphics_pipeline_library extension has been available in NVIDIA drivers for a while, and also in AMD's RADV, but not in AMD's Windows driver yet. It's a very important extension, as it completely fixes the stutters while loading shaders in some games, and it is also very important for using DXVK on Windows.

Please let us know if any updates towards supporting this extension are going to happen soon.

DOOM and Wolfenstein II not working

When running DOOM or Wolfenstein II under Wine, I'm getting a black screen and the game freezes there. I haven't been able to run those two games for at least a month now. It works fine using RADV.

Using Ryzen 1700X / R9 Fury.

amdvlk shouldn't enable NoSignedZeros option

From the spec:

Precision and Operation of SPIR-V Instructions

The following rules apply to both single and double-precision floating point instructions:

Positive and negative infinities and positive and negative zeros are generated as dictated by IEEE 754, but subject to the precisions allowed in the following table.

We've faced the same issue in radv, and the spec seems quite clear on it.
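
For a concrete illustration of why the flag is observable (my own example, not from the report):

    #include <cmath>
    #include <cstdio>

    int main()
    {
        volatile float x = -1.0f;
        float z = 0.0f * x; // IEEE 754: the result is -0.0f
        // Under NoSignedZeros the compiler may fold x * 0.0f to +0.0f, which a
        // shader can observe through signbit() or through 1.0f / z (-inf vs +inf).
        std::printf("signbit=%d 1/z=%f\n", std::signbit(z), 1.0f / z);
        return 0;
    }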
