nvidiagameworks / nri

License: MIT License

Batchfile 0.06% CMake 0.86% C++ 91.30% C 7.11% Shell 0.05% HLSL 0.61%

nri's Issues

Bug: X11 and Wayland should be treated as mutually exclusive during swapchain creation

NRI/Source/VK/SwapChainVK.cpp

Line 103 in 919412b

#ifdef VK_USE_PLATFORM_XLIB_KHR

This code is broken if both X11 and Wayland are available on a system.
NRWindow is a union, so writing to it via

        m_NRIWindow.x11.dpy = glfwGetX11Display();
        m_NRIWindow.x11.window = glfwGetX11Window(m_Window);

will make the code in SwapChainVK::Create() think that we provided both an X11 and a Wayland window handle. The code will then create an X11 surface object first, and afterwards overwrite the surface handle by creating a Wayland surface object from the same X11 window handle. On NVIDIA drivers this later causes vkCreateSwapchainKHR() to crash deep in the call stack.

I think making NRWindow a plain struct instead of a union, with #ifdef controlling the platform-dependent members, would be the right call.
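A minimal sketch of the non-union layout suggested above. All type and function names here (`WindowHandles`, `X11Handle`, `WaylandHandle`, `pickSurfaceKind`) are hypothetical stand-ins, not NRI's actual declarations. With separate members, a handle that was never written stays null, so the swapchain code can tell which platform was supplied and reject ambiguous input:

```cpp
#include <cstdint>

// Hypothetical stand-ins for the platform handles; not NRI's real types.
struct X11Handle {
    void* dpy = nullptr;
    std::uint64_t window = 0;
};

struct WaylandHandle {
    void* display = nullptr;
    void* surface = nullptr;
};

// A plain struct instead of a union: writing x11 cannot alias wayland.
struct WindowHandles {
    X11Handle x11;
    WaylandHandle wayland;
};

enum class SurfaceKind { None, X11, Wayland, Ambiguous };

// Decide which surface to create; exactly one platform may be populated.
inline SurfaceKind pickSurfaceKind(const WindowHandles& w) {
    const bool hasX11 = w.x11.dpy != nullptr && w.x11.window != 0;
    const bool hasWayland = w.wayland.display != nullptr && w.wayland.surface != nullptr;
    if (hasX11 && hasWayland)
        return SurfaceKind::Ambiguous;
    if (hasX11)
        return SurfaceKind::X11;
    if (hasWayland)
        return SurfaceKind::Wayland;
    return SurfaceKind::None;
}
```

Compiling out the unused member with #ifdef, as proposed in the issue, would go one step further; the runtime check above only assumes that both members may be present in the same build.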

[RFE] Add support for `D3D12_QUERY_HEAP_TYPE_COPY_QUEUE_TIMESTAMP`

It looks like a new query type is needed, because VK_QUERY_TYPE_TIMESTAMP can be used in any queue.

Assignment order does not match struct member order

NRI/Source/VK/PipelineVK.cpp

Line 366 in 64120b7

stream_desc.bindingSlot,

The struct member order is:

uint32_t stride;
uint16_t bindingSlot;
VertexStreamStepRate stepRate;

However, the stride and bindingSlot are flipped during assignment:

           stream_desc.bindingSlot,
           stream_desc.stride,
           (stream_desc.stepRate == VertexStreamStepRate::PER_VERTEX) ? VK_VERTEX_INPUT_RATE_VERTEX : VK_VERTEX_INPUT_RATE_INSTANCE
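For comparison: assuming the initializer at that line fills a Vulkan `VkVertexInputBindingDescription` (whose members are declared in the order `binding`, `stride`, `inputRate`), the positional order would follow the destination struct rather than NRI's source struct. The sketch below uses hypothetical mirror types (`StreamDesc`, `BindingDescription`, `toBinding`) so that it compiles without the Vulkan headers, and assigns members by name so the mapping does not depend on either declaration order:

```cpp
#include <cstdint>

enum class StepRate : std::uint8_t { PerVertex, PerInstance };
enum class InputRate : std::uint32_t { Vertex = 0, Instance = 1 };

// Source layout as quoted in the issue: stride first, then bindingSlot.
struct StreamDesc {
    std::uint32_t stride;
    std::uint16_t bindingSlot;
    StepRate stepRate;
};

// Destination layout mirroring VkVertexInputBindingDescription (assumed target).
struct BindingDescription {
    std::uint32_t binding;
    std::uint32_t stride;
    InputRate inputRate;
};

// Assign by member name so neither struct's declaration order matters.
inline BindingDescription toBinding(const StreamDesc& s) {
    BindingDescription b{};
    b.binding = s.bindingSlot;
    b.stride = s.stride;
    b.inputRate = (s.stepRate == StepRate::PerVertex) ? InputRate::Vertex : InputRate::Instance;
    return b;
}
```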

ReBAR and sparse texture support

I was thinking about adding support for Resizable BAR, but it is only supported in Vulkan and D3D12. What do you think about creating an additional interface for sparse textures and ReBAR as an extension? I would implement one of them, but without an interface I have no idea how to implement them in NRI.

Maybe we need something like a "resource extension"? We can easily combine both technologies into one interface and maybe add streaming support and other features that are not present in D3D11.

Additional Flags to NRI init to disable features

For my use case: when I create the NRI device, would it be possible to extend NriDeviceCreationDesc with flags to enable extensions? It would be preferable if the init code assumed the minimum set of extensions.

[RFE] Add multi-view

[RFE] Draw Indirect Count

Sometimes it's useful to have an indirect buffer count, so that only the required number of draw calls or shader executions are executed.

Links:
gpuweb/gpuweb#1949
https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/vkCmdDrawIndirectCount.html
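To make the requested semantics concrete, here is a small CPU-side model of how the executed draw count is derived in the Vulkan command linked above: the GPU-written count is clamped by a CPU-provided maximum. The function name `executedDrawCount` and the extra clamp to the number of available records are assumptions made for this sketch, not part of any API:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Layout of one non-indexed indirect draw record (four 32-bit values).
struct DrawArgs {
    std::uint32_t vertexCount;
    std::uint32_t instanceCount;
    std::uint32_t firstVertex;
    std::uint32_t firstInstance;
};

// Hypothetical CPU-side model: returns how many records would actually be drawn.
// 'countFromBuffer' stands for the value the GPU reads from the count buffer.
inline std::uint32_t executedDrawCount(const std::vector<DrawArgs>& records,
                                       std::uint32_t countFromBuffer,
                                       std::uint32_t maxDrawCount) {
    // The executed count is the smaller of the GPU-written count and the CPU-provided cap.
    const std::uint32_t count = std::min(countFromBuffer, maxDrawCount);
    // The model also clamps to the records available so it never reads out of range.
    return std::min(count, static_cast<std::uint32_t>(records.size()));
}
```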

Feature request: Add support for VK_KHR_dynamic_rendering

https://www.khronos.org/blog/streamlining-render-passes

Incorrect cast

NRI/Source/D3D12/CommandBufferD3D12.cpp

Line 520 in 171b583

 barrier.Aliasing.pResourceBefore = *((BufferD3D12*)aliasingBarriers->textures[i].before); 

This line and the one below should be casting to TextureD3D12*, not BufferD3D12*

[RFE] Shared resources via NT handle

Having any interop between GAPI and CUDA would be nice.

[RFE] Work graphs support

https://microsoft.github.io/DirectX-Specs/d3d/WorkGraphs.html

[RFE] Utilize "enhanced barriers", reach parity with VK

https://microsoft.github.io/DirectX-Specs/d3d/D3D12EnhancedBarriers.html

It should help bring the D3D12 backend closer to VK (with legacy resource states there are some hidden discrepancies). According to the spec, "resource transitions" are emulated using the "enhanced barriers" API under the hood.

Unknown interface HelperInterface

Since I started using version v1.120, I get this error:
NRI::ERROR(Creation.cpp:88) - VK::NVIDIA GeForce 930MX - Unknown interface 'nri::HelperInterface'!

Question: Queue family selection

The way family indices are handled doesn't seem quite right.

My GPU has two queue families:

Queue[0]: VK_QUEUE_GRAPHICS_BIT VK_QUEUE_COMPUTE_BIT VK_QUEUE_TRANSFER_BIT 
Queue[1]: VK_QUEUE_COMPUTE_BIT VK_QUEUE_TRANSFER_BIT

How would the selection process work? Would family 0 be available for graphics and compute, and family 1 for copy?

void DeviceVK::FillFamilyIndices(bool useEnabledFamilyIndices, const uint32_t* enabledFamilyIndices, uint32_t familyIndexNum)
{
    uint32_t familyNum = 0;
    m_VK.GetPhysicalDeviceQueueFamilyProperties(m_PhysicalDevices.front(), &familyNum, nullptr);

    Vector<VkQueueFamilyProperties> familyProps(familyNum, GetStdAllocator());
    m_VK.GetPhysicalDeviceQueueFamilyProperties(m_PhysicalDevices.front(), &familyNum, familyProps.data());

    memset(m_FamilyIndices.data(), INVALID_FAMILY_INDEX, m_FamilyIndices.size() * sizeof(uint32_t));

    for (uint32_t i = 0; i < familyProps.size(); i++)
    {
        const VkQueueFlags mask = familyProps[i].queueFlags;
        const bool graphics = mask & VK_QUEUE_GRAPHICS_BIT;
        const bool compute = mask & VK_QUEUE_COMPUTE_BIT;
        const bool copy = mask & VK_QUEUE_TRANSFER_BIT;

        if (useEnabledFamilyIndices)
        {
            bool isFamilyEnabled = false;
            for (uint32_t j = 0; j < familyIndexNum && !isFamilyEnabled; j++)
                isFamilyEnabled = enabledFamilyIndices[j] == i;

            if (!isFamilyEnabled)
                continue;
        }

        if (graphics)
            m_FamilyIndices[(uint32_t)CommandQueueType::GRAPHICS] = i;
        else if (compute)
            m_FamilyIndices[(uint32_t)CommandQueueType::COMPUTE] = i;
        else if (copy)
            m_FamilyIndices[(uint32_t)CommandQueueType::COPY] = i;
    }
}

What is the intention behind how a queue is selected for a family?
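Tracing the quoted loop by hand on a plain-data model may help answer the question. The sketch below reproduces only the `if / else if` chain from `FillFamilyIndices`; the names `fillFamilyIndices`, `kInvalidFamily` and the local enums are stand-ins for this example, and the `useEnabledFamilyIndices` filter is left out:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint32_t kInvalidFamily = 0xFFFFFFFFu; // stand-in for INVALID_FAMILY_INDEX

// Same bit values as VK_QUEUE_GRAPHICS_BIT, VK_QUEUE_COMPUTE_BIT, VK_QUEUE_TRANSFER_BIT.
enum QueueBits : std::uint32_t { Graphics = 1u, Compute = 2u, Transfer = 4u };
enum QueueType : std::size_t { GRAPHICS = 0, COMPUTE = 1, COPY = 2, TYPE_NUM = 3 };

// Each family is assigned to exactly one queue type: the first matching branch wins.
inline std::array<std::uint32_t, TYPE_NUM> fillFamilyIndices(const std::vector<std::uint32_t>& familyFlags) {
    std::array<std::uint32_t, TYPE_NUM> indices;
    indices.fill(kInvalidFamily);
    for (std::uint32_t i = 0; i < familyFlags.size(); i++) {
        const std::uint32_t mask = familyFlags[i];
        if (mask & Graphics)
            indices[GRAPHICS] = i;
        else if (mask & Compute)
            indices[COMPUTE] = i;
        else if (mask & Transfer)
            indices[COPY] = i;
    }
    return indices;
}
```

For the two families listed above, this model assigns family 0 to GRAPHICS and family 1 to COMPUTE, and leaves COPY at the invalid index, because family 1 already matches the compute branch. Whether NRI falls back to another family for copy elsewhere is not visible in the quoted snippet.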

[RFE] Sparse texture support

texture format utility

There is a selection of formats, but sometimes you want to know how many samples there are or the bit depth of a format. It would be nice to have some utilities similar to tiny_imageformat.

https://github.com/DeanoC/tiny_imageformat/blob/master/include/tiny_imageformat/tinyimageformat.h
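One possible shape for such a utility is a small constexpr lookup. The `Format` enum, the `FormatInfo` struct and both function names below are hypothetical and cover only a handful of uncompressed formats; they are not NRI's or tiny_imageformat's API:

```cpp
#include <cstdint>

// Hypothetical subset of formats, for illustration only.
enum class Format : std::uint8_t { R8_UNORM, RG8_UNORM, RGBA8_UNORM, RGBA16_SFLOAT, R32_SFLOAT };

struct FormatInfo {
    std::uint8_t channels;       // number of components per texel
    std::uint8_t bitsPerChannel; // bit depth of each component
};

// Per-format properties; unknown values map to {0, 0}.
constexpr FormatInfo getFormatInfo(Format f) {
    switch (f) {
        case Format::R8_UNORM:      return {1, 8};
        case Format::RG8_UNORM:     return {2, 8};
        case Format::RGBA8_UNORM:   return {4, 8};
        case Format::RGBA16_SFLOAT: return {4, 16};
        case Format::R32_SFLOAT:    return {1, 32};
    }
    return {0, 0};
}

// Texel size derived from the table above (uncompressed formats only).
constexpr std::uint32_t bytesPerTexel(Format f) {
    return (std::uint32_t(getFormatInfo(f).channels) * getFormatInfo(f).bitsPerChannel) / 8u;
}
```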

overlapping incompatibility with VK_IMAGE_CREATE_2D_ARRAY_COMPATIBLE_BIT | VK_IMAGE_CREATE_CUBE_COMPATIBLE_BIT

These rules are not compatible:

If flags contains VK_IMAGE_CREATE_CUBE_COMPATIBLE_BIT, imageType must be VK_IMAGE_TYPE_2D
If flags contains VK_IMAGE_CREATE_CUBE_COMPATIBLE_BIT, arrayLayers must be greater than or equal to 6

If flags contains VK_IMAGE_CREATE_2D_ARRAY_COMPATIBLE_BIT, imageType must be VK_IMAGE_TYPE_3D

If it's a cubemap, it has to be a 2D image, but if it's an array, it must be a 3D image.
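Read together, the quoted rules imply that the two flags apply to different image types, so no single image can carry both. As far as the Vulkan specification goes, VK_IMAGE_CREATE_2D_ARRAY_COMPATIBLE_BIT concerns 2D views of 3D images, so an ordinary 2D array texture should not need it. A minimal sketch of choosing at most one flag per image follows; the enum values and `pickCompatFlags` are stand-ins so the example compiles without the Vulkan headers:

```cpp
#include <cstdint>

// Stand-in values for illustration; real code would use the Vulkan enums.
enum class ImageType { Type1D, Type2D, Type3D };

enum CreateFlags : std::uint32_t {
    CreateNone = 0,
    CreateCubeCompatible = 1u << 0,   // stands for VK_IMAGE_CREATE_CUBE_COMPATIBLE_BIT
    Create2DArrayCompatible = 1u << 1 // stands for VK_IMAGE_CREATE_2D_ARRAY_COMPATIBLE_BIT
};

// Pick at most one of the two flags, based on the rules quoted above.
inline std::uint32_t pickCompatFlags(ImageType type, std::uint32_t width, std::uint32_t height,
                                     std::uint32_t arrayLayers) {
    // Cube-compatible: 2D image, at least 6 layers; Vulkan also requires square extents.
    if (type == ImageType::Type2D && arrayLayers >= 6 && width == height)
        return CreateCubeCompatible;
    // 2D-array-compatible: only meaningful for 3D images.
    if (type == ImageType::Type3D)
        return Create2DArrayCompatible;
    return CreateNone;
}
```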

m_DescriptorSets needs to be cleared after deallocating the entries in Reset

NRI/Source/D3D12/DescriptorPoolD3D12.cpp

Line 136 in 171b583

Otherwise, the destructor, which tries to deallocate every item in the list, will free the same entries a second time.
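The pattern being asked for can be sketched with a hypothetical pool that owns raw pointers (the class below is an illustration, not NRI's `DescriptorPoolD3D12`). Clearing the list in `Reset` is what makes a later `Reset` or the destructor safe:

```cpp
#include <cstddef>
#include <vector>

struct DescriptorSet {
    int id = 0;
};

// Hypothetical pool that owns its sets through raw pointers.
class DescriptorPool {
public:
    DescriptorPool() = default;
    DescriptorPool(const DescriptorPool&) = delete;
    DescriptorPool& operator=(const DescriptorPool&) = delete;

    // The destructor reuses Reset, so every set is freed exactly once.
    ~DescriptorPool() { Reset(); }

    DescriptorSet* Allocate() {
        m_DescriptorSets.push_back(new DescriptorSet{});
        return m_DescriptorSets.back();
    }

    void Reset() {
        for (DescriptorSet* set : m_DescriptorSets)
            delete set;
        // Without this clear(), the next Reset or the destructor would delete the same pointers again.
        m_DescriptorSets.clear();
    }

    std::size_t LiveSetNum() const { return m_DescriptorSets.size(); }

private:
    std::vector<DescriptorSet*> m_DescriptorSets;
};
```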

Parameter type should be NRIVkSemaphore here

NRI/Source/VK/DeviceSemaphoreVK.cpp

Line 43 in 0f3c47b

Result DeviceSemaphoreVK::Create(void* vkFence)

[RFE] Pipeline library extension

On D3D12 and Vulkan there is a problem with the PSO build process. Sometimes the game can hang, and on top of that you, as a developer, cannot reliably predict the pipeline build time. Also, on GDK you can't build PSOs at runtime, only pre-compile them. So I think this feature is badly needed in NRI as an extension because it affects a lot of things. What do you think? Maybe we need some priorities for our tasks to manage them properly?

Unsupported depth bounds

The new 1.118 update broke support for hardware where the depth bounds test isn't supported.
Rather than checking whether the version is at least 1, the code should check D3D12_FEATURE_DATA_D3D12_OPTIONS2::DepthBoundsTestSupported.
This also raises the question of other version checks, such as the one for sample positions.

NRI/Source/D3D12/CommandBufferD3D12.cpp

Lines 297 to 301 in 1d52b40

inline void CommandBufferD3D12::SetDepthBounds(float boundsMin, float boundsMax)
{
    if (m_Version >= 1)
        m_GraphicsCommandList->OMSetDepthBounds(boundsMin, boundsMax);
}
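A sketch of the check proposed above: the interface version only says that `OMSetDepthBounds` exists on the command list, while the feature bit says whether the hardware supports the test. `DepthBoundsCaps`, `FakeCommandList` and the free function `setDepthBounds` are hypothetical and let the example run without a D3D12 device; in real code the flag would presumably be read once from D3D12_FEATURE_DATA_D3D12_OPTIONS2 at device creation:

```cpp
// Hypothetical capability record filled once at device creation.
struct DepthBoundsCaps {
    int commandListVersion = 0;            // which command list interface revision is available
    bool depthBoundsTestSupported = false; // DepthBoundsTestSupported from the OPTIONS2 query
};

// Records calls so the sketch can run without a D3D12 device.
struct FakeCommandList {
    int depthBoundsCalls = 0;
    void OMSetDepthBounds(float, float) { depthBoundsCalls++; }
};

// Require both the interface and the hardware capability before issuing the call.
inline bool setDepthBounds(FakeCommandList& cmd, const DepthBoundsCaps& caps,
                           float boundsMin, float boundsMax) {
    if (caps.commandListVersion < 1 || !caps.depthBoundsTestSupported)
        return false;
    cmd.OMSetDepthBounds(boundsMin, boundsMax);
    return true;
}
```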

Resource management improvements

I was thinking about how to manage resources on the GPU/CPU side, and I thought it may be useful to have the ability to Map/Unmap ranges on CPU bound buffers and to upload multiple ranges for the same buffer (currently you can only upload one range per buffer/texture). It might also be useful for geometry updates (you have two big vertex and index buffers and only need to update some region of them at once).

Improved Metal / Apple Silicon Support

Hi, I have been taking this abstraction for a spin (thank you for making it), and in order for it to work on the M1 MacBook I am using, I needed to adjust Vulkan instance and device creation to correctly utilize the VK_KHR_portability_enumeration/subset extension.

[RFE] Add missing "indirect" commands

[RFE] Add shading rate

https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VK_KHR_fragment_shading_rate.html

Destructor not actually freeing allocated descriptor sets for DescriptorPoolVK

@dzhdanNV The code here seems to be incorrect:

https://github.com/NVIDIAGameWorks/NRI/blob/main/Source/VK/DescriptorPoolVK.cpp#L31-L32

It seems it should be freeing the sets in the range [0, m_UsedSets) instead of [m_UsedSets, m_AllocatedSets.size()) (the two are the same in some cases).

This line seems wrong. Looks like geometry shader should be filled instead.

NRI/Source/D3D12/PipelineD3D12.cpp

Line 122 in 0d00b9d

FillShaderBytecode(stream.amplificationShader, shader);

Descriptor Sets managing

I recently ran into a problem with descriptor set management. I need to dynamically allocate and free sets in a descriptor pool, but NRI does not have a method to free descriptor sets. How should I manage these sets in this situation?

Add non shader visible descriptor pool

It would be very useful to have a CPU-only descriptor pool for copying and storage.

[RFE] SwapChain: explicitly expose "waitable" (currently always on in D3D) and add VK support

VK: https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VK_KHR_present_wait.html

clang-cl compile error

Hello, I am trying to compile NRI using the clang-cl compiler. I modified 1-Deploy.bat to use clang-cl as the C++ compiler:

@echo off

git submodule update --init --recursive

mkdir "_Build"

cd "_Build"
cmake .. -A x64 -T ClangCL
cd ..

Next, I compile using the script 2-Build.bat.

MSBuild prints many warnings and errors, such as:

E:\cpp\NRI\Source\D3D12\BufferD3D12.cpp(74,13): error : pasting formed '"ID3D12Device::CreateCommittedResource()"" failed, result = 0x%08X!"', an invalid preprocessing token [-Winvalid-token-paste] [E:\cpp\NRI\_Build\NRI_D3D12.vcxproj]
E:\cpp\NRI\Source\Shared\SharedExternal.h(40,86): message : expanded from macro 'RETURN_ON_BAD_HRESULT' [E:\cpp\NRI\_Build\NRI_D3D12.vcxproj]
E:\cpp\NRI\Source\D3D12\BufferD3D12.cpp(77,13): error : pasting formed '"ID3D12Device::CreatePlacedResource()"" failed, result = 0x%08X!"', an invalid preprocessing token [-Winvalid-token-paste] [E:\cpp\NRI\_Build\NRI_D3D12.vcxproj]
E:\cpp\NRI\Source\Shared\SharedExternal.h(40,86): message : expanded from macro 'RETURN_ON_BAD_HRESULT' [E:\cpp\NRI\_Build\NRI_D3D12.vcxproj]

It seems that the macro RETURN_ON_BAD_HRESULT is not standard C++.

My environment is
OS: Windows 11 23H2 22631.3447
Visual Studio: 17.9.6
Installed C++ Clang compiler for Windows(17.0.3)

[RFE] Add stream out (transform feedback)

https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VK_EXT_transform_feedback.html

Wrong barriers for readback buffer

I was trying to run the SceneViewer sample, but I got a validation error on D3D12 (without Agility SDK):

D3D12 ERROR: ID3D12CommandList::ResourceBarrier: Certain resources are restricted to certain D3D12_RESOURCE_STATES states, and cannot be changed. Resources on D3D12_HEAP_TYPE_READBACK heaps requires D3D12_RESOURCE_STATE_COPY_DEST or D3D12_RESOURCE_STATE_RESOLVE_DEST. Reserved buffers used exclusively for texture placement requires D3D12_RESOURCE_STATE_COMMON. [ RESOURCE_MANIPULATION ERROR #741: RESOURCE_BARRIER_INVALID_HEAP]

I removed the barrier for the readback buffer, and now it works.

        //nri::BufferBarrierDesc bufferBarrierDescs = {};
        //bufferBarrierDescs.buffer = m_Buffers[READBACK_BUFFER];
        //bufferBarrierDescs.before = {nri::AccessBits::UNKNOWN, nri::StageBits::COPY};
        //bufferBarrierDescs.after = {nri::AccessBits::COPY_DESTINATION, nri::StageBits::COPY};

        nri::BarrierGroupDesc barrierGroupDesc = {};
        barrierGroupDesc.textureNum = 1;
        barrierGroupDesc.textures = &textureBarrierDescs;
        //barrierGroupDesc.bufferNum = 1;
        //barrierGroupDesc.buffers = &bufferBarrierDescs;
        NRI.CmdBarrier(commandBuffer, barrierGroupDesc);
        
        // ...
        
        textureBarrierDescs.before = textureBarrierDescs.after;
        textureBarrierDescs.after = {nri::AccessBits::UNKNOWN, nri::Layout::PRESENT};

        //bufferBarrierDescs.before = bufferBarrierDescs.after;
        //bufferBarrierDescs.after = {nri::AccessBits::UNKNOWN, nri::StageBits::COPY};

        NRI.CmdBarrier(commandBuffer, barrierGroupDesc);

What do you think: is that a bug in NRI or in the sample?

[RFE] Instancing problem in GPU-driven rendering

I'm trying to implement GPU-driven rendering with bindless descriptors and instancing, and I have a problem with instance IDs (specifically on D3D12). SV_InstanceID on D3D11/D3D12 starts from 0. In Vulkan, however, it starts from firstInstance in the indexed instanced draw command. That means I can't rely on the instance ID in D3D12 or D3D11 because it always starts from 0. I can work around this with an additional per-instance vertex buffer holding the instance index for each draw, but that adds complexity. What do you think, how can we fix this problem?

gpuweb/gpuweb#901 - Problem description

ReBAR support

Seems wrong value is being copied

NRI/Source/VK/CommandBufferVK.cpp

Line 190 in 64120b7

memcpy(&attachment.clearValue, &clearDescs->value, sizeof(VkClearValue));

This line seems suspicious. Maybe you should be copying &desc.value?
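Assuming the line sits inside a loop over several clear descriptors, `clearDescs->value` always reads element 0, whereas a per-iteration reference reads the current element. The types and the function `gatherClearValues` below are hypothetical stand-ins for the NRI and Vulkan structures:

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

struct ClearValue {
    float rgba[4];
};

struct ClearDesc {
    ClearValue value;
};

// Copies the clear value of every descriptor, one per attachment.
inline std::vector<ClearValue> gatherClearValues(const ClearDesc* clearDescs, std::size_t num) {
    std::vector<ClearValue> out(num);
    for (std::size_t i = 0; i < num; i++) {
        const ClearDesc& desc = clearDescs[i];
        // 'clearDescs->value' would read element 0 on every iteration;
        // 'desc.value' reads element i.
        std::memcpy(&out[i], &desc.value, sizeof(ClearValue));
    }
    return out;
}
```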

[RFE/VK] Allow disabling bound checks via `VK_EXT_robustness2`

https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VK_EXT_robustness2.html

How does recycling CPU visible descriptors work in NRI?

On destroying a DescriptorD3D12 it seems like the DescriptorHandle is never returned to DeviceD3D12::m_DescriptorPool?

NRI/Source/D3D12/DeviceD3D12.cpp

Line 582 in 7388657

Deallocate(GetStdAllocator(), (DescriptorD3D12*)&descriptor);

Option to disable NVAPI and AGS pull for Linux

Small request, but currently the cmake script will pull the dependencies for NVAPI and AGS via Packman no matter the platform. To my understanding the NVAPI and AGS libraries are not required for linux-x86_64, thus this pull is redundant. Is it possible to add another option to disable this pull if on linux-x86_64?

Misleading line here

NRI/Source/VK/BufferVK.cpp

Line 26 in 64120b7

if (m_Memory != VK_NULL_HANDLE)

Here m_Memory is compared against VK_NULL_HANDLE. This works because VK_NULL_HANDLE is defined as nullptr anyway. However, the line is misleading: reading it without an IDE, one would assume that m_Memory is a Vulkan resource when it is in fact an NRI struct.

Documentation access

What are the requirements for getting to the documentation here?

https://docs.google.com/document/d/1OidQtZnm-3grhua7Oy1WKDUzh8ExTdbO83VTAWS_nNM/

[BUG] PSL tier is needed instead of `isProgrammableSampleLocationsSupported`

Based on #61 (comment)

[RFE] Add "low latency" extension

D3D11/D3D12: via NVAPI: NvAPI_D3D_SetSleepMode and friends
VK: https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VK_NV_low_latency2.html

(VULKAN 1.3) Crash when trying to load global commands with vkGetInstanceProcAddr in VkDevice.cpp

When using a VkInstance created with VK_API_VERSION_1_3, vkGetInstanceProcAddr returns a nullptr and thus crashes here:
https://github.com/NVIDIAGameWorks/NRI/blob/main/Source/VK/DeviceVK.cpp#L1973

I suspect that vkGetInstanceProcAddr should not be called for vkCreateInstance, vkEnumerateInstanceExtensionProperties and vkEnumerateInstanceLayerProperties, as they are global commands and the lookup is therefore expected to return NULL, as per: https://www.khronos.org/registry/vulkan/specs/1.2-extensions/html/vkspec.html#vkGetInstanceProcAddr

For VK_API_VERSION_1_2 this seems to work, although to me it looks like it shouldn't according to the spec?

Only enable Vulkan extensions when they are either supported or required

Previously, the code checked for availability before enabling them, but now it fails on some hardware.
Could you bring back the checks or allow developers to explicitly specify them programmatically?

NRI/Source/VK/DeviceVK.cpp

Lines 367 to 376 in 1d52b40

// Mandatory
desiredExts.push_back(VK_KHR_SWAPCHAIN_EXTENSION_NAME); // TODO: move to supportedFeatures?
desiredExts.push_back(VK_KHR_DEFERRED_HOST_OPERATIONS_EXTENSION_NAME);
desiredExts.push_back(VK_KHR_SYNCHRONIZATION_2_EXTENSION_NAME);
desiredExts.push_back(VK_KHR_SHADER_NON_SEMANTIC_INFO_EXTENSION_NAME); // at least for "printf"
#ifdef __APPLE__
desiredExts.push_back(VK_KHR_PORTABILITY_SUBSET_EXTENSION_NAME);
desiredExts.push_back(VK_KHR_DYNAMIC_RENDERING_EXTENSION_NAME);
#endif
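One way to restore the availability check is to intersect the desired list with what the device reports and to fail only when a required extension is missing. The types `ExtensionRequest`, `ExtensionSelection` and the function `selectExtensions` are hypothetical; in real Vulkan code the `supported` list would come from vkEnumerateDeviceExtensionProperties:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// A desired extension plus whether device creation must fail without it.
struct ExtensionRequest {
    std::string name;
    bool required;
};

struct ExtensionSelection {
    std::vector<std::string> enabled;         // supported and therefore safe to enable
    std::vector<std::string> missingRequired; // required but not reported by the device
};

// Enable only what the device reports; collect required extensions that are missing.
inline ExtensionSelection selectExtensions(const std::vector<ExtensionRequest>& desired,
                                           const std::vector<std::string>& supported) {
    ExtensionSelection result;
    for (const ExtensionRequest& req : desired) {
        const bool isSupported =
            std::find(supported.begin(), supported.end(), req.name) != supported.end();
        if (isSupported)
            result.enabled.push_back(req.name);
        else if (req.required)
            result.missingRequired.push_back(req.name);
    }
    return result;
}
```

With this split, optional extensions that the hardware lacks are skipped silently, and the caller can report a clear error when `missingRequired` is not empty.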

[RFE] Expose "Streamer" extension, which is more practical at runtime than `HelperInterface::UploadData`

Draft with comments:

// © 2021 NVIDIA Corporation

#pragma once

NRI_NAMESPACE_BEGIN

NRI_FORWARD_STRUCT(Streamer);

// Could use an externally created "buffer", but in this case "growing under the hood" becomes impossible
NRI_STRUCT(StreamerDesc)
{
    //uint64_t capacity; // seems to be not needed
    NRI_NAME(MemoryLocation) memoryLocation; // UPLOAD or DEVICE_UPLOAD
    NRI_NAME(BufferUsageBits) usageBits;
    uint8_t framesInFlightNum; // needed to hide reallocation under the hood and avoid exposing capacity
};

NRI_STRUCT(BufferRangeUpdateRequestDesc)
{
    // Data to upload
    const void* data;
    uint64_t dataSize;

    // Destination (optional)
    NRI_NAME(Buffer)* dstBuffer;
    uint64_t dstBufferOffset;

    // Access?
    NRI_NAME(AccessBits) prevState; // potentially could assume UNKNOWN (not recommended)
    NRI_NAME(AccessBits) nextState; // not all states can be transferred to in COPY and COMPUTE queues
};

NRI_STRUCT(TextureRegionUpdateRequestDesc)
{
    // Data to upload
    const void* data;
    uint64_t dataSize;
    uint32_t srcRowPitch;
    uint32_t srcSlicePitch;

    // Destination (mandatory)
    NRI_NAME(Texture)* dstTexture;
    const NRI_NAME(TextureRegionDesc)* dstRegionDesc;

    // Access?
    NRI_NAME(AccessAndLayout) prevState;
    NRI_NAME(AccessAndLayout) nextState;
};

NRI_STRUCT(StreamerInterface)
{
    NRI_NAME(Result) (NRI_CALL *CreateStreamer)(NRI_NAME_REF(Device) device, const NRI_NAME_REF(StreamerDesc) streamerDesc, NRI_NAME_REF(Streamer*) streamer);    
    void (NRI_CALL *DestroyStreamer)(NRI_NAME_REF(Streamer) streamer);

    // Add an update request to the queue (no work here)
    // These functions return the ring buffer offset
    uint64_t (NRI_CALL *EnqueueBufferRangeUpdateRequest)(NRI_NAME_REF(Streamer) streamer, const NRI_NAME_REF(BufferRangeUpdateRequestDesc) bufferRangeUpdateRequestDesc);
    uint64_t (NRI_CALL *EnqueueTextureRegionUpdateRequest)(NRI_NAME_REF(Streamer) streamer, const NRI_NAME_REF(TextureRegionUpdateRequestDesc) textureRegionUpdateRequestDesc);

    // Submit all gathered requests and reset the queue
    // The internal buffer can grow in this function
    // Doesn't require WFI on a specific queue if a new buffer gets allocated immediately and destruction of the old buffer is postponed
    void (NRI_CALL *SubmitStreamerRequests)(NRI_NAME_REF(CommandBuffer) commandBuffer, NRI_NAME_REF(Streamer) streamer);

    // Needed if the buffer is explicitly used for rendering as IB, VB, CB
    // IMPORTANT: valid only after "Submit"
    // IMPORTANT: creating and caching "views" on this buffer is not recommended
    NRI_NAME(Buffer*) (NRI_CALL *GetStreamerBuffer)(NRI_NAME_REF(Streamer) streamer);

    /*
    Not needed with "enqueue / submit" logic:
    uint64_t (NRI_CALL *GetStreamerCapacity)(NRI_NAME_REF(Streamer) streamer);
    void (NRI_CALL *ChangeStreamerCapacity)(NRI_NAME_REF(Streamer) streamer, uint64_t capacity);
    */
};

NRI_NAMESPACE_END

This design hides the capacity, but it puts all CPU-side copy operations in one place, i.e. on one thread. That's not ideal, but other threads can have their own Streamer objects.
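The enqueue / submit split in the draft can be illustrated with a CPU-only model. `StreamerModel` is hypothetical and much simpler than the proposed interface: it ignores frames in flight, wrap-around, destination buffers and access states, and only shows that enqueueing returns an offset without copying, while submitting grows the storage and performs all copies in one place:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical CPU-only model of the enqueue / submit flow; not the NRI implementation.
class StreamerModel {
public:
    // Enqueue only records the request and returns the offset its data will occupy.
    // The caller must keep 'data' alive until Submit.
    std::uint64_t Enqueue(const void* data, std::uint64_t size) {
        const std::uint64_t offset = m_PendingSize;
        m_Requests.push_back(Request{data, size, offset});
        m_PendingSize += size;
        return offset;
    }

    // Submit grows the backing storage if needed, copies every request, then resets the queue.
    void Submit() {
        if (m_Storage.size() < m_PendingSize)
            m_Storage.resize(static_cast<std::size_t>(m_PendingSize));
        for (const Request& r : m_Requests) {
            if (r.size != 0)
                std::memcpy(m_Storage.data() + r.offset, r.data, static_cast<std::size_t>(r.size));
        }
        m_Requests.clear();
        m_PendingSize = 0;
    }

    const std::vector<std::uint8_t>& Storage() const { return m_Storage; }

private:
    struct Request {
        const void* data;
        std::uint64_t size;
        std::uint64_t offset;
    };

    std::vector<Request> m_Requests;
    std::vector<std::uint8_t> m_Storage;
    std::uint64_t m_PendingSize = 0;
};
```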

[RFE] Move "depth bias" to dynamic state

https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VkDynamicState.html

D3D12:

D3D12_FEATURE_DATA_D3D12_OPTIONS16::DynamicDepthBiasSupported
D3D12_PIPELINE_STATE_FLAG_DYNAMIC_DEPTH_BIAS
RSSetDepthBias
