
nri's People

Contributors

01pollux, dzhdannv, halldorfannar, pollend, vertver

nri's Issues

Bug: X11 and Wayland should be treated as mutually exclusive during swapchain creation

#ifdef VK_USE_PLATFORM_XLIB_KHR

This code is broken if both X11 and Wayland are available on a system.
NRWindow is a union, so writing to it via

        m_NRIWindow.x11.dpy = glfwGetX11Display();
        m_NRIWindow.x11.window = glfwGetX11Window(m_Window);

will make the code in SwapChainVK::Create() think that we provided both an X11 and a Wayland window handle. The code then creates an X11 surface object first and overwrites the surface handle by creating a Wayland surface object with the same X11 window handle. On NVIDIA drivers this later causes vkCreateSwapchainKHR() to crash deep in the call stack.

I think the right call would be to make NRWindow a plain struct instead of a union and to use #ifdef to control the platform-dependent members.
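
To illustrate the suggestion, here is a minimal sketch, assuming a hypothetical `WindowHandles` struct (not NRI's actual type) whose platform handles are separate members rather than union members. All type and function names below are made up for illustration; the point is only that filling in the X11 handles leaves the Wayland ones empty, so the surface-creation code can pick exactly one path.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the platform handle pairs; not NRI's real types.
struct X11Handles {
    void* dpy = nullptr;
    uint64_t window = 0;
};

struct WaylandHandles {
    void* display = nullptr;
    void* surface = nullptr;
};

// Separate members instead of a union: writing to x11 never aliases wayland.
struct WindowHandles {
    X11Handles x11;
    WaylandHandles wayland;
};

enum class SurfaceKind { None, X11, Wayland, Ambiguous };

// Decide which single surface to create; refuse if both handle sets are filled in.
inline SurfaceKind PickSurfaceKind(const WindowHandles& w) {
    const bool hasX11 = w.x11.dpy != nullptr && w.x11.window != 0;
    const bool hasWayland = w.wayland.display != nullptr && w.wayland.surface != nullptr;
    if (hasX11 && hasWayland)
        return SurfaceKind::Ambiguous;
    if (hasX11)
        return SurfaceKind::X11;
    if (hasWayland)
        return SurfaceKind::Wayland;
    return SurfaceKind::None;
}
```

With this layout, a caller that only sets the X11 members gets `SurfaceKind::X11`, and a caller that sets both gets an explicit `Ambiguous` result instead of two surfaces being created in sequence.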

Assignment order does not match struct member order

stream_desc.bindingSlot,

The struct member order is:

uint32_t stride;
uint16_t bindingSlot;
VertexStreamStepRate stepRate;

However, the stride and bindingSlot are flipped during assignment:

           stream_desc.bindingSlot,
           stream_desc.stride,
           (stream_desc.stepRate == VertexStreamStepRate::PER_VERTEX) ? VK_VERTEX_INPUT_RATE_VERTEX : VK_VERTEX_INPUT_RATE_INSTANCE
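
For context, Vulkan's `VkVertexInputBindingDescription` declares its members in the order `binding`, `stride`, `inputRate`. If the positional list above initializes that struct (an assumption, since only three lines are quoted), then the destination order is what matters, not the order of the source struct. A sketch that avoids the question entirely by assigning members by name, using local mirror types so it compiles without Vulkan or NRI headers:

```cpp
#include <cstdint>

// Local mirrors for illustration only.
enum class VertexStreamStepRate { PER_VERTEX, PER_INSTANCE };
enum InputRate { INPUT_RATE_VERTEX = 0, INPUT_RATE_INSTANCE = 1 };

// Member order as quoted in the issue.
struct VertexStreamDesc {
    uint32_t stride;
    uint16_t bindingSlot;
    VertexStreamStepRate stepRate;
};

// Mirrors the member order of VkVertexInputBindingDescription.
struct BindingDescription {
    uint32_t binding;
    uint32_t stride;
    InputRate inputRate;
};

// Named assignment: correct regardless of either struct's declaration order.
inline BindingDescription ToBindingDescription(const VertexStreamDesc& s) {
    BindingDescription out = {};
    out.binding = s.bindingSlot;
    out.stride = s.stride;
    out.inputRate = (s.stepRate == VertexStreamStepRate::PER_VERTEX) ? INPUT_RATE_VERTEX : INPUT_RATE_INSTANCE;
    return out;
}
```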

ReBAR and sparse texture support

I was thinking about adding support for Resizable BAR, but it is only supported in Vulkan and D3D12. What do you think about creating an additional interface for sparse textures and ReBAR as an extension? I would implement one of them, but without an interface I have no idea how to fit them into NRI.

Maybe we need something like a "resource extension"? We could combine both technologies into one interface, and maybe add streaming support and other features that are not available on D3D11.

Additional Flags to NRI init to disable features

For my use case: when I create the NRI device, would it be possible to extend NriDeviceCreationDesc with flags to enable extensions? It would be preferable if the init code assumed the minimum set of extensions.

Unknown interface HelperInterface

Since I started using version v1.120, I get this error:
NRI::ERROR(Creation.cpp:88) - VK::NVIDIA GeForce 930MX - Unknown interface 'nri::HelperInterface'!

Question: Queue family selection

The way family indices are handled doesn't seem quite right.

This is my GPU; it has two queue families:

Queue[0]: VK_QUEUE_GRAPHICS_BIT VK_QUEUE_COMPUTE_BIT VK_QUEUE_TRANSFER_BIT 
Queue[1]: VK_QUEUE_COMPUTE_BIT VK_QUEUE_TRANSFER_BIT 

How would the selection process work? Would queue 0 be available for graphics and compute, and queue 1 for copy?

void DeviceVK::FillFamilyIndices(bool useEnabledFamilyIndices, const uint32_t* enabledFamilyIndices, uint32_t familyIndexNum)
{
    uint32_t familyNum = 0;
    m_VK.GetPhysicalDeviceQueueFamilyProperties(m_PhysicalDevices.front(), &familyNum, nullptr);

    Vector<VkQueueFamilyProperties> familyProps(familyNum, GetStdAllocator());
    m_VK.GetPhysicalDeviceQueueFamilyProperties(m_PhysicalDevices.front(), &familyNum, familyProps.data());

    memset(m_FamilyIndices.data(), INVALID_FAMILY_INDEX, m_FamilyIndices.size() * sizeof(uint32_t));

    for (uint32_t i = 0; i < familyProps.size(); i++)
    {
        const VkQueueFlags mask = familyProps[i].queueFlags;
        const bool graphics = mask & VK_QUEUE_GRAPHICS_BIT;
        const bool compute = mask & VK_QUEUE_COMPUTE_BIT;
        const bool copy = mask & VK_QUEUE_TRANSFER_BIT;

        if (useEnabledFamilyIndices)
        {
            bool isFamilyEnabled = false;
            for (uint32_t j = 0; j < familyIndexNum && !isFamilyEnabled; j++)
                isFamilyEnabled = enabledFamilyIndices[j] == i;

            if (!isFamilyEnabled)
                continue;
        }

        if (graphics)
            m_FamilyIndices[(uint32_t)CommandQueueType::GRAPHICS] = i;
        else if (compute)
            m_FamilyIndices[(uint32_t)CommandQueueType::COMPUTE] = i;
        else if (copy)
            m_FamilyIndices[(uint32_t)CommandQueueType::COPY] = i;
    }
}

What is the intended logic for selecting a queue for each family?
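
To make the question concrete, here is a small sketch that replays the `if / else if` chain from the quoted loop on plain flag masks, followed by a hypothetical fallback pass that is not part of NRI. The bit constants are local stand-ins for the Vulkan queue flag bits, and `INVALID_INDEX` is an assumed sentinel.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Stand-ins for VK_QUEUE_GRAPHICS_BIT, VK_QUEUE_COMPUTE_BIT, VK_QUEUE_TRANSFER_BIT.
constexpr uint32_t GRAPHICS_BIT = 0x1;
constexpr uint32_t COMPUTE_BIT = 0x2;
constexpr uint32_t TRANSFER_BIT = 0x4;
constexpr uint32_t INVALID_INDEX = 0xFFFFFFFFu; // assumed sentinel value

enum QueueType { GRAPHICS = 0, COMPUTE = 1, COPY = 2, QUEUE_TYPE_NUM = 3 };
using FamilyIndices = std::array<uint32_t, QUEUE_TYPE_NUM>;

// Same chain as the quoted loop: each family lands in exactly one slot.
inline FamilyIndices SelectLikeQuotedLoop(const std::vector<uint32_t>& familyFlags) {
    FamilyIndices result;
    result.fill(INVALID_INDEX);
    for (uint32_t i = 0; i < familyFlags.size(); i++) {
        const uint32_t mask = familyFlags[i];
        if (mask & GRAPHICS_BIT)
            result[GRAPHICS] = i;
        else if (mask & COMPUTE_BIT)
            result[COMPUTE] = i;
        else if (mask & TRANSFER_BIT)
            result[COPY] = i;
    }
    return result;
}

// Hypothetical extra pass: an unassigned slot borrows the first family that supports it.
inline FamilyIndices SelectWithFallback(const std::vector<uint32_t>& familyFlags) {
    FamilyIndices result = SelectLikeQuotedLoop(familyFlags);
    const uint32_t required[QUEUE_TYPE_NUM] = {GRAPHICS_BIT, COMPUTE_BIT, TRANSFER_BIT};
    for (uint32_t type = 0; type < QUEUE_TYPE_NUM; type++) {
        for (uint32_t i = 0; i < familyFlags.size() && result[type] == INVALID_INDEX; i++) {
            if (familyFlags[i] & required[type])
                result[type] = i;
        }
    }
    return result;
}
```

For the two families listed above, the first function assigns GRAPHICS to family 0 and COMPUTE to family 1 and leaves COPY unassigned, because no family is transfer-only. The fallback variant maps COPY to family 0.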

Overlapping incompatibility with VK_IMAGE_CREATE_2D_ARRAY_COMPATIBLE_BIT | VK_IMAGE_CREATE_CUBE_COMPATIBLE_BIT

These rules are not compatible:

If flags contains VK_IMAGE_CREATE_CUBE_COMPATIBLE_BIT, imageType must be VK_IMAGE_TYPE_2D
If flags contains VK_IMAGE_CREATE_CUBE_COMPATIBLE_BIT, arrayLayers must be greater than or equal to 6

If flags contains VK_IMAGE_CREATE_2D_ARRAY_COMPATIBLE_BIT, imageType must be VK_IMAGE_TYPE_3D

If it's a cubemap, it has to be a 2D image, but if the 2D-array-compatible flag is set, it must be a 3D image.
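
Since the two flags can never be valid on the same image, one option is to choose at most one of them from the image type. A minimal sketch under that assumption, with local stand-in constants and a hypothetical `PickCompatibilityFlags` helper (the width/height equality check reflects an additional Vulkan rule for cube-compatible images that is not quoted above):

```cpp
#include <cstdint>

enum class ImageType { IMAGE_1D, IMAGE_2D, IMAGE_3D };

// Stand-ins for VK_IMAGE_CREATE_CUBE_COMPATIBLE_BIT and VK_IMAGE_CREATE_2D_ARRAY_COMPATIBLE_BIT.
constexpr uint32_t CUBE_COMPATIBLE = 0x10;
constexpr uint32_t ARRAY_2D_COMPATIBLE = 0x20;

// Returns at most one of the two flags, never both.
inline uint32_t PickCompatibilityFlags(ImageType type, uint32_t width, uint32_t height, uint32_t arrayLayers) {
    uint32_t flags = 0;
    if (type == ImageType::IMAGE_2D && arrayLayers >= 6 && width == height)
        flags |= CUBE_COMPATIBLE; // 2D image with enough layers for six faces
    else if (type == ImageType::IMAGE_3D)
        flags |= ARRAY_2D_COMPATIBLE; // 3D image whose slices can be viewed as a 2D array
    return flags;
}
```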

[RFE] Pipeline library extension

On D3D12 and Vulkan there is a problem with the PSO build process. The game can sometimes hang, and beyond that you, as a developer, cannot predict the pipeline build time. Also, on GDK you can't build PSOs at runtime, only precompile them. So I think this feature is badly needed in NRI as an extension, because it affects a lot of things. What do you think? Maybe we need some priorities for our tasks to manage them properly?

Unsupported depth bounds

The 1.118 update broke support for hardware that doesn't support depth bounds.
Rather than checking whether the version is at least 1, the code should check D3D12_FEATURE_DATA_D3D12_OPTIONS2::DepthBoundsTestSupported.
This also raises the question of other version checks, such as the one for sample positions.

inline void CommandBufferD3D12::SetDepthBounds(float boundsMin, float boundsMax)
{
    if (m_Version >= 1)
        m_GraphicsCommandList->OMSetDepthBounds(boundsMin, boundsMax);
}
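
A possible shape for the suggested fix, sketched with mock types so it runs without D3D12: the capability bit is assumed to be read once at device creation and cached, and the call is guarded by both the interface version and that bit. `MockCommandList`, `CommandBufferSketch` and `isDepthBoundsSupported` are hypothetical names, not NRI's.

```cpp
#include <cstdint>

// Hypothetical mirror of the D3D12_FEATURE_DATA_D3D12_OPTIONS2 field named in the issue.
struct FeatureOptions2 {
    bool DepthBoundsTestSupported;
};

// Mock command list that only counts calls, so the sketch runs anywhere.
struct MockCommandList {
    uint32_t depthBoundsCalls = 0;
    void OMSetDepthBounds(float, float) { depthBoundsCalls++; }
};

struct CommandBufferSketch {
    MockCommandList* commandList;
    uint32_t version;            // interface version, as in the quoted code
    bool isDepthBoundsSupported; // assumed to be cached from the feature query

    void SetDepthBounds(float boundsMin, float boundsMax) {
        // The method must exist on the interface AND the hardware must support the test.
        if (version >= 1 && isDepthBoundsSupported)
            commandList->OMSetDepthBounds(boundsMin, boundsMax);
    }
};
```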

Resource management improvements

I was thinking about how to manage resources on the GPU/CPU side, and I think it would be useful to be able to Map/Unmap ranges on CPU-bound buffers and to upload multiple ranges to the same buffer (currently you can upload only one range per buffer/texture). It would also be useful for geometry updates (you have two big vertex and index buffers and only need to update some regions of them at once).

Improved Metal / Apple Silicon Support

Hi, I have been taking this abstraction for a spin (thank you for making it). To get it working on the M1 MacBook I am using, I needed to adjust Vulkan instance and device creation to correctly use the VK_KHR_portability_enumeration and VK_KHR_portability_subset extensions.

Descriptor set management

I recently ran into a problem with descriptor set management. I need to dynamically allocate and free sets in a descriptor pool, but NRI has no method to free descriptor sets. How should I manage these sets in this situation?

clang-cl compile error

Hello, I am trying to compile NRI with the clang-cl compiler. I modified 1-Deploy.bat to use clang-cl as the C++ compiler:

@echo off

git submodule update --init --recursive

mkdir "_Build"

cd "_Build"
cmake .. -A x64 -T ClangCL
cd ..

Next, I compile using the script 2-Build.bat.

MSBuild prints many warnings and errors, such as:

E:\cpp\NRI\Source\D3D12\BufferD3D12.cpp(74,13): error : pasting formed '"ID3D12Device::CreateCommittedResource()"" failed, result = 0x%08X!"', an invalid preprocessing token [-Winvalid-token-paste] [E:\cpp\NRI\_Build\NRI_D3D12.vcxproj]
E:\cpp\NRI\Source\Shared\SharedExternal.h(40,86): message : expanded from macro 'RETURN_ON_BAD_HRESULT' [E:\cpp\NRI\_Build\NRI_D3D12.vcxproj]
E:\cpp\NRI\Source\D3D12\BufferD3D12.cpp(77,13): error : pasting formed '"ID3D12Device::CreatePlacedResource()"" failed, result = 0x%08X!"', an invalid preprocessing token [-Winvalid-token-paste] [E:\cpp\NRI\_Build\NRI_D3D12.vcxproj]
E:\cpp\NRI\Source\Shared\SharedExternal.h(40,86): message : expanded from macro 'RETURN_ON_BAD_HRESULT' [E:\cpp\NRI\_Build\NRI_D3D12.vcxproj]

It seems that the macro RETURN_ON_BAD_HRESULT is not standard C++.
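
The error text suggests the macro pastes two string literals together with `##`, which does not form a single valid token in standard C++. The body of RETURN_ON_BAD_HRESULT is not shown here, so the following is only a sketch with a hypothetical `FAIL_FORMAT` macro: it relies on the compiler concatenating adjacent string literals, which needs no `##` at all.

```cpp
#include <cstring>

// Non-portable form, kept as a comment: pasting two string literals with ##
// is what clang-cl rejects as an invalid preprocessing token.
// #define FAIL_FORMAT(call) call##" failed, result = 0x%08X!"

// Portable form: adjacent string literals are concatenated by the compiler.
#define FAIL_FORMAT(call) call " failed, result = 0x%08X!"

// Hypothetical helper that just exposes the resulting format string.
inline const char* CommittedResourceFailFormat() {
    return FAIL_FORMAT("ID3D12Device::CreateCommittedResource()");
}
```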

My environment:
OS: Windows 11 23H2 22631.3447
Visual Studio: 17.9.6
Installed C++ Clang compiler for Windows (17.0.3)

Wrong barriers for readback buffer

I was trying to run the SceneViewer sample, but I got a validation error on D3D12 (without Agility SDK):

D3D12 ERROR: ID3D12CommandList::ResourceBarrier: Certain resources are restricted to certain D3D12_RESOURCE_STATES states, and cannot be changed. Resources on D3D12_HEAP_TYPE_READBACK heaps requires D3D12_RESOURCE_STATE_COPY_DEST or D3D12_RESOURCE_STATE_RESOLVE_DEST. Reserved buffers used exclusively for texture placement requires D3D12_RESOURCE_STATE_COMMON. [ RESOURCE_MANIPULATION ERROR #741: RESOURCE_BARRIER_INVALID_HEAP]

I removed the barrier for the readback buffer, and now it works.

        //nri::BufferBarrierDesc bufferBarrierDescs = {};
        //bufferBarrierDescs.buffer = m_Buffers[READBACK_BUFFER];
        //bufferBarrierDescs.before = {nri::AccessBits::UNKNOWN, nri::StageBits::COPY};
        //bufferBarrierDescs.after = {nri::AccessBits::COPY_DESTINATION, nri::StageBits::COPY};

        nri::BarrierGroupDesc barrierGroupDesc = {};
        barrierGroupDesc.textureNum = 1;
        barrierGroupDesc.textures = &textureBarrierDescs;
        //barrierGroupDesc.bufferNum = 1;
        //barrierGroupDesc.buffers = &bufferBarrierDescs;
        NRI.CmdBarrier(commandBuffer, barrierGroupDesc);
        
        // ...
        
        textureBarrierDescs.before = textureBarrierDescs.after;
        textureBarrierDescs.after = {nri::AccessBits::UNKNOWN, nri::Layout::PRESENT};

        //bufferBarrierDescs.before = bufferBarrierDescs.after;
        //bufferBarrierDescs.after = {nri::AccessBits::UNKNOWN, nri::StageBits::COPY};

        NRI.CmdBarrier(commandBuffer, barrierGroupDesc);

What do you think: is this a bug in NRI or in the sample?

[RFE] Instancing problem in GPU-driven rendering

I'm trying to implement GPU-driven rendering with bindless descriptors and instancing, and I have a problem with the instance ID (specifically on D3D12). SV_InstanceID on D3D11/D3D12 starts from 0. In Vulkan, however, it starts from the firstInstance value of the indexed instanced draw command. That means I can't rely on the instance ID in D3D12 or D3D11, because it always starts from 0. I can work around this with an additional per-instance vertex buffer that holds the instance index for each draw, but that adds complexity. What do you think, how can we fix this problem?

gpuweb/gpuweb#901 - Problem description

Option to disable NVAPI and AGS pull for Linux

Small request: currently the CMake script pulls the NVAPI and AGS dependencies via Packman regardless of platform. To my understanding, the NVAPI and AGS libraries are not required on linux-x86_64, so this pull is redundant. Is it possible to add an option to disable it on linux-x86_64?

Misleading line here

if (m_Memory != VK_NULL_HANDLE)

The left-hand side is compared with VK_NULL_HANDLE on the right. This works because VK_NULL_HANDLE is defined as nullptr anyway. However, the line is misleading: reading it without an IDE, one would assume that m_Memory is a Vulkan resource when it is in fact an NRI struct.

(VULKAN 1.3) Crash when trying to load global commands with vkGetInstanceProcAddr in VkDevice.cpp

When using a VkInstance created with VK_API_VERSION_1_3, vkGetInstanceProcAddr returns nullptr, which leads to a crash here:
https://github.com/NVIDIAGameWorks/NRI/blob/main/Source/VK/DeviceVK.cpp#L1973

I suspect that vkGetInstanceProcAddr should not be called with an instance for vkCreateInstance, vkEnumerateInstanceExtensionProperties and vkEnumerateInstanceLayerProperties, as they are global commands and are therefore expected to return NULL as per: https://www.khronos.org/registry/vulkan/specs/1.2-extensions/html/vkspec.html#vkGetInstanceProcAddr

For VK_API_VERSION_1_2 this seems to work, although to me it looks like it shouldn't according to the spec.
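
One way to act on this is to look up the global commands with a null instance and everything else with the real one. The sketch below only models that decision; `InstanceForLookup` is a hypothetical helper and an opaque `void*` stands in for `VkInstance`, so no Vulkan headers are needed. The name list follows the global commands named in the Vulkan specification.

```cpp
#include <cstring>

// Commands the Vulkan specification treats as global (usable without an instance).
inline bool IsGlobalCommand(const char* name) {
    static const char* const kGlobal[] = {
        "vkEnumerateInstanceVersion",
        "vkEnumerateInstanceExtensionProperties",
        "vkEnumerateInstanceLayerProperties",
        "vkCreateInstance",
    };
    for (const char* g : kGlobal) {
        if (std::strcmp(name, g) == 0)
            return true;
    }
    return false;
}

// Hypothetical helper: pick the instance argument to pass to vkGetInstanceProcAddr.
inline void* InstanceForLookup(void* instance, const char* name) {
    return IsGlobalCommand(name) ? nullptr : instance;
}
```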

Only enable Vulkan extensions when they are either supported or required

Previously, the code checked for availability before enabling extensions, but now device creation fails on some hardware.
Could you bring back the checks or allow developers to explicitly specify the extensions programmatically?

NRI/Source/VK/DeviceVK.cpp

Lines 367 to 376 in 1d52b40

// Mandatory
desiredExts.push_back(VK_KHR_SWAPCHAIN_EXTENSION_NAME); // TODO: move to supportedFeatures?
desiredExts.push_back(VK_KHR_DEFERRED_HOST_OPERATIONS_EXTENSION_NAME);
desiredExts.push_back(VK_KHR_SYNCHRONIZATION_2_EXTENSION_NAME);
desiredExts.push_back(VK_KHR_SHADER_NON_SEMANTIC_INFO_EXTENSION_NAME); // at least for "printf"
#ifdef __APPLE__
desiredExts.push_back(VK_KHR_PORTABILITY_SUBSET_EXTENSION_NAME);
desiredExts.push_back(VK_KHR_DYNAMIC_RENDERING_EXTENSION_NAME);
#endif
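
A sketch of the requested check, assuming the list of supported names has already been gathered (in Vulkan that would come from vkEnumerateDeviceExtensionProperties). `ExtensionPlan` and `PlanExtensions` are hypothetical names: mandatory extensions that are missing are reported so creation can fail early with a clear message, while optional ones are enabled only if present.

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct ExtensionPlan {
    std::vector<std::string> enabled;          // names to pass on to device creation
    std::vector<std::string> missingMandatory; // non-empty means: fail early
};

inline ExtensionPlan PlanExtensions(const std::vector<std::string>& supported,
                                    const std::vector<std::string>& mandatory,
                                    const std::vector<std::string>& optional) {
    auto isSupported = [&supported](const std::string& name) {
        return std::find(supported.begin(), supported.end(), name) != supported.end();
    };

    ExtensionPlan plan;
    for (const std::string& name : mandatory) {
        if (isSupported(name))
            plan.enabled.push_back(name);
        else
            plan.missingMandatory.push_back(name);
    }
    for (const std::string& name : optional) {
        if (isSupported(name))
            plan.enabled.push_back(name); // silently skipped when unavailable
    }
    return plan;
}
```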

[RFE] Expose "Streamer" extension, which is more practical at runtime than `HelperInterface::UploadData`

Draft with comments:

// © 2021 NVIDIA Corporation

#pragma once

NRI_NAMESPACE_BEGIN

NRI_FORWARD_STRUCT(Streamer);

// Could use an externally created "buffer", but in this case "growing under the hood" becomes impossible
NRI_STRUCT(StreamerDesc)
{
    //uint64_t capacity; // seems to be not needed
    NRI_NAME(MemoryLocation) memoryLocation; // UPLOAD or DEVICE_UPLOAD
    NRI_NAME(BufferUsageBits) usageBits;
    uint8_t framesInFlightNum; // needed to hide reallocation under the hood and avoid exposing capacity
};

NRI_STRUCT(BufferRangeUpdateRequestDesc)
{
    // Data to upload
    const void* data;
    uint64_t dataSize;

    // Destination (optional)
    NRI_NAME(Buffer)* dstBuffer;
    uint64_t dstBufferOffset;

    // Access?
    NRI_NAME(AccessBits) prevState; // potentially could assume UNKNOWN (not recommended)
    NRI_NAME(AccessBits) nextState; // not all states can be transferred to in COPY and COMPUTE queues
};

NRI_STRUCT(TextureRegionUpdateRequestDesc)
{
    // Data to upload
    const void* data;
    uint64_t dataSize;
    uint32_t srcRowPitch;
    uint32_t srcSlicePitch;

    // Destination (mandatory)
    NRI_NAME(Texture)* dstTexture;
    const NRI_NAME(TextureRegionDesc)* dstRegionDesc;

    // Access?
    NRI_NAME(AccessAndLayout) prevState;
    NRI_NAME(AccessAndLayout) nextState;
};

NRI_STRUCT(StreamerInterface)
{
    NRI_NAME(Result) (NRI_CALL *CreateStreamer)(NRI_NAME_REF(Device) device, const NRI_NAME_REF(StreamerDesc) streamerDesc, NRI_NAME_REF(Streamer*) streamer);    
    void (NRI_CALL *DestroyStreamer)(NRI_NAME_REF(Streamer) streamer);

    // Add an update request to the queue (no work here)
    // These functions return the ring buffer offset
    uint64_t (NRI_CALL *EnqueueBufferRangeUpdateRequest)(NRI_NAME_REF(Streamer) streamer, const NRI_NAME_REF(BufferRangeUpdateRequestDesc) bufferRangeUpdateRequestDesc);
    uint64_t (NRI_CALL *EnqueueTextureRegionUpdateRequest)(NRI_NAME_REF(Streamer) streamer, const NRI_NAME_REF(TextureRegionUpdateRequestDesc) textureRegionUpdateRequestDesc);

    // Submit all gathered requests and reset the queue
    // The internal buffer can grow in this function
    // Doesn't require WFI on a specific queue if a new buffer gets allocated immediately and destruction of the old buffer is postponed
    void (NRI_CALL *SubmitStreamerRequests)(NRI_NAME_REF(CommandBuffer) commandBuffer, NRI_NAME_REF(Streamer) streamer);

    // Needed if the buffer is explicitly used for rendering as IB, VB, CB
    // IMPORTANT: valid only after "Submit"
    // IMPORTANT: creating and caching "views" on this buffer is not recommended
    NRI_NAME(Buffer*) (NRI_CALL *GetStreamerBuffer)(NRI_NAME_REF(Streamer) streamer);

    /*
    Not needed with "enqueue / submit" logic:
    uint64_t (NRI_CALL *GetStreamerCapacity)(NRI_NAME_REF(Streamer) streamer);
    void (NRI_CALL *ChangeStreamerCapacity)(NRI_NAME_REF(Streamer) streamer, uint64_t capacity);
    */
};

NRI_NAMESPACE_END

Such a design hides capacity, but it puts all CPU-side copy operations in one place, i.e. one thread. That's not good, but other threads can have their own Streamer objects.
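
To make the "enqueue now, copy on submit" flow easier to picture, here is a rough CPU-only model. It is not the proposed NRI API: `StreamerSketch` and its methods are hypothetical, no GPU work happens, and `framesInFlightNum` handling is left out, so it only covers a single frame.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct PendingRequest {
    const void* data;
    uint64_t dataSize;
    uint64_t offset; // offset inside the streamer's buffer
};

struct StreamerSketch {
    uint64_t capacity = 0;    // hidden from the user, grows on submit
    uint64_t writeOffset = 0; // next free offset for this frame
    std::vector<PendingRequest> queue;

    // Only records the request and hands back its offset; no copy happens here.
    uint64_t Enqueue(const void* data, uint64_t dataSize) {
        const uint64_t offset = writeOffset;
        queue.push_back({data, dataSize, offset});
        writeOffset += dataSize;
        return offset;
    }

    // Grows the capacity if needed, then resets the queue.
    // Returns the number of requests that were submitted.
    size_t Submit() {
        if (writeOffset > capacity)
            capacity = writeOffset; // a real implementation would allocate a new buffer here
        const size_t submitted = queue.size();
        queue.clear();
        writeOffset = 0;
        return submitted;
    }
};
```

Because each `StreamerSketch` owns its own queue, one instance per thread matches the note above about other threads having their own Streamer objects.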
