nvidiagameworks / nri
License: MIT License
Line 103 in 919412b
This code is broken if both X11 and Wayland are available on a system.
NRWindow is a union, so writing to it via
m_NRIWindow.x11.dpy = glfwGetX11Display();
m_NRIWindow.x11.window = glfwGetX11Window(m_Window);
will make the code in SwapChainVK::Create() think that we provided both an X11 and a Wayland window handle. The code then creates an X11 surface object first, and afterwards overwrites the surface handle by creating another, Wayland surface object from the same X11 window handle. On NVIDIA drivers this later causes vkCreateSwapchainKHR() to crash deep in the call stack.
I think making NRWindow a regular struct instead of a union, and using #ifdef to control the platform-dependent members, would be the right call.
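A minimal sketch of the proposed layout, assuming hypothetical X11Window, WaylandWindow and Window types (not NRI's actual declarations). With separate members instead of a union, filling in the X11 fields leaves the Wayland fields null, and the surface selection can pick exactly one platform, so only one surface object is ever created.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical platform window descriptions; member names are illustrative only.
struct X11Window {
    void* dpy = nullptr;      // Display*
    uint64_t window = 0;      // X11 window id (XID)
};

struct WaylandWindow {
    void* display = nullptr;  // wl_display*
    void* surface = nullptr;  // wl_surface*
};

// A plain struct instead of a union: each platform has its own storage,
// so writing x11 can never make wayland look non-null.
// In a real header each member could additionally be wrapped in an #ifdef.
struct Window {
    X11Window x11;
    WaylandWindow wayland;
};

enum class SurfaceKind { None, X11, Wayland };

// Chooses exactly one surface type (Wayland is checked first; that order is arbitrary here).
SurfaceKind SelectSurfaceKind(const Window& w) {
    if (w.wayland.display && w.wayland.surface)
        return SurfaceKind::Wayland;
    if (w.x11.dpy && w.x11.window)
        return SurfaceKind::X11;
    return SurfaceKind::None;
}
```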
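It looks like a new query type is needed: VK_QUERY_TYPE_TIMESTAMP can be used on any queue, whereas D3D12 requires a separate D3D12_QUERY_HEAP_TYPE_COPY_QUEUE_TIMESTAMP query heap for timestamps on copy queues.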
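A sketch of how such a split could map onto the two backends, assuming a hypothetical QueryType enum with a dedicated copy-queue timestamp entry (the functions return the backend enum names as strings only to keep the example self-contained):

```cpp
#include <cassert>
#include <cstring>

// Hypothetical NRI-side query types with a separate copy-queue timestamp entry.
enum class QueryType { TIMESTAMP, TIMESTAMP_COPY_QUEUE, OCCLUSION };

// D3D12 needs a different query heap type for timestamps recorded on a copy queue.
const char* ToD3D12HeapType(QueryType t) {
    switch (t) {
        case QueryType::TIMESTAMP:            return "D3D12_QUERY_HEAP_TYPE_TIMESTAMP";
        case QueryType::TIMESTAMP_COPY_QUEUE: return "D3D12_QUERY_HEAP_TYPE_COPY_QUEUE_TIMESTAMP";
        case QueryType::OCCLUSION:            return "D3D12_QUERY_HEAP_TYPE_OCCLUSION";
    }
    return nullptr;
}

// Vulkan uses one timestamp query type for every queue, so both entries collapse.
const char* ToVkQueryType(QueryType t) {
    switch (t) {
        case QueryType::TIMESTAMP:
        case QueryType::TIMESTAMP_COPY_QUEUE: return "VK_QUERY_TYPE_TIMESTAMP";
        case QueryType::OCCLUSION:            return "VK_QUERY_TYPE_OCCLUSION";
    }
    return nullptr;
}
```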
Line 366 in 64120b7
The struct member order is:
uint32_t stride;
uint16_t bindingSlot;
VertexStreamStepRate stepRate;
However, the stride and bindingSlot are flipped during assignment:
stream_desc.bindingSlot,
stream_desc.stride,
(stream_desc.stepRate == VertexStreamStepRate::PER_VERTEX) ? VK_VERTEX_INPUT_RATE_VERTEX : VK_VERTEX_INPUT_RATE_INSTANCE
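For reference, Vulkan's VkVertexInputBindingDescription declares its members as binding, stride, inputRate, so a positional initializer targeting it must follow that order rather than the order of the NRI struct. Assigning by member name avoids depending on either order. A minimal sketch, using local mirrors of both structs (the names below are stand-ins, not the real headers):

```cpp
#include <cassert>
#include <cstdint>

// Local mirror of VkVertexInputBindingDescription (binding, stride, inputRate).
enum VertexInputRate { INPUT_RATE_VERTEX = 0, INPUT_RATE_INSTANCE = 1 };
struct VertexInputBindingDescription {
    uint32_t binding;
    uint32_t stride;
    VertexInputRate inputRate;
};

// Mirror of the NRI stream description quoted above (stride, bindingSlot, stepRate).
enum class VertexStreamStepRate { PER_VERTEX, PER_INSTANCE };
struct VertexStreamDesc {
    uint32_t stride;
    uint16_t bindingSlot;
    VertexStreamStepRate stepRate;
};

// Assigning by name makes the conversion independent of declaration order.
VertexInputBindingDescription ToBinding(const VertexStreamDesc& s) {
    VertexInputBindingDescription d = {};
    d.binding = s.bindingSlot;
    d.stride = s.stride;
    d.inputRate = s.stepRate == VertexStreamStepRate::PER_VERTEX ? INPUT_RATE_VERTEX : INPUT_RATE_INSTANCE;
    return d;
}
```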
I was thinking about adding support for Resizable BAR, but it is only supported in Vulkan and D3D12. What do you think about creating an additional interface for sparse textures and ReBAR as an extension? I would implement one of them, but without an interface I have no idea how to fit them into NRI.
Maybe we need something like a "resource extension"? We could easily combine both technologies into one interface, and maybe add streaming support and other features that are not present in D3D11.
For my use case, when I create the NRI device, would it be possible to extend NriDeviceCreationDesc with flags to enable extensions? It would be preferable if the init code assumed the minimum set of extensions.
Sometimes it's useful to have an indirect count buffer, so that only the required number of draw calls or dispatches is executed.
Links:
gpuweb/gpuweb#1949
https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/vkCmdDrawIndirectCount.html
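A CPU-side model of the semantics being requested: with vkCmdDrawIndirectCount the number of draws executed is the smaller of the value read from the count buffer and maxDrawCount (D3D12 exposes the same idea through the count-buffer argument of ExecuteIndirect). The function name and the extra clamp to the argument array size are assumptions of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Same layout as VkDrawIndirectCommand / D3D12_DRAW_ARGUMENTS.
struct DrawIndirectCommand {
    uint32_t vertexCount;
    uint32_t instanceCount;
    uint32_t firstVertex;
    uint32_t firstInstance;
};

// Executes min(countFromBuffer, maxDrawCount) draws (also clamped to the array size)
// and returns the total number of vertex invocations they would produce.
uint64_t SimulateDrawIndirectCount(const std::vector<DrawIndirectCommand>& args,
                                   uint32_t countFromBuffer, uint32_t maxDrawCount) {
    const uint32_t drawNum = std::min({countFromBuffer, maxDrawCount, static_cast<uint32_t>(args.size())});
    uint64_t vertices = 0;
    for (uint32_t i = 0; i < drawNum; i++)
        vertices += uint64_t(args[i].vertexCount) * args[i].instanceCount;
    return vertices;
}
```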
NRI/Source/D3D12/CommandBufferD3D12.cpp
Line 520 in 171b583
This line and the one below should be casting to TextureD3D12*, not BufferD3D12*.
Having any interop between GAPI and CUDA would be nice.
https://microsoft.github.io/DirectX-Specs/d3d/D3D12EnhancedBarriers.html
It should help bring the D3D12 backend closer to VK (with legacy resource states there are some hidden discrepancies). According to the spec, "resource transitions" are emulated using the "enhanced barriers" API under the hood.
When I started using v1.120, I got this error:
NRI::ERROR(Creation.cpp:88) - VK::NVIDIA GeForce 930MX - Unknown interface 'nri::HelperInterface'!
The way family indices are handled doesn't seem quite right.
This is my GPU; it has two queue families:
Queue[0]: VK_QUEUE_GRAPHICS_BIT VK_QUEUE_COMPUTE_BIT VK_QUEUE_TRANSFER_BIT
Queue[1]: VK_QUEUE_COMPUTE_BIT VK_QUEUE_TRANSFER_BIT
How would the selection process work? Would family 0 be used for graphics and compute, and family 1 for copy?
void DeviceVK::FillFamilyIndices(bool useEnabledFamilyIndices, const uint32_t* enabledFamilyIndices, uint32_t familyIndexNum)
{
    uint32_t familyNum = 0;
    m_VK.GetPhysicalDeviceQueueFamilyProperties(m_PhysicalDevices.front(), &familyNum, nullptr);

    Vector<VkQueueFamilyProperties> familyProps(familyNum, GetStdAllocator());
    m_VK.GetPhysicalDeviceQueueFamilyProperties(m_PhysicalDevices.front(), &familyNum, familyProps.data());

    memset(m_FamilyIndices.data(), INVALID_FAMILY_INDEX, m_FamilyIndices.size() * sizeof(uint32_t));

    for (uint32_t i = 0; i < familyProps.size(); i++)
    {
        const VkQueueFlags mask = familyProps[i].queueFlags;
        const bool graphics = mask & VK_QUEUE_GRAPHICS_BIT;
        const bool compute = mask & VK_QUEUE_COMPUTE_BIT;
        const bool copy = mask & VK_QUEUE_TRANSFER_BIT;

        if (useEnabledFamilyIndices)
        {
            bool isFamilyEnabled = false;
            for (uint32_t j = 0; j < familyIndexNum && !isFamilyEnabled; j++)
                isFamilyEnabled = enabledFamilyIndices[j] == i;

            if (!isFamilyEnabled)
                continue;
        }

        if (graphics)
            m_FamilyIndices[(uint32_t)CommandQueueType::GRAPHICS] = i;
        else if (compute)
            m_FamilyIndices[(uint32_t)CommandQueueType::COMPUTE] = i;
        else if (copy)
            m_FamilyIndices[(uint32_t)CommandQueueType::COPY] = i;
    }
}
What is the intended logic for selecting a queue family for each queue type?
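With the if/else-if chain above (and no enabled-family filter), family 0 takes GRAPHICS and family 1 takes COMPUTE, while COPY is never assigned on this GPU. One possible policy, sketched below as a hypothesis rather than NRI's behavior, is to give each queue type the family that supports it with the fewest other capability bits, so dedicated compute/copy families win. Note that Vulkan lets graphics and compute families omit VK_QUEUE_TRANSFER_BIT even though they support transfers; this sketch only looks at the reported bits.

```cpp
#include <array>
#include <bitset>
#include <cassert>
#include <cstdint>
#include <vector>

// Local copies of the VkQueueFlagBits values used here.
constexpr uint32_t QUEUE_GRAPHICS = 0x1; // VK_QUEUE_GRAPHICS_BIT
constexpr uint32_t QUEUE_COMPUTE = 0x2;  // VK_QUEUE_COMPUTE_BIT
constexpr uint32_t QUEUE_TRANSFER = 0x4; // VK_QUEUE_TRANSFER_BIT
constexpr uint32_t INVALID_FAMILY = UINT32_MAX;

enum QueueType { GRAPHICS = 0, COMPUTE = 1, COPY = 2 };

// Returns the family that supports 'required' with the fewest other capability bits
// (ties go to the lowest index), or INVALID_FAMILY if none qualifies.
uint32_t PickFamily(const std::vector<uint32_t>& familyFlags, uint32_t required) {
    uint32_t best = INVALID_FAMILY;
    size_t bestBitNum = SIZE_MAX;
    for (uint32_t i = 0; i < familyFlags.size(); i++) {
        if ((familyFlags[i] & required) != required)
            continue;
        const size_t bitNum = std::bitset<32>(familyFlags[i]).count();
        if (bitNum < bestBitNum) {
            best = i;
            bestBitNum = bitNum;
        }
    }
    return best;
}

// Result is indexed by QueueType.
std::array<uint32_t, 3> SelectFamilies(const std::vector<uint32_t>& familyFlags) {
    return {{PickFamily(familyFlags, QUEUE_GRAPHICS),
             PickFamily(familyFlags, QUEUE_COMPUTE),
             PickFamily(familyFlags, QUEUE_TRANSFER)}};
}
```

On the GPU above this yields graphics from family 0 and both compute and copy from family 1; using two queues from family 1 would then require that family to report a queueCount of at least 2.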
There is a selection of formats, but sometimes you want to know how many samples there are or the bit depth of a format. It would be nice to have some utilities similar to tiny_imageformat:
https://github.com/DeanoC/tiny_imageformat/blob/master/include/tiny_imageformat/tinyimageformat.h
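A minimal table-driven sketch in the spirit of tiny_imageformat, assuming a hypothetical six-entry Format enum (the real NRI list is much longer, and packed formats such as 10:10:10:2 would need per-channel bit counts rather than a single bitsPerChannel):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical subset of a format enum.
enum class Format { R8_UNORM, RG16_SFLOAT, RGBA8_UNORM, RGBA16_SFLOAT, RGBA32_SFLOAT, D32_SFLOAT };

struct FormatInfo {
    uint8_t channelNum;     // number of components
    uint8_t bitsPerChannel; // bit depth of each component
    bool isFloat;
};

// One entry per Format value, in declaration order.
constexpr FormatInfo FORMAT_INFO[] = {
    {1, 8, false},  // R8_UNORM
    {2, 16, true},  // RG16_SFLOAT
    {4, 8, false},  // RGBA8_UNORM
    {4, 16, true},  // RGBA16_SFLOAT
    {4, 32, true},  // RGBA32_SFLOAT
    {1, 32, true},  // D32_SFLOAT
};
static_assert(sizeof(FORMAT_INFO) / sizeof(FORMAT_INFO[0]) == 6, "table must cover every Format");

constexpr FormatInfo GetFormatInfo(Format f) { return FORMAT_INFO[static_cast<int>(f)]; }

constexpr uint32_t GetBitsPerPixel(Format f) {
    return uint32_t(GetFormatInfo(f).channelNum) * GetFormatInfo(f).bitsPerChannel;
}
```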
These rules are not compatible:
If flags contains VK_IMAGE_CREATE_CUBE_COMPATIBLE_BIT, imageType must be VK_IMAGE_TYPE_2D
If flags contains VK_IMAGE_CREATE_CUBE_COMPATIBLE_BIT, arrayLayers must be greater than or equal to 6
If flags contains VK_IMAGE_CREATE_2D_ARRAY_COMPATIBLE_BIT, imageType must be VK_IMAGE_TYPE_3D
If it's a cube map, it has to be a 2D image, but if it's an array, it must be a 3D image.
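For context, per the Vulkan spec VK_IMAGE_CREATE_2D_ARRAY_COMPATIBLE_BIT is meant for 3D images whose depth slices may be viewed as a 2D array; an ordinary 2D array or cube array is a VK_IMAGE_TYPE_2D image with arrayLayers > 1 and does not use that flag. A sketch that checks only the quoted rules (plus the width == height requirement for cube-compatible images), using local stand-ins for the Vulkan enum and flag bits:

```cpp
#include <cassert>
#include <cstdint>

// Local stand-ins; the bit values are arbitrary, not the Vulkan ones.
enum class ImageType { TYPE_1D, TYPE_2D, TYPE_3D };
constexpr uint32_t CUBE_COMPATIBLE = 1u << 0;     // stands in for VK_IMAGE_CREATE_CUBE_COMPATIBLE_BIT
constexpr uint32_t ARRAY_2D_COMPATIBLE = 1u << 1; // stands in for VK_IMAGE_CREATE_2D_ARRAY_COMPATIBLE_BIT

struct ImageDesc {
    ImageType type;
    uint32_t flags;
    uint32_t width, height;
    uint32_t arrayLayers;
};

bool IsValid(const ImageDesc& d) {
    // Cube-compatible: 2D image, at least 6 layers, square faces.
    if (d.flags & CUBE_COMPATIBLE) {
        if (d.type != ImageType::TYPE_2D || d.arrayLayers < 6 || d.width != d.height)
            return false;
    }
    // 2D-array-compatible: only meaningful for 3D images.
    if ((d.flags & ARRAY_2D_COMPATIBLE) && d.type != ImageType::TYPE_3D)
        return false;
    return true;
}
```

So a cube map array is simply a 2D image with CUBE_COMPATIBLE and a layer count that is a multiple of 6; the two rules never apply to the same image.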
NRI/Source/D3D12/DescriptorPoolD3D12.cpp
Line 136 in 171b583
Otherwise, you will get a double free when the destructor, which tries to deallocate all items in the list, is called.
NRI/Source/VK/DeviceSemaphoreVK.cpp
Line 43 in 0f3c47b
On D3D12 and Vulkan there is a problem with the PSO build process. Sometimes the game can hang, and, beyond that, you as a developer cannot directly predict the pipeline build time. Also, on GDK you can't build PSOs at runtime, only precompile them. So I think this feature is much needed in NRI as an extension, because it affects a lot of things. What do you think? Maybe we need some priorities for our tasks to manage them properly?
The 1.118 update broke support for some hardware where depth bounds isn't supported.
Rather than checking whether the version is greater than 1, it should have checked D3D12_FEATURE_DATA_D3D12_OPTIONS2::DepthBoundsTestSupported.
This also raises the question of other version checks, such as for sample positions.
NRI/Source/D3D12/CommandBufferD3D12.cpp
Lines 297 to 301 in 1d52b40
So I was thinking about how to manage resources on the GPU/CPU side, and I thought it may be useful to be able to Map/Unmap ranges of CPU-visible buffers and to upload multiple ranges to the same buffer (currently you can upload only one range per buffer/texture). It might also be useful for geometry updates (you have two big vertex and index buffers and only need to update some regions of them at a time).
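A sketch of the multi-range idea, assuming hypothetical RangeUpdate/CopyRegion types: several updates for the same destination buffer are packed into one staging allocation, producing one copy region per range. Those regions could then be recorded as a single copy command where the API allows it (vkCmdCopyBuffer takes an array of regions) or as one CopyBufferRegion call each on D3D12.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// One region of a larger destination buffer to update.
struct RangeUpdate {
    const void* data;
    uint64_t size;
    uint64_t dstOffset;
};

// staging[srcOffset, srcOffset + size) -> buffer[dstOffset, dstOffset + size)
struct CopyRegion {
    uint64_t srcOffset;
    uint64_t dstOffset;
    uint64_t size;
};

// Packs all updates into 'staging', aligning each source offset to 'alignment'
// (must be a power of two), and returns the copy regions to record.
std::vector<CopyRegion> PackRanges(const std::vector<RangeUpdate>& updates, uint64_t alignment,
                                   std::vector<uint8_t>& staging) {
    std::vector<CopyRegion> regions;
    uint64_t offset = 0;
    for (const RangeUpdate& u : updates) {
        offset = (offset + alignment - 1) & ~(alignment - 1);
        if (staging.size() < offset + u.size)
            staging.resize(static_cast<size_t>(offset + u.size));
        std::memcpy(staging.data() + offset, u.data, static_cast<size_t>(u.size));
        regions.push_back({offset, u.dstOffset, u.size});
        offset += u.size;
    }
    return regions;
}
```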
Hi, I have been taking this abstraction for a spin (thank you for making it), and to get it working on the M1 MacBook I am using, I needed to adjust Vulkan instance and device creation to correctly use the VK_KHR_portability_enumeration / VK_KHR_portability_subset extensions.
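A sketch of the decision logic involved, working on plain extension-name lists so it runs without Vulkan headers; the struct and function names are hypothetical. On the instance side, VK_KHR_portability_enumeration is enabled and VK_INSTANCE_CREATE_ENUMERATE_PORTABILITY_BIT_KHR is set so portability implementations such as MoltenVK are enumerated; on the device side, the spec requires VK_KHR_portability_subset to be enabled whenever the physical device advertises it.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

struct PortabilitySetup {
    std::vector<std::string> instanceExtensions; // to add to VkInstanceCreateInfo
    bool enumeratePortability = false;           // set VK_INSTANCE_CREATE_ENUMERATE_PORTABILITY_BIT_KHR
};

static bool Has(const std::vector<std::string>& list, const std::string& name) {
    return std::find(list.begin(), list.end(), name) != list.end();
}

// Instance side: enable portability enumeration only if the loader exposes it.
PortabilitySetup ConfigureInstance(const std::vector<std::string>& availableInstanceExtensions) {
    PortabilitySetup setup;
    if (Has(availableInstanceExtensions, "VK_KHR_portability_enumeration")) {
        setup.instanceExtensions.push_back("VK_KHR_portability_enumeration");
        setup.enumeratePortability = true;
    }
    return setup;
}

// Device side: VK_KHR_portability_subset must be enabled if the device advertises it.
std::vector<std::string> ConfigureDevice(const std::vector<std::string>& availableDeviceExtensions) {
    std::vector<std::string> enabled;
    if (Has(availableDeviceExtensions, "VK_KHR_portability_subset"))
        enabled.push_back("VK_KHR_portability_subset");
    return enabled;
}
```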
@dzhdanNV The code here seems to be incorrect:
https://github.com/NVIDIAGameWorks/NRI/blob/main/Source/VK/DescriptorPoolVK.cpp#L31-L32
It seems it should be freeing the sets in [0, m_UsedSets) instead of [m_UsedSets, m_AllocatedSets.size()) (which is the same range in some cases).
NRI/Source/D3D12/PipelineD3D12.cpp
Line 122 in 0d00b9d
Recently I ran into a problem with descriptor set management. I need to dynamically allocate and free sets in a descriptor pool, but NRI does not have a method to free descriptor sets. How should I manage these sets in this situation?
It would be very useful to have a copy-only descriptor for copying and storage.
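One common workaround, sketched here under the assumption that the API offers a whole-pool reset, is to allocate short-lived sets from one pool per frame in flight and reset that pool when the ring comes back to it. The DescriptorPool and FrameDescriptorPools types below are hypothetical CPU-only stand-ins; the caller must wait on the fence of the frame that last used a pool before BeginFrame resets it.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-in for a pool that can only allocate and be reset as a whole.
struct DescriptorPool {
    explicit DescriptorPool(uint32_t cap) : capacity(cap) {}
    bool Allocate() {
        if (used == capacity)
            return false;
        used++;
        return true;
    }
    void Reset() { used = 0; } // releases every set allocated from this pool
    uint32_t capacity;
    uint32_t used = 0;
};

// One pool per frame in flight; sets from frame N are released when the ring returns to N.
class FrameDescriptorPools {
public:
    FrameDescriptorPools(uint32_t framesInFlight, uint32_t setsPerFrame)
        : m_Pools(framesInFlight, DescriptorPool(setsPerFrame)) {}

    // Caller must have waited for the GPU to finish the frame that last used this pool.
    void BeginFrame(uint64_t frameIndex) {
        m_Current = static_cast<uint32_t>(frameIndex % m_Pools.size());
        m_Pools[m_Current].Reset();
    }
    bool AllocateSet() { return m_Pools[m_Current].Allocate(); }
    uint32_t UsedInCurrent() const { return m_Pools[m_Current].used; }

private:
    std::vector<DescriptorPool> m_Pools;
    uint32_t m_Current = 0;
};
```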
Hello, I am trying to compile NRI using the clang-cl compiler. I modified 1-Deploy.bat
to use clang-cl as the C++ compiler:
@echo off
git submodule update --init --recursive
mkdir "_Build"
cd "_Build"
cmake .. -A x64 -T ClangCL
cd ..
Next, I compile using the 2-Build.bat script.
MSBuild prints many warnings and errors, such as:
E:\cpp\NRI\Source\D3D12\BufferD3D12.cpp(74,13): error : pasting formed '"ID3D12Device::CreateCommittedResource()"" failed, result = 0x%08X!"', an invalid preprocessing token [-Winvalid-token-paste] [E:\cpp\NRI\_Build\NRI_D3D12.vcxproj]
E:\cpp\NRI\Source\Shared\SharedExternal.h(40,86): message : expanded from macro 'RETURN_ON_BAD_HRESULT' [E:\cpp\NRI\_Build\NRI_D3D12.vcxproj]
E:\cpp\NRI\Source\D3D12\BufferD3D12.cpp(77,13): error : pasting formed '"ID3D12Device::CreatePlacedResource()"" failed, result = 0x%08X!"', an invalid preprocessing token [-Winvalid-token-paste] [E:\cpp\NRI\_Build\NRI_D3D12.vcxproj]
E:\cpp\NRI\Source\Shared\SharedExternal.h(40,86): message : expanded from macro 'RETURN_ON_BAD_HRESULT' [E:\cpp\NRI\_Build\NRI_D3D12.vcxproj]
It seems that the macro RETURN_ON_BAD_HRESULT is not standard C++.
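The error comes from using ## to join two string literals: token pasting must produce a single valid preprocessing token, and '"a"" b"' is not one. MSVC's traditional preprocessor apparently tolerates this, while Clang rejects it. Adjacent string literals are concatenated by the compiler anyway (translation phase 6), so the ## can simply be dropped. A minimal sketch with a hypothetical macro, not NRI's actual RETURN_ON_BAD_HRESULT:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// 'msg' must be a string literal; it is merged with the format string by
// adjacent-literal concatenation ("a" "b" -> "ab"), so no ## is involved.
#define FORMAT_BAD_RESULT(buffer, bufferSize, msg, result) \
    std::snprintf((buffer), (bufferSize), msg " failed, result = 0x%08X!", static_cast<unsigned>(result))

std::string DescribeFailure(int32_t result) {
    char buffer[128];
    FORMAT_BAD_RESULT(buffer, sizeof(buffer), "ID3D12Device::CreateCommittedResource()", result);
    return buffer;
}
```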
My environment:
OS: Windows 11 23H2 22631.3447
Visual Studio: 17.9.6
Installed C++ Clang compiler for Windows (17.0.3)
I was trying to run the SceneViewer sample, but I got a validation error on D3D12 (without the Agility SDK):
D3D12 ERROR: ID3D12CommandList::ResourceBarrier: Certain resources are restricted to certain D3D12_RESOURCE_STATES states, and cannot be changed. Resources on D3D12_HEAP_TYPE_READBACK heaps requires D3D12_RESOURCE_STATE_COPY_DEST or D3D12_RESOURCE_STATE_RESOLVE_DEST. Reserved buffers used exclusively for texture placement requires D3D12_RESOURCE_STATE_COMMON. [ RESOURCE_MANIPULATION ERROR #741: RESOURCE_BARRIER_INVALID_HEAP]
I decided to remove the barrier for the readback buffer, and now it works:
//nri::BufferBarrierDesc bufferBarrierDescs = {};
//bufferBarrierDescs.buffer = m_Buffers[READBACK_BUFFER];
//bufferBarrierDescs.before = {nri::AccessBits::UNKNOWN, nri::StageBits::COPY};
//bufferBarrierDescs.after = {nri::AccessBits::COPY_DESTINATION, nri::StageBits::COPY};
nri::BarrierGroupDesc barrierGroupDesc = {};
barrierGroupDesc.textureNum = 1;
barrierGroupDesc.textures = &textureBarrierDescs;
//barrierGroupDesc.bufferNum = 1;
//barrierGroupDesc.buffers = &bufferBarrierDescs;
NRI.CmdBarrier(commandBuffer, barrierGroupDesc);
// ...
textureBarrierDescs.before = textureBarrierDescs.after;
textureBarrierDescs.after = {nri::AccessBits::UNKNOWN, nri::Layout::PRESENT};
//bufferBarrierDescs.before = bufferBarrierDescs.after;
//bufferBarrierDescs.after = {nri::AccessBits::UNKNOWN, nri::StageBits::COPY};
NRI.CmdBarrier(commandBuffer, barrierGroupDesc);
What do you think: is this a bug in NRI or in the sample?
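If it is treated as an NRI issue, one possible fix (a sketch, not the actual implementation) is for the D3D12 backend, when running with legacy resource states, to drop barriers on buffers that live in a readback heap: as the error says, such resources must stay in COPY_DEST (or RESOLVE_DEST), so any transition on them is invalid. MemoryLocation, Buffer and BufferBarrier below are hypothetical stand-ins for the NRI types.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for NRI types.
enum class MemoryLocation { DEVICE, HOST_UPLOAD, HOST_READBACK };
struct Buffer { MemoryLocation location; };
struct BufferBarrier { const Buffer* buffer; }; // before/after states omitted

// With legacy D3D12 states, readback-heap buffers cannot leave COPY_DEST,
// so barriers on them are removed before translating the rest.
std::vector<BufferBarrier> FilterLegacyBarriers(const std::vector<BufferBarrier>& barriers) {
    std::vector<BufferBarrier> kept;
    for (const BufferBarrier& b : barriers)
        if (b.buffer->location != MemoryLocation::HOST_READBACK)
            kept.push_back(b);
    return kept;
}
```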
I'm trying to implement GPU-driven rendering with bindless descriptors and instancing, and I have a problem with the instance ID (specifically on D3D12). SV_InstanceID on D3D11/D3D12 starts from 0, whereas in Vulkan it starts from the firstInstance of the draw indexed instanced command. That means I can't use the instance ID on D3D12 or D3D11 to index per-instance data, because it always starts from 0. I could work around this with an additional per-instance vertex buffer holding an instance index for each draw, but that adds complexity. How do you think we can fix this problem?
gpuweb/gpuweb#901 - Problem description
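The difference, and one common workaround, modeled on the CPU. The assumption here is that the D3D backend passes firstInstance to the shader itself (for example as a root constant, which for indirect draws would have to be part of the ExecuteIndirect command signature) and the shader adds it; Shader Model 6.8 also exposes SV_StartInstanceLocation for this purpose. All function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Value seen by the shader for the i-th instance of a draw.
uint32_t VulkanInstanceIndex(uint32_t firstInstance, uint32_t i) { return firstInstance + i; } // InstanceIndex includes the base
uint32_t D3DInstanceId(uint32_t /*firstInstance*/, uint32_t i) { return i; }                  // SV_InstanceID always starts at 0

// Workaround: the backend also supplies firstInstance (e.g. via a root constant)
// and the shader adds it, reproducing the Vulkan value.
uint32_t D3DInstanceIndexWithBase(uint32_t baseInstanceConstant, uint32_t svInstanceId) {
    return baseInstanceConstant + svInstanceId;
}
```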
NRI/Source/VK/CommandBufferVK.cpp
Line 190 in 64120b7
This line seems suspicious. Maybe you should be copying &desc.value?
When a DescriptorD3D12 is destroyed, it seems the DescriptorHandle is never returned to DeviceD3D12::m_DescriptorPool?
NRI/Source/D3D12/DeviceD3D12.cpp
Line 582 in 7388657
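A minimal sketch of what returning handles could look like, assuming a hypothetical free-list allocator over heap slots (not NRI's actual m_DescriptorPool): the descriptor's destructor calls Free, and later allocations reuse released slots before consuming new ones.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical descriptor-slot allocator with a free list.
class DescriptorHandlePool {
public:
    explicit DescriptorHandlePool(uint32_t capacity) : m_Capacity(capacity) {}

    // Reuses a released slot if possible; returns UINT32_MAX when the heap is exhausted.
    uint32_t Allocate() {
        if (!m_Free.empty()) {
            const uint32_t handle = m_Free.back();
            m_Free.pop_back();
            return handle;
        }
        return m_Next < m_Capacity ? m_Next++ : UINT32_MAX;
    }

    // Intended to be called from the descriptor's destructor.
    void Free(uint32_t handle) { m_Free.push_back(handle); }

private:
    uint32_t m_Capacity;
    uint32_t m_Next = 0;
    std::vector<uint32_t> m_Free;
};
```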
Small request: currently the CMake script pulls the NVAPI and AGS dependencies via Packman regardless of the platform. As I understand it, the NVAPI and AGS libraries are not required for linux-x86_64, so this pull is redundant. Would it be possible to add an option to skip it on linux-x86_64?
Line 26 in 64120b7
The left-hand side is compared with VK_NULL_HANDLE on the right. This works because VK_NULL_HANDLE is defined as nullptr anyway. However, the line is misleading: reading it without an IDE, one would assume that m_Memory is a Vulkan handle, when it is in fact an NRI struct.
What are the requirements for getting to the documentation here?
https://docs.google.com/document/d/1OidQtZnm-3grhua7Oy1WKDUzh8ExTdbO83VTAWS_nNM/
Based on #61 (comment)
D3D11/D3D12: via NVAPI (NvAPI_D3D_SetSleepMode and friends)
VK: https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VK_NV_low_latency2.html
When using a VkInstance created with VK_API_VERSION_1_3, vkGetInstanceProcAddr returns nullptr, which leads to a crash here:
https://github.com/NVIDIAGameWorks/NRI/blob/main/Source/VK/DeviceVK.cpp#L1973
I suspect that vkGetInstanceProcAddr should not be called with the instance for vkCreateInstance, vkEnumerateInstanceExtensionProperties and vkEnumerateInstanceLayerProperties, as they are global commands and are therefore expected to return NULL, as per: https://www.khronos.org/registry/vulkan/specs/1.2-extensions/html/vkspec.html#vkGetInstanceProcAddr
For VK_API_VERSION_1_2 this seems to work, although according to the spec it looks like it shouldn't?
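A sketch of how a loader routine could route names: the Vulkan global commands are the four listed below, and they should be fetched with vkGetInstanceProcAddr(NULL, name) before the instance exists, while everything else is fetched with the created instance (vkGetInstanceProcAddr itself is a special case). The helper name is hypothetical.

```cpp
#include <cassert>
#include <cstring>

// Global commands: load with a NULL instance. With a valid instance,
// vkGetInstanceProcAddr is expected to return NULL for them (the behavior reported above).
bool IsGlobalCommand(const char* name) {
    static const char* const GLOBAL_COMMANDS[] = {
        "vkEnumerateInstanceVersion",
        "vkEnumerateInstanceExtensionProperties",
        "vkEnumerateInstanceLayerProperties",
        "vkCreateInstance",
    };
    for (const char* global : GLOBAL_COMMANDS)
        if (std::strcmp(global, name) == 0)
            return true;
    return false;
}
```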
Previously, it checked for availability before enabling them, but now it fails on some hardware.
Could you bring back the checks, or allow developers to specify them explicitly and programmatically?
Lines 367 to 376 in 1d52b40
Draft with comments:
// © 2021 NVIDIA Corporation

#pragma once

NRI_NAMESPACE_BEGIN

NRI_FORWARD_STRUCT(Streamer);

// Could use an externally created "buffer", but in this case "growing under the hood" becomes impossible
NRI_STRUCT(StreamerDesc)
{
    //uint64_t capacity; // seems not to be needed
    NRI_NAME(MemoryLocation) memoryLocation; // UPLOAD or DEVICE_UPLOAD
    NRI_NAME(BufferUsageBits) usageBits;
    uint8_t framesInFlightNum; // needed to hide reallocation under the hood and avoid exposing capacity
};

NRI_STRUCT(BufferRangeUpdateRequestDesc)
{
    // Data to upload
    const void* data;
    uint64_t dataSize;

    // Destination (optional)
    NRI_NAME(Buffer)* dstBuffer;
    uint64_t dstBufferOffset;

    // Access?
    NRI_NAME(AccessBits) prevState; // potentially could assume UNKNOWN (not recommended)
    NRI_NAME(AccessBits) nextState; // not all states can be transferred to in COPY and COMPUTE queues
};

NRI_STRUCT(TextureRegionUpdateRequestDesc)
{
    // Data to upload
    const void* data;
    uint64_t dataSize;
    uint32_t srcRowPitch;
    uint32_t srcSlicePitch;

    // Destination (mandatory)
    NRI_NAME(Texture)* dstTexture;
    const NRI_NAME(TextureRegionDesc)* dstRegionDesc;

    // Access?
    NRI_NAME(AccessAndLayout) prevState;
    NRI_NAME(AccessAndLayout) nextState;
};

NRI_STRUCT(StreamerInterface)
{
    NRI_NAME(Result) (NRI_CALL *CreateStreamer)(NRI_NAME_REF(Device) device, const NRI_NAME_REF(StreamerDesc) streamerDesc, NRI_NAME_REF(Streamer*) streamer);
    void (NRI_CALL *DestroyStreamer)(NRI_NAME_REF(Streamer) streamer);

    // Add an update request to the queue (no work here)
    // These functions return a ring buffer offset
    uint64_t (NRI_CALL *EnqueueBufferRangeUpdateRequest)(NRI_NAME_REF(Streamer) streamer, const NRI_NAME_REF(BufferRangeUpdateRequestDesc) bufferRangeUpdateRequestDesc);
    uint64_t (NRI_CALL *EnqueueTextureRegionUpdateRequest)(NRI_NAME_REF(Streamer) streamer, const NRI_NAME_REF(TextureRegionUpdateRequestDesc) textureRegionUpdateRequestDesc);

    // Submit all gathered requests and reset the queue
    // The internal buffer can grow in this function
    // Doesn't require WFI on a specific queue if a new buffer gets allocated immediately and destroying of the old buffer is postponed
    void (NRI_CALL *SubmitStreamerRequests)(NRI_NAME_REF(CommandBuffer) commandBuffer, NRI_NAME_REF(Streamer) streamer);

    // Needed if the buffer is explicitly used for rendering as IB, VB, CB
    // IMPORTANT: valid only after "Submit"
    // IMPORTANT: creating and caching "views" on this buffer is not recommended
    NRI_NAME(Buffer*) (NRI_CALL *GetStreamerBuffer)(NRI_NAME_REF(Streamer) streamer);

    /*
    Not needed with "enqueue / submit" logic:
    uint64_t (NRI_CALL *GetStreamerCapacity)(NRI_NAME_REF(Streamer) streamer);
    void (NRI_CALL *ChangeStreamerCapacity)(NRI_NAME_REF(Streamer) streamer, uint64_t capacity);
    */
};

NRI_NAMESPACE_END
This design allows capacity to be hidden, but it "puts" all CPU-side copy operations in one place, i.e. one thread. That's not ideal, but at the same time other threads can have their own Streamer objects.
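A CPU-only model of the enqueue/submit flow from the draft, to show why capacity never needs to be exposed: Enqueue only records the request and returns its offset, and Submit performs all copies and grows the buffer if needed. StreamerModel and its 256-byte alignment are assumptions of this sketch; unlike a real ring buffer it restarts at offset 0 after each submit instead of cycling through per-frame regions.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

class StreamerModel {
public:
    // Records a request (no copying yet) and returns its offset in the buffer.
    uint64_t Enqueue(const void* data, uint64_t size) {
        const uint64_t offset = Align(m_PendingSize, 256);
        const uint8_t* bytes = static_cast<const uint8_t*>(data);
        m_Requests.push_back({offset, std::vector<uint8_t>(bytes, bytes + size)});
        m_PendingSize = offset + size;
        return offset;
    }

    // Copies all pending requests, growing the buffer first if necessary.
    void Submit() {
        if (m_PendingSize > m_Buffer.size()) {
            // A real backend would keep the old GPU buffer alive for framesInFlightNum frames.
            m_Buffer.resize(static_cast<size_t>(m_PendingSize));
            m_GrowNum++;
        }
        for (const Request& r : m_Requests)
            if (!r.bytes.empty())
                std::memcpy(m_Buffer.data() + r.offset, r.bytes.data(), r.bytes.size());
        m_Requests.clear();
        m_PendingSize = 0;
    }

    const std::vector<uint8_t>& GetBuffer() const { return m_Buffer; }
    uint32_t GetGrowNum() const { return m_GrowNum; }

private:
    struct Request {
        uint64_t offset;
        std::vector<uint8_t> bytes;
    };
    static uint64_t Align(uint64_t x, uint64_t a) { return (x + a - 1) & ~(a - 1); }

    std::vector<Request> m_Requests;
    std::vector<uint8_t> m_Buffer;
    uint64_t m_PendingSize = 0;
    uint32_t m_GrowNum = 0;
};
```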
VK: https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VkDynamicState.html
D3D12:
D3D12_FEATURE_DATA_D3D12_OPTIONS16::DynamicDepthBiasSupported
D3D12_PIPELINE_STATE_FLAG_DYNAMIC_DEPTH_BIAS
RSSetDepthBias