So I was thinking about how to manage resources on GPU/CPU side, and I thought it may

...but they will use the same load buffer <p dir="auto"

OK, the intermediate conclusion: Upl

Resource management improvements about nri HOT 15 CLOSED

nvidiagameworks commented on June 12, 2024

Resource management improvements

from nri.

Comments (15)

dzhdanNV commented on June 12, 2024

Please, provide more information. Potential API changes would be especially helpful.

I thought it may be useful to have an ability to Map/Unmap ranges

I definitely misunderstand something, but current API allows to map a range:

void* (NRI_CALL *MapBuffer)(NRI_NAME_REF(Buffer) buffer, uint64_t offset, uint64_t size);

Is your concern is that it's only 1 range, but you want to get multiple? If yes, why not Map / Unmap several times? Or just not Map a bigger range?

you have two big vertex and index buffers and you only need to update some region of them at once)

Why is Map range -> Modify -> Unmap not enough? BTW, IMO, better use UPLOAD heap only for streaming and not for rendering (except minor lightweight passes). I.e.

A (resource for streaming in UPLOAD heap, ring buffer)
Map range in A with proper offset to avoid touching data referenced by GPU
Update
Unmap (persistent mapping is nice, but only if you don't care about D3D11)
Copy to a destination resource on DEVICE

from nri.

vertver commented on June 12, 2024

Is your concern is that it's only 1 range, but you want to get multiple? If yes, why not Map / Unmap several times? Or just not Map a bigger range?

Is this the correct use of this function? What about a views in with different offsets in the buffer/texture? I know that D3D12 and D3D11 supports multiple subranges update in one call to reduce GAPI call count.

Why is Map range -> Modify -> Unmap not enough? BTW, IMO, better use UPLOAD heap only for streaming and not for rendering (except minor lightweight passes). I.e.

I'm using DEVICE heap for each buffer that is not updated every frame, the problem is that I can only update 1 range in helper function UploadData. So I can't update subranges on device without copying unnecessary data.

from nri.

dzhdanNV commented on June 12, 2024

It seems to me that you have a resource in DEVICE heap, which has several "assigned" SRVs (logical resources), which in order you want to update via a single Map / Unmap call, right? If yes, bad usage. In my comment above I explained good practice,

UploadData can be improved. Which API do you expect to see?

from nri.

vertver commented on June 12, 2024

which has several "assigned" SRVs (logical resources)

Not SRVs, CBVs. I'm using dynamic constant buffers for that case.

UploadData can be improved. Which API do you expect to see?

This function uses a queue to load resources and can only load one resource per desc. I think you can improve it by batch loading data into one larger HOST_UPLOAD buffer and then updating a couple of resources at a time. Also, it is sometimes useful to schedule upload commands in a command list and then upload them all at once. Instead, right now I need to create an upload buffer for each resource I want to upload, or have a separate ring buffer that allows me to batch uploading as I said earlier.

from nri.

dzhdanNV commented on June 12, 2024

Ouch, I understood. The truth is that UploadData has been designed for lazy guys. It's purpose is to upload data in the simplest potentially unoptimized way in LOADING phase. Kinda "fire and forget" way. Yes, I do agree that this function just doesn't suit for resources updating at run time.

What can we do? I think improving UploadData itself can be a good start:

replace commandQueue with commandBuffer
improve internal code (do more batching)

But, IMO, I'm personally planning to write my own streamer on "plain" NRI which is based on the stuff I already mentioned:

pre-allocated ring buffer in UPLOAD heap
a user enqueues a request containing:
- what to upload
- destination "range" in a resource located in DEVICE heap
streamer maps once and writes all requests in concatenated form
then copies to the clients on GPU (better via COPY queue)

Not SRVs, CBVs. I'm using dynamic constant buffers for that case

Here is how I update "dynamic CB" in my project:
https://github.com/NVIDIAGameWorks/RayTracingDenoiser/blob/master/Integration/NRDIntegration.hpp#L590

from nri.

vertver commented on June 12, 2024

replace commandQueue with commandBuffer

I don't think this is a good solution, because you can use this function multiple times in different queues (for example: I want to transfer texture mips in the copy queue and load critical path textures in the graphics queue, but they will use the same load buffer. Oops).

I think the second solution would be preferable. Also, I would like to have a different instance of the ring buffer for each queue (because sometimes you need to load data into both queues at the same time).

Here is how I update "dynamic CB" in my project:

Thank you for sample. I'll take a look at it.

from nri.

dzhdanNV commented on June 12, 2024

I don't think this is a good solution, because you can use this function multiple times in different queues (for example: I want to transfer texture mips in the copy queue and load critical path textures in the graphics queue, but they will use the same load buffer. Oops).

To be honest don't see a problem. Queue is used to create internal temp CommandBuffers and submit work, and wait for idle. It's good for loading, not suitable for real time. I think, that if we pass CommandBuffer instead this function will only write commands into the command buffer, but submitting the work and synchronization will move to the user side. Less convenient, but at least "non blocking". Not advertising this much, because UploadData will be turned into something else.

from nri.

dzhdanNV commented on June 12, 2024

...but they will use the same load buffer

For clarity, UploadData blocks execution, it "waits for idle" under the hood. Upload buffer is internal and different for each call of UploadData.

from nri.

vertver commented on June 12, 2024

For clarity, UploadData blocks execution, it "waits for idle" under the hood. Upload buffer is internal and different for each call of UploadData.

That's what I don't like about it. I would prefer more manual management.

from nri.

dzhdanNV commented on June 12, 2024

OK, the intermediate conclusion:

UploadData behaves how it has been designed
UploadData != StreamData (which currently doesn't exist)
I suggest you just take the contents of UploadData, modify according to your usage pattern and... later we can decide is there a sharable code, which can be organized into StreamData or is there a code blocks which can be used to improve UploadData

Deal? ;)

P.S. I made the following fixes so far in UploadData:

only 1 command allocator is needed even in case of mGPU
removed unnecessary reference stored in the helper class
NRI: removed nodeMask from CreateCommandAllocator (just not needed in any API)

from nri.

vertver commented on June 12, 2024

Deal? ;)

OK, why not. I'll get on it as soon as I get my hands on it.

from nri.

vertver commented on June 12, 2024

So I wrote snippet for this case, I think it might be helpful: https://gist.github.com/vertver/6f522aa83f09776d0ece7c11ea124835

from nri.

dzhdanNV commented on June 12, 2024

Thanks. I need to meditate on the code...

from nri.

dzhdanNV commented on June 12, 2024

Your snippet is useful, but I don't think it needs to be a part of NRI. Data streaming can be organized in one way or another, I would prefer to let people decide how to better solve this task on a higher level (application level). Even in your case, overflow is not handled: it might not be needed, or might be... Moreover, streaming functionality doesn't fit into HelperInterface, because it assumes "state" which need to be kept for some time, i.e. an object is needed. Probably, in the nearest future I will agree to add a standalone extension for data streaming, but not now. I see benefits though... Feel free to file a new issue "RFE: add streaming functionality for lazy guys" :)

Have we exhausted this issue? Can it be closed?

from nri.

dzhdanNV commented on June 12, 2024

Closing this one. Created a new issue #34 for adding "streaming" functionality.

from nri.

Resource management improvements about nri HOT 15 CLOSED

Comments (15)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent