owl-project / owl

Home Page: http://owl-project.github.io

License: Apache License 2.0

CMake 5.39% C++ 70.21% C 3.71% Cuda 4.26% Makefile 0.26% Python 16.12% Shell 0.05%

owl's Issues

feature request (minor): add sample/test that stresses instance xfms w/ rotations

We already have a sample that uses transforms (sierpinski), but that one only uses translations, which don't affect normals and don't need world-space vs object-space positions (and thus it doesn't use/test any of the point/vector/normal transforms required for 'real' transform matrices).

Would be nice to have an additional sample that uses more complex transforms and also uses/tests/demonstrates the optix world-object space transforms in the device code. This will likely also involve adding some convenience functions to the ll/deviceAPI.h.

Not critical, but would be nice to have.
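
To illustrate why translations alone don't exercise the transform code, here is a minimal host-side sketch of the three transform kinds. The types `Vec3` and `Affine3x4` and the `xfm*` helpers are hypothetical stand-ins, not OWL or OptiX API; in OptiX 7 device code the corresponding built-ins are `optixTransformPointFromObjectToWorldSpace`, `optixTransformVectorFromObjectToWorldSpace` and `optixTransformNormalFromObjectToWorldSpace`.

```cpp
#include <cassert>

// Hypothetical minimal types for illustration; not OWL's or OptiX's own.
struct Vec3 { float x, y, z; };

// Row-major 3x4 affine matrix: columns 0..2 hold the linear part,
// column 3 holds the translation.
struct Affine3x4 { float m[3][4]; };

// Points are affected by both the linear part and the translation.
inline Vec3 xfmPoint(const Affine3x4 &a, const Vec3 &p) {
  return { a.m[0][0]*p.x + a.m[0][1]*p.y + a.m[0][2]*p.z + a.m[0][3],
           a.m[1][0]*p.x + a.m[1][1]*p.y + a.m[1][2]*p.z + a.m[1][3],
           a.m[2][0]*p.x + a.m[2][1]*p.y + a.m[2][2]*p.z + a.m[2][3] };
}

// Directions use only the linear part; the translation is ignored.
inline Vec3 xfmVector(const Affine3x4 &a, const Vec3 &v) {
  return { a.m[0][0]*v.x + a.m[0][1]*v.y + a.m[0][2]*v.z,
           a.m[1][0]*v.x + a.m[1][1]*v.y + a.m[1][2]*v.z,
           a.m[2][0]*v.x + a.m[2][1]*v.y + a.m[2][2]*v.z };
}

// Normals use the transpose of the INVERSE linear part, so the caller
// passes the inverse matrix (e.g. world-to-object when going from
// object space to world space). The result is not renormalized.
inline Vec3 xfmNormal(const Affine3x4 &inverse, const Vec3 &n) {
  return { inverse.m[0][0]*n.x + inverse.m[1][0]*n.y + inverse.m[2][0]*n.z,
           inverse.m[0][1]*n.x + inverse.m[1][1]*n.y + inverse.m[2][1]*n.z,
           inverse.m[0][2]*n.x + inverse.m[1][2]*n.y + inverse.m[2][2]*n.z };
}
```

With a pure translation all three linear parts are the identity, so vectors and normals pass through unchanged; a rotation or non-uniform scale is needed before the three helpers give different results.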

Make cmake scripts find optix 7 sdk when in default locations

In particular on windows, there's a default location for the SDK - should check for that if OptiX_INSTALL_DIR isn't set.

feature request: owl<Type>SetPointer()

owlSet is currently only implemented for 3f, 2i, etc, as well as Buffer ... but not for raw pointers.
Workaround is to set the pointer as a 1ul variable (which works just fine), but that's ugly.

potential bug: double-check bounds program launch dim size

dimension of cuda launch for computing bounds of a usergeom is currently computed as follows:
uint32_t boundsFuncBlockSize = 128;
uint32_t numPrims = (uint32_t)ug->numPrims;
vec3i blockDims(owl::common::divRoundUp(numPrims,boundsFuncBlockSize),1,1);
vec3i gridDims(boundsFuncBlockSize,1,1);

This seems to work just fine so far BUT for large numbers of prims has two (potential) issues:
a) the 'divRoundUp' on uint types may overflow
b) the grid dimensions in x may be too big for what cuda allows (64K on some cards).
May have to go to setting both x and y; then also need to fix the bounds call itself to adjust for that.
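
One possible shape for the fix, as a host-side sketch only: do the rounding-up division in 64 bits without the `a + b - 1` term that can wrap, clamp the x dimension to a caller-supplied limit, and spill the remaining blocks into y. `LaunchDims`, `divRoundUp64`, `computeBoundsLaunchDims` and `linearPrimIndex` are hypothetical names, and the 65535 default is just the limit mentioned above, not a value queried from the device.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type: threads per block plus a 2D grid of blocks.
struct LaunchDims { uint32_t blockSize; uint32_t gridX; uint32_t gridY; };

// Rounding-up division that cannot overflow, unlike (a + b - 1) / b.
inline uint64_t divRoundUp64(uint64_t a, uint64_t b) {
  assert(b > 0);
  return a / b + (a % b != 0 ? 1 : 0);
}

// Spread the required blocks over x and y so that x never exceeds maxGridX.
inline LaunchDims computeBoundsLaunchDims(uint64_t numPrims,
                                          uint32_t blockSize = 128,
                                          uint32_t maxGridX = 65535) {
  uint64_t numBlocks = divRoundUp64(numPrims, blockSize);
  if (numBlocks == 0) numBlocks = 1;  // a launch needs at least one block
  uint64_t gridX = numBlocks < maxGridX ? numBlocks : maxGridX;
  uint64_t gridY = divRoundUp64(numBlocks, gridX);
  return { blockSize, (uint32_t)gridX, (uint32_t)gridY };
}

// What the bounds kernel would compute from its block/thread indices;
// threads whose index is >= numPrims must exit early.
inline uint64_t linearPrimIndex(uint32_t blockIdxX, uint32_t blockIdxY,
                                uint32_t gridDimX, uint32_t blockDimX,
                                uint32_t threadIdxX) {
  return ((uint64_t)blockIdxY * gridDimX + blockIdxX) * blockDimX + threadIdxX;
}
```

Because the 2D grid may contain more threads than prims, the kernel side needs the early-exit check noted in the last comment.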

feature request: support DDS file format

It would be awesome if owl could support DDS file format :)

add vec3f(float4) constructor

currently need to write vec3f x = vec3f(vec4f(f4array[idx])) .... fix this.
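
A sketch of what the requested constructor could look like. The `sketch::float4` and `sketch::vec3f` types below are simplified, hypothetical stand-ins for CUDA's `float4` and owl's `vec3f`; whether the real constructor should be `explicit` is a design choice the issue leaves open.

```cpp
#include <cassert>

namespace sketch {
  // Stand-in for CUDA's built-in float4.
  struct float4 { float x, y, z, w; };

  // Simplified stand-in for owl's vec3f.
  struct vec3f {
    float x, y, z;
    vec3f(float x, float y, float z) : x(x), y(y), z(z) {}
    // Requested convenience constructor: take xyz and drop w.
    explicit vec3f(const float4 &v) : x(v.x), y(v.y), z(v.z) {}
  };
}
```

With this in place the call site shrinks to `vec3f x = vec3f(f4array[idx]);`.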

Enable parallel bounds kernel calls for groups with lots of user geoms

Right now the bounds kernels for all prims in one geom are run in parallel, but different geoms in a group are run sequentially (so only one kernel is in flight at any time). This works great when all prims are in few geoms, but creates giant build times for models with large numbers of independent geometries (ie, when each stream line is its own geom).

web page - fix broken links on about page

document dependencies and build process

... in readme.md.

Note: make sure optix 7 SDK and TBB are handled, in particular for windows

managed memory buffers should set 'read-mostly' flag

without this performance is bad when using multi-device

feature request: support for OptiX Denoiser

When using the high level OWL api, it's currently unclear how to implement the OptiX denoiser, as users do not have direct access to the optix context required for optixDenoiserCreate.

I could see adding support for the denoiser in one of two ways. Either 1, we could add some function which returns the current optix context handle to the end user. Or 2, we could implement the optix denoiser within OWL and define some abstraction API somehow.

Release with Debug builds in parent projects are disabled when including OWL

When I add_subdirectory(owl), I was finding that my RelWithDebInfo and other build types disappeared. It took me a while to find, but it seems that the configure_build_type.cmake is sneakily deleting them.

This was a pretty subtle issue I had a hard time fixing. I'd propose we remove that configure file, since OWL shouldn't be in control of parent project CMAKE_CONFIGURATION_TYPES or CMAKE_BUILD_TYPE.

enable peer access during owl init

peer access is required for nvlink; should enable by default.

feature request: add buffer type that gets created on only one device

feature request (minor): add unit test for c99 compiler

ideally have c99 version of at least one example; minimal solution is to at least include the header file, create a context, and exit.

feature request: OWL_BUFFER_GROUP and OWL_TEXTURE_GROUP

Currently, OWL does not support lists of buffers or lists of textures. Users instead must extract device pointers from OWLBuffers on a device by device basis, uploading these pointers to an OWLBuffer that's device-specific.

This usually results in applications sacrificing multi-GPU during prototyping, and it can quickly become too difficult to maintain multi-GPU compatibility long term. However, these lists of buffers and lists of texture handles are really important, eg for scenes with emissive triangles and/or textures accessed outside the closest hit.

At the same time, arbitrary buffers of pointers complicate matters under the hood for OWL. Buffers of buffer pointers might result in some pointers in that buffer themselves pointing to buffers of pointers. Yuck!

Instead, I'd propose we add support for 1D groups of buffer pointers and groups of textures. These groups would act like flattened lists, with no possible multi-dimensional indirection. This way, OWL could under the hood create N flattened item lists, one per device, each flattened list containing only items allocated on the corresponding device.

If we could get that working, then it would be much easier for OWL end users to maintain multi-GPU compatibility. Later down the line, OWLBufferGroups and OWLTextureGroups could additionally contain hints about NVLink stuff.
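
The flattening described in this proposal can be sketched on the host as follows. `MultiDeviceBuffer` and `flattenPerDevice` are hypothetical names used only for illustration; the sketch assumes every buffer in the group has exactly one allocation per device.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a buffer that has one allocation per device.
struct MultiDeviceBuffer {
  std::vector<const void *> devicePointers;  // index = device ID
};

// Build one flat pointer list per device; list d holds, for each buffer in
// the group, only the pointer that lives on device d.
inline std::vector<std::vector<const void *>>
flattenPerDevice(const std::vector<MultiDeviceBuffer> &group,
                 std::size_t numDevices) {
  std::vector<std::vector<const void *>> perDevice(numDevices);
  for (const MultiDeviceBuffer &buffer : group) {
    assert(buffer.devicePointers.size() == numDevices);
    for (std::size_t dev = 0; dev < numDevices; ++dev)
      perDevice[dev].push_back(buffer.devicePointers[dev]);
  }
  return perDevice;
}
```

Each per-device list would then be uploaded to its own device, so device code indexes a plain 1D array and never sees a pointer belonging to another GPU.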

web page - fix missing octocat ping on github link

move bounds memory into managed memory buffer

at the very least, specify a flag that allows users to enable this.

fix remaining 'gdt' usages

all gdt:: namespace and gdt/ include dirs to be replaced with owl-common; purge no-longer-needed components. Goal is for apps to be able to use both gdt and owl without any naming or namespace conflicts.

check max group sizes upon accel build

Optix comes with certain limits in num prims per GAS, or num insts per IAS; going over these limits apparently triggers errors in debug mode, but apparently just creates wrong results in release mode. Should add test for those values, and trigger owl error if inputs exceed these limits.

device synchronization after launch param launch only sync's first device
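
A sketch of such a check, with the limits passed in rather than hard-coded. `AccelLimits` and the functions below are hypothetical; in a real build the two values would presumably come from `optixDeviceContextGetProperty` with `OPTIX_DEVICE_PROPERTY_LIMIT_MAX_PRIMITIVES_PER_GAS` and `OPTIX_DEVICE_PROPERTY_LIMIT_MAX_INSTANCES_PER_IAS`.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical container for limits queried once per device context.
struct AccelLimits {
  std::size_t maxPrimsPerGAS;
  std::size_t maxInstancesPerIAS;
};

inline bool exceedsGASLimit(std::size_t numPrims, const AccelLimits &limits) {
  return numPrims > limits.maxPrimsPerGAS;
}

inline bool exceedsIASLimit(std::size_t numInstances, const AccelLimits &limits) {
  return numInstances > limits.maxInstancesPerIAS;
}

// Called before an accel build; fails loudly instead of letting a release
// build silently produce wrong results.
inline void checkGeomGroupSize(std::size_t numPrims, const AccelLimits &limits) {
  if (exceedsGASLimit(numPrims, limits))
    throw std::runtime_error("geometry group has " + std::to_string(numPrims) +
                             " prims, limit is " +
                             std::to_string(limits.maxPrimsPerGAS));
}
```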

feature request (minor): add sierpinski/instancing sample/test to owl/ng samples

We already have a sample of instancing for the ll layer (sample 07 sierpinski), but none yet for the node graph layer.

TODO: take samples/ll/s07-sierpinski, and make ng version in samples/ng/s07-sierpinski. All required API functionality should already be there, and devicecode should be exactly the same, but needs to get done.

add dedicated ManagedMemoryBuffer type

right now we only have host pinned buffer (always on host) and devicebuffer (always on device).
for large models, would be good to have managedmemory as well - in theory that should allow paging.

bug: crash when running multiple instance builds in parallel across different threads

add c-linkage dll api to ll-api layer

feature request: add "owlSetMissProg(rayTypeID,missProg)"

right now miss progs always get used in the order they are created; better would be to let user set that explicitly.

add 'make install' target for c-api

rename owlParamsLaunch into owlParamsLaunchAsync to better convey that such launches are asynchronous

feature: allow setting disable_anyhit flag on groups

needed to replicate HPG 2019 RTX tetmesh point query functionality

clean up instancing code

Just committed "hotfix" for missing owlInstanceGroupCreate() and owlInstanceGroupSetTransform() to unblock project that needs those ... but that functionality isn't clean enough, yet, and needs work.

Notes for now:

need to add matrix memory layout options - currently use affine3f 4x3f format, optix7 uses 3x4f ... need both
affine3f doesn't have a unit-matrix default constructor, which is annoying/misleading. add one.
had to add world2object etc to ll05 devicecode to test this - remove
setChild() currently also sets a unit transform for that child, so currently must set in correct order or get wrong result. fix this.
setChild currently has to use a transform. should allow nullptr for 'use unit transform'.
need real instancing sample on ng layer --> sierpinski?
need sample with proper transforms (worldtospace etc); sierpinski doesn't do that.
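
For the first note, the conversion between the two layouts is a plain transpose of a 12-float array. The sketch below assumes the 4x3 format stores four column vectors (vx, vy, vz, p) of three floats each, back to back, and that the target is the row-major 3x4 `float[12]` that OptiX 7 instances expect; the function names are hypothetical.

```cpp
#include <array>
#include <cassert>

// Input:  4 columns of 3 floats  (vx.xyz, vy.xyz, vz.xyz, p.xyz).
// Output: 3 rows of 4 floats     (row-major 3x4, translation in column 3).
inline std::array<float, 12>
columnMajor4x3ToRowMajor3x4(const std::array<float, 12> &in) {
  std::array<float, 12> out{};
  for (int row = 0; row < 3; ++row)
    for (int col = 0; col < 4; ++col)
      out[row * 4 + col] = in[col * 3 + row];
  return out;
}

// Unit transform in the 4x3 column layout, i.e. what a unit-matrix
// default constructor would produce.
inline std::array<float, 12> identity4x3() {
  return { 1.f, 0.f, 0.f,   0.f, 1.f, 0.f,   0.f, 0.f, 1.f,   0.f, 0.f, 0.f };
}
```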

instance builds are serial across multiple devices

Feature request: owlBufferUpload not implemented for managed memory buffers, host mapped buffers

More of a feature request than an issue.

I'm trying to convert my owlDeviceBuffers into owlManagedMemoryBuffers, to enable device-side memory oversubscription. However, since owlBufferUpload isn't implemented for owlManagedMemoryBuffers, I need to refactor my code to instead memcpy into a pointer through owlBufferGetPointer(). owlBufferGetPointer then wants me to specify which device the buffer belongs to, but managed memory and host memory don't belong to a particular device, so this is somewhat confusing.

If owl were to implement the same owlBufferUpload API for both managed memory buffers and host mapped buffers, it would be easier to refactor code to make the switch back and forth.

consistency: rename owlLaunchParamsSet -> owlParamsSet

we have owlParamsLaunch, but owlLaunchParamsSet ... make consistent.

add missing owlModuleRelease

most types have release functions, module does not.
not critical - contextdestroy already releases module - but nice to have.

TBB_FOUND forced to FALSE on Win32

TBB works on windows, so this shouldn't be forced to false on Win32. Maybe this could be changed to an optional dependency for all platforms?

add optix include path to OWL_INCLUDES variable

memory leak in owlGroupAccelBuild for instance groups

instance groups don't free the temp memory they use for building the instances, leading to a memory leak if those instances get rebuilt every frame.

probably best solution: make destructor of DeviceMemory free the memory it allocated.

reproducer: see iw/group-rebuild-test branch.

Add sanity checks for multi-level instancing

Current API allows user to specify max instancing depth, and then sets optix up in the right way ... but does not actually check whether the created graph of group nodes adheres to the specified limit.

Should add a check at least upon group creation time that tracks, for each created group, how deep the tree below it is, and if that matches the context's configured depth.
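
A sketch of the depth check, under two assumptions that are not confirmed by the issue: a geometry group with no children counts as depth 0, and the group graph is acyclic. `GroupNode` and both functions are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for a group: geometry groups have no children,
// instance groups reference other groups.
struct GroupNode {
  std::vector<const GroupNode *> children;
};

// Number of instancing levels below (and including) this group.
// Assumes the graph is acyclic.
inline int instancingDepth(const GroupNode &group) {
  int deepest = 0;
  for (const GroupNode *child : group.children) {
    assert(child != nullptr);
    deepest = std::max(deepest, 1 + instancingDepth(*child));
  }
  return deepest;
}

// True if the graph rooted at 'root' stays within the configured limit.
inline bool fitsConfiguredDepth(const GroupNode &root, int maxInstancingDepth) {
  return instancingDepth(root) <= maxInstancingDepth;
}
```

Since children must exist before a group that references them can be created, the depth of each group could also be cached at creation time instead of being recomputed recursively.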

web page - fix figure caption on sample page

potential bug: check owl prime core dump when using geometry recursion depth of 7+

For recursion depth of 6 owl prime sample generates 16M triangles, and runs through.
For depth 7 and above (where it should produce 64M+ triangles), I currently get this:
error (/home/iwald/Projects/owl/owl/ll/TrianglesGeomGroup.cpp: line 227): an illegal memory access was encountered

add thread-safe monitors to c-api branch

feature request: equip owlBuildSBT() with (raygen|miss|all) flag

For simple use cases, it'll be common to only change raygen values; doesn't make sense to force a complete SBT rebuild. Todo: add function that allows rebuilding only the raygen and/or miss program(s).
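
One way the flag could be expressed is a bitmask, so callers can combine the parts they want rebuilt. All names below (`SBTFlags`, `SBTRebuildPlan`, `planSBTRebuild`) are hypothetical and only sketch the idea; they are not taken from the OWL API.

```cpp
#include <cassert>

// Hypothetical flag bits; SBT_ALL keeps today's "rebuild everything" behavior.
enum SBTFlags : unsigned {
  SBT_HITGROUPS = 1u << 0,
  SBT_RAYGENS   = 1u << 1,
  SBT_MISSPROGS = 1u << 2,
  SBT_ALL       = SBT_HITGROUPS | SBT_RAYGENS | SBT_MISSPROGS
};

// Which parts of the SBT a build call would touch.
struct SBTRebuildPlan {
  bool hitGroups;
  bool rayGens;
  bool missProgs;
};

// Decode the flags; defaulting to SBT_ALL keeps existing callers unchanged.
inline SBTRebuildPlan planSBTRebuild(unsigned flags = SBT_ALL) {
  return { (flags & SBT_HITGROUPS) != 0,
           (flags & SBT_RAYGENS)   != 0,
           (flags & SBT_MISSPROGS) != 0 };
}
```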

Invalid Device Ordinal when CUDA_VISIBLE_DEVICES=1

Simplest way to reproduce is to run the interactive sample like this: CUDA_VISIBLE_DEVICES=1 ./int01-simpleTriangles . This Invalid Device Ordinal error occurs in groupBuildAccel in DeviceGroup.cpp

I don't seem to run into the same issue with CUDA_VISIBLE_DEVICES=1 ./sample01-simpleTriangles. However, I do see the above exception with ViSII.

cudaMemAdviseSetPreferredLocation fails with code 101: invalid device ordinal

Running into the above exception when using managed memory on my local environment. It seems we can fix this issue by checking if this cudaMemAdvise is supported by the current device, and fall back to not advising if it's not supported.

split instanceGroupCreate(.. children ... .xfms ...) into separate atomic functions

ie, have one function that creates the group (and that also specifies num time steps for motion blur), then another function to set all children, one to set all transforms, etc.

make tbb optional

make cmake try to find TBB, and add single-threaded fallback if not found.
just too much trouble asking users to install tbb on windows.

Optional sample builds disabled, breaking parent projects linking GLFW statically

In ViSII, I link to GLFW statically before adding the owl subdirectory to my project.
Before, I disabled OWL from also linking in GLFW statically through the build samples option.
Now that that's commented out, I cannot disable OWL from linking in GLFW statically, and it's breaking my parent project's build.
