nvidia / warp


A Python framework for high performance GPU simulation and graphics

Home Page: https://nvidia.github.io/warp/

License: Other

Python 71.80% C++ 19.48% Cuda 4.63% C 4.08% Shell 0.01% Batchfile 0.01%


warp's Issues

Just to confirm: the default renderer using USD is not differentiable, right?

Hi,
Thank you for this awesome work!
Just to confirm: the default renderer using USD is not differentiable, right?
If I want to use Warp like gradSim, do I need to replace the renderer with a differentiable renderer?
Thank you!

Compared to taichi

Hello!

Would it be possible to provide a comparison between this project and taichi?

https://github.com/taichi-dev/taichi

For starters, here's the example script for both projects.

Warp

import warp as wp
import numpy as np

wp.init()

num_points = 1024
device = "cuda"

@wp.kernel
def length(points: wp.array(dtype=wp.vec3),
           lengths: wp.array(dtype=float)):

    # thread index
    tid = wp.tid()
    
    # compute distance of each point from origin
    lengths[tid] = wp.length(points[tid])


# allocate an array of 3d points
points = wp.array(np.random.rand(num_points, 3), dtype=wp.vec3, device=device)
lengths = wp.zeros(num_points, dtype=float, device=device)

# launch kernel
wp.launch(kernel=length,
          dim=len(points),
          inputs=[points, lengths],
          device=device)

print(lengths)

Taichi

# python/taichi/examples/simulation/fractal.py

import taichi as ti

ti.init(arch=ti.gpu)

n = 320
pixels = ti.field(dtype=float, shape=(n * 2, n))


@ti.func
def complex_sqr(z):
    return ti.Vector([z[0]**2 - z[1]**2, z[1] * z[0] * 2])


@ti.kernel
def paint(t: float):
    for i, j in pixels:  # Parallelized over all pixels
        c = ti.Vector([-0.8, ti.cos(t) * 0.2])
        z = ti.Vector([i / n - 1, j / n - 0.5]) * 2
        iterations = 0
        while z.norm() < 20 and iterations < 50:
            z = complex_sqr(z) + c
            iterations += 1
        pixels[i, j] = 1 - iterations * 0.02


gui = ti.GUI("Julia Set", res=(n * 2, n))

for i in range(1000000):
    paint(i * 0.03)
    gui.set_image(pixels)
    gui.show()

wp.svd3 not working

I am trying to use svd3, here is a small example

import warp as wp

wp.init()

x = wp.diag(wp.vec3(1., 2., 3.))
u = wp.diag(wp.vec3(0., 0., 0.))
v = wp.diag(wp.vec3(0., 0., 0.))
s = wp.vec3(0., 0., 0.)

wp.svd3(x, u, s, v)

I get this error message when running the code

  File "/home/kasra/Postdoc/projects/auto_design/code/alaki3.py", line 12, in <module>
    wp.svd3(x, u, s, v)
  File "/home/kasra/anaconda3/envs/auto_design/lib/python3.8/site-packages/warp/context.py", line 189, in __call__
    raise error
  File "/home/kasra/anaconda3/envs/auto_design/lib/python3.8/site-packages/warp/context.py", line 163, in __call__
    value_type = type_ctype(f.value_func(None))
  File "/home/kasra/anaconda3/envs/auto_design/lib/python3.8/site-packages/warp/context.py", line 155, in type_ctype
    elif issubclass(dtype, ctypes.Array):
TypeError: issubclass() arg 1 must be a class

Can't find warp in omniverse kit

I cloned this repository and built it.
I ran some examples and got some USD files,
but I can only reproduce the first frame, not the whole simulation.
So I guess I should install the Warp extension in Omniverse Kit.
Unfortunately, I can't find it.
My Omniverse Kit version is 103.1.1, the latest.

Colab support

Just a heads up: there seems to be some trouble getting Warp running on Colab. Installation itself can be patched with this:

!export CUDA_HOME=/usr/local/cuda-11.1
!git clone https://github.com/NVIDIA/warp
%cd /content/warp
!python build_lib.py --cuda_path /usr/local/cuda-11.1
!pip install -e .

But there still seems to be an issue with running kernels that call an external function (wp.func), which I'm trying to track down.

Any plans on adding collision handling between rigid-bodies?

Failed to lookup kernel function

Hello, I have a relatively simple Warp kernel that I am trying to use as a loss function in computer vision.

It works most of the time, but in some iterations I get

Failed to lookup kernel function

On the same data and with the same settings, the same function call sometimes works and sometimes does not. Usually it is not the first call that fails, so the library is properly loaded and warp.init has been called.

Additionally, for debugging purposes, the kernel definition is in the same file as the function that uses it.

is_cuda_available crash

Just installed this for fun on a computer without a GPU and found a bug, maybe easy to fix?

>>> import warp as wp
>>> wp.is_cuda_available()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.8/dist-packages/warp/context.py", line 871, in is_cuda_available
    return runtime.cuda_device != None
AttributeError: 'NoneType' object has no attribute 'cuda_device'
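
The traceback suggests that the module-level `runtime` is still `None` at the time of the call, possibly because `wp.init()` has not run yet. A minimal sketch of the failure and a defensive check follows. `FakeRuntime` and both helper functions are hypothetical stand-ins modelled only on the names in the traceback; this is not Warp's actual code.

```python
# Hypothetical stand-in for the runtime object named in the traceback.
class FakeRuntime:
    def __init__(self, cuda_device=None):
        self.cuda_device = cuda_device


def is_cuda_available_unsafe(rt):
    # Mirrors the failing line: raises AttributeError when rt is None.
    return rt.cuda_device != None


def is_cuda_available_safe(rt):
    # Treat "runtime not created yet" the same as "no CUDA device".
    return rt is not None and rt.cuda_device is not None


try:
    is_cuda_available_unsafe(None)
    unsafe_raised = False
except AttributeError:
    unsafe_raised = True

print(unsafe_raised, is_cuda_available_safe(None))
```

With a guard like this, querying CUDA availability on a machine without a GPU would return `False` instead of raising.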

Unable to run Warp example on Windows WSL.

Hi,
I am very interested in what Warp seems to do and I wanted to test it on my local desktop.
So I tried the installation procedure mentioned on the main page and tried to run the tests.
I get the following output, which indicates there is an error and CUDA is not recognized.

 python -m warp.tests
Warp CUDA error: Could not open libcuda.so.
Warp 0.7.2 initialized:
   CUDA not available
   Devices:
     "cpu"    | x86_64
   Kernel cache: /home/rcremese/.cache/warp/0.7.2

Skipping Torch tests due to exception: No module named 'torch'
Skipping Torch DLPack tests due to exception: No module named 'torch'
Skipping Jax DLPack tests due to exception: No module named 'jax'
test_volume_allocation_f (warp.tests.test_volume.register.<locals>.TestVolumes) ... ok
test_volume_allocation_i (warp.tests.test_volume.register.<locals>.TestVolumes) ... ok
test_volume_allocation_v (warp.tests.test_volume.register.<locals>.TestVolumes) ... ok
test_volume_introspection (warp.tests.test_volume.register.<locals>.TestVolumes) ... ok
test_volume_sample_linear_f_gradient (warp.tests.test_volume.register.<locals>.TestVolumes) ... ok
test_volume_sample_linear_v_gradient (warp.tests.test_volume.register.<locals>.TestVolumes) ... ok
test_volume_store (warp.tests.test_volume.register.<locals>.TestVolumes) ... ok
test_volume_transform_gradient (warp.tests.test_volume.register.<locals>.TestVolumes) ... ok
test_func_export_cpu (warp.tests.test_func.register.<locals>.TestFunc) ... ok
test_addition_float16 (warp.tests.test_quat.register.<locals>.TestQuat) ... ERROR

======================================================================
ERROR: test_addition_float16 (warp.tests.test_quat.register.<locals>.TestQuat)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/rcremese/mambaforge/envs/physic-env/lib/python3.9/site-packages/warp/tests/test_base.py", line 144, in test_func
    func(self, device, **kwargs)
  File "/home/rcremese/mambaforge/envs/physic-env/lib/python3.9/site-packages/warp/tests/test_quat.py", line 395, in test_addition
    wp.launch(kernel, dim=1, inputs=[q,v,], outputs=[r0,r1,r2,r3], device=device)
  File "/home/rcremese/mambaforge/envs/physic-env/lib/python3.9/site-packages/warp/context.py", line 2091, in launch
    success = kernel.module.load(device)
  File "/home/rcremese/mambaforge/envs/physic-env/lib/python3.9/site-packages/warp/context.py", line 863, in load
    raise RuntimeError("Failed to build CPU module because no CPU buildchain was found")
RuntimeError: Failed to build CPU module because no CPU buildchain was found

----------------------------------------------------------------------
Ran 10 tests in 0.006s

FAILED (errors=1)

I guess this is due to the fact that I am testing it on Windows WSL2 with an Ubuntu 20.04 distro. Nevertheless, I installed the CUDA toolkit on my Windows machine as explained here and checked that CUDA is recognized in WSL (cf. below).

 nvidia-smi
Tue Mar  7 14:29:45 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02              Driver Version: 531.14       CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 2070 w...    On | 00000000:01:00.0 Off |                  N/A |
| N/A   41C    P5               12W /  N/A|      0MiB /  8192MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

Therefore, I wonder where the problem might come from and if it's possible to use warp on Windows WSL.
Thank you.

Remove `import *` from package

The use of import * is helpful in some cases, but it makes the codebase extremely difficult to follow and debug. For example, the recent issue for urdf #5 (comment) seems to be a problem with the Mesh class. When following the trace in a local copy of the package, it turns out it calls the Mesh class. But it seems like it should call the Mesh class... obviously this is confusing...

I only found the problem after wondering why the next error (after fixing the previous one) was 'Mesh' object has no attribute 'finalize', even though it's in the same file and clearly has that function.

I suspect import * is causing a silent problem with import resolution (not sure why the tests don't see this).

parse_urdf errors

In parse_urdf() there is a call to urdfpy.urdf.load, I think this is supposed to be urdfpy.URDF.load.

There is also builder.add_link but this gives the error AttributeError: 'ModelBuilder' object has no attribute 'add_link', which I guess is true since I don't see it.

AttributeError: module 'warp' has no attribute 'sub'

Hi, I'm trying to execute this sample

https://github.com/matthias-research/pages/blob/master/tenMinutePhysics/16-GPUCloth.py

However, I wasn't able to run it properly. In particular, I get the following message.

  File "c:\Users\Alex\Desktop\16-GPUCloth.py", line 875, in <module>
    z = wp.sub(x, y)
AttributeError: module 'warp' has no attribute 'sub'

Computing Jacobians

What is the best way to compute the Jacobian $\mathbf{J} \in \mathbb{R}^{m \times n}$ of a function $\mathbf{f}(\mathbf{x}): \mathbb{R}^n \to \mathbb{R}^m$ in the Warp framework?
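
One general approach with reverse-mode differentiation, independent of any particular Warp API, is to run one backward pass per output component and seed the output adjoint with a one-hot vector; each pass then yields one row of the Jacobian. A minimal sketch in plain Python, where a hand-written vector-Jacobian product stands in for the backward pass (the function `f` and all helper names are illustrative assumptions):

```python
def f(x):
    # Example map from R^2 to R^3.
    x0, x1 = x
    return [x0 * x1, x0 + x1, x1 * x1]


def vjp(x, adj_y):
    # Hand-written backward pass for f: returns adj_y^T J(x).
    x0, x1 = x
    adj_x0 = adj_y[0] * x1 + adj_y[1]
    adj_x1 = adj_y[0] * x0 + adj_y[1] + adj_y[2] * 2.0 * x1
    return [adj_x0, adj_x1]


def jacobian(x, m):
    # One backward pass per output; a one-hot adjoint selects row i of J.
    rows = []
    for i in range(m):
        one_hot = [1.0 if k == i else 0.0 for k in range(m)]
        rows.append(vjp(x, one_hot))
    return rows


J = jacobian([2.0, 3.0], m=3)
print(J)
```

This costs m backward passes, so it is best suited to functions with few outputs.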

What is CUDA_PATH and where to specify it in the build_lib.py file?

Build on macOS M1?

Is it possible to build Warp on MacOS M1?

Thanks!

Can Warp differentiate with respect to cloth material parameters?

Material parameters of the mesh, such as mu and lambda, are all of float type. Can I replace them with a wp array and set requires_grad on them?

`wp.mesh_query_point` throws access violation

I think there are some nuances to this issue and this is my best guess.

As the title stands, I could see an argument for saying it's not a bug. However, there should be a better error message, or a way to check whether the mesh is invalid when it is created.

Even though I turned on debug mode and breakpoints in VS Code, the only thing I can see is an access violation on the original launch. It was only by trial and error and commenting things out that I got to where it seems to happen.

The same code leads to different results on old and new version of Warp

I am trying to run an existing piece of code on Warp, but got different results on warp version 0.2.0 and 0.7.2. The code I'm trying to run is a two-ball collision scenario.

Here are the results from Warp 0.2.0. As you can see, Warp outputs reasonable gradients for dl/dx0, dl/dv0, and dl/du0 (the full description of those items can be found in the original repo).

Warp initialized:
   Version: 0.2.0
   Using CUDA device: NVIDIA GeForce RTX 3090
   Using CPU compiler: /usr/bin/g++
Module utils.customized_integrator_xpbd load took 432.12 ms
Module _two_balls_1_warp load took 159.95 ms
------------Task 3: Position-based Dynamics (Warp)-----------
loss: [2.0605943]
gradient of loss w.r.t. initial position dl/dx0: [[-0.3609619 -0.3609619]
 [-0.3609619 -0.3609619]]
gradient of loss w.r.t. initial velocity dl/dv0: [[-0.47226432 -0.47226432]
 [-0.24966814 -0.24966814]]
gradient of loss w.r.t. initial ctrl dl/du0: [[ 0.00026612  0.00026612]
 [-0.00052014 -0.00052014]]
---------start training------------
^C

On Warp 0.7.2, although the forward results are the same (the loss is still 2.06), the computed gradients are totally different:

Warp CUDA error: Could not open libcuda.so.
Warp 0.7.2 initialized:
   CUDA not available
   Devices:
     "cpu"    | x86_64
   Kernel cache: /home/user/.cache/warp/0.7.2
Module utils.customized_integrator_xpbd load on device 'cpu' took 823.64 ms
Module _two_balls_1_warp load on device 'cpu' took 263.55 ms
------------Task 3: Position-based Dynamics (Warp)-----------
loss: [2.0605943]
gradient of loss w.r.t. initial position dl/dx0: [[-2.   -2.  ]
 [-1.25 -1.25]]
gradient of loss w.r.t. initial velocity dl/dv0: [[-23400.346 -23400.346]
 [ -4669.934  -4669.934]]
gradient of loss w.r.t. initial ctrl dl/du0: [[-48.746975 -48.746975]
 [ -9.72903   -9.72903 ]]
---------start training------------
^C

System configurations

OS: Ubuntu 18.04 LTS
CPU: AMD Ryzen Threadripper 3970X 32-Core Processor
GPU: NVIDIA RTX 3090
CUDA: 11.5

How to reproduce

First, clone the repository https://github.com/DesmondZhong/diff_sim_grads
Next, create a conda environment with python 3.9.
For installing warp 0.2.0, run:

pip install -r requirements_freeze.txt

For installing warp 0.7.2, run:

pip install warp-lang usd-core omegaconf matplotlib

After installing necessary packages, run the following commands to see the difference in warp 0.2.0 and warp 0.7.2 environments.

export PYTHONPATH=[abs_path_to_diff_sim_grads]
cd diff_sim_grads/task3_two_balls
python two_balls_1_pbd_warp.py

where [abs_path_to_diff_sim_grads] is the absolute path to the cloned repo.

adjoint=False ignored in launch() for some data types?

In a simple kernel that multiplies an array of matrices with an array of vectors I get a CUDA compilation error when using an array of mat44 and arrays of vec4

error: no suitable constructor exists to convert from "void" to "wp::mat44"

This does not happen if I define and launch a similar kernel with mat33 and vec3

I have attached both the working 3x3 and the non-working 4x4 Python samples along with the full error text

files.zip

To replicate simply run

python test_works.py

and

python test.py

The last one gives the error.

extra info:

Warp Version: 0.1.25   Using CUDA device: NVIDIA GeForce GTX 1080
NVIDIA-SMI 495.29.05    Driver Version: 495.29.05    CUDA Version: 11.5

Are reductions possible? (e.g minimum/maximum)

Hi there!

Thanks for making this library available - I am curious if there is a way to do reductions, for example I want to find the nearest point to a line (distance + index), for that some kind of reduction seems necessary.

Is this possible at the moment? I am presuming a for-loop is not the correct attempt, but from what I can tell the only reductions are additions or subtractions currently?

Thanks,
Oliver
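
One common way to reduce to a minimum distance together with its index is to pack both into a single integer key and take the minimum of the keys. The sketch below shows only the packing idea in plain Python. It assumes non-negative distances, and it assumes an atomic min on 64-bit integers would be available on the device, which the post does not establish for Warp.

```python
import struct


def float_bits(x):
    # Reinterpret a non-negative float32 as an unsigned 32-bit integer.
    # For non-negative IEEE-754 floats, integer order matches float order.
    return struct.unpack("<I", struct.pack("<f", x))[0]


def pack(distance, index):
    # Distance in the high 32 bits, index in the low 32 bits.
    return (float_bits(distance) << 32) | index


def unpack(key):
    distance = struct.unpack("<f", struct.pack("<I", key >> 32))[0]
    return distance, key & 0xFFFFFFFF


distances = [2.5, 0.75, 3.0, 0.5, 1.25]

# On a GPU each thread would fold its key into one shared value with an
# atomic min; here a plain min over all keys plays that role.
best = min(pack(d, i) for i, d in enumerate(distances))
print(unpack(best))
```

Ties in distance resolve to the smallest index, because the index occupies the low bits of the key.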

Q: Inspecting generated kernel code?

Is it possible to inspect the code that was generated for user kernels (for both forward and backward pass)?

Example crashed

I am really interested in the cloth simulation, so I wanted to run "example_sim_humanoid.py".
But I get an error:

Some examples work well:

I guess the human is represented by a union of capsule SDFs and some work from

airMesh & sdfconact

Zero-copy conversion to CuPy array?

Is there a supported approach to converting a warp array to a cupy array, ideally without copying?

String arguments to a kernel launch

Is it possible to pass string type arguments as inputs in wp.launch()? Warp throws an error when I try to do it.
"RuntimeError: Error launching kernel, unable to pack kernel parameter type <class 'str'> for param bc, expected <class 'str'>"

Joints of URDF are not rendered

When running the examples and previewing the stored USD in Omniverse, it seemed like there were joints, but only when playing the simulation back. However, no joints appear to be rendered.

Documentation typo.

https://github.com/NVIDIA/warp/blob/main/warp/types.py#L981

wp.init() crash with error of Could not find module '<..>warp.dll' (or one of its dependencies)

Hi!

I've installed warp with pip: pip install warp-lang. After I try to execute

import warp as wp
wp.init()

I get a FileNotFoundError:

Traceback (most recent call last):
  File ".\examples\example_sim_ant.py", line 26, in <module>
    wp.init()
  File "C:\Users\mariako\anaconda3\envs\warp\lib\site-packages\warp\context.py", line 2173, in init
    runtime = Runtime()
  File "C:\Users\mariako\anaconda3\envs\warp\lib\site-packages\warp\context.py", line 1050, in __init__
    self.core = warp.build.load_dll(os.path.join(bin_path, warp_lib))
  File "C:\Users\mariako\anaconda3\envs\warp\lib\site-packages\warp\build.py", line 323, in load_dll
    dll = ctypes.CDLL(dll_path, winmode=0)
  File "C:\Users\mariako\anaconda3\envs\warp\lib\ctypes\__init__.py", line 373, in __init__
    self._handle = _dlopen(self._name, mode)
FileNotFoundError: Could not find module 'C:\Users\mariako\anaconda3\envs\warp\Lib\site-packages\warp\bin\warp.dll' (or one of its dependencies). Try using the full path with constructor syntax.

However, the DLL file is in its place.

I checked the warp.dll dependencies with the https://github.com/lucasg/Dependencies tool, and some entries look suspicious.

However, I'm not sure what to do from here. Could you take a look?

I'm using Windows 11 and Python 3.8 in a conda environment.

Reuse of allocated buffers

I have used Warp to create a custom PyTorch operator and it works great, but deallocating and reallocating memory not only takes a lot of time, but the timing of the GC also affects the amount of memory allocated. Is there a way to allow Warp to reuse allocated memory, especially when the buffers are the same size, besides writing my own allocator?

Suggestion: Make `wp.array` class Generic

Hello! I've got a question: Have you considered making wp.array a Generic type, rather than passing the arguments to the constructor in type annotations?

For example, from this:

@wp.kernel
def apply_forces(grid : wp.uint64,
                 particle_x: wp.array(dtype=wp.vec3),
                 particle_v: wp.array(dtype=wp.vec3),
                 particle_f: wp.array(dtype=wp.vec3),
                 radius: float,
                 k_contact: float,
                 k_damp: float,
                 k_friction: float,
                 k_mu: float):
   ...

to this:

@wp.kernel
def apply_forces(grid : wp.uint64,
                 particle_x: wp.array[wp.vec3],
                 particle_v: wp.array[wp.vec3],
                 particle_f: wp.array[wp.vec3],
                 radius: float,
                 k_contact: float,
                 k_damp: float,
                 k_friction: float,
                 k_mu: float):
   ...

This would have the following benefits:

This would make the annotations "valid" (i.e. no calls inside annotations), so that type checkers could be used in the codebase.
This would make it possible to enable postponed evaluation of type annotations in the user code (https://peps.python.org/pep-0563/), which doesn't seem to be supported atm (but correct me if I'm wrong).

I assume you're using something like typing.get_type_hints or the __annotations__ dict directly in wp.kernel to extract the type annotations from the function, correct?
With a generic wp.array type, the dtype can still be easily be recovered using typing.get_args on the annotation.
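
The recovery step described above can be sketched with the standard `typing` helpers. `Array` and `vec3` here are hypothetical stand-ins, not Warp's classes, and `array_dtypes` is an illustrative helper name.

```python
from typing import Generic, TypeVar, get_args, get_origin, get_type_hints

T = TypeVar("T")


class vec3:
    """Hypothetical stand-in for a vector dtype."""


class Array(Generic[T]):
    """Hypothetical stand-in for a generic array type."""


def apply_forces(particle_x: Array[vec3], radius: float):
    ...


def array_dtypes(func):
    # Map each Array[...] parameter name to its element type.
    hints = get_type_hints(func)
    return {
        name: get_args(hint)[0]
        for name, hint in hints.items()
        if get_origin(hint) is Array
    }


print(array_dtypes(apply_forces))
```

Because `get_type_hints` also resolves string annotations, the same lookup should keep working when postponed evaluation of annotations is enabled, provided the names are resolvable in the function's module.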

Let me know what you think!

Computing second order derivatives!

Do you plan to support second order derivatives? Or there is already a way to do these types of computations.

import numpy as np
import math

import torch

import warp as wp
import warp.torch

device = "cuda"

wp.init()


@wp.kernel
def test_kernel(
    x : wp.array(dtype=float),
    y : wp.array(dtype=float)):

    tid = wp.tid()

    y[tid] = 0.5 - x[tid]*x[tid]*2.0


# define PyTorch autograd op to wrap simulate func
class TestFunc(torch.autograd.Function):

    @staticmethod
    def forward(ctx, x):

        # allocate output array
        y = torch.empty_like(x)

        ctx.x = x
        ctx.y = y

        wp.launch(
            kernel=test_kernel, 
            dim=len(x), 
            inputs=[wp.torch.from_torch(x)], 
            outputs=[wp.torch.from_torch(y)], 
            device=device)

        return y

    @staticmethod
    def backward(ctx, adj_y):
        
        # adjoints should be allocated as zero initialized
        adj_x = torch.zeros_like(ctx.x).contiguous()
        adj_y = adj_y.contiguous()

        wp.launch(
            kernel=test_kernel, 
            dim=len(ctx.x), 

            # fwd inputs
            inputs=[wp.torch.from_torch(ctx.x)],
            outputs=[None], 

            # adj inputs
            adj_inputs=[wp.torch.from_torch(adj_x)],
            adj_outputs=[wp.torch.from_torch(adj_y)],

            device=device,
            adjoint=True)

        return adj_x


# define PyTorch autograd op to wrap simulate func
class TestFuncTorch(torch.autograd.Function):

    @staticmethod
    def forward(ctx, x):

        ctx.x = x
        y = 0.5 - 2. * x**2
        return y

    @staticmethod
    def backward(ctx, adj_y):
        
        
        # use the input saved on ctx rather than the global x
        return adj_y * (-4. * ctx.x)


# input data
x = 2.0 * torch.ones(16, dtype=torch.float32, device=device, requires_grad=True).contiguous()

# Pure torch
y = TestFuncTorch.apply(x)


dydx = torch.autograd.grad(y.sum(), x, retain_graph=True, create_graph=True)[0]
print(dydx )

d2ydx2 = torch.autograd.grad(dydx.sum(), x, retain_graph=True, create_graph=True)[0]
print(d2ydx2)


# execute op
y = TestFunc.apply(x)

dydx = torch.autograd.grad(y.sum(), x, retain_graph=True, create_graph=True)[0]
print(dydx )

try:
    d2ydx2 = torch.autograd.grad(dydx.sum(), x, retain_graph=True, create_graph=True)[0]
    print(d2ydx2)
except:
    raise ValueError("No gradient!")

OSError: warp.so: undefined symbol

Hi all,

On Ubuntu 20.04, Anaconda python 3.9.7, cudatoolkit 11.3. I followed the installation in the guide and I am getting the following error. Why is this happening?

$ python examples/example_raycast.py 
Traceback (most recent call last):
  File "/home/user/scratch_space/warp/warp/examples/example_raycast.py", line 23, in <module>
    wp.init()
  File "/home/user/scratch_space/warp/warp/warp/context.py", line 1097, in init
    runtime = Runtime()
  File "/home/user/scratch_space/warp/warp/warp/context.py", line 470, in __init__
    self.core = warp.build.load_dll(warp_lib)
  File "/home/user/scratch_space/warp/warp/warp/build.py", line 285, in load_dll
    dll = ctypes.CDLL(dll_path)
  File "/home/user/anaconda3/envs/warp/lib/python3.9/ctypes/__init__.py", line 382, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /home/user/scratch_space/warp/warp/warp/bin/warp.so: undefined symbol: _ZN2wp24hash_grid_rebuild_deviceERKNS_8HashGridEPKNS_4vec3Ei

make the non-commercial clause more visible in the readme

This looks like an interesting library, but after reviewing the license I realized it says it's for non-commercial use only. I think that information should be made more clear so people don't invest time unless they are comfortable with that limitation. (Of course an even better solution would be to change the license to something like Apache 2, in case you are open to that option.)

Calling ```wp.launch``` inside kernel

How is the user supposed to implement functionality with two levels of parallelism, for example matrix—matrix multiplication?

Minimal example

My first thought was to implement two kernels, where one performs matrix-vector multiplication and the second one calls the first kernel for each column of the second matrix. I know that the following could be done via multi-dimensional grid bounds, but just for the sake of it.

E.g. for matrices and vectors of size 3:

@wp.func
def row_mult(row: wp.array(dtype=float), v: wp.array(dtype=float)):
    return row[0]*v[0] + row[1]*v[1] + row[2]*v[2]

@wp.kernel
def mat_vec_mul(a: wp.array(dtype=float, ndim=2), b: wp.array(dtype=float), c: wp.array(dtype=float)):
    id = wp.tid()
    c[id] = row_mult(a[id], b)


@wp.kernel
def mat_mat_mul(a: wp.array(dtype=float, ndim=2), b: wp.array(dtype=float, ndim=2), c: wp.array(dtype=float, ndim=2)):
    id = wp.tid()
    wp.launch(kernel=mat_vec_mul, dim=3, inputs=[a, b[id]], outputs=[c[id]], device="cuda")


# calling
dim = 15
a = wp.array(np.arange(9).reshape((3,3)), dtype=float, device="cuda")
c = wp.array(0*np.eye(dim, dtype=float), dtype=float, device="cuda")
b = wp.array(np.eye(dim, dtype=float), dtype=float, device="cuda")

wp.launch(kernel=mat_mat_mul, dim=dim, inputs=[a, b], outputs=[c], device="cuda")
print(c)

But apparently this causes RuntimeError: Could not find function wp.launch as a built-in or user-defined function. Note that user functions must be annotated with a @wp.func decorator to be called from a kernel.

Any workaround for this?
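
The multi-dimensional grid bounds mentioned above flatten both levels of parallelism into one launch: each thread gets an index pair (i, j) and computes a single output entry, so no nested launch is needed. The sketch below emulates that pattern serially in plain Python; `launch_2d` and `mat_mat_mul_thread` are illustrative names, not Warp functions.

```python
from itertools import product


def mat_mat_mul_thread(i, j, a, b, c):
    # Body of one "thread": computes a single output entry c[i][j].
    total = 0.0
    for k in range(len(b)):
        total += a[i][k] * b[k][j]
    c[i][j] = total


def launch_2d(thread_fn, dim, *args):
    # Serial emulation of a launch over a 2D grid of thread indices.
    for i, j in product(range(dim[0]), range(dim[1])):
        thread_fn(i, j, *args)


a = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]]
b = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
c = [[0.0] * 3 for _ in range(3)]

launch_2d(mat_mat_mul_thread, (3, 3), a, b, c)
print(c)
```

Since every (i, j) pair writes a distinct entry of `c`, the threads are independent and need no synchronization.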

My use case

My use case is to compute Jacobians of an unrolled simulation trajectory, i.e.

simulate for T steps and save the sub_computation_graphs between every two consecutive steps
launch a kernel that computes the Jacobian for every step (according to its sub_computation_graph)

Structs in functions

So far it seems structs are supported as arguments to kernels but cannot be used in functions. It would be nice to add support for structs in functions for both arguments and return values. This should be considered low priority. I put it here in case others run into the same issue I did.

FileNotFoundError: Could not find module 'warp.dll'

After building from source, I encounter the following error when running tests.

FileNotFoundError: Could not find module 'warp.dll' (or one of its dependencies). Try using the full path with constructor syntax.

After spending some time on the problem, I figured out that this is due to a bug of Python 3.8 on Windows.

This stackoverflow post pointed out the issue.
https://stackoverflow.com/questions/59330863/cant-import-dll-module-in-python/64472088#64472088

The author of the above post has reported this bug and it has been fixed for future python versions.
python/cpython#86280

In short, the solution is to explicitly pass winmode=0 to ctypes.CDLL. In Python 3.8, the default value of winmode is set to None, which is unintended as it is not consistent with the documentation. This argument only affects the loading logic under Windows.

Besides this change, I also need to pass the full path of the file warp.dll instead of just the filename in order to successfully load the file. This is suggested in the error message above.
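
The two changes described above can be sketched as a small loader helper. The names `cdll_kwargs` and `load_dll` below are illustrative and not necessarily what the eventual fix uses.

```python
import ctypes
import os
import sys


def cdll_kwargs():
    # winmode exists from Python 3.8 onward and only matters on Windows.
    if sys.platform == "win32" and sys.version_info >= (3, 8):
        return {"winmode": 0}
    return {}


def load_dll(path):
    # Resolve to an absolute path before loading, as the error message suggests.
    full_path = os.path.abspath(path)
    return ctypes.CDLL(full_path, **cdll_kwargs())
```

On other platforms the helper passes no extra arguments, so the default loading behavior is unchanged there.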

I'll be creating a pull request to fix this issue soon. I think python 3.8 users on Windows will encounter this problem and will benefit from the fix.

My environment

OS: Windows 10
Python version: 3.8.10
MSVC Compiler Version (cl.exe version) 19.26.28806 for x64
Warp version: 0.1.25

Bug with `wp.pow`

Seems to be a bug with wp.pow for negative values:

@wp.kernel
def test(tt: wp.array(dtype=wp.float32)):
  tt[0] = wp.pow(-1., 2.)

tt = wp.zeros((1,), dtype=wp.float32, device='cuda')
wp.launch(test, dim=1, inputs=[tt], device='cuda')
print(tt)

The above prints [nan] while it should print [1.], since (-1)^2 = 1. It could be an issue with my graphics card, who knows, but if not then it's probably a bug somewhere. (I'm getting the correct value with device='cpu'.)

Warp not able to find its own function

I have a @wp.func that is part of a larger call history. However, the part that it is failing at is:

@wp.func
def calc__force(rr_i: wp.vec3, 
                     ri: float, 
                     vv_i: wp.vec3, 
                     pn_rr: wp.array(dtype=wp.vec3), 
                     pn_vv: wp.array(dtype=wp.vec3), 
                     pn_r: wp.array(dtype=float),
                     ):
    force = wp.zeros_like(rr_i)

Resulting in:

(The top part of the error, with warp being properly initialized)

Warp initialized:
   Version: 0.3.1
   CUDA device: NVIDIA RTX A4500
   Kernel cache: C:\Users\local user\AppData\Local\NVIDIA Corporation\warp\Cache\0.3.1
Error: Could not find function wp.zeros_like as a built-in or user-defined function. Note that user functions must be annotated with a @wp.func decorator to be called from a kernel. while transforming node <class '_ast.Call'> in func: calc_agent_force at line: 11 col: 12:
        force = wp.zeros_like(rr_i)

and then the same function is being mentioned at the bottom of the error

....\lib\site-packages\warp\codegen.py", line 1054, in eval
    raise RuntimeError(f"Could not find function {'.'.join(path)} as a built-in or user-defined function. Note that user functions must be annotated with a @wp.func decorator to be called from a kernel.")        
RuntimeError: Could not find function wp.zeros_like as a built-in or user-defined function. Note that user functions must be annotated with a @wp.func decorator to be called from a kernel.

Based on the docs, this should be working. However, I assume there is something else wrong, somewhere, and this error is being incorrectly displayed.

feature request: wp.bool type

currently the types available are:

scalar_types = [int8, uint8, int16, uint16, int32, uint32, int64, uint64, float32, float64]
vector_types = [vec2, vec3, vec4, mat22, mat33, mat44, quat, transform, spatial_vector, spatial_matrix]

there doesn't seem to be a bool type?

My specific use-case is to implement something like the activeIndices from Nvidia Flex that activates / deactivates particles on the GPU while still keeping them in memory.

will Warp be able to cover all the features of the C++ CUDA API?

I'm new to CUDA and have been getting familiar with the C++ CUDA API. This repo looks interesting! I'm wondering whether it will later support the complete C++ CUDA API, so that any kind of CUDA kernel can be implemented using only Python?

Inconsistent behavior in example_dem.py

From example_dem.py:

def update(self):

    with wp.ScopedTimer("simulate", active=True):

        if (self.use_graph):

            with wp.ScopedTimer("grid build", active=False):
                self.grid.build(self.x, self.grid_cell_size)

            with wp.ScopedTimer("solve", active=False):
                wp.capture_launch(self.graph)
                wp.synchronize()

        else:
            for s in range(self.sim_substeps):

                with wp.ScopedTimer("grid build", active=False):
                    self.grid.build(self.x, self.point_radius)

                with wp.ScopedTimer("forces", active=False):
                    wp.launch(kernel=apply_forces, dim=len(self.x), inputs=[self.grid.id, self.x, self.v, self.f, self.point_radius, self.k_contact, self.k_damp, self.k_friction, self.k_mu], device=self.device)
                    wp.launch(kernel=integrate, dim=len(self.x), inputs=[self.x, self.v, self.f, (0.0, -9.8, 0.0), self.sim_dt, self.inv_mass], device=self.device)
            
            wp.synchronize()

The captured kernels are defined as:

if (self.use_graph):

    wp.capture_begin()

    for s in range(self.sim_substeps):

        with wp.ScopedTimer("forces", active=False):
            wp.launch(kernel=apply_forces, dim=len(self.x), inputs=[self.grid.id, self.x, self.v, self.f, self.point_radius, self.k_contact, self.k_damp, self.k_friction, self.k_mu], device=self.device)
            wp.launch(kernel=integrate, dim=len(self.x), inputs=[self.x, self.v, self.f, (0.0, -9.8, 0.0), self.sim_dt, self.inv_mass], device=self.device)
        
    self.graph = wp.capture_end()

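If use_graph==True, grid.build() is called once per update(), and the captured kernels are then replayed 64 times (once per substep) against that single grid. If use_graph==False, grid.build() is called before every substep, so the grid is rebuilt each time the kernels run. The two paths therefore do not perform the same computation. I think this is a bug.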
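The difference between the two paths can be made concrete by counting calls. The following is a minimal pure-Python sketch that only mirrors the control flow quoted above; it does not call Warp, and the names `count_calls` and `SIM_SUBSTEPS` are hypothetical (the value 64 is taken from the description above).

```python
# Hypothetical stand-in for self.sim_substeps; 64 is the figure quoted in the report.
SIM_SUBSTEPS = 64


def count_calls(use_graph, substeps=SIM_SUBSTEPS):
    """Count how often each step runs during one update() call.

    This mirrors only the control flow of the quoted update() method;
    no Warp API is involved.
    """
    calls = {"grid_build": 0, "apply_forces": 0, "integrate": 0}
    if use_graph:
        # Graph path: the grid is built once, outside the captured graph.
        calls["grid_build"] += 1
        # Replaying the graph runs the captured launches for every substep.
        for _ in range(substeps):
            calls["apply_forces"] += 1
            calls["integrate"] += 1
    else:
        # Non-graph path: the grid is rebuilt before every substep.
        for _ in range(substeps):
            calls["grid_build"] += 1
            calls["apply_forces"] += 1
            calls["integrate"] += 1
    return calls


graph_calls = count_calls(use_graph=True)
plain_calls = count_calls(use_graph=False)
print("graph path:    ", graph_calls)
print("non-graph path:", plain_calls)
```

Both paths launch the same number of force and integration kernels, but the number of grid builds differs by a factor equal to the substep count.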

Can't find warp in extensions list

I'm not able to find Warp in the Omniverse Create/Kit extensions list. Do I need to turn on a setting?

Thanks

can't import after pip install

I tried pip install warp as described in this blog post, but even though pip reports a successful install, the import fails:

ubuntu@ip-172-31-31-7:/mnt/extra/pieper/SlicerDMRI$ pip3 install warp
Collecting warp
  Using cached warp-0.0.1-py3-none-any.whl (1.1 kB)
Installing collected packages: warp
Successfully installed warp-0.0.1
ubuntu@ip-172-31-31-7:/mnt/extra/pieper/SlicerDMRI$ python3
Python 3.8.10 (default, Nov 26 2021, 20:14:08) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import warp
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'warp'

This is Ubuntu 20.04 with the standard python and pip packages.
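One way to narrow this down is to compare what pip recorded with what the interpreter can actually import. The sketch below uses only the standard library (`importlib.metadata`, Python 3.8+); the helper name `describe_install` is hypothetical. The 1.1 kB wheel in the log above suggests that the installed distribution may not contain an importable `warp` module at all, but that is an assumption this check would help confirm.

```python
import importlib.util
from importlib import metadata


def describe_install(dist_name, module_name):
    """Return (installed_version_or_None, module_is_importable)."""
    try:
        # Version of the distribution as recorded by pip, if any.
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None
    # find_spec returns None when a top-level module cannot be located.
    importable = importlib.util.find_spec(module_name) is not None
    return version, importable


version, importable = describe_install("warp", "warp")
print(f"distribution 'warp': version={version}, module importable={importable}")
```

A result with a version but `importable=False` would mean pip installed a distribution named warp that does not provide a `warp` module.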

Support for Apple M1

It would be nice to have support for the Apple M1, at least for the CPU device, to make warp code portable to more environments, in my case to use warp for teaching university courses. When I tried to run a simple test on the M1, the execution breaks with the error

OSError: dlopen(/opt/homebrew/lib/python3.9/site-packages/warp/bin/warp.dylib, 0x0006): tried: '/opt/homebrew/lib/python3.9/site-packages/warp/bin/warp.dylib' (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64e'))

I am not sure whether this is due to my installation or simply because warp is not packaged with armv8 support.
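The dlopen message itself states which architecture the library was built for and which one the process needs. As a small sketch (the helper name `parse_arch_mismatch` is hypothetical, and the regular expression assumes the message wording shown above), the two values can be extracted and compared with what the running interpreter reports:

```python
import platform
import re

# Error text copied from the report above.
ERROR = (
    "OSError: dlopen(/opt/homebrew/lib/python3.9/site-packages/warp/bin/warp.dylib, 0x0006): "
    "tried: '/opt/homebrew/lib/python3.9/site-packages/warp/bin/warp.dylib' "
    "(mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64e'))"
)


def parse_arch_mismatch(message):
    """Return (have, need) from a dlopen architecture error, or None if absent."""
    match = re.search(r"have '([^']+)', need '([^']+)'", message)
    return match.groups() if match else None


have, need = parse_arch_mismatch(ERROR)
print(f"library built for: {have}")
print(f"process requires:  {need}")
print(f"interpreter reports: {platform.machine()}")
```

Here the library is an x86_64 binary while the process requires an ARM build, which points at the packaged binary rather than at the local Python installation.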

Failed to open USD file

When running examples, the errors show that I can't open USD files. I've installed usd-core.

(warp) xx@xx:~/code/warp$ python examples/example_mesh.py 
Warp initialized:
   Version: 0.1.25
   Using CUDA device: GeForce RTX 2060
   Using CPU compiler: /usr/lib/ccache/g++
Traceback (most recent call last):
  File "example_mesh.py", line 103, in <module>
    usd_stage = Usd.Stage.Open(os.path.join(os.path.dirname(__file__), "assets/bunny.usd"))
pxr.Tf.ErrorException: 
        Error in 'pxrInternal_v0_22__pxrReserved__::UsdStage::Open' at line 878 in file /opt/USD/pxr/usd/usd/stage.cpp : 'Failed to open layer @assets/bunny.usd@'

Point Inside Mesh

Hello there!

Is the method implemented in Warp for testing whether a point is inside a closed mesh correct? As far as I can see, it takes the difference vector between the query point and the closest point on the mesh and checks the sign of its dot product with the triangle normal to decide whether the point is inside. However, the paper "Generating Signed Distance Fields From Triangle Meshes" states that always using the triangle normal is incorrect when the closest point lies on a vertex or an edge of the mesh. Instead, the authors use vertex and edge (pseudo) normals to get the correct sign of the distance.
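The failure case can be reproduced by hand. The sketch below is not Warp's implementation; it is a pure-Python illustration on a hypothetical sharp wedge whose edge runs along the z axis, with the solid extending toward -x. For a query point whose closest point lies on the edge, the sign obtained from one of the two adjacent face normals is wrong, while the edge pseudonormal (taken here as the normalized sum of the two face normals) gives the correct sign.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    length = math.sqrt(dot(v, v))
    return tuple(x / length for x in v)


# Half-opening angle of the wedge; the solid occupies directions 170..190 degrees.
theta = math.radians(10.0)

# Outward unit normals of the two faces that meet at the edge (the z axis).
n_a = (math.sin(theta), math.cos(theta), 0.0)
n_b = (math.sin(theta), -math.cos(theta), 0.0)

# Edge pseudonormal: normalized sum of the two incident face normals.
n_edge = normalize(tuple(x + y for x, y in zip(n_a, n_b)))

# Query point outside the wedge, at 70 degrees from the +x axis.
phi = math.radians(70.0)
p = (math.cos(phi), math.sin(phi), 0.0)

# Both faces lie behind the point, so the closest point is on the edge itself.
closest = (0.0, 0.0, 0.0)
d = tuple(pi - ci for pi, ci in zip(p, closest))

sign_face_a = dot(d, n_a)   # positive: reports "outside"
sign_face_b = dot(d, n_b)   # negative: wrongly reports "inside"
sign_edge = dot(d, n_edge)  # positive: reports "outside"

print(f"face A normal: {sign_face_a:+.3f}")
print(f"face B normal: {sign_face_b:+.3f}")
print(f"edge pseudonormal: {sign_edge:+.3f}")
```

Which face a closest-point query returns for a point on a shared edge is arbitrary, so a sign test based on the face normal alone can flip between the two results in this configuration.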

any plan on supporting float64 computations?

nvrtc: error: failed to open nvrtc-builtins64_113.dll

I got this error message when running "python -m warp.tests":

nvrtc: error: failed to open nvrtc-builtins64_113.dll

environment: Win 10 x64; CUDA toolkit 11.5; Python 3.8.7; Nvidia graphics driver 496.13; Visual Studio 2019

steps to reproduce:

1. download and unzip v0.1.25-alpha
2. "pip install ."
3. "pip install usd-core"
4. "python -m warp.tests"
5. error

The .dll file in question is present both in the warp project directory and in the Python install directory:

"C:\Users\<my username>\AppData\Local\Programs\Python\Python38\Lib\site-packages\warp\bin\nvrtc-builtins64_113.dll"

So maybe it's just a pathing issue - any pointers?

Screenshot attached.

add_body docstring does not match function

There seem to be some arguments missing from the docstring:

warp/warp/sim/model.py

Line 652 in fc7d325

"""Adds a rigid body to the model.

TypeError in warp.sim.model

Thank you for this cool library!

I noticed a small issue in the function ModelBuilder.add_soft_mesh in warp/sim/model.py, line 1787:

p = wp.quat_rotate(rot, v * scale) + pos

where pos is a List[float]. It seems that addition between wp.vec3 and List[float] is not supported, so this line raises a TypeError and should be

p = np.array(wp.quat_rotate(rot, v * scale)) + pos

according to other lines in the same file.

Thank you!

warp.so: undefined symbol: _ZN2wp24hash_grid_rebuild_deviceERKNS_8HashGridEPKNS_4vec3Ei

Hey there, just installed the repo, and did the following:

$ conda create -n warp python=3.9
...
$ conda activate warp
$ pip install -e .
...
$ python examples/example_dem.py 
Traceback (most recent call last):
  File "/home/fabrice/Source/warp/examples/example_dem.py", line 24, in <module>
    wp.init()
  File "/home/fabrice/Source/warp/warp/context.py", line 1097, in init
    runtime = Runtime()
  File "/home/fabrice/Source/warp/warp/context.py", line 470, in __init__
    self.core = warp.build.load_dll(warp_lib)
  File "/home/fabrice/Source/warp/warp/build.py", line 285, in load_dll
    dll = ctypes.CDLL(dll_path)
  File "/home/fabrice/miniconda3/envs/warp/lib/python3.9/ctypes/__init__.py", line 382, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /home/fabrice/Source/warp/warp/bin/warp.so: undefined symbol: _ZN2wp24hash_grid_rebuild_deviceERKNS_8HashGridEPKNS_4vec3Ei

So then I tried:

$  python build_lib.py 
Namespace(msvc_path=None, sdk_path=None, cuda_path=None, mode='release', verbose=True)
Warning: building without CUDA support
Building /home/fabrice/Source/warp/warp/bin/warp.so
g++ -O3 -DNDEBUG -DWP_CPU  -fPIC --std=c++11 -c "/home/fabrice/Source/warp/warp/native/warp.cpp" -o "/home/fabrice/Source/warp/warp/native/warp.cpp.o"
build took 1178.73 ms
g++ -shared -Wl,-rpath,'$ORIGIN' -o '/home/fabrice/Source/warp/warp/bin/warp.so' "/home/fabrice/Source/warp/warp/native/warp.cpp.o"
link took 72.52 ms
$ python examples/example_dem.py 
Traceback (most recent call last):
  File "/home/fabrice/Source/warp/examples/example_dem.py", line 24, in <module>
    wp.init()
  File "/home/fabrice/Source/warp/warp/context.py", line 1097, in init
    runtime = Runtime()
  File "/home/fabrice/Source/warp/warp/context.py", line 470, in __init__
    self.core = warp.build.load_dll(warp_lib)
  File "/home/fabrice/Source/warp/warp/build.py", line 285, in load_dll
    dll = ctypes.CDLL(dll_path)
  File "/home/fabrice/miniconda3/envs/warp/lib/python3.9/ctypes/__init__.py", line 382, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /home/fabrice/Source/warp/warp/bin/warp.so: undefined symbol: _ZN2wp24hash_grid_rebuild_deviceERKNS_8HashGridEPKNS_4vec3Ei

(No change).

OS: Ubuntu 20.04 LTS

$ gcc --version
gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0

I'm probably doing something wrong, and I didn't read all the README/documentation thoroughly. But I thought I'd post this, just in case this information might be useful.

Cheers!