Catmull_Clark_with_local_patches

Another old (2017) school project (MSIM 715). This one tests how memory layout optimization could affect the Catmull-Clark algorithm.

Catmull-Clark with CUDA and OpenMP

Sub-division surfaces

  • Splines
    • “Infinite” resolution
    • Smooth curves
    • Hard to animate and texture
  • Meshes
    • Easy to model and texture
    • Easy to animate
    • Rough/Imperfect surface
    • Large footprint
  • Subdivision surface
    • Combine both
    • Mesh acts as a B-spline control cage

Real-time subsurf motivation

  • Currently used in film and modeling (actually this has probably changed since I wrote this all the way back in 2017)
    • Pixar popularized
    • Saves disk space
      • Part of the modifier stack
    • SubSurf is applied by the CPU
    • Result is in RAM
  • GPU usage
    • There is current interest in applying subsurf on the GPU
    • Saves RAM and bandwidth
    • Don’t need the result back
      • Push out to screen

Catmull-Clark

  • Meshes are:
    • Vertices
    • Faces
      • Contain indices into the vertex array
    • Edges
      • Contain indices into the face array
  • Arbitrary meshes are not laid out in memory according to their spatial relationships (see the data-layout and kernel sketch after this list)
  • Algorithm
    • Calculate face average points F
    • Calculate edge average points R
      • Uses results from F (sync locally here)
    • Calculate barycenter of vertices P
      • Uses results from F
    • Close faces
    • Repeat as desired
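
A minimal sketch of the indexed-quad layout described above and of the face-point step, assuming CUDA; the struct and kernel names are illustrative, not identifiers from this project.

```cuda
/* Illustrative data layout and face-point kernel; names are assumptions. */
struct Vertex { float x, y, z; };
struct Face   { int v[4]; };            /* indices into the vertex array (quads only)  */
struct Edge   { int v[2]; int f[2]; };  /* endpoint vertices and the two adjacent faces
                                           (manifold mesh: exactly two faces per edge) */

/* Step 1: face average point F = mean of the four corner vertices. */
__global__ void face_points(const Vertex *verts, const Face *faces,
                            Vertex *F, int n_faces)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n_faces) return;

    float x = 0.0f, y = 0.0f, z = 0.0f;
    for (int k = 0; k < 4; ++k) {
        Vertex v = verts[faces[i].v[k]];
        x += v.x; y += v.y; z += v.z;
    }
    F[i].x = 0.25f * x;
    F[i].y = 0.25f * y;
    F[i].z = 0.25f * z;
}
```

The edge-point step would then read two entries of F plus the edge's endpoint vertices (the local sync noted above), and the vertex step combines the averages of F, the edge points R, and the original position P.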

Parallelization Approach

  • Some calculations require data from a neighbor
  • We can read that data and perform a duplicate calculation
  • Break mesh into patches
    • Let individual threads submit work to the GPU
  • The patches are then divided into thread blocks for the GPU kernels that do the calculations
  • Theory: GPU multiprocessors spend a lot of time waiting on reads; however, they also have a good scheduler
    • Keep the GPU busy and task switching by feeding it from multiple host threads (see the sketch after this list)
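
A hedged sketch of that thread/stream setup, reusing the `face_points` kernel and structs from the sketch above; the `Patch` struct and function names are placeholders, not the project's actual code.

```cuda
/* Each OpenMP thread drives its patches; when compiled with
 * -Xcompiler -fopenmp --default-stream per-thread, kernel launches from
 * different host threads go to different streams and can overlap. */
#include <omp.h>

typedef struct {
    Vertex *d_verts;   /* device copies of this patch's data */
    Face   *d_faces;
    Vertex *d_F;       /* output: face average points */
    int     n_faces;
} Patch;

void subdivide_step(Patch *patches, int n_patches, int n_threads, int block_dim)
{
    #pragma omp parallel for num_threads(n_threads)
    for (int p = 0; p < n_patches; ++p) {
        Patch *pt = &patches[p];
        int threads = block_dim * block_dim;                 /* e.g. 16, 64, 256, 1024 */
        int blocks  = (pt->n_faces + threads - 1) / threads;

        /* Launched on this host thread's default stream. */
        face_points<<<blocks, threads>>>(pt->d_verts, pt->d_faces, pt->d_F, pt->n_faces);
        /* ...edge-point and vertex-point kernels would follow on the same stream... */
    }
    cudaDeviceSynchronize();  /* wait for all per-thread streams */
}
```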

Optimization approach

  • Wondered if we could lay the mesh data out in regular grids
    • Theory: more cache-friendly if neighbors are near each other in memory
  • This is what I came up with:
    • Assume quads only – a typical requirement
    • Assume manifold mesh – no wormholes
    • Break the mesh into patches of regular matrices
      • Using non-degree-4 vertices as a guide (an indexing sketch follows the figure below)

(figure: the mesh broken into regular patches, guided by non-degree-4 vertices)
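
To make the cache argument concrete, here is a neighbor-averaging stand-in (not the actual subdivision rule) showing how a regular, row-major patch lets a thread reach its neighbors at fixed strides instead of through index indirection; `PATCH_DIM` and the kernel are illustrative.

```cuda
/* Vertices of one regular patch stored row-major: the neighbors of
 * (row, col) sit at fixed offsets of +/-1 and +/-PATCH_DIM. */
#define PATCH_DIM 1008          /* placeholder patch width/height */

__device__ __forceinline__ int at(int row, int col)
{
    return row * PATCH_DIM + col;
}

__global__ void average_neighbors(const Vertex *in, Vertex *out)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row < 1 || col < 1 || row >= PATCH_DIM - 1 || col >= PATCH_DIM - 1)
        return;  /* boundary vertices need data from neighboring patches */

    /* Threads with consecutive threadIdx.x touch adjacent elements of a row,
     * and neighboring rows are a fixed stride away, which is friendlier to
     * the cache than arbitrary indexed gathers. */
    Vertex n = in[at(row - 1, col)], s = in[at(row + 1, col)];
    Vertex w = in[at(row, col - 1)], e = in[at(row, col + 1)];
    out[at(row, col)].x = 0.25f * (n.x + s.x + w.x + e.x);
    out[at(row, col)].y = 0.25f * (n.y + s.y + w.y + e.y);
    out[at(row, col)].z = 0.25f * (n.z + s.z + w.z + e.z);
}
```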

Implementation

  • CUDA
    • NVCC
  • OpenMP
    • Can be combined with CUDA through “streams”
    • NVCC supports OpenMP with -Xcompiler -fopenmp --default-stream per-thread
  • Block-size macro – four compiled versions
    • (4x4, 8x8, 16x16, 32x32; see the build/launch sketch after this list)
  • OpenMP thread count and mesh size are command-line options
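
A sketch of how the four compiled block-size versions and the NVCC flags above might fit together; `BLOCK_DIM`, the file names, and `launch_average` are assumptions.

```cuda
/* Build one binary per block size, e.g.:
 *   nvcc -DBLOCK_DIM=4  -Xcompiler -fopenmp --default-stream per-thread main.cu -o cc_4x4
 *   nvcc -DBLOCK_DIM=8  ...                                                     -o cc_8x8
 *   (and likewise for 16 and 32)
 */
#ifndef BLOCK_DIM
#define BLOCK_DIM 16                    /* default: 16x16 = 256 threads per block */
#endif

void launch_average(const Vertex *d_in, Vertex *d_out)
{
    dim3 block(BLOCK_DIM, BLOCK_DIM);
    dim3 grid((PATCH_DIM + BLOCK_DIM - 1) / BLOCK_DIM,
              (PATCH_DIM + BLOCK_DIM - 1) / BLOCK_DIM);
    average_neighbors<<<grid, block>>>(d_in, d_out);   /* kernel from the earlier sketch */
}
```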

Compute Platform

  • Used my own computer
  • OS: Linux Mint 18.1 Cinnamon 64-bit
  • CPU: Intel i7-6700k
    • 4 physical cores, 8 logical cores @ 4 GHz
  • 16 GB DDR4
  • GPU: Nvidia GTX 1080
    • Compute Capability 6.1
    • 2560 CUDA Cores @ 2GHz
    • 8 GB GDDR5X @ “10 GHz” effective

Experiments

  • Each experiment was run multiple times
  • Six mesh sizes
    • 11664 (108x108), 104976 (324x324), 1016064 (1008x1008), 10036224 (3168x3168), 100160064 (10008x10008), 1000583424 (31632x31632)
  • Four patch counts / thread counts
    • 1, 4, 9, 16
    • Exceeds the core count (8) – the job is just to submit work to the GPU
    • These divide up the mesh
  • Four Block Sizes
    • 16 (4x4), 64 (8x8), 256 (16x16), 1024 (32x32)
    • Chosen to divide well across the SMs
    • These divide up the patches
  • Used the high-resolution timer available in C11 (see the timing sketch below)
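
For reference, a minimal sketch of C11 high-resolution timing with timespec_get; the surrounding calls are placeholders, not the project's actual harness.

```cuda
/* Time one pass with C11 timespec_get on the TIME_UTC clock. */
#include <stdio.h>
#include <time.h>

static double seconds_between(struct timespec a, struct timespec b)
{
    return (double)(b.tv_sec - a.tv_sec) + (double)(b.tv_nsec - a.tv_nsec) * 1e-9;
}

/* Usage (run_pass is a hypothetical function doing one subdivision pass):
 *
 *   struct timespec t0, t1;
 *   timespec_get(&t0, TIME_UTC);
 *   run_pass();
 *   cudaDeviceSynchronize();   // include the GPU work in the measurement
 *   timespec_get(&t1, TIME_UTC);
 *   printf("%.6f s\n", seconds_between(t0, t1));
 */
```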

Patch/Thread Results (32x32 Blocks)

(figure: patch count / thread count results, 32x32 blocks)

Block Size and Block Threads Results

  • No noticeable pattern
  • Each calculation is highly independent
    • Perhaps we were unable to capitalize on the shared memory afforded by larger block sizes

Optimization Results

  • Internally, patches had random neighbors
  • This did not duplicate the problem of random neighbors between patches
    • The effect we see here is just from the GPU

(figure: optimization results)

Analysis

  • Results scaled with thread count on the largest mesh that didn’t fail
    • The CPU threads were thus keeping the GPU Streaming Multiprocessors busy
  • Results did not scale with block size
    • This is perhaps because the algorithm couldn’t benefit from the shared memory
  • Results of optimization were promising
    • The tested irregular mesh was only irregular within the patches
    • Perhaps results would be more striking otherwise

Future (perhaps not...)

  • I’d like to combine this with an actual graphics application
  • Combine with tessellation
  • Explore use with progressive collision detection
