fdtdz's Introduction

fdtd-z -- fast, scalable, and free photonic simulation

fdtd-z is our first step in revolutionizing photonic design by enabling photonic engineers to harness compute at scale.

  • fast: 100x faster than CPU implementations such as Meep, Lumerical, RSoft, ...,
  • scalable: easily runs simulations on hundreds of nodes and beyond,
  • free: open-source, no license-fees!

Give it a test drive in Colab, join the chat on https://gitter.im/fdtdz/community, and read the whitepaper.

Frequently Asked Questions

Is the plan to build additional functionality inside the fdtd-z repo?

fdtd-z is intended to be a super low-level utility in the UNIX philosophy of "just do one thing". As such, a number of seemingly critical features are intentionally missing, such as waveguide mode solving, utilities for building dielectric structures, and more.

Many, if not all, of these features already exist in various repositories such as MEEP and SPINS. Rather than replicating the same code yet again in fdtd-z, our hope is that the photonics community will build them in a modular, open-source way.

We envision a toolchain of photonics software built upon a common numerical framework that is both scalable and differentiable. At the moment JAX fits the bill. fdtd-z is our contribution to this ecosystem, and we hope there will be many others to join us in building something amazing :).

Will dispersive materials be supported?

Currently, fdtd-z only supports simple dielectric materials (not even absorption is supported) for performance reasons (see the whitepaper for the full story). In the systolic scheme that fdtd-z uses, there are two main constraints that prevent us from modeling more complex materials:

  • bandwidth limitations: Additional coefficients used in the update equation need to be loaded, while additional auxiliary fields would need to be both loaded and written back to memory. FDTD is already a heavily bandwidth-limited algorithm and this would further decrease performance.
  • register pressure: CUDA threads are hard-limited to a maximum of 255 registers (reference needed), which are needed for fast access to E- and H-field values, as well as storing coefficients. The current implementation of fdtd-z already uses the maximum number of registers -- a fundamental change in the basic architecture of the systolic memory system (such as using shared memory instead of registers) would be needed to simulate dispersive materials and other more complex material systems.

For dispersive materials, a simple work-around is to run individual single-frequency simulations at each wavelength of interest, each with the appropriately modified permittivity. While more laborious, this plays to the strengths of fdtd-z, which is designed to be fast as well as easy to parallelize. The work-around also allows for arbitrarily complex dispersions to be modeled accurately.
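As a rough illustration of this work-around, the sketch below builds one single-frequency job per wavelength, each with its own permittivity. The one-term Sellmeier-style model, its coefficients, and the helper names `permittivity` and `build_jobs` are hypothetical placeholders, not part of fdtd-z; substitute the dispersion data for your own material.

```python
# Sketch only: the dispersion model and coefficients below are hypothetical.

def permittivity(wavelength_um, b=1.1, c_um2=0.0093):
    """One-term Sellmeier-style model: eps = 1 + b * lam^2 / (lam^2 - c)."""
    lam2 = wavelength_um ** 2
    return 1.0 + b * lam2 / (lam2 - c_um2)


def build_jobs(wavelengths_um):
    """Return one (wavelength, permittivity) pair per single-frequency run."""
    return [(lam, permittivity(lam)) for lam in wavelengths_um]


wavelengths_um = [1.50, 1.55, 1.60]
jobs = build_jobs(wavelengths_um)
for lam, eps in jobs:
    # Each pair would parameterize its own non-dispersive simulation.
    print(f"wavelength = {lam:.2f} um, permittivity = {eps:.4f}")
```

Because the jobs are independent of one another, they can be distributed across devices without any communication between simulations.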

What about more flexible output sources?

Output sources are currently limited to materializing the E-field over the entire simulation domain for a set of equidistant time steps and leaving it to the user to back out frequency components as desired. While we have not exhaustively tested other output schemes, we can summarize the thinking behind this design decision.

  • The bandwidth costs of a continuous, or running, update are extremely high. In addition to having to read and write the E- and H-field values needed for the FDTD update, an update scheme (e.g. performing a rolling DFT) would have to both read and write output values at every frequency of interest as well.
  • The systolic update scheme negates some of the advantages of only materializing a sub-volume of the simulation domain -- the whole grid of CUDA threads really needs to move along at the same rate (this is also the reason why PML absorbing conditions are not implemented along the x-y plane, since the additional, initially localized, computational cost would get spread across a large portion of the simulation domain). Additionally, the time-skew of the systolic scheme also significantly smears the additional cost initially localized to a single time step so that it is felt across multiple time steps.

For these principal reasons, fdtd-z has tried to limit output operations to be write-only and to be as temporally sparse as possible. That said, we do think there is room for additional flexibility in terms of allowing for (potentially multiple) subdomains to be materialized for a larger number of time steps in order to allow a greater number of frequency components to be inferred from a single simulation. Please let us know if this would be important for your application!
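To make "backing out frequency components" concrete, here is a minimal sketch for a single grid cell and a single frequency. It assumes the field is well described by a*cos(omega*t) + b*sin(omega*t) at the sampled times and fits a and b by least squares. The function name `fit_frequency_component` and the sample values are illustrative and not part of fdtd-z.

```python
import math


def fit_frequency_component(samples, times, omega):
    """Least-squares fit of samples to a*cos(omega*t) + b*sin(omega*t).

    Returns the complex amplitude a - 1j*b, so that the fitted signal
    equals Re(amplitude * exp(1j * omega * t)).
    """
    scc = scs = sss = sc = ss = 0.0
    for e, t in zip(samples, times):
        c, s = math.cos(omega * t), math.sin(omega * t)
        scc += c * c
        scs += c * s
        sss += s * s
        sc += c * e
        ss += s * e
    # Solve the 2x2 normal equations with Cramer's rule.
    det = scc * sss - scs * scs
    if abs(det) < 1e-12:
        raise ValueError("snapshots do not resolve this frequency")
    a = (sc * sss - ss * scs) / det
    b = (ss * scc - sc * scs) / det
    return complex(a, -b)


# Synthetic snapshots at four equidistant times (illustrative values).
omega, t0, dt = 2.0, 10.0, 0.4
times = [t0 + k * dt for k in range(4)]
samples = [0.7 * math.cos(omega * t) - 0.3 * math.sin(omega * t) for t in times]
amplitude = fit_frequency_component(samples, times, omega)
```

Recovering N frequencies at once enlarges the system to 2N unknowns per cell, which is why the number of materialized time steps bounds the number of frequency components that can be inferred.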

Is multi-GPU supported?

While fdtd-z is not able to distribute a single simulation across multiple GPUs, building on JAX means that there should be excellent support readily available for parallelization in terms of distributing multiple simulations across multiple GPUs (where each device has 1 or more simulations to solve). The jax.pmap documentation is probably the right starting point for this.

CUDA_ERROR_COOPERATIVE_LAUNCH_TOO_LARGE

fdtd-z uses CUDA cooperative groups to implement the systolic scheme outlined in the whitepaper and get around the GPU bandwidth bottleneck. Because of this, the launch parameters of the kernel become tightly connected to the underlying architecture of the GPU. In particular, the product gridu * gridv of the launch parameters must not exceed the number of streaming multiprocessors (SMs) on the GPU. For example, the RTX4000 has 36 SMs, so it would make sense to use (gridu, gridv) = (6, 6) (note that there is the additional constraint that blocku * gridu <= blockv * gridv). If gridu * gridv is greater than the number of available SMs, then an attempt to launch the kernel will result in the CUDA_ERROR_COOPERATIVE_LAUNCH_TOO_LARGE error.
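The two constraints above can be checked before launching. The helper below is a hypothetical pre-flight check, not an fdtd-z function; the SM count is passed in by the caller, and the block sizes in the usage lines are illustrative values only.

```python
def check_launch_params(blocku, blockv, gridu, gridv, num_sms):
    """Return a list of violated constraints (empty if none are violated)."""
    problems = []
    if gridu * gridv > num_sms:
        problems.append(
            f"gridu * gridv = {gridu * gridv} exceeds the {num_sms} available SMs"
        )
    if blocku * gridu > blockv * gridv:
        problems.append(
            f"blocku * gridu = {blocku * gridu} exceeds "
            f"blockv * gridv = {blockv * gridv}"
        )
    return problems


# A 36-SM device with a (6, 6) grid; block sizes are illustrative.
ok = check_launch_params(blocku=2, blockv=4, gridu=6, gridv=6, num_sms=36)
too_large = check_launch_params(blocku=2, blockv=4, gridu=6, gridv=7, num_sms=36)
```

Here `ok` is empty, while `too_large` reports the grid that would trigger the cooperative launch error.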
