
This project forked from lanl/kokkos-clang


A Clang-based compiler for compiling Kokkos code (with no syntactical differences) with the aim of generating optimized code for parallel targets such as multithreaded and GPU (NVIDIA/CUDA) and preserving domain awareness.


Copyright (c) 2016, Los Alamos National Security, LLC All rights
reserved. Copyright 2016. Los Alamos National Security, LLC. This
software was produced under U.S. Government contract DE-AC52-06NA25396
for Los Alamos National Laboratory (LANL), which is operated by Los
Alamos National Security, LLC for the U.S. Department of Energy. The
U.S. Government has rights to use, reproduce, and distribute this
software.  NEITHER THE GOVERNMENT NOR LOS ALAMOS NATIONAL SECURITY,
LLC MAKES ANY WARRANTY, EXPRESS OR IMPLIED, OR ASSUMES ANY LIABILITY
FOR THE USE OF THIS SOFTWARE.  If software is modified to produce
derivative works, such modified software should be clearly marked, so
as not to confuse it with the version available from LANL.
 
Additionally, redistribution and use in source and binary forms, with
or without modification, are permitted provided that the following
conditions are met: 

1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.

3. Neither the name of Los Alamos National Security, LLC, Los Alamos
National Laboratory, LANL, the U.S. Government, nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
 
THIS SOFTWARE IS PROVIDED BY LOS ALAMOS NATIONAL SECURITY, LLC AND
CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING,
BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL LOS
ALAMOS NATIONAL SECURITY, LLC OR CONTRIBUTORS BE LIABLE FOR ANY
DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER
IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Kokkos Clang was developed as part of the IDEAS (Interoperable Design of
Extreme-scale Application Software) project funded by the DOE Office of Science.

This code is unclassified and has been assigned LA-CC-16-054.

====== Kokkos GPU Compiler A.K.A Kokkos Clang

The Kokkos Clang compiler is a version of the Clang C++ compiler that
has been modified to perform targeted code generation for Kokkos
constructs such as parallel for and parallel reduce. The goals are to
generate highly optimized code and to provide semantic (domain)
awareness of these constructs throughout the compilation toolchain.
This approach explores the possibilities of exposing the developer’s
intentions to the underlying compiler infrastructure (e.g.
optimization and analysis passes within the middle stages of the
compiler) instead of relying solely on the restricted capabilities of
C++ template metaprogramming. To date our activities have focused on
correct GPU code generation, and we have not yet focused on improving
overall performance.

The compiler recognizes specific (syntactic) Kokkos constructs,
bypasses the normal template expansion mechanisms for them, and
instead uses the semantic knowledge of Kokkos to directly generate
code in the compiler’s intermediate representation (IR), which is then
translated into an NVIDIA-centric GPU program and supporting runtime
calls. In addition, capturing and maintaining the higher-level
semantics of Kokkos directly within the lower levels of the compiler
has the potential to significantly improve the compiler’s ability to
communicate with the developer in the terms of their original
programming model/semantics.

Developed by:

Nick Moss ([email protected]) and Pat McCormick ([email protected])

====== Project Description / Status

-A Clang-based compiler for compiling Kokkos code (with no syntactical
differences) with the aim of generating optimized code for specific
targets such as multithreaded and GPU (NVIDIA/CUDA) and preserving
domain awareness.

-Runtime for thread pooling, synchronization, and function queueing.
On the GPU, the runtime runs kernels from PTX, manages device memory
for Kokkos views and dynamically allocated C++ arrays, and performs
device <-> host memory transfers.

-Both multithreaded and GPU modes intercept the code generation paths
that would normally be followed when encountering a parallel
for/reduce, and analyze the C++ templated parts of the AST to pull out
the compile-time information needed to interface with the runtime.

-Multithreaded mode generates "closures" for each parallel for, which
wrap the arguments and values, and generates an IR function for the
body; together these become queueable functions handed to the runtime.

-GPU mode generates a kernel for each parallel for: an IR function
with transformations, which is code-generated to PTX and embedded in
the module. The CUDA driver library is used to launch kernels,
allocate memory, etc.

-GPU: translates the runtime and static dimensions of CPU-side Kokkos
views into device memory, and translates view indexing into CUDA array
indexing.

-GPU: the runtime manages device memory for Kokkos views. Read/write
patterns in the kernels are analyzed and memory is copied to/from the
device only as needed; device memory for views shared across kernels
is reused/persistent. CFG and AST visitor analysis of read/write
dependences drives the automatic copying of views/arrays to/from
device memory, and is also used for asynchronous parallel for/reduce
launches using CUDA streams.

-Currently looking into dynamic parallelism for nested forall
launches.

-Established a flexible LLVM foundation - each kernel is IR which can
be analyzed and manipulated - possible further optimizations:
warp/divergence, shared memory, better overlap of IO, different memory
layouts for views, view slicing, module-wide optimizations across
different kernels. Many possibilities exist for
transforming/optimizing the generated code without requiring any
changes or instrumentation to the user's original Kokkos code.
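
The multithreaded mode described above (a closure that wraps the
captured arguments, plus a generated function that the runtime can
queue) can be illustrated with a small standard C++ sketch. This is
not the code the compiler emits and not the actual runtime API: the
names AxpyClosure, axpy_body, and run_parallel_for are hypothetical,
and the sketch starts fresh threads per call where the real runtime
is described as using a thread pool.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical closure: the captured arguments of one parallel-for body.
struct AxpyClosure {
  double a;
  const double* x;
  double* y;
};

// Hypothetical generated function: runs the loop body for one index.
// It receives the closure as an opaque pointer so that the runtime
// can queue any body through the same function-pointer type.
void axpy_body(void* raw, std::size_t i) {
  auto* c = static_cast<AxpyClosure*>(raw);
  c->y[i] += c->a * c->x[i];
}

using BodyFn = void (*)(void*, std::size_t);

// Stand-in for the runtime: splits [0, n) into contiguous chunks,
// runs one chunk per thread, and joins all threads before returning.
void run_parallel_for(BodyFn fn, void* closure, std::size_t n,
                      unsigned workers) {
  assert(fn != nullptr);
  workers = std::max(1u, workers);
  const std::size_t chunk = (n + workers - 1) / workers;
  std::vector<std::thread> threads;
  for (unsigned w = 0; w < workers; ++w) {
    const std::size_t begin = std::min(n, w * chunk);
    const std::size_t end = std::min(n, begin + chunk);
    if (begin == end) continue;  // nothing left for this worker
    threads.emplace_back([=] {
      for (std::size_t i = begin; i < end; ++i) fn(closure, i);
    });
  }
  for (auto& t : threads) t.join();
}
```

A caller would fill an AxpyClosure with the scalar and the two array
pointers and pass its address, together with axpy_body and the
iteration count, to run_parallel_for. Passing the closure as an opaque
pointer is one simple way to keep the queue type independent of the
loop body; the actual runtime may do this differently.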

====== Code

-runtime resides in top-level runtime directory

-to locate the places within LLVM/Clang which were modified/extended, 
grep for "=== ideas"
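
As a rough illustration of the bookkeeping the runtime needs for the
"copy to/from device only as needed" behavior described in the project
status, the following host-only C++ sketch tracks which copy of a view
is current. Everything here is an assumption made for illustration:
TrackedView, Access, and the method names are hypothetical and do not
come from the runtime directory, the "device" buffer is an ordinary
vector so the sketch runs without CUDA, and a write-only kernel is
assumed to overwrite every element of the view.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-kernel access summary for one view, standing in
// for the result of the compile-time read/write analysis.
struct Access {
  bool reads;
  bool writes;
};

// Host-only model of a view with a persistent device mirror.
class TrackedView {
 public:
  explicit TrackedView(std::size_t n) : host_(n, 0.0), device_(n, 0.0) {}

  // Host read: download first if the device copy is newer.
  const std::vector<double>& host_read() {
    sync_to_host();
    return host_;
  }

  // Host write: download first, then mark the host copy as newer.
  std::vector<double>& host_write() {
    sync_to_host();
    host_dirty_ = true;
    return host_;
  }

  // Called before a kernel launch. Uploads only when the kernel reads
  // the view and the host copy is newer than the device copy.
  std::vector<double>& device_for_kernel(Access a) {
    if (host_dirty_ && a.reads) {
      device_ = host_;
      ++uploads_;
    }
    if (a.reads || a.writes) host_dirty_ = false;
    if (a.writes) device_dirty_ = true;
    return device_;
  }

  int uploads() const { return uploads_; }
  int downloads() const { return downloads_; }

 private:
  void sync_to_host() {
    if (!device_dirty_) return;
    assert(host_.size() == device_.size());
    host_ = device_;
    ++downloads_;
    device_dirty_ = false;
  }

  std::vector<double> host_;
  std::vector<double> device_;
  bool host_dirty_ = false;
  bool device_dirty_ = false;
  int uploads_ = 0;
  int downloads_ = 0;
};
```

With this model, two consecutive kernels that use the same view
trigger a single upload, and the data is copied back only when the
host next touches the view.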

====== Known problems

The compiled Clang may not properly find the system C++ headers, so
-isystem and the proper include paths may have to be specified. See
the regression tests.

====== Building the Compiler

-Clone Kokkos from https://github.com/kokkos/kokkos, place the 'kokkos' directory at the top-level directory of kokkos-clang, and check out the required tag inside it:
git checkout tags/2.02.15

-from the top-level ideas directory, e.g.:

-mkdir build

-cd build

-cmake .. OR cmake -DCMAKE_BUILD_TYPE=RELEASE ..

-make

====== Running

-The build process creates a C++/Clang compiler in build/llvm/bin which 
can be used just like the ordinary clang++ compiler

-Regression tests are located in test/regress

====== Darwin cluster specific build

module load cuda/8.0
module load cmake
module load gcc/5.2.0
