spn-compiler's People

Contributors

csvtuda, johannesschulte, mhalk, sommerlukas

spn-compiler's Issues

Graph Partitioning

For applications with many features or complicated feature relations, the trained SPNs can become very large. To better handle such large graphs, implement graph partitioning.

This probably requires some refactoring of the SPN dialect, and possibly splitting the dialect into a high-level abstraction and a low-level graph structure.
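
As a rough illustration of what partitioning means here (not the strategy planned for the compiler), the sketch below cuts a topological order of the graph into chunks of bounded size. The function name `partition_dag`, the `max_size` parameter, and the toy graph are assumptions made up for this example.

```python
from graphlib import TopologicalSorter

def partition_dag(children, max_size):
    """Split a DAG into consecutive chunks of a topological order.

    `children` maps each node to the list of nodes it reads from (its operands).
    Every node lands in exactly one partition, and the operands of a node
    are always in the same or an earlier partition.
    """
    # static_order() yields operands before the nodes that use them.
    order = list(TopologicalSorter(children).static_order())
    return [order[i:i + max_size] for i in range(0, len(order), max_size)]

# Hypothetical toy SPN: a sum over two products of two leaves each.
spn = {
    "sum": ["prod_a", "prod_b"],
    "prod_a": ["x0", "x1"],
    "prod_b": ["x2", "x3"],
    "x0": [], "x1": [], "x2": [], "x3": [],
}
parts = partition_dag(spn, max_size=3)
```

Because the chunks follow a topological order, partitions can be evaluated one after another. A real implementation would additionally try to minimize the number of values crossing partition boundaries, which this sketch ignores.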

Add support for more leaf distributions

Currently, the compiler and MLIR dialect only support histograms as leaf nodes. Extend the MLIR dialect and the compile flow to represent and generate code for other leaf distributions.
Two leaf distributions can currently be serialized but have no representation in the compiler:

  • Gaussian
  • Categorical
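
For reference, the computation the generated code would have to perform for these two leaf types can be sketched as follows. The function names and parameter names are assumptions for illustration and do not correspond to operations of the dialect.

```python
import math

def gaussian_log_pdf(x, mean, stddev):
    """Log-density of a univariate Gaussian leaf at the observed value x."""
    z = (x - mean) / stddev
    return -0.5 * z * z - math.log(stddev) - 0.5 * math.log(2.0 * math.pi)

def categorical_log_prob(index, probabilities):
    """Log-probability of a categorical leaf; `index` selects the observed class."""
    return math.log(probabilities[index])
```

The Gaussian leaf is pure arithmetic on the input, while the categorical leaf is a table lookup, similar to the existing histogram leaf.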

GPU support

Support mapping to GPUs by lowering from LoSPN to the MLIR gpu dialect (and other dialects if necessary) and extending the runtime interface for GPUs.

GPU memory optimization

Implement a pass to avoid unnecessary copies of intermediate buffers between host and device, if the data is only required on the GPU.

Also, address the problem of insufficient parameter space reported by ptxas for very large graphs.

Support Categorical and Histogram in SLP vectorizer

Implement support for Categorical and Histogram leaf nodes in the SLP vectorizer.

Potential implementation strategies are gather loads from constant arrays or, if the leaf has only a few classes/buckets, vectorized select operations.
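
The two strategies can be compared with a small model in which a list stands in for a vector register, one element per lane. This is only a sketch of the semantics, with hypothetical function names, not code the vectorizer would emit.

```python
def leaf_gather(indices, table):
    """Model of a gather load: every lane reads table[its own index]."""
    return [table[i] for i in indices]

def leaf_select(indices, table):
    """Model of a chain of lane-wise compare + select operations.

    Starts from the last bucket and overrides it once per remaining bucket,
    so the cost grows with the number of buckets rather than needing a gather.
    """
    result = [table[-1]] * len(indices)
    for bucket in range(len(table) - 2, -1, -1):
        result = [table[bucket] if i == bucket else r
                  for i, r in zip(indices, result)]
    return result

# Hypothetical leaf with three buckets, evaluated for four samples at once.
table = [0.1, 0.6, 0.3]
indices = [2, 0, 1, 2]
```

Both functions return the same lanes for in-range indices. The select variant needs one compare and one select per bucket, which is why it is only attractive for leaves with few classes/buckets.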

Migrate Graph Statistics to new dialects

As part of #40, the old SPN dialect was replaced with two dialects. The collection of graph statistics has not been migrated to the new dialect. Migrate the graph statistics to operate on the LoSPN dialect.

Representation for models and queries in Python interface

Extend the Python interface/library to represent models and query types and pass information to compiler.

  • Add a thin wrapper around SPN graphs in the xspn library
  • Add classes representing query-types and query-specific parameters to xspn
  • Extend binary serialization in xspn for the queries and the wrapper
  • Use the binary serialized format as input to the C++ part of the compiler, replacing the old JSON-based input interface. Since no operations are currently performed on the old graph-IR, the new interface could generate MLIR modules directly and skip the graph-IR.
  • Extend the Kernel product of the compiler to contain information about the query type.

Data-layout transformation

Implement an efficient data-layout transformation (e.g. matrix transpose) to enable more efficient memory access on CPU (vector loads, #6) and GPU (coalesced memory access).
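
The effect of such a transposition can be sketched as follows, assuming the input batch is stored with one row per sample. The name `transpose_batch` and the example data are made up for this illustration.

```python
def transpose_batch(samples):
    """Turn a batch with one row per sample into one row per feature."""
    return [list(column) for column in zip(*samples)]

batch = [
    [0, 1, 2],    # sample 0: features f0, f1, f2
    [3, 4, 5],    # sample 1
    [6, 7, 8],    # sample 2
    [9, 10, 11],  # sample 3
]
by_feature = transpose_batch(batch)
```

After the transformation, the values of one feature for consecutive samples are adjacent, so a leaf evaluated for several samples at once can read them with one contiguous vector load (CPU) or with neighbouring threads accessing neighbouring addresses (GPU).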

Make use of vector libraries for math functions

Use efficient implementations for functions such as log or exp from vector libraries such as SVML or libmvec when generating vector code.

There is currently a patch for LLVM under review that would make this possible during the translation from LLVM IR to assembly/object code. Integrate the corresponding options or implement a similar solution.

Compiler configuration interface

Extend the interface of the compiler (including the Python-Interface) to allow a compiler configuration (target, flags, etc.) to be passed to the compiler.

CMake build fails

After the merge of #23, the CMake build fails with my setup, with the following error:

In file included from .../execute/src/main.cpp:10:
In file included from .../compiler/include/driver/Options.h:16:
../compiler/../common/include/util/Logging.h:12:10: fatal error: 'spdlog/spdlog.h' file not found
#include <spdlog/spdlog.h>
         ^~~~~~~~~~~~~~~~~

Adding spdlog to the dependencies of the execute target fixes the problem in my setup:

diff --git a/execute/CMakeLists.txt b/execute/CMakeLists.txt
index 9b025c7..6c1899b 100644
--- a/execute/CMakeLists.txt
+++ b/execute/CMakeLists.txt
@@ -1,6 +1,6 @@
 add_executable(driver src/main.cpp)
 
-target_link_libraries(driver spnc spnc-rt dl)
+target_link_libraries(driver spnc spnc-rt dl spdlog::spdlog)
 add_dependencies(driver compiler-rt)
 add_compile_definitions(TEST_KERNEL_DIR="${CMAKE_CURRENT_BINARY_DIR}")

@mhalk: Can you please check whether the problem also occurs for you with a fresh CMake setup, and whether the fix works for you as well?
If so, please submit the change as a bugfix (on a bugfix/-branch from develop).

Delete old SPN dialect

After the changes made in #40, the old SPN dialect is now outdated. Currently, the graph statistics still need to be migrated (#41), but after that is done, the old SPN dialect and related infrastructure can be removed from the project.

Enhance compiler Python interface

The Python interface currently is a very thin wrapper around the compiler. Enhance the interface to make use of the compiler from Python more convenient.

Numerical value tracing

In order to evaluate precision/accuracy requirements, we need to trace numerical values inside the SPN.

An LLVM pass running on the generated LLVM IR should insert calls to a trace function for all computed values, including leaf nodes. The trace function itself should be implemented in an LLVM bitcode library, which is linked with the generated module using llvm-link.

Testing of MLIR passes & patterns

The new, alternative MLIR toolchain uses patterns and passes for transformations within dialects and conversion between dialects.

The implemented patterns should be tested using LLVM's testing infrastructure (FileCheck), see the testing guide.

Extend runtime library execution interface

The runtime used to execute the compiled kernels does not yet reflect the different kinds of queries and how they execute (batch mode vs. single execution, ...).

Make the runtime aware of different execution modes and query types and enhance its Python interface for more convenience.

Add support for MPE queries

Add support for most-probable explanation (MPE) queries, generating code for fast and memory-efficient computation.
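
A common way to approximate MPE on an SPN is a max-product pass: sum nodes take the maximum over their weighted children instead of the weighted sum, and the maximizing branch determines the assignment. The sketch below shows this on a tiny hand-written SPN without evidence. The tuple-based node encoding and the function name `mpe` are assumptions for this example, not the compiler's representation.

```python
def mpe(node):
    """Return (score, assignment) of the best completion below `node`."""
    kind = node[0]
    if kind == "leaf":
        # ("leaf", variable, {value: probability}): pick the most likely value.
        _, var, dist = node
        value = max(dist, key=dist.get)
        return dist[value], {var: value}
    if kind == "product":
        # ("product", [children]): children cover disjoint variables.
        score, assignment = 1.0, {}
        for child in node[1]:
            s, a = mpe(child)
            score *= s
            assignment.update(a)
        return score, assignment
    # ("sum", [(weight, child), ...]): max replaces the weighted sum.
    best = None
    for weight, child in node[1]:
        s, a = mpe(child)
        if best is None or weight * s > best[0]:
            best = (weight * s, a)
    return best

# Hypothetical SPN over two binary variables A and B.
spn = ("sum", [
    (0.3, ("product", [("leaf", "A", {0: 0.9, 1: 0.1}),
                       ("leaf", "B", {0: 0.2, 1: 0.8})])),
    (0.7, ("product", [("leaf", "A", {0: 0.4, 1: 0.6}),
                       ("leaf", "B", {0: 0.5, 1: 0.5})])),
])
score, assignment = mpe(spn)
```

Here the first branch wins (0.3 · 0.9 · 0.8 against 0.7 · 0.6 · 0.5). Generated code would additionally have to record which branch won at each sum node, which is where the memory-efficiency concern comes from.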

Binary serialization

For efficient storage and exchange of SPNs between SPFlow and the compiler, SPNs should be serializable to some binary format, e.g. protobuf.

Numeric/Arithmetic Analysis

The choice of the most efficient arithmetic format (single-precision, double-precision, ...) depends on the required precision of the application (e.g., provided by the user/developer) and the tree-structure.

To automatically choose the best arithmetic format, we should have an analysis pass in the compiler, inspired by, for example, ProbLP.
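
A small experiment shows why the format matters. The sketch below multiplies many small probabilities, once with every intermediate result rounded to single precision, once in double precision, and once as a sum of logarithms. It only illustrates the problem and is not the analysis used by ProbLP; all function names are made up for this example.

```python
import math
import struct

def to_f32(x):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def product_f32(probabilities):
    """Product with every intermediate result rounded to single precision."""
    result = 1.0
    for p in probabilities:
        result = to_f32(result * to_f32(p))
    return result

def product_f64(probabilities):
    """Product in double precision (Python's native float)."""
    result = 1.0
    for p in probabilities:
        result *= p
    return result

def log_sum(probabilities):
    """The same product computed in log-space."""
    return sum(math.log(p) for p in probabilities)

# Twenty factors of 1e-3: the exact product is 1e-60, which is below the
# smallest positive single-precision value.
probs = [1e-3] * 20
```

With these inputs the single-precision product underflows to zero, while the double-precision product and the log-space sum remain usable. An analysis pass would have to predict such effects from the graph structure and the required precision instead of running the computation.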
