siquus / hdfit.scriptshpc

This project forked from intellabs/hdfit.scriptshpc

This repository contains an HPC (High Performance Computing) reliability benchmark, carrying out fault injection experiments on a variety of HPC applications, targeting BLAS (Basic Linear Algebra Subroutines) GEMM (GEneral Matrix Multiply) operations.

License: GNU Lesser General Public License v3.0

HDFIT.ScriptsHPC

This repository is part of the Hardware Design Fault Injection Toolkit (HDFIT). HDFIT enables end-to-end fault injection experiments and additionally comprises HDFIT.NetlistFaultInjector and HDFIT.SystolicArray.

HDFIT HPC Toolchain

This repository contains the main components of the HDFIT HPC reliability benchmark, which carries out fault injection experiments on a variety of HPC applications. The experiments target BLAS GEMM operations and use the proof-of-concept systolic array design implemented in HDFIT.SystolicArray.

Directory Structure

The repository is structured in the following directories:

  • apps: contains code to clone all of the supported HPC applications and to apply patches to them that enable fault injection. This directory also contains a series of application configurations that can be used to run experiments. Once compiled, the applications can be executed from this location.
  • test: contains scripts to configure and run HDFIT fault injection experiments on the supported HPC applications. The scripts can be used both serially and on distributed HPC clusters for large-scale runs.
  • plot: contains a series of Python scripts that process the CSV files produced by HDFIT experiments to generate plots and metrics.

For additional details about the components of the HDFIT HPC reliability benchmark, please refer to the README documents in each directory.

External Dependencies

The main external dependency of the HPC reliability benchmark is the custom HDFIT OpenBLAS library, which supports fault injection. Before compiling the HPC applications and running experiments, users must specify the location of this OpenBLAS library by replacing the PATH_TO_CUSTOM_OPENBLAS_LIB placeholder string with the absolute path to the OpenBLAS root directory in two places:

  • apps/config.mk: to compile the HPC applications against the correct OpenBLAS version.
  • test/HDFIT_runner.sh: to enable the use of LD_PRELOAD for the applications that require it.
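The substitution can also be scripted. The following is a minimal Python sketch, assuming both files still contain the literal PATH_TO_CUSTOM_OPENBLAS_LIB placeholder; the helper name `set_openblas_path` is hypothetical and not part of the repository.

```python
from pathlib import Path

# Placeholder string shipped in the repository, as named in this README.
PLACEHOLDER = "PATH_TO_CUSTOM_OPENBLAS_LIB"
# The two files that contain it, relative to the repository root.
TARGET_FILES = ("apps/config.mk", "test/HDFIT_runner.sh")


def set_openblas_path(repo_root, openblas_root):
    """Replace PLACEHOLDER with an absolute OpenBLAS root path.

    Hypothetical helper, not part of HDFIT. Returns the list of files that
    were modified; files that no longer contain the placeholder are skipped.
    """
    openblas_root = Path(openblas_root)
    if not openblas_root.is_absolute():
        raise ValueError("the OpenBLAS root directory must be an absolute path")
    changed = []
    for rel in TARGET_FILES:
        path = Path(repo_root) / rel
        text = path.read_text()
        if PLACEHOLDER in text:
            # Rewriting an existing file keeps its permission bits,
            # so HDFIT_runner.sh stays executable.
            path.write_text(text.replace(PLACEHOLDER, str(openblas_root)))
            changed.append(rel)
    return changed
```

For example, `set_openblas_path(".", "/opt/hdfit-openblas")` run from the repository root would patch both files, where `/opt/hdfit-openblas` stands in for wherever the custom OpenBLAS was actually built. Running it a second time returns an empty list, because the placeholder is already gone.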

There are other minor dependencies. Compiling the HPC applications requires make, cmake, autoconf, pkgconf, an MPI implementation, and a working gcc and gfortran toolchain; the Python plotting scripts require Python 3 with the numpy, matplotlib and seaborn packages. More details can be found in the README documents in the apps and plot directories, respectively.
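A quick way to see which of these dependencies are missing is sketched below in Python. It only checks that each tool is on the PATH and that each package is importable, not that versions are compatible. Using `mpicc` as the probe for "an MPI implementation" is an assumption, and some distributions ship `pkg-config` rather than `pkgconf`; adjust both names for your system.

```python
import importlib.util
import shutil

# Command-line tools listed in this README for building the applications.
# "mpicc" is an assumed stand-in for "an MPI implementation".
BUILD_TOOLS = ("make", "cmake", "autoconf", "pkgconf", "mpicc", "gcc", "gfortran")
# Python packages required by the plotting scripts.
PLOT_PACKAGES = ("numpy", "matplotlib", "seaborn")


def check_dependencies():
    """Return {name: available} for every build tool and plotting package."""
    status = {tool: shutil.which(tool) is not None for tool in BUILD_TOOLS}
    status.update(
        {pkg: importlib.util.find_spec(pkg) is not None for pkg in PLOT_PACKAGES}
    )
    return status


if __name__ == "__main__":
    for name, ok in check_dependencies().items():
        print(f"{name:12s} {'found' if ok else 'MISSING'}")
```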

Getting Started

The basic process to run and analyze HDFIT fault injection tests comprises the following steps, assuming that a valid OpenBLAS library has already been compiled and its path has been set in place of the PATH_TO_CUSTOM_OPENBLAS_LIB placeholder. First, compile the HPC applications:

cd apps && make all

Then, run a test. As an example, consider the CP2K application with the C2H4 input, which by default performs 5k fault injection runs:

cd cp2k && ../../test/HDFIT_runner.sh CP2K-test-C2H4.env
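When several configurations of the same application should be run one after another, the command above can be wrapped in a small serial driver. The Python sketch below is a hypothetical helper (`run_configs` is not part of HDFIT). It assumes the runner only needs to be started from inside the application directory with the .env file name as its single argument, exactly as in the command above. It does not cover the distributed cluster mode provided by the scripts in the test directory.

```python
import subprocess
from pathlib import Path

# Runner path relative to an application directory, exactly as in the command above.
RUNNER = "../../test/HDFIT_runner.sh"


def run_configs(apps_dir, app, pattern="*.env", dry_run=False):
    """Run HDFIT_runner.sh serially on every matching configuration of one app.

    Hypothetical helper, not part of HDFIT. Each runner call is started from
    inside apps/<app>, mirroring `cd cp2k && ../../test/HDFIT_runner.sh ...`.
    Returns the commands in the order they were (or would be) executed.
    """
    app_dir = Path(apps_dir) / app
    commands = []
    for env_file in sorted(app_dir.glob(pattern)):
        cmd = [RUNNER, env_file.name]
        commands.append(cmd)
        if not dry_run:
            # On POSIX, a relative program path is resolved against cwd.
            subprocess.run(cmd, cwd=app_dir, check=True)
    return commands
```

Calling `run_configs("apps", "cp2k", dry_run=True)` from the repository root only lists the commands, which is worth doing first: each configuration performs thousands of fault injection runs by default, so a serial batch can take a long time.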

The run eventually produces an out.C2H4 directory containing the experiment's results and a CSV summary. Note that the output of each application run is not printed to the shell but written to a separate log file (e.g., out.C2H4/fi-transient/run10.log). The CSV summary file can then be passed to the HDFIT plotting scripts, for example to produce an SDE error curve:

cd out.C2H4 && python3 ../../../plot/HDFIT_plot_error_curve.py HDFIT-CP2K-C2H4-29.08.2022-transient.csv

This produces an image file containing the desired plot and displays several statistical metrics. Further analysis can be performed on the output files produced by each application run under fault injection.
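As a starting point for such analysis, the per-run logs can be gathered and searched programmatically. The Python sketch below assumes only the naming visible in the example path above (a fi-transient subdirectory holding run<N>.log files); other fault modes may use other subdirectory names, and the helper names and the search string are hypothetical.

```python
import re
from pathlib import Path

# Per-run logs are named run<N>.log, e.g. out.C2H4/fi-transient/run10.log.
RUN_LOG = re.compile(r"run(\d+)\.log")


def collect_run_logs(out_dir, mode="fi-transient"):
    """Return {run_index: log_path} for one experiment, ordered by run index."""
    logs = {}
    for path in (Path(out_dir) / mode).glob("run*.log"):
        match = RUN_LOG.fullmatch(path.name)
        if match:
            logs[int(match.group(1))] = path
    # Sort numerically so that run2.log comes before run10.log.
    return dict(sorted(logs.items()))


def runs_containing(out_dir, needle, mode="fi-transient"):
    """Return the indices of runs whose log contains the string `needle`."""
    return [
        index
        for index, path in collect_run_logs(out_dir, mode).items()
        if needle in path.read_text(errors="replace")
    ]
```

For instance, `runs_containing("out.C2H4", "some message")` lists the runs whose log mentions a given string; which strings are meaningful depends on the application being tested.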

License Terms

All original code that is part of the HDFIT HPC reliability benchmark is released under the terms of the GNU Lesser General Public License (LGPL) version 3 or (at your option) any later version. This includes all files in the plot and test directories of this repository.

The patch files for the individual HPC applications, as well as the associated input configurations, are instead released under the terms of the respective original licenses. This includes all files under the apps/resources directory. A copy of each application's license is included.

Contributors

alessio-netti