hpc4e-project's Introduction

Summary

  • [1] Performance and Power Profiling Tool - CPU Module
  • [2] Nagios check_proc_performance Plugin

[1] Performance and Power Profiling Tool - CPU Module

Description

The way data is collected from internal sensors and hardware parameters differs from one computational architecture to another. For this reason, a performance and power profiling tool was developed around separate modules, so that hardware parameters and sensor readings are collected as uniformly as possible. The tool addresses the need for fine-grained power profiling on parallel and distributed systems and the need to correlate it with performance profiling. Application performance is monitored online, using a direct measurement method. The tool is built on two modules that profile both CPUs and GPUs using the internal sensors. It has very low overhead and a high sampling rate.

Goals

  • Collect the hardware parameters (CPU and memory usage, disk I/O and network)
  • Collect the sensor readings (power and temperature)
  • Analyze data

Dependencies:

sudo apt-get install gcc python3-psutil freeipmi

Usage

The process_monitor.py file has a function in which the auxiliary script used to collect the sensor readings can be selected. Other scripts can be developed according to the sensors available in the system and called from this function.

  • To collect the power, use:

    • os.system("/usr/sbin/ipmi-dcmi --get-system-power-statistics | grep Current | awk '{print $4}' >> power.dat") or
    • os.system("./power_jetson.sh")
  • To collect the temperature, use:

    • os.system("./temp_sgi.sh") or
    • os.system("./temp_jetson.sh")
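The ipmi-dcmi pipeline above can also be written without a shell, which makes failures easier to detect. The following is a minimal sketch, not the code in process_monitor.py: the function names are hypothetical, and the "Current Power : N Watts" line shape is an assumption inferred from the grep/awk filter.

```python
import re
import subprocess

# Assumed line shape, e.g. "Current Power                        : 264 Watts".
_CURRENT_POWER = re.compile(
    r"^\s*Current Power\s*:\s*(\d+)\s*Watts", re.MULTILINE | re.IGNORECASE
)


def parse_current_power(text):
    """Return the current power in watts from ipmi-dcmi output, or None."""
    match = _CURRENT_POWER.search(text)
    return int(match.group(1)) if match else None


def read_current_power(path="/usr/sbin/ipmi-dcmi"):
    """Run ipmi-dcmi (usually needs root) and return watts, or None on failure."""
    try:
        result = subprocess.run(
            [path, "--get-system-power-statistics"],
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_current_power(result.stdout)


def append_power_sample(watts, filename="power.dat"):
    """Append one reading per line, like the shell redirection '>> power.dat'."""
    with open(filename, "a") as handle:
        handle.write(f"{watts}\n")
```

Keeping the parser separate from the command call means it can be checked against captured output on a machine without IPMI hardware.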

The launcher is responsible for running the application.

The output files are:

  • [PROCESS_NAME.dat]
  • power.dat
  • temperature.dat

Running

sudo python3 process_monitor.py [PROCESS_NAME] [LAUNCHER_NAME]
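The relationship between the monitor and the launcher can be pictured as a loop that starts the launcher and takes samples until it exits. This is only a sketch under stated assumptions: monitor_while_running and its sample callback are hypothetical names, and the real process_monitor.py may be organized differently.

```python
import subprocess
import sys
import time


def monitor_while_running(command, sample, interval=0.1):
    """Start command and call sample(pid) every interval seconds until it exits.

    Returns the child's exit code and the list of collected samples.
    """
    samples = []
    process = subprocess.Popen(command)
    try:
        while process.poll() is None:
            samples.append(sample(process.pid))
            time.sleep(interval)
    finally:
        # Make sure the child is not left behind if sampling raises.
        if process.poll() is None:
            process.terminate()
        process.wait()
    return process.returncode, samples


if __name__ == "__main__":
    # Hypothetical stand-in for the launcher: a child that sleeps briefly.
    launcher = [sys.executable, "-c", "import time; time.sleep(0.5)"]
    code, readings = monitor_while_running(launcher, lambda pid: time.time())
    print(f"exit code {code}, {len(readings)} samples")
```

In a real run the callback would read CPU, memory, power or temperature for the given PID instead of returning a timestamp.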

[2] Nagios check_proc_performance Plugin

Description

According to the Nagios documentation (Nagios Core Development Team and Community Contributors, 2016), there is a native plugin (check_procs) to monitor a specific process in Linux. However, after several unsuccessful attempts to use it, we chose to create our own plugin. The plugin, named check_proc_performance, monitors a specific process in Linux and shows what percentage of CPU, memory and I/O the process is using while it is running.

Goals

  • Collect the hardware parameters (CPU and memory usage)
  • Alert the Nagios server with four states:
    • 0 - "OK"
    • 1 - "WARNING"
    • 2 - "CRITICAL"
    • 3 - "UNKNOWN"
  • Analyze data

Dependencies

Usage

Configure the Nagios server to use this plugin as a service.

Options to be used:

  • -p The process name
  • -w The warning threshold for CPU percentage
  • -c The critical threshold for CPU percentage
  • -x The warning threshold for memory percentage
  • -y The critical threshold for memory percentage
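To illustrate how the four thresholds could map onto the Nagios states listed under Goals, here is a minimal sketch. It is not the plugin's actual code: the function names, the use of >= for comparisons, and the choice to report UNKNOWN when a reading is missing are all assumptions.

```python
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(value, warning, critical):
    """Map one percentage onto OK, WARNING or CRITICAL (assumes >= comparisons)."""
    if value >= critical:
        return CRITICAL
    if value >= warning:
        return WARNING
    return OK


def evaluate(cpu, mem, w, c, x, y):
    """Combine CPU (-w/-c) and memory (-x/-y) checks into a single state.

    A missing reading (None), e.g. because the process was not found,
    is reported as UNKNOWN; otherwise the worse of the two states wins.
    """
    if cpu is None or mem is None:
        return UNKNOWN
    return max(classify(cpu, w, c), classify(mem, x, y))


def status_line(state, cpu, mem):
    """Build the single line of text a plugin prints before exiting."""
    return f"{LABELS[state]} - cpu={cpu}% mem={mem}%"


if __name__ == "__main__":
    import sys

    # Hypothetical readings checked against the thresholds from the example.
    state = evaluate(85.0, 40.0, w=80, c=90, x=80, y=90)
    print(status_line(state, 85.0, 40.0))
    sys.exit(state)  # Nagios reads the state from the exit code
```

Nagios takes the service state from the plugin's exit code, so the printed line is informational only.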

Example

Defining the command:

define command{
    command_name    check_proc_performance
    command_line    $USER1$/check_proc_performance -p [PROCESS_NAME] -w 80 -c 90 -x 80 -y 90
}

Defining the service:

define service{
    use                     local-service
    service_description     Service Info
    host_name               localhost
    check_command           check_proc_performance
}
