AriesNCL v1.0

Aries Network Performance Counters Monitoring Library

AriesNCL is a library to monitor and record network router tile performance counters on the Aries router of Cray’s Cascade/XC30 platform.

Build

make

Make sure the papi module is loaded before compiling. You may also need to unload the darshan module.

C API

Call the function below to initialize PAPI and set up the counters. It expects a file called 'counters.txt' in the same directory as the executable with a newline-delimited list of counter names to record:
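As an illustration of the 'counters.txt' format described above, here is a minimal sketch of how a newline-delimited list of counter names could be read in C. The helpers read_counter_names and free_counter_names are hypothetical and are not part of AriesNCL; the library does its own parsing inside InitAriesCounters, and its handling of blank lines or long names may differ from what is assumed here.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of AriesNCL): frees a name list. */
void free_counter_names(char **names, int count)
{
    for (int i = 0; i < count; i++)
        free(names[i]);
    free(names);
}

/* Hypothetical helper (not part of AriesNCL): reads one counter name per
 * line from `path`, skipping blank lines. Returns the number of names and
 * stores a heap-allocated array in *names_out, or returns -1 on failure.
 * Assumes each name is shorter than 255 characters. */
int read_counter_names(const char *path, char ***names_out)
{
    assert(path != NULL && names_out != NULL);

    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    char **names = NULL;
    int count = 0;
    char line[256];

    while (fgets(line, sizeof line, fp) != NULL) {
        line[strcspn(line, "\r\n")] = '\0';   /* strip the line ending */
        if (line[0] == '\0')
            continue;                         /* skip blank lines */

        char *copy = malloc(strlen(line) + 1);
        char **grown = copy ? realloc(names, (size_t)(count + 1) * sizeof *grown)
                            : NULL;
        if (grown == NULL) {                  /* out of memory: undo and bail */
            free(copy);
            free_counter_names(names, count);
            fclose(fp);
            return -1;
        }
        strcpy(copy, line);
        names = grown;
        names[count++] = copy;
    }

    fclose(fp);
    *names_out = names;
    return count;
}
```

A caller would pass the path to 'counters.txt' and release the result with free_counter_names when done.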

void InitAriesCounters(char *progname, int my_rank, int reporting_rank_mod,
int* AC_event_set, char*** AC_events, long long** AC_values, int* AC_event_count)

Start recording counters:

void StartRecordAriesCounters(int my_rank, int reporting_rank_mod,
int* AC_event_set, char*** AC_events, long long** AC_values, int* AC_event_count)

Stop recording counters. The counters are stored in memory until FinalizeAriesCounters is called:

void EndRecordAriesCounters(int my_rank, int reporting_rank_mod,
int* AC_event_set, char*** AC_events, long long** AC_values, int* AC_event_count)

Write all the counters out to YAML files, clean up memory, and stop PAPI:

void FinalizeAriesCounters(MPI_Comm *mod16_comm, int my_rank, int reporting_rank_mod,
int* AC_event_set, char*** AC_events, long long** AC_values, int* AC_event_count)

Test

For an example test, see the tests folder. The test is currently configured for the KNL nodes on Cori, and it should be run on 2 nodes with 64 ranks per node:

cd tests
make
srun -n 128 ./regions

To set up the variables required by the function calls, look at tests/regions.c or the following (this example is for 16 MPI ranks per node):

/* State passed by address to each AriesNCL call. */
int AC_event_set;
char** AC_events;
long long * AC_values;
int AC_event_count;

/* Build a communicator containing every 16th rank (0, 16, 32, ...). */
int numtasks;
MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
MPI_Group mod16_group, group_world;
MPI_Comm mod16_comm;
int members[(numtasks-1)/16 + 1];
int rank;
for (rank=0; rank<((numtasks-1)/16 + 1); rank++)
{
    members[rank] = rank*16;
}
MPI_Comm_group(MPI_COMM_WORLD, &group_world);
MPI_Group_incl(group_world, ((numtasks-1)/16 + 1), members, &mod16_group);
MPI_Comm_create(MPI_COMM_WORLD, mod16_group, &mod16_comm);

Since every rank on a node would record the same counter information, we only record it on one rank per node (we could still have up to 3 redundant records on Edison, because we cannot easily figure out how many nodes we own on a router). The reporting_rank_mod is the number of ranks per node.
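To make the role of reporting_rank_mod concrete, here is a small sketch of the rank selection it implies. This is an assumption inferred from the parameter name and from the member list of multiples of 16 in the setup code; the check AriesNCL actually performs internally is not shown in this document, and is_reporting_rank is a hypothetical name.

```c
#include <assert.h>

/* Hypothetical predicate (not part of AriesNCL): returns 1 if my_rank is
 * the one rank on its node that records counters, assuming ranks are
 * placed on nodes in consecutive blocks of reporting_rank_mod. */
int is_reporting_rank(int my_rank, int reporting_rank_mod)
{
    assert(my_rank >= 0 && reporting_rank_mod > 0);
    return my_rank % reporting_rank_mod == 0;
}
```

Under this assumption, the 2-node test with 64 ranks per node would record on ranks 0 and 64 only.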

These must be run on nodes with PAPI support (i.e. Cray's NPU component). All of Edison's compute nodes have this.

Running the test will output YAML files mpitest.nettiles.1.yaml and mpitest.proctiles.1.yaml, containing the counter data for NIC tiles and processor tiles, respectively.

Release

Copyright (c) 2014, Lawrence Livermore National Security, LLC. Produced at the Lawrence Livermore National Laboratory.

Written by:

    Dylan Wang
    Staci Smith
    Abhinav Bhatele

LLNL-CODE-678960. All rights reserved.

This file is part of AriesNCL. For details, see: https://github.com/LLNL/ariesncl. Please also read the LICENSE file for our notice and the LGPL.
