ecoHMEM framework

Prerequisites

The required external dependencies for the framework are:

Python: version 3 or later.
Extrae: Profiling library developed by BSC's Performance Analysis group.
Paramedir: Tool to postprocess Extrae's traces, developed by BSC's Performance Analysis group. It is part of the Paraver package, which is available as source code or precompiled binaries for several operating systems.
Flexmalloc: User-level interposition runtime for dynamic object allocation on heterogeneous memory.

Extrae and Paramedir can be downloaded from either the AccelCom group's GitHub or the BSC Performance Analysis group's website. We recommend getting both from the former; if you choose the latter, the minimum required Extrae version is 4.0.0.

The Flexmalloc library can be downloaded from its GitHub repository.

Workflow of the framework

The workflow consists of 3 main steps:

Run an instrumented execution of the target application to obtain profiling data
Generate a dynamic memory object distribution across the available memory tiers based on the profiling data
Run the application with the memory object distribution using the Flexmalloc library

None of these steps requires any change to the source code of the application. We provide a set of scripts to help handle the execution of each step, as described in the following subsections.

1. Setup configuration

Since several parameters and variables are used in more than one step, the scripts we provide read this information from a set of environment variables. To set up these variables, copy the file example_conf.src to your working directory. This file consists of a list of export commands that set the variables expected by the scripts, and you have to load it using the source command. The main variables you will have to modify are the ones describing the application and its arguments, but there are also several variables to configure the tools and scripts of the framework.

ECOHMEM_HOME: the path of the directory containing the framework.
ECOHMEM_EXTRAE_HOME: the path of the directory where you installed Extrae.
ECOHMEM_FLEXMALLOC_HOME: the path of the directory where Flexmalloc is installed.
ECOHMEM_PYTHON: the path of the Python interpreter.
ECOHMEM_ADVISOR: the path of the hmem_advisor program.
ECOHMEM_ALLOCSINFO: the path of the allocs_info program. It is used to extract data from the profiling trace.
ECOHMEM_PARAMEDIR_CFG_GEN: the path of the cfg_gen program. It is used to generate Paramedir configuration files.
ECOHMEM_LOAD_EXTRAE_SCRIPT: the path of the helper script that will be used to load Extrae.
ECOHMEM_PARAMEDIR: the path of the Paramedir program. It is used to extract data from the profiling trace.
ECOHMEM_MPI2PRV: the path of the mpi2prv program. It converts the intermediate Extrae trace to the usual prv format, if it wasn't done by Extrae at the end of the profiling run.
ECOHMEM_MPI2PRV_EXTRA_FLAGS: extra flags that will be passed to mpi2prv.
ECOHMEM_EXTRAE_LIB: path of the Extrae dynamic library to use for the profiling run. If this variable is empty, the framework will select a library automatically using variables that describe the application.
ECOHMEM_EXTRAE_XML: path of the Extrae configuration file. There are a couple of example configuration files in the extrae_confs directory.
ECOHMEM_TRACE_TYPE: type of the data being collected. Currently it can be loads or loads_stores. It must match the PEBS counters configured in the Extrae configuration file.
ECOHMEM_TRACE_NAME: name of the prv trace, without the extension. If Extrae is configured to do the merge step, this should match the name used by Extrae.
ECOHMEM_TRACE_OUTPUT_DIR: directory where the profiling trace and postprocessed data will be stored.
ECOHMEM_ADVISOR_EXTRA_ARGS: extra arguments that will be passed to hmem_advisor.
ECOHMEM_ADVISOR_MEM_CONFIG: path of the hmem_advisor configuration file. This file describes the available memory tiers and, for each tier, its size, its cost weights for loads and stores, and the associated Flexmalloc allocator.
ECOHMEM_ADVISOR_OUTPUT_FILE: path of the output file that will contain the memory object distribution.
ECOHMEM_FLEXMALLOC_FALLBACK_ALLOCATOR: name of the Flexmalloc allocator that will be used when the other allocators are full.
ECOHMEM_FLEXMALLOC_MINSIZE_THRESHOLD_ALLOCATOR: name of the Flexmalloc allocator that will be used when the size of the allocation is lower than the threshold.
ECOHMEM_FLEXMALLOC_MINSIZE_THRESHOLD: threshold (in bytes) for allocations to be considered by Flexmalloc.
ECOHMEM_FLEXMALLOC_MEM_CONFIG: path of the Flexmalloc configuration file.
ECOHMEM_LOAD_FLEXMALLOC_SCRIPT: path of the script that will be used to load Flexmalloc.
ECOHMEM_APP_BINARY: path of your application program.
ECOHMEM_APP_ARGS: arguments that will be passed to your application.
ECOHMEM_IS_FORTRAN_APP: describes if your application is written in Fortran or not (set to 1 or 0, respectively).
ECOHMEM_IS_MPI_APP: set to 1 if your application uses MPI.
ECOHMEM_IS_OMP_APP: set to 1 if your application uses OpenMP.
ECOHMEM_IS_PTHREAD_APP: set to 1 if your application uses pthreads.
ECOHMEM_MPIRUN: path of the mpirun program. Only used for MPI applications.
ECOHMEM_MPIRUN_FLAGS: arguments that will be passed to mpirun.
ECOHMEM_APP_RUNNER: path of the tool that should be used to run your application (e.g. numactl).
ECOHMEM_APP_RUNNER_FLAGS: arguments for the runner tool.
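
Before launching any of the scripts, it can help to confirm that the variables above are actually set in the environment. The following is a minimal sketch in Python and is not part of the framework. The function name missing_vars and the choice of which variables to treat as mandatory are assumptions made for illustration; the real scripts may require a different subset.

```python
import os

# Variables from the list above that this sketch treats as mandatory.
# Assumption: the real scripts may require a different subset.
ASSUMED_REQUIRED = (
    "ECOHMEM_HOME",
    "ECOHMEM_EXTRAE_HOME",
    "ECOHMEM_FLEXMALLOC_HOME",
    "ECOHMEM_APP_BINARY",
    "ECOHMEM_TRACE_OUTPUT_DIR",
)


def missing_vars(env=None, required=ASSUMED_REQUIRED):
    """Return the names in `required` that are unset or empty in `env`."""
    if env is None:
        env = os.environ
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        print("Unset variables:", ", ".join(missing))
    else:
        print("All checked variables are set.")
```

Run it from the same shell in which you sourced your copy of example_conf.src, so that the exported variables are inherited by the Python process.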

To allow passing flags and arguments that contain whitespace, quotes and backslashes in the *_FLAGS and *_ARGS variables are interpreted as the command line shell would interpret them. For example, setting ECOHMEM_APP_ARGS="a b1\ b2 c" or ECOHMEM_APP_ARGS="a 'b1 b2' c" will pass 3 arguments to the application: a, "b1 b2", and c.
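
The splitting described above can be checked quickly with Python's standard shlex module, which tokenizes a string using POSIX shell-like rules. This only illustrates the expected result; whether the framework scripts use shlex internally is not stated here.

```python
import shlex

# The two values from the example above, as stored in the variable.
escaped = shlex.split(r"a b1\ b2 c")
quoted = shlex.split("a 'b1 b2' c")

print(escaped)  # ['a', 'b1 b2', 'c']
print(quoted)   # ['a', 'b1 b2', 'c']
```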

The clear_env.src script can be sourced to clear the framework configuration variables from your command line environment.

2. Profiling run

This step is performed using the profile_run.sh script. When executed, it performs several substeps:

run your application under Extrae to generate the profiling trace
merge the intermediate trace using mpi2prv if Extrae is not configured to do it
generate the Paramedir configuration files customized for the current Extrae trace
run Paramedir and allocs_info to postprocess and extract the data that will be used later on by the hmem_advisor.

To avoid starting from scratch if any of the substeps fails, this script tries to detect which ones were completed in previous runs and skips them, resuming at the first pending substep. This behavior can be changed using the following flags:

--force: start from scratch.
--force-cfgs: regenerate the Paramedir configuration files.
--force-postprocess: redo the trace postprocessing with Paramedir and allocs_info.
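
The resume behavior can be pictured as a scan over the ordered substeps that stops at the first one that still has to run. The sketch below is a hypothetical model in Python, not the actual logic of profile_run.sh. The substep names, the done mapping, the interpretation of each flag, and the rule that every substep after a rerun one is also rerun are assumptions based only on the descriptions above.

```python
# Ordered substeps of the profiling run, named after the list above.
SUBSTEPS = ("trace", "merge", "cfgs", "postprocess")


def substeps_to_run(done, force=False, force_cfgs=False, force_postprocess=False):
    """Return the substeps that would be executed, in order.

    `done` maps a substep name to True if a previous run completed it.
    Assumption: once one substep has to run, every later substep runs
    as well, because its inputs may have changed.
    """
    if force:
        return list(SUBSTEPS)  # --force: start from scratch

    forced = set()
    if force_cfgs:
        forced.add("cfgs")
    if force_postprocess:
        forced.add("postprocess")

    for index, step in enumerate(SUBSTEPS):
        if step in forced or not done.get(step, False):
            return list(SUBSTEPS[index:])
    return []  # everything is already done
```

For example, if a previous run produced and merged the trace but failed afterwards, substeps_to_run({"trace": True, "merge": True}) returns only the configuration and postprocessing substeps.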

The script can be run without any argument and it will take all the information it needs from the environment variables configured in the previous step. For convenience, some of the environment variables can be overridden using the following command line arguments:

--trace-name
--trace-type
--output-dir
--app-args
--app-runner
--runner-flags
--mpirun-flags
--mpi2prv-flags

Once the script finishes, the output directory (set by ECOHMEM_TRACE_OUTPUT_DIR or the --output-dir argument) will contain the Extrae trace, the postprocessed data, and the application output and error logs.

2.1. Extrae Configuration XMLs

We recommend using one of the Extrae XML files provided under extrae_confs/ and setting the ECOHMEM_EXTRAE_XML environment variable to its path during the setup configuration:

ld_extrae.xml: XML file to profile loads.
ldst_extrae.xml: XML file to profile loads and stores.

Each of these XML files contains tags that customize the profiling of the application. The tags of interest to our tool include:

<mpi ..>: For MPI applications.
<openmp ..>: For OpenMP applications.
<pthread ..>: For multi-threaded (pthreads) applications.

Note: Each of the previous tags can be enabled, disabled, or used in combination depending on the application's parallel environment. In any case, we recommend enabling only the tags that are required to correctly profile the application.

<callers ..>: Specifies the depth of function calls to record. In our framework, we are most interested in the nested <dynamic-memory ..> tag, which specifies the callstack depth recorded for each dynamic memory object.
<pebs-sampling ..>: Specifies the PEBS sampling properties and how frequently to sample the PEBS counters. Note that the frequency attributes can impact the speed and quality of the profile.
<dynamic-memory ..>: Specifies whether to profile dynamic memory objects. It is also used to specify a threshold size, i.e., the minimum size of the dynamic memory objects to profile.
<merge ..>: If enabled, Extrae converts the intermediate trace to the usual prv format. If it is not enabled, the conversion is performed by the profile_run.sh script instead. In the latter case, make sure to set the ECOHMEM_MPI2PRV and ECOHMEM_MPI2PRV_EXTRA_FLAGS environment variables during the setup configuration.

3. Memory object distribution

The second step is done using the generate_distribution.sh script, which is a wrapper around the hmem_advisor program. This program uses the postprocessed profiling data to compute the cost heuristics for each object, and then distributes the objects across the available memory tiers. The hmem_advisor accepts several arguments to configure its behavior (set by ECOHMEM_ADVISOR_EXTRA_ARGS or the --extra-args flag accepted by the script):

--mem-config: path of the configuration file describing the available memory tiers, required.
--loads: path of the CSV file with load access information, required.
--sizes: path of the CSV file with memory object size information, required.
--stores: path of the CSV file with store access information.
--worst
--algo
--metric
--page: memory page size, defaults to 4 KB.
--verbose
--rank: use statistics from a given rank instead of aggregating from all ranks.
--rank-statistics: how rank statistics are aggregated (Total or Average).
--visualizer
--allocs-info: path of the JSON file with memory allocation lifetime information.
--num-ranks: number of MPI ranks used to run the application.
--disable-bw-aware: disable the BW-aware distribution refinement step.

Available command line arguments for the generate_distribution.sh script:

--trace-type
--output-file
--input-dir
--mem-config
--extra-args
--force
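
To give a rough intuition for what distributing objects by cost can look like, the sketch below ranks objects by weighted accesses per byte and places them greedily into the fastest tier that still has room. This is a simplified, hypothetical heuristic written for illustration; it is not claimed to be the algorithm implemented by hmem_advisor. The names (MemObject, Tier, distribute), the single pair of weights shared by all tiers, and the sample numbers are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class MemObject:
    name: str
    size: int    # bytes, assumed to be greater than zero
    loads: int   # sampled load accesses
    stores: int  # sampled store accesses


@dataclass
class Tier:
    name: str
    capacity: int  # bytes


def distribute(objects, tiers, load_weight=1.0, store_weight=1.0):
    """Greedily map objects to tiers; `tiers` is ordered fastest first.

    Objects with the highest weighted accesses per byte are placed first.
    An object that fits in no tier is mapped to None.
    """
    def density(obj):
        cost = load_weight * obj.loads + store_weight * obj.stores
        return cost / obj.size

    free = {tier.name: tier.capacity for tier in tiers}
    placement = {}
    for obj in sorted(objects, key=density, reverse=True):
        placement[obj.name] = None
        for tier in tiers:
            if obj.size <= free[tier.name]:
                free[tier.name] -= obj.size
                placement[obj.name] = tier.name
                break
    return placement


# Made-up sample data: two hot objects and one cold object.
objects = [
    MemObject("a", size=100, loads=1000, stores=0),
    MemObject("b", size=100, loads=10, stores=0),
    MemObject("c", size=50, loads=500, stores=100),
]
tiers = [Tier("fast", capacity=150), Tier("slow", capacity=1000)]
print(distribute(objects, tiers))
```

With these sample values, the two frequently accessed objects fill the small fast tier and the rarely accessed one falls through to the slow tier.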

4. Run with automatic object distribution

Finally, you can run your application with the memory object distribution computed in the previous step by running the run.sh script with the --flexmalloc flag. It executes your application with the Flexmalloc library preloaded, which diverts each dynamic memory allocation to the memory tier specified in the file generated by hmem_advisor. If you run this script without the --flexmalloc flag, it executes the application without any change or profiling, which may be useful to collect the baseline execution time.

Available command line arguments:

--obj-file
--app-args
--app-runner
--runner-flags
--mpirun-flags
--flexmalloc-config
--flexmalloc
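
The three workflow steps can be chained from a small driver. The sketch below, in Python, only assembles and runs the script invocations named in this document with no optional arguments. It assumes that the configuration file has already been sourced in the calling shell and that the three scripts live in a single directory; the script_dir parameter and both function names are hypothetical.

```python
import subprocess
from pathlib import Path


def workflow_commands(script_dir):
    """Return the three script invocations, in the order they must run."""
    script_dir = Path(script_dir)
    return [
        [str(script_dir / "profile_run.sh")],            # profiling run
        [str(script_dir / "generate_distribution.sh")],  # object distribution
        [str(script_dir / "run.sh"), "--flexmalloc"],    # run with Flexmalloc
    ]


def run_workflow(script_dir):
    """Run each step in order and stop at the first failure."""
    for command in workflow_commands(script_dir):
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(command, check=True)
```

Stopping at the first failure matters here because each step consumes the output of the previous one.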

Flexmalloc supports several memory types. You have to specify which ones are available in its configuration file. Here are a few examples of how to do it:

posix: Allocates data using the standard POSIX allocation functions.

# Memory configuration for allocator posix
Size <MB available per process> MBytes

memkind/pmem: Allocates data on persistent memory mounted as FSDAX.

# Memory configuration for allocator memkind/pmem
@ </path/to/pmem 0> ... @ </path/to/pmem N>

memkind/hbwmalloc: Allocates data on the high-bandwidth (HBW) memory available in KNL processors.

# Memory configuration for allocator memkind/hbwmalloc
Size <MB available per process> MBytes
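
The templates above can be filled in programmatically. The following Python sketch renders the stanzas exactly as they are written in this section; it assumes that stanzas for several allocators are simply placed one after another in the same file and that the pmem entries share a single line, which should be checked against the Flexmalloc documentation. The helper names, the 4096 MB size, and the /mnt/pmem0 and /mnt/pmem1 paths are hypothetical.

```python
def size_stanza(allocator, megabytes):
    """Stanza for allocators configured by size (posix, memkind/hbwmalloc)."""
    return (
        f"# Memory configuration for allocator {allocator}\n"
        f"Size {megabytes} MBytes\n"
    )


def pmem_stanza(mount_points):
    """Stanza for memkind/pmem: one '@ <path>' entry per FSDAX mount."""
    entries = " ".join(f"@ {path}" for path in mount_points)
    return f"# Memory configuration for allocator memkind/pmem\n{entries}\n"


# Hypothetical values: 4096 MB of DRAM per process and two pmem mounts.
config = size_stanza("posix", 4096) + pmem_stanza(["/mnt/pmem0", "/mnt/pmem1"])
print(config, end="")
```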

For further information about Flexmalloc features and usage, take a look at its own documentation.

Licensing

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.

The full GPLv2 license is available in the LICENSE file.

Contact

📫 AccelCom Email: [email protected]

Accelerators and Communications For HPC Group (AccelCom)

Department of Computer Science (CS)

Barcelona Supercomputing Center (BSC)
