gpgpu-sim / gpgpu-sim_distribution

GPGPU-Sim provides a detailed simulation model of contemporary NVIDIA GPUs running CUDA and/or OpenCL workloads. It includes support for features such as TensorCores and CUDA Dynamic Parallelism as well as a performance visualization tool, AerialVision, and an integrated energy model, GPUWattch.

License: Other

gpgpu-sim_distribution's Introduction

Welcome to GPGPU-Sim, a cycle-level simulator modeling contemporary graphics processing units (GPUs) running GPU computing workloads written in CUDA or OpenCL. Also included in GPGPU-Sim is a performance visualization tool called AerialVision and a configurable and extensible energy model called GPUWattch. GPGPU-Sim and GPUWattch have been rigorously validated with performance and power measurements of real hardware GPUs.

This version of GPGPU-Sim has been tested with a subset of CUDA versions 4.2, 5.0, 5.5, 6.0, 7.5, 8.0, 9.0, 9.1, 10, and 11.

Please see the copyright notice in the file COPYRIGHT distributed with this release in the same directory as this file.

If you use GPGPU-Sim 4.0 in your research, please cite:

Mahmoud Khairy, Zhesheng Shen, Tor M. Aamodt, Timothy G Rogers. Accel-Sim: An Extensible Simulation Framework for Validated GPU Modeling. In proceedings of the 47th IEEE/ACM International Symposium on Computer Architecture (ISCA), May 29 - June 3, 2020.

If you use CuDNN or PyTorch support, checkpointing or our new debugging tool for functional simulation errors in GPGPU-Sim for your research, please cite:

Jonathan Lew, Deval Shah, Suchita Pati, Shaylin Cattell, Mengchi Zhang, Amruth Sandhupatla, Christopher Ng, Negar Goli, Matthew D. Sinclair, Timothy G. Rogers, Tor M. Aamodt, Analyzing Machine Learning Workloads Using a Detailed GPU Simulator, arXiv:1811.08933, https://arxiv.org/abs/1811.08933

If you use the Tensor Core model in GPGPU-Sim or GPGPU-Sim's CUTLASS Library for your research please cite:

Md Aamir Raihan, Negar Goli, Tor Aamodt, Modeling Deep Learning Accelerator Enabled GPUs, arXiv:1811.08309, https://arxiv.org/abs/1811.08309

If you use the GPUWattch energy model in your research, please cite:

Jingwen Leng, Tayler Hetherington, Ahmed ElTantawy, Syed Gilani, Nam Sung Kim, Tor M. Aamodt, Vijay Janapa Reddi, GPUWattch: Enabling Energy Optimizations in GPGPUs, In proceedings of the ACM/IEEE International Symposium on Computer Architecture (ISCA 2013), Tel-Aviv, Israel, June 23-27, 2013.

If you use the support for CUDA dynamic parallelism in your research, please cite:

Jin Wang and Sudhakar Yalamanchili, Characterization and Analysis of Dynamic Parallelism in Unstructured GPU Applications, 2014 IEEE International Symposium on Workload Characterization (IISWC), November 2014.

If you use figures plotted using AerialVision in your publications, please cite:

Aaron Ariel, Wilson W. L. Fung, Andrew Turner, Tor M. Aamodt, Visualizing Complex Dynamics in Many-Core Accelerator Architectures, In Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 164-174, White Plains, NY, March 28-30, 2010.

This file contains instructions on installing, building and running GPGPU-Sim. Detailed documentation on what GPGPU-Sim models, how to configure it, and a guide to the source code can be found here: http://gpgpu-sim.org/manual/. Instructions for building doxygen source code documentation are included below. Detailed documentation on GPUWattch including how to configure it and a guide to the source code can be found here: http://gpgpu-sim.org/gpuwattch/.

If you have questions, please sign up for the google groups page (see gpgpu-sim.org), but note that use of this simulator does not imply any level of support. Questions are answered on a best-effort basis.

To submit a bug report, go here: http://www.gpgpu-sim.org/bugs/

See Section 2 "INSTALLING, BUILDING and RUNNING GPGPU-Sim" below to get started.

See file CHANGES for updates in this and earlier versions.

CONTRIBUTIONS and HISTORY

GPGPU-Sim

GPGPU-Sim was created by Tor Aamodt's research group at the University of British Columbia. Many have directly contributed to development of GPGPU-Sim including: Tor Aamodt, Wilson W.L. Fung, Ali Bakhoda, George Yuan, Ivan Sham, Henry Wong, Henry Tran, Andrew Turner, Aaron Ariel, Inderpret Singh, Tim Rogers, Jimmy Kwa, Andrew Boktor, Ayub Gubran, Tayler Hetherington and others.

GPGPU-Sim models the features of a modern graphics processor that are relevant to non-graphics applications. The first version of GPGPU-Sim was used in a MICRO'07 paper and follow-on ACM TACO paper on dynamic warp formation. That version of GPGPU-Sim used the SimpleScalar PISA instruction set for functional simulation, and various configuration files indicating which loops should be spawned as kernels on the GPU, along with reconvergence points required for SIMT execution to provide a programming model similar to CUDA/OpenCL. Creating benchmarks for the original GPGPU-Sim simulator was a very time-consuming process, and the validity of code generation for CPU run on a GPU was questioned by some. These issues motivated the development of an interface for directly running CUDA applications to leverage the growing number of applications being developed to use CUDA. We subsequently added support for OpenCL and removed all SimpleScalar code.

The interconnection network is simulated using the booksim simulator developed by Bill Dally's research group at Stanford.

To produce output that matches the output from running the same CUDA program on the GPU, we have implemented several PTX instructions using the CUDA Math library (part of the CUDA toolkit). Code to interface with the CUDA Math library is contained in cuda-math.h, which also includes several structures derived from vector_types.h (one of the CUDA header files).

GPUWattch Energy Model

GPUWattch (introduced in GPGPU-Sim 3.2.0) was developed by researchers at the University of British Columbia, the University of Texas at Austin, and the University of Wisconsin-Madison. Contributors to GPUWattch include Tor Aamodt's research group at the University of British Columbia: Tayler Hetherington and Ahmed ElTantawy; Vijay Reddi's research group at the University of Texas at Austin: Jingwen Leng; and Nam Sung Kim's research group at the University of Wisconsin-Madison: Syed Gilani.

GPUWattch leverages McPAT, which was developed by Sheng Li et al. at the University of Notre Dame, Hewlett-Packard Labs, Seoul National University, and the University of California, San Diego. The paper can be found at http://www.hpl.hp.com/research/mcpat/micro09.pdf.

INSTALLING, BUILDING and RUNNING GPGPU-Sim

Assuming all dependencies required by GPGPU-Sim are installed on your system, to build GPGPU-Sim all you need to do is add the following line to your ~/.bashrc file (assuming the CUDA Toolkit was installed in /usr/local/cuda):

  export CUDA_INSTALL_PATH=/usr/local/cuda

then type

  bash
  source setup_environment
  make

If the above fails, see "Step 1" and "Step 2" below.

If the above worked, see "Step 3" below, which explains how to run a CUDA benchmark on GPGPU-Sim.
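Before sourcing setup_environment it can help to confirm that CUDA_INSTALL_PATH really points at a toolkit. The following is a minimal sketch, not part of GPGPU-Sim: the helper name check_cuda_install is hypothetical, and it only assumes that the toolkit places nvcc under its bin directory.

```shell
# check_cuda_install [DIR]: verify that DIR (default: $CUDA_INSTALL_PATH)
# looks like a CUDA Toolkit installation. Hypothetical helper for illustration.
check_cuda_install() {
    local cuda_dir
    cuda_dir="${1:-${CUDA_INSTALL_PATH:-}}"
    if [ -z "$cuda_dir" ]; then
        echo "CUDA_INSTALL_PATH is not set" >&2
        return 1
    fi
    if [ ! -x "$cuda_dir/bin/nvcc" ]; then
        echo "nvcc not found under $cuda_dir/bin" >&2
        return 1
    fi
    echo "Using CUDA toolkit in $cuda_dir"
}

# Example (run from the GPGPU-Sim root directory):
#   check_cuda_install && source setup_environment && make
```

If the check fails, the dependency and build steps below are the place to look.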

Step 1: Dependencies

GPGPU-Sim was developed on SUSE Linux (this release was tested with SUSE version 11.3) and has been used on several other Linux platforms (both 32-bit and 64-bit systems). In principle, GPGPU-Sim should work with any Linux distribution as long as the following software dependencies are satisfied.

Download and install the CUDA Toolkit. It is recommended to use version 3.1 for normal PTX simulation and version 4.0 for cuobjdump support and/or to use PTXPlus (hardware instruction set support). Note that it is possible to have multiple versions of the CUDA toolkit installed on a single system: just install them in different directories and set your CUDA_INSTALL_PATH environment variable to point to the version you want to use.

[Optional] If you want to run OpenCL on the simulator, download and install NVIDIA's OpenCL driver from http://developer.nvidia.com/opencl. Update your PATH and LD_LIBRARY_PATH as indicated by the NVIDIA install scripts. Note that you will need to use the lib64 directory if you are using a 64-bit machine. We have tested OpenCL on GPGPU-Sim using NVIDIA driver version 256.40 (http://developer.download.nvidia.com/compute/cuda/3_1/drivers/devdriver_3.1_linux_64_256.40.run). This version of GPGPU-Sim has been updated to support more recent versions of the NVIDIA drivers (tested on version 295.20).

GPGPU-Sim dependencies:

  • gcc
  • g++
  • make
  • makedepend
  • xutils
  • bison
  • flex
  • zlib
  • CUDA Toolkit

GPGPU-Sim documentation dependencies:

  • doxygen
  • graphviz

AerialVision dependencies:

  • python-pmw
  • python-ply
  • python-numpy
  • libpng12-dev
  • python-matplotlib

We used gcc/g++ version 4.5.1, bison version 2.4.1, and flex version 2.5.35.

If you are using Ubuntu, the following commands will install all required dependencies besides the CUDA Toolkit.

GPGPU-Sim dependencies:

sudo apt-get install build-essential xutils-dev bison zlib1g-dev flex libglu1-mesa-dev

GPGPU-Sim documentation dependencies:

sudo apt-get install doxygen graphviz

AerialVision dependencies:

sudo apt-get install python-pmw python-ply python-numpy libpng12-dev python-matplotlib

CUDA SDK dependencies:

sudo apt-get install libxi-dev libxmu-dev libglut3-dev

If you are running applications which use NVIDIA libraries such as cuDNN and cuBLAS, install them too.

Finally, ensure CUDA_INSTALL_PATH is set to the location where you installed the CUDA Toolkit (e.g., /usr/local/cuda) and that $CUDA_INSTALL_PATH/bin is in your PATH. You probably want to modify your .bashrc file to include the following (this assumes the CUDA Toolkit was installed in /usr/local/cuda):

export CUDA_INSTALL_PATH=/usr/local/cuda
export PATH=$CUDA_INSTALL_PATH/bin:$PATH

If running applications which use cuDNN or cuBLAS:

export CUDNN_PATH=<Path To cuDNN Directory>
export LD_LIBRARY_PATH=$CUDA_INSTALL_PATH/lib64:$CUDA_INSTALL_PATH/lib:$CUDNN_PATH/lib64
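Overwriting PATH or LD_LIBRARY_PATH outright can hide other tools and libraries, so it is often safer to prepend to the existing value. The sketch below shows one way to do that without adding duplicates each time .bashrc is sourced; prepend_dir is a hypothetical helper, not something shipped with GPGPU-Sim.

```shell
# prepend_dir VALUE DIR: print DIR prepended to the colon-separated VALUE,
# unless DIR is already one of its entries. Hypothetical helper.
prepend_dir() {
    case ":$1:" in
        *":$2:"*)
            # Already present: leave the value unchanged.
            printf '%s\n' "$1"
            ;;
        *)
            if [ -n "$1" ]; then
                printf '%s:%s\n' "$2" "$1"
            else
                printf '%s\n' "$2"
            fi
            ;;
    esac
}

# Example (assumes CUDA_INSTALL_PATH is already exported):
#   PATH=$(prepend_dir "$PATH" "$CUDA_INSTALL_PATH/bin"); export PATH
#   LD_LIBRARY_PATH=$(prepend_dir "${LD_LIBRARY_PATH:-}" "$CUDA_INSTALL_PATH/lib64"); export LD_LIBRARY_PATH
```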

Step 2: Build

To build the simulator, you first need to configure how you want it to be built. From the root directory of the simulator, type the following commands in a bash shell (you can check you are using a bash shell by running the command "echo $SHELL", which should print "/bin/bash"):

source setup_environment <build_type>

Replace <build_type> with debug or release. Use release if you need faster simulation and debug if you need to run the simulator in gdb. If nothing is specified, release will be used by default.

Now you are ready to build the simulator. Just run

make

After make is done, the simulator is ready to use. To clean the build, run

make clean

To build the doxygen-generated documentation, run

make docs

To clean the docs run

make cleandocs

The documentation resides at doc/doxygen/html.

To run Pytorch applications with the simulator, install the modified Pytorch library as well by following instructions here.

Step 3: Run

Before we run, we need to make sure the application's executable file is dynamically linked to CUDA runtime library. This can be done during compilation of your program by introducing the nvcc flag "--cudart shared" in makefile (quotes should be excluded).

To confirm this, type the following command:

ldd <your_application_name>

You should see that your application is using the libcudart.so file in the GPGPU-Sim directory. If the application is a PyTorch application, <your_application_name> should be $PYTORCH_BIN, which should be set during the PyTorch installation.
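The ldd check can be scripted. The sketch below reads ldd output on standard input and prints the path that libcudart.so resolves to, so you can see at a glance whether it lies inside your GPGPU-Sim tree. The helper name cudart_path is hypothetical; it relies only on the usual "name => path (address)" layout of ldd output.

```shell
# cudart_path: read `ldd` output on stdin, print the resolved path of
# libcudart.so, and return non-zero if the library is not a dynamic dependency.
# Hypothetical helper for illustration.
cudart_path() {
    awk '/libcudart\.so/ && $2 == "=>" { print $3; found = 1 }
         END { exit !found }'
}

# Example:
#   ldd ./<your_application_name> | cudart_path
# No output and a non-zero status suggest the CUDA runtime was linked
# statically, i.e. the "--cudart shared" flag was not used.
```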

If running applications which use cuDNN or cuBLAS:

  • Modify the Makefile or the compilation command of the application to change all the dynamic links to static ones, for example:

    • -L$(CUDA_PATH)/lib64 -lcublas to -L$(CUDA_PATH)/lib64 -lcublas_static

    • -L$(CUDNN_PATH)/lib64 -lcudnn to -L$(CUDNN_PATH)/lib64 -lcudnn_static

  • Modify the Makefile or the compilation command such that the following flags are used by the nvcc compiler: -gencode arch=compute_61,code=compute_61

    (the number 61 refers to the SM version. You would need to set it based on the GPGPU-Sim config -gpgpu-ptx-force-max-capability you use)
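Because the SM number has to be repeated in both halves of the -gencode option, it is easy to let the two drift apart. A small sketch that builds the flag from a single number follows; gencode_flag is a hypothetical helper and the value 61 is just the example used above.

```shell
# gencode_flag SM: print the nvcc -gencode option that embeds PTX for
# compute capability SM (digits only, e.g. 61). Hypothetical helper.
gencode_flag() {
    case "$1" in
        ''|*[!0-9]*)
            echo "usage: gencode_flag <sm_number>" >&2
            return 1
            ;;
    esac
    printf '%s\n' "-gencode arch=compute_$1,code=compute_$1"
}

# Example: pass the result to nvcc in your build command.
#   nvcc $(gencode_flag 61) --cudart shared -o my_app my_app.cu
```

Here my_app and my_app.cu are placeholder names; substitute your own sources and keep the number in step with the -gpgpu-ptx-force-max-capability setting mentioned above.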

Copy the contents of configs/QuadroFX5800/ or configs/GTX480/ to your application's working directory. These files configure the microarchitecture models to resemble the respective GPGPU architectures.

To use ptxplus (native ISA) change the following options in the configuration file to "1" (Note: you need CUDA version 4.0) as follows:

-gpgpu_ptx_use_cuobjdump 1
-gpgpu_ptx_convert_to_ptxplus 1
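Options like these can be edited by hand, but when switching settings often a small helper is convenient. The sketch below replaces the line that sets an option, or appends it if absent. set_sim_option is a hypothetical helper, and the file name gpgpusim.config in the example is an assumption about what the copied configuration file is called; adjust it to match your working directory.

```shell
# set_sim_option FILE OPTION VALUE: set "OPTION VALUE" in a GPGPU-Sim style
# config file, replacing an existing line for OPTION or appending a new one.
# Assumes OPTION contains no regular-expression metacharacters.
set_sim_option() {
    local file opt val
    file=$1
    opt=$2
    val=$3
    if grep -q -e "^${opt}[[:space:]]" "$file"; then
        # Rewrite through a temporary file so this works with any POSIX sed.
        sed "s|^${opt}[[:space:]].*|${opt} ${val}|" "$file" > "$file.tmp" &&
            mv "$file.tmp" "$file"
    else
        printf '%s %s\n' "$opt" "$val" >> "$file"
    fi
}

# Example (file name is an assumption):
#   set_sim_option gpgpusim.config -gpgpu_ptx_use_cuobjdump 1
#   set_sim_option gpgpusim.config -gpgpu_ptx_convert_to_ptxplus 1
```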

Now, to run a CUDA application on the simulator, simply execute

source setup_environment <build_type>

Use the same <build_type> you used while building the simulator. Then just launch the executable as you would if it were running on the hardware. Running source setup_environment <build_type> changes your LD_LIBRARY_PATH to point to GPGPU-Sim's runtime library instead of the CUDA or OpenCL runtime, so you do NOT need to re-compile your application simply to run it on GPGPU-Sim.

To revert back to running on the hardware, remove GPGPU-Sim from your LD_LIBRARY_PATH environment variable.
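One way to remove GPGPU-Sim from LD_LIBRARY_PATH without opening a new shell is to filter out the matching entries. This is a minimal sketch: strip_path_entries is a hypothetical helper, and the example assumes your simulator checkout lives in a directory whose path contains "gpgpu-sim_distribution".

```shell
# strip_path_entries VALUE PATTERN: print the colon-separated VALUE without
# any entry that contains PATTERN (matched as a fixed substring).
# Hypothetical helper for illustration.
strip_path_entries() {
    printf '%s' "$1" | tr ':' '\n' | grep -v -F -e "$2" | paste -s -d ':' -
}

# Example (the pattern is an assumption about your checkout's directory name):
#   LD_LIBRARY_PATH=$(strip_path_entries "$LD_LIBRARY_PATH" "gpgpu-sim_distribution")
#   export LD_LIBRARY_PATH
```

Running ldd on the application afterwards should show libcudart.so resolving to the CUDA Toolkit again.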

The following GPGPU-Sim configuration options are used to enable GPUWattch

-power_simulation_enabled 1 (1=Enabled, 0=Not enabled)
-gpuwattch_xml_file <filename>.xml

The GPUWattch XML configuration file name is set to gpuwattch.xml by default and currently only supplied for GTX480 (default=gpuwattch_gtx480.xml). Please refer to http://gpgpu-sim.org/gpuwattch/ for more information.

Running OpenCL applications is identical to running CUDA applications. However, OpenCL applications need to communicate with the NVIDIA driver in order to build OpenCL at runtime. GPGPU-Sim supports offloading this compilation to a remote machine. The hostname of this machine can be specified using the environment variable OPENCL_REMOTE_GPU_HOST. This variable should also be set through the setup_environment script. If you are offloading to a remote machine, you might want to set up passwordless ssh login to that machine in order to avoid having to retype your password for every execution of an OpenCL application.

If you need to run the set of applications in the NVIDIA CUDA SDK code samples then you will need to download, install and build the SDK.

The CUDA applications from the ISPASS 2009 paper mentioned above are distributed separately on github under the repo ispass2009-benchmarks. The README.ISPASS-2009 file distributed with the benchmarks now contains updated instructions for running the benchmarks on GPGPU-Sim v3.x.

(OPTIONAL) Contributing to GPGPU-Sim (ADVANCED USERS ONLY)

If you have made modifications to the simulator and wish to incorporate new features/bugfixes from subsequent releases the following instructions may help. They are meant only as a starting point and only recommended for users comfortable with using source control who have experience modifying and debugging GPGPU-Sim.

WARNING: Before following the procedure below, back up your modifications to GPGPU-Sim. The following procedure may cause you to lose all your changes. In general, merging code changes can require manual intervention and even in the case where a merge proceeds automatically it may introduce errors. If many edits have been made the merge process can be a painful manual process. Hence, you will almost certainly want to have a copy of your code as it existed before you followed the procedure below in case you need to start over again. You will need to consult the documentation for git in addition to these instructions in the case of any complications.

STOP. BACK UP YOUR CHANGES BEFORE PROCEEDING. YOU HAVE BEEN WARNED. TWICE.
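As one possible way to take that backup, the sketch below archives the whole source tree into a timestamped tarball outside the working copy. backup_tree is a hypothetical helper and the directory names in the example are placeholders; a copy made with your usual backup tooling serves the same purpose.

```shell
# backup_tree SRC_DIR DEST_DIR: archive SRC_DIR into a timestamped .tar.gz
# under DEST_DIR and print the archive path. Hypothetical helper.
backup_tree() {
    local src dest stamp archive
    src=$1
    dest=$2
    stamp=$(date +%Y%m%d-%H%M%S)
    archive="$dest/$(basename "$src")-backup-$stamp.tar.gz"
    mkdir -p "$dest" &&
        tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" &&
        printf '%s\n' "$archive"
}

# Example (paths are placeholders):
#   backup_tree ~/gpgpu-sim ~/gpgpu-sim-backups
```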

To update GPGPU-Sim you need git to be installed on your system. Below we assume that you ran the following command to get the source code of GPGPU-Sim:

  git clone git://dev.ece.ubc.ca/gpgpu-sim

Since running the above command you have made local changes and we have published changes to GPGPU-Sim on the above git server. You have looked at the changes we made, looking at both the new CHANGES file and probably even the source code differences. You decide you want to incorporate our changes into your modified version of GPGPU-Sim.

Before updating your source code, we recommend you remove any object files:

  make clean

Then, run the following command in the root directory of GPGPU-Sim:

  git pull

While git is pulling the latest changes, conflicts might arise between changes that you made and the latest updates. In this case, you need to resolve those conflicts manually. You can either edit the conflicting files directly using your favorite text editor, or you can use the following command to open a graphical merge tool to do the merge:

  git mergetool

Testing updated version of GPGPU-Sim

Now you should test that the merged version "works". This means following the steps for building GPGPU-Sim in the new README file (not this version) since they may have changed. Assuming the code compiles without errors/warnings, the next step is to do some regression testing. At UBC we have an extensive set of regression tests we run against our internal development branch when we make changes. In the future we may make this set of regression tests publicly available. For now, you will want to compile the merged code and re-run all of the applications you care about (implying these applications worked for you before you did the merge). You want to do this before making further changes to identify any compile-time or runtime errors that occur due to the code merging process.

MISCELLANEOUS

Speeding up the execution

Some applications take several hours to execute on GPGPU-Sim. This is because the simulator has to dump the PTX, analyze it, and gather resource usage statistics. This work does not have to be repeated every time the program is executed; it can be avoided in the following way:

  1. Enable “-save_embedded_ptx 1” in the config file, execute the program, and let the cuobjdump command dump all necessary files. After this process, you will have two new files: cuobjdump_complete_output<some_random_name> and _1.ptx

  2. Create new environment variables or include the below in your .bashrc file:

    1. export PTX_SIM_USE_PTX_FILE=_1.ptx
    2. export PTX_SIM_KERNELFILE=_1.ptx
    3. export CUOBJDUMP_SIM_FILE=cuobjdump_complete_output<some_random_name>
  3. Disable the -save_embedded_ptx flag and execute the program again. This skips the dumping by cuobjdump and goes directly to executing the program, thus saving time.
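The environment variables in step 2 can be generated from whatever files the first run produced. The sketch below looks in a directory for the two dump files named above and prints the matching export lines. ptx_reuse_exports is a hypothetical helper; it emits bare file names, as in the steps above, so it assumes the application is later launched from that same directory.

```shell
# ptx_reuse_exports [DIR]: print export lines for the PTX dump files found in
# DIR (default: current directory), or fail if they are missing.
# Hypothetical helper for illustration.
ptx_reuse_exports() {
    local dir dump f
    dir=${1:-.}
    dump=
    for f in "$dir"/cuobjdump_complete_output*; do
        if [ -f "$f" ]; then
            dump=$(basename "$f")
            break
        fi
    done
    if [ ! -f "$dir/_1.ptx" ] || [ -z "$dump" ]; then
        echo "dump files not found in $dir; run once with -save_embedded_ptx 1" >&2
        return 1
    fi
    printf 'export PTX_SIM_USE_PTX_FILE=%s\n' "_1.ptx"
    printf 'export PTX_SIM_KERNELFILE=%s\n' "_1.ptx"
    printf 'export CUOBJDUMP_SIM_FILE=%s\n' "$dump"
}

# Example (from the application's working directory, after the first run):
#   eval "$(ptx_reuse_exports .)"
```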

Debugging failing GPGPU-Sim Regressions

Credits: Tor M Aamodt

To debug failing GPGPU-Sim regression tests you need to run them locally. The fastest way to do this, assuming you are working with GPGPU-Sim versions more recent than the GPGPU-Sim dev branch circa March 28, 2018 (commit hash 2221d208a745a098a60b0d24c05007e92aaba092), is to install Docker. The instructions below were tested with Docker CE version 18.03 on Ubuntu and Mac OS. Docker will enable you to run the same set of regressions used by GPGPU-Sim when submitting a pull request to https://github.com/gpgpu-sim/gpgpu-sim_distribution and also allow you to log in and launch GPGPU-Sim in gdb so you can inspect failures.

  1. Install Docker. On Ubuntu 14.04 and 16.04 the following instructions work: https://docs.docker.com/install/linux/docker-ce/ubuntu/#uninstall-old-versions

  2. Clone GPGPU-Sim from your fork of GPGPU-Sim. For example:

    git clone https://github.com//gpgpu-sim_distribution.git

  3. Run the following command (this is all one line) to run the regressions in docker:

    docker run --privileged -v `pwd`:/home/runner/gpgpu-sim_distribution:rw aamodt/gpgpu-sim_regress:latest /bin/bash -c "./start_torque.sh; chown -R runner /home/runner/gpgpu-sim_distribution; su - runner -c 'source /home/runner/gpgpu-sim_distribution/setup_environment && make -j -C /home/runner/gpgpu-sim_distribution && cd /home/runner/gpgpu-sim_simulations/ && git pull && /home/runner/gpgpu-sim_simulations/util/job_launching/run_simulations.py -c /home/runner/gpgpu-sim_simulations/util/job_launching/regression_recipies/rodinia_2.0-ft/configs.gtx1080ti.yml -N regress && /home/runner/gpgpu-sim_simulations/util/job_launching/monitor_func_test.py -v -N regress'; tail -f /dev/null"
    

    Explanation: The last part of this command, "tail -f /dev/null" will keep the docker container running after the regressions finish. This enables you to log into the container to run the same tests inside gdb so you can debug. The "--privileged" part enables you to use breakpoints inside gdb in a container. The "-v" part maps the current directory (with the GPGPU-Sim source code you want to test) into the container. The string "aamodt/gpgpu-sim_regress:latest" is a tag for a container setup to run regressions which will be downloaded from docker hub. The portion starting with /bin/bash is a set of commands run inside a bash shell inside the container. E.g., the command start_torque.sh starts up a queue manager inside the container.

    If the above command stops with the message "fatal: unable to access 'https://github.com/tgrogers/gpgpu-sim_simulations.git/': Could not resolve host: github.com" this likely means your computer sits behind a firewall which is blocking access to Google's name servers (e.g., 8.8.8.8). To get around this you will need to modify the above command to point to your local DNS server. Look up your DNS server IP address, which we will call <DNS_IP_ADDRESS> below. On Ubuntu run "ifconfig" to look up the network interface connecting your computer to the network. Then run "nmcli device show " to find the IP address of your DNS server. Modify the above command to include "--dns <DNS_IP_ADDRESS>" after "run", e.g.,

    docker run --dns <DNS_IP_ADDRESS> --privileged -v `pwd`:/home/runner/gpgpu-sim_distribution:rw aamodt/gpgpu-sim_regress:latest /bin/bash -c "./start_torque.sh; chown -R runner /home/runner/gpgpu-sim_distribution; su - runner -c 'source /home/runner/gpgpu-sim_distribution/setup_environment && make -j -C /home/runner/gpgpu-sim_distribution && cd /home/runner/gpgpu-sim_simulations/ && git pull && /home/runner/gpgpu-sim_simulations/util/job_launching/run_simulations.py -c /home/runner/gpgpu-sim_simulations/util/job_launching/regression_recipies/rodinia_2.0-ft/configs.gtx1080ti.yml -N regress && /home/runner/gpgpu-sim_simulations/util/job_launching/monitor_func_test.py -v -N regress'; tail -f /dev/null"
    
  4. Find the CONTAINER ID associated with your docker container by running "docker ps".

  5. Log into the container by running the command:

    docker exec -it <CONTAINER_ID> /bin/bash -c "su -l runner"
    

    The container is running Ubuntu 16.04 and has screen, cscope and vim installed (if you find a favorite Linux tool missing, it is fairly easy to create derived containers that have additional tools).

  6. Look up the directory of the regression test you want to debug by going to the regression log file directory:

    cd /home/runner/gpgpu-sim_simulations/util/job_launching/logfiles
    
  7. The file "failed_job_log_sim_log.regress..txt" includes information about the failed test including its simulation directory. For the following example, I'll assume the first failing test was "hotspot-rodinia-2.0-ft-30_6_40___data_result_30_6_40_txt--GTX1080Ti" for which the simulation directory is /home/runner/gpgpu-sim_simulations/util/job_launching/../../sim_run_4.2/hotspot-rodinia-2.0-ft/30_6_40___data_result_30_6_40_txt/GTX1080Ti/

  8. Change to the simulation directory using:

    cd <simulation_directory>
    

    E.g., cd /home/runner/gpgpu-sim_simulations/util/job_launching/../../sim_run_4.2/hotspot-rodinia-2.0-ft/30_6_40___data_result_30_6_40_txt/GTX1080Ti/

    This directory should contain a file called "torque.sim" that contains commands used to launch the simulation during regression tests. We will modify this file to enable us to re-run the regression test in gdb. This directory should also contain a file containing the standard output during the regression test. This file will end in .o where is the torque queue manager job number. For the running example for me this file is called "hotspot-rodinia-2.0-ft-30_6_40___data_result_30_6_40_txt.o2". Open this file to determine the LD_LIBRARY_PATH settings used when launching the simulation. Look for a line that starts "doing: export LD_LIBRARY_PATH" and copy the entire line starting with "export LD_LIBRARY_PATH ..."

  9. Paste the "export LD_LIBRARY_PATH ..." line into the bash shell to set LD_LIBRARY_PATH. E.g.,

    export LD_LIBRARY_PATH=/home/runner/gpgpu-sim_simulations/util/job_launching/../../sim_run_4.2/gpgpu-sim-builds/libcudart_gpgpu-sim_git-commit-177d02254ae38b6331b17dd6cd139b570a03c589_modified_0.so:/gpgpu-sim/usr/local/gcc-4.5.4/lib64:/gpgpu-sim/usr/local/gcc-4.5.4/lib:/gpgpu-sim/usr/local/gcc-4.5.4/lib/gcc/x86_64-unknown-linux-gnu/lib64/:/gpgpu-sim/usr/local/gcc-4.5.4/lib/gcc/x86_64-unknown-linux-gnu/4.5.4/:/usr/lib/x86_64-linux-gnu:/home/runner/gpgpu-sim_distribution/lib/gcc-4.5.4/cuda-4020/release:/gpgpu-sim/usr/local/gcc-4.5.4/lib64:/gpgpu-sim/usr/local/gcc-4.5.4/lib:/gpgpu-sim/usr/local/gcc-4.5.4/lib/gcc/x86_64-unknown-linux-gnu/lib64/:/gpgpu-sim/usr/local/gcc-4.5.4/lib/gcc/x86_64-unknown-linux-gnu/4.5.4/:/usr/lib/x86_64-linux-gnu:
    
  10. In the same shell, build the debug version of GPGPU-Sim then return to the directory above:

    pushd ~/gpgpu-sim_distribution/
    source setup_environment debug
    make
    popd
    
  11. Open and edit torque.sim and preface the very last line with "gdb --args ". After editing the last line in torque.sim should look something like:

    gdb --args /home/runner/gpgpu-sim_simulations/util/job_launching/../../benchmarks/bin/4.2/release/hotspot-rodinia-2.0-ft 30 6 40 ./data/result_30_6_40.txt
    
  12. Re-run the regression test in gdb by sourcing the torque.sim file:

    . torque.sim
    

    This will put you at the (gdb) prompt. Set up any breakpoints needed and run.
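Steps 8, 9 and 11 above involve copying a line out of a log and editing torque.sim by hand, which can be scripted when you debug several failures in a row. The following is a sketch under the assumptions stated in those steps: the log contains a line starting with "doing: export LD_LIBRARY_PATH", and the command to debug is the very last line of torque.sim. Both helper names are hypothetical.

```shell
# ld_path_from_log LOG: print the first "export LD_LIBRARY_PATH=..." command
# recorded in a regression stdout log (the text after "doing: ").
ld_path_from_log() {
    sed -n 's/^doing: \(export LD_LIBRARY_PATH.*\)$/\1/p' "$1" | head -n 1
}

# gdbify_last_line FILE: prefix the last line of FILE with "gdb --args ",
# unless it already starts that way. Uses a temporary file instead of
# in-place editing so it behaves the same with GNU and BSD sed.
gdbify_last_line() {
    sed '$ { /^gdb --args /!s/^/gdb --args /; }' "$1" > "$1.tmp" &&
        mv "$1.tmp" "$1"
}

# Example, run inside the simulation directory (log name from the running example):
#   eval "$(ld_path_from_log hotspot-rodinia-2.0-ft-30_6_40___data_result_30_6_40_txt.o2)"
#   gdbify_last_line torque.sim
#   . torque.sim
```

If torque.sim ends with a blank line, the prefix lands on that blank line, so check the file before sourcing it.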

gpgpu-sim_distribution's People

Contributors

aamirraihan, aamodt, amruth-s, andrewboktor, bftf, brad-mengchi, cng123, coffeebeforearch, deval281shah, eltantawy, gangmul12, gjulianm, jooybar, jwang323, lucylufei, mkhairy, negargoli, negargoli93, pigrew, psuchita, qqldd, redcarrottt, rgreen, rspliet, shen203, shenjiangqiu, speverel, sspenst, tayler-hetherington, tgrogers

gpgpu-sim_distribution's Issues

No PTX sections found with sm capability

How to solve this error?
dev branch, above sm_10,

Error: No PTX sections found with sm capability that is lower than current forced maximum capability
minimum ptx capability found = 0, maximum forced ptx capability = 20
User might want to change either the forced maximum capability from gpgpusim configuration or update the compilation to generate the required PTX version

Supporting CUDA 7.5

We have a couple of problems running with cuda 7.5 according to an email discussion I had with a GPGPU-Sim user.
He was able to get his benchmark(s) running by doing some modifications which are available on his/her fork https://github.com/andpic/gpgpu-sim_distribution

Reported by GPGPU-Sim user:
OpenCL 1.1 C++ API compatibility. I added to gpgpu-sim's source code a couple of API functions that are necessary to be able to compile code with the C++ interface. In particular, the "clRetain" functions and "clCreateKernelsInProgram".
PTX parsing. The problem was due to a constant argument passed to the kernel. In other words, gpgpu doesn't like when kernel arguments are not passed with buffers. I fixed the problem by just changing that .const to blank space during the parsing phase.
ptxas output parsing. Ptxas outputs an extra line with the gmem keyword, which is not recognised by gpgpu-sim's parser. I added a bash trick to exclude that line when the file is read.

The simulator compiles and runs well with the latest CUDA 7.5.

dram latency cycles

In the GTX480 configs folder, the dram_latency is defined as 100 core cycles. Can anyone help me understand how did they arrive at this number?

Thank you.

issue in make SDK

Hi, i am using the given VM, but when i am going to make the SDK, an error occurs "/usr/bin/ld: cannot find -lrendercheckgl_x86_64-lGL". Do you know how to fix it? Thank you

Makefile.makedepend no such file

while sourcing setup_environment was successful, I get an error that can not find uda-sim/Makefile.makedepend

`mahmood@vb:~/gpgpu-sim_distribution$ make

    Building GPGPU-Sim version 3.2.2 (build ) with CUDA version 4.2

if [ ! -d lib/gcc-4.6.3/cuda-4020/release ]; then mkdir -p lib/gcc-4.6.3/cuda-4020/release; fi;
if [ ! -d /home/mahmood/gpgpu-sim_distribution/build/gcc-4.6.3/cuda-4020/release/libcuda ]; then mkdir -p /home/mahmood/gpgpu-sim_distribution/build/gcc-4.6.3/cuda-4020/release/libcuda; fi;
if [ ! -d /home/mahmood/gpgpu-sim_distribution/build/gcc-4.6.3/cuda-4020/release/cuda-sim ]; then mkdir -p /home/mahmood/gpgpu-sim_distribution/build/gcc-4.6.3/cuda-4020/release/cuda-sim; fi;
if [ ! -d /home/mahmood/gpgpu-sim_distribution/build/gcc-4.6.3/cuda-4020/release/cuda-sim/decuda_pred_table ]; then mkdir -p /home/mahmood/gpgpu-sim_distribution/build/gcc-4.6.3/cuda-4020/release/cuda-sim/decuda_pred_table; fi;
if [ ! -d /home/mahmood/gpgpu-sim_distribution/build/gcc-4.6.3/cuda-4020/release/gpgpu-sim ]; then mkdir -p /home/mahmood/gpgpu-sim_distribution/build/gcc-4.6.3/cuda-4020/release/gpgpu-sim; fi;
if [ ! -d /home/mahmood/gpgpu-sim_distribution/build/gcc-4.6.3/cuda-4020/release/libopencl ]; then mkdir -p /home/mahmood/gpgpu-sim_distribution/build/gcc-4.6.3/cuda-4020/release/libopencl; fi;
if [ ! -d /home/mahmood/gpgpu-sim_distribution/build/gcc-4.6.3/cuda-4020/release/libopencl/bin ]; then mkdir -p /home/mahmood/gpgpu-sim_distribution/build/gcc-4.6.3/cuda-4020/release/libopencl/bin; fi;
if [ ! -d /home/mahmood/gpgpu-sim_distribution/build/gcc-4.6.3/cuda-4020/release/intersim2 ]; then mkdir -p /home/mahmood/gpgpu-sim_distribution/build/gcc-4.6.3/cuda-4020/release/intersim2; fi;
if [ ! -d /home/mahmood/gpgpu-sim_distribution/build/gcc-4.6.3/cuda-4020/release/cuobjdump_to_ptxplus ]; then mkdir -p /home/mahmood/gpgpu-sim_distribution/build/gcc-4.6.3/cuda-4020/release/cuobjdump_to_ptxplus; fi;
if [ ! -d /home/mahmood/gpgpu-sim_distribution/build/gcc-4.6.3/cuda-4020/release/gpuwattch ]; then mkdir -p /home/mahmood/gpgpu-sim_distribution/build/gcc-4.6.3/cuda-4020/release/gpuwattch; fi;
if [ ! -d /home/mahmood/gpgpu-sim_distribution/build/gcc-4.6.3/cuda-4020/release/gpuwattch/cacti ]; then mkdir -p /home/mahmood/gpgpu-sim_distribution/build/gcc-4.6.3/cuda-4020/release/gpuwattch/cacti; fi;
make -C ./src/cuda-sim/ depend
make[1]: Entering directory /home/mahmood/gpgpu-sim_distribution/src/cuda-sim'
Makefile:149: /home/mahmood/gpgpu-sim_distribution/build/gcc-4.6.3/cuda-4020/release/cuda-sim/Makefile.makedepend: No such file or directory
touch /home/mahmood/gpgpu-sim_distribution/build/gcc-4.6.3/cuda-4020/release/cuda-sim/Makefile.makedepend
makedepend -f/home/mahmood/gpgpu-sim_distribution/build/gcc-4.6.3/cuda-4020/release/cuda-sim/Makefile.makedepend -p/home/mahmood/gpgpu-sim_distribution/build/gcc-4.6.3/cuda-4020/release/cuda-sim/ cuda_device_printf.cc cuda-sim.cc instructions.cc memory.cc ptx_ir.cc ptx_loader.cc ptx_parser.cc ptx_sim.cc ptx-stats.cc 2> /dev/null
make[1]: *** [depend] Error 127
make[1]: Leaving directory /home/mahmood/gpgpu-sim_distribution/src/cuda-sim'
make: *** [cuda-sim] Error 2
`

Running

Hello, I am using 'gpgpusim-dev ver.', and when running 'cudnn_samples_v7/mnistCUDNN' I get this error message:

"CUDNN failure
Error: CUDNN_STATUS_NOT_INITIALIZED"

My environment settings are as follows.
(Virtualbox)
Ubuntu 16.04.5 LTS
gcc 5.4.0 g++ 5.4.0
python 3.6.7
cuda 9.1
cudnn 7.0.5

Does anyone know how to solve that error message?

DNN workloads

Is it possible to simulate DNN workloads with the dev branch? Has anyone tried that?

Compilation issue with solution

While testing the dev branch with
ubuntu 16.04
gcc/g++ 4.8
cuda 7.5

If you hit the error ‘isnan’ was not declared in this scope, use std::isnan instead, or add using namespace std; at the top of the offending file.
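
A minimal sketch of the first option, assuming the offending call is an unqualified isnan on a float. The helper name is_invalid_result is made up for illustration and is not a GPGPU-Sim function:

```cpp
#include <cmath>  // declares std::isnan

// Hypothetical helper for illustration; not part of GPGPU-Sim.
// Qualifying the call as std::isnan avoids depending on a
// global-namespace isnan, which the compiler reported as undeclared.
bool is_invalid_result(float value) {
    return std::isnan(value);
}
```

Qualifying each call touches more lines than a using-directive, but it keeps the change local to the calls that failed.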

Any more recent cuda_runtime_api.cc?

I'm trying to run tensor core applications with the dev branch.
When I try to use certain CUDA API functions, such as cudaFuncSetAttributes and cudaMallocManaged, it says they are not in the library.

Is there a more recent CUDA library, or do I have to find another way to use tensor cores without those higher-version CUDA libraries?

How can I change GPGPU-Sim's CUDA compute capability from 5.2 to 7.0

I built GPGPU-Sim version 4.0.0 and used it with CUDA toolkit 9.1 and the CUDA SDK.

I successfully ran the matrix multiplication sample from the CUDA SDK.
However, I ran into an error related to CUDA compute capability when I tried to run the "cudaTensorCoreGemm" sample application from the CUDA SDK.

Execution starts normally, but after all the PTX configuration related lines are loaded, the following error appears.
[image]

I copied the configuration files from SM7-Titan V, which should have compute capability 7.0, and this is shown in the very first output of the run.
[image]

Do I have to change this compute capability number somehow inside the GPGPU-Sim simulator code? Or is there some other way to link the 7.0 capability written in the configuration file to the simulator?

Thank you for any advice

dev branch builds but gpgpusim does not run

I'm trying to run some simulations on gpgpu-sim. I'm using the dev branch that supports some of the newer CUDA versions. The build succeeds but when I try to run a program, the CUDA calls go to the system installation of CUDA instead of GPGPU-Sim. No errors or warnings are thrown. I have copied config files to the folder where I run the program from. No modifications were made to the config files from the tested_configs folder.

I'm using nvidia-docker (nvidia/cuda:9.1-devel) and have installed all the dependencies. Environment setup and build are successful.

My $PATH is /root/gpgpu-sim_distribution/bin/:usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
My $CUDA_INSTALL_PATH is /usr/local/cuda

Any help would be highly appreciated

Running OpenCL but get nothing in the report

I used g++ to compile an OpenCL project, then executed the code with "run_gpgpu-sim.sh". However, it only output:

GPGPU-Sim finished running "/home/gpgpu-sim/a.out"
Used rundir=/home/gpgpu-sim/GTX480_rundir

How can I get a detailed report like the one produced when running a C program?
I am using the fully set-up virtual machine that you provide.

gpgpusim has been built but unable to run a cuda application

In the dev branch it is stated that, after building gpgpusim, CUDA-based applications can be run by changing the CUDA library paths to the gpgpusim library folder. The GPU config file should also be in the working directory.

I did that, but the application says no CUDA device was found. I also ran deviceQuery, and it also reported that there is no CUDA device.

mahmood@u1604:~/gpgpu-sim_distribution$ echo $LD_LIBRARY_PATH

mahmood@u1604:~/gpgpu-sim_distribution$ source setup_environment release
GPGPU-Sim version 3.2.2 (build ) WARNING ** GPGPU-Sim version 3.2.2 not fully tested with CUDA version 7.5 (please see README)
configured with GPUWattch.
setup_environment succeeded
mahmood@u1604:~/gpgpu-sim_distribution$ echo $LD_LIBRARY_PATH
/home/mahmood/gpgpu-sim_distribution/lib/gcc-4.8.5/cuda-7050/release:
mahmood@u1604:~/gpgpu-sim_distribution$ ls /home/mahmood/gpgpu-sim_distribution/lib/gcc-4.8.5/cuda-7050/release
libcudart.so    libcudart.so.4    libcudart.so.6.0  libcudart.so.8.0
libcudart.so.2  libcudart.so.5.0  libcudart.so.6.5
libcudart.so.3  libcudart.so.5.5  libcudart.so.7.5
mahmood@u1604:~/gunrock/build/bin$ ls
bc   chesapeake.mtx            hits   shared_lib_bc   shared_lib_example  sssp
bfs  config_quadro_islip.icnt  pr     shared_lib_bfs  shared_lib_pr       topk
cc   gpgpusim.config           salsa  shared_lib_cc   shared_lib_sssp     wtf
mahmood@u1604:~/gunrock/build/bin$ ./bfs market chesapeake.mtx --src=0 --unidirected
Loading Matrix-market coordinate-formatted graph ...
Reading from chesapeake.mtx:
  Parsing MARKET COO format (39 nodes, 340 directed edges)... Done parsing (0s).
  Converting 39 vertices, 340 directed edges (unordered tuples) to CSR format...
Done converting (0s).

Degree Histogram (39 vertices, 340 edges):
    Degree   0: 0 (0.00%)
    Degree 2^0: 0 (0.00%)
    Degree 2^1: 1 (2.56%)
    Degree 2^2: 22 (56.41%)
    Degree 2^3: 13 (33.33%)
    Degree 2^4: 2 (5.13%)
    Degree 2^5: 1 (2.56%)

  Converting 39 vertices, 340 directed edges (unordered tuples) to CSR format...
Done converting (0s).
Source vertex: 0
Using 1 GPU: [ 0 ].
[/home/mahmood/gunrock/gunrock/util/info.cuh, 470 @ gpu 32764] cudaGetDevice failed (CUDA error 38: no CUDA-capable device is detected)
[/home/mahmood/gunrock/gunrock/util/test_utils.cu, 61 @ gpu 0] cudaSetDevice failed. (CUDA error 38: no CUDA-capable device is detected)
[/home/mahmood/gunrock/gunrock/util/info.cuh, 484 @ gpu 0] cudaStreamCreate failed. (CUDA error 38: no CUDA-capable device is detected)
CODE REQUESTED INVALID CUDA DEVICE -2050039464
mahmood@u1604:~/NVIDIA_CUDA-7.5_Samples/1_Utilities/deviceQuery$ make
"/usr/local/cuda-7.5"/bin/nvcc -ccbin g++ -I../../common/inc  -m64    -gencode arch=compute_20,code=sm_20 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_52,code=compute_52 -o deviceQuery.o -c deviceQuery.cpp
"/usr/local/cuda-7.5"/bin/nvcc -ccbin g++   -m64      -gencode arch=compute_20,code=sm_20 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_52,code=compute_52 -o deviceQuery deviceQuery.o 
mkdir -p ../../bin/x86_64/linux/release
cp deviceQuery ../../bin/x86_64/linux/release
mahmood@u1604:~/NVIDIA_CUDA-7.5_Samples/1_Utilities/deviceQuery$ ../../bin/x86_64/linux/release/deviceQuery 
../../bin/x86_64/linux/release/deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 38
-> no CUDA-capable device is detected
Result = FAIL

So, what should I do now?

Runtime Error: "undefined symbol: yyparse"

I have successfully built the gpgpusim dev branch, but when I try to simulate my own kernel (which does not require cuBLAS), I get the following runtime error:

symbol lookup error: /ROOT/gpgpu-sim_distribution/lib/gcc-5.4.0/cuda-9000/release/libcudart.so.9.0: undefined symbol: yyparse

Here is the configuration of my setup:

  • Ubuntu 16.04
  • CUDA 9.0
  • GCC 5.4.0
  • GPGPU-Sim 4.0.0

Would really appreciate your help.
Thanks

Trig functions (sinf, cosf) incorrect PTX operands on CUDA 8.0

Using dev branch commit 551da3e, CUDA 8.0, and gcc-5.4.0

I haven't checked other commits or CUDA versions.

Compiling the following kernel (full code here)

__global__ void kernel(float *a, int offset)
{
  int i = offset + threadIdx.x + blockIdx.x*blockDim.x;
  float x = (float)i;
  float s = sinf(x);  // Generates problematic instructions
  float c = cosf(x); // Generates problematic instructions
  a[i] = a[i] + sqrtf(s*s+c*c); // Does not cause a problem
}

Will generate code causing a ptx runtime error:

./ptx_ir.h:1039: const operand_info& ptx_instruction::operand_lookup(unsigned int) const: Assertion `n < m_operands.size()' failed.

This is caused by the madc line, below. gpgpu-sim expects madc to have a fifth operand specifying the carry.

...
BB0_4:
.pragma "nounroll";
ld.const.u32 %r82, [%rd28];
	{
mad.lo.cc.u32 %r80, %r82, %r3, %r175;
madc.hi.u32 %r175, %r82, %r3, 0;
}
...

The program runs to completion if Taylor expansion versions of sinf and cosf are used instead of the cuda intrinsics.

Fail to prune PTX section list

Used Environments

  • Ubuntu 16.04.5
  • CUDA 8.0
  • cuDNN 7.1.4
  • gcc 5.4.0
  • g++ 5.4.0
  • python 2.7.12
  • GPU: GTX 1080 Ti

Problem Situation

Hello, I'm trying to run pytorch-gpgpu-sim but I'm facing a problem.

After it extracts all the PTX files from libcudnn.so, it fails to get the pruned PTX section list, as shown in the following message.

Extracting PTX file and ptxas options  194: libcudnn.194.sm_50.ptx -arch=sm_50
Extracting specific PTX file named libcudnn.195.sm_61.ptx
Extracting PTX file and ptxas options  195: libcudnn.195.sm_61.ptx -arch=sm_61
Extracting specific PTX file named libcudnn.196.sm_35.ptx
Extracting PTX file and ptxas options  196: libcudnn.196.sm_35.ptx -arch=sm_35
Error: No PTX sections found with sm capability that is lower than current forced maximum capability
 minimum ptx capability found = 0, maximum forced ptx capability = 61
 User might want to change either the forced maximum capability from gpgpusim configuration or update the compilation to generate the required PTX version

I guess the problem is in how the sm capability is set, but I don't know what I have to modify.

if(prunedList.empty()){
printf("Error: No PTX sections found with sm capability that is lower than current forced maximum capability \n minimum ptx capability found = %u, maximum forced ptx capability = %u \n User might want to change either the forced maximum capability from gpgpusim configuration or update the compilation to generate the required PTX version\n",min_ptx_capability_found,forced_max_capability);
abort();
}

Updates planned for gpgpu-sim

Hi,

I am planning to start using gpgpu-sim for my GPU architecture research. I stumbled upon this branch yesterday and saw that it has quite a lot of updates compared with the gpgpu-sim on the master branch. Is there a plan to do a fully documented release of this new version? I am curious about which new features will be supported here. Thank you.

gpgpusim with pytorch_gpgpusim

Issue: Syntax and Parse errors in the libcudnn ptx files.

Set up:
GPGPUSIM version 4.0.0
GCC: 4.9.4
CUDA: 9.1
CUDNN: 7.1
PYTORCH_BIN: /usr/local/cuda-9.1/lib64/libcudnn.so

Program: saxpy.cu link: https://devblogs.nvidia.com/easy-introduction-cuda-c-and-c/

Description: Standalone GPGPUSIM works fine with the above setup, and the example program runs successfully.

After installing PyTorch_GPGPUSIM and setting PYTORCH_BIN = /usr/.../CUDA-9.1/.../libcudnn.so, the above-mentioned program fails to execute. The simulator extracts a large number of PTX files from libcudnn and then reports syntax and parse errors.

Simulation log file attached here.
out_pytorch_sm70.txt

Thank you.

GPGPU-sim with cuDNN

Environment
Ubuntu 16.04.4 LTS
gcc/g++ 5.4.0 20160609
Python 2.7
CUDA 8.0
cuDNN 7.1.4
gpgpu-sim_distribution dev branch
pytorch-gpgpu-sim (removed the git dependencies of nervanagpu @ d4eefd5, since the repo is no longer there)

PYTORCH_BIN /usr/lib/x86_64-linux-gnu/libcudnn.so

The configurations used are from the configs folder of gpgpu-sim dev branch

MNIST
I use the MNIST sample from here (following deval281shah's suggestions in another discussion).
https://github.com/gpgpu-sim/gpgpu-sim_simulations

Config
I use TITANV config.
I also tried TITANX config, but a deadlock happened with that configuration.

The simulation ran for 39 minutes. I checked that it simulated a number of kernels and reported related information such as IPC.

At the beginning, it generated a large number of warnings saying it cannot find all the required device functions.

Warning: cannot find deviceFun maxwell_zgemmBatched_32x32_raggedMn_ct
Warning: cannot find deviceFun maxwell_zgemmBatched_64x32_raggedMn_ct
Warning: cannot find deviceFun maxwell_zgemmBatched_32x32_raggedMn_cn
Warning: cannot find deviceFun maxwell_zgemmBatched_64x32_raggedMn_cn
Warning: cannot find deviceFun maxwell_zgemmBatched_32x32_raggedMn_tc
Warning: cannot find deviceFun maxwell_zgemmBatched_64x32_raggedMn_tc
Warning: cannot find deviceFun maxwell_zgemmBatched_32x32_raggedMn_tt
Warning: cannot find deviceFun maxwell_zgemmBatched_64x32_raggedMn_tt
Warning: cannot find deviceFun maxwell_zgemmBatched_32x32_raggedMn_tn
Warning: cannot find deviceFun maxwell_zgemmBatched_64x32_raggedMn_tn
Warning: cannot find deviceFun maxwell_zgemmBatched_32x32_raggedMn_nc

Therefore, cudaLaunchKernel fails to find the device function.

However, it ends up with
...

GPGPU-Sim PTX: Setting up arguments for 4 bytes starting at 0x7ffc0b96aa68..

GPGPU-Sim PTX: cudaLaunch for 0x0x4321f0 (mode=performance simulation) on stream 0
GPGPU-Sim PTX: ERROR launching kernel -- no PTX implementation found for 0x4321f0

Has anyone encountered this problem before? Any suggestions?

Thank you so much for your help.

ispass2009 workloads for the dev branch

For the master branch, it is possible to compile the ispass workloads. However, with the dev branch, there is no common.mk and hence the workloads cannot be built. Is there a new set for the dev branch?

Update test

I have successfully built dev branch on the following system

    ubuntu 16.04
    CUDA 8 GA1
    gcc 5.4

The following packages are needed

For CUDA:

    sudo apt-get install nvidia-384-dev libglu1-mesa-dev libxi-dev libxmu-dev
    bash cuda_8.0.44_linux.run

I didn't allow the run file to install the driver since I did that via apt-get.

For GPGPUSIM:

sudo apt-get install flex bison xutils-dev zlib1g-dev

Changes in the code:
1- Replace isnan() with std::isnan() in instructions.cc and cuda-math.h
2- Replace std::cout << "Failed to execute: " << cmd << std::endl; with std::cout << "Failed to execute: " << cmd.str() << std::endl; in cuda_runtime_api.cc
3- Replace std::cout << "Using command: " << cmd << std::endl; with std::cout << "Using command: " << cmd.str() << std::endl; in cuda_runtime_api.cc
4- Replace std::cout << "Trying to parse " << libcodfn << std::endl; with std::cout << "Trying to parse " << libcodfn.str() << std::endl; in cuda_runtime_api.cc
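
Changes 2 to 4 share one pattern, sketched below under the assumption that cmd is a std::ostringstream (the variable's real type in cuda_runtime_api.cc is not shown here). Streaming the stream object itself is rejected when compiling as C++11 or later, whereas .str() yields the accumulated text. The helper name and the command text are made up for illustration:

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: builds the log message the way the patched lines do.
std::string describe_command(const std::ostringstream &cmd) {
    std::ostringstream msg;
    // msg << cmd;  // does not compile as C++11: no operator<< takes a stream
    msg << "Using command: " << cmd.str();  // inserts the buffered text
    return msg.str();
}
```

Called with a stream that holds "echo hello", the helper returns "Using command: echo hello".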

P.S: I wasn't able to create a new pull request.

gpgpusim : Aborted (core dumped) When run gpgpusim_simulation.

I was trying to run the gpgpu-sim_simulation.

When I tried to run the benchmark ibackprop, the following error popped up:

terminate called after throwing an instance of 'std::ios_base::failure[abi:cxx11]'
  what():  basic_ios::clear: iostream error
Aborted (core dumped)

The command I used is the following:

/root/gpgpusim-dev/gpgpu-sim_distribution/gpgpu-sim_simulations/util/job_launching/../../benchmarks/bin/4.2/release/backprop-rodinia-2.0-ft 4096 ./data/result-4096.txt

My work environment is as follows:

echo $GPGPUSIM_CONFIG

> gcc-4.5.3/cuda-4020/release

echo $GPGPUSIM_ROOT

> /root/gpgpusim-dev/gpgpu-sim_distribution

My CUDA version is 4.2.

nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2012 NVIDIA Corporation
Built on Thu_Apr__5_00:24:31_PDT_2012
Cuda compilation tools, release 4.2, V0.2.1221

Is the error due to my CUDA version? Does anyone know how to fix this issue?

Failed to run cudaGetDeviceCount() with pytorch-gpgpu-sim

Used Environments

  • Ubuntu 16.04.5
  • CUDA 8.0
  • cuDNN 7.1.4
  • gcc 5.4.0
  • g++ 5.4.0
  • python 2.7.12

Problem Situation

Hello, I am trying to run pytorch-gpgpu-sim with gpgpu-sim_distribution dev branch.

I've built gpgpu-sim_distribution and pytorch-gpgpu-sim successfully.

However, I could not run my PyTorch model.

When I try to run it, PyTorch terminates with the following message.

THCudaCheck FAIL file=/home/redcarrottt/pytorch-gpgpu-sim/aten/src/THC/THCGeneral.cpp line=74 error=30 : unknown error
Traceback (most recent call last):
  File "./infer.py", line 130, in <module>
    infer(net, args)
  File "./infer.py", line 47, in infer
    net_gpu = net.cuda()
  File "/home/redcarrottt/.local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 258, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/home/redcarrottt/.local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 185, in _apply
    module._apply(fn)
  File "/home/redcarrottt/.local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 191, in _apply
    param.data = fn(param.data)
  File "/home/redcarrottt/.local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 258, in <lambda>
    return self._apply(lambda t: t.cuda(device))
RuntimeError: cuda runtime error (30) : unknown error at /home/redcarrottt/pytorch-gpgpu-sim/aten/src/THC/THCGeneral.cpp:74

As far as I know, the CUDA function that causes the error is cudaGetDeviceCount()
(called at aten/src/THC/THCGeneral.cpp:74).

I guess that my PyTorch linking configuration is wrong.

Does anyone know the exact reason?
Thank you.

runtime error: "ImportError: /usr/local/cuda/lib64/libcublas.so.4: undefined symbol: cudaMemsetAsync"

Hi, all,
I am new to CNNs and am trying to run cuda-convnet on the gpgpu-sim_distribution simulator.
Cuda-convnet runs well on my server, but when I run it with the gpgpusim simulator, an error occurs: "ImportError: /usr/local/cuda/lib64/libcublas.so.4: undefined symbol: cudaMemsetAsync".
The details are as follows:

Importing _ConvNet C++ module
Traceback (most recent call last):
  File "convnet.py", line 203, in <module>
    model = ConvNet(op, load_dic)
  File "convnet.py", line 43, in __init__
    IGPUModel.__init__(self, "ConvNet", op, load_dic, filename_options, dp_params=dp_params)
  File "/root/convnet/trunk/gpumodel.py", line 88, in __init__
    self.import_model()
  File "convnet.py", line 49, in import_model
    self.libmodel = __import__(lib_name)
ImportError: /usr/local/cuda/lib64/libcublas.so.4: undefined symbol: cudaMemsetAsync

The weird thing is that cuda-convnet runs well on my server. After I change my $PATH and $LD_LIBRARY_PATH back to their original values, the error disappears.
I set up my gpgpusim simulator following the tutorial and copied the config files into convnet's working directory.

I built gpgpusim with gcc-5.4.1. The GPU config I chose is GTX480.
My CUDA version is 4.0.
Here is the output for my libcublas.so:
nm -D /usr/local/cuda/lib64/libcublas.so | grep 'cuda'

         U __cudaRegisterFatBinary
         U __cudaRegisterFunction
         U __cudaRegisterTexture
         U __cudaUnregisterFatBinary
         U cudaBindTexture
         U cudaConfigureCall
         U cudaCreateChannelDesc
         U cudaEventCreateWithFlags
         U cudaEventDestroy
         U cudaEventQuery
         U cudaEventRecord
         U cudaEventSynchronize
         U cudaFree
         U cudaFuncGetAttributes
         U cudaGetDevice
         U cudaGetDeviceProperties
         U cudaGetExportTable
         U cudaGetLastError
         U cudaLaunch
         U cudaMalloc
         U cudaMemcpy
         U cudaMemcpy2D
         U cudaMemcpy2DAsync
         U cudaMemcpyAsync
         U cudaMemsetAsync
         U cudaSetupArgument
         U cudaThreadSynchronize
         U cudaUnbindTexture

ll /usr/local/cuda/lib64/libcublas.so

lrwxrwxrwx 1 root root 14 Mar 17 16:10 /usr/local/cuda/lib64/libcublas.so -> libcublas.so.4*

Does anyone know the reason for this error and how to fix it? Thanks a lot in advance!

bool m_current_response_ready;

Hi everyone,
in gpu-cache.h, class mshr_table contains this code:

bool m_current_response_ready;
std::list<new_addr_type> m_current_response;

I don't see m_current_response_ready used anywhere in the source code.

Is something wrong?

Error in compilation

Hi.

I have a very strange error that I do not know how to resolve.

The problem occurs at the end of compilation, when the linker ld is executed. This is the output:

/usr/bin/ld: cannot find -lGL
collect2: error: ld returned 1 exit status
Makefile:143: recipe for target 'lib/gcc-5.4.0/cuda-4000/release/libcudart.so' failed

I am using Ubuntu 16.04 and the version of ld that came with the operating system. I have searched for this option but have not been able to find anything.

Can somebody help me?

Thank you very much.

pytorch-gpgpu-sim "PYTORCH_BIN"

Hello, I am trying to simulate pytorch-gpgpu-sim with the gpgpu-sim dev branch.

The README of pytorch_gpgpusim says "export PYTORCH_BIN=/path/to/libcudnn.so", which I think means it links dynamically against cuDNN.
Does PyTorch really use that "PYTORCH_BIN" path?
Is it right to link cuDNN dynamically when using PyTorch?

Thank you.

GPU memory size over 4GB

I'm trying to run a very large application that consumes over 4GB of GPU memory on the GPGPU-Sim dev branch with the GTX1080Ti config. (CUDA 7.5, gcc-4.4.7, g++-4.4.7, Ubuntu 14.04)

However, it fails because of GPU memory corruption (possibly an overflow). So I inspected the GPGPU-Sim source code and found the "address_type" type in src/abstract_hardware_model.h.

src/abstract_hardware_model.h: line 70, typedef unsigned address_type;

I thought this might cause the memory overflow, so I changed "unsigned" to "unsigned long long" for 64-bit addressing, but that causes new problems with my GPU kernel. (My CUDA application works fine on a native GTX1080Ti GPU.)

Is it correct that the current version of GPGPU-Sim has a <4GB limitation? If so, how can I overcome it?
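
The suspected truncation can be shown in isolation. The sketch below assumes address_type is the 32-bit unsigned from the typedef quoted above. The function name and the sample addresses are made up for illustration and are not taken from GPGPU-Sim:

```cpp
#include <cstdint>

typedef unsigned address_type;  // 32 bits on common Linux targets (assumption)

// Hypothetical helper: stores a 64-bit device address in address_type,
// silently dropping everything above bit 31.
address_type narrow_address(std::uint64_t device_addr) {
    return static_cast<address_type>(device_addr);
}
```

Under that assumption, an address just past 4GB such as 0x100000010 comes back as 0x10, so it aliases a low address instead of raising an error.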

is this gpuwattch line size error?

In the GTX480 gpgpusim.config, the L1D block size is set to 128B. However, in gpuwattch_gtx480.xml, the L1D block size is set to 32B. Is there a reason for setting the values like that, or is this just an error?

Formatting warnings of printf

With the following specs
ubuntu 16.04
cuda 7.5
gcc/g++ 4.8

there are some warnings in ptx_parser.cc such as this

ptx_parser.cc:438:41: warning: format ‘%x’ expects argument of type ‘unsigned int’, but argument 2 has type ‘new_addr_type {aka long long unsigned int}’ [-Wformat=]
               addr+addr_pad + num_bits/8);

That line of code is

printf("from 0x%x to 0x%lx (global memory space) %u\n",   
                    addr+addr_pad,    
                    addr+addr_pad + num_bits/8,
                    g_const_alloc++);

The data types are

   size_t num_bits;
   unsigned addr_pad;
   new_addr_type addr;

and the last one is actually unsigned long long. So, I think the correct format is %llx for all of the statements that use addr.
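
A sketch of the corrected call, assuming new_addr_type is unsigned long long as stated above. The wrapper name format_alloc_range is hypothetical, and snprintf is used only so that the result can be inspected:

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

typedef unsigned long long new_addr_type;  // as stated in the post above

// Hypothetical wrapper around the message printed in ptx_parser.cc.
std::string format_alloc_range(new_addr_type addr, unsigned addr_pad,
                               std::size_t num_bits, unsigned alloc_id) {
    char buf[128];
    // Both range endpoints are computed in unsigned long long,
    // so both conversions use %llx.
    std::snprintf(buf, sizeof(buf),
                  "from 0x%llx to 0x%llx (global memory space) %u\n",
                  addr + addr_pad, addr + addr_pad + num_bits / 8, alloc_id);
    return std::string(buf);
}
```

With %x, an address above 4GB would be printed from a mismatched argument type, which is what the warning points at.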

Assertion `a.m_init && b.m_init' failed

For code written in CUDA, I get this error from the simulator

ptx_ir.h:116: bool type_info_key_compare::operator()(const type_info_key&, const type_info_key&) const: Assertion `a.m_init && b.m_init' failed.

The call stack shows that in the following function

type_info *symbol_table::get_array_type( type_info *base_type, unsigned array_dim ) 
{
   type_info_key t = base_type->get_key();
   t.set_array_dim(array_dim);
   type_info *pt;
   pt = m_types[t] = new type_info(this,t);       // HERE
   return pt;
}

The following assertion fails


struct type_info_key_compare {
   bool operator()( const type_info_key &a, const type_info_key &b ) const
   {
      assert( a.m_init && b.m_init );
      if ( a.m_space_spec < b.m_space_spec ) return true;
      if ( a.m_scalar_type_spec < b.m_scalar_type_spec ) return true;
      if ( a.m_vector_spec < b.m_vector_spec ) return true;
      if ( a.m_alignment_spec < b.m_alignment_spec ) return true;
      if ( a.m_extern_spec < b.m_extern_spec ) return true;
      if ( a.m_array_dim < b.m_array_dim ) return true;
      if ( a.m_is_function < b.m_is_function ) return true;

      return false;
   }
};

I cannot track down what happens here, even with GDB attached. Can someone shed some light? What is the purpose of that assertion?
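
As written, the assertion only checks that both keys have m_init set. A separate observation: the field-by-field comparison above can report a < b and b < a at the same time, because a later field is still consulted when an earlier one compares greater. If m_types is an ordered container keyed by this comparator (an assumption, since its declaration is not shown), it would need a strict weak ordering. A sketch of a lexicographic comparison with std::tie, using a cut-down hypothetical key with assumed int fields:

```cpp
#include <tuple>

// Cut-down, hypothetical stand-in for type_info_key; field types are assumed.
struct key_sketch {
    int m_space_spec;
    int m_scalar_type_spec;
    int m_array_dim;
};

// Lexicographic comparison: a later field is consulted only when all
// earlier fields are equal, which gives a strict weak ordering.
struct key_sketch_compare {
    bool operator()(const key_sketch &a, const key_sketch &b) const {
        return std::tie(a.m_space_spec, a.m_scalar_type_spec, a.m_array_dim) <
               std::tie(b.m_space_spec, b.m_scalar_type_spec, b.m_array_dim);
    }
};
```

For keys {1, 5, 0} and {2, 3, 0}, this comparator orders the first before the second only, whereas the chained if statements above would return true in both directions.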

libcudnn seems not to call cudaLaunchKernel in GPGPU-Sim.

I’ve tried to run cudnn_samples_v7 with GPGPU-Sim, but its cuDNN kernels do not run on GPGPU-Sim.
It prints the following messages when "g_debug_execution = 3".

GPGPU-Sim PTX: CUDA API function "cudaError_t cudaMemcpy(void*, const void*, size_t, cudaMemcpyKind)" has been called.
GPGPU-Sim PTX: cudaMemcpy(): devPtr = 0xc01a5300
GPGPU-Sim API: Stream Manager State
GPGPU-Sim API:    stream 0 has 1 operations
GPGPU-Sim API:       0 :  stream operation memcpy host-to-device
GPGPU-Sim: ** START simulation thread (detected work) **
GPGPU-Sim API: Stream Manager State
GPGPU-Sim API:    stream 0 has 1 operations
GPGPU-Sim API:       0 :  stream operation memcpy host-to-device
GPGPU-Sim API: stream 0 performing memcpy host-to-device
GPGPU-Sim PTX: copying 3136 bytes from CPU[0x7fffdc43aa00] to GPU[0xc01a5300] ...  done.
GPGPU-Sim: ** STOP simulation thread (no work) **
GPGPU-Sim: *** simulation thread starting and spinning waiting for work ***
Testing cudnnGetConvolutionForwardAlgorithm ...
Fastest algorithm is Algo 1
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.020256 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.029696 time requiring 3464 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.037888 time requiring 57600 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.070240 time requiring 207360 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.072352 time requiring 2057744 memory

GPGPU-Sim PTX: CUDA API function "cudaError_t cudaMalloc(void**, size_t)" has been called.
GPGPU-Sim PTX: allocating 46080 bytes on GPU starting at address 0xc01a6000
GPGPU-Sim PTX: cudaMallocing 46080 bytes starting at 0xc01a6000..

The cudaLaunchKernel function should be called after “Testing cudnnFindConvolutionForwardAlgorithm”, but it is never called.

On the other hand, if I run a CUDA sample (such as vectorAdd), it works well.

I guess that my cuDNN library does not call the cudaLaunchKernel function in GPGPU-Sim's 'libcudart.so'.
It seems to call cudaLaunchKernel in the original 'libcudart.so'.

ptx.tab.h not found

I have recently downloaded gpgpu-sim_distribution. I am using Ubuntu 14.04 (64-bit), gcc 4.4.7, CUDA toolkit 4.0, bison 2.4.1 and flex 2.5.35, with all other dependencies as mentioned in the README docs. But when I build gpgpu-sim, I get the following error:

ptx_parser.cc:30: fatal error: ptx.tab.h: No such file or directory
compilation terminated.
make[1]: *** [/home/ssb/gpgpu-sim_distribution/build/gcc-4.4.7/cuda-4000/release/cuda-sim/ptx_parser.o] Error 1
make[1]: Leaving directory `/home/ssb/gpgpu-sim_distribution/src/cuda-sim'
make: *** [cuda-sim] Error 2

Earlier, with the same software versions, I did not get any error and was able to run gpgpusim and the ispass2009 benchmarks.
Please advise what I have to do to get a successful build.

Thanks

not able to run pytorch-kernel

I tried to run PyTorch with gpgpusim-dev, but I found that gpgpu-sim is not able to register the CUDA kernels in PyTorch (such as kernelPointwiseApply1 in aten/src/THC/THCApply.cuh).
Does anyone know whether this is the behavior the authors intended, or is it my mistake?

Thank you

Used env:
Ubuntu 16.04.6
gcc&g++ 5.4.0
CUDA 8.0
cuDNN 7.1.4

unimplemented PTX ISA problem

I found that some parts of the PTX ISA are not implemented while I was parsing the PyTorch libraries
(found in the 151st PTX file in libcaffe2_gpu.so).

After examining the ptx.y and ptx.l files, the following directives seem not to be implemented:
.callprototype,
.calltargets,
.branchtargets,
(above are critical in control flow)
.byte,
.maxnreg,
.reqntid,
.section,
.struct,
.union

I hope there will be some progress on this.

Pytorch-GPGPU-SIM error

When I run GPGPU-SIM with plain CUDA, everything works well.
But when I run Pytorch-GPGPU-SIM, it does not work.

I don't know exactly what is wrong.

Here are my settings and my code. I use Docker.

Ubuntu : 16.04
gcc : 4.8.4
cuda : 8.0
cudnn : 6.0
[image: Untitled]

In the end I get this error and the run stops.
My PyTorch code is a very simple test.

import torch
from torch.autograd import Variable

a= torch.ones(2,2)
b= torch.ones(2,2)
print(a)

a=Variable(a, requires_grad=True).cuda()
b=Variable(a, requires_grad=True).cuda()
b= a+2
print(b)

It works well without gpgpu-sim, but under gpgpu-sim the error occurs at b = a+2.

Besides my code, many of the sample codes in /pytorch-gpgpu-sim/test/ produce the same kind of error. What is the problem? Please help me.

Problem with Debugger mode

Hi,
I exported the environment variable (export GPGPUSIM_DEBUG=1).
When I start executing the NVIDIA benchmark, some debugger commands are offered to the user,
but I don't understand them.
Here are the commands and my questions:

1. b <file>:<line> <thread uid> - set breakpoint
What does the file mean (is it a benchmark?), and how do I calculate the thread uid?
2. w <global address> - set watchpoint
Is this a global address range?
3. del <n> - delete breakpoint
What does n mean?
4. dp <n> - display pipeline contents on SM <n>
The shader cores in the CTAs are in their initial state when the program starts executing.
If I set n to 1, does it mean shader core 1 in every CTA (maybe 0~5), or shader core 1 in CTA 0?

Regards,

When launch an empty kernel, simulator would keep looping

When we launch a kernel with a grid such as gridDim(0,0,1), which is effectively an empty kernel, the launch succeeds, and gpu->active() is always true because the variable "pending" is true, while stream->op has nothing to do because there are no CTAs left in that kernel to execute.
The loop {}while(gpu->active) at src/gpgpusim_entrypoint.cc:102 therefore never stops.
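
One possible guard is sketched below with a hypothetical dim3-like struct rather than the simulator's own types. A grid with a zero in any dimension contains no CTAs, so such a launch could be treated as already complete instead of being left pending:

```cpp
// Hypothetical stand-in for CUDA's dim3; not a GPGPU-Sim type.
struct grid_dim_sketch {
    unsigned x, y, z;
};

// Total number of CTAs in the grid; zero if any dimension is zero.
unsigned long long total_ctas(const grid_dim_sketch &g) {
    return static_cast<unsigned long long>(g.x) * g.y * g.z;
}

// True when the launch has nothing to execute.
bool is_empty_launch(const grid_dim_sketch &g) {
    return total_ctas(g) == 0;
}
```

Where such a check would belong in the launch path is not something the report above settles, so this only illustrates the condition itself.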

cache_config::tag() will lose some bit

In gpu-cache.h:677, the tag(new_addr_type addr) function does
return addr & ~(m_line_sz-1). Since addr is unsigned long long and m_line_sz-1 is unsigned int, this code loses the upper 32 bits of addr.

For example:
#include <cstdio>

int main() {
    unsigned long long a = 0xFFFFFFFFFFFFFFFF;
    unsigned int mask = 16;          // 32-bit operand, like m_line_sz
    unsigned long long mask_2 = 16;  // 64-bit operand

    // ~(mask-1) is computed in 32 bits, then zero-extended before the AND
    unsigned long long result1 = a & ~(mask - 1);
    unsigned long long result2 = a & ~(mask_2 - 1);
    printf("%llx\n", result1);
    printf("%llx\n", result2);
    return 0;
}

the output is
fffffff0
fffffffffffffff0
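
One possible fix, sketched here with stand-in names rather than the actual cache_config members, is to widen the line size to the address type before subtracting and inverting, so that the mask keeps all of its upper bits set:

```cpp
typedef unsigned long long new_addr_type;

// Stand-in for cache_config::tag(); line_sz plays the role of m_line_sz.
new_addr_type tag_sketch(new_addr_type addr, unsigned line_sz) {
    // Widening first makes ~(...) a 64-bit mask instead of a
    // zero-extended 32-bit one.
    return addr & ~(static_cast<new_addr_type>(line_sz) - 1);
}
```

With a = 0xFFFFFFFFFFFFFFFF and a line size of 16, this returns 0xfffffffffffffff0, matching result2 above.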

disable L2 cache make error

GPGPU-Sim 3.x allows the L2 cache to be disabled.

However, the latest version does not.

Does anyone know about this problem?

boost version

It seems that the ispass benchmarks are not compatible with all boost versions. Also, it has not been stated which boost version the developers use. May I know which one?

gpgpusim : Aborted (core dumped)

I was trying to run the gpgpu-sim_simulation.

When I tried to run any of the benchmarks, they all showed the same error:

virtual void shader_core_config::init(): Assertion `toks' failed.
12260 Aborted (core dumped)

How to see the actual contents of memory?

To be more specific, in GPUTrafficManager::_GeneratePacket, I want to be able to look at the actual contents of the flit.

There's a line that seemed promising, f->data = data, but as far as I can tell it's just a mem_fetch object, which doesn't contain the actual data.
