StreamHPC is a software development company specializing in parallel software for many-core processors.
[TOC]
- task switching
- hardware concurrency
- Interprocess Communications (Microsoft)
- Inter-Process Communication (IPC) Introduction and Sample Code
- POSIX C pthread
- boost::thread
- C++11 std::thread
- Intel® 64 and IA-32 Architectures Software Developer Manuals
- Hotspots, FLOPS, and uOps: To-The-Metal CPU Optimization
8 commands to check CPU information on Linux:
- /proc/cpuinfo: this file contains details about individual CPU cores
- lscpu: prints the CPU hardware details in a user-friendly format
- cpuid: fetches CPUID information about Intel and AMD x86 processors
- nproc: prints the number of processing units available; this number is not always the same as the number of cores (for example, with hyper-threading each core may appear as two processing units)
- dmidecode: displays some information about the CPU, including the socket type, vendor name, and various flags
- hardinfo: produces a large report about many hardware parts by reading files from the /proc directory
- lshw -class processor: lshw shows information about various hardware parts by default; the '-class' option picks out information about a specific hardware part
- inxi: a script that uses other programs to generate a well-structured, easy-to-read report about various hardware components on the system
Sysbench -- Scriptable database and system performance benchmark, a cross-platform and multi-threaded benchmark tool
sysbench cpu --cpu-max-prime=20000 --threads=4 run  # sysbench 1.0+ syntax; older releases use --test=cpu and --num-threads=4
htop - an interactive process viewer for Unix
- htop explained - Explanation of everything you can see in htop/top on Linux
- x86 Assembly
- winasm: The x86 Assembly community and official home of WinAsm Studio and HiEditor
- Easy Code Visual assembly IDE
- 0xAX/asm: Learning assembly for linux-x64
- Intel MMX & SSE
- ARM NEON
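OpenMP is an API specification for parallel programming: an Application Program Interface (API) that may be used to explicitly direct multi-threaded, shared-memory parallelism.
OpenMP has two common styles of parallel development: the first parallelizes a serial program through simple fork/join, and the second parallelizes a serial program using the single program, multiple data (SPMD) model.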
OpenMP in CMakeLists.txt:
find_package(OpenMP)
if (OPENMP_FOUND)
  set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${OpenMP_C_FLAGS}")
  set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${OpenMP_CXX_FLAGS}")
  set (CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} ${OpenMP_EXE_LINKER_FLAGS}")
endif()
OpenACC is a user-driven, directive-based, performance-portable parallel programming model designed for scientists and engineers interested in porting their codes to a wide variety of heterogeneous HPC hardware platforms and architectures with significantly less programming effort than required with a low-level model.
Intel Threading Building Blocks (TBB) lets you easily write parallel C++ programs that take full advantage of multicore performance, that are portable and composable, and that have future-proof scalability.
For the Raspberry Pi GPU benchmark, use the OpenGL 2.1 test that comes with GeeXLab
msalvaris/gpu_monitor: Monitor your GPUs whether they are on a single computer or in a cluster
watch -n 10 nvidia-smi # refresh the GPU status every 10 seconds
- ARM MALI GPU
- Nvidia GPU