Git Product home page Git Product logo

tsp's Introduction

Solving the Traveling Salesman Problem (TSP) with brute force and parallelism just for teaching and illustrative purpose.

Brute force parallelism to solve TSP

The code here is mostly adapted from the very nice talk by David Olsen, Faster Code Through Parallelism on CPUs and GPUs at CppCon 2019.

I've just slightly changed the way the permutations are computed which turns out to be slightly faster than the original.

See also companion slides by D. Olsen (Nvidia) at GTC2019 s9770-c++17-parallel-algorithms-for-nvidia-gpus-with-pgi-c++.pdf

Timings

Timings reported here are measured on my own desktop.

Hardware :

  • CPU Intel(R) Core(TM) i5-9400F @ 2.90 GHz - 6 cores
  • GPU Nvidia GeForce RTX 2060

Software:

serial version (reference)

Number of cities time (seconds)
10 0.064
11 0.66
12 8.48
13 130.7
14 1997.9

OpenMP pragmas

GNU compiler

Number of cities time (seconds)
10 0.044
11 0.152
12 1.64
13 24.2
14 368.3

OpenMP target for Nvidia

We need an OpenMP 4.5 capable compiler:

Just follow the steps mentioned here:

Performance are quite slow (compared to PGI/OpenAcc or Kokkos::CUDA below) by a factor of x3 when N >= 12.

Number of cities time (seconds)
10 0.077
11 0.104
12 0.39
13 3.56
14 59.5

I tried three compile toolchains:

  • clang 10.0 with cuda 10.1
  • clang 11.0 with cuda 11.2
  • clang 12.0 with cuda 11.3

OpenAcc (PGI compiler)

OpenAcc for multicore CPU

These results are strange, much too slow (maybe related to my host configuration ?). I also a strong performance drop when using nvhpc 20.11 versus 20.5 (??).

Number of cities time (seconds)
10 0.030
11 0.25
12 7.21
13 XX
14 XX

OpenAcc for GPU

Performance looks good (similar to Kokkos::CUDA below).

Number of cities time (seconds)
10 0.107
11 0.128
12 0.25
13 1.51
14 20.5

Kokkos

To build, you just need to edit kokkos/Makefile, and change the first line by modifying variable KOKKOS_PATH to the full path where you cloned kokkos sources.

Kokkos - OpenMP

Just run make in subdir kokkos.

Timings are similar to OpenMP pragma.

Number of cities time (seconds)
10 0.015
11 0.152
12 1.84
13 24.5
14 376.6

Kokkos - Cuda

Just run make KOKKOS_DEVICES=Cuda in subdir kokkos.

Timings are very similar to Openacc (GPU).

Number of cities time (seconds)
10 0.01
11 0.01
12 0.106
13 1.41
14 22.4

stdpar

stdpar for multicore cpu (nvc++ compiler)

Again, performance obtained here is odd; it's much slower than OpenMP with g++, but should be similar....

Number of cities time (seconds)
10 0.027
11 0.128
12 5.10
13 114.0
14 TODO

stdpar for GPU (nvc++ compiler)

Similar performance as OpenAcc and Kokkos/Cuda.

Number of cities time (seconds)
10 0.007
11 0.016
12 0.15
13 1.40
14 21.3

SYCL

There are a lot of SYCL implementation available, here we tried

SYCL OneAPI for x86 multicore

module load compiler/2021.1.1

Number of cities time (seconds)
10 1.29
11 1.40
12 2.62
13 21.3
14 TODO

for large values of N, we retrieve the expected results (OpenMP, Kokkos::OpenMP, ...).

SYCL OneAPI for Nvidia GPU (LLVM/intel)

module load llvm/12-intel-sycl-cuda

These results should be optimized by changing the number of block of threads and the block size (ร  la CUDA).

Performance are correct, maybe a bit slow for N=14.

Number of cities time (seconds)
10 0.001
11 0.008
12 0.11
13 1.86
14 37.4

tsp's People

Contributors

pkestene avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.