kahypar / mt-kahypar

Mt-KaHyPar (Multi-Threaded Karlsruhe Hypergraph Partitioner) is a shared-memory multilevel graph and hypergraph partitioner equipped with parallel implementations of techniques used in the best sequential partitioning algorithms. Mt-KaHyPar can partition extremely large hypergraphs very fast and with high quality.

License: MIT License

CMake 3.03% HTML 0.09% C++ 94.31% C 0.55% Perl 0.23% Python 1.70% Shell 0.09%

hypergraph-partitioning partitioning graph-partitioning hypergraphs graphs graph-algorithms hypergraph algorithm-engineering partitioning-algorithms shared-memory

mt-kahypar's Issues

Execution Policy does not execute Refiner on last level if degree zero vertices are removed

Add command line option to change partition output file folder

Building from release tarball fails

Mt-KaHyPar cannot be built from the release tarball because the directories of the git submodules are not populated:

3 errors found in build log:
     13    -- Detecting C compile features
     14    -- Detecting C compile features - done
     15    -- Performing Test CMAKE_HAVE_LIBC_PTHREAD
     16    -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
     17    -- Found Threads: TRUE
     18    -- Found Threads:
  >> 19    CMake Error at CMakeLists.txt:17 (add_subdirectory):
     20      The source directory
     21    
     22        /tmp/sebastian/spack-stage/spack-stage-mt-kahypar-1.0.0-edfz2bj2yxmuclvzq2lufvyy3eskovxd/spack-src/external_tools/googletest
     23    
     24      does not contain a CMakeLists.txt file.
     25    

     ...

     38    -- Default linker
     39    -- Performing Test KAHYPAR_HAS_CRC32
     40    -- Performing Test KAHYPAR_HAS_CRC32 - Success
     41    -- CMAKE_CXX_FLAGS:  -W -Wall -Wextra  -Wunused  -Wuninitialized -Wfatal-errors -Wcast-qual -Woverloaded-virtual -Wredundant-decls -Winit-self -pedantic -DPARANOID  -Wno-unused-function -pthread -
           std=c++17 -ltbb -ltbbmalloc_proxy -msse4.2 -mcrc32
     42    -- CMAKE_CXX_FLAGS_RELEASE: -O3 -DNDEBUG -O3 -mtune=native -march=native
     43    -- CMAKE_CXX_FLAGS_DEBUG: -g -g3 -fsanitize=undefined -fno-omit-frame-pointer -fsanitize=address
  >> 44    CMake Error at python/CMakeLists.txt:9 (add_subdirectory):
     45      The source directory
     46    
     47        /tmp/sebastian/spack-stage/spack-stage-mt-kahypar-1.0.0-edfz2bj2yxmuclvzq2lufvyy3eskovxd/spack-src/python/pybind11
     48    
     49      does not contain a CMakeLists.txt file.
     50    
     51    
  >> 52    CMake Error at python/CMakeLists.txt:11 (pybind11_add_module):
     53      Unknown CMake command "pybind11_add_module".
     54    
     55    
     56    -- Configuring incomplete, errors occurred!
     57    See also "/tmp/sebastian/spack-stage/spack-stage-mt-kahypar-1.0.0-edfz2bj2yxmuclvzq2lufvyy3eskovxd/spack-build-edfz2bj/CMakeFiles/CMakeOutput.log".

Eliminate singletons

We use singletons way too much and they're not particularly "parallel-friendly". In many cases they can simply be replaced by a member. This makes it easier to use certain things in parallel, e.g., Randomize, which currently sometimes requires requesting a processor ID.

Visualization for Memory Consumption

It would be nice to have a visual representation of the memory consumption of our hypergraphs. The SDSL has a nice concept for this: it serializes the data structure into a tree annotated with size information and renders it as a pie chart using the JavaScript library "D3".

Force coarsening to reach contraction limit

Aborting coarsening before reaching the contraction limit can have a significant impact on the performance and scalability of initial partitioning. There are three main reasons why the contraction limit is not reached during coarsening:

1.) Maximum allowed hypernode weight prevents valid contractions
Currently, the maximum allowed hypernode weight is set to the total weight of the hypergraph divided by the contraction limit. The rationale is that, once the contraction limit is reached, each node should have the same weight. However, in social instances a majority of the hypernodes are matched to their high-degree center nodes. In that case the current maximum allowed node weight is too restrictive and needs to be relaxed. One idea is to incorporate the sum of the neighborhood weights of the node with the highest degree, which should ensure that each hypernode can be matched with its corresponding center node. This should be in line with the maximum allowed block weight.

2.) Zero-degree vertices introduce size-one communities (see wb-edu.mtx.hgr)
We should remove all zero-degree vertices before partitioning and add them afterwards with a bin-packing heuristic to the partition. Moreover, after single-pin hyperedges are removed, we should perform an additional check for zero-degree vertices.

3.) Many small communities (see DAC2012 instances)
If we have many small isolated communities, it can be hard for coarsening to reach the contraction limit. One idea is to merge small communities into a big one; if no further contraction is possible during coarsening (because all vertices have degree zero), then we should contract hypernodes randomly.
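The reinsertion step proposed in 2.) could look roughly like the sketch below. It is only an illustration and not Mt-KaHyPar code: the function name assignDegreeZeroVertices and its parameters are hypothetical, and it uses a plain greedy bin-packing heuristic (heaviest vertex first, always into the currently lightest block).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical helper: assigns previously removed degree-zero vertices to blocks.
// vertex_weights[i] is the weight of the i-th removed vertex,
// block_weights[b] is the current weight of block b (updated in place).
// Returns the block chosen for each removed vertex.
std::vector<std::size_t> assignDegreeZeroVertices(
    const std::vector<std::int64_t>& vertex_weights,
    std::vector<std::int64_t>& block_weights) {
  assert(!block_weights.empty());
  // Process the vertices in descending order of their weight.
  std::vector<std::size_t> order(vertex_weights.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::stable_sort(order.begin(), order.end(),
                   [&](std::size_t a, std::size_t b) {
                     return vertex_weights[a] > vertex_weights[b];
                   });
  std::vector<std::size_t> block_of(vertex_weights.size());
  for (std::size_t v : order) {
    // Place the vertex into the currently lightest block.
    const auto lightest =
        std::min_element(block_weights.begin(), block_weights.end());
    block_of[v] = static_cast<std::size_t>(lightest - block_weights.begin());
    *lightest += vertex_weights[v];
  }
  return block_of;
}
```

Since degree-zero vertices do not affect the cut, such a step only influences the balance of the partition.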

TBB Internal NUMA Support

TBB recently added NUMA support to its task arenas, i.e., one arena per socket, with threads that join an arena pinned to the corresponding socket.
We should consider using this integrated support instead of Tobi's custom code for two reasons: 1) less code for us to maintain, 2) it might be faster.
Open question: can we still mock hardware topology? This is desirable for testing purposes.

Edit: Apparently TBB uses hwloc to parse hardware topology info. Hence, mocking topology sounds quite possible.

Make EPS and MAX_ITERATION in PLM configurable

We should make epsilon and the maximum number of iterations in PLM configurable in context.h.

See line 114 in plm.h:
for (int currentRound = 0; nodesMovedThisRound >= 0.01 * G.numNodes() && currentRound < 16; currentRound++)
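A minimal sketch of how the two hard-coded values (0.01 and 16) could be moved into a configuration object. The struct LouvainConfig, its member names and the function continueLocalMoving are hypothetical and not taken from context.h or plm.h.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical settings; the names are not taken from context.h.
struct LouvainConfig {
  double min_eps = 0.01;                 // minimum fraction of nodes that must move per round
  std::size_t max_pass_iterations = 16;  // upper bound on the number of rounds
};

// Returns true if another round of local moving should be started.
// Mirrors the loop condition quoted above, with both constants taken from the config.
bool continueLocalMoving(const LouvainConfig& config, std::size_t num_nodes,
                         std::size_t nodes_moved_this_round,
                         std::size_t current_round) {
  assert(config.min_eps >= 0.0);
  return nodes_moved_this_round >= config.min_eps * num_nodes &&
         current_round < config.max_pass_iterations;
}
```

The default values reproduce the current behavior, so existing runs would not change unless a user overrides them.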

print node partitioning information

Hi, I am wondering whether mt-kahypar can write to a file which partition each node belongs to.
If so, where should I add code, or which command line parameters should I use? I tried using --verbose=true --enable-progress-bar=true --partition-output-folder=. None of them gave me an output result file.

Implement better Work Stealing for Initial Partitioning

When recursing during initial partitioning, we split the TBB NUMA arena into arenas with half the number of threads of the original arena and pass them into the recursion. However, this leads to bad load balancing if one recursion terminates earlier than the other, because its threads are not able to join the other recursion (the number of threads per arena is limited).

Finalize README

Missing sections:

What is Mt-KaHyPar: Description of Mt-KaHyPar with experimental results
System Requirements
Describe Library Interface

Your system has not enough cpus to execute MT-KaHyPar (> 1)

I tried to run an MT-KaHyPar command, but I received the error "Your system has not enough cpus to execute MT-KaHyPar (> 1)". Why? My system is WSL, a virtual Linux system on the Windows 10 platform.

Replace bit_magic.h with builtin functions

Add Id mapping class when copying or extracting blocks of the hypergraph

Test execution

When a test fails, its binary gets deleted, so I cannot run it in, for example, a debugger.
Additionally, we can't use --gtest_filter= to run individual tests.

glibc++.so.6 version

Hi,

After building, I run the mt-kahypar executable and get this runtime error. I have no root access. How can I make it link against a library in my user path, where I have prepared a higher (30) version of the library?

Imbalance is wrong, if degree-zero hypernodes are removed

Simultaneous Execution and Thread Pinning

Our current thread pinning prevents simultaneous executions of mt-kahypar because we pin to the cpus 0, ... , p-1.

Use uint32_t node and hyperedge IDs in static hypergraph

make mtkahypar fatal error "utils/debug.hpp: no such file or directory"

Dear developers,

Under Ubuntu 23.04 and GCC 13.2 I am getting the following fatal error after running make mtkahypar:


In file included from /home/X/Documents/mt-kahypar/external_tools/growt/data-structures/table_config.hpp:6,
                 from /home/X/Documents/mt-kahypar/mt-kahypar/partition/mapping/target_graph.h:40,
                 from /home/X/Documents/mt-kahypar/lib/libmtkahypar.cpp:39:
/home/X/Documents/mt-kahypar/external_tools/growt/data-structures/element_types/complex_slot.hpp:8:10: fatal error: utils/debug.hpp: No such file or directory
    8 | #include "utils/debug.hpp"
      |          ^~~~~~~~~~~~~~~~~

Any hint of what could have been happening?

Thank you in advance.

Implement Parallel Random Shuffling

How to build this if my TBB is installed in my user customized path?

I built TBB from source and installed it under my home directory, say /home/xxx/tbb/include and /home/xxx/tbb/lib64. I also appended these two paths to the PATH and LD_LIBRARY_PATH variables in my ~/.bashrc file and sourced the file.

But when I build mt-kahypar with cmake .., it still reports that it cannot find the TBB package. How can I solve this?

Btw, my gcc version is 12.0

IHypergraphSparsifier is missing a public virtual destructor

Hello!

We are writing a student project that uses the C interface and noticed that our application randomly freezes at global destruction. Throwing AddressSanitizer at the problem we got this report:

==11666==ERROR: AddressSanitizer: new-delete-type-mismatch on 0x619000006980 in thread T1:
  object passed to delete has wrong type:
  size of the allocated type:   1032 bytes;
  size of the deallocated type: 16 bytes.
    #0 0x7fa7770bc0a8 in operator delete(void*, unsigned long) (/lib64/libasan.so.8+0xbc0a8)
    #1 0x7fa7769f5f71 in tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::process_bypass_loop(tbb::internal::context_guard_helper<false>&, tbb::task*, long) ../../src/tbb/custom_scheduler.h:495
    #2 0x7fa7769f63b1 in tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::local_wait_for_all(tbb::task&, tbb::task*) ../../src/tbb/custom_scheduler.h:636
    #3 0x7fa7769efbf6 in tbb::internal::arena::process(tbb::internal::generic_scheduler&) ../../src/tbb/arena.cpp:196
    #4 0x7fa7769ee31f in tbb::internal::market::process(rml::job&) ../../src/tbb/market.cpp:667
    #5 0x7fa7769ea26d in tbb::internal::rml::private_worker::run() ../../src/tbb/private_server.cpp:266
    #6 0x7fa7769ea4c8 in tbb::internal::rml::private_worker::thread_routine(void*) ../../src/tbb/private_server.cpp:219
    #7 0x7fa776a8e14c in start_thread (/lib64/libc.so.6+0x8b14c)
    #8 0x7fa776b0f9ff in clone3 (/lib64/libc.so.6+0x10c9ff)

0x619000006980 is located 0 bytes inside of 1032-byte region [0x619000006980,0x619000006d88)
allocated by thread T0 here:
    #0 0x7fa7770bb1a8 in operator new(unsigned long) (/lib64/libasan.so.8+0xbb1a8)
    #1 0x7fa7765e0ca0 in mt_kahypar::register_HypergraphUndefinedSparsifier::{lambda(mt_kahypar::Context const&)#1}::_FUN(mt_kahypar::Context const) (/home/sebastian/.local/opt/spack/opt/spack/linux-fedora37-haswell/gcc-12.2.1/mt-kahypar-master-bzm4vfxnyju3gqam4v3dqokbksb3ufav/lib64/libmtkahyparhgp.so+0x1e0ca0)

Thread T1 created by T0 here:
    #0 0x7fa77704b3e6 in __interceptor_pthread_create (/lib64/libasan.so.8+0x4b3e6)
    #1 0x7fa7769ea155 in rml::internal::thread_monitor::launch(void* (*)(void*), void*, unsigned long) ../../src/tbb/../rml/server/thread_monitor.h:218
    #2 0x7fa7769ea155 in tbb::internal::rml::private_worker::wake_or_launch() ../../src/tbb/private_server.cpp:297
    #3 0x7fa7769ea155 in tbb::internal::rml::private_server::wake_some(int) ../../src/tbb/private_server.cpp:395
    #4 0x60c000001e3f  (<unknown module>)

SUMMARY: AddressSanitizer: new-delete-type-mismatch (/lib64/libasan.so.8+0xbc0a8) in operator delete(void*, unsigned long)
==11666==HINT: if you don't care about these errors you may set ASAN_OPTIONS=new_delete_type_mismatch=0
==11666==ABORTING

I think that this stems from mt_kahypar::IHypergraphSparsifier not having a public virtual destructor.
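To illustrate the suspected cause, here is a minimal, self-contained sketch. The classes ISparsifier and CountingSparsifier are simplified stand-ins and not the actual Mt-KaHyPar types. With the public virtual destructor in the base class, deleting a derived object through a base pointer runs the derived destructor and frees the full allocation; without it, the behavior is undefined, which matches the size mismatch reported by AddressSanitizer.

```cpp
#include <cassert>
#include <memory>

// Simplified stand-in for an interface such as IHypergraphSparsifier.
class ISparsifier {
 public:
  virtual ~ISparsifier() = default;  // public virtual destructor
  virtual void sparsify() = 0;
};

class CountingSparsifier final : public ISparsifier {
 public:
  explicit CountingSparsifier(int& destroyed) : destroyed_(destroyed) {}
  ~CountingSparsifier() override { ++destroyed_; }
  void sparsify() override {}

 private:
  int& destroyed_;
  char state_[1024] = {};  // makes the derived type much larger than the base
};

// Creates a derived object, destroys it through a base-class pointer
// and returns how many derived destructors were executed.
int destroyThroughBasePointer() {
  int destroyed = 0;
  {
    std::unique_ptr<ISparsifier> sparsifier =
        std::make_unique<CountingSparsifier>(destroyed);
    assert(sparsifier != nullptr);
    sparsifier->sparsify();
  }  // unique_ptr<ISparsifier> deletes via the base type here
  return destroyed;
}
```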

Tests are not running in Travis CI

Problem Description:
Travis sets up a VM with 2 cores on a cloud infrastructure. The cloud infrastructure provider deploys that VM on a machine with a restricted cpuset. Our application does not take that cpuset into account and tries to pin threads to CPUs that are disabled.

Steps to reproduce:
sudo cset shield -c 1-3 # Restricts cpuset to cpu 1 - 3

make concurrent_hypergraph_test
Running main() from /home/heuer/mt-kahypar/external_tools/googletest/googletest/src/gtest_main.cc
[==========] Running 25 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 25 tests from AConcurrentHypergraph
[ERROR] Failed to set thread affinity

sudo cset shield --reset # reset cpuset restriction
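One possible direction, sketched under the assumption that we only target Linux: query the affinity mask of the process with sched_getaffinity and pin threads only to CPUs contained in it. The helper names allowedCpus and cpuForThread are hypothetical and not part of the code base.

```cpp
#include <sched.h>

#include <cassert>
#include <cstddef>
#include <vector>

// Returns the ids of the CPUs the calling thread is allowed to run on (Linux only).
// An empty vector signals that the affinity mask could not be read.
std::vector<int> allowedCpus() {
  cpu_set_t mask;
  CPU_ZERO(&mask);
  std::vector<int> cpus;
  if (sched_getaffinity(0, sizeof(mask), &mask) != 0) {
    return cpus;
  }
  for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
    if (CPU_ISSET(cpu, &mask)) {
      cpus.push_back(cpu);
    }
  }
  return cpus;
}

// Hypothetical helper: maps a thread index to one of the allowed CPUs
// instead of assuming that the CPUs 0, ..., p-1 are available.
int cpuForThread(const std::vector<int>& allowed, std::size_t thread_index) {
  assert(!allowed.empty());
  return allowed[thread_index % allowed.size()];
}
```

With the cpuset from the reproduction steps (CPUs 1 to 3), thread 0 would then be pinned to CPU 1 rather than to the disabled CPU 0.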

Move from Travis CI to Github Actions

Since Travis CI changed their pricing policy, we should move our CI environment to GitHub Actions (unlimited builds for public repositories).

Upper bound of cut between two blocks

Hi,
Is there a data structure that records the total cut between two blocks?
I want to avoid a very large cut between any two blocks after partitioning. So I would like to obtain the total cut value between each pair of blocks dynamically and modify the gain function according to this cut.

In other words, I want to optimize the cut objective while keeping the total cut between each pair of blocks within an upper bound. Is there any way to deal with this situation?

thanks!

Add more detailed functionality tests for greedy and LP initial partitioner

Community Coarsener sometimes starts with smallest community first

TBB seems to not execute jobs in FIFO-fashion. During community coarsening we want to process the communities in descending order of their size.

Best effort partitions?

Hey,
I am trying to use this as part of a larger application that is somewhat time-constrained.
The partitioning is usually fast enough but sometimes stalls for hours. Is there a
way to set a timeout and just use the best partition found so far?
This can even happen on different runs when the input graph is the same,
is that normal behavior or am I doing anything wrong?
Thanks!

Issue a warning if wrong user config is auto-replaced

Improve Code Coverage to 85%

Once we make the repo public we should have a code coverage of 85%.

Union by Weight broken for multilevel coarsening

As discussed today: Using union by weight to identify representatives in multilevel coarsening only works as expected for the first level. As soon as the weights change, things might break.

hmod.precompute_attributes removes or does not remove singletons based off of the OS (inconsistent behaviour)

Depending on which OS is being used the following will print different results:

import hypernetx as hnx
import hypernetx.algorithms.hypergraph_modularity as hmod
formulaDict,stringTree = readFile("./CDL-FMINCE/toybox.dimacs") # code by me
H=formulaToHypergraph(formulaDict)
HDual = H.dual()
H=hmod.precompute_attributes(H)
HDual = hmod.precompute_attributes(HDual)

#this should diverge based off of OS

print(len(list(H.nodes())))
print(len(list(HDual.nodes())))

On the Linux systems the results are 544 and 590, while on Windows I get 175 and 221.

I was using ANTLR4 with python integration to parse a dimacs (SAT) file and interpret it as a hypergraph. The resulting graph has 544 nodes and 590 edges before the attributes are computed.

Compared Systems

Ubuntu system:

Ubuntu 20.04.6 LTS
wsl 2
Python 3.8.10
pip installation manager

Debian

SMP Debian 5.10.127-1
Python 3.9.2
pip installation manager

Windows

Windows 10 Home
Python 3.11.3
Anaconda virtual environment (+VSCode)

Replace std::pair range types with custom range classes

Problems with TBB 2021 Versions

The releases of TBB from 2021 no longer contain some headers we use:

tbb_stddef.h which is read to determine the interface version with CMake. This information was moved to https://github.com/oneapi-src/oneTBB/blob/master/include/oneapi/tbb/version.h
task_scheduler_init.h became deprecated in favor of this (?) https://github.com/oneapi-src/oneTBB/blob/master/include/oneapi/tbb/global_control.h

For now the Ubuntu Repos all provide the 2020 Versions of TBB, but rolling release distros already use the 2021 version, which Ubuntu will eventually use, too.

Refactor Hypergraph

Since we are now able to perform n-Level as well as multilevel hypergraph partitioning, we should have one dynamic hypergraph and one static hypergraph.

Extend sanitize check of context for non-supported configurations

Remove different code paths for debug mode in community detection

Refactor the community detection code such that it uses the same code path in debug and release mode.

see e.g. line 181 in plm.h:
#ifndef NDEBUG
std::for_each(nodes.begin(), nodes.end(), moveNode);
#else
tbb::parallel_for_each(nodes, moveNode);
#endif

Remove non-evaluated command line option parameters

lgtm uses too old cmake version

Find a way to get lgtm to use cmake version >= 3.16

Language Editing of Command Line Options

Spawn a separate task for each KaHyPar call in DirectInitialPartitioner

Spawn a separate task for each KaHyPar call in DirectInitialPartitioner.
Consider calling convertToKaHyParHypergraph only once.

Maintenance of border vertices is buggy

Example:
State before move:
he = 1 pin_count_in_from_part = 2 pin_count_in_to_part = 2 edge_size = 4
u = 3 from = 1 to = 0
v = 4 from = 1 to = 0
he = 1 pin_count_in_from_part_after = 0 pin_count_in_to_part_after = 3 edge_size = 4
he = 1 pin_count_in_from_part_after = 1 pin_count_in_to_part_after = 4 edge_size = 4

In this example, vertices 3 and 4 move from block 1 to block 0 in parallel. Hyperedge 1 has 4 pins. After both vertices have moved to block 0, hyperedge 1 is a non-cut hyperedge. However, a hyperedge is only detected as non-cut if a single move observes both
1.) pin_count_in_from_part_after == 0
2.) pin_count_in_to_part_after == edge_size
In the example above, neither move observes both values, so the condition is never triggered.
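One way to make the check independent of the interleaving might be to look only at the pin count in the target block: if it equals the edge size, all pins are in that block. The sketch below contrasts both conditions using the values from the example. The function names are hypothetical, and it assumes that the pin count is updated with an atomic increment, so that exactly one move observes the value edge_size. Whether this is sufficient when pins move in opposite directions at the same time is not covered here.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Condition as described above: a single move must observe both values.
bool isNonCutStrict(std::uint32_t from_after, std::uint32_t to_after,
                    std::uint32_t edge_size) {
  return from_after == 0 && to_after == edge_size;
}

// Relaxed condition: if all pins are in the target block, the hyperedge
// is non-cut, regardless of the value observed for the source block.
bool isNonCutRelaxed(std::uint32_t to_after, std::uint32_t edge_size) {
  assert(to_after <= edge_size);
  return to_after == edge_size;
}

// Atomically increments a pin count and returns the value after the increment.
std::uint32_t incrementPinCount(std::atomic<std::uint32_t>& pin_count) {
  return pin_count.fetch_add(1, std::memory_order_relaxed) + 1;
}
```

For the two observations in the example, (0, 3) and (1, 4), the strict condition fails for both moves, while the relaxed condition holds for the move that observes pin_count_in_to_part_after = 4.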

Get rid of tbb::blocked_range in parallel_for

Travis tests pass even if gtests fail

Support for Windows Line Endings

Since we now have a windows build, we need to support windows line endings when reading a (hyper)graph file.
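A minimal sketch of one way to handle this, assuming the file is read line by line with std::getline: strip a trailing carriage return before parsing. The helpers readLine and splitLines are hypothetical and do not correspond to the existing file reader.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads one line and strips a trailing '\r', so that files with Windows
// line endings ("\r\n") are parsed like files with Unix line endings ("\n").
bool readLine(std::istream& in, std::string& line) {
  if (!std::getline(in, line)) {
    return false;
  }
  if (!line.empty() && line.back() == '\r') {
    line.pop_back();
  }
  // std::getline already consumed the '\n' delimiter.
  assert(line.empty() || line.back() != '\n');
  return true;
}

// Splits the content of a (hyper)graph file into its lines.
std::vector<std::string> splitLines(const std::string& content) {
  std::istringstream in(content);
  std::vector<std::string> lines;
  std::string line;
  while (readLine(in, line)) {
    lines.push_back(line);
  }
  return lines;
}
```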

Localized Label Propagation

Currently, label propagation is not executed on each level of the n-level hierarchy. Whether LP is executed on the current level is determined by an execution policy. Another approach is to execute label propagation on each level by running a strongly localized version of LP only on the uncontracted hypernodes. Since this would not scale if we uncontract only one pair of hypernodes at a time, this should be a feature of the batch uncontraction mode.

Heavy Assertions

Instead of enabling certain assertions for phases via CMake, we could use a single HEAVY_ASSERTION macro that looks up a variable similar to the way the KaHyPar logger does.
What do you think @kittobi1992 ?
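A rough sketch of what such a macro could look like. The macro name HEAVY_ASSERT and the variable name enable_heavy_assert are placeholders, and the example classes are made up. The macro relies on ordinary name lookup: a class can shadow the file-wide default with its own static constexpr member.

```cpp
#include <cassert>

// Checks the condition only if a variable named enable_heavy_assert that is
// visible at the call site is true. Like assert, it is compiled out with NDEBUG.
#define HEAVY_ASSERT(cond)     \
  do {                         \
    if (enable_heavy_assert) { \
      assert(cond);            \
    }                          \
  } while (false)

// File-wide default: heavy assertions are disabled.
static constexpr bool enable_heavy_assert = false;

struct Coarsener {
  // Overrides the default for this class only.
  static constexpr bool enable_heavy_assert = true;
  int checks = 0;
  bool expensiveCheck() { ++checks; return true; }
  void run() { HEAVY_ASSERT(expensiveCheck()); }
};

struct Refiner {
  // No override: the file-wide default applies.
  int checks = 0;
  bool expensiveCheck() { ++checks; return true; }
  void run() { HEAVY_ASSERT(expensiveCheck()); }
};
```

In a debug build, Coarsener::run evaluates its expensive check while Refiner::run skips it, without any CMake option being involved.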

Compile a static executable program

Hi,
I am trying to compile a static executable that can run in different Linux environments.
However, I found it difficult to link statically because of the TBB library.
Is there a way to compile a static executable? Or is there any suggestion for running the program in different environments?

thanks!
