Git Product home page Git Product logo

hypergraph-message-passing's Introduction

Message Passing on Hypergraphs

Inference and theoretical study of higher-order community structures

License: MIT Made with Python Code style: black ARXIV: 2312.00708 Treedom


Real data panel

This repository contains the implementation of the methods presented in

   [1] Message-Passing on Hypergraphs: Detectability, Phase Transitions and Higher-Order Information
        Nicolò Ruggeri*, Alessandro Lonardi*, and Caterina De Bacco
        [ arXiv ] [ Publication ]

Below, we explain how to run the Message-Passing (MP), Expectation-Maximization (EM) and sampling procedures presented in the paper.
The code is made available for the public, if you make use of it please cite our work in the form of the reference [1] above.

Code installation

The code was developed utilizing Python 3.9, and can be downloaded and used locally as-is.
To install the necessary packages, run the following command

pip install -r requirements.txt

Message passing

It is possible to run message passing and infer the marginal probabilities of nodes to belong to certain communities, while conditioning on given community priors n and affinity matrix p. To do so, execute the command:

python main_message_passing.py --em_iter 1 --mp_iter 2000 --mp_thresh 1e-5
--mp_patience 50 --dropout 0.75 --seed 123 --save_dir ./mp_results
--hye_file ./data/synthetic/sample.txt --n ./data/synthetic/n_prior.txt
--p ./data/synthetic/p_phase_transition_1.txt --K 4

The results will be saved in the output folder, and contain the node marginal probabilities, messages from nodes to hyperedges, messages from hyperedges to nodes, and free energy. Also convergence diagnostics are included, namely the difference in estimated marginal between consecutive MP iterations.

The main arguments accepted are:

  • real_dataset or hye_file: the dataset to run inference on (see more on input data format below).
  • max_hye_size: the maximum hyperedge size accepted. All hyperedges with higher size are discarded during inference.
  • n, p: fixed parameters given to the model, they will not be updated. n is the prior distribution of the communities, p the affinity matrix, both in a format to be loaded via numpy.loadtxt.
  • K: number of communities in the model. If n and/or p are provided, they must respectively have shapes (K,) and (K, K).
  • mp_iter, mp_thresh, mp_patience: setting the maximum number of MP iterations. If for more than a consecutive number of iterations (specified by mp_patience) the change in marginal probabilities falls below the threshold, inference is stopped. Otherwise, inference stops when the maximum number of iterations mp_iter has been reached.
  • dropout: the percentage of randomly dropped messages in the MP updates.
  • save_dir: the folder where results are saved.
  • seed: random seed.

Input data format A hypergraph is saved as a list of hyperedges. Therefore, the script accepts hyperedge lists both in pickle and text format, as in the example files ./data/synthetic/sample.pkl and ./data/synthetic/sample.txt. Importantly, nodes are numbered from 0 to N-1, where N is the total number of nodes.

Expectation-Maximization inference

Running EM inference allows inferring all the parameters of the model directly from the data. Specifically, if K is the number of communities, one can infer the following values:

  • n, the vector of length K where the entries specify the frequencies of the inferred communities
  • p, the K x K symmetric affinity matrix

as well as the values inferred from message passing, see above. Convergence diagnostics for EM are included: the difference in n and p values between consecutive EM iterations.

Inference is run via the relative script main_message_passing.py. For example, to run inference on the High School dataset utilized in the paper, run

python main_message_passing.py --real_dataset high_school --seed 234 --max_hye_size 5
--em_iter 20 --em_thresh 1e-5 --mp_iter 2000 --mp_thresh 1e-5 --mp_patience 50 --K 10
--dropout 0.99 --save_dir ./high_school_results

Sampling synthetic data

In the paper, we also present a new sampling algorithm that generates hypergraphs from the probabilistic model. For example, samples similar to those in ./data/synthetic are obtained as:

python main_sampling.py --p ./data/synthetic/p_phase_transition_1.txt --max_hye_size 50  
--node_assignments ./data/synthetic/node_assignment_phase_transition.txt
--allow_repeated True --seed 345 --save_dir ./sampling_results

The main arguments are:

  • p the affinity matrix.
  • max_hye_size the maximum size of the sampled hyperedges.
  • node_assignments the array of node assignments to the communities. See the referenced file for an example structure.
  • allow_repeated whether to allow repeated hyperedges. If False, it allows for faster sampling and the approximation is negligible in sparse regimes.
  • seed random seed.
  • save_dir output path specifying the directory where results are saved.

hypergraph-message-passing's People

Contributors

nickruggeri avatar aleable avatar

Stargazers

Jeongwhan Choi avatar Adrián Arnaiz-Rodríguez avatar Kazuki Nakajima avatar Francesco Lotito avatar  avatar

Watchers

 avatar  avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.