Git Product home page Git Product logo

Comments (23)

TomKellyGenetics avatar TomKellyGenetics commented on July 18, 2024 1

You're welcome. Please let me know if you have any problems with it. I'm not so confident with Python so I'm going by the docs on this one. It doesn't accept as many parameters as leidenalg.findclusters but it seems to return partitions (setting the random seed seems to work).

This toy example (maybe because the datasets is so small) doesn't seem to work unless you fine-tune the resolution. It would be good to know if it works on real data (I know they can't always be shared before publication) and how it compares to the Python leidenalg (and Louvain if it supports Bipartite graphs).

I'm okay to keep communications on GitHub where possible (my email is listed as maintainer in the DESCRIPTION if you prefer direct contact). Then other users can see what changes are in development and find these examples. I find the GitHub interface useful to code snippets and figures. It's also helpful to me to track issues in progress that take so long they'd probably get buried in my inbox. Between my current projects and having a young family, I don't have as much time for open-source projects as I'd like. Please bear in mind it may take time to update tests, examples, and get it on CRAN.

from leiden.

TomKellyGenetics avatar TomKellyGenetics commented on July 18, 2024 1

@PCK1992 Thanks for following up. That's great to hear. In that case, I'll go ahead with merging this into the next release once I have documentation and testing set up for it.

For now, I'll leave this issue open for others to see that Bipartite graphs will be supported in the development version.

from leiden.

vtraag avatar vtraag commented on July 18, 2024 1

This is expected behaviour. Note that you are not using modularity, but another quality function, namely the Constant Potts Model. Indeed, for this quality function, the singleton partition is optimal for a resolution parameter larger than or equal to 1. Personally, I am typically working with this quality function, which I find much more intuitive than modularity (which sometimes behaves quite strangely). In this quality function, clusters have an internal density higher than the resolution parameter, while the density between communities is lower than the resolution parameter. The "relevant" resolution values are typically in the order of the density of the network.

It is possible to use the bipartite version of modularity using a particular trick. This is explained in the Bipartite documentation.

from leiden.

TomKellyGenetics avatar TomKellyGenetics commented on July 18, 2024

This package calls the python version via reticulate. In principle it should pass arguments to the python leidenalg module and perform the same computations, supporting all of the same arguments. Provided you have the latest version of leidenalg installed for python, it should also support this in R.

Have you tried running it on a bipartite graph? Would you be able to produce a minimal reproducible example? If I'm able to reproduce the graph you're having issues with, I can look into it.

from leiden.

PCK1992 avatar PCK1992 commented on July 18, 2024

Awesome! I don't know exactly how the passing to python works so I came up with a small example:

library(leiden)
library(igraph)


net<- networkdata::southern_women #might need to install networkdata library
is.bipartite(net) 

communities <-leiden(net)
ID <- V(net)$name

results <- cbind(ID, communities)

Perhaps it would make sense to include an example like this in the documentation?

from leiden.

TomKellyGenetics avatar TomKellyGenetics commented on July 18, 2024

Have you tested this example? Please note that dependencies on non-CRAN packages are not permitted to be submitted to CRAN. When submitting to CRAN, it will need to automatically run tests, examples, and generate vignettes.

Data is passed to Python via internal functions. There should be no need to interact with it directly (as long as the leidenalg module is installed).

from leiden.

PCK1992 avatar PCK1992 commented on July 18, 2024

My apologies, I should have been more clear what I intended with this example. Running the code technically works but the results show that some of the events and persons are in the same community which should not be the case (if I'm not completely mistaken) since they are in different modes.

print(results)

      ID          communities
 [1,] "EVELYN"    "1"        
 [2,] "LAURA"     "1"        
 [3,] "THERESA"   "1"        
 [4,] "BRENDA"    "1"        
 [5,] "CHARLOTTE" "1"        
 [6,] "FRANCES"   "1"        
 [7,] "ELEANOR"   "1"        
 [8,] "PEARL"     "3"        
 [9,] "RUTH"      "3"        
[10,] "VERNE"     "3"        
[11,] "MYRA"      "2"        
[12,] "KATHERINE" "2"        
[13,] "SYLVIA"    "2"        
[14,] "NORA"      "2"        
[15,] "HELEN"     "2"        
[16,] "DOROTHY"   "3"        
[17,] "OLIVIA"    "3"        
[18,] "FLORA"     "3"        
[19,] "E1"        "1"        
[20,] "E2"        "1"        
[21,] "E3"        "1"        
[22,] "E4"        "1"        
[23,] "E5"        "1"        
[24,] "E6"        "1"        
[25,] "E7"        "1"        
[26,] "E8"        "3"        
[27,] "E9"        "3"        
[28,] "E10"       "2"        
[29,] "E11"       "3"        
[30,] "E12"       "2"        
[31,] "E13"       "2"        
[32,] "E14"       "2"    

What I meant with including an example like this for the documentation is a working example that shows how to use the package in a way that takes the bipartite graph structure into account. It looks like it's not enough to just have a bipartite igraph object to achieve this but it is not clear to me how to pass this information using the package.

from leiden.

TomKellyGenetics avatar TomKellyGenetics commented on July 18, 2024

I've looked into it. The "bipartite" graphs are supported in the latest version (0.7.0) of leidenalg in section 4.1.2, see the python module documentation for details. These are separate functions now currently supported by the R package. Bipartite methods were added in version 0.6.1 and are not supported by the R version currently.

library("leiden")
install.packages("drat"); drat::addRepo("schochastics"); install.packages("networkdata")
library("leiden"); library("networkdata"); library("igraph"); library("graphsim")

net <-networkdata::southern_women #might need to install networkdata library
is.bipartite(net) 
plot_directed(net)
partition <- leiden(net, partition_type = 'RBConfigurationVertexPartition')
plot_directed(net, fill.node = c("palevioletred", "lightblue")[partition])

new_RBC

This does not account for bipartite structures which do not account for this:

Bipartite networks are special in the sense that they have only links between the two different classes, and no links within a class are allowed.

Correcting this will require changes to internal functions to call .CPMVertexPartition.Bipartite() in python.

from leiden.

TomKellyGenetics avatar TomKellyGenetics commented on July 18, 2024

It should be possible to implement an R version of the Bipartite methods providing it can be done in Python. I notice you submitted an issue to the leidenalg repository. Was it possible to run in Python. If so it should be possible to run it here:
https://github.com/TomKellyGenetics/leiden/blob/18ec773593a0c65432205509399d83add0a81799/R/find_partition.R#L71#L71-L79

from leiden.

TomKellyGenetics avatar TomKellyGenetics commented on July 18, 2024

The development version of leiden now supports Bipartite graphs in R. You can try it with:

devtools::install_github("TomKellyGenetics", ref = "dev")

This will still require testing before it can be added to documentation but here's a minimal example:

library("leiden"); library("networkdata"); library("igraph"); library("graphsim")
net <-networkdata::southern_women
partition <- leiden(net, partition_type = "CPMVertexPartition.Bipartite", resolution_parameter = 0.1)
plot_directed(net, fill.node = c("palevioletred", "lightblue")[partition])

CPM_Bipartite

Note here that the "resolution" parameter is important for testing the results.

partition <- leiden(net, partition_type = "CPMVertexPartition.Bipartite", resolution_parameter = 0.3)
plot_directed(net, fill.node = c("palevioletred", "lightblue", "forestgreen", "purple")[partition])

CPM_Bipartite_Res0 3

This is different from the bipartite "types":

plot_directed(net, fill.node = c("palevioletred", "lightblue")[as.integer(V(net)$type)+1])

Bipartite_types

If you wish to reproduce this, the plotting function I wrote is available on CRAN.

from leiden.

PCK1992 avatar PCK1992 commented on July 18, 2024

This is awesome, thank you so much! I will test this function on my data sets and will report back. Let me know if you would like to keep this discussion in the repository or if you would prefer to move it to email.

from leiden.

PCK1992 avatar PCK1992 commented on July 18, 2024

I tried it on the first data set (8k nodes and 27k edges, one connected component) and get the following error:

partition05 <- leiden(test2, partition_type = "CPMVertexPartition.Bipartite", resolution_parameter = 0.5)
Error in py_to_r(bipartite_layers[[1]]) : 
  could not find function "py_to_r"

I get the same error messages with different resolution parameters. I checked the graph test2 and is.bipartite returns TRUE. To specify the classes I used the igraph function V(g)$type <- bipartite_mapping(g)$type. I checked whether the labels are correct and they are so that shouldn't be a problem.

My session info

> sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] leiden_0.3.4.9001 igraph_1.2.5      skimr_2.1.1       data.table_1.12.8

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.4.6     rstudioapi_0.11  knitr_1.28       magrittr_1.5     rappdirs_0.3.1   tidyselect_1.1.0 lattice_0.20-41  R6_2.4.1         rlang_0.4.6      dplyr_0.8.5      tools_3.6.3     
[12] grid_3.6.3       xfun_0.14        htmltools_0.4.0  ellipsis_0.3.1   assertthat_0.2.1 digest_0.6.25    tibble_3.0.1     lifecycle_0.2.0  crayon_1.3.4     Matrix_1.2-18    purrr_0.3.4     
[23] repr_1.1.0       base64enc_0.1-3  vctrs_0.3.0      glue_1.4.1       compiler_3.6.3   pillar_1.4.4     reticulate_1.16  jsonlite_1.6.1   pkgconfig_2.0.3 

from leiden.

TomKellyGenetics avatar TomKellyGenetics commented on July 18, 2024

This requires additional functions from igraph and reticulate. I've added a patch to import these with library("leiden") in the dev branch (it should work if you re-install the development version and import it again). If this doesn't work , try library("reticulate") before calling the leiden function.

from leiden.

PCK1992 avatar PCK1992 commented on July 18, 2024

Looks like it works now! My procedure: I reinstalled the development library loaded leiden and igraph seperately

It runs but I get the following error message:

> partition01 <- leiden(test2, partition_type = "CPMVertexPartition.Bipartite", resolution_parameter = 0.1)
Warning message:
In doTryCatch(return(expr), name, parentenv, handler) :
  restarting interrupted promise evaluation

In subsequent runs I do not get this error (even for the same line of code) so I wonder what this is.
I will report back once I get a chance to go through the results.

By the way, I tried installing the graphsim package to see if the error is related to that but I got an error message

Installing package into ‘/home/patrick/R/x86_64-pc-linux-gnu-library/3.6’
(as ‘lib’ is unspecified)
* installing *source* package ‘graphsim’ ...
** using staged installation
** R
** data
*** moving datasets to lazyload DB
Warning: file ‘TGFBeta_Smad_graph.rda’ has magic number 'X'
  Use of save versions prior to 2 is deprecated
Error in load(zfile, envir = tmp_env) : 
  bad restore file magic number (file may be corrupted) -- no data loaded
ERROR: lazydata failed for package ‘graphsim’
* removing ‘/home/patrick/R/x86_64-pc-linux-gnu-library/3.6/graphsim’
Error: Failed to install 'graphsim' from GitHub:
  (converted from warning) installation of package ‘/tmp/RtmpoKn8Z6/file4ad33964a7ed/graphsim_0.1.2.9001.tar.gz’ had non-zero exit status

I can open a separate issue on the other repository if you think it's worth it.

from leiden.

TomKellyGenetics avatar TomKellyGenetics commented on July 18, 2024

I'm not sure what is causing the issue you've run into calling leiden.

Don't worry about the graphsim package. I'm only using it here for the plotting functions. The install error is related to recent changes I made on issues I'm already working on. Please use the CRAN version for this package. Thanks for letting me know!

from leiden.

PCK1992 avatar PCK1992 commented on July 18, 2024

After some tests with different networks, I could not find any bugs or problems - it works great! But you are right, it is extremely sensitive to the resolution parameter and I had to do quite a bit of testing.

from leiden.

PCK1992 avatar PCK1992 commented on July 18, 2024

After some thorough testing, I discovered a behavior that is quite interesting but I'm not sure if this is related to the R version or the implementation of Leiden in Python.

Bottom line is: It looks like the bipartite version only accepts values between 0 and 1, unlike the unipartite version that works fine with values over 1. What I discovered is that if you use a resolution parameter of 1 the number of nodes becomes equal to the number of communities.

list_parameters = seq(0, 1, by = 0.0025)
results = list()
n=1

for (parameter in list_parameters) {
  results_cd <- leiden(net, resolution_parameter = parameter)
  num_communities <- length(unique(results_cd))
  results[[n]] = num_communities
  n = n+1
}

data_combined <- data.frame(matrix(unlist(results), nrow=length(results), byrow=T),stringsAsFactors=FALSE)

cd_plot <- as.data.frame(cbind(list_parameters, data_combined))

names(cd_plot)[1] <- "parameter"
names(cd_plot)[2] <- "num_communities"

ggplot(cd_plot, aes(x = parameter, y = num_communities)) +
  geom_line( color="blue", size=2, alpha=0.9, linetype=1) +
  ggtitle("Relationship between Number of Communities and Resolution Parameter")

image

Unfortunately, I can't share my data but I could reproduce this behavior with different networks. Is this a package issue on the R side? I assume this is intended behavior on the Python side.

from leiden.

TomKellyGenetics avatar TomKellyGenetics commented on July 18, 2024

This is quite different to the behaviour of the regular (non-Bipartite) version of the Leiden algorithm. Thanks for pointing it out. I was really only changing the resolution parameter because the results didn't make much sense (and I thought it was only for this toy example data).

I believe the development version of leiden in R is correctly passing arguments to the Python leidenalg library. Perhaps @vtraag could help with this?

from leiden.

TomKellyGenetics avatar TomKellyGenetics commented on July 18, 2024

Sorry I may have misunderstood the docs for the Python version? It is possible to use the Bipartite method for other quality functions (e.g., ModularityVertexPartition) not just CPMVertexPartition? I was confused because it is a subheading here:
https://leidenalg.readthedocs.io/en/stable/reference.html#leidenalg.CPMVertexPartition.Bipartite

In that case, it should be possible to do this for other methods:
4d24a42

from leiden.

vtraag avatar vtraag commented on July 18, 2024

It is possible to use the Bipartite method for other quality functions (e.g., ModularityVertexPartition) not just CPMVertexPartition?

No, it is not. The support for bipartite clustering is only implemented for CPMVertexPartition. However, there is a small trick that allows you to mimic modularity when using CPM. This is explained in the documentation, see specifically the notes, and the argument degree_as_node_size (make sure to correct the resolution parameter still). In principle it could also be added to some other vertex partitions (but not Surprise or Significance), but I have not done that (yet, anyway).

from leiden.

TomKellyGenetics avatar TomKellyGenetics commented on July 18, 2024

Thanks, that explains the trouble I had calling it on other quality functions. I've added that argument to the R version and managed to pass the "mimicking" Modularity parameters in the R version.

partition_Modularity <- leiden(net, partition_type = "CPMVertexPartition.Bipartite", degree_as_node_size = TRUE, resolution_parameter = 0.005)
plot_directed(net, fill.node = RColorBrewer::brewer.pal(8,"Set1")[as.integer(partition_Modularity)])

test1

As mentioned, the resolution parameter is very important here.

partition_Modularity <- leiden(net, partition_type = "CPMVertexPartition.Bipartite", degree_as_node_size = TRUE, resolution_parameter = 0.015)
plot_directed(net, fill.node = RColorBrewer::brewer.pal(8,"Set1")[as.integer(partition_Modularity)])

test2

If support for Bipartite graphs is added to other functions, it shouldn't be too much trouble to add them to the R version.

from leiden.

TomKellyGenetics avatar TomKellyGenetics commented on July 18, 2024

I've prepared unit tests and a brief vignette on bipartite graphs in R. I will upload this as a minor release (0.3.4) provided checks on Travis CI pass. I will do this before migrating the development version to call igraph::community_leiden (as discussed in #1) since further released will need to wait until this dependency is on CRAN.

from leiden.

TomKellyGenetics avatar TomKellyGenetics commented on July 18, 2024

The leiden R package version 0.3.4 passes all checks and is submitted to CRAN. There is a dedicate vignette for Bipartite graphs here.

from leiden.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.