Comments (23)
You're welcome. Please let me know if you have any problems with it. I'm not so confident with Python so I'm going by the docs on this one. It doesn't accept as many parameters as leidenalg.findclusters
but it seems to return partitions (setting the random seed
seems to work).
This toy example (maybe because the datasets is so small) doesn't seem to work unless you fine-tune the resolution. It would be good to know if it works on real data (I know they can't always be shared before publication) and how it compares to the Python leidenalg (and Louvain if it supports Bipartite graphs).
I'm okay to keep communications on GitHub where possible (my email is listed as maintainer in the DESCRIPTION if you prefer direct contact). Then other users can see what changes are in development and find these examples. I find the GitHub interface useful to code snippets and figures. It's also helpful to me to track issues in progress that take so long they'd probably get buried in my inbox. Between my current projects and having a young family, I don't have as much time for open-source projects as I'd like. Please bear in mind it may take time to update tests, examples, and get it on CRAN.
from leiden.
@PCK1992 Thanks for following up. That's great to hear. In that case, I'll go ahead with merging this into the next release once I have documentation and testing set up for it.
For now, I'll leave this issue open for others to see that Bipartite graphs will be supported in the development version.
from leiden.
This is expected behaviour. Note that you are not using modularity, but another quality function, namely the Constant Potts Model. Indeed, for this quality function, the singleton partition is optimal for a resolution parameter larger than or equal to 1. Personally, I am typically working with this quality function, which I find much more intuitive than modularity (which sometimes behaves quite strangely). In this quality function, clusters have an internal density higher than the resolution parameter, while the density between communities is lower than the resolution parameter. The "relevant" resolution values are typically in the order of the density of the network.
It is possible to use the bipartite version of modularity using a particular trick. This is explained in the Bipartite documentation.
from leiden.
This package calls the python version via reticulate
. In principle it should pass arguments to the python leidenalg
module and perform the same computations, supporting all of the same arguments. Provided you have the latest version of leidenalg
installed for python, it should also support this in R.
Have you tried running it on a bipartite graph? Would you be able to produce a minimal reproducible example? If I'm able to reproduce the graph you're having issues with, I can look into it.
from leiden.
Awesome! I don't know exactly how the passing to python works so I came up with a small example:
library(leiden)
library(igraph)
net<- networkdata::southern_women #might need to install networkdata library
is.bipartite(net)
communities <-leiden(net)
ID <- V(net)$name
results <- cbind(ID, communities)
Perhaps it would make sense to include an example like this in the documentation?
from leiden.
Have you tested this example? Please note that dependencies on non-CRAN packages are not permitted to be submitted to CRAN. When submitting to CRAN, it will need to automatically run tests, examples, and generate vignettes.
Data is passed to Python via internal functions. There should be no need to interact with it directly (as long as the leidenalg
module is installed).
from leiden.
My apologies, I should have been more clear what I intended with this example. Running the code technically works but the results show that some of the events and persons are in the same community which should not be the case (if I'm not completely mistaken) since they are in different modes.
print(results)
ID communities
[1,] "EVELYN" "1"
[2,] "LAURA" "1"
[3,] "THERESA" "1"
[4,] "BRENDA" "1"
[5,] "CHARLOTTE" "1"
[6,] "FRANCES" "1"
[7,] "ELEANOR" "1"
[8,] "PEARL" "3"
[9,] "RUTH" "3"
[10,] "VERNE" "3"
[11,] "MYRA" "2"
[12,] "KATHERINE" "2"
[13,] "SYLVIA" "2"
[14,] "NORA" "2"
[15,] "HELEN" "2"
[16,] "DOROTHY" "3"
[17,] "OLIVIA" "3"
[18,] "FLORA" "3"
[19,] "E1" "1"
[20,] "E2" "1"
[21,] "E3" "1"
[22,] "E4" "1"
[23,] "E5" "1"
[24,] "E6" "1"
[25,] "E7" "1"
[26,] "E8" "3"
[27,] "E9" "3"
[28,] "E10" "2"
[29,] "E11" "3"
[30,] "E12" "2"
[31,] "E13" "2"
[32,] "E14" "2"
What I meant with including an example like this for the documentation is a working example that shows how to use the package in a way that takes the bipartite graph structure into account. It looks like it's not enough to just have a bipartite igraph object to achieve this but it is not clear to me how to pass this information using the package.
from leiden.
I've looked into it. The "bipartite" graphs are supported in the latest version (0.7.0) of leidenalg in section 4.1.2, see the python module documentation for details. These are separate functions now currently supported by the R package. Bipartite methods were added in version 0.6.1 and are not supported by the R version currently.
library("leiden")
install.packages("drat"); drat::addRepo("schochastics"); install.packages("networkdata")
library("leiden"); library("networkdata"); library("igraph"); library("graphsim")
net <-networkdata::southern_women #might need to install networkdata library
is.bipartite(net)
plot_directed(net)
partition <- leiden(net, partition_type = 'RBConfigurationVertexPartition')
plot_directed(net, fill.node = c("palevioletred", "lightblue")[partition])
This does not account for bipartite structures which do not account for this:
Bipartite networks are special in the sense that they have only links between the two different classes, and no links within a class are allowed.
Correcting this will require changes to internal functions to call .CPMVertexPartition.Bipartite()
in python.
from leiden.
It should be possible to implement an R version of the Bipartite methods providing it can be done in Python. I notice you submitted an issue to the leidenalg repository. Was it possible to run in Python. If so it should be possible to run it here:
https://github.com/TomKellyGenetics/leiden/blob/18ec773593a0c65432205509399d83add0a81799/R/find_partition.R#L71#L71-L79
from leiden.
The development version of leiden
now supports Bipartite graphs in R. You can try it with:
devtools::install_github("TomKellyGenetics", ref = "dev")
This will still require testing before it can be added to documentation but here's a minimal example:
library("leiden"); library("networkdata"); library("igraph"); library("graphsim")
net <-networkdata::southern_women
partition <- leiden(net, partition_type = "CPMVertexPartition.Bipartite", resolution_parameter = 0.1)
plot_directed(net, fill.node = c("palevioletred", "lightblue")[partition])
Note here that the "resolution" parameter is important for testing the results.
partition <- leiden(net, partition_type = "CPMVertexPartition.Bipartite", resolution_parameter = 0.3)
plot_directed(net, fill.node = c("palevioletred", "lightblue", "forestgreen", "purple")[partition])
This is different from the bipartite "types":
plot_directed(net, fill.node = c("palevioletred", "lightblue")[as.integer(V(net)$type)+1])
If you wish to reproduce this, the plotting function I wrote is available on CRAN.
from leiden.
This is awesome, thank you so much! I will test this function on my data sets and will report back. Let me know if you would like to keep this discussion in the repository or if you would prefer to move it to email.
from leiden.
I tried it on the first data set (8k nodes and 27k edges, one connected component) and get the following error:
partition05 <- leiden(test2, partition_type = "CPMVertexPartition.Bipartite", resolution_parameter = 0.5)
Error in py_to_r(bipartite_layers[[1]]) :
could not find function "py_to_r"
I get the same error messages with different resolution parameters. I checked the graph test2
and is.bipartite
returns TRUE
. To specify the classes I used the igraph function V(g)$type <- bipartite_mapping(g)$type
. I checked whether the labels are correct and they are so that shouldn't be a problem.
My session info
> sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.4 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] leiden_0.3.4.9001 igraph_1.2.5 skimr_2.1.1 data.table_1.12.8
loaded via a namespace (and not attached):
[1] Rcpp_1.0.4.6 rstudioapi_0.11 knitr_1.28 magrittr_1.5 rappdirs_0.3.1 tidyselect_1.1.0 lattice_0.20-41 R6_2.4.1 rlang_0.4.6 dplyr_0.8.5 tools_3.6.3
[12] grid_3.6.3 xfun_0.14 htmltools_0.4.0 ellipsis_0.3.1 assertthat_0.2.1 digest_0.6.25 tibble_3.0.1 lifecycle_0.2.0 crayon_1.3.4 Matrix_1.2-18 purrr_0.3.4
[23] repr_1.1.0 base64enc_0.1-3 vctrs_0.3.0 glue_1.4.1 compiler_3.6.3 pillar_1.4.4 reticulate_1.16 jsonlite_1.6.1 pkgconfig_2.0.3
from leiden.
This requires additional functions from igraph
and reticulate
. I've added a patch to import these with library("leiden")
in the dev
branch (it should work if you re-install the development version and import it again). If this doesn't work , try library("reticulate")
before calling the leiden function.
from leiden.
Looks like it works now! My procedure: I reinstalled the development library loaded leiden and igraph seperately
It runs but I get the following error message:
> partition01 <- leiden(test2, partition_type = "CPMVertexPartition.Bipartite", resolution_parameter = 0.1)
Warning message:
In doTryCatch(return(expr), name, parentenv, handler) :
restarting interrupted promise evaluation
In subsequent runs I do not get this error (even for the same line of code) so I wonder what this is.
I will report back once I get a chance to go through the results.
By the way, I tried installing the graphsim
package to see if the error is related to that but I got an error message
Installing package into ‘/home/patrick/R/x86_64-pc-linux-gnu-library/3.6’
(as ‘lib’ is unspecified)
* installing *source* package ‘graphsim’ ...
** using staged installation
** R
** data
*** moving datasets to lazyload DB
Warning: file ‘TGFBeta_Smad_graph.rda’ has magic number 'X'
Use of save versions prior to 2 is deprecated
Error in load(zfile, envir = tmp_env) :
bad restore file magic number (file may be corrupted) -- no data loaded
ERROR: lazydata failed for package ‘graphsim’
* removing ‘/home/patrick/R/x86_64-pc-linux-gnu-library/3.6/graphsim’
Error: Failed to install 'graphsim' from GitHub:
(converted from warning) installation of package ‘/tmp/RtmpoKn8Z6/file4ad33964a7ed/graphsim_0.1.2.9001.tar.gz’ had non-zero exit status
I can open a separate issue on the other repository if you think it's worth it.
from leiden.
I'm not sure what is causing the issue you've run into calling leiden.
Don't worry about the graphsim
package. I'm only using it here for the plotting functions. The install error is related to recent changes I made on issues I'm already working on. Please use the CRAN version for this package. Thanks for letting me know!
from leiden.
After some tests with different networks, I could not find any bugs or problems - it works great! But you are right, it is extremely sensitive to the resolution parameter and I had to do quite a bit of testing.
from leiden.
After some thorough testing, I discovered a behavior that is quite interesting but I'm not sure if this is related to the R version or the implementation of Leiden in Python.
Bottom line is: It looks like the bipartite version only accepts values between 0 and 1, unlike the unipartite version that works fine with values over 1. What I discovered is that if you use a resolution parameter of 1
the number of nodes becomes equal to the number of communities.
list_parameters = seq(0, 1, by = 0.0025)
results = list()
n=1
for (parameter in list_parameters) {
results_cd <- leiden(net, resolution_parameter = parameter)
num_communities <- length(unique(results_cd))
results[[n]] = num_communities
n = n+1
}
data_combined <- data.frame(matrix(unlist(results), nrow=length(results), byrow=T),stringsAsFactors=FALSE)
cd_plot <- as.data.frame(cbind(list_parameters, data_combined))
names(cd_plot)[1] <- "parameter"
names(cd_plot)[2] <- "num_communities"
ggplot(cd_plot, aes(x = parameter, y = num_communities)) +
geom_line( color="blue", size=2, alpha=0.9, linetype=1) +
ggtitle("Relationship between Number of Communities and Resolution Parameter")
Unfortunately, I can't share my data but I could reproduce this behavior with different networks. Is this a package issue on the R side? I assume this is intended behavior on the Python side.
from leiden.
This is quite different to the behaviour of the regular (non-Bipartite) version of the Leiden algorithm. Thanks for pointing it out. I was really only changing the resolution parameter because the results didn't make much sense (and I thought it was only for this toy example data).
I believe the development version of leiden
in R is correctly passing arguments to the Python leidenalg
library. Perhaps @vtraag could help with this?
from leiden.
Sorry I may have misunderstood the docs for the Python version? It is possible to use the Bipartite method for other quality functions (e.g., ModularityVertexPartition) not just CPMVertexPartition? I was confused because it is a subheading here:
https://leidenalg.readthedocs.io/en/stable/reference.html#leidenalg.CPMVertexPartition.Bipartite
In that case, it should be possible to do this for other methods:
4d24a42
from leiden.
It is possible to use the Bipartite method for other quality functions (e.g., ModularityVertexPartition) not just CPMVertexPartition?
No, it is not. The support for bipartite clustering is only implemented for CPMVertexPartition
. However, there is a small trick that allows you to mimic modularity when using CPM. This is explained in the documentation, see specifically the notes, and the argument degree_as_node_size
(make sure to correct the resolution parameter still). In principle it could also be added to some other vertex partitions (but not Surprise or Significance), but I have not done that (yet, anyway).
from leiden.
Thanks, that explains the trouble I had calling it on other quality functions. I've added that argument to the R version and managed to pass the "mimicking" Modularity parameters in the R version.
partition_Modularity <- leiden(net, partition_type = "CPMVertexPartition.Bipartite", degree_as_node_size = TRUE, resolution_parameter = 0.005)
plot_directed(net, fill.node = RColorBrewer::brewer.pal(8,"Set1")[as.integer(partition_Modularity)])
As mentioned, the resolution
parameter is very important here.
partition_Modularity <- leiden(net, partition_type = "CPMVertexPartition.Bipartite", degree_as_node_size = TRUE, resolution_parameter = 0.015)
plot_directed(net, fill.node = RColorBrewer::brewer.pal(8,"Set1")[as.integer(partition_Modularity)])
If support for Bipartite graphs is added to other functions, it shouldn't be too much trouble to add them to the R version.
from leiden.
I've prepared unit tests and a brief vignette on bipartite graphs in R. I will upload this as a minor release (0.3.4) provided checks on Travis CI pass. I will do this before migrating the development version to call igraph::community_leiden
(as discussed in #1) since further released will need to wait until this dependency is on CRAN.
from leiden.
The leiden R package version 0.3.4 passes all checks and is submitted to CRAN. There is a dedicate vignette for Bipartite graphs here.
from leiden.
Related Issues (20)
- "pandas" error while installing leiden (found what is wrong) HOT 5
- Limitation on the number of cells? HOT 13
- Error installing package HOT 13
- extracting networks from R HOT 2
- Reproducibility HOT 2
- Error updating package HOT 6
- Error while installing Leiden HOT 1
- Error in installing ‘leiden’ package HOT 2
- faulty r-reticulate conda environment HOT 10
- Multi_core/thread support HOT 2
- Error in r_to_py_impl(converted, convert = convert)
- install error, invalid pointer HOT 2
- RGL issue HOT 2
- install dependencies error
- Package installation segfault error on CentOS HOT 7
- Support attribute "consider_empty_community" of leidenalg.Optimiser HOT 1
- Installation errors HOT 2
- resolution parameter and number of iteration in leiden R package HOT 1
- Query about "NAs introduced by coercion to integer range" warning
- Leiden consistently unable to correctly identify installed dependencies
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from leiden.