pygrank's People

Contributors

maniospas

pygrank's Issues

Torch GNN support

Current helper methods for GNNs are centered on tensorflow and keras. Create backend operations that abstract them so that they can also be implemented through torch. This requires adding a new test to make sure everything works.

Related tests: tests.test_gnn.test_appnp
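
The shape such an abstraction could take is sketched below. This is a hypothetical dispatch, not pygrank's actual backend API; the function and dictionary names are illustrative assumptions.

# Hypothetical sketch of backend-dispatched operations for the GNN helpers;
# names here are illustrative, not pygrank's existing backend interface.
def load_gnn_backend(name):
    if name == "tensorflow":
        import tensorflow as tf
        return {
            "dot": lambda a, b: tf.matmul(a, b),
            "dropout": lambda x, rate: tf.nn.dropout(x, rate=rate),
        }
    if name == "torch":
        import torch
        return {
            "dot": lambda a, b: torch.matmul(a, b),
            "dropout": lambda x, rate: torch.nn.functional.dropout(x, p=rate),
        }
    raise ValueError("Unknown backend: " + name)

ops = load_gnn_backend("torch")  # GNN helpers would then call ops["dot"], ops["dropout"], ...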

Automatic citation discovery

Since node ranking algorithms can comprise multiple components, some of which are implicitly determined (e.g., through default instantiation or specific argument values), create methods that summarize the components in use and provide citations for them.

Usefulness: This can help streamline citation practices.

Related tests: None
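
A hypothetical sketch of what such summarization could look like, assuming each component exposes the cite() method discussed elsewhere in this tracker; the traversal logic and the components attribute are assumptions for illustration, not existing pygrank behavior.

# Hypothetical: recursively collect citations from an algorithm and its wrapped
# components. The `components` attribute is an assumed name for illustration.
def collect_citations(algorithm):
    citations = []
    if hasattr(algorithm, "cite"):
        citations.append(algorithm.cite())
    for child in getattr(algorithm, "components", []):
        citations.extend(collect_citations(child))
    return citations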

Potential issue in the GNN demonstrator example with tensorflow backend

During the review process of the library's paper, a reviewer pointed out that the following error occurs on their local system with TensorFlow 3.9.2 and Python 3.10.6.

TypeError: Sequential.call() got multiple values for argument 'training'

This occurs when running the code of the APPNP example. The issue lies entirely with the example and not with any library functionality, so it will not motivate a hotfix.

Investigate whether this issue is specific to that TensorFlow version or whether TensorFlow has once again changed something that will break the example in all future versions. At the very least, this error should not occur in GitHub Actions.

If this is not the case, investigate whether this issue is platform-dependent.
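
For reference, this error pattern is typical of passing the training flag positionally to a keras model so that it collides with the keyword keras forwards internally; the snippet below is a guess at the mechanism under that assumption, not the example's actual code, and behavior varies across TensorFlow versions.

import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(4)])
x = tf.ones((2, 3))
model(x, training=True)  # safe: training passed as a keyword argument
model(x, True)           # on some versions may raise:
                         # TypeError: Sequential.call() got multiple values for argument 'training'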

optimization_dict not improving performance

The optimization_dict argument of the ClosedFormGraphFilter class does not seem to produce an improvement in running time. This could indicate either a bug or bottlenecks in other parts of the pipeline, e.g., in graph signal instantiation.

Version: run with version 2.3, adjusted to repeat experiments 50 times when measuring time

Demonstration:

>>> import pygrank as pg
>>> optimization_dict = dict()
>>> pg.benchmark_print(pg.benchmark({"HK": pg.HeatKernel(optimization_dict=optimization_dict)}, pg.load_datasets_all_communities(["bigraph"]), metric="time"))
               	 HK 
bigraph0       	 3.06
bigraph1       	 3.36
>>> pg.benchmark_print(pg.benchmark({"HK": pg.HeatKernel()}, pg.load_datasets_all_communities(["bigraph"]), metric="time"))
               	 HK 
bigraph0       	 2.98
bigraph1       	 2.96

Related tests: None

Implement Krylov space analysis in tensorflow

Implement Krylov space analysis for the tensorflow backend. This could require defining additional backend operations.

Related tests: tests.test_filter_optimization.test_lanczos_speedup
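
For reference, a plain numpy sketch of the Lanczos iteration that a tensorflow port would need to express through backend operations (matrix-vector products, dot products, norms); this is a generic textbook formulation, not pygrank's existing implementation.

import numpy as np

def lanczos(matvec, v0, krylov_dims):
    # Builds an orthonormal Krylov basis and the tridiagonal coefficients
    # (alphas, betas) that approximate the operator within the Krylov subspace.
    v = v0 / np.linalg.norm(v0)
    basis, alphas, betas = [v], [], []
    beta, v_prev = 0.0, np.zeros_like(v)
    for _ in range(krylov_dims):
        w = matvec(v) - beta * v_prev
        alpha = float(np.dot(w, v))
        w = w - alpha * v
        beta = float(np.linalg.norm(w))
        alphas.append(alpha)
        if beta < 1.e-12:
            break
        betas.append(beta)
        v_prev, v = v, w / beta
        basis.append(v)
    return np.array(basis).T, np.array(alphas), np.array(betas)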

Citation discovery for postprocessors

Recommend how to cite postprocessors through their cite() method.

Related tests: test_autorefs.test_autorefs, test_autorefs.test_postprocessor_citations

Autotune too slow for backends other than numpy

Investigate why filter tuning can be exceptionally slow for backends other than numpy.

  • Focus on matvec first.
  • Slowness persists when L1 is used as the optimization measure instead of AUC for matvec; investigate potentially slow operations (e.g., indexing?) in that project.

Related tests: None (change backends in the current compare_filter_tuning.py in the playground)

Check FairWalk correctness

FairWalk does not achieve the same level of fairness (as high a pRule) as other fairness-aware heuristics during tests.
This could arise from an erroneous implementation. If the implementation is found to be correct, separate its tests from those of other heuristics to account for the lower expected improvement.

Related tests: tests.test_fairness.test_fair_heuristics
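
For context, pRule here refers to a disparate-impact-style ratio between the average scores of the sensitive group and the rest (the closer to 1, the fairer). The snippet below is one common way to compute such a ratio and only approximates whatever exact definition the tests use.

import numpy as np

def p_rule(scores, sensitive):
    # Ratio of mean scores between the sensitive group and the rest, folded into [0, 1].
    scores = np.asarray(scores, dtype=float)
    sensitive = np.asarray(sensitive, dtype=bool)
    protected, rest = scores[sensitive].mean(), scores[~sensitive].mean()
    return min(protected / rest, rest / protected)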

Module 'networkx' has no attribute 'to_scipy_sparse_matrix'

I encountered this issue while running a benchmark.

~/.local/lib/python3.10/site-packages/pygrank/algorithms/autotune/tuning.py in rank(self, graph, personalization, *args, **kwargs)
...
---> M = G.to_scipy_sparse_array() if isinstance(G, fastgraph.Graph) else nx.to_scipy_sparse_matrix(G, weight=weight, dtype=float)
         renormalize = float(renormalize)
         left_reduction = reduction #(lambda x: backend.degrees(x)) if reduction == "sum" else reduction

AttributeError: module 'networkx' has no attribute 'to_scipy_sparse_matrix'
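
For reference, networkx 3.x removed to_scipy_sparse_matrix in favor of to_scipy_sparse_array, so a version-tolerant helper along the following lines would likely resolve the crash. This is a sketch of the idea, not the patch actually applied.

import networkx as nx

def to_sparse_adjacency(G, weight="weight"):
    # networkx >= 3.0 only exposes to_scipy_sparse_array; older releases may lack it.
    if hasattr(nx, "to_scipy_sparse_array"):
        return nx.to_scipy_sparse_array(G, weight=weight, dtype=float)
    return nx.to_scipy_sparse_matrix(G, weight=weight, dtype=float)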

Performant use of sparse matrices?

I'm trying to use pygrank with larger graphs: 100k-1m nodes, hundreds of millions of edges. My graphs are in sparse matrix format. So far I've just converted them to networkx graphs and used those:

import networkx as nx
import pygrank as pg

# A is a scipy sparse adjacency matrix; seeds is an iterable of seed node ids
g = nx.from_scipy_sparse_array(A, create_using=nx.DiGraph)
signal_dict = {i: 1.0 for i in seeds}
signal = pg.to_signal(g, signal_dict)

# normalize signal
signal.np /= signal.np.sum()

result = algorithm(signal).np

Is there a more performant option available?

Numeric graph signal operations

Currently, a clear distinction needs to be maintained between graph signal objects and the extraction of their .np fields.
Reframe the code so that, when numeric operations are applied to signals, their respective .np fields are used in their place.
This can help write comprehensible high-level code.
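
A minimal sketch of the behavior this asks for, where arithmetic on signals transparently delegates to their .np fields; the class below is purely illustrative and not the library's GraphSignal.

import numpy as np

class SignalSketch:
    # Illustrative only: numeric operators unwrap .np so that high-level code
    # can write `signal / other` instead of `signal.np / other.np`.
    def __init__(self, values):
        self.np = np.asarray(values, dtype=float)

    def _unwrap(self, other):
        return other.np if isinstance(other, SignalSketch) else other

    def __add__(self, other):
        return SignalSketch(self.np + self._unwrap(other))

    def __truediv__(self, other):
        return SignalSketch(self.np / self._unwrap(other))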

Implementing PageRank as a GenericGraphFilter does not seem to work

In the following code, the two algorithms should be close to equivalent, yet there is a significant Mabs error between them.

>>> import pygrank as pg
>>> graph = next(pg.load_datasets_graph(["graph9"]))
>>> ranks1 = pg.Normalize(pg.PageRank(0.85, tol=1.E-12, max_iters=1000)).rank(graph, {"A": 1})
>>> ranks2 = pg.Normalize(pg.GenericGraphFilter([0.85**i for i in range(20)], tol=1.E-12)).rank(graph, {"A": 1})
>>> print(pg.Mabs(ranks1)(ranks2))
0.025585056372903574

Related tests: tests.test_filters.test_custom_runs

Citation discovery for graph filters

Recommend how to cite graph filters and tuners through their cite() method.

Related tests: test_autorefs.test_autorefs, test_autorefs.test_filter_citations
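
For instance, the intended usage would presumably look like the following; the exact wording of the returned recommendation is left to the implementation, and the constructor arguments are only an example.

>>> import pygrank as pg
>>> algorithm = pg.PageRank(0.85)
>>> print(algorithm.cite())  # expected to return a human-readable citation recommendation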

Tune on non-seeds?

Is it possible to run the tuners with non-seed nodes? For example, if I have a seed_set and a target_set, can I run the tuner diffusions with the signal from the former but optimize for metrics defined with respect to the latter? In this case I have a desired ranking of the nodes in the target_set.

Seamless verbosity

Automatically display verbose progress (e.g., for dataset downloading) that disappears once tasks complete, so that normal benchmarking prints can resume.

Possible sparse_dot_mkl integration?

I saw that you have your own sparse matrix library called matvec which parallelizes sparse-dense multiplication. There is an existing Python library called sparse_dot that does the same but with scipy csr/csc matrices: https://github.com/flatironinstitute/sparse_dot.

I benchmarked the two with a matrix of size

<6357204x6357204 sparse matrix of type '<class 'numpy.float32'>'
	with 3614017927 stored elements in Compressed Sparse Column format>

On the 32-core/64-thread server CPU I'm testing with, the times for 10 matrix-vector multiplications on the right and left are:

                  right-vec   left-vec
matvec                25.15      19.47
sparse_dot csc        40.17      14.91
sparse_dot csr        10.38      28.53

The times look competitive. I'm not sure if matvec has some other advantages I'm not considering here, but the fact that sparse_dot works with the existing scipy types would be a huge benefit (for my use case, at least). sparse_dot does require installing the MKL library and, for giant matrices like the one above, setting the environment variable MKL_INTERFACE_LAYER=ILP64.
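
For anyone repeating the measurement, a rough sketch of the timing loop is shown below with a plain scipy baseline; swapping in sparse_dot's dot_product_mkl for the products is left as an assumption about that project's entry point rather than verified usage.

import time
import numpy as np
import scipy.sparse as sp

def time_matvecs(A, repeats=10):
    # Times `repeats` right (A @ x) and left (y @ A) sparse matrix-vector products.
    x = np.ones(A.shape[1], dtype=A.dtype)
    y = np.ones(A.shape[0], dtype=A.dtype)
    start = time.time()
    for _ in range(repeats):
        A @ x
    right = time.time() - start
    start = time.time()
    for _ in range(repeats):
        y @ A
    left = time.time() - start
    return right, left

# Example with a much smaller random matrix than the one benchmarked above:
# A = sp.random(10**5, 10**5, density=1e-4, format="csr", dtype=np.float32)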

Convergence management tracking

Implement a high-level way of summarizing convergence analysis, for example to help measure running time and iterations when algorithms are wrapped by postprocessors (including iterative schemes).
For example, a list of all convergence manager run outcomes could be obtained. Perhaps this could be achieved with some combination of dependent algorithm discovery and keeping convergence manager history across restarts.

Related tests: tests.test_filters.test_convergence_string_conversion
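
A hypothetical sketch of the history-keeping idea; the wrapper and method names below are assumptions for illustration and do not reflect pygrank's actual convergence manager interface.

class ConvergenceHistory:
    # Illustrative only: wraps a convergence-manager-like object and records
    # the outcome of each run instead of discarding it on restart.
    def __init__(self, manager):
        self.manager = manager
        self.history = []

    def start(self):
        self.manager.start()

    def has_converged(self, new_value):
        converged = self.manager.has_converged(new_value)
        if converged:
            self.history.append({
                "iterations": getattr(self.manager, "iteration", None),
                "elapsed": getattr(self.manager, "elapsed_time", None),
            })
        return converged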
