Build demo link prediction code from existing (stellargraph issue, 4 comments, closed)

stellargraph commented on May 21, 2024

Build demo link prediction code from existing

from stellargraph.

Comments (4)

arc0 commented on May 21, 2024

@youph - can you please review?

arc0 commented on May 21, 2024

@arc0 - just testing

youph commented on May 21, 2024

@PantelisElinas

Comments

The code runs and yields good results. The description and comments are sufficient, and the logic is fairly easy to follow and appears correct.
Below are some things I think are worth considering.

Possible issues/Suggested fixes

main.py, line 95: G_edges() should be changed to g.edges()
edge_splitter.py, class EdgeSplitter: a brief description of the class's functionality, as well as of its inputs and outputs, is needed
edge_splitter.py, line 25: the comment "the train data with edges removed" seems misleading; I'd suggest replacing it with "a placeholder for the graph that will remain after removing edges from the original graph"
edge_splitter.py, line 61: the copy statement seems redundant, since it's already been done in init
edge_splitter.py, lines 43-44: shouldn't we first sample negative edges, and only then sample and remove positive edges? Otherwise, some of the negative edges sampled from the leftover graph might in fact be the positive edges we just removed.
edge_splitter.py, line 91: this line in fact invalidates my previous comment. Still, it seems cleaner to me (and it removes the need for self.g_train) to sample negative links from self.g first, and then sample and remove positive links from the same self.g; then all you have to do is return the sampled links and the leftover self.g instead of self.g_train.
edge_splitter.py, line 91: note that even for p=0.5, the number of negative edges to sample is NOT equal to the number of positive edges sampled, because positive edges are sampled only from edges that are not on the minimum spanning tree. This means that the binary training set of pos/neg edges is unbalanced even for p=0.5. Might be no problem, but worth recognising.
main.py, line 145: the message is misleading: it's unclear which operator it applies to, and it is not a train score. It is a test score for the classifier trained on a training subset of the training set of edges, evaluated on a test subset of the training set of edges :) Never mind, that is too complex to print, and I got the meaning of it anyway. Perhaps it is best to put a comment in the code on what this score means?
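The ordering suggested in the edge_splitter.py comments above could look roughly like the sketch below. It is not the code under review: the names spanning_tree_edges and split_edges are hypothetical, the graph is a plain list of node pairs rather than the project's graph object, and an arbitrary spanning forest found with union-find stands in for the minimum spanning tree (the edges here are unweighted). Enumerating every non-edge is quadratic in the number of nodes, so this is only suitable for small illustrations.

```python
import itertools
import random


def spanning_tree_edges(nodes, edges):
    """Return the edges of a spanning forest, found with union-find."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = set()
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:  # the edge joins two components, so keep it in the forest
            parent[ru] = rv
            tree.add((u, v))
    return tree


def split_edges(nodes, edges, p, seed=None):
    """Hypothetical helper: sample negatives first, then remove positives."""
    rng = random.Random(seed)
    edges = sorted({tuple(sorted(e)) for e in edges})  # undirected, de-duplicated
    existing = set(edges)
    requested = int(p * len(edges))

    # Step 1: negatives are node pairs absent from the ORIGINAL graph, so none
    # of them can coincide with a positive edge removed in step 2.
    non_edges = [pair for pair in itertools.combinations(sorted(nodes), 2)
                 if pair not in existing]
    negatives = rng.sample(non_edges, min(requested, len(non_edges)))

    # Step 2: positives come only from edges outside the spanning forest, so
    # removing them does not split any connected component.
    tree = spanning_tree_edges(nodes, edges)
    removable = [e for e in edges if e not in tree]
    positives = rng.sample(removable, min(requested, len(removable)))

    remaining = existing - set(positives)
    return positives, negatives, remaining
```

Because the negatives are drawn before anything is removed, a single copy of the graph is enough in this sketch, which is the simplification the review proposes.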

PantelisElinas commented on May 21, 2024

I have addressed some of the comments.

I will address some of the others as the code is developed further into something more general. For example, as it stands now, I want to keep self.g and self.g_train as copies of the original graph and the training graph. This lets the EdgeSplitter class operate on the original graph to perform additional splits without the user always having to pass the graph as a parameter to the train_test_split() method. This design might change in the future if an alternative is deemed more useful.
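To illustrate the rationale, here is a minimal sketch of a splitter that keeps both copies. The class name EdgeSplitterSketch is hypothetical and this is not the actual EdgeSplitter: edges are plain tuples, and the spanning-tree restriction and negative sampling are left out so that only the two-copy idea is visible.

```python
import random


class EdgeSplitterSketch:
    """Hypothetical stand-in that keeps the original graph and a working copy."""

    def __init__(self, edges):
        # self.g holds the original edges and is never modified.
        self.g = frozenset(tuple(sorted(e)) for e in edges)
        # self.g_train is the working copy that edges are removed from.
        self.g_train = set(self.g)

    def train_test_split(self, p=0.1, seed=None):
        rng = random.Random(seed)
        # Every split restarts from the original graph, so the caller does
        # not need to pass the graph in again.
        self.g_train = set(self.g)
        k = int(p * len(self.g))
        removed = rng.sample(sorted(self.g_train), k)
        self.g_train.difference_update(removed)
        return set(self.g_train), removed
```

With this layout, calling train_test_split() twice with different values of p produces two independent splits of the same original graph.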

I am aware that the numbers of negative and positive samples might differ if there aren't enough edges off the minimum spanning tree to remove. This is only an issue for small, sparsely connected graphs when p is large, e.g., p = 0.5. It should not be an issue for large, more densely connected graphs, or if p is small.
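The arithmetic behind this can be sketched as follows. A spanning tree of a connected graph with n nodes uses n - 1 edges, so at most m - (n - 1) of its m edges can be removed. The function sample_counts is hypothetical, and it assumes that p * m edges are requested for both classes and that only the positive count is capped.

```python
def sample_counts(n_nodes, n_edges, p):
    """Return (positives, negatives) for a connected graph, under the
    assumption that int(p * n_edges) samples are requested for each class."""
    requested = int(p * n_edges)
    # Edges on the spanning tree (n_nodes - 1 of them) cannot be removed.
    removable = n_edges - (n_nodes - 1)
    positives = min(requested, removable)
    negatives = requested
    return positives, negatives


# Small, sparse graph with large p: the classes end up unbalanced.
small = sample_counts(n_nodes=10, n_edges=12, p=0.5)

# Larger, denser graph with the same p: the cap is never reached.
large = sample_counts(n_nodes=1000, n_edges=10000, p=0.5)
```

In the small case only 3 edges lie off the spanning tree while 6 are requested, whereas in the large case 9001 edges are removable and all 5000 requested positives can be sampled.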

Thanks for the review!
