Comments (4)

arc0 commented on May 21, 2024

@youph - can you please review?

from stellargraph.

arc0 commented on May 21, 2024

@arc0 - just testing

youph commented on May 21, 2024

@PantelisElinas

Comments

The code runs and yields good results. Sufficient description and comments, the logic is fairly easy to follow and appears correct.
Below are some things I think are worth considering.

Possible issues/Suggested fixes

  • main.py, line 95: G_edges() should be changed to g.edges()
  • edge_splitter.py, class EdgeSplitter: a brief description of the class's functionality, as well as its inputs and outputs, is needed
  • edge_splitter.py, line 25: the comment "the train data with edges removed" seems misleading; I'd suggest replacing it with "a placeholder for the graph that will remain after removing edges from the original graph"
  • edge_splitter.py, line 61: the copy statement seems redundant, since the copy has already been made in __init__
  • edge_splitter.py, lines 43-44: shouldn't we first sample negative edges, and only then sample and remove positive edges? Otherwise, some of the negative edges sampled from the leftover graph might in fact be the positive edges we just removed.
  • edge_splitter.py, line 91: this in fact negates the previous comment... But it still seems cleaner to me to sample negative links from self.g first, and then sample and remove positive links from the same self.g. Then all you have to do is return the sampled links and the leftover self.g, and self.g_train is no longer needed.
  • edge_splitter.py, line 91: note that even for p=0.5, the number of negative edges to sample is NOT equal to the number of positive edges sampled, because positive edges are sampled only from edges that are not on the minimum spanning tree. This means that the binary training set of pos/neg edges is unbalanced even for p=0.5. Might be no problem, but worth recognising.
  • main.py, line 145: the message is misleading: it's unclear which operator this applies to, and this is not a train score, it is a test score for the classifier trained on a training subset of the training set of edges, evaluated on a test subset of the training set of edges :) Never mind, it's too complex to print it like that, and I got the meaning of it anyway. Perhaps best to put a comment in the code on what this score means?
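
The ordering suggested in the edge_splitter.py items above (sample negative links from the original graph first, then remove positive links that are not needed for connectivity) could look roughly like the sketch below. This is not the EdgeSplitter code: the function names, the plain edge-list representation, and the union-find spanning forest (standing in for the minimum spanning tree on an unweighted graph) are all assumptions made for illustration.

```python
import random


def spanning_forest(nodes, edges):
    """Return a set of edges forming a spanning forest (hypothetical helper).

    Uses union-find; on an unweighted graph any spanning tree serves the
    same purpose as a minimum spanning tree: keeping the graph connected.
    """
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = set()
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.add((u, v))
    return tree


def split_edges(nodes, edges, p, seed=0):
    """Sample negatives first, then positives, from the same original graph."""
    rng = random.Random(seed)
    nodes = sorted(nodes)
    edge_set = {tuple(sorted(e)) for e in edges}
    n_target = int(p * len(edge_set))  # assumed definition of the sample size

    # Step 1: negative links are node pairs with no edge in the ORIGINAL
    # graph, so a removed positive edge can never be drawn as a negative.
    max_negatives = len(nodes) * (len(nodes) - 1) // 2 - len(edge_set)
    negatives = set()
    while len(negatives) < min(n_target, max_negatives):
        pair = tuple(sorted(rng.sample(nodes, 2)))
        if pair not in edge_set:
            negatives.add(pair)

    # Step 2: positive links are drawn only from edges outside the spanning
    # forest, so removing them leaves the graph as connected as before.
    tree = spanning_forest(nodes, sorted(edge_set))
    removable = sorted(edge_set - tree)
    positives = set(rng.sample(removable, min(n_target, len(removable))))

    remaining = edge_set - positives  # the leftover (training) graph
    return positives, negatives, remaining
```

With this ordering a single graph object is enough: the function returns the sampled links together with the leftover edge set, and no second copy of the graph has to be kept. Note that the number of positives is capped by the number of removable edges, so it can fall below the number of negatives.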

PantelisElinas commented on May 21, 2024

I have addressed some of the comments.

Some of the others I will address as the code is further developed into something more general. For example, as it stands now, I want to keep both self.g and self.g_train, the copies of the original and training graphs. This lets the EdgeSplitter class perform additional splits on the original graph without the user having to pass the graph to the train_test_split() method every time. This may change in the future if a different design proves more useful.

I am aware that the number of negative and positive samples might differ if there aren't enough edges off the minimum spanning tree to remove. This is only an issue for small, sparsely connected graphs when p is large, e.g., p = 0.5. It should not be an issue for large, more densely connected graphs or when p is small.
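
As a rough back-of-the-envelope check of when the imbalance appears, the counts can be estimated from the graph size alone. The sketch below assumes a connected graph (so a spanning tree uses num_nodes - 1 edges) and assumes the requested sample size is int(p * num_edges); the function name is hypothetical and this is not how the class itself computes anything.

```python
def expected_sample_counts(num_nodes, num_edges, p):
    """Estimate (positives, negatives) for a connected graph.

    Assumption: both samples aim for int(p * num_edges) links, but
    positives can only come from edges that are not on a spanning tree.
    """
    requested = int(p * num_edges)
    removable = max(num_edges - (num_nodes - 1), 0)
    positives = min(requested, removable)
    return positives, requested


# Small, sparse graph with large p: positives fall short of negatives.
print(expected_sample_counts(1000, 1200, 0.5))   # (201, 600)

# Same graph with small p: balanced.
print(expected_sample_counts(1000, 1200, 0.1))   # (120, 120)

# Large, denser graph with large p: balanced.
print(expected_sample_counts(1000, 10000, 0.5))  # (5000, 5000)
```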

Thanks for the review!

P.
