Comments (4)
@youph - can you please review?
from stellargraph.
@arc0 - just testing
Comments
The code runs and yields good results. The description and comments are sufficient, and the logic is fairly easy to follow and appears correct.
Below are some things I think are worth considering.
Possible issues/Suggested fixes
- main.py, line 95: G_edges() should be changed to g.edges()
- edge_splitter.py, class EdgeSplitter: a brief description of the class's functionality, as well as its inputs and outputs, is needed
- edge_splitter.py, line 25: the comment "the train data with edges removed" seems misleading; I'd suggest replacing it with "a placeholder for the graph that will remain after removing edges from the original graph"
- edge_splitter.py, line 61: the copy statement seems redundant, since it's already been done in init
- edge_splitter.py, lines 43-44: shouldn't we first sample negative edges, and only then sample and remove positive edges? Otherwise, some of the negative edges sampled from the leftover graph might in fact be the positive edges we just removed.
- edge_splitter.py, line 91: this in fact negates the previous comment... But still, it seems cleaner to me (and it removes the need for self.g_train) to sample negative links from self.g first, and then sample and remove positive links from the same self.g; then all you have to do is return the sampled links and the leftover self.g instead of self.g_train (which is no longer needed).
- edge_splitter.py, line 91: note that even for p=0.5, the number of negative edges to sample is NOT equal to the number of positive edges sampled, because positive edges are sampled only from edges that are not on the minimum spanning tree. This means that the binary training set of pos/neg edges is unbalanced even for p=0.5. Might be no problem, but worth recognising.
- main.py, line 145: the message is misleading: it's unclear which operator this applies to, and this is not a train score, it is a test score for the classifier trained on a training subset of the training set of edges, evaluated on a test subset of the training set of edges :) Never mind, it's too complex to print it like that, and I got the meaning of it anyway. Perhaps best to put a comment in the code on what this score means?
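To make the suggested ordering concrete, here is a minimal sketch of "sample negatives from the intact graph first, then sample and remove positives". It is not the EdgeSplitter code: `spanning_forest` and `split_edges` are hypothetical helpers, the graph is a plain list of integer node pairs rather than the graph object used in the repository, and the spanning tree is built with union-find (for an unweighted graph any spanning tree is a minimum one).

```python
import random


def spanning_forest(nodes, edges):
    """Return the edges of a spanning forest, found with union-find."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    forest = set()
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            forest.add((u, v))
    return forest


def split_edges(nodes, edges, p, seed=0):
    """Hypothetical helper: sample negative edges first, then remove positives."""
    rng = random.Random(seed)
    nodes = sorted(nodes)
    edges = sorted({(min(u, v), max(u, v)) for u, v in edges})
    edge_set = set(edges)

    # Only edges outside the spanning forest may be removed,
    # so the leftover graph keeps its connectivity.
    forest = spanning_forest(nodes, edges)
    removable = [e for e in edges if e not in forest]
    n_pos = min(int(p * len(edges)), len(removable))
    n_non_edges = len(nodes) * (len(nodes) - 1) // 2 - len(edges)
    n_neg = min(n_pos, n_non_edges)

    # Step 1: negatives are drawn while the graph is still intact,
    # so a removed positive edge can never be picked as a negative.
    negatives = set()
    while len(negatives) < n_neg:
        u, v = rng.sample(nodes, 2)
        pair = (min(u, v), max(u, v))
        if pair not in edge_set:
            negatives.add(pair)

    # Step 2: sample the positives and remove them from the edge list.
    positives = rng.sample(removable, n_pos)
    removed = set(positives)
    train_edges = [e for e in edges if e not in removed]
    return train_edges, positives, sorted(negatives)


# Usage: a 6-cycle with three chords (9 edges in total).
nodes = range(6)
edges = [(i, (i + 1) % 6) for i in range(6)] + [(0, 2), (0, 3), (1, 4)]
train_edges, positives, negatives = split_edges(nodes, edges, p=0.3)
```

Matching the number of negatives to the number of positives actually removed also keeps the two classes balanced, at the cost of returning fewer samples than requested when too few removable edges exist.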
I have addressed some of the comments.
Some of the others I will address as the code is further developed into something more general. For example, as it stands now, I want to keep the self.g and self.g_train copies of the original and training graph. This lets the EdgeSplitter class operate on the original graph to perform additional splits without the user always having to pass the graph as a parameter to the train_test_split() method. This behaviour might change in the future if a different design is deemed more useful.
I am aware that the number of negative and positive samples might be different if there aren't enough edges off the minimum spanning tree to remove. This is only an issue for small, sparsely connected graphs when p is large, e.g., p = 0.5. It should not be an issue for large, more densely connected graphs or if p is small.
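As a rough check of when this happens: a connected graph with n nodes keeps n - 1 edges on its spanning tree, so at most m - (n - 1) of its m edges can be removed. The sketch below uses a hypothetical helper, `removable_edge_shortfall`, and assumes a connected graph and that the requested number of positives is int(p * m); neither assumption is taken from the EdgeSplitter code.

```python
def removable_edge_shortfall(num_nodes, num_edges, p):
    """Hypothetical helper: requested positive edges that cannot be removed."""
    requested = int(p * num_edges)
    # Assumes a connected graph: the spanning tree keeps num_nodes - 1 edges.
    removable = num_edges - (num_nodes - 1)
    return max(0, requested - removable)


# Small, sparse graph with large p: 5 edges requested, only 2 removable.
sparse_shortfall = removable_edge_shortfall(num_nodes=10, num_edges=11, p=0.5)  # 3

# Sparse graph with small p: 120 edges requested, 201 removable.
small_p_shortfall = removable_edge_shortfall(num_nodes=1000, num_edges=1200, p=0.1)  # 0
```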
Thanks for the review!
P.