graph-stats-book's Introduction

Network Machine Learning in Python

This book provides an introduction to graph statistics, with a focus on useful representations of graphs and their applications on real data.

Google Drive Brainstorm Book Proposal Compiled Jupyter Book

Usage

Building the book

If you'd like to develop on and build the Network Machine Learning in Python book, you should:

  • Clone this repository
  • Run pip install -r requirements.txt (it is recommended you do this within a virtual environment)
  • (Recommended) Remove the existing network_machine_learning_in_python/_build/ directory
  • Run jupyter-book build network_machine_learning_in_python/

A fully-rendered HTML version of the book will be built in network_machine_learning_in_python/_build/html/index.html.

Hosting the book

The HTML version of the book is hosted on the gh-pages branch of this repo. A GitHub Actions workflow automatically builds the book and pushes it to this branch on a push or pull request to main.

If you wish to disable this automation, you may remove the GitHub Actions workflow and build the book manually by:

  • Navigating to your local build, and
  • Running ghp-import -n -p -f network_machine_learning_in_python/_build/html

This will automatically push your build to the gh-pages branch. More information on this hosting process can be found here.

Contributors

We welcome and recognize all contributions. You can see a list of current contributors in the contributors tab.

Credits

This project is created using the excellent open source Jupyter Book project and the executablebooks/cookiecutter-jupyter-book template.

Color Schemes

Code

Functions specific to this book (e.g., plotting functions we use regularly) have been stored in the subpackage below: https://github.com/neurodata/graphbook-code

graph-stats-book's People

Contributors

asaadeldin11, bdpedigo, jasonkyuyim, loftusa, pssf23, sampan501

graph-stats-book's Issues

Delete all 4 nested and higher indexes

Any index of the form X.Y.Z.U or X.Y.Z.U.V needs to become X.Y.A, using floatingboxes with an appropriate title chosen from:
Remark: for theory results,
Case Study: for real data examples that can be cited as real data examples,
Example: for simulation descriptions, or "external work/homework" problems,
Concept: for ideas that will be cited later on that are mathematical results or equations (typically).

Dependency Installation Error

When I ran the pip install -r requirements.txt command, the following error was generated when attempting to collect the distributions of scipy and matplotlib.

ERROR: Could not find a version that satisfies the requirement scipy==1.6.3 (from versions: 0.8.0, 0.9.0, 0.10.0, 0.10.1, 0.11.0, 0.12.0, 0.12.1, 0.13.0, 0.13.1, 0.13.2, 0.13.3, 0.14.0, 0.14.1, 0.15.0, 0.15.1, 0.16.0, 0.16.1, 0.17.0, 0.17.1, 0.18.0, 0.18.1, 0.19.0, 0.19.1, 1.0.0b1, 1.0.0rc1, 1.0.0rc2, 1.0.0, 1.0.1, 1.1.0rc1, 1.1.0, 1.2.0rc1, 1.2.0rc2, 1.2.0, 1.2.1, 1.2.2, 1.2.3, 1.3.0rc1, 1.3.0rc2, 1.3.0, 1.3.1, 1.3.2, 1.3.3, 1.4.0rc1, 1.4.0rc2, 1.4.0, 1.4.1, 1.5.0rc1, 1.5.0rc2, 1.5.0, 1.5.1, 1.5.2, 1.5.3, 1.5.4)
ERROR: No matching distribution found for scipy==1.6.3
ERROR: Could not find a version that satisfies the requirement matplotlib>=3.4.1 (from versions: 0.86, 0.86.1, 0.86.2, 0.91.0, 0.91.1, 1.0.1, 1.1.0, 1.1.1, 1.2.0, 1.2.1, 1.3.0, 1.3.1, 1.4.0, 1.4.1rc1, 1.4.1, 1.4.2, 1.4.3, 1.5.0, 1.5.1, 1.5.2, 1.5.3, 2.0.0b1, 2.0.0b2, 2.0.0b3, 2.0.0b4, 2.0.0rc1, 2.0.0rc2, 2.0.0, 2.0.1, 2.0.2, 2.1.0rc1, 2.1.0, 2.1.1, 2.1.2, 2.2.0rc1, 2.2.0, 2.2.2, 2.2.3, 2.2.4, 2.2.5, 3.0.0rc2, 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0rc1, 3.1.0rc2, 3.1.0, 3.1.1, 3.1.2, 3.1.3, 3.2.0rc1, 3.2.0rc3, 3.2.0, 3.2.1, 3.2.2, 3.3.0rc1, 3.3.0, 3.3.1, 3.3.2, 3.3.3, 3.3.4)
ERROR: No matching distribution found for matplotlib>=3.4.1

System and environment specs:

MacOS 10.14.6
Python 3.6.13

Every other dependency got collected properly. I removed the scipy and matplotlib entries from the requirements.txt and installed them separately (scipy 1.5.4 and matplotlib 3.3.4).
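
A likely cause, offered as an assumption rather than a confirmed diagnosis: scipy 1.6.x and matplotlib 3.4.x are only published for Python 3.7 and newer, so pip on Python 3.6 stops at scipy 1.5.4 and matplotlib 3.3.4, which matches the version lists in the error output. A minimal sketch of a check to run before installing (the function name python_is_new_enough is hypothetical):

```python
import sys

# Assumption: the pinned scipy/matplotlib versions need Python 3.7 or newer.
MIN_PYTHON = (3, 7)


def python_is_new_enough(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= tuple(minimum)


if not python_is_new_enough():
    print(
        "Python %d.%d is older than %d.%d; the pinned requirements may not resolve."
        % (sys.version_info[0], sys.version_info[1], MIN_PYTHON[0], MIN_PYTHON[1])
    )
```

If the check fails, creating the virtual environment with a newer interpreter is probably simpler than unpinning individual packages.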

Finish off book goals

For review with jovo

  • integrate Jesus Ch 7.3
  • finish random walk stuff
  • write Chapter 3
  • finish gnn stuff
  • revise sambit section
  • revise ali section
  • review and remove comments with alex
  • VNSGM

Questions for jovo

  • figure out what to do with references and add throughout?
  • figure out what to do about cross-chapter linking?
  • do we want equation numbering? if so, how?
  • do i resend email of completed draft to O'Reilly? If so, when?
  • how do we get collaborators more involved?

Stuff to do

  • Why embed networks? Embedding is just a procedure. explain as much.

important stuff to add to the latent position section

  • subsection that shows what happens when you average a bunch of adjacency matrices together that are pulled from the same RDPG (you approach P)
  • subsection that shows more clearly that the latent position matrix lets you estimate P (and why)
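
A minimal sketch of the averaging idea in the first bullet, assuming numpy and made-up latent positions X. The helper names sample_rdpg and mean_abs_error are illustrative and do not come from graspologic or graphbook_code:

```python
import numpy as np


def sample_rdpg(X, rng):
    """Sample one undirected, loopless adjacency matrix with P = X X^T."""
    P = X @ X.T
    n = P.shape[0]
    # Draw the strict upper triangle, then mirror it so the matrix is symmetric.
    upper = np.triu(rng.random((n, n)) < P, k=1)
    return (upper | upper.T).astype(float)


rng = np.random.default_rng(0)
n, d = 50, 2
# Hypothetical latent positions, chosen so every inner product lies in [0, 1].
X = rng.uniform(0.2, 0.7, size=(n, d))
P = X @ X.T
off_diag = ~np.eye(n, dtype=bool)  # ignore the diagonal, since there are no loops


def mean_abs_error(m):
    """Average m sampled adjacency matrices and compare the result to P."""
    A_bar = np.mean([sample_rdpg(X, rng) for _ in range(m)], axis=0)
    return np.abs(A_bar - P)[off_diag].mean()


err_few = mean_abs_error(5)
err_many = mean_abs_error(500)
print("mean |A_bar - P| with 5 networks:  ", err_few)
print("mean |A_bar - P| with 500 networks:", err_many)
```

Each off-diagonal entry of the average is a mean of independent Bernoulli draws with success probability equal to the matching entry of P, so the error shrinks as more networks are averaged.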

Figure out why heatmaps are sometimes different sizes

e.g., here (in multigraph representation learning), the code that generated the left graph and the right two graphs should be essentially the same, just with a different colormap, but for some reason the axes are different sizes.

Reflect specific lauren changes to chapter 3 flow

update figures

delete section 1 and move adj mtx stuff to section 2
move the rest to regularization of the edges
separate regularization into regularization of nodes and edges (separate)
put "intro to causal inference" in the main content, and put "bag of features" limitations there
put "bag of nodes and networks" in the "next" section or chapter 5

Separate Ch3

Adjacency matrix stuff to Section 2, Laplacian stuff to "Regularization of Edges", Degree matrix stuff to Section 2, delete section 1, split regularization of nodes and regularization of edges

Candidate titles

  • Hands-on Network Statistics with Graspologic
  • Applied Statistical Connectomics

list of data sources

for if we want to use real data. I'll edit this periodically.

Enron graph, C.elegans graphs, Blog network graphs: http://www.cis.jhu.edu/~parky/vn/

Karate club, les mis, word adjacencies, football, dolphin social network, political blog, books about politics, c. elegans, power grid, coauthorships in network science, and more: http://www-personal.umich.edu/~mejn/netdata/

emails, jazz musicians, privacy network: https://deim.urv.cat/~alexandre.arenas/data/welcome.htm

stanford collection (SNAP) with a ton of large networks:
http://snap.stanford.edu/data/

KONECT project: 1326 networks total, largest collection I've found
http://konect.cc/networks/

graph classification data sets:
https://github.com/nd7141/graph_datasets

torch geometric datasets:
https://pytorch-geometric.readthedocs.io/en/latest/modules/datasets.html

netzschleuder catalogue:
https://networks.skewed.de/

add citations

add citations to all sections referencing and giving credit to the articles from which the methodology originated.
thank you :)

Change SIEM notation

I'd like the cluster assignment matrix to have a unique letter, but it feels like we've used most of the good letters in some capacity. Make something standardized between SIEM and the estimation/testing section for SIEMs.

finish full read through

@ebridge2 adding this as 'medium priority'. I am interpreting that to mean "if there are any high-priority tasks, do those prior to continuing my readthrough, else do 1 hr read through 1 hr medium-priority task per day"

fix all spelling mistakes / sentence structure stuff / clean up code / remove x.y.z.zz stuff as I go

  • ch 1
  • ch 2
  • ch 3
  • ch 4
  • ch 5
  • ch 6
  • ch 7
  • ch 8
  • ch 9
  • appendix

make subpackage for common functions

should be importable from within the book.
will largely include plotting functions and network generation stuff. Anything that we'll need to do repetitively.

homogenize notation

particularly indexing notation, I think there are a bunch of places where I say "the $i_{th}$ thing", and that should actually be "the $i^{th}$ thing".

vertex nomination single-graph quick summary/notes

i) start with a graph G on n vertices (an n by n adjacency matrix) and a vertex of interest v*;
ii) pick your favorite embedding method to get n points in R^{d} (each d-dimensional vector is to be interpreted as a representation of a vertex in G);
iii) then, depending on the embedding method you use, pick a distance / dissimilarity on R^{d} (for instance, if I suspect my graph is generated from an SBM and I embed the graph using ASE / LSE, I may choose either Euclidean distance for simplicity or Mahalanobis distance to exploit the assumed structure of the points in R^{d});
iv) calculate the distances between the vector corresponding to v* and the vectors corresponding to the other n - 1 vertices;
v) return the k vertices closest to v* according to your selected distance.
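
The steps above can be sketched as follows, assuming numpy, a symmetric adjacency matrix, and Euclidean distance. The embedding here is a plain eigendecomposition in the spirit of ASE, not graspologic's implementation, and the function name nominate is hypothetical:

```python
import numpy as np


def nominate(A, v_star, d, k):
    """Return the k vertices closest to v_star in a d-dimensional spectral embedding."""
    # Step ii: embed using the d eigenpairs of largest magnitude (ASE-style).
    vals, vecs = np.linalg.eigh(A)
    top = np.argsort(np.abs(vals))[::-1][:d]
    X_hat = vecs[:, top] * np.sqrt(np.abs(vals[top]))
    # Steps iii-iv: Euclidean distance from v_star to every vertex.
    dists = np.linalg.norm(X_hat - X_hat[v_star], axis=1)
    dists[v_star] = np.inf  # never nominate v_star itself
    # Step v: the k nearest vertices.
    return np.argsort(dists)[:k]


# Illustrative two-block SBM sample (block sizes and probabilities are made up).
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 20)
B = np.array([[0.8, 0.1], [0.1, 0.8]])
P = B[labels][:, labels]
upper = np.triu(rng.random(P.shape) < P, k=1)
A = (upper | upper.T).astype(float)

nominees = nominate(A, v_star=0, d=2, k=5)
print("nominees for vertex 0:", nominees)
print("their block labels:", labels[nominees])
```

Swapping in Mahalanobis distance would only change the line that computes dists; it would also need an estimated covariance for the embedded points, which this sketch does not attempt.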

make sure the version of graphbook_code in the docker is the most up-to-date

I just started a new docker run and it was using an old version of graphbook_code that was missing some imports. Checked that my docker pull neurodata/graphstatsbook:0.0.1 was up-to-date (it was). Possibly I was just running the wrong image (e.g., neurodata/graphstatsbook:latest). Need to double-check.
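
One way to double-check from inside the running container is to print the installed package version. This is a minimal sketch that assumes Python 3.8 or newer and that the package is installed under the distribution name "graphbook-code", which is a guess based on the repository name:

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Assumption: the distribution name matches the GitHub repository name.
print("graphbook-code version:", installed_version("graphbook-code"))
```

Comparing this output across the 0.0.1 and latest image tags would show whether the two images ship different versions.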
