The graph-stats-book from neurodata

get figs 2, 3 from statistical connectomics into chapter 4

specifically the part where we talk about network statistics.
I think those figures are really good for explaining why network statistics can be bad.

complete the O'Reilly form to submit an inquiry re:book

this includes an outline.
please include a link to that webpage here

Write section on causal inference

Delete all indexes nested four or more levels deep

Any index of the form X.Y.Z.U or X.Y.Z.U.V needs to become X.Y.A, using floating boxes with an appropriate title, chosen from:
Remark: for theory results,
Case Study: for real data examples that can be cited as real data examples,
Example: for simulation descriptions, or "external work/homework" problems,
Concept: for ideas that will be cited later on that are mathematical results or equations (typically).
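
A minimal sketch of how the too-deep indexes could be found before converting them to boxes. It assumes each heading starts with its dotted index; the heading titles and the `too_deep` helper below are made up for illustration and are not part of the book's tooling.

```python
import re

# Matches a dotted section index such as "4.9" or "3.2.1.4" at the start of a heading.
INDEX_RE = re.compile(r"^\s*(\d+(?:\.\d+)*)\b")


def too_deep(headings, max_depth=3):
    """Return the headings whose dotted index has more than max_depth levels."""
    flagged = []
    for heading in headings:
        match = INDEX_RE.match(heading)
        if match and len(match.group(1).split(".")) > max_depth:
            flagged.append(heading)
    return flagged


# Hypothetical headings, only to exercise the check.
headings = [
    "4.9 Some section",
    "3.2.1 Some subsection",
    "3.2.1.4 Some theory result",
    "5.1.2.3.1 Some simulation",
]
print(too_deep(headings))
```

Anything this flags would then be rewritten by hand as a Remark, Case Study, Example, or Concept box.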

Use graspologic master in book

and all additions on my branch that we're using should just be in the subpackage

Add community assignments with label argument

Optional labels argument that, if specified, produces community labels with a hypothetical title and adds a legend on the right side

Provide a link on the README to the subpackage

As the title states.

Fix the (now broken) figure outputs for 4.9

Revert the numbers back to match up

make titles in plots always left-aligned

Dependency Installation Error

When I ran the pip install -r requirements.txt command, the following error was generated when attempting to collect the distributions of scipy and matplotlib.

ERROR: Could not find a version that satisfies the requirement scipy==1.6.3 (from versions: 0.8.0, 0.9.0, 0.10.0, 0.10.1, 0.11.0, 0.12.0, 0.12.1, 0.13.0, 0.13.1, 0.13.2, 0.13.3, 0.14.0, 0.14.1, 0.15.0, 0.15.1, 0.16.0, 0.16.1, 0.17.0, 0.17.1, 0.18.0, 0.18.1, 0.19.0, 0.19.1, 1.0.0b1, 1.0.0rc1, 1.0.0rc2, 1.0.0, 1.0.1, 1.1.0rc1, 1.1.0, 1.2.0rc1, 1.2.0rc2, 1.2.0, 1.2.1, 1.2.2, 1.2.3, 1.3.0rc1, 1.3.0rc2, 1.3.0, 1.3.1, 1.3.2, 1.3.3, 1.4.0rc1, 1.4.0rc2, 1.4.0, 1.4.1, 1.5.0rc1, 1.5.0rc2, 1.5.0, 1.5.1, 1.5.2, 1.5.3, 1.5.4)
ERROR: No matching distribution found for scipy==1.6.3

ERROR: Could not find a version that satisfies the requirement matplotlib>=3.4.1 (from versions: 0.86, 0.86.1, 0.86.2, 0.91.0, 0.91.1, 1.0.1, 1.1.0, 1.1.1, 1.2.0, 1.2.1, 1.3.0, 1.3.1, 1.4.0, 1.4.1rc1, 1.4.1, 1.4.2, 1.4.3, 1.5.0, 1.5.1, 1.5.2, 1.5.3, 2.0.0b1, 2.0.0b2, 2.0.0b3, 2.0.0b4, 2.0.0rc1, 2.0.0rc2, 2.0.0, 2.0.1, 2.0.2, 2.1.0rc1, 2.1.0, 2.1.1, 2.1.2, 2.2.0rc1, 2.2.0, 2.2.2, 2.2.3, 2.2.4, 2.2.5, 3.0.0rc2, 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0rc1, 3.1.0rc2, 3.1.0, 3.1.1, 3.1.2, 3.1.3, 3.2.0rc1, 3.2.0rc3, 3.2.0, 3.2.1, 3.2.2, 3.3.0rc1, 3.3.0, 3.3.1, 3.3.2, 3.3.3, 3.3.4)
ERROR: No matching distribution found for matplotlib>=3.4.1

System and environment specs:

macOS 10.14.6
Python 3.6.13

Every other dependency got collected properly. I removed the scipy and matplotlib entries from the requirements.txt and installed them separately (scipy 1.5.4 and matplotlib 3.3.4).
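
A likely cause, stated as an assumption rather than a confirmed diagnosis: scipy 1.6.x and matplotlib 3.4.x both require Python 3.7 or newer, so pip on Python 3.6.13 only sees releases up to scipy 1.5.4 and matplotlib 3.3.4, which matches the version lists in the errors above. A small check along these lines could be run before installing (the `python_is_new_enough` helper is illustrative only):

```python
import sys

# Assumption: scipy 1.6.x and matplotlib 3.4.x both need Python 3.7 or newer.
MIN_PYTHON = (3, 7)


def python_is_new_enough(version_info=None, minimum=MIN_PYTHON):
    """Return True when the interpreter version meets the minimum (major, minor)."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= minimum


if not python_is_new_enough():
    print("Python %d.%d is too old for the pinned scipy/matplotlib" % sys.version_info[:2])
```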

make "terminology" section

Finish off book goals

For review with jovo

Questions for jovo

figure out what to do with references and add throughout?
figure out what to do about cross-chapter linking?
do we want equation numbering? if so, how?
do i resend email of completed draft to O'Reilly? If so, when?
how do we get collaborators more involved?

Stuff to do

Why embed networks? Embedding is just a procedure. explain as much.

choose details for ch. 2 whole connectome

important stuff to add to the latent position section

subsection that shows what happens when you average a bunch of adjacency matrices together that are pulled from the same RDPG (you approach P)
subsection that shows more clearly that the latent position matrix lets you estimate P (and why)
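
For the averaging subsection, a rough simulation sketch in plain numpy (not graspologic). The latent positions, network size, and number of samples are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

n, d = 50, 2
# Hypothetical latent positions, scaled so every inner product lies in [0, 1].
X = rng.uniform(0.2, 0.7, size=(n, d))
P = X @ X.T  # probability matrix of the RDPG


def sample_adjacency(P, rng):
    """Sample one undirected, loopless adjacency matrix with edge probabilities P."""
    upper = np.triu(rng.random(P.shape) < P, k=1)
    return (upper + upper.T).astype(float)


def mean_abs_error(m):
    """Average m sampled networks and compare the off-diagonal entries with P."""
    A_bar = np.mean([sample_adjacency(P, rng) for _ in range(m)], axis=0)
    off_diag = ~np.eye(n, dtype=bool)
    return np.abs(A_bar - P)[off_diag].mean()


err_one, err_many = mean_abs_error(1), mean_abs_error(200)
print(err_one, err_many)
```

The error for the average of 200 networks should come out well below the error for a single network, which is the "you approach P" point the subsection would make.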

make latent position plots always show ticks

currently there are no numbers showing values on the latent position pairplot in graphbook_code

Figure out why heatmaps are sometimes different sizes

e.g., here (in multigraph representation learning), the code that generated the left graph and the right two graphs should be essentially the same, just with a different colormap, but for some reason the axes are different sizes.

fix table of contents

we want sequential numbering for each chapter

Reflect specific lauren changes to chapter 3 flow

update figures

delete section 1 and move adj mtx stuff to section 2
move the rest to regularization of the edges
separate regularization into regularization of nodes and edges (separate)
put "intro to causal inference" in the main content, and put "bag of features" limitations there
put "bag of nodes and networks" in the "next" section or chapter 5

default grey coloring for adjacency matrix plots

and title should always be left-aligned.

@jovo mentioned this a few meetings ago. Maybe discuss more? I'm generally a big fan of black/white over grey

Export colormaps as an object from the graphbook code package

Colormaps will be palettes, not just strings of palette names

add signal subgraph estimation

I just realized this doesn't have a subsection anywhere, unless I'm missing something?

Change color schemes to new colors

See: bottom of readme.

Add a function that pulls the appropriate ColorBrewer palette from seaborn

Replace n choose 2 notation

Black format the book

Separate Ch3

Adjacency matrix stuff to Section 2, Laplacian stuff to "Regularization of Edges", Degree matrix stuff to Section 2, delete section 1, split regularization of nodes and regularization of edges

harmonize beginning of single network models

I wrote a part in the beginning of that subsection. It's not harmonious with the rest of the subsection yet.

Figure out latent distribution testing profiling

Candidate titles

Hands-on Network Statistics with Graspologic
Applied Statistical Connectomics

list of data sources

for if we want to use real data. I'll edit this periodically.

Enron graph, C. elegans graphs, Blog network graphs: http://www.cis.jhu.edu/~parky/vn/

Karate club, Les Mis, word adjacencies, football, dolphin social network, political blog, books about politics, C. elegans, power grid, coauthorships in network science, and more: http://www-personal.umich.edu/~mejn/netdata/

emails, jazz musicians, privacy network: https://deim.urv.cat/~alexandre.arenas/data/welcome.htm

stanford collection (SNAP) with a ton of large networks:
http://snap.stanford.edu/data/

KONECT project: 1326 networks total, largest collection I've found
http://konect.cc/networks/

graph classification data sets:
https://github.com/nd7141/graph_datasets

torch geometric datasets:
https://pytorch-geometric.readthedocs.io/en/latest/modules/datasets.html

netzschleuder catalogue:
https://networks.skewed.de/

sentence structure global changes

convert all 'you's to 'we's

any more?

finish joint embedding section

Omni with covariates.
Issue: what should I show?

Switch to dots for adjacency matrices

like in here.

I played with this briefly, and something was off when I tried plotting on particular axes when I had more than one axis in a figure.

Reflect Lauren chapter 2 changes

from the pdfs for chapter 2 and 3

every adjacency matrix should say which side is source nodes and which is target nodes

According to jovo.

If this is gonna be in all heatmaps in the book, my biggest concern is that plots will get cluttered.
Need to figure out a clean way of doing this.

Release and link in the book a "COMPLETED and REPRODUCIBLE" set of jupyter notebooks which reproduce every figure in the book

Figure out way to delete annotations in hypothes.is if not the original author

subsections are getting too clogged up with random people's annotations

add citations

add citations to all sections referencing and giving credit to the articles from which the methodology originated.
thank you :)

Change SIEM notation

I'd like the cluster assignment matrix to have a unique letter, but it feels like we've used most of the good letters in some capacity. Make something standardized between SIEM and the estimation/testing section for SIEMs

finish full read through

@ebridge2 adding this as 'medium priority'. I am interpreting that to mean "if there are any high-priority tasks, do those prior to continuing my readthrough, else do 1 hr read through 1 hr medium-priority task per day"

fix all spelling mistakes / sentence structure stuff / clean up code / remove x.y.z.zz stuff as I go

rename "why use statistical models?"

ch5: it's not really what the chapter's about

I'd call it "Types of Statistical Models", or "Statistical Models for Representation"

write a proposed outline

make subpackage for common functions

should be importable from within the book.
will largely include plotting functions and network generation stuff. Anything that we'll need to do repetitively.

homogenize notation

particularly indexing notation, I think there are a bunch of places where I say "the $i_{th}$ thing", and that should actually be "the $i^{th}$ thing".
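
One possible way to do the ordinal fix in bulk, as a sketch. It assumes the index is always a single letter written exactly as `$i_{th}$`; the `fix_ordinals` helper is hypothetical, and other spellings would need their own pattern.

```python
import re

# Matches a subscripted ordinal such as $i_{th}$ or $k_{th}$ (single-letter index assumed).
SUBSCRIPT_TH = re.compile(r"\$([A-Za-z])_\{th\}\$")


def fix_ordinals(text):
    """Rewrite $i_{th}$ as $i^{th}$, leaving all other math untouched."""
    return SUBSCRIPT_TH.sub(r"$\1^{th}$", text)


print(fix_ordinals("the $i_{th}$ thing"))
```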

standardize color palette across the book

that way we don't have a bunch of different color schemes every time we want to do a plot

schedule weekly meeting

vertex nomination single-graph quick summary/notes

i) start with a graph G on n vertices (an n by n adjacency matrix) and a vertex of interest v*;
ii) pick your favorite embedding method to get n points in R^{d} (each d-dimensional vector is to be interpreted as a representation of a vertex in G);
iii) then, depending on the embedding method you use, pick a distance / dissimilarity on R^{d} (for instance, if I suspect my graph is generated from an SBM and I embed the graph using ASE / LSE, I may choose either Euclidean distance for simplicity or Mahalanobis distance to exploit the assumed structure of the points in R^{d});
iv) calculate the distances between the vector corresponding to v* and the vectors corresponding to the other n - 1 vertices;
v) return the k vertices closest to v* according to your selected distance.
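
The steps above can be sketched roughly as follows. This is a hand-rolled spectral embedding in numpy with Euclidean distance, not graspologic's implementation, and the two-block SBM at the bottom is a made-up test network.

```python
import numpy as np


def adjacency_spectral_embedding(A, d):
    """Embed a symmetric adjacency matrix into R^d using its top-d eigenpairs by magnitude."""
    eigvals, eigvecs = np.linalg.eigh(A)
    top = np.argsort(np.abs(eigvals))[::-1][:d]
    return eigvecs[:, top] * np.sqrt(np.abs(eigvals[top]))


def nominate(A, v_star, k, d=2):
    """Return the k vertices closest to v_star in the embedding, nearest first."""
    X = adjacency_spectral_embedding(A, d)            # step ii
    dist = np.linalg.norm(X - X[v_star], axis=1)      # steps iii and iv (Euclidean)
    dist[v_star] = np.inf  # never nominate the vertex of interest itself
    return np.argsort(dist)[:k]                       # step v


# Hypothetical two-block SBM: vertices 0-19 and 20-39 form the two communities.
rng = np.random.default_rng(0)
z = np.repeat([0, 1], 20)
B = np.array([[0.8, 0.1], [0.1, 0.8]])
upper = np.triu(rng.random((40, 40)) < B[z][:, z], k=1)
A = (upper + upper.T).astype(float)

print(nominate(A, v_star=0, k=5))
```

With blocks this well separated, the nominees for vertex 0 should come from its own community; swapping in Mahalanobis distance would only change the `dist` line.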

Edit the "reproducible figures" files to reflect all the code updates

provide link to google docs folder that jesse made

make sure the version of graphbook_code in the docker is the most up-to-date

I just started a new docker run and it was using an old version of graphbook_code that was missing some imports. Checked that my docker pull neurodata/graphstatsbook:0.0.1 was up-to-date (it was). Possibly I was just running the wrong image (e.g., neurodata/graphstatsbook:latest). Need to double-check.
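
A quick way to double-check from inside the running container, as a sketch. It assumes the package is installed under the distribution name `graphbook_code` (not confirmed here) and that the container's Python is 3.8 or newer so that `importlib.metadata` is available.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# Assumption: "graphbook_code" is the installed distribution name.
print(installed_version("graphbook_code"))
```

Comparing the printed version against the one on the branch would show whether the image is stale.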

add references to everything

We're really bad at having citations right now

add citations to everything

graph-stats-book's Introduction

Network Machine Learning in Python
