uwnetlab / nate

Research at the intersection of natural language processing and social network analysis.

Home Page: http://networkslab.org/

License: MIT License

Python 100.00%

burst-analysis data-science natural-language-processing python social-network-analysis

nate's Issues

Sasha: Kitchen Sink Summary Function Wish List

NATE SUMMARY FUNCTIONS, REPORTS, AND VISUALIZATIONS:
The story of the kitchen sink

Network
- Number of Nodes
- Number of Edges
- Number of Isolates
- Number of Subcomponents
- Number of Subcomponents with more than 1 node
- Average size of subcomponents
- Average size of subcomponents that are NOT the giant component
- Directed, Undirected, or Mixed?
  -- If Directed: DAG?
  -- If Directed: Average indegree, average outdegree
  -- If Not Directed: Average degree
- Bipartite?
- Node Attributes
- Edge Attributes

** Social Network Visualizations:
-- Plot histogram of degree (indegree and outdegree if they exist)
-- Plot histogram of MGD
-- Plot histogram of ESP (edgewise shared partners)
-- Plot network
--- Edge thickness by weight
--- Edge thickness by similarity score

Network SimSummary

Average similarity
Average dissimilarity

** Network SimSummary Visualizations

Plot histogram of node-node similarity scores
Plot histograms of all similarity scores calculated

Semantic Summary Stuff

Number of Discourse Categories

** Semantic Visualizations

Alluvial Flow diagram

** Semantic Network

Plot Semantic Network
-- Colour by Discourse Categories(?)
--

Comb through `exports` for bugs - there are likely to be plenty

histograms of node-node similarity scores

This should be a method on the socnet, simnet, and docnet objects, since all three have similarity analyses.

Sasha also wants to be able to produce a histogram of all similarity scores calculated. This could be an argument to the method.

Changes to SVO plots

Look into fixing layout (don't have to scroll left so much)
Remove color from nodes
Make nodes rectangular
Filter for lower case and non-ascii characters
Output to pause-able movie format
Interactive, slider-based browser format (D3, perhaps?)
HEATMAPS

.summary()

A method that returns basic summary information about a Nate network object.

To include in output:

number of nodes
number of edges
number of isolates
number of subcomponents
number of subcomponents > 1
average size of subcomponents
average size of subcomponents that are not the giant component
modularity score
number of communities
directed / undirected
n-mode
if directed: average in degree, out degree
if undirected: average degree
list of node attributes
list of edge attributes
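
As a rough illustration of the counting items in this list, here is a minimal sketch for an undirected, unweighted graph given as plain node and edge lists. The function name `summarize_undirected` and the dictionary keys are hypothetical, not part of nate, and the sketch assumes no self-loops. Modularity, communities, n-mode, and attribute lists are left out.

```python
from collections import deque


def summarize_undirected(nodes, edges):
    """Return basic summary statistics for an undirected, unweighted graph.

    Hypothetical helper for illustration; assumes the graph has no self-loops.
    """
    # Build an adjacency map; every node gets an entry so isolates are kept.
    adjacency = {node: set() for node in nodes}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    # Find connected components with a breadth-first search.
    seen = set()
    sizes = []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            current = queue.popleft()
            size += 1
            for neighbour in adjacency[current]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        sizes.append(size)

    sizes.sort(reverse=True)
    non_giant = sizes[1:]  # every component except the largest one
    n_nodes = len(adjacency)
    n_edges = len({frozenset(edge) for edge in edges})  # ignore duplicates

    return {
        "nodes": n_nodes,
        "edges": n_edges,
        "isolates": sum(1 for nbrs in adjacency.values() if not nbrs),
        "subcomponents": len(sizes),
        "subcomponents_gt_1": sum(1 for s in sizes if s > 1),
        "avg_subcomponent_size": sum(sizes) / len(sizes) if sizes else None,
        "avg_non_giant_size": (
            sum(non_giant) / len(non_giant) if non_giant else None
        ),
        "avg_degree": 2 * n_edges / n_nodes if n_nodes else None,
    }
```

A directed variant would track in-neighbours and out-neighbours separately and report average indegree and outdegree instead of a single average degree.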

Remove special columns from namedtuple in nate class

Currently, all of the 'columns' passed into a nate object are stored in self.data, which can be very memory-inefficient.

If we keep the special columns (text, time, ID) separate, and package all of the non-special columns into the self.data object, we'll get the best of both worlds (easy compartmentalization in namespaces, memory efficiency)

Not critical unless we encounter memory bottlenecks.
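
One way the proposed layout could look, as a sketch only: the class name `Records` and its constructor are hypothetical and do not reflect the current nate class. Special columns become plain attributes, and everything else is packaged into a single namedtuple on `data`. It assumes the extra column names are valid Python identifiers.

```python
from collections import namedtuple


class Records:
    """Hypothetical container: special columns apart, the rest in `data`."""

    def __init__(self, **columns):
        # Special columns become plain attributes (None when not supplied).
        self.text = columns.pop("text", None)
        self.time = columns.pop("time", None)
        self.id = columns.pop("id", None)
        # All remaining, non-special columns go into one namedtuple.
        Data = namedtuple("Data", sorted(columns))
        self.data = Data(**columns)


records = Records(
    text=["first post", "second post"],
    time=[1577836800, 1577923200],
    id=[10, 11],
    author=["x", "y"],
    likes=[3, 4],
)
```

Here `records.data.author` and `records.data.likes` stay namespaced together, while `records.text`, `records.time`, and `records.id` are accessed directly.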

Change `head` and `head`-like methods in `Nate` classes to return random samples

Currently, they return ranges or individual records (based on a slice or integer subscript, respectively). It's more useful to return a random sample.
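
A minimal sketch of the sampling behaviour, assuming the records live in a list-like sequence. The function name `sample_head` and the `seed` parameter are hypothetical; the seed is only there to make a sample reproducible.

```python
import random


def sample_head(records, n=5, seed=None):
    """Return up to `n` records chosen uniformly at random, in original order."""
    rng = random.Random(seed)
    k = min(n, len(records))  # never ask for more records than exist
    indices = sorted(rng.sample(range(len(records)), k))
    return [records[i] for i in indices]
```

Sorting the sampled indices keeps the output in the order of the underlying data, which tends to be easier to read when the records are time-ordered.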

Add timestamps/datetime to svo.svo_to_df() method output

John said:

"It would be amazing to have, for example, a tidy dataframe with a datetime index for doing some simple time series stuff. Or the time stamps in a column that we can make a datetime object."

`export_df()` method of `bursts` object throws error for SVO data

degree histogram

Quick visualization of degree distribution
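
As a sketch of what the underlying computation might look like, the snippet below counts how many nodes have each degree in an undirected edge list and renders the result as a text histogram. Both function names are hypothetical, and nodes that appear in no edge (isolates) are not counted because the input is an edge list only. A real implementation would probably hand the counts to a plotting library.

```python
from collections import Counter


def degree_histogram(edges):
    """Map each degree to the number of nodes with that degree (undirected)."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return Counter(degree.values())


def render_histogram(hist):
    """Render the histogram as text, one row per degree value."""
    return "\n".join(f"{d:>3} | {'#' * hist[d]}" for d in sorted(hist))
```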

Pierson: Get better at coding

Ur bad at code get better

histogram of ESP

histogram of edgewise shared partners
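
For reference, the number of shared partners of an edge (u, v) is the number of nodes adjacent to both u and v. The sketch below computes the distribution of that count over all edges of an undirected graph; the function name `esp_distribution` is hypothetical, and self-loops are skipped as an assumption.

```python
from collections import Counter, defaultdict


def esp_distribution(edges):
    """Map each shared-partner count to the number of edges with that count."""
    adjacency = defaultdict(set)
    unique_edges = set()
    for u, v in edges:
        if u == v:
            continue  # assumption: self-loops carry no shared partners
        adjacency[u].add(v)
        adjacency[v].add(u)
        unique_edges.add(frozenset((u, v)))

    counts = Counter()
    for edge in unique_edges:
        u, v = tuple(edge)
        # Shared partners are the common neighbours of the two endpoints.
        counts[len(adjacency[u] & adjacency[v])] += 1
    return counts
```

The returned counter can be plotted directly as a histogram, with the shared-partner count on the x-axis and the number of edges on the y-axis.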

Create degree rank plots, but for entire SVOs (edges = shared S or O)

pip install network_backbone (Malcolm's fork) rather than use his module

See: https://github.com/malcolmvr/backbone_network

Implement checks for preprocessing (and run if missing) for each pipeline

Currently, users must run nate.preprocess first to produce the spaCy data needed to instantiate a nate pipeline.

First, pipelines should elegantly check to see if the necessary preprocessing has been completed. This should be simple and is a logical endpoint.

For further user friendliness, though, it would be prudent to enable each of the pipeline-returning functions to also run preprocessing using defaults that will configure the preprocess function to meet their requirements.

Low priority.
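
The check-then-run pattern described here could be sketched as follows. Everything in this snippet is a stand-in: `preprocess`, `ensure_preprocessed`, the step names, and the dict-based corpus are all hypothetical and do not reflect the real `nate.preprocess` signature.

```python
def preprocess(corpus, steps):
    """Stand-in for a preprocessing routine; records which steps were run."""
    corpus.setdefault("preprocessed", set()).update(steps)
    return corpus


def ensure_preprocessed(corpus, required, run_if_missing=True):
    """Check that the required steps were run, optionally running them."""
    done = corpus.get("preprocessed", set())
    missing = set(required) - done
    if not missing:
        return corpus  # nothing to do
    if not run_if_missing:
        raise RuntimeError(f"missing preprocessing steps: {sorted(missing)}")
    # Run only the missing steps, using defaults chosen by the pipeline.
    return preprocess(corpus, missing)


def svo_pipeline(corpus):
    """Hypothetical pipeline entry point that declares its own requirements."""
    corpus = ensure_preprocessed(corpus, required={"parse", "entities"})
    return corpus
```

With this shape, each pipeline-returning function states what it needs, and a user who skipped preprocessing gets either an automatic run or a clear error.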
