nebula-livejournal

The LiveJournal dataset is a social network dataset stored in a single file with two columns (FromNodeId, ToNodeId).

$ head soc-LiveJournal1.txt
# Directed graph (each unordered pair of nodes is saved once): soc-LiveJournal1.txt
# Directed LiveJournal friednship social network
# Nodes: 4847571 Edges: 68993773
# FromNodeId	ToNodeId
0	1
0	2
0	3
0	4
0	5
0	6

It is available at https://snap.stanford.edu/data/soc-LiveJournal1.html.

Dataset statistics

Nodes                               4847571
Edges                               68993773
Nodes in largest WCC                4843953 (0.999)
Edges in largest WCC                68983820 (1.000)
Nodes in largest SCC                3828682 (0.790)
Edges in largest SCC                65825429 (0.954)
Average clustering coefficient      0.2742
Number of triangles                 285730264
Fraction of closed triangles        0.04266
Diameter (longest shortest path)    16
90-percentile effective diameter    6.5

Dataset Download and Preprocessing

Download

It can be downloaded from the official web page:

$ cd nebula-livejournal/data
$ wget https://snap.stanford.edu/data/soc-LiveJournal1.txt.gz

The data file starts with four comment lines (prefixed with #). They should be removed before importing, because the import tools expect only data rows.

Preprocessing

$ gzip -d soc-LiveJournal1.txt.gz
$ sed -i '1,4d' soc-LiveJournal1.txt
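The `sed` command above relies on the header being exactly four lines long. As an alternative sketch, the comment lines can be filtered by their `#` prefix, and the remaining line count can then be compared with the `Edges` value from the file header. The helper names `strip_comments` and `count_edges` are made up for this example and are not part of the repository.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of this repository.

# Copy the edge list in $1 to $2, dropping every line that starts with '#'.
strip_comments() {
    grep -v '^#' "$1" > "$2"
}

# Print the number of lines in $1 that do not start with '#'.
count_edges() {
    grep -vc '^#' "$1"
}

# Example usage (uncomment to run against the real file):
# strip_comments soc-LiveJournal1.txt soc-LiveJournal1.clean.txt
# count_edges soc-LiveJournal1.clean.txt   # compare with "Edges: 68993773" in the header
```

Filtering by prefix keeps the original file untouched, which makes it easy to redo the step if the import has to be repeated.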

Import dataset to Nebula Graph

With Nebula Importer

Nebula-Importer is a headless import tool for Nebula Graph, written in Go.

You may need to edit the config file nebula-importer/importer.yaml so that it contains your Nebula Graph address and credentials.
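For orientation, here is a rough sketch of what a Nebula-Importer v2 config for a two-column edge list might look like. It is not the file shipped in this repository. The space name `livejournal`, the edge type `follow`, the address `graphd:9669`, and the `root`/`nebula` credentials are all assumptions; the space and edge type are also assumed to exist already. The sketch writes to a separate file, `importer.example.yaml`, so the real config is not overwritten.

```shell
#!/bin/sh
# Writes an EXAMPLE config. Every value below is an assumption to adapt.
cat > importer.example.yaml <<'EOF'
version: v2
description: LiveJournal edge import (example only)
removeTempFiles: false
clientSettings:
  retry: 3
  concurrency: 10
  channelBufferSize: 128
  space: livejournal          # hypothetical space name
  connection:
    user: root                # assumed credentials
    password: nebula
    address: graphd:9669      # assumed GraphD address
logPath: ./err/import.log
files:
  - path: ./soc-LiveJournal1.txt   # assumes the data sits next to the config, as in the Docker layout
    failDataPath: ./err/soc-LiveJournal1.txt
    batchSize: 128
    inOrder: false
    type: csv
    csv:
      withHeader: false
      withLabel: false
      delimiter: "\t"         # the dataset is tab-separated
    schema:
      type: edge
      edge:
        name: follow          # hypothetical edge type
        withRanking: false
        srcVID:
          type: int
          index: 0            # FromNodeId column
        dstVID:
          type: int
          index: 1            # ToNodeId column
EOF
```

How relative paths in `path` are resolved can differ between importer versions, so it is worth checking the documentation for the version you run.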

Then Nebula-Importer can be run in Docker as follows:

$ cd nebula-livejournal

$ docker run --rm -ti \
    --network=nebula-net \
    -v "${PWD}/nebula-importer/importer.yaml":/root/importer.yaml \
    -v "${PWD}/data/":/root \
    vesoft/nebula-importer:v2 \
    --config /root/importer.yaml

Or, if you have the nebula-importer binary locally:

$ cd data
$ <path_to_nebula-importer_binary> --config ../nebula-importer/importer.yaml

With Nebula Exchange

Nebula-Exchange is a Spark application that enables batch and streaming data import from multiple data sources into Nebula Graph.

This is an example of setting up a test environment for Nebula Exchange.

  • Set up Spark for Exchange
docker run --name spark-master --network nebula-docker-compose_nebula-net \
    -h spark-master -e ENABLE_INIT_DAEMON=false -d \
    bde2020/spark-master:2.4.5-hadoop2.7

Download the nebula-exchange package inside the Spark container. Refer to Nebula Exchange's version mapping table to pick the Exchange version you should use.

docker exec -it spark-master bash
cd ~
wget https://repo1.maven.org/maven2/com/vesoft/nebula-exchange/2.6.0/nebula-exchange-2.6.0.jar
  • Prepare the Exchange config file
First we need to know the GraphD and MetaD addresses. Since the Nebula cluster here was bootstrapped with Docker Compose, we can check them like this:
$ docker port nebula-docker-compose_metad0_1 | grep ^9559
9559/tcp -> 0.0.0.0:49189
$ docker port nebula-docker-compose_metad1_1 | grep ^9559
9559/tcp -> 0.0.0.0:49190
$ docker port nebula-docker-compose_metad2_1 | grep ^9559
9559/tcp -> 0.0.0.0:49188

In this case the three MetaD services are listening on ports 49188, 49189 and 49190. We can then write the configuration for Nebula Exchange by following https://docs.nebula-graph.io/2.6.1/nebula-exchange/use-exchange/ex-ug-import-from-csv/ . Suppose we save the configuration file as exchange-config.conf.
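Reading the mapped ports by eye gets tedious when containers are recreated and the ports change. The sketch below extracts the host port from `docker port` output and joins several ports into a comma-separated address list. The helper names `mapped_port` and `join_meta_addresses` and the host IP `192.168.0.10` are hypothetical. It also assumes that the host-mapped ports are the ones you intend to put into the config.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of this repository.

# Read `docker port` output on stdin and print the host port mapped to the
# container port given as $1 (first matching line only).
mapped_port() {
    awk -v port="$1" '$1 ~ ("^" port "/") {
        n = split($NF, parts, ":")   # "0.0.0.0:49189" -> last part is the port
        print parts[n]
        exit
    }'
}

# $1: host name or IP; remaining arguments: ports.
# Prints host:port,host:port,...
join_meta_addresses() {
    host="$1"
    shift
    out=""
    for p in "$@"; do
        out="${out:+$out,}$host:$p"
    done
    printf '%s\n' "$out"
}

# Example usage (requires the containers from the text to be running):
# p0="$(docker port nebula-docker-compose_metad0_1 | mapped_port 9559)"
# join_meta_addresses 192.168.0.10 "$p0"
```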

  • Run the Exchange application
docker exec -it spark-master bash
cd /root/
/spark/bin/spark-submit --master local \
    --class com.vesoft.nebula.exchange.Exchange nebula-exchange-2.6.0.jar \
    -c exchange-config.conf

Run Algorithms with Nebula Graph

Nebula-Algorithm is a Spark/GraphX application that runs graph algorithms on data read from files or from a Nebula Graph cluster.

Currently supported algorithms:

Name                          Use Case
PageRank                      page ranking, key node discovery
Louvain                       community discovery, hierarchical clustering
KCore                         community detection, financial risk control
LabelPropagation              community detection, information propagation, advertising recommendation
ConnectedComponent            community detection, isolated island detection
StronglyConnectedComponent    community detection
ShortestPath                  path planning, network planning
TriangleCount                 network structure analysis
BetweennessCentrality         key node discovery, node influence calculation
DegreeStatic                  graph structure analysis

Ad-hoc Spark Env setup

Here I assume Nebula Graph was bootstrapped with Nebula-Up, so it is running in a Docker network named nebula-docker-compose_nebula-net.

Then let's start a single-node Spark container (run this from the nebula-livejournal directory so that the mounted path resolves):

docker run --name spark-master --network nebula-docker-compose_nebula-net \
    -h spark-master -e ENABLE_INIT_DAEMON=false -d \
    -v "${PWD}/nebula-algorithm/":/root \
    bde2020/spark-master:2.4.5-hadoop2.7

We can now submit Spark applications from inside this container:

docker exec -it spark-master bash
cd /root/
# Download the Nebula-Algorithm JAR package, 2.0.0 for example. For other versions, refer to the nebula-algorithm GitHub repo and documentation.
wget https://repo1.maven.org/maven2/com/vesoft/nebula-algorithm/2.0.0/nebula-algorithm-2.0.0.jar

Run Algorithms

Nebula-Algorithm supports many algorithms. Example configuration files for some of them are provided under nebula-algorithm.

Before using them, edit the Nebula Graph cluster addresses and credentials to match your environment.

vim nebula-algorithm/algo-pagerank.conf

Then we can enter the Spark container and run the corresponding algorithm as follows.

Adjust --driver-memory to fit your machine. For example, for the PageRank algorithm:

/spark/bin/spark-submit --master "local" --conf spark.rpc.askTimeout=6000s \
    --class com.vesoft.nebula.algorithm.Main \
    --driver-memory 16g nebula-algorithm-2.0.0.jar \
    -p pagerank.conf

After the algorithm finishes, the output is written inside the container to the path defined in the conf file:

    write:{
        resultPath:/output/
    }
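To get a quick look at the result, the highest-scoring rows can be listed with standard tools. The sketch below assumes the output directory contains CSV part files with the vertex ID in the first column and the score in the second. That layout is an assumption about the output, so check your own `/output/` directory first. The helper name `top_ranked` is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper for illustration; the CSV layout is an assumption.

# $1: result directory, $2: number of rows to print (default 10).
# Prints the rows with the largest value in the second column.
top_ranked() {
    result_dir="$1"
    limit="${2:-10}"
    # Keep only rows whose second column looks numeric (this drops header
    # lines), then sort by that column in descending order.
    cat "$result_dir"/*.csv \
        | awk -F',' '$2 ~ /^[0-9.eE+-]+$/' \
        | LC_ALL=C sort -t',' -k2,2 -g -r \
        | head -n "$limit"
}

# Example usage inside the Spark container:
# top_ranked /output 20
```

`sort -g` is used instead of `-n` so that scores written in scientific notation still sort correctly.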

Contributors

wey-gu
