dream-lab / goffish_v3
Latest version of the GoFFish distributed graph processing platform
Support both source and sink vertex IDs/pointers being accessible from IEdge, rather than just the sink. This may help with samples like spanning tree.
Remove the non-FullInfo readers, as they are deprecated and have been replaced by the FullInfo readers with the full Hadoop pipeline.
Add Iterable<S> getRemoteSubgraphs() to ISubgraph to allow access to neighboring subgraph IDs directly, rather than iterating through all remote vertices; e.g., this helps in connected components.
Optionally, also allow access to all subgraph IDs in the metagraph.
Hama's bsp.sync actually transmits the messages. Capturing memory usage before/after sync lets us estimate the memory used by the output message buffer, and capturing timestamps before/after sync gives the communication duration as well.
The BFS implementation can use a faster FIFO queue rather than a priority queue, since unweighted BFS does not need priority ordering.
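As an illustration (not the GoFFish sample itself), a minimal unweighted BFS on an adjacency list using java.util.ArrayDeque, whose O(1) offer/poll beats a PriorityQueue's O(log n) operations:

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

public class BfsQueue {
    // Unweighted BFS needs only FIFO order, so an ArrayDeque suffices;
    // returns the hop distance from src to every vertex (-1 if unreachable).
    public static int[] distances(List<List<Integer>> adj, int src) {
        int[] dist = new int[adj.size()];
        Arrays.fill(dist, -1);
        Deque<Integer> queue = new ArrayDeque<>();
        dist[src] = 0;
        queue.add(src);
        while (!queue.isEmpty()) {
            int u = queue.poll();
            for (int v : adj.get(u)) {
                if (dist[v] == -1) {        // first visit fixes the BFS distance
                    dist[v] = dist[u] + 1;
                    queue.add(v);
                }
            }
        }
        return dist;
    }
}
```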
Format: PERF.SG.SEND_MSG_COUNT : sgid, superstep_num, sg_msg_count, broadcast_msg_count, total : messages sent per superstep.
*sendToAll increases broadcast_msg_count by 1. So use: total msg count = sg_msg_count + total_sg * broadcast_msg_count.
Issue: the sum of all messages sent in the ith superstep does not equal the sum of all messages received in the (i+1)th superstep. Typically, send_msg_count << recv_msg_count.
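The accounting above can be written out as a tiny helper (the MsgCount class name is hypothetical; the formula is the one stated in the log format):

```java
public class MsgCount {
    // Per the PERF.SG.SEND_MSG_COUNT format: each sendToAll bumps
    // broadcast_msg_count by 1, and every broadcast is delivered to
    // every subgraph, so:
    //   total = sg_msg_count + total_sg * broadcast_msg_count
    public static long totalMessages(long sgMsgCount, long broadcastMsgCount,
                                     long totalSubgraphs) {
        return sgMsgCount + totalSubgraphs * broadcastMsgCount;
    }
}
```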
Add helper methods long getRemoteVertexCount(), long getLocalEdgeCount(), long getBoundaryEdgeCount(), and long getBoundaryVertexCount(). Useful for pre-allocating collections.
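With such counts available, user code could pre-size collections to avoid rehashing; a sketch (the sizedMap helper is hypothetical, not GoFFish API) that sizes a HashMap for Java's default 0.75 load factor:

```java
import java.util.HashMap;

public class Prealloc {
    // Returns a HashMap whose initial capacity lets `count` entries fit
    // without triggering a resize under the default 0.75 load factor.
    public static <K, V> HashMap<K, V> sizedMap(long count) {
        int capacity = (int) Math.ceil(count / 0.75) + 1;
        return new HashMap<>(capacity);
    }
}
```

A subgraph algorithm would then call, e.g., `sizedMap(subgraph.getRemoteVertexCount())` before filling per-remote-vertex state.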
Add methods Iterable<IVertex> getBoundaryVertices() and Iterable<IEdge> getBoundaryEdges() to identify local vertices that have an out-edge to a remote vertex, and out-edges whose sink is a remote vertex.
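A self-contained sketch of the scan such a getBoundaryVertices() could perform, with plain Longs and collections standing in for GoFFish's vertex and edge types (all names here are illustrative):

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class Boundary {
    // A boundary vertex is a local vertex with at least one out-edge whose
    // sink lies outside the local vertex set (i.e., in another subgraph).
    public static Set<Long> boundaryVertices(Map<Long, List<Long>> outEdges,
                                             Set<Long> localVertices) {
        Set<Long> boundary = new TreeSet<>();
        for (Map.Entry<Long, List<Long>> e : outEdges.entrySet()) {
            for (long sink : e.getValue()) {
                if (!localVertices.contains(sink)) {
                    boundary.add(e.getKey());   // one remote sink is enough
                    break;
                }
            }
        }
        return boundary;
    }
}
```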
For Hama, the readers should be in a separate package, etc.
The current implementation scans through all remote vertices to find neighboring subgraphs. We need to maintain an internal adjacency list of subgraph IDs, or add an Iterator<S> getRemoteSubgraphs() API to ISubgraph, to make this efficient.
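The internal adjacency list could be built once when the partition is loaded, so later queries avoid rescanning every remote vertex; a sketch with a plain Map standing in for the remote-vertex table (names are illustrative, not GoFFish API):

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class RemoteSubgraphs {
    // Given the mapping of each remote vertex to its owning subgraph ID,
    // collect the distinct neighboring subgraph IDs once; the result can
    // be cached on the subgraph and served by getRemoteSubgraphs().
    public static Set<Long> neighborSubgraphs(Map<Long, Long> remoteVertexToSubgraph) {
        return new TreeSet<>(remoteVertexToSubgraph.values());
    }
}
```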
It is not clear where V getLocalState() is used in IRemoteVertex. Remove it to save pointer space?
Add support for Iterator<IEdge> getRemoteInEdges() on ISubgraph and Iterator<IEdge> getInEdges() on IVertex. This allows backward traversals, e.g., in GoDB.
The subgraph object can only be accessed after the compute class is instantiated, so a call to getSubgraph() inside the constructor of the compute class throws a NullPointerException.
Relevant code snippets:
Example where subgraph object might be used inside a constructor (full code):
/**
* Input has <num_clusters>,<max_edge_cuts>,<max_iters>
*
* @param initMsg
*/
public KMeans(String initMsg) {
String[] inp = initMsg.split(",");
SubgraphValue value = new SubgraphValue();
value.k = Integer.parseInt(inp[0]);
value.maxEdgeCrossing = Long.parseLong(inp[1]);
value.maxIterations = Integer.parseInt(inp[2]);
// NOTE: The subgraph is not yet set here, so this line would throw a NullPointerException:
// getSubgraph().setSubgraphValue(value);
}
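Until the framework sets the subgraph before construction, one workaround is to stash the constructor arguments and apply them on the first compute() call, when the subgraph is guaranteed to be available. A stand-alone sketch of the pattern (class and field names are illustrative, not GoFFish API; an int field stands in for setSubgraphValue):

```java
public class DeferredInit {
    // Parsed in the constructor, applied lazily.
    private final int k;
    private boolean initialized = false;
    private int appliedK = -1;   // stands in for getSubgraph().setSubgraphValue(...)

    public DeferredInit(String initMsg) {
        // Safe: only parse arguments here; touch no framework state.
        this.k = Integer.parseInt(initMsg.split(",")[0]);
    }

    public void compute() {
        if (!initialized) {      // first superstep: subgraph is now set
            appliedK = k;
            initialized = true;
        }
    }

    public int appliedK() { return appliedK; }
}
```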
Framework code snippet that creates the SubgraphCompute object and assigns the subgraph object to it (full code):
/*
* Creating SubgraphCompute objects
*/
for (ISubgraph<S, V, E, I, J, K> subgraph : partition.getSubgraphs()) {
Class<? extends AbstractSubgraphComputation<S, V, E, M, I, J, K>> subgraphComputeClass;
subgraphComputeClass = (Class<? extends AbstractSubgraphComputation<S, V, E, M, I, J, K>>) conf
.getClass(GraphJob.SUBGRAPH_COMPUTE_CLASS_ATTR, null);
if (subgraphComputeClass == null)
throw new RuntimeException("Could not load subgraph compute class");
AbstractSubgraphComputation<S, V, E, M, I, J, K> abstractSubgraphComputeRunner;
if (initialValue != null) {
Object[] params = { initialValue };
abstractSubgraphComputeRunner = ReflectionUtils.newInstance(subgraphComputeClass, params);
} else {
abstractSubgraphComputeRunner = ReflectionUtils.newInstance(subgraphComputeClass);
}
// FIXME: Subgraph value is not available to user in the subgraph-compute's constructor,
// since it is added only after the object is created (using setSubgraph).
SubgraphCompute<S, V, E, M, I, J, K> subgraphComputeRunner = new SubgraphCompute<S, V, E, M, I, J, K>();
subgraphComputeRunner.setAbstractSubgraphCompute(abstractSubgraphComputeRunner);
abstractSubgraphComputeRunner.setSubgraphPlatformCompute(subgraphComputeRunner);
subgraphComputeRunner.setSubgraph(subgraph);
subgraphComputeRunner.init(this);
subgraphs.add(subgraphComputeRunner);
}
The JSON input produced by HAMA fastgen is of the format
[srcid,0,[[sinkid1,edgevalue1],[sinkid2,edgevalue2],...]]
Example: [99,0,[[32,1995],[17,1809],[2,969],[50,1278],[25,321],[28,390],...]]
while the format required by LongTextJSONReader is of the form
[srcid,partitionid,srcvalue,[[sinkid1,edgeid1,edgevalue1],[sinkid2,edgeid2,edgevalue2],...]]
What should be done to convert between the two?
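One possible bridge is a small line-by-line converter that fills in defaults for the missing fields and generates sequential edge IDs. This is a hypothetical sketch, not part of GoFFish: the FastGenConverter class, the choice of partition ID 0, source value 0, and per-line sequential edge IDs are all assumptions.

```java
public class FastGenConverter {
    // Converts one HAMA fastgen line:  [srcid,0,[[sink,edgeval],...]]
    // to LongTextJSONReader format:    [srcid,partitionid,srcvalue,[[sink,edgeid,edgeval],...]]
    // Assumed defaults: partition ID 0, source value 0, and edge IDs
    // assigned sequentially starting at firstEdgeId.
    public static String convert(String line, long firstEdgeId) {
        String s = line.trim();
        s = s.substring(1, s.length() - 1);                 // strip outer [ ]
        int listStart = s.indexOf("[[");
        String srcId = s.substring(0, listStart).split(",")[0].trim();
        String edges = s.substring(listStart + 2, s.length() - 2); // inner pairs
        StringBuilder out = new StringBuilder();
        out.append('[').append(srcId).append(",0,0,[");     // default partition + value
        long edgeId = firstEdgeId;
        String[] pairs = edges.split("\\],\\[");
        for (int i = 0; i < pairs.length; i++) {
            String[] p = pairs[i].split(",");
            if (i > 0) out.append(',');
            out.append('[').append(p[0].trim()).append(',')
               .append(edgeId++).append(',').append(p[1].trim()).append(']');
        }
        out.append("]]");
        return out.toString();
    }
}
```

For globally unique edge IDs, the caller would carry the counter across lines instead of restarting it per vertex.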
Platform: Windows 10
Java: Oracle JDK 8
Maven: 3.6
I followed the instructions from https://github.com/dream-lab/goffish_v3/tree/master/giraph
and I'm getting the error below:
[ERROR] Failed to execute goal org.sonatype.plugins:munge-maven-plugin:1.0:munge (munge) on project goffish-giraph: Execution munge of goal org.sonatype.plugins:munge-maven-plugin:1.0:munge failed: basedir D:\work\GoFFish\giraph\goffish-giraph\src\test\java does not exist -> [Help 1]
Add API support for Iterable<IEdge> getLocalOutEdges() and Iterable<IEdge> getRemoteOutEdges() to ISubgraph, similar to the methods for vertices.
goffish_v3/hama/v3.1/src/main/java/in/dream_lab/goffish/hama/GraphJobRunner.java
Use a single synchronized section around the eventual
private void sendMessage(String peerName, Message<K, M> message)
rather than synchronizing each sendMessage call.
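A self-contained sketch of the coarse-grained locking suggested: one synchronized method guards the shared outgoing buffer, instead of a synchronized block at every call site. Plain Strings stand in for the peer and Message<K, M> types:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MessageSender {
    // Shared buffer of outgoing messages, keyed by destination peer.
    private final Map<String, List<String>> outgoing = new HashMap<>();

    // The single synchronized entry point: callers need no locking of
    // their own, and the buffer is mutated under one monitor.
    public synchronized void sendMessage(String peerName, String message) {
        outgoing.computeIfAbsent(peerName, p -> new ArrayList<>()).add(message);
    }

    public synchronized int queuedFor(String peerName) {
        return outgoing.getOrDefault(peerName, Collections.emptyList()).size();
    }
}
```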
Many files have incorrect license terms. The code is licensed under the Apache License, Version 2.0, and copyright is held by DREAM:Lab, IISc. It is NOT "Licensed to the Apache Software Foundation (ASF)".
this.subgraphID = (K) new LongWritable(); // FIXME: This won't work for non-long subgraph IDs!
Support pointer-based access to the sink vertex in IEdge via a method IVertex getSinkVertex(). This will avoid an additional lookup when traversing through the local subgraph.
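A sketch of what such a pointer would enable: the edge holds a direct reference to its sink, so a traversal follows references instead of doing an ID-to-vertex map lookup per hop. The Vertex/Edge stand-ins and the walk helper are illustrative, not GoFFish API:

```java
import java.util.Map;

public class EdgePointer {
    static class Vertex {
        final long id;
        Vertex(long id) { this.id = id; }
    }

    static class Edge {
        final Vertex sink;                 // pointer to the sink, not just its ID
        Edge(Vertex sink) { this.sink = sink; }
        Vertex getSinkVertex() { return sink; }
    }

    // Follow the first out-edge for `hops` steps; each step dereferences
    // the sink pointer directly instead of looking the ID up in a map.
    public static long walk(Vertex start, Map<Long, Edge> firstOutEdge, int hops) {
        Vertex v = start;
        for (int i = 0; i < hops; i++) {
            Edge e = firstOutEdge.get(v.id);
            if (e == null) break;
            v = e.getSinkVertex();
        }
        return v.id;
    }
}
```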
Add support for lambda expressions in some API functions (e.g., getSubgraph(), getVertices()).
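What lambda-friendly helpers might look like, layered over the existing vertex iteration; the method names forEachVertex and countVertices are hypothetical, and plain Longs stand in for vertex objects:

```java
import java.util.function.Consumer;
import java.util.function.Predicate;

public class LambdaApi {
    // Applies `action` to every vertex; callers pass a lambda instead of
    // writing the iteration loop themselves.
    public static void forEachVertex(Iterable<Long> vertices, Consumer<Long> action) {
        for (long v : vertices) action.accept(v);
    }

    // Counts the vertices matching a predicate, e.g. boundary tests.
    public static long countVertices(Iterable<Long> vertices, Predicate<Long> test) {
        long n = 0;
        for (long v : vertices) if (test.test(v)) n++;
        return n;
    }
}
```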
mvn –Phadoop_yarn –Dhadoop.version=2.7.2 -DskipTests clean package -s settings.xml
[INFO] Scanning for projects...
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Build Order:
[INFO]
[INFO] Apache Giraph Parent
[INFO] Apache Giraph Core
[INFO] Apache Giraph Blocks Framework
[INFO] Apache Giraph Examples
[INFO] Apache Giraph Accumulo I/O
[INFO] Apache Giraph HCatalog I/O
[INFO] Apache Giraph Gora I/O
[INFO] Apache Giraph Distribution
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building Apache Giraph Parent 1.2.0
[INFO] ------------------------------------------------------------------------
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Apache Giraph Parent ............................... FAILURE [ 0.003 s]
[INFO] Apache Giraph Core ................................. SKIPPED
[INFO] Apache Giraph Blocks Framework ..................... SKIPPED
[INFO] Apache Giraph Examples ............................. SKIPPED
[INFO] Apache Giraph Accumulo I/O ......................... SKIPPED
[INFO] Apache Giraph HCatalog I/O ......................... SKIPPED
[INFO] Apache Giraph Gora I/O ............................. SKIPPED
[INFO] Apache Giraph Distribution ......................... SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 0.576 s
[INFO] Finished at: 2018-04-13T16:41:32+05:30
[INFO] Final Memory: 9M/109M
[INFO] ------------------------------------------------------------------------
[ERROR] Unknown lifecycle phase "–Phadoop_yarn". You must specify a valid lifecycle phase or a goal in the format <plugin-prefix>:<goal> or <plugin-group-id>:<plugin-artifact-id>[:<plugin-version>]:<goal>.
Available lifecycle phases are: validate, initialize, generate-sources, process-sources, generate-resources, process-resources, compile, process-classes, generate-test-sources, process-test-sources, generate-test-resources, process-test-resources, test-compile, process-test-classes, test, prepare-package, package, pre-integration-test, integration-test, post-integration-test, verify, install, deploy, pre-clean, clean, post-clean, pre-site, site, post-site, site-deploy. -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/LifecyclePhaseNotFoundException
Note: the "–" in "–Phadoop_yarn" and "–Dhadoop.version" is an en-dash rather than the ASCII hyphen "-", so Maven does not parse "-P" as the profile flag and instead treats "–Phadoop_yarn" as a lifecycle phase. Retyping the command with plain hyphens fixes this error.