
goffish_v3's People

Contributors

abhilashsharma, dipkakwani, hullas1502, humus-, keladhruv, kideinstein, prajay, simmhan, spavz, tilaksn

goffish_v3's Issues

iterator over subgraph neighbors

Add Iterable<S> getRemoteSubgraphs() to ISubgraph to allow direct access to neighboring subgraph IDs, rather than iterating through all remote vertices; this helps, for example, in connected components.

Optionally, also allow access to all subgraph IDs in the metagraph.
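A minimal sketch of what the proposed additions could look like on ISubgraph, keeping its existing type parameters; the second method name and the Javadoc wording are illustrative, not part of the current API:

    // Sketch only: proposed additions, not the current GoFFish API.
    public interface ISubgraph<S, V, E, I, J, K> {
      // ... existing methods ...

      /** Subgraphs adjacent to this one via at least one remote vertex. */
      Iterable<S> getRemoteSubgraphs();

      /** Optionally: all subgraph IDs in the metagraph. */
      Iterable<S> getSubgraphsInMetagraph();
    }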

Add logging before/after bsp.sync, memory after bsp.sync

Hama's bsp.sync actually transmits the messages. Having the memory before/after sync allows us to estimate the memory used by the output message buffer. Having the time before/after gives the communication duration as well.
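A minimal sketch of what this logging could look like around the sync call; the log labels, the LOG object, and the superstep counter are assumptions, while peer.sync() and the Runtime memory calls come from the Hama and JDK APIs:

    // Sketch only: time and memory logging around Hama's barrier sync.
    Runtime rt = Runtime.getRuntime();
    long memBeforeSync = rt.totalMemory() - rt.freeMemory();
    long syncStart = System.currentTimeMillis();
    LOG.info("PERF.SYNC.BEFORE : superstep=" + superstepCount + ", memBytes=" + memBeforeSync);

    peer.sync();  // Hama transmits the buffered outgoing messages here.

    long memAfterSync = rt.totalMemory() - rt.freeMemory();
    LOG.info("PERF.SYNC.AFTER : superstep=" + superstepCount
        + ", memBytes=" + memAfterSync
        + ", syncMillis=" + (System.currentTimeMillis() - syncStart));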

Logging of MSG_COUNT

format: PERF.SG.SEND_MSG_COUNT : sgid, superstep_num, sg_msg_count, broadcast_msg_count, total : messages sent per superstep

Note: sendToAll increases broadcast_msg_count by 1, so total message count = sg_msg_count + total_sg * broadcast_msg_count.
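For example (illustrative numbers only): with total_sg = 10 subgraphs, sg_msg_count = 250 and broadcast_msg_count = 2, the total message count for that superstep is 250 + 10 * 2 = 270.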

Issue: the sum of all messages sent in the i-th superstep is not equal to the sum of all messages received in the (i+1)-th superstep. Typically, send_msg_count << recv_msg_count.

API for getting boundary entities

Add methods Iterable<IVertex> getBoundaryVertices() and Iterable<IEdge> getBoundaryEdges() to identify local vertices that have an out-edge to a remote vertex, and out-edges that have a remote vertex as their sink.
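A rough sketch of how getBoundaryVertices() could be implemented with a single scan over the subgraph; the accessors used here (getVertices, getOutEdges, getVertexById, isRemote) and the generic parameters are modelled on the GoFFish API but should be treated as assumptions:

    // Sketch only: collect local vertices with at least one out-edge to a remote vertex.
    public Iterable<IVertex<V, E, I, J>> getBoundaryVertices() {
      List<IVertex<V, E, I, J>> boundary = new ArrayList<>();
      for (IVertex<V, E, I, J> vertex : getVertices()) {            // assumed accessor
        if (vertex.isRemote())
          continue;                                                 // only local vertices qualify
        for (IEdge<E, I, J> edge : vertex.getOutEdges()) {          // assumed accessor
          if (getVertexById(edge.getSinkVertexId()).isRemote()) {   // assumed accessors
            boundary.add(vertex);
            break;                                                  // one remote sink is enough
          }
        }
      }
      return boundary;
    }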

Make GoFFish Hama's `sendToNeighbors` more efficient

The current implementation scans through all remote vertices to find neighboring subgraphs. We need to maintain an internal adjacency list of subgraph IDs, or add Iterable<S> getRemoteSubgraphs() to the ISubgraph API, to make this efficient.
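A sketch of the adjacency-list idea, where the neighbouring subgraph IDs are computed once per subgraph and cached; the IRemoteVertex accessors and the field/method names are illustrative, not the current GoFFish code:

    // Sketch only: cache neighbouring subgraph IDs instead of rescanning remote vertices.
    private Set<K> remoteSubgraphIds;  // K = subgraph ID type

    private Set<K> getOrBuildRemoteSubgraphIds(ISubgraph<S, V, E, I, J, K> subgraph) {
      if (remoteSubgraphIds == null) {
        remoteSubgraphIds = new HashSet<>();
        for (IRemoteVertex<V, E, I, J, K> rv : subgraph.getRemoteVertices()) {  // assumed accessor
          remoteSubgraphIds.add(rv.getSubgraphId());                            // assumed accessor
        }
      }
      return remoteSubgraphIds;  // sendToNeighbors can iterate over this set directly
    }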

Subgraph object not available in the subgraph compute constructor

The subgraph object can only be accessed after the compute class is instantiated, so a call to getSubgraph() inside the constructor of the compute class throws a NullPointerException.

Relevant code snippets:

Example where subgraph object might be used inside a constructor (full code):

        /**
         * Input has <num_clusters>,<max_edge_cuts>,<max_iters>
         *
         * @param initMsg
         */
        public KMeans(String initMsg) {
                String[] inp = initMsg.split(",");
                SubgraphValue value = new SubgraphValue();
                value.k = Integer.parseInt(inp[0]);
                value.maxEdgeCrossing = Long.parseLong(inp[1]);
                value.maxIterations = Integer.parseInt(inp[2]);
                // NOTE: Subgraph state is not available in the constructor.
                //getSubgraph().setSubgraphValue(value);

        }

Framework code snippet which creates the SubgraphCompute object and assigns the subgraph object to it (full code):

    /*
     * Creating SubgraphCompute objects
     */
    for (ISubgraph<S, V, E, I, J, K> subgraph : partition.getSubgraphs()) {
      Class<? extends AbstractSubgraphComputation<S, V, E, M, I, J, K>> subgraphComputeClass;
      subgraphComputeClass = (Class<? extends AbstractSubgraphComputation<S, V, E, M, I, J, K>>) conf
              .getClass(GraphJob.SUBGRAPH_COMPUTE_CLASS_ATTR, null);
      if (subgraphComputeClass == null)
        throw new RuntimeException("Could not load subgraph compute class");

      AbstractSubgraphComputation<S, V, E, M, I, J, K> abstractSubgraphComputeRunner;

      if (initialValue != null) {
        Object []params = {initialValue};
        abstractSubgraphComputeRunner = ReflectionUtils.newInstance(subgraphComputeClass, params);
      }
      else
        abstractSubgraphComputeRunner = ReflectionUtils.newInstance(subgraphComputeClass);

      // FIXME: Subgraph value is not available to user in the subgraph-compute's constructor,
      // since it is added only after the object is created (using setSubgraph).
      SubgraphCompute<S, V, E, M, I, J, K> subgraphComputeRunner = new SubgraphCompute<S, V, E, M, I, J, K>();
      subgraphComputeRunner.setAbstractSubgraphCompute(abstractSubgraphComputeRunner);
      abstractSubgraphComputeRunner.setSubgraphPlatformCompute(subgraphComputeRunner);
      subgraphComputeRunner.setSubgraph(subgraph);
      subgraphComputeRunner.init(this);
      subgraphs.add(subgraphComputeRunner);
    }

incompatible json format

The JSON input produced by Hama's fastgen is of the format
[srcid, 0, [[sinkid1, edgevalue1], [sinkid2, edgevalue2], ... ]]
example: [99,0,[[32,1995],[17,1809],[2,969],[50,1278],[25,321],[28,390]....]]

while the format required by LongTextJSONReader is of the form
[srcid, partitionid, srcvalue, [[sinkid1, edgeid1, edgevalue1], [sinkid2, edgeid2, edgevalue2], ... ]]

What should be done to handle this?
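One hedged way to bridge the gap is a small one-off converter that copies each fastgen record into the LongTextJSONReader layout, assigning synthetic edge IDs and a placeholder partition ID along the way; the class below is only a sketch (it assumes the org.json library is on the classpath and that a partition ID of 0 is an acceptable placeholder):

    // Sketch only: convert Hama fastgen JSON into the LongTextJSONReader format.
    import org.json.JSONArray;
    import java.io.BufferedReader;
    import java.io.BufferedWriter;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    public class FastgenToGoffishJson {
      public static void main(String[] args) throws Exception {
        long edgeId = 0;  // synthetic, globally increasing edge IDs
        try (BufferedReader in = Files.newBufferedReader(Paths.get(args[0]));
             BufferedWriter out = Files.newBufferedWriter(Paths.get(args[1]))) {
          String line;
          while ((line = in.readLine()) != null) {
            JSONArray src = new JSONArray(line);          // [srcid, value, [[sink, edgeval], ...]]
            JSONArray fastgenEdges = src.getJSONArray(2);
            JSONArray outEdges = new JSONArray();
            for (int i = 0; i < fastgenEdges.length(); i++) {
              JSONArray e = fastgenEdges.getJSONArray(i);
              JSONArray edge = new JSONArray();
              edge.put(e.getLong(0));                     // sinkid
              edge.put(edgeId++);                         // synthetic edgeid
              edge.put(e.get(1));                         // edgevalue
              outEdges.put(edge);
            }
            JSONArray rec = new JSONArray();
            rec.put(src.getLong(0));                      // srcid
            rec.put(0);                                   // partitionid (placeholder)
            rec.put(src.get(1));                          // reuse fastgen's second field as srcvalue
            rec.put(outEdges);
            out.write(rec.toString());
            out.newLine();
          }
        }
      }
    }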

GoFFish v3 build errors

Platform: Windows 10
Java: Oracle 8.0
Maven: 3.6

I followed the instructions from https://github.com/dream-lab/goffish_v3/tree/master/giraph

I'm getting the errors below:

    [ERROR] Failed to execute goal org.sonatype.plugins:munge-maven-plugin:1.0:munge (munge) on project goffish-giraph: Execution munge of goal org.sonatype.plugins:munge-maven-plugin:1.0:munge failed: basedir D:\work\GoFFish\giraph\goffish-giraph\src\test\java does not exist -> [Help 1]

Limit `synchronized` in sendMessage

goffish_v3/hama/v3.1/src/main/java/in/dream_lab/goffish/hama/GraphJobRunner.java

Use a single synchronized block around the eventual private void sendMessage(String peerName, Message<K, M> message), rather than synchronizing each sendMessage call.
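A sketch of what the suggested change might look like; the Message type, the peer field, and the method shown come from the issue context, while the body is illustrative:

    // Sketch only: a single synchronized section around the actual Hama send,
    // instead of synchronizing every public sendMessage variant.
    private void sendMessage(String peerName, Message<K, M> message) {
      synchronized (peer) {
        try {
          peer.send(peerName, message);
        } catch (IOException e) {
          throw new RuntimeException("Failed to send message to " + peerName, e);
        }
      }
    }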

Update licenses, authors in all files.

Many files have incorrect license terms. The code is licensed under the Apache License, Version 2.0, and the copyright is held by the DREAM:Lab, IISc. It is NOT "Licensed to the Apache Software Foundation (ASF)".

unable to install

    mvn –Phadoop_yarn –Dhadoop.version=2.7.2 -DskipTests clean package -s settings.xml
    [INFO] Scanning for projects...
    [INFO] ------------------------------------------------------------------------
    [INFO] Reactor Build Order:
    [INFO]
    [INFO] Apache Giraph Parent
    [INFO] Apache Giraph Core
    [INFO] Apache Giraph Blocks Framework
    [INFO] Apache Giraph Examples
    [INFO] Apache Giraph Accumulo I/O
    [INFO] Apache Giraph HCatalog I/O
    [INFO] Apache Giraph Gora I/O
    [INFO] Apache Giraph Distribution
    [INFO]
    [INFO] ------------------------------------------------------------------------
    [INFO] Building Apache Giraph Parent 1.2.0
    [INFO] ------------------------------------------------------------------------
    [INFO] ------------------------------------------------------------------------
    [INFO] Reactor Summary:
    [INFO]
    [INFO] Apache Giraph Parent ............................... FAILURE [ 0.003 s]
    [INFO] Apache Giraph Core ................................. SKIPPED
    [INFO] Apache Giraph Blocks Framework ..................... SKIPPED
    [INFO] Apache Giraph Examples ............................. SKIPPED
    [INFO] Apache Giraph Accumulo I/O ......................... SKIPPED
    [INFO] Apache Giraph HCatalog I/O ......................... SKIPPED
    [INFO] Apache Giraph Gora I/O ............................. SKIPPED
    [INFO] Apache Giraph Distribution ......................... SKIPPED
    [INFO] ------------------------------------------------------------------------
    [INFO] BUILD FAILURE
    [INFO] ------------------------------------------------------------------------
    [INFO] Total time: 0.576 s
    [INFO] Finished at: 2018-04-13T16:41:32+05:30
    [INFO] Final Memory: 9M/109M
    [INFO] ------------------------------------------------------------------------
    [ERROR] Unknown lifecycle phase "–Phadoop_yarn". You must specify a valid lifecycle phase or a goal in the format <plugin-prefix>:<goal> or <plugin-group-id>:<plugin-artifact-id>[:<plugin-version>]:<goal>. Available lifecycle phases are: validate, initialize, generate-sources, process-sources, generate-resources, process-resources, compile, process-classes, generate-test-sources, process-test-sources, generate-test-resources, process-test-resources, test-compile, process-test-classes, test, prepare-package, package, pre-integration-test, integration-test, post-integration-test, verify, install, deploy, pre-clean, clean, post-clean, pre-site, site, post-site, site-deploy. -> [Help 1]
    [ERROR]
    [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
    [ERROR] Re-run Maven using the -X switch to enable full debug logging.
    [ERROR]
    [ERROR] For more information about the errors and possible solutions, please read the following articles:
    [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/LifecyclePhaseNotFoundException
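The "Unknown lifecycle phase "–Phadoop_yarn"" error suggests the profile flags were pasted with en-dash characters (–) instead of plain ASCII hyphens, so Maven does not recognize them as options. Retyping the flags with regular hyphens usually resolves this:

    mvn -Phadoop_yarn -Dhadoop.version=2.7.2 -DskipTests clean package -s settings.xml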
