Comments (4)
I see two approaches possible:
a) using smart structure instead of ArrayBuffer[Int] to do dedup during adding neighbours
b) do post processing for sorting or duplicate elimination
I assume that the signature of below function should remain unchanged to makes things easier
def readEdgesBySource(): (Int2ObjectMap[ArrayBuffer[Int]], Int2IntArrayMap) = {
Solution (a) introduces additional memory overhead for keeping both set and list in case of unsorted with duplicates elimination plus some extra memory copying at the end of the processing. It should come in zero cost if neither deduplication nor sorting is applied.
Solution (b) can do the postprocessing on each value in the edgesBySource map. It will cost additional memory per neighbours list (much less than solution (a)). Memory copying will happen also in this case for some cases e.g. dedup with original order.
Time complexity ibn both cases is similar of n * O(klogk) for sorting (when n is number of nodes and k is #of neighbours). Duplicate elimination in both cases is feasible in additional O(m) cost (m = #edges).
Do you have any suggestions which approach to follow?
I have (a) implemented and tested - as it seemed to be more elegant one :-)
from cassovary.
IIUC you are saying that (a) requires at least 2x memory in your implementation as you would keep both a set and a list? If so, that is just too much. You could keep only a SortedSet in (a) and then convert to Array at the end. Alternatively, post-processing as in (b) looks good too.
from cassovary.
Yes, in solution (a) there is twice as much memory used. Unfortunately it is impossible to use only SortedSet(a) if you want to keep the order of neighbours as given in the file. It is possible if we decide that deduplication=true and sort=false is not acceptable.
from cassovary.
Makes sense -- (b) sounds more flexible then, doesn't it?
On Wed, Jul 15, 2015 at 8:23 PM, Tomek Εasica [email protected]
wrote:
Yes, in solution (a) there is twice as much memory used. Unfortunately it
is impossible to use only SortedSet(a) if you want to keep the order of
neighbours as given in the file. It is possible if we decide that
deduplication=true and sort=false is not acceptable.β
Reply to this email directly or view it on GitHub
#185 (comment).
from cassovary.
Related Issues (20)
- [Bug?] PageRank sum is much less 1 HOT 11
- Exception loading graph in SharedArrayBasedDirectedGraph HOT 5
- java.lang.OutOfMemoryError: PermGen space HOT 5
- Allow sorting and de-duplication of neighbors in AdjacencyListGraphReader as well
- Allow sparse array representation in ArrayBasedDirectedGraph HOT 1
- Convert Node.inboundNodes() and outboundNodes() to have Array[Int] instead of Seq[Int] HOT 7
- build falling HOT 1
- Build fails due to twitter metrics dependency HOT 3
- Do not use finagle-stats inside cassovary-core
- Write a Cassovary shell HOT 1
- Refactor API: s/node/vertex HOT 3
- Implement GraphX API on Cassovary HOT 1
- Implement Giraph API on Cassovary
- [question] Property Graph? HOT 1
- PerformanceBenchmark doesn't seem to be able to read gzipped files HOT 1
- Add optional command-line arg in PerformanceBenchmark to store only outgoing edges HOT 1
- Inconsistency in Node API HOT 1
- [question] What is the status of Cassovary? HOT 1
- How to Fix a Self-Dependence Error
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
π Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. πππ
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google β€οΈ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from cassovary.