dicl / veloxmr
Data processing component of the Velox Big Data Framework (VBDF)
License: Apache License 2.0
@zwigul can we use unique_ptr instead of shared_ptr in here?
https://github.com/DICL/EclipseMR/blob/2336cf253dbbb7f46ba1a0ca7ed3f78fa40bea77/src/mapreduce/nodes/peermr.cc#L209
From now on, EclipseMR takes changes from EclipseCACHE, so we have to fix the history.
This is the structure of intermediate data on the reducer side.
Intermediate data
= (idata of key range 1, idata of key range 2, ..., idata of key range N)
idata of key range k
= (iblock 1, iblock 2, ..., iblock M)
Currently, for the idata of any given key range, keys and values are sorted within each iblock, but not across iblocks.
As a result, a reducer receives each key with only a "partially sorted" list of values.
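One possible way to obtain a fully sorted view is a k-way merge over the M iblocks of a key range. The sketch below is not VeloxMR code: `KeyValue`, `IBlock`, and `merge_iblocks` are hypothetical names, and it assumes each iblock fits in memory as a vector of (key, value) pairs already sorted by key and then by value.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical stand-ins: an iblock is modeled as (key, value) pairs
// that are already sorted by key, then by value.
using KeyValue = std::pair<std::string, std::string>;
using IBlock = std::vector<KeyValue>;

// Merge individually sorted iblocks into one globally sorted sequence.
std::vector<KeyValue> merge_iblocks(const std::vector<IBlock>& iblocks) {
  // Heap entry: (pair, index of its iblock, offset inside that iblock).
  using Entry = std::tuple<KeyValue, std::size_t, std::size_t>;
  // std::greater turns the priority queue into a min-heap.
  std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;

  std::size_t total = 0;
  for (std::size_t b = 0; b < iblocks.size(); ++b) {
    // Precondition assumed by this sketch: each iblock is sorted.
    assert(std::is_sorted(iblocks[b].begin(), iblocks[b].end()));
    total += iblocks[b].size();
    if (!iblocks[b].empty()) {
      heap.push(Entry(iblocks[b][0], b, 0));
    }
  }

  std::vector<KeyValue> merged;
  merged.reserve(total);
  while (!heap.empty()) {
    Entry top = heap.top();  // smallest remaining pair across all iblocks
    heap.pop();
    const std::size_t b = std::get<1>(top);
    const std::size_t next = std::get<2>(top) + 1;
    merged.push_back(std::move(std::get<0>(top)));
    // Refill the heap from the iblock the pair came from.
    if (next < iblocks[b].size()) {
      heap.push(Entry(iblocks[b][next], b, next));
    }
  }
  return merged;
}
```

With this approach the reducer would see all values of a key in sorted order, at the cost of keeping one cursor per iblock. A real implementation would probably stream from disk instead of holding every iblock in memory.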
I evaluated the performance of IWriter itself, without the network parts.
This test mimics the procedure in which messages::KeyValueShuffle instances arrive at the TaskExecutor::write_key_value function and the arrived key-values are written into an object of the IWriter class: the test calls the IWriter::add_key_value function with each generated key-value.
We have been testing VeloxMR with kmeans using 10 GB of data on 40 nodes, where the amount of shuffled data on each node is approximately 256 MB, so this test is performed with 256 MB of key-values.
std::move (enable/disable): to avoid data duplication when we insert key-values into an iblock.
I ran each test 8 times and took the average.
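The enable/disable comparison can be pictured with a small sketch like the one below. It is only an illustration under stated assumptions, not the actual IWriter code: `Block`, `add_key_value_copy`, `add_key_value_move`, and `time_inserts` are hypothetical names, and the iblock is modeled as an in-memory `std::map` from a key to its list of values.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the in-memory part of an iblock.
using Block = std::map<std::string, std::vector<std::string>>;

// std::move disabled: the value is copied into the block.
void add_key_value_copy(Block& block, const std::string& key,
                        const std::string& value) {
  block[key].push_back(value);
}

// std::move enabled: key and value buffers are handed over to the block.
void add_key_value_move(Block& block, std::string key, std::string value) {
  block[std::move(key)].push_back(std::move(value));
}

// Generates n key-values spread over num_keys keys, inserts them, and
// returns the elapsed seconds. The measured time includes generation.
double time_inserts(Block& block, std::size_t n, std::size_t num_keys,
                    std::size_t value_size, bool use_move) {
  assert(num_keys > 0);
  const auto start = std::chrono::steady_clock::now();
  for (std::size_t i = 0; i < n; ++i) {
    std::string key = "key" + std::to_string(i % num_keys);
    std::string value(value_size, 'v');
    if (use_move) {
      add_key_value_move(block, std::move(key), std::move(value));
    } else {
      add_key_value_copy(block, key, value);
    }
  }
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(stop - start).count();
}
```

Calling `time_inserts` once with `use_move` set to `false` and once with `true`, on fresh `Block` objects and with identical sizes, gives a rough comparison. Short strings that fit in the small-string buffer are cheap to copy, so any benefit from moving would be expected to depend on the value size.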
The IWriter::add_key_value function is called over 10 million times, and it takes no more than 2 minutes in total (of which the key-value generation takes around 30 seconds).
[...] (std::move) decreases.
[...] std::move rather degrades the performance.
This release should be fully compatible with the Eclipse DFS 1.0.* series.