harbby / astarte
License: Apache License 2.0
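Distributed computing systems often transfer large volumes of data between nodes through shuffle. I/O bottlenecks (disk and network) are also the biggest challenge in the shuffle stage.
Reducing the number of bytes transferred is a very effective way to improve I/O. A common approach is to combine an efficient serializer with compression.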
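To make the byte-count argument concrete, here is a minimal sketch that encodes the same values three ways: default Java serialization of boxed values, a compact fixed-width encoding, and that compact encoding wrapped in GZIP. The class name `ShuffleBytesSketch`, the length-prefixed layout, and the choice of GZIP are assumptions made for illustration only; they are not taken from astarte.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of astarte.
class ShuffleBytesSketch
{
    // Baseline: default Java serialization of boxed values.
    static byte[] javaSerialize(long[] values)
            throws IOException
    {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(Arrays.stream(values).boxed().toArray(Long[]::new));
        }
        return buffer.toByteArray();
    }

    // Compact encoding (assumed layout): a 4-byte count followed by 8 bytes per value.
    static byte[] encode(long[] values, boolean compress)
            throws IOException
    {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        OutputStream sink = compress ? new GZIPOutputStream(buffer) : buffer;
        try (DataOutputStream out = new DataOutputStream(sink)) {
            out.writeInt(values.length);
            for (long value : values) {
                out.writeLong(value);
            }
        }
        return buffer.toByteArray();
    }

    // Reverses encode(values, true).
    static long[] decodeCompressed(byte[] bytes)
            throws IOException
    {
        try (DataInputStream in = new DataInputStream(
                new GZIPInputStream(new ByteArrayInputStream(bytes)))) {
            long[] values = new long[in.readInt()];
            for (int i = 0; i < values.length; i++) {
                values[i] = in.readLong();
            }
            return values;
        }
    }

    public static void main(String[] args)
            throws IOException
    {
        long[] values = new long[1000];
        for (int i = 0; i < values.length; i++) {
            values[i] = i;
        }
        System.out.println("java serialization: " + javaSerialize(values).length + " bytes");
        System.out.println("compact encoding:   " + encode(values, false).length + " bytes");
        System.out.println("compact + gzip:     " + encode(values, true).length + " bytes");
    }
}
```

The exact sizes depend on the data, so the sketch prints them rather than promising a particular ratio.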
All encoders and decoders implement Encoder<E> and are referenced through the Encoders class.
The interface is designed as follows:
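import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.io.Serializable;

public interface Encoder<E>
        extends Serializable
{
    // Writes value to output in this encoder's binary format.
    public void encoder(E value, DataOutput output)
            throws IOException;

    // Reads one value back from input; the inverse of encoder.
    public E decoder(DataInput input)
            throws IOException;
}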
To be added.
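This patch introduces sort shuffle into astarte. Once it is merged, HashShuffle will be removed from batch computation.
Benefits of introducing sort shuffle:
The map side performs a map sort; the reduce side performs a merge sort.
It follows Spark's approach: the shuffle service does not merge map files when reading them. It forwards the compressed files directly, and the merge happens on the reduce side.
net channel = MapTaskNum * reduceNumber (hash shuffle = reduceNumber * executorNum)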
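The map-side sort and reduce-side merge described above can be sketched as follows. This is a minimal in-memory illustration, assuming integer records and one sorted run per map task; the names `SortShuffleSketch`, `mapSort`, and `mergeSort` are hypothetical, and the real patch works on compressed files rather than lists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch, not astarte's implementation.
class SortShuffleSketch
{
    // Map side: each map task sorts its own output before it is written out.
    static List<Integer> mapSort(List<Integer> mapOutput)
    {
        List<Integer> sorted = new ArrayList<>(mapOutput);
        sorted.sort(Comparator.naturalOrder());
        return sorted;
    }

    // One cursor per sorted run: the current head value plus the rest of the run.
    private static final class Cursor
    {
        int head;
        final Iterator<Integer> rest;

        Cursor(Iterator<Integer> rest)
        {
            this.rest = rest;
            this.head = rest.next();
        }
    }

    // Reduce side: k-way merge of the already sorted runs fetched from the map tasks.
    static List<Integer> mergeSort(List<List<Integer>> sortedRuns)
    {
        PriorityQueue<Cursor> heap =
                new PriorityQueue<>(Comparator.comparingInt((Cursor c) -> c.head));
        for (List<Integer> run : sortedRuns) {
            if (!run.isEmpty()) {
                heap.add(new Cursor(run.iterator()));
            }
        }
        List<Integer> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            // The smallest head across all runs is the next output record.
            Cursor cursor = heap.poll();
            merged.add(cursor.head);
            if (cursor.rest.hasNext()) {
                cursor.head = cursor.rest.next();
                heap.add(cursor);
            }
        }
        return merged;
    }

    public static void main(String[] args)
    {
        List<List<Integer>> runs = new ArrayList<>();
        runs.add(mapSort(Arrays.asList(5, 1, 9)));
        runs.add(mapSort(Arrays.asList(4, 8, 2)));
        runs.add(mapSort(Arrays.asList(7, 3, 6)));
        System.out.println(mergeSort(runs)); // prints [1, 2, 3, 4, 5, 6, 7, 8, 9]
    }
}
```

Because every run arrives sorted, the reduce side only ever compares the heads of the runs, which is why the shuffle service can forward the map files without merging them itself.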
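The core functionality of the serializer has been merged into the main branch (see #4).
The plan now is to go a step further and introduce a type system.
At this stage, the other missing types will be added first:
byte, short, char, float, TimeStamp, ...
byte[], short[], float[], ...
Map<K,V>
Tuple types: Tuple[3-10-22]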
private static class LongEncoder
        implements Encoder<Long>
{
    @Override
    public void encoder(Long value, DataOutput output)
            throws IOException
    {
        output.writeLong(value);
    }

    @Override
    public Long decoder(DataInput input)
            throws IOException
    {
        return input.readLong();
    }
}
See: Encoders
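As an illustration of how one of the missing types could be covered, the sketch below composes a `Map<K,V>` encoder from a key encoder and a value encoder. It assumes a simple layout (entry count, then key/value pairs) and does not handle null keys or values. `MapEncoder` and `MapEncoderDemo` are hypothetical names, and the `Encoder` and `LongEncoder` definitions are repeated from the text only so that the sketch compiles on its own; the actual Encoders class may do this differently.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo: encodes a map to bytes and decodes it again.
class MapEncoderDemo
{
    static Map<Long, Long> roundTrip(Map<Long, Long> value)
            throws IOException
    {
        Encoder<Map<Long, Long>> encoder = new MapEncoder<>(new LongEncoder(), new LongEncoder());
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        encoder.encoder(value, new DataOutputStream(buffer));
        return encoder.decoder(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));
    }

    public static void main(String[] args)
            throws IOException
    {
        Map<Long, Long> counts = new HashMap<>();
        counts.put(1L, 10L);
        counts.put(2L, 20L);
        System.out.println(counts.equals(roundTrip(counts))); // prints true
    }
}

// Repeated from the interface shown in the text.
interface Encoder<E>
        extends Serializable
{
    void encoder(E value, DataOutput output)
            throws IOException;

    E decoder(DataInput input)
            throws IOException;
}

// Repeated from the LongEncoder shown in the text.
class LongEncoder
        implements Encoder<Long>
{
    @Override
    public void encoder(Long value, DataOutput output)
            throws IOException
    {
        output.writeLong(value);
    }

    @Override
    public Long decoder(DataInput input)
            throws IOException
    {
        return input.readLong();
    }
}

// Hypothetical Map<K,V> encoder. Assumed layout: entry count, then key/value pairs.
class MapEncoder<K, V>
        implements Encoder<Map<K, V>>
{
    private final Encoder<K> keyEncoder;
    private final Encoder<V> valueEncoder;

    MapEncoder(Encoder<K> keyEncoder, Encoder<V> valueEncoder)
    {
        this.keyEncoder = keyEncoder;
        this.valueEncoder = valueEncoder;
    }

    @Override
    public void encoder(Map<K, V> value, DataOutput output)
            throws IOException
    {
        output.writeInt(value.size());
        for (Map.Entry<K, V> entry : value.entrySet()) {
            keyEncoder.encoder(entry.getKey(), output);
            valueEncoder.encoder(entry.getValue(), output);
        }
    }

    @Override
    public Map<K, V> decoder(DataInput input)
            throws IOException
    {
        int size = input.readInt();
        Map<K, V> map = new HashMap<>();
        for (int i = 0; i < size; i++) {
            K key = keyEncoder.decoder(input);
            map.put(key, valueEncoder.decoder(input));
        }
        return map;
    }
}
```

Composing encoders this way means array and Tuple encoders could reuse the same element encoders, which is one plausible reason to move toward a type system.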
This patch introduces code generation. It aims to increase the density of business code.
Take the following as an example:
dataSource.map(x -> x + 1).map(x -> x * 3).filter(x -> x % 2 == 0).count()
It produces the following physical execution plan:
...
After code generation, it is optimized into the following pseudocode:
while (dataSource.hasNext()) {
    input = dataSource.nextRow()
    input = input + 1
    input = input * 3
    if (input % 2 == 0) {
        collect.collect(input)
    }
}
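For comparison, here is a runnable Java sketch of the same pipeline in both forms: a chained operator version (using `java.util.stream` as a stand-in for astarte's operators) and a hand-written fused loop of the kind code generation aims to produce. `FusedLoopSketch` and its method names are hypothetical, and the fused loop is written by hand; it is not actual generator output.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not output of astarte's code generator.
class FusedLoopSketch
{
    // Operator-at-a-time form: each map/filter is a separate stage object.
    static long chained(List<Integer> dataSource)
    {
        return dataSource.stream()
                .map(x -> x + 1)
                .map(x -> x * 3)
                .filter(x -> x % 2 == 0)
                .count();
    }

    // Fused form: all three operators are inlined into a single loop body.
    static long fused(List<Integer> dataSource)
    {
        long count = 0;
        for (int input : dataSource) {
            input = input + 1;
            input = input * 3;
            if (input % 2 == 0) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args)
    {
        List<Integer> dataSource = new ArrayList<>();
        for (int i = 1; i <= 10; i++) {
            dataSource.add(i);
        }
        System.out.println(chained(dataSource) + " " + fused(dataSource)); // prints 5 5
    }
}
```

Both forms return the same count; the fused version simply has no per-operator calls between reading a row and deciding whether to count it.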
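Modern frameworks commonly suffer from the awkward combination of a large framework around a small amount of business logic (colloquially, "too heavy"). Raising code density also marks a move toward procedural style, in which the OO abstractions are stripped away layer by layer.
Ideally, all the business code that processes data ends up compressed into a single loop body. This is also the form the code would take if a person wrote it by hand in a procedural style.
High-density code has many performance advantages, which are omitted here for now.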