
# What is NativeTask?

NativeTask is a performance-oriented native engine for Hadoop MapReduce.

NativeTask can be used either as a transparent replacement for the inefficient Map Output Collector, or as a full native runtime that supports native mappers and reducers written in C++. For details, please check the wiki and the paper "NativeTask: A Hadoop Compatible Framework for High Performance".

Some early discussions of NativeTask can be found at MAPREDUCE-2841.

# What are the benefits?

1. Superior Performance

For CPU-intensive jobs like WordCount, we can provide a 2.6x performance boost transparently, or a 5x boost when running as a full native runtime.

[Figure: native MapOutputCollector mode]
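As a worked reading of these figures, assume that an s-fold "performance boost" means the job's running time is divided by s. The README does not define the metric, so this is an interpretation, not a measured result. Under that assumption, the fraction of running time saved follows directly:

```latex
\begin{aligned}
t_{\text{NativeTask}} &= \frac{t_{\text{default}}}{s}, \qquad \text{time saved} = 1 - \frac{1}{s},\\
s = 2.6 &: \quad 1 - \tfrac{1}{2.6} \approx 0.615 \ (\text{about } 61.5\%),\\
s = 5 &: \quad 1 - \tfrac{1}{5} = 0.8 \ (80\%).
\end{aligned}
```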

2. Compatibility and Transparency

NativeTask can be transparently enabled in MRv1 and MRv2, requiring no code or binary changes to existing MapReduce jobs. If a required feature is not supported yet, NativeTask automatically falls back to the default implementation.
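The automatic fallback can be pictured as a factory that prefers the native collector and switches to the default one when the job uses something the native side cannot handle. The following is a minimal sketch under stated assumptions. `FallbackDemo`, `Collector`, the support check, and the example key class `com.example.CustomKey` are all hypothetical. They are not NativeTask's actual classes or its real feature checks.

```java
import java.util.Set;

// Minimal sketch of "use native if supported, otherwise fall back to default".
// All names are hypothetical; they are not NativeTask or Hadoop classes.
class FallbackDemo {

    // Hypothetical collector abstraction (NOT Hadoop's MapOutputCollector).
    interface Collector {
        String name();
    }

    static final class NativeCollector implements Collector {
        public String name() { return "native"; }
    }

    static final class DefaultCollector implements Collector {
        public String name() { return "default"; }
    }

    // Hypothetical support list; the real list of supported key types is on the wiki.
    static final Set<String> SUPPORTED_KEY_CLASSES =
        Set.of("org.apache.hadoop.io.Text", "org.apache.hadoop.io.IntWritable");

    // Picks the native collector when the job's key class is supported,
    // otherwise returns the default implementation instead of failing the job.
    static Collector create(String keyClassName) {
        if (SUPPORTED_KEY_CLASSES.contains(keyClassName)) {
            return new NativeCollector();
        }
        return new DefaultCollector();
    }

    public static void main(String[] args) {
        System.out.println(create("org.apache.hadoop.io.Text").name()); // native
        System.out.println(create("com.example.CustomKey").name());     // default
    }
}
```

The point of the design is that the caller never sees an error for an unsupported feature. It simply gets a slower but working collector.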

3. Feature Complete

NativeTask is feature complete. It supports:

  • Most key types and all value types (subclasses of Writable). For a comprehensive list of supported key types, please check the Wiki Page.
  • Platforms such as HBase, Hive, Pig and Mahout.
  • Compression codecs such as Lz4, Snappy and Gzip.
  • Java and native combiners.
  • Hardware CRC32C checksumming.
  • A non-sorting MapReduce paradigm for jobs that do not require sorting.
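CRC32C is the Castagnoli variant of CRC-32. As a reference point, and not NativeTask code, the JDK provides `java.util.zip.CRC32C` (Java 9 and later), which computes the same checksum function. The sketch below checks the published CRC-32C check value for the ASCII string "123456789", which is a quick way to confirm that two CRC32C implementations agree.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32C;

// Computes CRC32C (Castagnoli) with the JDK's java.util.zip.CRC32C (Java 9+).
// Reference illustration only; this is not NativeTask's native implementation.
class Crc32cDemo {

    static long checksum(byte[] data) {
        CRC32C crc = new CRC32C();
        crc.update(data, 0, data.length);
        return crc.getValue(); // unsigned 32-bit value held in a long
    }

    public static void main(String[] args) {
        byte[] input = "123456789".getBytes(StandardCharsets.US_ASCII);
        // The published CRC-32C check value for "123456789" is 0xE3069283.
        System.out.printf("0x%08X%n", checksum(input)); // 0xE3069283
    }
}
```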

4. Full Extensibility

Developers can extend NativeTask to support more key types, and can dynamically replace NativeTask's building blocks with more efficient implementations without recompiling the source code.
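The README does not say how NativeTask swaps building blocks at run time; its wiki covers the actual extension mechanism. One common way to get "replace without recompiling" on the JVM is to pick the implementation by class name from configuration and load it via reflection. The sketch below shows only that general idea. `ExtensionDemo`, `KeyComparator` and both comparator classes are hypothetical names, not NativeTask APIs.

```java
import java.util.Arrays;

// Sketch: choose an implementation by class name at run time, so a faster
// implementation can be swapped in without recompiling the calling code.
// KeyComparator and its implementations are hypothetical, not NativeTask APIs.
class ExtensionDemo {

    interface KeyComparator {
        int compare(byte[] a, byte[] b);
    }

    // Default: unsigned lexicographic comparison of raw key bytes.
    public static final class BytewiseComparator implements KeyComparator {
        public int compare(byte[] a, byte[] b) {
            return Arrays.compareUnsigned(a, b);
        }
    }

    // Alternative implementation that could be plugged in instead.
    public static final class ReverseBytewiseComparator implements KeyComparator {
        public int compare(byte[] a, byte[] b) {
            return Arrays.compareUnsigned(b, a);
        }
    }

    // Loads the implementation whose class name would come from configuration.
    static KeyComparator load(String className) {
        try {
            return Class.forName(className)
                    .asSubclass(KeyComparator.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot load " + className, e);
        }
    }

    public static void main(String[] args) {
        String configured = args.length > 0 ? args[0] : BytewiseComparator.class.getName();
        KeyComparator cmp = load(configured);
        byte[] x = {1, 2};
        byte[] y = {1, 3};
        System.out.println(cmp.compare(x, y) < 0); // true with the default comparator
    }
}
```

Only the class name changes between deployments. The code that calls `load` stays compiled as-is.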

# How to use NativeTask?

NativeTask can work in two modes:

1. Transparent Collector Mode. In this mode, NativeTask works as a transparent replacement for the current, inefficient Map Output Collector, with no changes required on the user side.

2. Native Runtime Mode. In this mode, NativeTask works as a dedicated native runtime that supports native mappers and reducers written in C++.
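For readers unfamiliar with the component that the transparent collector mode replaces: on the map side, the Map Output Collector buffers each emitted key/value pair together with its target partition and, before spilling, orders the records by partition and then by key. The Java sketch below models only that ordering step in memory. `SimpleCollector` and its methods are hypothetical, and it leaves out the serialization, spilling, combining and compression that a real collector (Java or native) performs.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical in-memory model of a map output collector before a spill:
// remember each record's partition, then sort by (partition, key).
class SimpleCollector {

    static final class Entry {
        final int partition;
        final String key;
        final String value;

        Entry(int partition, String key, String value) {
            this.partition = partition;
            this.key = key;
            this.value = value;
        }

        @Override
        public String toString() {
            return partition + ":" + key + "=" + value;
        }
    }

    private final int numPartitions;
    private final List<Entry> buffer = new ArrayList<>();

    SimpleCollector(int numPartitions) {
        this.numPartitions = numPartitions;
    }

    // Hash partitioning: non-negative hash code modulo the partition count.
    void collect(String key, String value) {
        int partition = (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
        buffer.add(new Entry(partition, key, value));
    }

    // Returns the buffered records ordered by partition, then by key.
    List<Entry> sortForSpill() {
        List<Entry> sorted = new ArrayList<>(buffer);
        sorted.sort(Comparator.comparingInt((Entry e) -> e.partition)
                .thenComparing(e -> e.key));
        return sorted;
    }

    public static void main(String[] args) {
        SimpleCollector c = new SimpleCollector(2);
        for (String word : "b a c a".split(" ")) {
            c.collect(word, "1");
        }
        System.out.println(c.sortForSpill()); // [0:b=1, 1:a=1, 1:a=1, 1:c=1]
    }
}
```

This buffer, partition and sort work runs for every map output record, which helps explain why replacing it with native code can pay off for CPU-intensive jobs.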

Here are the steps to enable NativeTask in transparent collector mode:

  1. Clone the NativeTask repository:
git clone https://github.com/intel-hadoop/nativetask.git
  2. Check out the right source branch (run inside the cloned nativetask directory).

To build NativeTask for Hadoop 1.2.1:

git checkout hadoop-1.0

To build NativeTask for Hadoop 2.2.0:

git checkout master
  3. Patch Hadoop, starting from the directory that contains the nativetask clone (${HADOOP_ROOTDIR} points to the root directory of the Hadoop codebase):
cd nativetask
cp patch/hadoop-2.patch ${HADOOP_ROOTDIR}/
cd ${HADOOP_ROOTDIR}
patch -p0 < hadoop-2.patch
  4. Build NativeTask with Hadoop, again starting from the directory that contains the nativetask clone:
cd nativetask
cp -r . ${HADOOP_ROOTDIR}/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask
cd ${HADOOP_ROOTDIR}
mvn install -DskipTests -Pnative
  5. Install NativeTask:
cd ${HADOOP_ROOTDIR}/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/target
cp hadoop-mapreduce-client-nativetask-2.2.0.jar /usr/lib/hadoop-mapreduce/
cp native/target/usr/local/lib/libnativetask.so /usr/lib/hadoop/lib/native/
  6. Run the MapReduce Pi example with the native output collector:
hadoop jar hadoop-mapreduce-examples.jar pi -Dmapreduce.job.map.output.collector.class=org.apache.hadoop.mapred.nativetask.NativeMapOutputCollectorDelegator 10 10
  7. Check the task log. NativeTask is successfully enabled if you see the following message:
INFO org.apache.hadoop.mapred.nativetask.NativeMapOutputCollectorDelegator: Native output collector can be successfully enabled! 
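The final log check can also be scripted. The Java sketch below scans task-log lines for the success message quoted above; the message and the delegator class name are the only details taken from this README. `NativeTaskLogCheck` is a hypothetical helper. How you obtain the log file is an assumption left to you (for example, saving the output of `yarn logs -applicationId <appId>` when log aggregation is enabled).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper: scans task-log lines for the message NativeTask logs
// when the native output collector is active.
class NativeTaskLogCheck {

    static final String DELEGATOR_CLASS =
        "org.apache.hadoop.mapred.nativetask.NativeMapOutputCollectorDelegator";
    static final String SUCCESS_MESSAGE =
        "Native output collector can be successfully enabled!";

    // True if some line comes from the delegator and contains the success message.
    static boolean nativeCollectorEnabled(List<String> logLines) {
        for (String line : logLines) {
            if (line.contains(DELEGATOR_CLASS) && line.contains(SUCCESS_MESSAGE)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        // Reads the log file given as the first argument, else uses a built-in sample line.
        List<String> lines = args.length > 0
            ? Files.readAllLines(Paths.get(args[0]))
            : List.of("INFO " + DELEGATOR_CLASS + ": " + SUCCESS_MESSAGE);
        System.out.println(nativeCollectorEnabled(lines)
            ? "NativeTask enabled" : "NativeTask NOT enabled");
    }
}
```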

Please check the wiki for how to run MRv1 over NativeTask, and for details on HBase, Hive, Pig and Mahout support.

Contributors

Contacts

For questions and support, please contact

Further information

For further documentation, please check the Wiki Page.

