
mapreduce-lite's Introduction

MapReduce Lite is a C++ implementation of the MapReduce programming paradigm.

Pros

First of all, MapReduce Lite is Lite!

  • It does not rely on a distributed filesystem -- it can simply use the local filesystem;
  • It does not have a dynamic task scheduling system -- the map/reduce tasks are scheduled before the parallel job starts;
  • There is zero deployment / configuration cost -- just link your program statically against the MapReduce Lite library and run it.
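The static scheduling described above can be pictured with a small sketch. It rests on two assumptions that are not taken from the MapReduce Lite sources: input files are dealt to map workers round-robin, and intermediate keys are routed to reducers by hashing (the usual MapReduce convention). `AssignInputsToMappers` and `ReducerFor` are hypothetical names used only for illustration.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper (not part of MapReduce Lite): assigns input files to map
// workers round-robin. The plan is computed once, before any task runs, and is
// never changed while the job executes. Precondition: num_mappers > 0.
std::vector<std::vector<std::string>> AssignInputsToMappers(
    const std::vector<std::string>& input_files, std::size_t num_mappers) {
  std::vector<std::vector<std::string>> plan(num_mappers);
  for (std::size_t i = 0; i < input_files.size(); ++i) {
    plan[i % num_mappers].push_back(input_files[i]);
  }
  return plan;
}

// Hypothetical helper: routes a map output key to a reducer by hashing it, so
// every mapper agrees on the destination without asking a central scheduler.
// Precondition: num_reducers > 0.
std::size_t ReducerFor(const std::string& key, std::size_t num_reducers) {
  return std::hash<std::string>{}(key) % num_reducers;
}
```

Because the plan is fixed up front, there is nothing to deploy besides the workers themselves; the trade-off is that a slow or failed worker cannot hand its share to another one.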

In addition to the functionality described in Google's famous MapReduce paper, which MapReduce Lite calls batch reduction mode, there is also an incremental reduction mode that performs the shuffling phase in memory, without disk access. In this mode, MapReduce Lite programs can run much faster than less flexible implementations such as Hadoop.
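The difference between the two reduction modes can be sketched in plain C++ for a summing reducer. This is an illustration under assumptions, not MapReduce Lite's implementation: `BatchReduce` and `IncrementalSumReducer` are hypothetical names, and the incremental version presumes that the reduce operation can be applied one value at a time (as a running sum can) and that the partial results fit in memory.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Pair = std::pair<std::string, int>;

// Batch mode (as in the MapReduce paper): collect all intermediate pairs,
// sort them by key, then reduce each key's complete run of values at once.
std::map<std::string, int> BatchReduce(std::vector<Pair> pairs) {
  std::sort(pairs.begin(), pairs.end());
  std::map<std::string, int> result;
  for (std::size_t i = 0; i < pairs.size();) {
    std::size_t j = i;
    int sum = 0;
    // Consume the run of pairs that share the key pairs[i].first.
    while (j < pairs.size() && pairs[j].first == pairs[i].first) {
      sum += pairs[j].second;
      ++j;
    }
    result[pairs[i].first] = sum;
    i = j;
  }
  return result;
}

// Incremental mode (hypothetical sketch): fold each value into an in-memory
// partial result as soon as it arrives; no sorting and no spilling to disk.
class IncrementalSumReducer {
 public:
  void Consume(const std::string& key, int value) { partial_[key] += value; }
  const std::map<std::string, int>& Result() const { return partial_; }

 private:
  std::map<std::string, int> partial_;
};
```

The batch version must see every value for a key before reducing it, which is why it sorts first (and, in a real system, may spill to disk); the incremental version keeps only one partial result per key.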

Cons

As a lite implementation, MapReduce Lite does not support fault recovery, which, however, is arguably not too difficult to achieve if we do not require backup workers or global counters and can use a distributed filesystem (DFS).

Applications

At Tencent, we have been using MapReduce Lite together with Tencent's DFS to run jobs such as search engine log processing, search and ads click model training, and distributed language model training.

A Sample

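#include <sstream>
#include <string>
#include <vector>

// Project headers. These paths are assumptions based on the source tree
// (src/mapreduce_lite, src/strutil); adjust them to match your checkout.
// They are expected to provide Mapper, BatchReducer, ReduceInputIterator,
// the REGISTER_* macros, SplitStringUsing and LOG(INFO).
#include "src/mapreduce_lite/mapreduce_lite.h"
#include "src/strutil/split_string.h"

using mapreduce_lite::Mapper;
using mapreduce_lite::BatchReducer;
using mapreduce_lite::ReduceInputIterator;

// Emits (word, "1") for every space-separated word in the input value.
class WordCountMapper : public Mapper {
 public:
  void Map(const std::string& key, const std::string& value) {
    std::vector<std::string> words;
    SplitStringUsing(value, " ", &words);
    for (size_t i = 0; i < words.size(); ++i) {
      Output(words[i], "1");
    }
  }
};
REGISTER_MAPPER(WordCountMapper);

// Sums the counts of each word and emits (word, "<word> <sum>").
class WordCountBatchReducer : public BatchReducer {
 public:
  void Reduce(const std::string& key, ReduceInputIterator* values) {
    int sum = 0;
    LOG(INFO) << "key:[" << key << "]";
    for (; !values->Done(); values->Next()) {
      std::istringstream parser(values->value());
      int count = 0;
      parser >> count;
      sum += count;
    }
    std::ostringstream formatter;
    formatter << key << " " << sum;
    Output(key, formatter.str());
  }
};
REGISTER_BATCH_REDUCER(WordCountBatchReducer);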
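To see the data flow of the word count sample without building the library, the map, shuffle and reduce steps can be simulated in a single process. This is a hypothetical sketch, not MapReduce Lite code: `MapWordCount`, `ReduceWordCount` and `RunWordCountLocally` are illustrative names, a `std::map` stands in for the shuffle, and words are split with `std::istringstream` instead of the project's `SplitStringUsing`.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Map step: emit (word, "1") for each word, like WordCountMapper. Unlike
// SplitStringUsing(value, " ", ...), operator>> also skips runs of whitespace.
void MapWordCount(const std::string& line,
                  std::map<std::string, std::vector<std::string>>* shuffled) {
  std::istringstream words(line);
  std::string word;
  while (words >> word) (*shuffled)[word].push_back("1");
}

// Reduce step: parse and sum one key's values, then format the output value
// as "<key> <sum>", like WordCountBatchReducer.
std::string ReduceWordCount(const std::string& key,
                            const std::vector<std::string>& values) {
  int sum = 0;
  for (const std::string& v : values) {
    std::istringstream parser(v);
    int count = 0;
    parser >> count;
    sum += count;
  }
  std::ostringstream formatter;
  formatter << key << " " << sum;
  return formatter.str();
}

// Runs map over every line, groups values by key (the in-memory "shuffle"),
// then calls reduce once per key.
std::map<std::string, std::string> RunWordCountLocally(
    const std::vector<std::string>& lines) {
  std::map<std::string, std::vector<std::string>> shuffled;
  for (const std::string& line : lines) MapWordCount(line, &shuffled);
  std::map<std::string, std::string> output;
  for (const auto& kv : shuffled) {
    output[kv.first] = ReduceWordCount(kv.first, kv.second);
  }
  return output;
}
```

For the input lines "a b a" and "b c", this produces the values "a 2", "b 2" and "c 1".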

Install

Please refer to the HowToInstall document.

Updates

  1. 2013-10-4: MapReduce Lite supports Mac OS X and FreeBSD in addition to Linux. You can build your MapReduce Lite programs using GCC or Clang.

mapreduce-lite's People

Contributors

aksnzhy, wangkuiyi


mapreduce-lite's Issues

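Error in incremental mode

For some reason, when I use incremental_test.sh to test the reduce phase, everything works with 1 reducer and 1 to 3 mappers, but as soon as there are more than 3 mappers, the reducer fails with a "double free" error. I hit this problem both on a single machine and on a cluster. Did you run into it in your own testing?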

Installation issue

I am trying to build this project in an Ubuntu 16.04 x86 VM by following the steps in "https://github.com/wangkuiyi/mapreduce-lite/blob/master/doc/install.md", but the build fails at the last step, building mapreduce-lite.

Here is what I have done, in case someone can help me out.

  1. Install gcc, cmake and protobuf
    sudo apt-get install gcc binutils
    sudo apt-get install cmake
    sudo apt-get install libprotobuf-dev libprotoc-dev

  2. Install gflags (2.0), Boost (1.54.0) and libevent (2.0.21-stable)
    1. gflags
      cd gflags-2.0
      ./configure --prefix=/home/xxx/Downloads/gflags-2.0
      make && make install
      ln -s /home/xxx/Downloads/gflags-2.0 /home/xxx/Downloads/gflags
    2. boost
      cd /home/xxx/Downloads/
      tar xjf boost_1_54_0.tar.bz2
      cd boost_1_54_0
      ./bootstrap --prefix=/home/xxx/Downloads/boost_1_54_0
      ./b2 -j1
      ./b2 install
      ln -s /home/xxx/Downloads/boost_1_54_0 /home/xxx/Downloads/boost
    3. libevent
      cd /home/xxx/Downloads
      tar xjf libevent-2.0.21-stable.tar.bz2
      cd libevent-2.0.21-stable
      ./configure --prefix=/home/xxx/Downloads/libevent-2.0.21-stable
      make && make install -i
      ln -s /home/xxx/Downloads/libevent-2.0.21-stable /home/xxx/Downloads/libevent

  3. Download gtest 1.7.0 and build mapreduce-lite
    1. gtest
      cd mapreduce-lite/src
      ln -s /home/xxx/Downloads/gtest-1.7.0 gtest
    2. Make some changes in mapreduce-lite-master/CMakeLists.txt
      set(CMAKE_INSTALL_PREFIX "/home/xxx/Downloads/mapreduce-lite-master")
      set(THIRD_PARTY_DIR "/home/xxx/Downloads")

      and change add_definitions(" -Wall -Wno-sign-compare -Werror -O2 ")
      to add_definitions(" -Wall -Wno-sign-compare -O2 "),
      because -Werror treats warnings as errors.
    3. Build mapreduce-lite
      cd /home/xxx/Downloads/mapreduce-lite-master
      mkdir build
      cd build
      cmake ..
      make -j1 (or just make) <---- fails here
      make install

However, at the second-to-last step of "Build MapReduce Lite", running "make" fails with the following error:

[ 70%] Linking CXX executable sorted_buffer_iterator_test
../gtest/libgtest.a(gtest-all.cc.o): In function `testing::internal::StreamingListener::SocketWriter::MakeConnection()':
gtest-all.cc:(.text+0x99d2): warning: Using 'getaddrinfo' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
[ 70%] Built target sorted_buffer_iterator_test
make[2]: *** No rule to make target '../src/mapreduce_lite/PROTOBUF_PROTOC_EXECUTABLE-NOTFOUND', needed by 'src/mapreduce_lite/protofile.pb.cc'. Stop.
CMakeFiles/Makefile2:1450: recipe for target 'src/mapreduce_lite/CMakeFiles/mapreduce_lite.dir/all' failed
make[1]: *** [src/mapreduce_lite/CMakeFiles/mapreduce_lite.dir/all] Error 2
Makefile:127: recipe for target 'all' failed

Running "sudo apt-get install protobuf-compiler" did not fix this build error.

I would appreciate any comments or suggestions.

Thanks in advance.

A Python runtime error

I built this project successfully, but errors occur when I try to run it in single-machine mode.

I changed the IP variable to 127.0.0.1 and ran chmod +x wordcount_run_locally.sh.

When I run ./wordcount_run_locally.sh, an error is raised in util.py. The traceback is:

mrlite.py: Job started at Mon Sep  1 20:28:21 2014
mrlite.py: Copy binary executable and python scripts to machines
Traceback (most recent call last):
  File "../scheduler/mrlite.py", line 279, in <module>
    main(sys.argv[1:])
  File "../scheduler/mrlite.py", line 257, in main
    scheduler = MRLiteJobScheduler(options)
  File "../scheduler/mrlite.py", line 85, in __init__
    self.dispatch_file(file_list, options.mapreduce_tmp_dir)
  File "/home/alex/mapreduce-lite/scheduler/util.py", line 126, in dispatch_file
    self.wait_cmd(process, cmd_scp)
  File "/home/alex/mapreduce-lite/scheduler/util.py", line 92, in wait_cmd
    raise RuntimeError(mesg)
RuntimeError: Fail with retcode(255): mkdir -p /tmp/mrlite-alex

Something seems to go wrong in the function wait_cmd(process, cmd_scp): the return code is 255, which says little about the cause.

I tested it on Ubuntu 13.04 and 14.04, the error is the same.

make -j8 error

I have built all the third-party libraries from source and set the corresponding path in mapreduce-lite/CMakeLists.txt, but I get the following errors:

[ 1%] [ 5%] [ 10%] [ 20%] [ 21%] Built target gtest
Built target gmock
Built target base
Built target strutil
[ 25%] Built target gmock_main
[ 28%] Built target system
Built target hash
[ 33%] [ 36%] Built target sorted_buffer
Built target gtest_main
[ 40%] [ 41%] [ 43%] [ 45%] Built target class_register_test
[ 46%] Built target cvector_test
Built target stl-util_test
Built target varint32_test
[ 48%] Built target split_string_test
[ 50%] Built target join_strings_test
Built target strcodec_test
[ 51%] [ 53%] Built target stringprintf_test
Built target md5_hash_test
[ 56%] [ 56%] [ 58%] [ 60%] Built target filepattern_test
[ 61%] Built target condition_variable_test
Building CXX object src/mapreduce_lite/CMakeFiles/mapreduce_lite.dir/socket_communicator.cc.o
Built target mutex_test
Built target memory_allocator_test
[ 63%] Built target memory_piece_io_test
[ 65%] Built target memory_piece_less_than_test
[ 66%] Built target memory_piece_test
[ 68%] [ 70%] Built target sorted_buffer_iterator_test
Built target sorted_buffer_test
[ 71%] Built target sorted_buffer_regression_test
In file included from /root/zhijie.lv/3rd-party/boost/include/boost/bind/bind.hpp:29:0,
from /root/zhijie.lv/3rd-party/boost/include/boost/bind.hpp:22,
from /root/zhijie.lv/3rd-party/boost/include/boost/thread/detail/thread.hpp:29,
from /root/zhijie.lv/3rd-party/boost/include/boost/thread/thread_only.hpp:22,
from /root/zhijie.lv/3rd-party/boost/include/boost/thread/thread.hpp:12,
from /root/zhijie.lv/3rd-party/boost/include/boost/thread.hpp:13,
from /root/zhijie.lv/mapreduce-lite/src/mapreduce_lite/socket_communicator.h:28,
from /root/zhijie.lv/mapreduce-lite/src/mapreduce_lite/socket_communicator.cc:20:
/root/zhijie.lv/3rd-party/boost/include/boost/bind/arg.hpp: In constructor ‘boost::arg::arg(const T&)’:
/root/zhijie.lv/3rd-party/boost/include/boost/bind/arg.hpp:37:22: warning: typedef ‘T_must_be_placeholder’ locally defined but not used [-Wunused-local-typedefs]
typedef char T_must_be_placeholder[ I == is_placeholder::value? 1: -1 ];
^
In file included from /root/zhijie.lv/3rd-party/boost/include/boost/atomic.hpp:12:0,
from /root/zhijie.lv/3rd-party/boost/include/boost/thread/pthread/once_atomic.hpp:20,
from /root/zhijie.lv/3rd-party/boost/include/boost/thread/once.hpp:20,
from /root/zhijie.lv/3rd-party/boost/include/boost/thread.hpp:17,
from /root/zhijie.lv/mapreduce-lite/src/mapreduce_lite/socket_communicator.h:28,
from /root/zhijie.lv/mapreduce-lite/src/mapreduce_lite/socket_communicator.cc:20:
/root/zhijie.lv/3rd-party/boost/include/boost/atomic/atomic.hpp: At global scope:
/root/zhijie.lv/3rd-party/boost/include/boost/atomic/atomic.hpp:202:16: error: ‘uintptr_t’ was not declared in this scope
typedef atomic<uintptr_t> atomic_uintptr_t;
^
/root/zhijie.lv/3rd-party/boost/include/boost/atomic/atomic.hpp:202:25: error: template argument 1 is invalid
typedef atomic<uintptr_t> atomic_uintptr_t;
^
/root/zhijie.lv/3rd-party/boost/include/boost/atomic/atomic.hpp:202:43: error: invalid type in declaration before ‘;’ token
typedef atomic<uintptr_t> atomic_uintptr_t;
^
In file included from /root/zhijie.lv/3rd-party/boost/include/boost/system/system_error.hpp:14:0,
from /root/zhijie.lv/3rd-party/boost/include/boost/thread/exceptions.hpp:22,
from /root/zhijie.lv/3rd-party/boost/include/boost/thread/pthread/thread_data.hpp:10,
from /root/zhijie.lv/3rd-party/boost/include/boost/thread/thread_only.hpp:17,
from /root/zhijie.lv/3rd-party/boost/include/boost/thread/thread.hpp:12,
from /root/zhijie.lv/3rd-party/boost/include/boost/thread.hpp:13,
from /root/zhijie.lv/mapreduce-lite/src/mapreduce_lite/socket_communicator.h:28,
from /root/zhijie.lv/mapreduce-lite/src/mapreduce_lite/socket_communicator.cc:20:
/root/zhijie.lv/3rd-party/boost/include/boost/system/error_code.hpp:222:36: warning: ‘boost::system::posix_category’ defined but not used [-Wunused-variable]
static const error_category & posix_category = generic_category();
^
/root/zhijie.lv/3rd-party/boost/include/boost/system/error_code.hpp:223:36: warning: ‘boost::system::errno_ecat’ defined but not used [-Wunused-variable]
static const error_category & errno_ecat = generic_category();
^
/root/zhijie.lv/3rd-party/boost/include/boost/system/error_code.hpp:224:36: warning: ‘boost::system::native_ecat’ defined but not used [-Wunused-variable]
static const error_category & native_ecat = system_category();
^
make[2]: *** [src/mapreduce_lite/CMakeFiles/mapreduce_lite.dir/socket_communicator.cc.o] Error 1
make[1]: *** [src/mapreduce_lite/CMakeFiles/mapreduce_lite.dir/all] Error 2
make: *** [all] Error 2

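A link error

I installed the required libraries as described in the documentation, but got the following error when compiling and linking: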

Linking CXX static library libmapreduce_lite.a
[ 85%] Built target mapreduce_lite
Scanning dependencies of target protofile_test
[ 87%] Building CXX object src/mapreduce_lite/CMakeFiles/protofile_test.dir/protofile_test.cc.o
Linking CXX executable protofile_test
/usr/bin/ld: cannot find -lboost_thread-mt
../gtest/libgtest.a(gtest-all.cc.o): In function ‘testing::internal::StreamingListener::SocketWriter::MakeConnection()’:
gtest-all.cc:(.text+0x6fe4): warning: Using 'getaddrinfo' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
