tpie's Introduction

TPIE - The Templated Portable I/O Environment

The TPIE (Templated Portable I/O Environment) library is a tool box providing efficient and convenient tools to ease the implementation of algorithms and data structures on very large sets of data.

Project status

As of 2021, this project does not see much development, but it is used heavily in production. There has not been a TPIE release for a long time, but the master branch is very stable and is unlikely to see major breaking changes.

Documentation

For help, installation, usage and documentation, visit the TPIE homepage; the API documentation is also hosted there and on GitHub pages.

Dependencies

TPIE has a mandatory dependency on the Boost library and requires a C++ compiler that supports the C++14 standard (e.g. GCC 7+ and Clang 10+). Optionally, TPIE can also use the Snappy, LZ4, and ZSTD compression algorithms, if available.

Usage

Building TPIE is done entirely with CMake.

In your project

Assuming this repository is cloned into the external/tpie folder of your project, add TPIE as a subdirectory in your own CMakeLists.txt.

add_subdirectory (external/tpie tpie)

TPIE is then linked to each target executable of choice.

add_executable(<target> <source>)
target_link_libraries(<target> tpie)
set_target_properties(<target> PROPERTIES CXX_STANDARD 14)

Then build your project with CMake. For other ways to install TPIE, see Compiling and installing TPIE in the documentation.

Unit tests

To compile the entire project and run the unit tests, execute the following commands.

# Compile entire TPIE project
mkdir build
cd build
cmake -D CMAKE_CXX_STANDARD=14 ..

# Compile tests
make

# Run unit tests
ctest

Examples

The example/helloworld.cpp file provides a rudimentary example of how to use TPIE, assuming it has been installed. The apps/ folder includes multiple non-trivial examples. To compile and run any of the CMake-enabled examples in apps/, execute the following commands.

# CMake
mkdir build
cd build
cmake -D CMAKE_CXX_STANDARD=14 ..

# Compile
cd apps/<app>/
make

# Run
./<app>

License

The software and documentation files in this repository are provided under the LGPL 3.0 License.

tpie's People

Contributors

mortal, antialize, svendcs, thomasmoelhave, tyilo, adanner, freekvw, ssoelvsten, jallan, prentow, mbeckem, coolwanglu, shlomif

tpie's Issues

ut-external_queue fails on debug/9496caa

rav@sanford:~/work/tpie-master/build-debug$ test/unit/ut-external_queue basic
basic .................................................................. [ ]ut-external_queue: /home/rav/work/tpie-master/tpie/file_base.cpp:117: void tpie::file_base::close(): Assertion `m_free.empty()' failed.
Aborted
rav@sanford:~/work/tpie-master/build-debug$ test/unit/ut-external_queue sized
sized .................................................................. [ ]ut-external_queue: /home/rav/work/tpie-master/tpie/file_base.cpp:117: void tpie::file_base::close(): Assertion `m_free.empty()' failed.
Aborted

Pipelining: Shorten progress fraction internal names

Right now, fractions in the progress indicator used by pipelining are stored with typeid(T).name() as the key, resulting in humongous names because of the nested templates used in pipelining. This is a problem since the keys are used in string comparisons, so we should make them shorter by some kind of hashing scheme.
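
A minimal sketch of such a hashing scheme, assuming a 64-bit FNV-1a hash rendered as 16 hex digits is an acceptable key. The names fnv1a_64 and short_fraction_key are made up for illustration and are not part of TPIE. Note that typeid(T).name() differs between compilers, so keys produced this way are only comparable within one toolchain, and hash collisions are unlikely but possible.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <typeinfo>

// 64-bit FNV-1a hash of a byte string.
inline std::uint64_t fnv1a_64(const std::string& s) {
    std::uint64_t h = 0xcbf29ce484222325ULL;  // FNV offset basis
    for (unsigned char c : s) {
        h ^= c;
        h *= 0x100000001b3ULL;                // FNV prime
    }
    return h;
}

// Hypothetical helper: a fixed-length key (16 hex digits) derived from
// typeid(T).name(), regardless of how deeply nested the template type is.
template <typename T>
std::string short_fraction_key() {
    static const char digits[] = "0123456789abcdef";
    std::uint64_t h = fnv1a_64(typeid(T).name());
    std::string key(16, '0');
    for (int i = 15; i >= 0; --i) {
        key[i] = digits[h & 0xF];
        h >>= 4;
    }
    return key;
}
```

String comparisons on such keys touch at most 16 characters, however long the mangled type name is.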

Build fail

Hi, I'm trying to build for the first time, and I hit an issue:

joe@blackbox:~/Projects/src$ git clone https://github.com/thomasmoelhave/tpie.git
Initialized empty Git repository in /home/joe/.projects/src/tpie/.git/
remote: Counting objects: 17673, done.
remote: Compressing objects: 100% (5280/5280), done.
remote: Total 17673 (delta 11613), reused 17061 (delta 11149)
Receiving objects: 100% (17673/17673), 7.96 MiB | 1.11 MiB/s, done.
Resolving deltas: 100% (11613/11613), done.
joe@blackbox:~/Projects/src$ cd tpie
joe@blackbox:~/Projects/src/tpie$ cmake .
-- The C compiler identification is GNU
-- The CXX compiler identification is GNU
-- Check for working C compiler: /usr/bin/gcc
-- Check for working C compiler: /usr/bin/gcc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Boost version: 1.42.0
-- Found the following Boost libraries:
--   date_time
--   thread
--   filesystem
--   system
-- TBB_INSTALL_DIR not found. 
-- Looking for include files TPIE_HAVE_UNISTD_H
-- Looking for include files TPIE_HAVE_UNISTD_H - found
-- Looking for include files TPIE_HAVE_SYS_UNISTD_H
-- Looking for include files TPIE_HAVE_SYS_UNISTD_H - found
-- Could NOT find Doxygen (missing:  DOXYGEN_EXECUTABLE) 
-- Doxygon not found, api documentation can not be generated.
-- Configuring done
-- Generating done
-- Build files have been written to: /home/joe/Projects/src/tpie
joe@blackbox:~/Projects/src/tpie$ make
Scanning dependencies of target tpie
[  1%] Building CXX object tpie/CMakeFiles/tpie.dir/bte/stream_base.cpp.o
[  3%] Building CXX object tpie/CMakeFiles/tpie.dir/key.cpp.o
[  4%] Building CXX object tpie/CMakeFiles/tpie.dir/backtrace.cpp.o
[  6%] Building CXX object tpie/CMakeFiles/tpie.dir/cpu_timer.cpp.o
[  7%] Building CXX object tpie/CMakeFiles/tpie.dir/logstream.cpp.o
[  9%] Building CXX object tpie/CMakeFiles/tpie.dir/mm_base.cpp.o
[ 11%] Building CXX object tpie/CMakeFiles/tpie.dir/mm_manager.cpp.o
[ 12%] Building CXX object tpie/CMakeFiles/tpie.dir/portability.cpp.o
[ 14%] Building CXX object tpie/CMakeFiles/tpie.dir/tpie_log.cpp.o
[ 15%] Building CXX object tpie/CMakeFiles/tpie.dir/tempname.cpp.o
[ 17%] Building CXX object tpie/CMakeFiles/tpie.dir/prime.cpp.o
[ 19%] Building CXX object tpie/CMakeFiles/tpie.dir/progress_indicator_subindicator.cpp.o
[ 20%] Building CXX object tpie/CMakeFiles/tpie.dir/progress_indicator_base.cpp.o
[ 22%] Building CXX object tpie/CMakeFiles/tpie.dir/execution_time_predictor.cpp.o
[ 23%] Building CXX object tpie/CMakeFiles/tpie.dir/fractional_progress.cpp.o
/home/joe/Projects/src/tpie/tpie/fractional_progress.cpp:110: fatal error: tpie_fraction_db.inl: No such file or directory
compilation terminated.
make[2]: *** [tpie/CMakeFiles/tpie.dir/fractional_progress.cpp.o] Error 1
make[1]: *** [tpie/CMakeFiles/tpie.dir/all] Error 2
make: *** [all] Error 2

Any ideas? I'm on Ubuntu 10.10.

problems with TPIE_FRACTION_STATS

I'm having a number of problems compiling and testing with TPIE fraction stats. If I turn off TPIE_FRACTION_STATS in CMake, I have to install dummy tpie_fraction_db*.inl files in my include path. Since I'm trying to install TPIE system-wide (using stow), this usually means dumping these files in /usr/local/include before I build and install TPIE into /usr/local/include. Can this be streamlined?

Furthermore, even if I get this to work and compile, I can't run tpie/test/test_sort -p ./ -n 1000

I get a floating point exception in tpie/tpie/prime.cpp:66:prime_hash, called from execution_time_predictor. Is this related? Is it possible to run TPIE with TPIE_FRACTION_STATS off?

I'm on Ubuntu 10.04 with GCC 4.4.3.

Mixing ostreams to different TPIE log levels

With log level set to LOG_INFO, I expect the following code to output two lines of logging, "Info log" twice.

However, "Info log" is only output once.

std::ostream & os = log_info();
os << "Info log" << std::endl;
std::ostream & os2 = log_debug();
os2 << "Debug log" << std::endl;
os << "Info log" << std::endl;

Keeping track of number of open files

It turns out that the available_files functionality implemented in file_count.cpp is not particularly useful in many cases.
TPIE faithfully keeps track of how many files it has opened, but it's incorrect to assume that the RLIMIT_NOFILE minus this number is how many files are available to the process.
For instance, in a Qt application using TPIE that I am working on, 275 file descriptors are used for dynamically loaded libraries and fontconfig files.

It is clear that the number returned by available_files() is an upper bound on the number of available file descriptors, but in a large application using something like Qt, this upper bound is far too loose, making it essentially meaningless.

One "fix" is to use the current number of open file descriptors (instead of the temporary file count maintained by TPIE) in available_files(), but this has the downside that available_files() can change between two invocations even if no TPIE stream has been closed or opened in the interim.

We could also allow the application to "promise" that there will be at least x files available to TPIE and then use this x instead of RLIMIT_NOFILE in available_files(). This shifts the burden of guessing how many file descriptors are used for other purposes to the application, which can hopefully provide a fairly good estimate.
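
The two alternatives could look roughly like the following POSIX-only sketch. All function names here are hypothetical and not part of TPIE. The probing variant counts every open descriptor in the process by calling fcntl on each candidate number, and the probing range is capped at an arbitrary 65536, so descriptors above the cap would be missed.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <sys/resource.h>

// Soft RLIMIT_NOFILE, capped so that it can be probed in a loop.
inline std::size_t fd_soft_limit() {
    const std::size_t cap = 1u << 16;  // arbitrary probing cap (assumption)
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0 || rl.rlim_cur == RLIM_INFINITY)
        return cap;
    return rl.rlim_cur < cap ? static_cast<std::size_t>(rl.rlim_cur) : cap;
}

// Number of descriptors in [0, limit) currently open in this process,
// no matter which library opened them.
inline std::size_t count_open_fds(std::size_t limit) {
    std::size_t open_count = 0;
    for (std::size_t fd = 0; fd < limit; ++fd)
        if (fcntl(static_cast<int>(fd), F_GETFD) != -1) ++open_count;
    return open_count;
}

// Alternative 1: base the estimate on all descriptors that are open now.
inline std::size_t available_by_probe() {
    const std::size_t limit = fd_soft_limit();
    return limit - count_open_fds(limit);
}

// Alternative 2: the application promises a budget of descriptors, and
// only the files opened by the library are subtracted from it.
inline std::size_t available_by_promise(std::size_t promised,
                                        std::size_t opened_by_library) {
    return promised > opened_by_library ? promised - opened_by_library : 0;
}
```

The probing variant reflects reality at the moment of the call but can change between calls, while the promise variant is stable but only as good as the application's guess.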

Thoughts?

Storing extra data alongside a virtual chunk

I need to be able to store extra data alongside a virtual chunk.

I propose the following change.

diff --git a/tpie/pipelining/virtual.h b/tpie/pipelining/virtual.h
index b21207c..d20ebd4 100644
--- a/tpie/pipelining/virtual.h
+++ b/tpie/pipelining/virtual.h
@@ -221,6 +221,11 @@ class virtual_chunk;
 template <typename Output>
 class virtual_chunk_begin;

+class virtual_container {
+public:
+       virtual ~virtual_container() {}
+};
+
 namespace bits {

 class access {
@@ -328,8 +333,10 @@ public:
        /// ownership of it.
        ///////////////////////////////////////////////////////////////////////////
        template <typename fact_t>
-       virtual_chunk(const pipe_middle<fact_t> & pipe) {
+       virtual_chunk(const pipe_middle<fact_t> & pipe, 
+                                 virtual_container * c=0) {
                *this = pipe;
+               //TODO do something with the container so that it can be gced
        }

        ///////////////////////////////////////////////////////////////////////////

However there needs to be some garbage collection.
See TS piped_paralell_project terrastream/algorithm/project.cpp for an example.
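
One way the garbage collection could work is reference counting, sketched below under the assumption that every copy of a chunk should keep the container alive. The class chunk_model is a hypothetical stand-in that models only the ownership of the container, not the real virtual_chunk, and counting_container exists only to make the destruction observable.

```cpp
#include <cassert>
#include <memory>

// Base class from the proposed patch: extra data attached to a chunk.
class virtual_container {
public:
    virtual ~virtual_container() {}
};

// Hypothetical stand-in for virtual_chunk. It takes ownership of the
// container and shares it between all copies of the chunk.
class chunk_model {
    std::shared_ptr<virtual_container> m_container;
public:
    explicit chunk_model(virtual_container* c = nullptr) : m_container(c) {}
    bool has_container() const { return static_cast<bool>(m_container); }
};

// Example container that records when it is destroyed.
struct counting_container : virtual_container {
    int& destroyed;
    explicit counting_container(int& d) : destroyed(d) {}
    ~counting_container() override { ++destroyed; }
};
```

The container is deleted exactly once, when the last chunk copy referring to it goes away.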

get_temp_file_usage() too high

While raster construction is in the output phase, the temporary space usage is reported as 4.6TB while only 2.7TB is in use. The amount reported in the interface is directly from tpie::get_temp_file_usage(), which is apparently only fed by temp_file objects.
I located one possible problem location: stream::truncate() doesn't call m_temp.update_recorded_size(). This seems to be actually used by the sort_manager, so maybe that's all.

Build fails with boost 1.50

With boost 1.50, filesystem2 was deprecated in favor of filesystem3. This makes building the TPIE libraries with boost 1.50 impossible. Perhaps it's a good idea to finally move to filesystem3.

It seems that only a minimal amount of changes is required, as seen here: SakalisC/tpie@6532985cb6a0afa41baee49af515a68c09e55220

After the changes above, make all completes without errors and the ctest output is the same as it was with filesystem2. It is also still possible to build with boost 1.49. I assume this extends to other older versions of boost that include filesystem3.

Pipelining: Rename pipe_segment -> node

pipe_segment is a bad name for the concept it is representing. I've read through some literature on streaming algorithms, and our concept that is currently named pipe_segment has the following names:

  • process
  • filter
  • stream
  • node / processing node
  • (concurrent computation) kernel
  • module
  • actor

For lack of a good term, I am going to use the least horrible, confusing and misrepresenting term, which is (in my humble opinion) node.

get_os_available_fds

A function get_os_available_fds is defined in execution_time_predictor.cpp, but not within the tpie namespace. Is it used? It seems like an odd place to put it.

Parallel fails when worker produces more than one item per item

I have a simple test for parallel that fails.

Currently, the code assumes that when a worker transition is made from processing to outputting, after consuming the output buffer the worker should transition to idle, ready to receive more input. However, if the output buffer is full before the input buffer is fully processed, the worker should transition back to processing without receiving a new input buffer.
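
The intended transitions can be written down as a small state function. This is a sketch of the rule described above only; the state names and function names are assumptions and do not mirror TPIE's actual worker implementation.

```cpp
#include <cassert>
#include <cstddef>

enum class worker_state { idle, processing, outputting };

// Next state while processing, given the current buffer fill levels.
inline worker_state after_processing_step(std::size_t unprocessed_input_items,
                                          std::size_t output_items,
                                          std::size_t output_capacity) {
    if (output_items >= output_capacity)
        return worker_state::outputting;  // output buffer full: hand it over
    if (unprocessed_input_items == 0)
        return worker_state::outputting;  // input exhausted: flush the output
    return worker_state::processing;
}

// Next state once the consumer has drained the worker's output buffer.
// The worker only becomes idle if its input buffer is fully processed.
inline worker_state after_output_drained(std::size_t unprocessed_input_items) {
    return unprocessed_input_items > 0 ? worker_state::processing
                                       : worker_state::idle;
}
```

The failing case corresponds to after_output_drained being called with input still pending, where the worker has to resume processing instead of waiting for a new input buffer.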

Generalized virtual pipes.

Currently the virtual pipe system supports only one push method, but in general one might need several. Is it possible to let users generate their own virtual pipe interfaces? It does not necessarily have to be easy or pretty.

Parallel and virtual yields error.

I have an explicit example in the terrastream ppp branch, where using virtual or parallel alone works fine, but together something goes wrong.

if (m_buffer->m_outputSize >= m_buffer->m_outputBuffer.size())
    throw std::runtime_error("Buffer overrun in after");

throws. By the way, why is this not an assert?

Enable override keyword

The override keyword is a powerful tool to diagnose simple errors with implementations of the pipeline framework.

We already sort of use it in the form of out-commented overrides, but I think we should actually enable them by default if the compiler supports them. Thus I propose that we create an OVERRIDE (or just "override") macro which is "override" on supporting compilers and the empty string on other compilers. Alternatively it could simply be off by default and the user could turn it on if they wanted to.
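
A minimal sketch of such a macro, assuming the name TPIE_OVERRIDE (hypothetical) and a simple check of __cplusplus. Be aware that MSVC reports 199711L for __cplusplus unless /Zc:__cplusplus is passed, so a real implementation would likely need a compiler-specific test as well. The node_base and my_node types are placeholders for illustration.

```cpp
#include <cassert>

// Expands to `override` where the language standard supports it, and to
// nothing on older compilers.
#if __cplusplus >= 201103L
#define TPIE_OVERRIDE override
#else
#define TPIE_OVERRIDE
#endif

struct node_base {
    virtual ~node_base() {}
    virtual int begin() { return 0; }
};

struct my_node : node_base {
    int begin() TPIE_OVERRIDE { return 1; }
    // A misspelled method such as `int begn() TPIE_OVERRIDE` would be
    // rejected at compile time wherever the macro expands to `override`.
};
```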

Thoughts?

tpie::stack is not io efficient.

Pushing B+1 items to the stack, and then n times popping 2 items and pushing 2 items, performs n I/Os. I would expect a constant number of I/Os, or at most n/B.
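
The effect can be reproduced with a small counting model. This is not tpie::stack itself: it only tracks how many items sit in an in-memory buffer of a given number of blocks and counts one I/O per block written or read. In this model, a one-block buffer thrashes at the block boundary, while the textbook remedy of buffering two blocks avoids it.

```cpp
#include <cassert>
#include <cstddef>

// Counting model of an external-memory stack (not the real tpie::stack).
class block_stack_model {
    std::size_t m_block_items, m_buffer_blocks;
    std::size_t m_in_memory, m_blocks_on_disk, m_ios;
public:
    block_stack_model(std::size_t block_items, std::size_t buffer_blocks)
        : m_block_items(block_items), m_buffer_blocks(buffer_blocks),
          m_in_memory(0), m_blocks_on_disk(0), m_ios(0) {}

    void push() {
        if (m_in_memory == m_block_items * m_buffer_blocks) {
            ++m_ios;                       // write the oldest block to disk
            ++m_blocks_on_disk;
            m_in_memory -= m_block_items;
        }
        ++m_in_memory;
    }

    void pop() {
        if (m_in_memory == 0) {
            assert(m_blocks_on_disk > 0);  // popping an empty stack is a bug
            ++m_ios;                       // read the newest block from disk
            --m_blocks_on_disk;
            m_in_memory = m_block_items;
        }
        --m_in_memory;
    }

    std::size_t ios() const { return m_ios; }
};

// The access pattern from the report: B+1 pushes, then `rounds` times
// pop 2 items and push 2 items. Returns the number of I/Os counted.
inline std::size_t count_ios(std::size_t buffer_blocks, std::size_t block_items,
                             std::size_t rounds) {
    block_stack_model s(block_items, buffer_blocks);
    for (std::size_t i = 0; i < block_items + 1; ++i) s.push();
    for (std::size_t r = 0; r < rounds; ++r) {
        s.pop(); s.pop();
        s.push(); s.push();
    }
    return s.ios();
}
```

With a block size of 4 items and 100 rounds, the model counts 201 I/Os for a one-block buffer and none for a two-block buffer.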

Use std::align or thread_local for parallel buffers when available

Currently, the code that aligns thread-specific storage to a 64-byte boundary (common cache line size) is hand-rolled. Ideally, we should use the compiler-provided std::align (N3242 §20.6.5, p. 525) or the thread_local keyword (§7.1.1, p. 141), if these are supported by the compiler.
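
As an illustration of the std::align route, the sketch below finds the first 64-byte aligned address inside a raw buffer that leaves room for an object of a given size. The helper name and the cache_line constant are assumptions for this example; std::align itself is declared in <memory>.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

const std::size_t cache_line = 64;  // assumed cache line size

// Returns the first cache-line aligned address in [buffer, buffer + size)
// with room for object_size bytes, or nullptr if it does not fit.
inline void* first_cache_aligned(void* buffer, std::size_t buffer_size,
                                 std::size_t object_size) {
    void* p = buffer;
    std::size_t space = buffer_size;
    return std::align(cache_line, object_size, p, space);
}
```

Over-allocating by one cache line minus one byte guarantees that an aligned slot of the requested size exists in the buffer.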

Support random-access flag to OS open()

The file_accessor interface should let the application have the underlying file descriptor opened optimized for random access or for sequential access as needed.
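
On POSIX systems that provide it, one way to pass such a hint is posix_fadvise after opening the file. The sketch below assumes a Linux-like platform; the access_hint enum and open_with_hint function are hypothetical names, not part of the file_accessor interface.

```cpp
#include <cassert>
#include <cstdlib>
#include <fcntl.h>
#include <unistd.h>

enum class access_hint { normal, sequential, random };

// Opens `path` with the given open(2) flags and passes the expected access
// pattern on to the kernel. Returns the descriptor, or -1 if open fails.
inline int open_with_hint(const char* path, int flags, access_hint hint) {
    int fd = ::open(path, flags);
    if (fd == -1) return -1;
    int advice = POSIX_FADV_NORMAL;
    if (hint == access_hint::sequential) advice = POSIX_FADV_SEQUENTIAL;
    if (hint == access_hint::random) advice = POSIX_FADV_RANDOM;
    // Offset 0 and length 0 cover the whole file. The call is purely
    // advisory, so a failure is deliberately ignored here.
    (void)::posix_fadvise(fd, 0, 0, advice);
    return fd;
}
```

Other platforms expose similar hints through different calls, so a portable interface would have to map the enum per platform.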

file_stream changes.

I propose two changes to the file_stream class.

  1. Make file_stream allocate its memory on open, and free it on close.
  2. Make file_stream copy constructible unless it is open.

make clean does not clean enough

This is something that has annoyed me for a while, and at some point I'm going to investigate further. Steps to reproduce:

mkdir tpietest; cd tpietest; git init
git fetch git://github.com/thomasmoelhave/tpie.git filestream
git reset --hard FETCH_HEAD
mkdir build; cd build; cmake -D CMAKE_BUILD_TYPE:STRING=Release ..
make -j9; ctest

See tests pass

Modify magicConst in tpie/stream_header.h

make -j9; ctest

See tests fail

make clean; make -j9; ctest

Tests still fail

Pipelining: Support virtual chunks as temporary variables

Right now, if you use a virtual_chunk temporary in a pipeline creation expression, the temporary is destroyed after the expression is evaluated, and the pipe segments within it are destroyed with it. We should have the pipeline type hold a smartptr reference to the virtual_chunk instances so they are not destroyed until the pipeline holder is.

Get rid of array::m_tss_used

We should get rid of the m_tss_used flag in array, as it seems like an unnecessary complexity. I have pushed commit 566e5a5 to a temporary branch. Is this the fix we want to use?

Pipelining: Phase boundary pipe_segments are tedious to write

While I was writing some high-level documentation for pipelining, I realized something terrible. Implementing the improper reverser (as a single pipe_segment that does not cut the graph into two phases) is very easy, but implementing it properly takes much more code than seems necessary.

Improper way:

template <typename dest_t>
class reverser_type : public tpie::pipelining::pipe_segment {
    tpie::stack<point3d> points;
    dest_t dest;
public:
    typedef point3d item_type;

    void push(point3d p) {
        points.push(p);
        set_name("Reverser",
                 tpie::pipelining::PRIORITY_SIGNIFICANT);
    }

    void end() {
        pipe_segment::end();
        while (!points.empty()) {
            dest.push(points.pop());
        }
    }
};

Proper way:

template <typename dest_t>
class reverser_type_1 : public tpie::pipelining::pipe_segment {
    tpie::temp_file tmpfile;
    tpie::auto_ptr<tpie::stack<point3d> > points;
    dest_t dest;
public:
    typedef point3d item_type;

    reverser_type_1(dest_t dest) {
        dest.set_first(*this);
    }

    void begin() {
        pipe_segment::begin();
        tmpfile.set_persistent(true);
        points.reset(new tpie::stack<point3d>(tmpfile));
        forward<std::string>("stack", tmpfile.path());
    }

    void push(point3d p) {
        points->write(p);
    }

    void end() {
        pipe_segment::end();
        points.reset();
    }
};

template <typename dest_t>
class reverser_type_2 : public tpie::pipelining::pipe_segment {
    tpie::temp_file tmpfile;
    tpie::auto_ptr<tpie::stack<point3d> > points;
    dest_t dest;
public:
    reverser_type_2(dest_t dest) {
        add_push_destination(dest);
    }

    void set_first(pipe_segment & first) {
        add_dependency(first);
    }

    void begin() {
        pipe_segment::begin();
        tmpfile.set_path(fetch<std::string>("stack"));
        points.reset(new tpie::stack<point3d>(tmpfile));
    }

    void go(progress_indicator_base & pi) {
        pi.init(points->size());
        while (!points->empty()) {
            dest.push(points->pop());
            pi.step();
        }
        pi.done();
    }

    void end() {
        pipe_segment::end();
        points.reset();
        tmpfile.free();
    }
};

Should it be possible to implement it properly in a way like the following?:

template <typename dest_t>
class reverser_type : public tpie::pipelining::buffering_pipe_segment { // note made-up class name
    tpie::stack<point3d> points;
    dest_t dest;
public:
    typedef point3d item_type;

    void begin_1() {
        buffering_pipe_segment::begin_1();
    }

    void push(point3d p) {
        // alternatively implement go_1
        points.push(p);
        set_name("Reverser",
                 tpie::pipelining::PRIORITY_SIGNIFICANT);
    }

    void end_1() {
        buffering_pipe_segment::end_1();
    }

    void evacuate() {
        // handle...
    }

    void begin_2() {
        buffering_pipe_segment::begin_2();
    }

    void go_2() {
        // alternatively implement pull and can_pull
        while (!points.empty()) {
            dest.push(points.pop());
        }
    }

    void end_2() {
        buffering_pipe_segment::end_2();
    }
};

Thoughts?

Get rid of TPIE_OS_OFFSET

We currently use TPIE_OS_OFFSET and friends throughout the API. These should be replaced by an appropriate new type.

For instance, in progress_indicators the init method takes a TPIE_OS_OFFSET while it really should take a stream_size_type.

TPIE memory management should be thread-safe even in debug mode

Currently, the memory manager in debug builds has a boost::unordered_map from pointers to pointer metadata. This is used to make sure that pointers are newed and deleted as the same type. However, this map is not protected from concurrent access, so we get all sorts of nasty crashes in debug builds.
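
A straightforward remedy is to guard the map with a mutex, as in the sketch below. It uses std::unordered_map and std::mutex instead of the Boost types mentioned above, and the class and method names are hypothetical rather than taken from TPIE's memory manager.

```cpp
#include <cassert>
#include <mutex>
#include <typeindex>
#include <typeinfo>
#include <unordered_map>

// Debug-mode registry that remembers the type each pointer was allocated
// as. Every access to the map happens under the mutex.
class pointer_registry {
    std::mutex m_mutex;
    std::unordered_map<const void*, std::type_index> m_types;
public:
    template <typename T>
    void on_new(const T* p) {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_types.emplace(p, std::type_index(typeid(T)));
    }

    // Returns true and forgets the pointer if it was registered as a T.
    // Returns false for unknown pointers and for type mismatches.
    template <typename T>
    bool on_delete(const T* p) {
        std::lock_guard<std::mutex> lock(m_mutex);
        auto it = m_types.find(p);
        if (it == m_types.end() || it->second != std::type_index(typeid(T)))
            return false;
        m_types.erase(it);
        return true;
    }
};
```

A single global lock serializes allocations in debug builds, which is usually acceptable there; a sharded map would be an option if contention became a problem.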

Pipelining: Support parallelization of pipeline portions

It would be cool for the pipelining framework to support parallelization of operations.

The expression pipeline p = x | parallel(a | b | c) | y should instantiate an appropriate number of a | b | c pipes and run them in separate threads. The framework should support both maintaining the order of elements and disregarding it, and it should keep an appropriately sized internal buffer to amortize the cost of sending items to different threads.

#include< > paths not updated

Hi Thomas,
When I tried to build the sample programs, the compiler gave errors about include paths. The #include<> paths in all the header files and other files follow the older directory structure of TPIE. Has anyone updated them, or do they still have to be updated? Could you give the link to the updated files if they exist?

unable to build

root@linux:~/Desktop/Tpie_build# make
Scanning dependencies of target tpie
[ 1%] Building CXX object tpie/CMakeFiles/tpie.dir/bte/stream_base.cpp.o
[ 3%] Building CXX object tpie/CMakeFiles/tpie.dir/key.cpp.o
[ 4%] Building CXX object tpie/CMakeFiles/tpie.dir/backtrace.cpp.o
[ 6%] Building CXX object tpie/CMakeFiles/tpie.dir/cpu_timer.cpp.o
[ 7%] Building CXX object tpie/CMakeFiles/tpie.dir/logstream.cpp.o
[ 9%] Building CXX object tpie/CMakeFiles/tpie.dir/mm_base.cpp.o
[ 11%] Building CXX object tpie/CMakeFiles/tpie.dir/mm_manager.cpp.o
[ 12%] Building CXX object tpie/CMakeFiles/tpie.dir/portability.cpp.o
[ 14%] Building CXX object tpie/CMakeFiles/tpie.dir/tpie_log.cpp.o
[ 15%] Building CXX object tpie/CMakeFiles/tpie.dir/tempname.cpp.o
[ 17%] Building CXX object tpie/CMakeFiles/tpie.dir/prime.cpp.o
[ 19%] Building CXX object tpie/CMakeFiles/tpie.dir/progress_indicator_subindicator.cpp.o
[ 20%] Building CXX object tpie/CMakeFiles/tpie.dir/progress_indicator_base.cpp.o
[ 22%] Building CXX object tpie/CMakeFiles/tpie.dir/execution_time_predictor.cpp.o
[ 23%] Building CXX object tpie/CMakeFiles/tpie.dir/fractional_progress.cpp.o
[ 25%] Building CXX object tpie/CMakeFiles/tpie.dir/tpie.cpp.o
Linking CXX static library libtpie.a
[ 25%] Built target tpie
Scanning dependencies of target test_scan_common
[ 26%] Building CXX object test/CMakeFiles/test_scan_common.dir/scan_count.cpp.o
[ 28%] Building CXX object test/CMakeFiles/test_scan_common.dir/scan_random.cpp.o
Linking CXX static library libtest_scan_common.a
[ 28%] Built target test_scan_common
Scanning dependencies of target test_common
[ 30%] Building CXX object test/CMakeFiles/test_common.dir/app_config.cpp.o
[ 31%] Building CXX object test/CMakeFiles/test_common.dir/getopts.cpp.o
[ 33%] Building CXX object test/CMakeFiles/test_common.dir/parse_args.cpp.o
Linking CXX static library libtest_common.a
[ 33%] Built target test_common
Scanning dependencies of target test_ami_arith
[ 34%] Building CXX object test/CMakeFiles/test_ami_arith.dir/test_ami_arith.cpp.o
Linking CXX executable test_ami_arith
../tpie/libtpie.a(tempname.cpp.o): In function `__static_initialization_and_destruction_0(int, int)':
tempname.cpp:(.text+0x2471): undefined reference to `boost::system::generic_category()'
tempname.cpp:(.text+0x247b): undefined reference to `boost::system::generic_category()'
tempname.cpp:(.text+0x2485): undefined reference to `boost::system::system_category()'
../tpie/libtpie.a(tempname.cpp.o): In function `boost::filesystem3::path::codecvt()':
tempname.cpp:(.text._ZN5boost11filesystem34path7codecvtEv[boost::filesystem3::path::codecvt()]+0x7): undefined reference to `boost::filesystem3::path::wchar_t_codecvt_facet()'
../tpie/libtpie.a(tempname.cpp.o): In function `boost::filesystem3::exists(boost::filesystem3::path const&)':
tempname.cpp:(.text._ZN5boost11filesystem36existsERKNS0_4pathE[boost::filesystem3::exists(boost::filesystem3::path const&)]+0x1c): undefined reference to `boost::filesystem3::detail::status(boost::filesystem3::path const&, boost::system::error_code*)'
../tpie/libtpie.a(tempname.cpp.o): In function `boost::filesystem3::is_directory(boost::filesystem3::path const&)':
tempname.cpp:(.text._ZN5boost11filesystem312is_directoryERKNS0_4pathE[boost::filesystem3::is_directory(boost::filesystem3::path const&)]+0x1c): undefined reference to `boost::filesystem3::detail::status(boost::filesystem3::path const&, boost::system::error_code*)'
../tpie/libtpie.a(tempname.cpp.o): In function `boost::filesystem3::create_directory(boost::filesystem3::path const&)':
tempname.cpp:(.text._ZN5boost11filesystem316create_directoryERKNS0_4pathE[boost::filesystem3::create_directory(boost::filesystem3::path const&)]+0x15): undefined reference to `boost::filesystem3::detail::create_directory(boost::filesystem3::path const&, boost::system::error_code*)'
collect2: ld returned 1 exit status
make[2]: *** [test/test_ami_arith] Error 1
make[1]: *** [test/CMakeFiles/test_ami_arith.dir/all] Error 2
make: *** [all] Error 2

item_type must be trivially copyable in parallel

Currently, we implicitly require, but do not check, that item_type in parallel is trivially copyable. This check should be explicit, or the code should be extended to allow non-trivially copyable types.
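
Making the check explicit can be as small as a static_assert next to the code that copies items bytewise. The copy_items function and the point3d struct below are illustrative assumptions, not the actual buffer code in parallel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>

// Copies n items bytewise. This is only valid for trivially copyable
// types, which the static_assert now enforces at compile time.
template <typename item_type>
void copy_items(item_type* dst, const item_type* src, std::size_t n) {
    static_assert(std::is_trivially_copyable<item_type>::value,
                  "item_type must be trivially copyable");
    std::memcpy(dst, src, n * sizeof(item_type));
}

struct point3d { double x, y, z; };
```

Instantiating copy_items with a type such as std::string then fails with the message from the static_assert instead of silently corrupting memory at run time.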

Seek regression in master.

The old file accessor CRTP kept track of the position we had currently seeked to in a file, to minimize the number of seek operations. This seems to have disappeared in the new interface. Why?
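
The bookkeeping in question can be sketched as a small position tracker that reports whether a real seek is needed. The class below is a hypothetical model for illustration and does not reproduce the old CRTP accessor.

```cpp
#include <cassert>
#include <cstdint>

// Remembers the current file offset so that redundant seeks are skipped.
class seek_tracker {
    std::uint64_t m_position;
    bool m_known;
    std::uint64_t m_seeks;
public:
    seek_tracker() : m_position(0), m_known(false), m_seeks(0) {}

    // Returns true if the caller has to issue a real seek to `offset`.
    bool seek(std::uint64_t offset) {
        if (m_known && m_position == offset) return false;
        m_position = offset;
        m_known = true;
        ++m_seeks;
        return true;
    }

    // To be called after each read or write of `bytes` bytes.
    void advance(std::uint64_t bytes) { m_position += bytes; }

    // Number of real seeks requested so far.
    std::uint64_t seeks() const { return m_seeks; }
};
```

For purely sequential access, only the first call to seek reports that a real seek is needed.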

Unit test deficiencies

I think verbose should be implied unless --all is specified.

Also, the output of multitest is not properly aligned, and the test name is repeated, see:

bounded_cast ........................................................... [    ]
  Testing if -2 casts to 0 .............................................. [ ok ]
  Testing if 257 casts to 255 ........................................... [ ok ]
  Testing if -1 casts to 0 .............................................. [ ok ]
  Testing if 18446744073709551615 casts to 9223372036854775807 .......... [ ok ]
bounded_cast ........................................................... [ ok ]
numeric_cast ........................................................... [    ]
  Testing that 1.11 casts to 1.11 ....................................... [ ok ]
  Testing that -1.6 does not cast ....................................... [ ok ]
  Testing that -2 does not cast ......................................... [ ok ]
  Testing that -2 casts to -2 ........................................... [ ok ]
  Testing that 256 does not cast ........................................ [ ok ]
  Testing that 0 casts to 0 ............................................. [ ok ]
  Testing that 255 casts to 255 ......................................... [ ok ]
numeric_cast ........................................................... [ ok ]
string_to_double ....................................................... [    ]
  Testing that 111.22 casts to 111.22 ................................... [ ok ]
  Testing that -1,111,23.22 casts to -111123 ............................ [ ok ]
  Testing that -1.111.23,22 casts to -111123 ............................ [ ok ]
string_to_double ....................................................... [ ok ]

parallel_sort is slower than std::sort on adversarial inputs on bbq

On bbq (dual-core Windows running a 32-bit build), the ut-parallel_sort unit tests bad_case and equal_elements fail even when the std::sort threshold is set to the default.

std::sort performs especially well when the input is formed like in the bad_case and equal_elements tests. In the former test case, all but two elements are equal. In the latter, all but 8 elements are equal.

Currently, our parallel sort just partitions and recurses no matter what. What should we do (if anything) to make parallel sort as fast as std::sort in these cases?

Just to be clear, on random inputs, std::sort is around 30 times slower than on the adversarial inputs in the unit tests, and parallel sort is faster than std::sort in this case.

maintainOrder should be an enum

This is a minor item, but I think it would be better if maintainOrder in the parallel constructor were an enum and not a boolean, since that makes call sites clearer. This applies to most other places where constructors take a boolean parameter (I don't know if there are that many though).
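
A sketch of what the enum could look like. The enumerator names and the parallel_options_model struct are hypothetical and only illustrate the readability argument; they are not the actual signature of the parallel constructor.

```cpp
#include <cassert>
#include <cstddef>

// Self-describing replacement for a bare boolean parameter.
enum class order_policy { maintain_order, arbitrary_order };

// Hypothetical options object standing in for the parallel constructor.
struct parallel_options_model {
    order_policy order;
    std::size_t num_jobs;
    parallel_options_model(order_policy order_, std::size_t num_jobs_)
        : order(order_), num_jobs(num_jobs_) {}
};

inline bool keeps_order(const parallel_options_model& o) {
    return o.order == order_policy::maintain_order;
}
```

A call such as parallel_options_model(order_policy::maintain_order, 4) documents itself, whereas a bare true at the call site does not, and a scoped enum cannot be confused with other boolean or integer arguments.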

Thoughts?

go() is called on a non-initiator node when there's no initiator in a phase

I hoped I could code my own initiator outside of the framework and just call begin(), pull() and end() on a pull node, but that was of course not going to work. Instead of telling me there's a phase without an initiator (namely the one containing the output node of a pull-sorter), it apparently tried to call go() on the pull sorter output.

Feature request: Serialization stream

We should have a new serialization framework that supports some kind of file format. I am currently implementing new serialization and a serialization_stream in the serialization branch.

It should support pipelining, stream reversing, and sorting.

ut-parallel_sort bad_case fails

Currently, ut-parallel_sort bad_case runs with a std::sort threshold of 42 elements (that is, qsort worker jobs will be spawned until the number of elements is less than the threshold, at which point std::sort is used instead).

The 'bad_case' test makes sure that the parallel_sort is at most three times slower than std::sort at this threshold.

However, the test currently fails and as far as I can tell it has always failed.

Our default std::sort threshold is 2^23 divided by item size, i.e. 8 MB data.
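
To make the role of the threshold concrete, here is a sequential sketch of the scheme: partition and recurse while the range has at least threshold elements, then fall back to std::sort. It is not TPIE's parallel_sort. It spawns no worker jobs, and it uses a three-way partition (an assumption of this sketch) so that runs of equal elements are removed from the remaining work in one pass.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Default threshold as described above: 2^23 bytes divided by the item size.
template <typename T>
std::size_t default_sort_threshold() {
    return (std::size_t(1) << 23) / sizeof(T);
}

// Sequential sketch: quicksort-style partitioning down to `threshold`
// elements, std::sort below that.
template <typename It>
void threshold_quicksort(It first, It last, std::size_t threshold) {
    typedef typename std::iterator_traits<It>::value_type value_type;
    while (last - first > 1 &&
           static_cast<std::size_t>(last - first) >= threshold) {
        const value_type pivot = *(first + (last - first) / 2);
        // Three-way partition: [< pivot | == pivot | > pivot].
        It lower = std::partition(first, last,
            [&pivot](const value_type& x) { return x < pivot; });
        It upper = std::partition(lower, last,
            [&pivot](const value_type& x) { return !(pivot < x); });
        threshold_quicksort(first, lower, threshold);  // left part
        first = upper;                                 // continue on the right
    }
    std::sort(first, last);
}
```

Because the block of elements equal to the pivot is never revisited, an input where almost all elements are equal is finished after very few partitioning passes in this sketch. The recursion depth is not bounded for unlucky pivots, which a production version would have to address.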

It seems the least threshold that will make the bad_case test pass is around 400 elements (which I found out from a bit of editor macro aided binary search).

Does the bad_case test even make sense? Should we up the threshold or simply get rid of the test?
