hipipe's Introduction

HiPipe

CircleCI MIT license Development Status Master Developer

HiPipe is a C++ library for efficient data processing. Its main purpose is to simplify and accelerate data preparation for deep learning models, but it is generic enough to be used in many other areas.

HiPipe lets the programmer build intuitive data streams that transform, combine, and filter the data passing through them. The streams are compiled, batched, and asynchronous, which maximizes utilization of the available hardware.

Example

std::vector<std::string> logins = {"marry", "ted", "anna", "josh"};
std::vector<int>           ages = {     24,    41,     16,     59};

auto stream = ranges::views::zip(logins, ages)

  // create a batched stream out of the raw data
  | hipipe::create<login, age>(2)

  // make everyone older by one year
  | hipipe::transform(from<age>, to<age>, [](int a) { return a + 1; })

  // increase each letter in the logins by one (i.e., a->b, e->f ...)
  | hipipe::transform(from<login>, to<login>, [](char c) { return c + 1; }, dim<2>)

  // increase the ages by the length of the login
  | hipipe::transform(from<login, age>, to<age>, [](std::string l, int a) {
        return a + l.length();
    })

  // probabilistically rename 50% of the people to "buzz"
  | hipipe::transform(from<login>, to<login>, 0.5, [](std::string) -> std::string {
        return "buzz";
    })

  // drop the login column from the stream
  | hipipe::drop<login>

  // introduce the login column back to the stream
  | hipipe::transform(from<age>, to<login>, [](int a) {
        return "person_" + std::to_string(a) + "_years_old";
    })

  // filter only people older than 30 years
  | hipipe::filter(from<login, age>, by<age>, [](int a) { return a > 30; })

  // asynchronously buffer the stream during iteration
  | hipipe::buffer(2);

// extract the ages from the stream to std::vector
ages = hipipe::unpack(stream, from<age>);
assert((ages == std::vector<int>{45, 64}));
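
The expected result `{45, 64}` can be checked by hand. The sketch below replays only the steps that touch the ages, using plain loops instead of HiPipe. It assumes that shifting the letters does not change the login length and that the random renaming, which runs after the ages are computed, has no effect on them. The helper name `expected_ages` is hypothetical and not part of the library.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Replays the age-related steps of the example stream without HiPipe.
// expected_ages is a hypothetical helper used only for illustration.
std::vector<int> expected_ages(const std::vector<std::string>& logins,
                               const std::vector<int>& ages) {
    assert(logins.size() == ages.size());
    std::vector<int> result;
    for (std::size_t i = 0; i < logins.size(); ++i) {
        int age = ages[i] + 1;                        // make everyone older by one year
        age += static_cast<int>(logins[i].length());  // add the length of the login
        if (age > 30) {                               // keep only people older than 30
            result.push_back(age);
        }
    }
    return result;
}
```

For the input above, the intermediate ages are 30, 45, 21, and 64, so only the second and the fourth person pass the filter.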

hipipe's People

Contributors

floopcz, blazekadam, petrbel, bedapisl

hipipe's Issues

Compile error when using dataframe::rows()

I am not able to compile this example:

#include <hipipe/core.hpp>

int main() {
    auto df = hipipe::read_csv("aaa.csv");
    df.rows({"aaa", "bbb"});
}

It fails with a range-v3 error:

g++ main.cpp -I /usr/local/include -I /usr/include/python3.7m -I /lib/python3.7/site-packages/numpy/core/include -I /usr/include/opencv4 -std=c++17
In file included from /usr/include/range/v3/view/transform.hpp:20,
                 from /usr/local/include/hipipe/core/index_mapper.hpp:15,
                 from /usr/local/include/hipipe/core/dataframe.hpp:14,
                 from /usr/local/include/hipipe/core/csv.hpp:14,
                 from /usr/local/include/hipipe/core.hpp:15,
                 from main.cpp:1:
/usr/include/meta/meta.hpp: In substitution of ‘template<class T> using _t = typename T::type [with T = ranges::v3::common_type<>]’:
/usr/include/range/v3/range_fwd.hpp:74:59:   required by substitution of ‘template<class ... Ts> using common_type_t = meta::v1::_t<ranges::v3::common_type<Ts ...> > [with Ts = {}]’
/usr/include/range/v3/view/zip_with.hpp:153:85:   required from ‘struct ranges::v3::iter_zip_with_view<ranges::v3::detail::indirect_zip_fn_>’
/usr/include/range/v3/view/zip.hpp:107:16:   required from ‘struct ranges::v3::zip_view<>’
/usr/include/c++/8.2.1/type_traits:2657:42:   required from ‘constexpr bool std::__call_is_nt(std::__invoke_other) [with _Fn = const ranges::v3::view::zip_fn&; _Args = {}]’
/usr/include/c++/8.2.1/type_traits:2663:34:   required by substitution of ‘template<bool __v> using __bool_constant = std::integral_constant<bool, __v> [with bool __v = std::__call_is_nt<const ranges::v3::view::zip_fn&>((std::__result_of_success<ranges::v3::zip_view<>, std::__invoke_other>::__invoke_type{}, std::__result_of_success<ranges::v3::zip_view<>, std::__invoke_other>::__invoke_type()))]’
/usr/include/c++/8.2.1/type_traits:2661:12:   [ skipping 3 instantiation contexts, use -ftemplate-backtrace-limit=0 to disable ]
/usr/include/c++/8.2.1/bits/invoke.h:89:5:   required from ‘constexpr typename std::__invoke_result<_Functor, _ArgTypes>::type std::__invoke(_Callable&&, _Args&& ...) [with _Callable = const ranges::v3::view::zip_fn&; _Args = {}; typename std::__invoke_result<_Functor, _ArgTypes>::type = ranges::v3::zip_view<>]’
/usr/include/c++/8.2.1/tuple:1678:27:   required from ‘constexpr decltype(auto) std::__apply_impl(_Fn&&, _Tuple&&, std::index_sequence<_Idx ...>) [with _Fn = const ranges::v3::view::zip_fn&; _Tuple = std::tuple<>; long unsigned int ..._Idx = {}; std::index_sequence<_Idx ...> = std::integer_sequence<long unsigned int>]’
/usr/include/c++/8.2.1/tuple:1687:31:   required from ‘constexpr decltype(auto) std::apply(_Fn&&, _Tuple&&) [with _Fn = const ranges::v3::view::zip_fn&; _Tuple = std::tuple<>]’
/usr/local/include/hipipe/core/dataframe.hpp:658:26:   required from ‘auto hipipe::dataframe::irows(std::vector<long unsigned int>, std::tuple<std::function<Ts(const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&)>...>) const [with Ts = {}]’
/usr/local/include/hipipe/core/dataframe.hpp:681:74:   required from ‘auto hipipe::dataframe::rows(const std::vector<std::__cxx11::basic_string<char> >&, std::tuple<std::function<Ts(const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&)>...>) const [with Ts = {}]’
main.cpp:6:27:   required from here
/usr/include/meta/meta.hpp:140:36: error: no type named ‘type’ in ‘struct ranges::v3::common_type<>’
         using _t = typename T::type;
...
  • Compilation is done on Kyle.
  • When I change rows to cols in the example, I don't get this error.
  • A similar error occurs when compiling our dataset with dataframe::rows().
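
The last line of the trace reports that `common_type<>` has no nested `type`, and the preceding lines show `rows()` instantiated with an empty `Ts` pack. The standard `std::common_type` behaves the same way for an empty pack, so that failure mode can be reproduced without range-v3. The sketch below assumes that `ranges::v3::common_type` mirrors the standard trait in this respect; the detector name `has_common_type_v` is hypothetical.

```cpp
#include <type_traits>

// Primary template: chosen when std::common_type<Ts...> has no nested ::type.
template <typename Void, typename... Ts>
struct has_common_type_impl : std::false_type {};

// Specialization: chosen only when std::common_type<Ts...>::type is well-formed.
template <typename... Ts>
struct has_common_type_impl<std::void_t<typename std::common_type<Ts...>::type>, Ts...>
    : std::true_type {};

// True when the given types have a common type; false for an empty pack.
template <typename... Ts>
inline constexpr bool has_common_type_v = has_common_type_impl<void, Ts...>::value;
```

A check such as `static_assert(has_common_type_v<Ts...>)` or `static_assert(sizeof...(Ts) > 0)` placed before the zip call would turn the long trace into a single readable message. Whether `rows()` should reject an empty pack or handle it separately is a design decision this sketch does not settle.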

Add to_python converter for type cv::Point_

OpenCV points should be naturally convertible to Python tuples. At the moment, one receives the following error:

TypeError: No to_python (by-value) converter found for C++ type: cv::Point_<float> 

OpenCV converters incompatible with OpenCV 4.0

The compiler produces the following error:

hipipe_core.dir/src/core/python/stream/converter.cpp.o
[ 60%] Building CXX object CMakeFiles/hipipe_core.dir/src/core/python/utility/pyboost_cv_mat_converter.cpp.o
[ 70%] Building CXX object CMakeFiles/hipipe_core.dir/src/core/python/utility/pyboost_cv_point_converter.cpp.o
/build/hipipe/src/core/python/utility/pyboost_cv_mat_converter.cpp: In member function 'cv::UMatData* hipipe::python::utility::NumpyAllocator::allocate(int, const int*, int, void*, size_t*, int, cv::UMatUsageFlags) const':                                                
/build/hipipe/src/core/python/utility/pyboost_cv_mat_converter.cpp:92:75: error: invalid conversion from 'int' to 'cv::AccessFlag' [-fpermissive]                                                                                                                             
             return stdAllocator->allocate(dims0, sizes, type, data, step, flags, usageFlags);
                                                                           ^~~~~
In file included from /usr/include/opencv4/opencv2/core.hpp:59,
                 from /usr/include/opencv4/opencv2/core/core.hpp:48,
                 from /build/hipipe/include/hipipe/core/python/utility/pyboost_cv_mat_converter.hpp:19,
                 from /build/hipipe/src/core/python/utility/pyboost_cv_mat_converter.cpp:13:
/usr/include/opencv4/opencv2/core/mat.hpp:473:69: note:   initializing argument 6 of 'virtual cv::UMatData* cv::MatAllocator::allocate(int, const int*, int, void*, size_t*, cv::AccessFlag, cv::UMatUsageFlags) const'                                                       
                                void* data, size_t* step, AccessFlag flags, UMatUsageFlags usageFlags) const = 0;
                                                          ~~~~~~~~~~~^~~~~
/build/hipipe/src/core/python/utility/pyboost_cv_mat_converter.cpp: In member function 'bool hipipe::python::utility::NumpyAllocator::allocate(cv::UMatData*, int, cv::UMatUsageFlags) const':                                                                                
/build/hipipe/src/core/python/utility/pyboost_cv_mat_converter.cpp:122:42: error: invalid conversion from 'int' to 'cv::AccessFlag' [-fpermissive]                                                                                                                            
         return stdAllocator->allocate(u, accessFlags, usageFlags);
                                          ^~~~~~~~~~~
In file included from /usr/include/opencv4/opencv2/core.hpp:59,
                 from /usr/include/opencv4/opencv2/core/core.hpp:48,
                 from /build/hipipe/include/hipipe/core/python/utility/pyboost_cv_mat_converter.hpp:19,
                 from /build/hipipe/src/core/python/utility/pyboost_cv_mat_converter.cpp:13:
/usr/include/opencv4/opencv2/core/mat.hpp:474:54: note:   initializing argument 2 of 'virtual bool cv::MatAllocator::allocate(cv::UMatData*, cv::AccessFlag, cv::UMatUsageFlags) const'                                                                                       
     virtual bool allocate(UMatData* data, AccessFlag accessflags, UMatUsageFlags usageFlags) const = 0;
                                           ~~~~~~~~~~~^~~~~~~~~~~
compilation terminated due to -fmax-errors=2.
make[2]: *** [CMakeFiles/hipipe_core.dir/build.make:128: CMakeFiles/hipipe_core.dir/src/core/python/utility/pyboost_cv_mat_converter.cpp.o] Error 1                                                                                                                           
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [CMakeFiles/Makefile2:105: CMakeFiles/hipipe_core.dir/all] Error 2
make: *** [Makefile:130: all] Error 2

Thank you @petrbel for reporting this.

Add `keep` dropper

I find it very annoying to drop every column I temporarily create. It would be nice to be able to specify the columns to keep instead of dropping the useless ones.

For example, with columns a, b, c, d, e, f, g, I want to keep only a and b:

// current
stream | drop<c,d,e,f,g>

// proposed
stream | keep<a,b>

This would be useful at the end of a stream, or when reusing stream parts where one does not know which columns the source provides.
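
One way to think about `keep` is as a `drop` of the complement: given the list of columns in the stream, compute every column that is not in the keep list. The sketch below does this at compile time on a plain type list. All names in it (`column_list`, `concat`, `columns_to_drop`, and the empty column tags) are hypothetical and independent of the library; it assumes the stream's column set is available as a type list.

```cpp
#include <type_traits>

// Hypothetical column tags matching the example above.
struct a {}; struct b {}; struct c {}; struct d {};
struct e {}; struct f {}; struct g {};

// Compile-time list of columns.
template <typename... Cols>
struct column_list {};

// True when T is one of Ts.
template <typename T, typename... Ts>
inline constexpr bool is_one_of_v = (std::is_same_v<T, Ts> || ...);

// Concatenates any number of column lists into one.
template <typename... Lists>
struct concat;

template <>
struct concat<> { using type = column_list<>; };

template <typename... As>
struct concat<column_list<As...>> { using type = column_list<As...>; };

template <typename... As, typename... Bs, typename... Rest>
struct concat<column_list<As...>, column_list<Bs...>, Rest...>
    : concat<column_list<As..., Bs...>, Rest...> {};

// Lists the stream columns that are not in Keep, in their original order.
template <typename StreamCols, typename... Keep>
struct columns_to_drop;

template <typename... Cols, typename... Keep>
struct columns_to_drop<column_list<Cols...>, Keep...> {
    using type = typename concat<
        std::conditional_t<is_one_of_v<Cols, Keep...>,
                           column_list<>,          // kept column contributes nothing
                           column_list<Cols>>...   // other columns are dropped
        >::type;
};
```

With this, `columns_to_drop<column_list<a, b, c, d, e, f, g>, a, b>::type` is `column_list<c, d, e, f, g>`, which corresponds to the `drop<c,d,e,f,g>` call in the current form.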

Graceful compilation errors

I know that we all write flawless code; however, I have heard that there exist people who make an occasional mistake. :-) Unfortunately, in such a case even recent C++ compilers basically explode with a multi-page error message that takes some training and patience to decode.

If we are serious about efficient C++ streams, cxtream should fail more gracefully so that we don't lose time parsing error messages. This is a list of common scenarios where a static_assert would significantly improve the coding experience. Please comment on this, @blazekadam, @petrbel and @bedapisl, and add further pitfalls you have encountered.

  • transform, for_each, filter, drop, random_fill, generate, pad, unpack: The selected columns are not present in the given stream.
  • transform, for_each, filter: Function cannot be invoked on the given columns in the given dimension.
  • transform, for_each, filter, random_fill, unpack: Requested dimension is larger than the total number of dimensions.
  • transform: Function return type does not correspond to the given destination columns in the given dimension.
  • random_fill, generate, pad: The generated/provided type cannot be converted to the type of the given column in the given dimension.
  • create: The selected columns cannot be created from the given stream.
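
As a rough illustration of the first, second, and fourth scenarios, the sketch below checks a single transform call with static_assert before anything else is instantiated. It is a simplified model and not the library's implementation: `column_list`, `contains`, `transform_is_valid`, and the `value_type` member of the columns are assumptions made for this example, and dimensions are ignored.

```cpp
#include <string>
#include <type_traits>

// Compile-time list of the columns present in a stream (hypothetical type).
template <typename... Cols>
struct column_list {};

// contains<Col, List>::value is true when Col appears in List.
template <typename Col, typename List>
struct contains;

template <typename Col, typename... Cols>
struct contains<Col, column_list<Cols...>>
    : std::bool_constant<(std::is_same_v<Col, Cols> || ...)> {};

// Hypothetical columns; each names the type of a single value it stores.
struct login { using value_type = std::string; };
struct age   { using value_type = int; };

// Validates one transform call and reports the first problem in plain words.
template <typename StreamCols, typename From, typename To, typename Fun>
constexpr bool transform_is_valid() {
    using arg_t = typename From::value_type;
    static_assert(contains<From, StreamCols>::value,
                  "transform: the source column is not present in the stream");
    static_assert(std::is_invocable_v<Fun, arg_t>,
                  "transform: the function cannot be invoked on the source column");
    // Only inspect the return type when the call itself is well-formed.
    if constexpr (std::is_invocable_v<Fun, arg_t>) {
        static_assert(
            std::is_convertible_v<std::invoke_result_t<Fun, arg_t>,
                                  typename To::value_type>,
            "transform: the return type does not match the destination column");
    }
    return true;
}
```

For a lambda `fun`, the check would read `transform_is_valid<column_list<login, age>, age, age, decltype(fun)>()`. Passing a column that is absent from the list should then stop compilation at the first message rather than deep inside the range machinery.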

Doing this should make us a lot of money without much effort.
