Git Product home page Git Product logo

pam's Introduction

PAM

PAM (Parallel Augmented Maps) is a parallel C++ library implementing the interface for sequence, ordered sets, ordered maps and augmented maps [2]. It uses the underlying balanced binary tree structure using join-based algorithms [1,2]. PAM is highly-parallelized, safe for concurrency, theoretically work-efficient, and supporting efficient GC. PAM supports four balancing schemes including AVL trees, red-black trees, treaps and weight-balanced trees (default).

PAM also includes another implementation of augmented maps, which is the prefix structure [3]. It is used for parallel sweepline algorithms for computational geometry problems.

This repository also contains examples of using PAM for a variety of applications, including range sum, 1D stabbing queries, 2D range queries, 2D segment queries, 2D rectangle queries, inverted index searching, and implementing an HTAP database benchmark combining TPC-H queries and TPC-C transactions [5]. Detailed descriptions of the algorithms and applications can be found in our previous papers [1,2,3,4] and the tutorial in PPoPP 2019 [6] and SPAA 2020 [7].

PAM uses the ParlayLib library (https://github.com/cmuparlay/parlaylib) as subroutines for parallel operations, including sorting, scan, the scheduler, etc.

PAM works with CMake. To use and test PAM, create a build directory (any name is fine) in the PAM root directory and go to that build directory. If you already have ParlayLib installed, you can configure cmake by running cmake ... If not, you can use -DDOWNLOAD_PARLAY=True to let PAM look for Parlay and intall a local copy in PAM.

To access the PAM libaray, you can use:

git clone [email protected]:cmuparlay/PAM.git

Interface:

To use PAM, you can simply include the header file pam.h.

To define an augmented map using PAM, user need to specify the parameters including type names and (static) functions in the entry structure ``entry''.

  • typename key_t: the key type (K),
  • function comp: K x K -> bool: the comparison function on K (<_K)
  • typename val_t: the value type (V),
  • typename aug_t: the augmented value type (A),
  • function from_entry: K x V -> A: the base function (g)
  • function combine: A x A -> A: the combine function (f)
  • function get_empty: empty -> A: the identity of f (I)

Then an augmented map is defined with C++ template as

augmented_map<entry>.

Here is an example of defining an augmented map "m" that has integer keys and values and is augmented with value sums (similar as the augmented sum example in our paper [1]):

struct entry {
  using key_t = int;
  using val_t = int;
  using aug_t = int;
  static bool comp(key_t a, key_t b) { 
    return a < b;}
  static aug_t get_empty() { return 0;}
  static aug_t from_entry(key_t k, val_t v) { 
    return v;}
  static aug_t combine(aug_t a, aug_t b) { 
    return a+b;}
};
augmented_map<entry> m;

Note that a plain ordered map is defined as an augmented map with no augmentation (i.e., it only has K, <_K and V in its entry) and a plain ordered set is similarly defined as an augmented map with no augmentation and no value type.

pam_map<entry>
pam_set<entry>

To declare a simple key-value map, here's a quick example:

struct entry {
  using key_t = int;
  using val_t = int;
  static bool comp(key_t a, key_t b) { 
    return a < b;}
};
pam_map<entry> m;

Another example can be found in [2], which shows how to implement an interval tree using the PAM interface.

Keys, values and augmented values and be of any arbitrary types, even an another augmented_map (tree) structure.

Hardware dependencies

Any modern (2010+) x86-based multicore machines. Relies on 128-bit CMPXCHG (requires -mcx16 compiler flag) but does not need hardware transactional memory (TSX).

Software dependencies

PAM requires g++ 5.4.0 or later versions. The default scheduler is the Parlay scheduler, but it also supports using Cilk Plus extensions or OpenMP (and of course just sequentially using g++). You can specify one by setting the flags when compiling.

-DPARLAY_CILK
-DPARLAY_OPENMP
-DPARLAY_TBB

We recommend to use "numactl" for better performance when using more than one processor. All tests can also run directly without "numactl".

Applications:

In the sub-directories, we show code for a number of applications. They are all concise based on PAM. All tests can be run in parallel, or sequentially by directly setting the number of threads to be 1.

In each of the directories there is a separated readme file to run the timings for the corresponding application.

General tests and range sum

In directory tests/. More details about the algorithms and results can be found in our paper [2]. The tests include commonly-used functions on sequences, ordered sets, ordered maps and augmented maps.

1D stabbing query

In directory examples/interval/. More details about the algorithm and results can be found in our paper [2]. Our version implements a interval tree structure for stabbing queries.

2D range query

In directory examples/range_query/. More details about the algorithm and results can be found in our paper [2,3]. We implemented both range tree structures and sweeplines algorithm for 2D range queries.

2D segment query

In directory examples/segment/. More details about the algorithm and results can be found in our paper [3]. We implemented both segment tree structures and sweepline algorithms for 2D segment queries.

2D rectangle query

In directory example/rectangle/. More details about the algorithm and results can be found in our paper [2,3]. We implemented both tree structures and sweepline algorithms for 2D rectangle queries.

Inverted index searching

In directory example/index/. More details about the algorithm and results can be found in our paper [2]. Our code builds an inverted index of a dataset of wikipedia documents.

Implementing an HTAP database system: TPC benchmark testing [5]

In directory examples/tpch/. Our code builds indexes for TPC-H workload, which supports all 22 TPC-H analytical queries, and 3 types of TPC-C-style update transactions.

Reference

[1] Guy E. Blelloch, Daniel Ferizovic, and Yihan Sun. Parallel Ordered Sets Using Just Join. SPAA 2016 (also, arXiv:arXiv:1602.02120.

[2] Yihan Sun, Daniel Ferizovic, and Guy E. Blelloch. PAM: Parallel Augmented Maps. PPoPP 2018.

[3] Yihan Sun and Guy Blelloch. Parallel Range, Segment and Rectangle Queries with Augmented Maps. ALENEX 2019 (also, arXiv:1803.08621).

[4] Naama Ben-David, Guy E. Blelloch, Yihan Sun and Yuanhao Wei. Multiversion Concurrency with Bounded Delay and Precise Garbage Collection. SPAA 2019 (also, arXiv:1803.08617).

[5] TPC benchmarks. http://www.tpc.org/

[6] Yihan Sun and Guy Blelloch. 2019. Implementing parallel and concurrent tree structures. Tutorial in PPoPP 2019.

[7] Yihan Sun. 2020. Implementing Parallel Tree Structure in Shared-Memory. Tutorial in SPAA 2020.

pam's People

Contributors

anish-krishnan avatar bouncner avatar gblelloch avatar jmftrindade avatar syhlalala avatar xindubawukong avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

pam's Issues

aug_seq

An augmented sequence might be useful to add:

	template <class entry>
	struct aug_seq_full_entry : entry {
		using val_t = bool; // not used
		using key_t = typename entry::key_t;
		using aug_t = typename entry::aug_t;
		using entry_t = key_t;
		static bool comp(const key_t& a, const key_t& b) { return true; }
		static inline key_t get_key(const entry_t& e) { return e; }
		static inline val_t get_val(const entry_t& e) { return 0; }
		static inline void set_val(entry_t& e, const val_t& v) {}
	};

	template<class _Entry, class Balance=weight_balanced_tree>
	using aug_seq =
		aug_map_< aug_set_full_entry<_Entry>,
			typename Balance::template
			balance<aug_node<typename Balance::data, aug_set_full_entry<_Entry>>>>;

	void test_aug_seq()
	{
		struct piece_entry 
		{
			using key_t = string_view;
			using aug_t = size_t;
			static aug_t get_empty() { return 0; }
			static aug_t from_entry(key_t k) { return k.size(); }
			static aug_t combine(aug_t a, aug_t b) { return a + b; }
		};
		using piece_seq = aug_seq<piece_entry>;

		using strv = piece_seq; //pam_seq<string_view>;

		strv x1;
		strv v1 = strv(string_view("hello, world"));
		strv v2 = strv(string_view("; wassup"));
		strv v3 = strv::join2(v1, v2);

		strv::foreach_seq(v3, [&](auto x) {
			cout << "- " << x << "\n";
			});
		cout << "\n total=" << v3.aug_val() << "\n";
	}

Compiler errors on GCC 10.1

Configuration: x86-64, CentOS 7

-- --------------- General configuration -------------
-- CMake Generator:                Ninja
-- Compiler:                       GNU 10.1.0
-- Build type:                     Release

I'm seeing 3 error(s):

/home/praja002/OpenSource/PAM/examples/tpch/queries/Q10.h:25:43: **error:** ‘group_by_and_combine’ is not a member of ‘parlay::internal’
   25 |   sequence<kf_pair> r = parlay::internal::group_by_and_combine(elts, parlay::addm<float>());
      |                                           ^~~~~~~~~~~~~~~~~~~~
In file included from /home/praja002/OpenSource/PAM/examples/tpch/queries/queries.h:25,
                 from /home/praja002/OpenSource/PAM/examples/tpch/test.cpp:26:
/home/praja002/OpenSource/PAM/examples/tpch/queries/Q15.h: In function ‘Q15_rtype Q15(maps, const char*, const char*)’:
/home/praja002/OpenSource/PAM/examples/tpch/queries/Q15.h:30:21: **error:** no matching function for call to ‘collect_reduce(parlay::sequence<std::pair<unsigned int, float> >&, Q15(maps, const char*, const char*)::<lambda(const sr_type&)>&, Q15(maps, const char*, const char*)::<lambda(const sr_type&)>&, parlay::addm<double>, int&)’
   30 |          max_supp_id);
      |                     ^
In file included from /usr/local/include/parlay/internal/group_by.h:17,
                 from /usr/local/include/parlay/primitives.h:897,
                 from /home/praja002/OpenSource/PAM/examples/tpch/lineitem.h:1,
                 from /home/praja002/OpenSource/PAM/examples/tpch/test.cpp:5:
/usr/local/include/parlay/internal/collect_reduce.h:162:6: note: candidate: ‘template<class Seq, class Helper> auto parlay::internal::collect_reduce(const Seq&, const Helper&, size_t)’
  162 | auto collect_reduce(Seq const &A, Helper const &helper, size_t num_buckets) {
      |      ^~~~~~~~~~~~~~
/usr/local/include/parlay/internal/collect_reduce.h:162:6: note:   template argument deduction/substitution failed:
In file included from /home/praja002/OpenSource/PAM/examples/tpch/queries/queries.h:25,
                 from /home/praja002/OpenSource/PAM/examples/tpch/test.cpp:26:
/home/praja002/OpenSource/PAM/examples/tpch/queries/Q15.h:30:21: note:   candidate expects 3 arguments, 5 provided
   30 |          max_supp_id);
      |                     ^
In file included from /home/praja002/OpenSource/PAM/examples/tpch/queries/queries.h:32,
                 from /home/praja002/OpenSource/PAM/examples/tpch/test.cpp:26:
/home/praja002/OpenSource/PAM/examples/tpch/queries/Q22.h: In function ‘Q22_rtype Q22(maps, bool, uchar*)’:
/home/praja002/OpenSource/PAM/examples/tpch/queries/Q22.h:58:7: **error:** no matching function for call to ‘collect_reduce(parlay::sequence<std::tuple<float, unsigned char, bool> >&, Q22(maps, bool, uchar*)::<lambda(a_type)>&, Q22(maps, bool, uchar*)::<lambda(a_type)>&, parlay::monoid<parlay::pair_monoid<parlay::addm<int>, parlay::addm<double> >::<lambda(P, P)>, std::pair<int, double> >, int)’
   58 |     32);
      |       ^
In file included from /usr/local/include/parlay/internal/group_by.h:17,
                 from /usr/local/include/parlay/primitives.h:897,
                 from /home/praja002/OpenSource/PAM/examples/tpch/lineitem.h:1,
                 from /home/praja002/OpenSource/PAM/examples/tpch/test.cpp:5:
/usr/local/include/parlay/internal/collect_reduce.h:162:6: note: candidate: ‘template<class Seq, class Helper> auto parlay::internal::collect_reduce(const Seq&, const Helper&, size_t)’
  162 | auto collect_reduce(Seq const &A, Helper const &helper, size_t num_buckets) {
      |      ^~~~~~~~~~~~~~
/usr/local/include/parlay/internal/collect_reduce.h:162:6: note:   template argument deduction/substitution failed:
In file included from /home/praja002/OpenSource/PAM/examples/tpch/queries/queries.h:32,
                 from /home/praja002/OpenSource/PAM/examples/tpch/test.cpp:26:
/home/praja002/OpenSource/PAM/examples/tpch/queries/Q22.h:58:7: note:   candidate expects 3 arguments, 5 provided
   58 |     32);

About YCSB benchmark

Thanks for your great work, we all like it!

I don't find codes related to YCSB. Is it in this repository?

If not, can you please provide the code ?

win32 compatibility

Are you interested in pull requests? I've a few patches that let me compile with visual studio, but I wasn't sure if you would be interested in supporting this as a library or if it sits as is to represent your dissertation work. Fabulous dissertation by the way! All the best with your new job.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.