
h-store's People

Contributors

allexk, ambell, apavlo, atreyee, ccai-course, clee749, eribeiro, gegao, genp, huanchenz, ipl, jdebrabant, jmeehan16, kilida, linmagit, maxkirsch, mjgiardino, wenfengxu


h-store's Issues

Two-Phase Commit Optimizations

This project focuses on optimizing various corner cases in the two-phase commit protocol.

  1. Add the ability for the HStoreCoordinator to avoid waiting for a TransactionPrepareResponse from a remote partition at which the transaction only executed read-only queries. We will need to check whether the LocalTransaction accurately keeps track of whether it has sent only read-only WorkFragment messages to remote partitions. This will also require modifying the TransactionPrepareCallback code so that it can send out the TransactionFinishRequest before all of the responses arrive. Note that we still must not delete the transaction until all of the TransactionPrepareResponse messages have arrived. I think that this optimization only works in the H-Store model where the execution engines are single-threaded. (A minimal sketch of this callback logic appears after this list.)
  2. Related to the first task, we will want to add the ability for a remote partition to automatically commit a distributed transaction when it receives a TransactionPrepareRequest from the base partition, provided that the transaction only executed read-only queries at that partition. The remote partition does not need to wait for a TransactionFinishRequest before committing. We should still send the TransactionFinishRequest, however, because we need to know when we can safely delete the transaction.
  3. Implement the "transfer of coordination" optmization from Berstein's book (Section 8.4). If a transaction modified data at only one partition and that partition is not the base partition, then the TransactionPrepareRequest should indicate that to the remote partition that it's now the "coordinator" for the transaction and it sole decider on whether that txn commits or aborts. It does not need to wait for a TransactionFinishRequest. If optimization #2 is implemented as well, then there is only a single round messages to complete the transaction. This will work for TM1's UpdateLocation transaction.

Students will use H-Store's built-in benchmarks and profiling tools to measure the system before and after the optimizations are implemented. The focus in the benchmarking experiments should just be on distributed transactions.

Distributed Query Optimizations

One of the most important parts of a database is its query planner and optimizer. This is especially true in a distributed database. There are several well-known optimizations that commercial database vendors use to improve the performance of distributed queries. In this project, students will explore various techniques for improving system performance using query plan optimizations.

  1. Improve predicate/aggregate push-down optimizations. The PlanOptimizer should determine whether a projection in a query plan can be pushed down to all nodes. For example, consider the following distributed query that retrieves the maximum salary per department from the employees table, which is split across multiple partitions:
    SELECT MAX(salary), department FROM employees GROUP BY department;
    Assuming that there are more columns in employees than just the two used in the query, the database will want to execute the projection in parallel at each partition so that only the two columns that are needed are sent back to the initiating node to coalesce the results. Likewise, the MAX() operation can also be executed in parallel so that each partition just sends the minimum amount of data.
  2. Identify parts of a query plan that are better executed in Java. That is, instead of executing a PlanFragment in the ExecutionEngine, the PartitionExecutor will process it directly from within Java. The goal is to reduce the cost of executing a distributed query by shortcutting certain operations. For example, if a transaction executes a distributed UPDATE query, then the output of the top-most PlanFragment (which is executed at the transaction's base partition) is just a single value: the sum of the single-value input tables generated by the PlanFragments for the distributed UPDATE operation. This value represents the number of tuples that were modified at each partition. Thus, the PartitionExecutor can perform this summation itself instead of going down to the ExecutionEngine. Similarly, the topmost PlanFragment for broadcast SELECT statements will often just combine the output tables produced at each partition. (A sketch of this fast-combine path appears after this list.)
    • Update the PlanOptimizer to identify which PlanFragments can be executed in Java (this code may need to be somewhere else because the PlanOptimizer won't have the PlanFragments yet) and update the appropriate flags (e.g., "fastaggregate" and "fastcombine").
    • Create new classes called CombineExecutor and AggregateExecutor in edu.brown.hstore.executors that implement the desired functionality.
    • Add a new boolean configuration option in HStoreConf called "exec_fast_executors" that the PartitionExecutor will check at run time to determine whether to use this new feature.
      ** Update PartitionExecutor's constructor to instantiate the new executor classes.
      ** In PartitionExecutor.dispatchWorkFragments(), add code to check whether "exec_fast_executors" is true and whether a PlanFragment (stored in the WorkFragment) that needs to be executed on the local partition has one of the PlanFragment flags set to true.
  3. Improve support for n-way joins. When a query needs to join multiple tables together, it is important that the order in which those tables are joined minimizes both the number of tuples that must be examined and the amount of data that is sent between nodes. This is a difficult problem and database optimizers are often wrong. For example, H-Store generates a query plan for TPC-E's BrokerVolume that does not select the proper join order, which causes the DBMS to send entire tables to a single node. This is because the query planner does not have access to statistical information about the tables that it needs in order to make the best selection.
    Instead of improving the statistical analysis of H-Store's query optimizer, students will implement an alternative approach used by MongoDB where the system generates all possible plans for the query and tries each of them out at run time to determine which one is the best. To do this, the catalog will need to be extended to support multiple query plans per Statement. At runtime, the system will select a random query plan for the query and keep track of how long it took to execute. Once enough samples are collected, the system will then choose the query plan with the lowest run time.
    The information gained about query plans can be written into the catalog project jar so that future invocations of the DBMS can use the best query plan without having to run trials first.
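
As a rough illustration of the second task, the sketch below shows how the base partition could sum the per-partition modified-tuple counts from a distributed UPDATE in Java instead of shipping them back down into the C++ ExecutionEngine. The class name is hypothetical; the VoltTable calls follow the VoltDB API that H-Store inherits.

    import org.voltdb.VoltTable;
    import org.voltdb.VoltType;

    // Hypothetical "fastaggregate" helper for the base partition.
    public class AggregateExecutorSketch {
        /** Sums the one-row/one-column modified-tuple counts returned by each partition. */
        public static VoltTable combineModifiedCounts(VoltTable[] partitionResults) {
            long total = 0;
            for (VoltTable t : partitionResults) {
                t.resetRowPosition();
                if (t.advanceRow()) {
                    total += t.getLong(0);
                }
            }
            VoltTable out = new VoltTable(new VoltTable.ColumnInfo("modified_tuples", VoltType.BIGINT));
            out.addRow(total);
            return out;
        }
    }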

Students will use H-Store's built-in benchmarks and profiling tools to measure the system before and after the query planner optimizations are implemented. Some test cases will be provided to validate that the query plans are correct, but students are strongly encouraged to write their own.

Pre-fetched Queries for Distributed Transactions

  1. Update the hstore.proto file to allow us to embed PartitionFragments and serialized ParameterSets.
  2. You will then need to be able to deserialize the ParameterSets (ProtoBuf will deserialize the PartitionFragments for you) and then queue them up at the appropriate partition's ExecutionSite. All of this stuff is already done.

Only start MapReduceHelperThread when needed

Right now the MapReduceHelperThread is always started in HStoreSite.init() if the catalog has a VoltMapReduceProcedure defined in it. Most of our experiments don't need this to happen. The thread should only be started when the first MR request arrives. You can add a new boolean flag mr_helper_started to HStoreSite and then check whether the thread has been started or not inside of HStoreSite.procedureInvocation() when you initialize the MapReduceTransaction. You should also check whether the thread has been started in HStoreSite.shutdown().
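
A minimal sketch of this lazy-start pattern is shown below (hypothetical names; the real change would live in HStoreSite and use the mr_helper_started flag described above):

    // Start the helper thread on the first MapReduce invocation instead of in init(),
    // and remember whether it was ever started so shutdown() only joins a live thread.
    public class LazyHelperThread {
        private volatile Thread helper;   // non-null means the helper has been started

        public synchronized void startIfNeeded(Runnable helperLoop) {
            if (helper != null) return;   // already running
            helper = new Thread(helperLoop, "mapreduce-helper");
            helper.setDaemon(true);
            helper.start();
        }

        public synchronized void shutdown() throws InterruptedException {
            if (helper != null) {         // only stop the thread if it was ever started
                helper.interrupt();
                helper.join();
            }
        }
    }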

This will help reduce the overhead in running the regression suite tests.

Move Benchmark Code into 'src/benchmarks'

It's kind of weird that we have the benchmark code embedded inside of the tests. This was done because that's where Hugg had originally put the TPC-C benchmark code. After the end of the semester, we should move the non-utility code and non-testing code from tests/frontend/org/voltdb/benchmarks and tests/frontend/edu/brown/benchmarks into src/benchmarks. We can then move the Test* files into tests/benchmarks and the utility code (e.g., BenchmarkController) into src/frontend.

Include latency measurements in BenchmarkResults

We want to start measuring latency in our experiments, so we should start keeping track of it in the BenchmarkController. I think that we can get the latency measurements from the ClientResponse.

  1. Add a new field called restartCounter to ClientResponseImpl and add get/set methods to the ClientResponse interface. Be sure to clear the value in the cached ClientResponse in LocalTransaction.finish()
  2. In HStoreSite.sendClientResponse() get the restart counter from LocalTransaction.getRestartCounter() and add the value to the ClientResponse before it gets sent over the wire.
  3. Merge back newer version of Distributer.ProcedureStats and ClientStatsLoader from the latest version of VoltDB. Change the ProcedureStats.update() and NodeConnection.updateStats() methods to pass in the Hstore.Status and the restartCounter from the ClientResponse.
  4. Fix the initialization of the StatsUploaderSettings object in the BenchmarkComponent constructor. We will need to think about whether we want to always require there to be a MySQL installation for us to use.

Replicated Nodes using Zookeeper

The goal of this project is to add replicated node support in H-Store using Zookeeper. This will enable H-Store to store copies of partitions on multiple nodes. Students will familiarize themselves with the various issues involved in ensuring transactional consistency in a parallel, shared-nothing database.

  1. The first step will be to integrate the embedded version of Zookeeper into HStoreSite. Students will implement a basic fail-over detection algorithm and then initiate new master elections. Support for adding new nodes into the replica pool will be added, but recovery from snapshots will be implemented by another group. The catalogs will need to be extended to include replica information, and any changes to the catalogs (due to elections) must be propagated to all nodes through the HStoreCoordinator. (A basic election sketch using ephemeral znodes appears after this list.)
  2. Once the Zookeeper infrastructure is in place, the students will then choose a replication scheme to implement (e.g., active replication). See Comparison of Database Replication Techniques Based on Total Order Broadcast.
  3. (Optional) Add support for "delayed" node replication. One of the replicas for a given node will purposely trail behind the master node by a certain amount of time. This will allow the administrator to recover from that node in case the database is corrupted by a programmer error.
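
A minimal sketch of the fail-over detection and election piece is shown below. The znode layout and class are hypothetical (this is not H-Store code); the calls are the standard ZooKeeper client API. Each replica of a partition registers an ephemeral sequential znode, the replica holding the lowest sequence number acts as master, and a session loss by the master automatically triggers a re-check on the surviving replicas.

    import java.util.Collections;
    import java.util.List;
    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class ReplicaElection implements Watcher {
        private final ZooKeeper zk;
        private final String electionRoot;   // e.g. "/hstore/partition-0/replicas"
        private volatile String myZnode;

        public ReplicaElection(String connectString, String electionRoot) throws Exception {
            this.zk = new ZooKeeper(connectString, 30000, this);
            this.electionRoot = electionRoot;
        }

        /** Registers this replica and returns true if it is currently the master. */
        public boolean volunteer(byte[] siteInfo) throws KeeperException, InterruptedException {
            myZnode = zk.create(electionRoot + "/replica-", siteInfo,
                                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
            return checkMaster();
        }

        private boolean checkMaster() throws KeeperException, InterruptedException {
            List<String> children = zk.getChildren(electionRoot, true);   // re-arm the watch
            if (myZnode == null || children.isEmpty()) return false;
            Collections.sort(children);
            return (electionRoot + "/" + children.get(0)).equals(myZnode); // lowest sequence wins
        }

        @Override
        public void process(WatchedEvent event) {
            try {
                if (checkMaster()) {
                    // Promote this replica to master and propagate the catalog change
                    // to the other nodes through the HStoreCoordinator.
                }
            } catch (Exception e) {
                // Handle session loss / retry; omitted in this sketch.
            }
        }
    }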

Java-based Query Cache

Query caching is a widely-used technique for decreasing the execution time of transactions. It is especially useful in disk-based systems, where the access time to retrieve data is large. But it may not be a useful technique in a main memory database system, since everything is already in memory. This project will explore techniques for using a query cache to improve query performance without paying a large overhead for managing the cache.

  1. Analyze the catalog to pre-compute which queries can be cached and which queries invalidate cached items. Each Statement catalog object will contain a boolean flag as to whether it is allowed to be cached. Start off with simple queries (i.e., single-key look-ups) that return columns that are not modified by any other query. Then add support for identifying which Statements invalidate the cached results for other Statements.
  2. Implement a query cache for the PartitionExecutor. The query cache is essentially a mapping of a Statement invocation key (i.e., the name/id of the Statement and the input parameters (if any) it was executed with) to the VoltTable instance that was generated when that Statement was previously executed. This cache should only contain results for locally-executed single-partition queries. Each cache instance will be unique per partition, thus no synchronization or locking protections are needed. At run time, the PartitionExecutor will need to quickly check to see whether a particular Statement invocation is cached, and then use those results instead of dispatching the WorkFragments to its execution engine. (A simplified cache sketch appears after this list.)
  3. Compare performance differences for different eviction strategies for the cache. Examples include least-recently used, most-recently used, and least-frequently used. Allow the system to learn which cached Statements are never used before they are evicted and update the catalog to prevent them from being cached in the future.
  4. Compare performance differences using different query cache sizes.
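
A simplified per-partition cache sketch with LRU eviction is shown below. The types are hypothetical (the result type parameter stands in for VoltTable), and invalidation is reduced to clearing the whole cache; a real implementation would use the Statement-level invalidation sets computed in step 1.

    import java.util.Arrays;
    import java.util.LinkedHashMap;
    import java.util.Map;

    // One instance per PartitionExecutor, so no locking is needed.
    public class PartitionQueryCache<R> {
        private final Map<CacheKey, R> cache;

        public PartitionQueryCache(final int maxEntries) {
            this.cache = new LinkedHashMap<CacheKey, R>(maxEntries, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<CacheKey, R> eldest) {
                    return size() > maxEntries;   // least-recently-used eviction
                }
            };
        }

        public R lookup(int statementId, Object[] params) {
            return cache.get(new CacheKey(statementId, params));
        }

        public void store(int statementId, Object[] params, R result) {
            cache.put(new CacheKey(statementId, params), result);
        }

        /** Called when a Statement that writes the cached columns is executed. */
        public void invalidateAll() {
            cache.clear();
        }

        private static final class CacheKey {
            private final int statementId;
            private final Object[] params;
            CacheKey(int statementId, Object[] params) {
                this.statementId = statementId;
                this.params = params;
            }
            @Override public boolean equals(Object o) {
                if (!(o instanceof CacheKey)) return false;
                CacheKey k = (CacheKey) o;
                return statementId == k.statementId && Arrays.deepEquals(params, k.params);
            }
            @Override public int hashCode() {
                return 31 * statementId + Arrays.deepHashCode(params);
            }
        }
    }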

Students will use H-Store's built-in benchmarks to determine whether their query cache implementation improves system performance.

Error on the Documentation Site for H-store

Could someone please take a look at the documentation site for H-Store?
It consistently throws the error: "Error establishing a database connection" on most of the page links.
Please point me to the correct forum if this is not the right place for such issues.

Implement Distributed Transaction Profiling

We need to measure the time a transaction spends in the following parts of the code when executing (a minimal stopwatch sketch appears after the two lists below):

  1. Blocked waiting to initialize for execution.
  2. Blocked waiting for TransactionWorkRequests to return from remote machines
  3. Blocked waiting for the TransactionPrepareResponse to come back from each partition.

Likewise, we also need to measure the time that a partition spends:

  1. Blocked waiting for the next TransactionWorkRequest to arrive
  2. Blocked waiting for the TransactionPrepareRequest to arrive.
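
A minimal sketch of the kind of per-phase stopwatch we need is shown below (hypothetical; this is not H-Store's existing profiling code). The phases map one-to-one onto the five measurements listed above, and each blocking call is simply bracketed with start()/stop().

    import java.util.EnumMap;
    import java.util.Map;
    import java.util.concurrent.TimeUnit;

    public class TxnProfiler {
        public enum Phase {
            INIT_BLOCKED,              // txn: waiting to initialize for execution
            WORK_RESPONSE_BLOCKED,     // txn: waiting for TransactionWorkRequests to return
            PREPARE_RESPONSE_BLOCKED,  // txn: waiting for each TransactionPrepareResponse
            NEXT_WORK_BLOCKED,         // partition: waiting for the next TransactionWorkRequest
            PREPARE_REQUEST_BLOCKED    // partition: waiting for the TransactionPrepareRequest
        }

        private final Map<Phase, Long> totalNanos = new EnumMap<>(Phase.class);
        private final Map<Phase, Long> startedAt  = new EnumMap<>(Phase.class);

        public void start(Phase p) { startedAt.put(p, System.nanoTime()); }

        public void stop(Phase p) {
            Long began = startedAt.remove(p);
            if (began != null) {
                totalNanos.merge(p, System.nanoTime() - began, Long::sum);
            }
        }

        public long millis(Phase p) {
            return TimeUnit.NANOSECONDS.toMillis(totalNanos.getOrDefault(p, 0L));
        }
    }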

Gossip-style Notifications

One way that we try to prevent unnecessary network messages between HStoreSites is by having the TransactionQueueManager keep track of the greatest transaction id that it's seen from the remote partitions. If our clocks are skewed, then this will prevent the site from continually sending TransactionInitRequest messages only to have the transaction get rejected.

The student will add a simple "gossip" notification scheme where one HStoreSite can tell other HStoreSites that it has rejected a distributed transaction. The way this will work is that the TransactionInitRequest gets sent to all of the remote partitions that are needed by a new distributed transaction. If the TransactionQueueManager at one of those remote partitions decides to reject the transaction, then it will send back the ABORT_REJECT status in its TransactionInitResponse, but it will also send an update notification to all of the other partitions that the transaction wanted to use. The TransactionQueueManager can get the list of the partitions that it needs to notify from the original TransactionInitRequest. The idea is that this optimization will allow those TransactionQueueManagers to release the next queued distributed transaction in their internal lists before hearing back from the base partition. We should think about maybe embedding this notification inside of an existing message, such as a heartbeat.

Students will use H-Store's built-in benchmarks and profiling tools to measure the system's performance before and after the gossip optimizations are implemented. The focus should be on whether this optimization reduces the amount of time that each PartitionExecutor is idle waiting for work from a distributed transaction.

Reorganize Frontend Source Code Directories

It might be too confusing for CS227 students to have the core H-Store files split between org.voltdb, edu.mit, and edu.brown. We need to move everything out of edu.mit (sorry Evan) into edu.brown. We can also rename the ExecutionSite to the PartitionExecutor so that it's more clear what it's supposed to do.

We can also try to remove as much of the VoltDB DTXN architecture as possible.

Move Project Jar files into Separate Directory

The root directory of the repository gets cluttered with all of the jar files that get created by the hstore-prepare command and in the test cases. We should move them into a single directory obj/projects.

I've added a new HStoreConf parameter client.jar_dir, but the default value is currently "." and it is only being used by BenchmarkController. We will need to update build.xml to use this option. We may also want to change it to be global.jar_dir.

Ad Hoc Queries

For performance reasons, all stored procedures and queries must be pre-defined in H-Store. The only way to execute a new query is to stop the database and recompile the project catalog. It is useful to allow the user to execute a "one-off" query to perform a simple operation without having to do this recompilation. An "ad hoc" query is executed as a distributed transaction that compiles the query at run time and then executes it as if it was pre-defined in the catalog.

  1. Add VoltDB's AdHoc system procedure. Be sure to add this class to the list of sysprocs that are included in the project jar.
  2. Implement run-time support for generating the query plan and executing it as if it were a regular transaction.

Students can test this feature using the hstore-invoke command from the command-line with H-Store's built-in benchmarks.
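
A hedged sketch of what client-side usage could look like once the sysproc is in place is shown below. The client calls follow the VoltDB-derived API that H-Store's Java client already exposes; the exact behavior of @AdHoc in H-Store is an assumption until the feature is implemented.

    import org.voltdb.VoltTable;
    import org.voltdb.client.Client;
    import org.voltdb.client.ClientFactory;
    import org.voltdb.client.ClientResponse;

    public class AdHocExample {
        public static void main(String[] args) throws Exception {
            Client client = ClientFactory.createClient();
            client.createConnection("localhost");

            // The query is planned at run time and executed as a distributed transaction.
            ClientResponse response =
                client.callProcedure("@AdHoc", "SELECT COUNT(*) FROM WAREHOUSE;");
            VoltTable result = response.getResults()[0];
            System.out.println("warehouses = " + result.asScalarLong());

            client.close();
        }
    }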

Eventually Consistent Queries

Although H-Store guarantees that all transactions are ACID, it may be the case that not all components of a transaction need to be executed immediately.

  1. Create a new annotation that can be attached to SQLStmts to denote that they can be executed asynchronously at runtime. (A sketch of such an annotation appears after this list.)
  2. Update the VoltCompiler to check whether each SQLStmt has this flag set, and if so then set the "asynchronous" flag in the Statement catalog object.
  3. At runtime, VoltProcedure will need to check whether a queued SQLStmt is asynchronous, and therefore can be added to the PartitionExecutor's "idle work" queue.
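
A minimal sketch of the annotation from step 1 is shown below. The name is hypothetical; the VoltCompiler would look for it on SQLStmt fields and set the "asynchronous" flag on the corresponding Statement catalog object.

    import java.lang.annotation.ElementType;
    import java.lang.annotation.Retention;
    import java.lang.annotation.RetentionPolicy;
    import java.lang.annotation.Target;

    // Marker annotation for SQLStmt fields whose results do not need to be applied immediately.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    public @interface Asynchronous {
        // no elements needed for the first implementation
    }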

The students will measure the runtime performance of the system for certain workloads, such as TPC-C's NewOrder, with and without the asynchronous feature to see the performance difference.

Enhanced SQL Features in Execution Engine

Although the complexity of OLTP queries is limited (compared to OLAP-style applications), there are several basic SQL features that are commonly used in transaction processing applications. For this project, students will add support for new SQL features in the H-Store query planner and run-time execution engine. This project will introduce students to H-Store's iterator-based execution engine.

  1. Implement support for the IN operator. This task will require students to change several components in H-Store. First, the HSQLDB query planner must support the IN operator (if the current version in H-Store doesn't support it, we can contact VoltDB developers to see whether they have ported a newer version that does support it). We will then need to extend the VoltProcedure API to support array input parameters. Then in the execution engine, we will need to add support for array input parameters in the AbstractExpression tree and the scan executors.
    Students will use H-Store's built-in benchmarks and profiling tools to measure the system before and after the following enhancements. In the case of the IN operator task, students can demonstrate that a single query using IN is faster than executing either multiple queries with a single predicate or a single query with many OR clauses.
  2. Implement support for parameterized LIMIT clauses. In the current version of H-Store, the values used for LIMIT clauses are hardcoded. This might not be supported in HSQLDB's compiler, so one approach is to use a simple SQL re-writing scheme where the parameter placeholder is replaced with a hard-coded value that can then be replaced after the query plan is created. This will also require changing the EE's limitexecutor (and any other places where the LimitPlanNode is supported, such as the indexscanexecutor) to support pulling the limit value from the input parameters.
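
A minimal sketch of the re-writing step from task #2 is shown below. The sentinel value and helper class are hypothetical, and the planner/executor changes that substitute the real run-time value are not shown.

    public class LimitRewriter {
        // A value unlikely to appear as a genuine literal in application SQL.
        public static final long LIMIT_SENTINEL = 1_000_000_007L;

        /** Replaces "LIMIT ?" with a literal sentinel so the planner sees a constant. */
        public static String rewrite(String sql) {
            return sql.replaceAll("(?i)LIMIT\\s+\\?", "LIMIT " + LIMIT_SENTINEL);
        }

        public static void main(String[] args) {
            // Prints: SELECT * FROM new_order ORDER BY no_o_id LIMIT 1000000007
            System.out.println(rewrite("SELECT * FROM new_order ORDER BY no_o_id LIMIT ?"));
        }
    }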

Implement New Client Interfaces

One of the key distinctions of a good DBMS is whether it supports a wide variety of client protocols and interfaces. H-Store only supports Java-based clients that issue transaction requests through a custom (but open-source) protocol based on the VoltDB wire protocol.

For this project, students will extend H-Store to support different client interfaces. This will allow applications to issue transaction requests and receive results without using the default Java library. The goal of this project is to familiarize students with asynchronous approaches for accessing databases that are commonly used in large-scale deployments.

  1. Implement a JSON-based interface in HStoreSite. Each node will start an HTTP server that can receive transaction requests. Students may use the VoltDB JSON interface as a reference. (A minimal endpoint sketch appears after this list.)
  2. Implement support for the Memcached text protocol. The Memcache operations will be mapped to special StmtProcedures that automatically invoke queries needed to perform the appropriate action in the database. Students will need to backport support for the VARBINARY data type from VoltDB. Again, students may reference the VoltCache implementation.
  3. Optional: Extend the HTTP server used for the JSON interface to provide a web-based terminal for interacting with H-Store. Such a web-site could be extended to include cluster information, and the ability to control nodes.
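
A minimal sketch of the JSON endpoint from task #1 is shown below. The URL and JSON layout are hypothetical, and it uses the JDK's built-in HTTP server rather than whatever H-Store would ultimately embed; the handler is stubbed where the real code would dispatch a StoredProcedureInvocation.

    import com.sun.net.httpserver.HttpExchange;
    import com.sun.net.httpserver.HttpServer;
    import java.io.OutputStream;
    import java.net.InetSocketAddress;
    import java.nio.charset.StandardCharsets;

    public class JsonEndpointSketch {
        public static void main(String[] args) throws Exception {
            HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
            server.createContext("/api/1.0/invoke", (HttpExchange exchange) -> {
                // The request body would carry {"procedure": "...", "parameters": [...]}.
                byte[] request = exchange.getRequestBody().readAllBytes();
                System.out.println("invoke request: " + new String(request, StandardCharsets.UTF_8));

                // Here the site would queue the invocation and wait for the ClientResponse.
                byte[] body = "{\"status\": \"OK\", \"results\": []}".getBytes(StandardCharsets.UTF_8);
                exchange.getResponseHeaders().set("Content-Type", "application/json");
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream os = exchange.getResponseBody()) {
                    os.write(body);
                }
            });
            server.start();
        }
    }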

Switch Deployment Scripts to use Git

Need to fix Fabric deployment scripts, Codespeed uploading, and other things that are based on Subversion to use Git instead.

Will probably also need to convert all of the existing Codespeed results over to Git somehow.

Move BatchPlanner code out of VoltProcedure

Alex needs to be able to invoke a single method to get the query plan for a SQLStmt and ParameterSet pair. I need to think about what is the best way to make this functionality more accessible to other parts of the code. One way is to just make a "virtual" VoltProcedure that will in turn call back to the PartitionExecutor, but that seems like a hack. We may want to break out this functionality into a separate class, since the PartitionExecutor is getting kind of big.

Improve Processing Distributed Query Results at Base Partition

We need to remove the need to coalesce distributed query results down in the ExecutionEngine. This could probably be done much faster in Java.

Two things:

  1. Fix the PlanOptimizer to remove redundant projections in distributed query plans. See TM1's GetSubscriberData (which also has an unnecessary projection even though it is a SELECT * query).
  2. Add flags to PlanFragments in the catalog that identify whether they can be processed in Java. One flag will indicate that the results should be aggregated together (INSERT/UPDATE/DELETE). The other flag will indicate that the results just need to be appended to each other. We could also use this to avoid having to make a copy of the VoltTable if it came from another partition at the same node.

Add unique primary key to voter benchmark

Currently there is no primary key for the VOTES table. We will want to add a unique “VOTE_ID” column and make that the primary key. We will also add columns for the person’s age and the timestamp. Each client will be given a unique range of vote_ids to pass in at run time.

VoltProcedure debug flag should be turned off

A weird thing happens, which should be dealt with later:

[java] 10:58:45,678 H00-000 DEBUG - Unpexpected error when executing Query1 #562908527765487616/0
[java] java.lang.ArrayIndexOutOfBoundsException: 0
[java] at org.voltdb.VoltProcedure.mispredictDebug(VoltProcedure.java:1600)
[java] at org.voltdb.VoltProcedure.executeQueriesInABatch(VoltProcedure.java:1157)
[java] at org.voltdb.VoltProcedure.voltExecuteSQL(VoltProcedure.java:1047)
[java] at org.voltdb.VoltProcedure.voltExecuteSQL(VoltProcedure.java:1020)
[java] at org.voltdb.VoltProcedure.executeNoJavaProcedure(VoltProcedure.java:779)
[java] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[java] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
[java] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
[java] at java.lang.reflect.Method.invoke(Method.java:597)
[java] at org.voltdb.VoltProcedure.call(VoltProcedure.java:631)
[java] at edu.brown.hstore.PartitionExecutor.processInitiateTaskMessage(PartitionExecutor.java:1425)
[java] at edu.brown.hstore.PartitionExecutor.run(PartitionExecutor.java:837)
[java] at java.lang.Thread.run(Thread.java:662)

Clean-up H-Store Configuration Files

  1. Rename the properties directory to conf. Rename default.properties to hstore.conf. We might also want to rename it to hstore.conf-sample so that kids don't keep checking it in.
  2. Move log4j.properties into conf and rename it to log4j.properties-sample.
  3. Create an ant target that automatically copies the sample files without the '-sample' suffix.
  4. Move properties/benchmarks to be just benchmarks in the root directory

Add Support Code for Query Pre-Fetching

  1. I will update the catalog schema so that you can populate each Procedure catalog object with the list of Statements that we will ask to be pre-fetched for each transaction.
  2. I will write some scaffolding code for you to perform the calculation of pre-fetchable queries. This will be a new part of the system where we can do a bunch of optimizations and calculations when we generate the catalog.
  3. I will also write some rudimentary test cases. You will need to fill them in with more thorough tests that cover more of your code.

Pass "transaction hints" in to client.callProcedure()

Some students need the ability to pass in dynamic information about how they want transactions executed. We should extend the client API to allow them to pass in a key/value pair of options that will get embedded in the StoredProcedureInvocation. Then at runtime, the corresponding code could check whether certain options have been set and then change its behavior accordingly.
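
A hedged sketch of how the extended call could look is shown below. The overload is hypothetical (the current client API does not take hints); the hint map would be serialized into the StoredProcedureInvocation so that server-side code can inspect it at run time.

    import java.util.Map;
    import org.voltdb.client.ClientResponse;

    public interface HintedClient {
        /** Like callProcedure(), but with a bag of key/value hints that ride along with the request. */
        ClientResponse callProcedure(String procName, Map<String, String> hints, Object... params)
                throws Exception;
    }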

Remove parameter descriptions out of HStoreConf

Since we're trying to reduce the memory footprint of each HStoreSite for the anti-caching project, we should move the configuration parameter description strings out into a separate file (since we are loading the @ConfigProperty annotations when we instantiate HStoreConf).

An alternative approach would be to use a separate XML file that contains all of the configuration parameter stuff. Then we can create a simple tool that automatically creates the appropriate Java code from that.

Aggressive Single-Partition Transaction Speculative Execution

Ok here we go! Right now the only time that we will speculatively execute a transaction is when we get the 2PC:PREPARE message for a distributed transaction. We are now going to implement more aggressive scheduling for speculative transactions.

The different places where we want to try to schedule speculative transactions are:

  • At the base partition, when the txn sends a WorkFragment request to remote partitions and is blocked waiting for responses over the network.
  • At the remote partitions, after the PartitionExecutor executes a query on behalf of a txn and becomes idle waiting for another query request or the 2PC:PREPARE message.

These are essentially any place that we are currently invoking PartitionExecutor.utilityWork(). Now the tricky part is that we are going to need to precompute what procedures are non-conflicting with all other procedures so that at runtime we can peek in our queue and quickly grab the ones that we can use to fill in the gap.

In the first implementation, we are going to use the catalog to compute coarse-grained conflict sets. This might prove to be enough; we'll have to look at our workloads to see how many interesting examples we have based on that. We can refine our sets using the Markov models from our 2011 VLDB paper to estimate the execution time of each txn and how much time remains until the partition will no longer be idle, and then try to schedule in the most txns that way.
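
A minimal sketch of the run-time piece is shown below, assuming the conflict sets have already been precomputed from the catalog. The structures and names are hypothetical (this is not the actual H-Store scheduler); it only shows peeking into the partition's queue for a transaction whose procedure does not conflict with the distributed transaction currently holding the partition.

    import java.util.ArrayDeque;
    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Iterator;
    import java.util.Map;
    import java.util.Queue;
    import java.util.Set;

    public class SpeculativeScheduler {
        private final Map<String, Set<String>> conflictsWith = new HashMap<>();
        private final Queue<String> queuedProcedures = new ArrayDeque<>();

        /** Precomputed from the catalog: procA and procB may not run concurrently. */
        public void addConflict(String procA, String procB) {
            conflictsWith.computeIfAbsent(procA, k -> new HashSet<>()).add(procB);
            conflictsWith.computeIfAbsent(procB, k -> new HashSet<>()).add(procA);
        }

        public void enqueue(String procName) {
            queuedProcedures.add(procName);
        }

        /** Returns a queued procedure safe to run while blockedProc holds the partition, or null. */
        public String pickSpeculative(String blockedProc) {
            Set<String> conflicts = conflictsWith.getOrDefault(blockedProc, new HashSet<>());
            Iterator<String> it = queuedProcedures.iterator();
            while (it.hasNext()) {
                String candidate = it.next();
                if (!conflicts.contains(candidate)) {
                    it.remove();          // pull it out of the queue and execute it speculatively
                    return candidate;
                }
            }
            return null;
        }
    }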

Serialization/Deserialization Optimizations

The system spends a lot of time executing the FastSerializer and FastDeserializer code. This is the current "high pole in the tent", as Stonebraker likes to call it, for single-partition transactions. I think we will get a huge speed-up if we fix the following parts of the code:

  1. Right now the system creates a new StoredProcedureInvocation object for every incoming transaction request from the client. We should use an object pool with cached objects instead. The first step is to move the code for creating StoredProcedureInvocation out of VoltProcedureListener.ClientConnectionHandler.read() and into HStoreSite.procedureInvocation(). We will then add a new object pool to HStoreObjectPool and invoke StoredProcedureInvocation.readExternal() to populate it with the new information from the serialized request. We should make sure that we reuse the same ParameterSet in the StoredProcedureInvocation as well. Lastly, we will need to update HStoreSite.deleteTransaction() to make sure that we put the StoredProcedureInvocation object back into the pool once we know that we're done with the transaction. (A simplified pool sketch appears after this list.)
  2. Serializing ParameterSets in the ExecutionEngineJNI.executePlanFragment() code in order to pass them down into the C++ EE is super slow. A lot of this time is spent figuring out what each object type is in the array. We should be able to cache all of this mess and re-use a ParameterSet array per batch, since the SQLStmts in a batch don't change very often. This could be put inside of the BatchPlanner, since we already keep separate planners per unique batch in a VoltProcedure. We will want to use JProfiler to measure the overhead of serialization before and after this optimization is added.
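
A simplified object pool sketch is shown below (hypothetical; much simpler than HStoreObjectPool). borrow() hands out a cached instance when one is available, and returnObject() puts it back once HStoreSite.deleteTransaction() knows the transaction is finished with it.

    import java.util.concurrent.ConcurrentLinkedQueue;
    import java.util.function.Supplier;

    public class SimpleObjectPool<T> {
        private final ConcurrentLinkedQueue<T> pool = new ConcurrentLinkedQueue<>();
        private final Supplier<T> factory;

        public SimpleObjectPool(Supplier<T> factory) {
            this.factory = factory;
        }

        public T borrow() {
            T obj = pool.poll();
            return (obj != null) ? obj : factory.get();   // allocate only when the pool is empty
        }

        public void returnObject(T obj) {
            pool.offer(obj);   // the caller resets the object's state (e.g., readExternal()) on reuse
        }
    }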

Students will use H-Store's built-in benchmarks and profiling tools to measure the system before and after the optimizations are implemented. The focus in the benchmarking should just be on single-partition transactions in order to isolate the testing from networking issues.

Set default H-Store log directory to base directory

Since there are only so many 64-bit machines in the CS department, we need to make sure that we write out the BenchmarkController's log files to their local directory instead of /tmp.

It would be nice to have a feature to prevent the files from getting overwritten.

Use LRU eviction for BatchPlanner cache

We always create a new BatchPlanner for each unique SQLStmt batch in a transaction. We keep these things around forever and we have a separate cache for each partition (to avoid locking). This means that if a transaction has a lot of non-determinism (e.g., loops that vary the number of SQLStmts queued in a batch), then there will be many BatchPlanners that are sitting in this cache.

We should switch PartitionExecutor.batchPlanners to be an LRU cache so that we can discard BatchPlanners that aren't used very often. This will reduce the amount of memory used for internal infrastructure within an HStoreSite.

Port TPC-E Benchmark to Java/H-Store

Since TPC-C is 20 years old, the TPC-E benchmark was developed to represent more modern OLTP applications. The TPC-E schema contains 33 tables with a diverse number of foreign key dependencies between them. It also features 12 stored procedures, of which ten are executed in the regular transactional mix and two are "clean-up" procedures that are invoked at fixed intervals. These clean-up procedures are of particular interest because they perform full-table scans and updates for a wide variety of tables.

We currently have a partial implementation of TPC-E written for H-Store. For this project, students will complete the port and get the benchmark fully running on the H-Store system (or as close to the official specification as possible). There are three main tasks in this project:

  1. Port the official TPC-E C++ data generator to H-Store's Java-based benchmark framework. The current version of H-Store's TPCELoader uses JNI to invoke the C++ loader, but this is not portable. The entire data generation process should be implemented in Java.
  2. Port the official TPC-E C++ client driver to H-Store's Java-based benchmark framework. Each client instance should be single-threaded and autonomous (i.e., it does not need to synchronize its operations with other client instances). Special consideration must also be given for multi-node deployments.
  3. Clean-up the existing H-Store TPC-E stored procedures. Students will need to refactor procedures that have complex queries that contain SQL features that are currently not supported by H-Store into equivalent multiple queries and Java code. For example, the H-Store distributed planner has trouble with the seven table join query in the BrokerVolume transaction; the student will need to break the single large query up into multiple queries and combine the results in the Java-portion of the stored procedure. The current version also does not utilize batched queries and uses a cumbersome/slow "helper" function ProcedureUtil.execute().

Students will need some previous C++ and Java development experience. Reference C++ implementations are available in the repository and the OSDL's "DBT-5" for MySQL. It is important that any deviations from the official specification that students make in order to make the benchmark work on H-Store be documented so that they can be disclosed properly.

TPC-C Benchmark Prepare(catalog-fix) displays inaccurate information

P.S. Please advise if this is not the right way to communicate the issues I have encountered during my testing with H-Store.

Here is the stdout of the "ant hstore-prepare -Dhosts=" command...

Buildfile: /home/rajr/h-store/build.xml

hstore-prepare:

benchmark:
     [java] 22:51:19,180 INFO  - Compilation Complete. Exiting [./tpcc.jar]

hstore-jar:

catalog-fix:
     [copy] Copying 1 file to /home/rajr/h-store/obj/backup
     [echo] Copied back-up to /home/rajr/h-store/obj/backup/tpcc.jar
    [unjar] Expanding: /home/rajr/h-store/tpcc.jar into /home/rajr/h-store/obj/fixcatalog
     [java] 22:51:22,081 [main] (FixCatalog.java:110) INFO  - Updated host information in catalog with 5 new hosts and 12 partitions
     [java] 22:51:22,085 [main] (ArgumentsParser.java:927) WARN  - The ParameterMappings file '/home/rajr/h-store/files/mappings/tpcc.mappings' does not exist
     [java] 22:51:22,182 [main] (FixCatalog.java:191) WARN  - ParameterMappings file '/home/rajr/h-store/files/mappings/tpcc.mappings' does not exist. Ignoring...
     [java] 22:51:22,640 [main] (FixCatalog.java:110) INFO  - Updated host information in catalog with 5 new hosts and 12 partitions
     [java] 22:51:23,062 [main] (FixCatalog.java:224) INFO  - Wrote updated catalog specification to '/home/rajr/h-store/obj//fixcatalog/catalog.txt'
      [jar] Building jar: /home/rajr/h-store/tpcc.jar
   [delete] Deleting directory /home/rajr/h-store/obj/fixcatalog
     [echo] Updated catalog in jar file /home/rajr/h-store/tpcc.jar

BUILD SUCCESSFUL
Total time: 16 seconds
223grp01@sensorium-1 22:51:24 ~

The hostsfile contains:

$ cat HSCluster.txt

sensorium-3:0:0,1
sensorium-4:1:2,3
sensorium-5:2:4,5
sensorium-6:3:6,7
sensorium-7:4:8,9
sensorium-8:5:10,11

Clearly, there are 6 hosts in the file, so it would be nice for the output to report the correct number of hosts, just as it does for the number of partitions.

TPC-C benchmark with >9 partitions does not work

Attached is the hosts file as well as the conf parameters to run the TPC-C benchmark.
As you can see, I have 4 clients and 6 sites (each site with 2 partitions).
Each site has 4 cores but I am using just 2 partitions per site.

The -Dhosts file used in the prepare step is:

$ cat HSCluster.txt
sensorium-3:0:0,1
sensorium-4:1:2,3
sensorium-5:2:4,5
sensorium-6:3:6,7
sensorium-7:4:8,9
sensorium-8:5:10,11

The -Dconf parameter points to a file containing...

$ cat HSConf.txt
global.temp_dir = /home/223grp01/h-store/obj
global.sshoptions = -x
global.defaulthost          = sensorium-1
client.memory                 = 2048
client.count                = 4
client.host                 = sensorium-1,sensorium-2,sensorium-9,sensorium-10
client.txnrate              = 5000
client.processesperclient   = 20
client.duration             = 180000
client.warmup               = 60000
client.interval             = 10000
client.blocking             = true
client.blocking_concurrent  = 2
client.scalefactor          = 0.2
client.log_dir               = ${global.temp_dir}/logs/clients
site.memory                 = 6144
223grp01@sensorium-1 23:10:03 ~

When I run the benchmark, I get this error...

$ ./runHS.sh
Buildfile: /home/rajr/h-store/build.xml

hstore-benchmark:

benchmark:
     [java] 23:08:24,399 INFO  - ----------------------------------- BENCHMARK INITIALIZE :: TPCC -----------------------------------
     [java] 23:08:24,403 INFO  - Starting HStoreSite H00 on sensorium-3
     [java] 23:08:24,438 INFO  - Starting HStoreSite H01 on sensorium-4
     [java] 23:08:24,468 INFO  - Starting HStoreSite H02 on sensorium-5
     [java] 23:08:24,498 INFO  - Starting HStoreSite H03 on sensorium-6
     [java] 23:08:24,528 INFO  - Starting HStoreSite H04 on sensorium-7
     [java] 23:08:24,558 INFO  - Starting HStoreSite H05 on sensorium-8
     [java] 23:08:24,589 INFO  - Waiting for 6 HStoreSites with 12 partitions to finish initialization
     [java] 23:08:38,510 INFO  - -------------------------------------- BENCHMARK LOAD :: TPCC --------------------------------------
     [java] 23:08:38,515 INFO  - Starting TPCC Benchmark Loader - MultiLoader / ScaleFactor 0.20
     [java] 23:08:39,431 INFO  - Loading 12 warehouses using 12 load threads
     [java] 23:08:39,483 ERROR - Stream monitoring thread for 'site-03-sensorium-6' is exiting
     [java] 23:08:39,483 FATAL - site-03-sensorium-6
     [java] java.lang.ArithmeticException: / by zero
     [java]     at org.voltdb.benchmark.tpcc.MultiLoader$LoadThread.makeStock(MultiLoader.java:522)
     [java]     at org.voltdb.benchmark.tpcc.MultiLoader$LoadThread.run(MultiLoader.java:201)
     [java] java.lang.ArithmeticException: / by zero
     [java]     at org.voltdb.benchmark.tpcc.MultiLoader$LoadThread.makeStock(MultiLoader.java:522)
     [java]     at org.voltdb.benchmark.tpcc.MultiLoader$LoadThread.run(MultiLoader.java:201)
     [java] 23:08:39,561 ERROR - Unexpected error while invoking tpcc.MultiLoader
     [java] java.lang.InterruptedException
     [java]     at java.lang.Object.wait(Native Method)
     [java]     at java.lang.Thread.join(Thread.java:1143)
     [java]     at java.lang.Thread.join(Thread.java:1196)
     [java]     at edu.brown.benchmark.BenchmarkComponent.main(BenchmarkComponent.java:1075)
     [java]     at edu.brown.benchmark.BenchmarkController.startLoader(BenchmarkController.java:595)
     [java]     at edu.brown.benchmark.BenchmarkController.setupBenchmark(BenchmarkController.java:443)
     [java]     at edu.brown.benchmark.BenchmarkController.main(BenchmarkController.java:1679)
     [java] java.lang.ArithmeticException: / by zero
     [java]     at org.voltdb.benchmark.tpcc.MultiLoader$LoadThread.makeStock(MultiLoader.java:522)
     [java]     at org.voltdb.benchmark.tpcc.MultiLoader$LoadThread.run(MultiLoader.java:201)
     [java] java.lang.ArithmeticException: / by zero
     [java]     at org.voltdb.benchmark.tpcc.MultiLoader$LoadThread.makeStock(MultiLoader.java:522)
     [java]     at org.voltdb.benchmark.tpcc.MultiLoader$LoadThread.run(MultiLoader.java:201)
     [java] java.lang.ArithmeticException: / by zero
     [java]     at org.voltdb.benchmark.tpcc.MultiLoader$LoadThread.makeStock(MultiLoader.java:522)
     [java]     at org.voltdb.benchmark.tpcc.MultiLoader$LoadThread.run(MultiLoader.java:201)
     [java] java.lang.ArithmeticException: / by zero
     [java]     at org.voltdb.benchmark.tpcc.MultiLoader$LoadThread.makeStock(MultiLoader.java:522)
     [java]     at org.voltdb.benchmark.tpcc.MultiLoader$LoadThread.run(MultiLoader.java:201)
     [java] java.lang.ArithmeticException: / by zero
     [java]     at org.voltdb.benchmark.tpcc.MultiLoader$LoadThread.makeStock(MultiLoader.java:522)
     [java]     at org.voltdb.benchmark.tpcc.MultiLoader$LoadThread.run(MultiLoader.java:201)
     [java] java.lang.ArithmeticException: / by zero
     [java]     at org.voltdb.benchmark.tpcc.MultiLoader$LoadThread.makeStock(MultiLoader.java:522)
     [java]     at org.voltdb.benchmark.tpcc.MultiLoader$LoadThread.run(MultiLoader.java:201)
     [java] java.lang.ArithmeticException: / by zero
     [java]     at org.voltdb.benchmark.tpcc.MultiLoader$LoadThread.makeStock(MultiLoader.java:522)
     [java]     at org.voltdb.benchmark.tpcc.MultiLoader$LoadThread.run(MultiLoader.java:201)
     [java] java.lang.ArithmeticException: / by zero
     [java]     at org.voltdb.benchmark.tpcc.MultiLoader$LoadThread.makeStock(MultiLoader.java:522)
     [java]     at org.voltdb.benchmark.tpcc.MultiLoader$LoadThread.run(MultiLoader.java:201)
     [java] java.lang.ArithmeticException: / by zero
     [java]     at org.voltdb.benchmark.tpcc.MultiLoader$LoadThread.makeStock(MultiLoader.java:522)
     [java]     at org.voltdb.benchmark.tpcc.MultiLoader$LoadThread.run(MultiLoader.java:201)
     [java] java.lang.ArithmeticException: / by zero
     [java]     at org.voltdb.benchmark.tpcc.MultiLoader$LoadThread.makeStock(MultiLoader.java:522)
     [java]     at org.voltdb.benchmark.tpcc.MultiLoader$LoadThread.run(MultiLoader.java:201)

BUILD FAILED
/home/rajr/h-store/build.xml:2364: The following error occurred while executing this line:
/home/rajr/h-store/build.xml:1504: Java returned: 255

Total time: 27 seconds
223grp01@sensorium-1 23:08:41 ~
$

However, if I change the number of partitions to 9 using this hosts file...

$ cat HSCluster.txt
sensorium-3:0:0,1
sensorium-4:1:2
sensorium-5:2:3,4
sensorium-6:3:5
sensorium-7:4:6,7
sensorium-8:5:8

Then I run ant hstore-prepare...
Then when I run the benchmark, it does run and provides results... albeit not very good ones.

Buildfile: /home/rajr/h-store/build.xml

hstore-benchmark:

benchmark:
     [java] 23:18:56,387 INFO  - ----------------------------------- BENCHMARK INITIALIZE :: TPCC -----------------------------------
     [java] 23:18:56,391 INFO  - Starting HStoreSite H00 on sensorium-3
     [java] 23:18:56,418 INFO  - Starting HStoreSite H01 on sensorium-4
     [java] 23:18:56,443 INFO  - Starting HStoreSite H02 on sensorium-5
     [java] 23:18:56,466 INFO  - Starting HStoreSite H03 on sensorium-6
     [java] 23:18:56,488 INFO  - Starting HStoreSite H04 on sensorium-7
     [java] 23:18:56,510 INFO  - Starting HStoreSite H05 on sensorium-8
     [java] 23:18:56,535 INFO  - Waiting for 6 HStoreSites with 9 partitions to finish initialization
     [java] 23:19:16,618 INFO  - -------------------------------------- BENCHMARK LOAD :: TPCC --------------------------------------
     [java] 23:19:16,623 INFO  - Starting TPCC Benchmark Loader - MultiLoader / ScaleFactor 0.20
     [java] 23:19:17,774 INFO  - Loading 9 warehouses using 9 load threads
     [java] 23:20:36,265 INFO  - Finished Loading Warehouse 1
     [java] 23:20:36,278 INFO  - Finished Loading Warehouse 5
     [java] 23:20:36,293 INFO  - Finished Loading Warehouse 4
     [java] 23:20:36,304 INFO  - Finished Loading Warehouse 7
     [java] 23:20:36,314 INFO  - Finished Loading Warehouse 9
     [java] 23:20:36,325 INFO  - Finished Loading Warehouse 8
     [java] 23:20:37,576 INFO  - Finished Loading Warehouse 6
     [java] 23:20:37,586 INFO  - Finished Loading Warehouse 2
     [java] 23:20:37,596 INFO  - Finished Loading Warehouse 3
     [java] 23:20:37,597 INFO  - Loading replicated ITEM table [tuples=20000]
     [java] 23:20:51,940 INFO  - Finished loading all warehouses
     [java] 23:20:51,940 INFO  - Completed TPCC loading phase in 95.32 sec
     [java] 23:20:52,225 INFO  - ------------------------------------ BENCHMARK EXECUTE :: TPCC ------------------------------------
     [java] 23:20:52,225 INFO  - Starting TPCC execution with 80 clients [hosts=4, perhost=20, txnrate=5000, blocking=true/2]
     [java] 23:20:56,754 INFO  - Letting system warm-up for 60.0 seconds
     [java] 23:21:56,764 INFO  - Starting benchmark stats collection
     [java]
     [java] At time 10000 out of 180000 (5%):
     [java]   In the past 10000 ms:
     [java]     Completed 2098 txns at a rate of 209.80 txns/s
     [java]   Since the benchmark began:
     [java]     Completed 2098 txns at a rate of 209.80 txns/s
     [java]
...
...
     [java]
     [java] At time 180000 out of 180000 (100%):
     [java]   In the past 10000 ms:
     [java]     Completed 1709 txns at a rate of 170.90 txns/s
     [java]   Since the benchmark began:
     [java]     Completed 31136 txns at a rate of 172.98 txns/s
     [java] 23:25:06,768 INFO  - Waiting for 4 clients to finish
     [java] 23:25:07,218 INFO  - Computing final benchmark results
     [java]
     [java] ======================================== BENCHMARK RESULTS ========================================
     [java] Execution Time:     180000 ms
     [java] Total Transactions: 31136
     [java] Throughput:         172.98 txn/s [min:131.10 / max:209.80 / stddev:17.22]
     [java]
     [java]                Delivery:        1278 total  (  4.1%)      7.10 txn/s      426.00 txn/m
     [java]               New Order:       13963 total  ( 44.8%)     77.57 txn/s     4654.33 txn/m
     [java]             Stock Level:        1225 total  (  3.9%)      6.81 txn/s      408.33 txn/m
     [java]         Reset Warehouse:           0 total  (  0.0%)      0.00 txn/s        0.00 txn/m
     [java]                 Payment:       13449 total  ( 43.2%)     74.72 txn/s     4483.00 txn/m
     [java]            Order Status:        1221 total  (  3.9%)      6.78 txn/s      407.00 txn/m
     [java] ====================================================================================================
     [java]

BUILD SUCCESSFUL
Total time: 6 minutes 31 seconds
223grp01@sensorium-1 23:25:10 ~
$

I am unable to run the benchmark for these parameters (please see the hosts file and conf file provided earlier) at all.
Is this a bug?

Please advise.

Live Migration

In this project, students will implement support for live migration in H-Store. This will allow a new node to be added to an H-Store cluster without having to take the entire database offline.

  1. Implement a new AbstractHasher that provides consistent hashing. This will allow tuples to be redirected to the newly inserted node's partitions without having to reorganize the entire database. (A basic hash-ring sketch appears after this list.)
  2. Create a new RPC method in the HStoreCoordinator that allows the system to broadcast updates for the system catalog on all nodes to insert the new partition information.
  3. Investigate, design, and implement a live migration scheme. This will essentially move tuples from one or more partitions to a newly added node in the cluster. This must be done in a non-blocking manner to minimize the impact on the database system as it continues to process transactions. The system must figure out whether the new node should get partitions from a single machine or multiple machines.
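
A minimal consistent-hashing sketch is shown below. It is not an AbstractHasher implementation, and the virtual-node count and hash function are illustrative choices; the point is that adding a partition to the ring only remaps the keys that now fall immediately before it.

    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;
    import java.util.Map;
    import java.util.TreeMap;

    public class ConsistentHashRing {
        private final TreeMap<Long, Integer> ring = new TreeMap<>();
        private final int virtualNodes;

        public ConsistentHashRing(int virtualNodes) {
            this.virtualNodes = virtualNodes;
        }

        /** Places a partition on the ring at several virtual positions to smooth the load. */
        public void addPartition(int partitionId) {
            for (int i = 0; i < virtualNodes; i++) {
                ring.put(hash("partition-" + partitionId + "#" + i), partitionId);
            }
        }

        /** Maps a tuple's partitioning key to the first partition at or after its hash. */
        public int partitionFor(Object partitioningKey) {
            long h = hash(String.valueOf(partitioningKey));
            Map.Entry<Long, Integer> e = ring.ceilingEntry(h);
            if (e == null) e = ring.firstEntry();   // wrap around the ring
            return e.getValue();
        }

        private static long hash(String s) {
            try {
                byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
                long h = 0;
                for (int i = 0; i < 8; i++) h = (h << 8) | (d[i] & 0xffL);
                return h;
            } catch (NoSuchAlgorithmException ex) {
                throw new AssertionError(ex);
            }
        }
    }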

Students will use H-Store's built-in benchmarks to simulate an overloaded database cluster and then trigger a node to be added to the system. Students should measure how long it takes for their system to complete the migration, as well as compare the performance of the system before, during, and after the migration.

Benchmark Loaders Slow on Large Clusters

I don't know what I did over this weekend, but now all of the benchmark loaders are super slow on large clusters. I tried making all the @LoadMultipartitionTable sysprocs single-partition at first, but this didn't help. Enabling client.blocking_loader also did not help.

This needs to be resolved before I can merge the "strangelove" branch back into master.

Write-Ahead Logging

Although storing an entire database in main memory allows the system to run significantly faster, many administrators are unwilling to deploy mission-critical applications without the durability guarantees of disk. In this project, students will implement a command logging feature in H-Store that writes the work performed by the database out to a file on the local disk that is then fsync-ed. They will familiarize themselves with state-of-the-art practices used in both open-source and commercial database systems and the issues of ensuring transaction consistency in a high-performance database system.

  1. Implement write-ahead redo logs. For each new transaction request that arrives at a master node (assuming replicas), the system will write a triplet out to a file that contains (1) the transaction id, (2) the name of the procedure, and (3) its input parameters. When the transaction finishes, the system will then write out whether the transaction committed or aborted. Using H-Store's built-in benchmark framework, students will measure the overhead of logging. (A minimal logging sketch appears after this list.)
  2. Add support to replay the log at start-up and re-execute the transactions. The first implementation can just assume that the database was restored from a checkpoint, but future versions will coordinate with the recovery project.
  3. Once the basic implementation is in place, students will explore different approaches for optimizing the performance.
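
A minimal sketch of the command log from step 1 is shown below. The record layout is hypothetical: each invocation is appended as (txn id, procedure name, serialized parameters) and made durable with FileChannel.force(), and a second record later marks whether the transaction committed or aborted.

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    public class CommandLog implements AutoCloseable {
        private final FileChannel channel;

        public CommandLog(Path file) throws IOException {
            this.channel = FileChannel.open(file,
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND);
        }

        public void logInvocation(long txnId, String procName, byte[] params) throws IOException {
            byte[] name = procName.getBytes(StandardCharsets.UTF_8);
            ByteBuffer buf = ByteBuffer.allocate(8 + 4 + name.length + 4 + params.length);
            buf.putLong(txnId).putInt(name.length).put(name).putInt(params.length).put(params);
            buf.flip();
            while (buf.hasRemaining()) channel.write(buf);
            channel.force(false);   // the fsync that makes the entry durable before we acknowledge
        }

        public void logOutcome(long txnId, boolean committed) throws IOException {
            ByteBuffer buf = ByteBuffer.allocate(8 + 1);
            buf.putLong(txnId).put((byte) (committed ? 1 : 0));
            buf.flip();
            while (buf.hasRemaining()) channel.write(buf);
            channel.force(false);
        }

        @Override
        public void close() throws IOException {
            channel.close();
        }
    }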

Restore Support for VoltDB's Regression Suite

I looked over the regression suite stuff this afternoon. Here is some basic information to help you get started. Overall I don't think that it will be that much work to get the basic tests working (e.g., for single-partition transactions). We'll have to see what kind of tests they have for distributed transactions, which is the thing that we really care about.

The first test case that we should focus on is 'TestFixedSQLSuite'. Just as before, you will be able to test it from the command-line (but it obviously won't work now):

$ ant compile junitclass -Djunitclass=TestFixedSQLSuite

The following are the steps that will help guide you to getting this working:

  1. The base class for all of these test cases is 'RegressionSuite'. This class is responsible for starting up the cluster configuration defined in 'VoltServerConfig'. There are different 'VoltServerConfig' implementations that define the different types of cluster deployments to use for a test. We'll eventually want to get all of them working, but for now let's start with a simple single-server-node test (e.g., 'LocalSingleProcessServer').
  2. If you look in 'LocalSingleProcessServer', you will see that it implements a startUp() and a shutDown() method. In LocalSingleProcessServer.startUp() you will see that it creates a new 'ServerThread' instance, starts that thread, and then blocks until the cluster is initialized. I think that this is the main thing that we need to fix. We will need to make it so that we will start an H-Store cluster instead of a VoltDB cluster.
  3. The 'ServerThread' code is pretty basic. You will see that ServerThread.run() invokes the 'VoltDB' wrapper to get a VoltDB instance and then starts it. Then in ServerThread.waitForInitialization() it blocks until the VoltDB instance is up and running.

The good news is that I went ahead and wrote a new wrapper class 'HStore' that almost has the same interface as 'VoltDB'. I added some sample code in 'ServerThread' that invokes the equivalent 'HStore' calls.

The only thing that is different is that our 'HStore' wrapper requires you to pass in the Site catalog object and the 'HStoreConf' handle (which is different than the 'VoltDB.Configuration'). A Site represents a single JVM instance, and can have one or more Partitions. I updated 'ServerThread' so it automatically loads the base Catalog object for you. Check out our viewer tool to learn about what the Catalog contains:

http://hstore.cs.brown.edu/documentation/deployment/catalog-viewer-tool/

For example, after you try to execute the 'TestFixedSQLSuite' test case, you can use the viewer to look at the catalog information:

$ ant catalog-viewer -Djar=./obj/release/testobjects/fixedsql-onesite.jar

After looking at this for a bit, I think what needs to happen is that the constructor to 'ServerThread' should be given the Site catalog and the HStoreConf handle. For 'LocalSingleProcessServer', there should only be one Site to pass in. I've added some code in LocalSingleProcessServer.startUp() that gets everything you need.

Disk-Based Checkpoints + Recovery

This project is to implement disk-based snapshots in H-Store. This will allow the system to write out the contents of a database stored in memory out to a persistent file. Students may use VoltDB's snapshot feature as a reference in the initial implementation.

  1. Implement basic snapshot functionality. This will write out the contents of the database to a file on the local disk at each node. This should be done down in the ExecutionEngine so that the entire contents of the database do not need to be copied up to the Java layer. For starters, snapshots can be invoked periodically based on an interval defined in the HStoreConf.
  2. Create the ability to reload a snapshot stored on disk into the system when the system starts.
  3. Create a system procedure that can be called by the client to execute the snapshot functionality. The input parameter to the procedure should be the directory that each HStoreSite will write the snapshot information to.
  4. If the overhead of writing snapshots out to disk is large, students can explore non-blocking techniques such as forking the JVM and using a separate process to write out the checkpoint using snapshot isolation. Students can also explore other techniques for writing to alternative data stores, such as Google's LevelDB.

Students will measure the overhead costs of taking snapshots using their implementation in H-Store using the built-in benchmarks.

MapReduce Transactions

This project is for improving H-Store's current implementation of MapReduce transactions.

  1. Add support for non-blocking Reduce execution. Currently, the Reduce-phase of a MapReduce transaction is executed like the Map-phase as multiple single-partition transactions that block all other transactions from executing on each partition. This is unnecessary because Reduce only operates on the data generated from Map (i.e., it is not allowed to access the database). Thus, this task is to explore different ways to execute the Reduce jobs without blocking the main execution threads. We will implement two different approaches for processing this work. The client will be able to pass in at runtime which technique to use to process the transaction.
    • The Reduce jobs will be offloaded to the MapReduceHelperThread for execution. The MapReduceHelperThread will execute the Reduce job for each partition serially.
    • The Reduce jobs will be entered into a special queue at each partition that contains miscellaneous work that the PartitionExecutor can execute whenever it is blocked and needs something to do (e.g., waiting for a TransactionWorkResult from a distributed query). A special method will allow the system to process a subset of the ReduceInput data (e.g., the list of values for just one key) and quickly return to check whether the thing that it was blocked on has arrived. This can either be implemented as a special thread that the PartitionExecutor can quickly restart/block or just by using the PartitionExecutor's thread (the former is likely easier but I am not sure of the CPU cache implications).
  2. Add support for dispatching Map jobs as asynchronous single-partition transactions. Currently, each MapReduce transaction is invoked as a distributed transaction that blocks all partitions. Although the PartitionExecutor will continue to execute non-MapReduce single-partition transactions, its partition is blocked from executing any non-MapReduce distributed transaction. We should reduce the initialization step and have each new MapReduce request get executed immediately.
    • Need to double-check whether the Map-job executing at the base partition releases all of the locks in the cluster when it completes. I'm not sure if it does that now.
  3. Conduct experiments that compare distributed transactions versus MapReduce transactions.
    • Execute an entirely single-partitioned workload and measure the drop in performance when executing the distributed/MapReduce transactions.
    • Compare the latency of distributed versus MapReduce transactions.
    • Research metrics for determining the accuracy of "fuzzy queries". This will allow us to measure how different the results are when consistency is ignored. This is actually a bit tricky to do because we can't execute the distributed transaction and the MR transaction at the exact same time, so we will need to think about how we can do this deterministically.
  4. If the fuzzy query metrics in the experiments described above are significantly different, then we will want to implement the ability to dynamically enable snapshot isolation at run time for Map jobs. When the MapReduce transaction request arrives, the system will turn on timestamp versioning for tuples. Any tuple that is modified while this is enabled is appended to the end of the table and tagged with a timestamp (to save space, the timestamp can be a simple counter). Since the Map jobs never update tuples, I think that this is the only change needed. We cannot use the HyPer approach of forking the JVM because it still does not provide the guarantee that we need. See Serializable Isolation for Snapshot Databases for a basic overview of this technique.
  5. Improve support in the query planner for multi-aggregate queries, such as:
    SELECT ol_number, SUM(ol_amount), AVG(ol_quantity)
      FROM order_line, item
     WHERE order_line.ol_i_id = item.i_id
     GROUP BY ol_number ORDER BY ol_number
  6. Optional: Add support for writing out Map results to disk. We need to think about this a bit, but we may have the problem where the Map job generates a lot of output that we don't want to keep in memory. Instead, we can serialize the VoltTable out to disk. The more I think about this, however, the more I think it's unnecessary.
