Based on and motivated by the following resources:
- Apache project list
- Edd Dumbill's "What is Apache Hadoop?"
- Edd Dumbill's "The SMAQ stack for big data"
- My "Interactive analysis of large-scale datasets" post
- Accumulo, http://accumulo.apache.org/ - a sorted, distributed key/value store
- Cassandra, http://cassandra.apache.org/ - distributed column-oriented database
- Cayenne, http://cayenne.apache.org/ - object-relational mapping (ORM) and remoting services
- CouchDB, http://couchdb.apache.org/ - NoSQL document-oriented datastore
- Gora, http://gora.apache.org/ - provides an in-memory data model and persistence for big data
- Hadoop, http://hadoop.apache.org/ - a distributed computing platform:
- HDFS - distributed redundant file system for Hadoop
- MapReduce - parallel computation on server clusters
- HBase, http://hbase.apache.org/ - column-oriented database on top of Hadoop
- Hive, http://hive.apache.org/ - data warehouse with SQL-like access
- Flume, http://flume.apache.org/ - collection and import of log and event data
- Lucene, http://lucene.apache.org/ - full-text indexing and search library
- Mahout, http://mahout.apache.org/ - library of machine learning and data mining algorithms on top of Hadoop
- Pig, http://pig.apache.org/ - high-level programming language for Hadoop computations
- Oozie, http://oozie.apache.org/ - orchestration and workflow management for Hadoop
- Solr, http://lucene.apache.org/solr/ - Lucene-based enterprise search platform
- Sqoop, http://sqoop.apache.org/ - bulk data transfer between relational databases and Hadoop
- Whirr, http://whirr.apache.org/ - cloud-agnostic deployment of clusters
- ZooKeeper, http://zookeeper.apache.org/ - configuration management and coordination service
- Ambari, http://incubator.apache.org/ambari/ - deployment, configuration and monitoring of Hadoop clusters
- Blur, http://incubator.apache.org/blur/ - platform for searching massive amounts of data in a cloud computing environment
- Chukwa, http://incubator.apache.org/chukwa/ - log collection and analysis framework for Apache Hadoop clusters
- Crunch, http://incubator.apache.org/crunch/ - a Java library for writing, testing, and running pipelines of MapReduce jobs
- Drill, http://incubator.apache.org/drill/ - interactive analysis of large-scale data
- HCatalog, http://incubator.apache.org/hcatalog/ - schema and data type sharing over Pig, Hive and MapReduce
- Kafka, http://incubator.apache.org/kafka/ - distributed publish-subscribe messaging system
- Mesos, http://incubator.apache.org/mesos/ - a cluster manager that provides resource sharing and isolation across cluster applications
- S4, http://incubator.apache.org/s4/ - distributed platform for processing continuous unbounded streams of data
- Tashi, http://incubator.apache.org/tashi/ - infrastructure for service providers to build applications that harness cluster computing resources to efficiently access repositories of rich data
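To make the MapReduce entry above concrete, here is a minimal sketch of the map/shuffle/reduce model as a local word count in plain Python. It has no Hadoop dependency, and the function names (`map_phase`, `shuffle`, `reduce_phase`) are my own labels for the phases, not Hadoop API; on a real cluster the framework shards the input, shuffles by key over the network, and runs reducers in parallel.

```python
# Toy illustration of the MapReduce model: local word count.
from collections import defaultdict

def map_phase(document):
    # Emit a (word, 1) pair for every word, as a mapper would.
    for word in document.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    # Group values by key (the "shuffle and sort" step between phases).
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    # Sum the counts for one word, as a reducer would.
    return (key, sum(values))

documents = ["the quick brown fox", "the lazy dog"]
pairs = [p for doc in documents for p in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts["the"])  # 2
```

Pig and Hive, listed above, exist precisely so that computations like this can be expressed in a few lines of a high-level language instead of hand-written map and reduce functions.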