The accumulo from tlpinney

******************************************************************************
1. Building

In the normal tarball or RPM release of accumulo, everything is built and
ready to go on x86 GNU/Linux: there is no build step.

However, if you only have source code, or you wish to make changes, you need to
have maven configured to get Accumulo prerequisites from repositories.  See
the pom.xml file for the necessary components.

Run "mvn package -P assemble"

If you are running on another Unix-like operating system (OSX, etc) then
you may wish to build the native libraries.  They are not strictly necessary
but having them available suppresses a runtime warning:

  $ ( cd ./src/server/src/main/c++ ; make )

******************************************************************************
2. Deployment

Copy the accumulo tar file produced by mvn package 
from the src/assemble/target/ directory to the desired destination, then untar it (e.g. 
tar xvzf accumulo-1.5.0-incubating-SNAPSHOT-dist.tar.gz).

If you are using the RPM, install the RPM on every machine that will run
accumulo.

******************************************************************************
3. Upgrading from 1.3 to 1.4


******************************************************************************
4. Configuring

Accumulo has two prerequisites, hadoop and zookeeper. Zookeeper must be at
least version 3.3.0. Both of these must be installed and configured. Zookeeper 
normally only allows for 10 connections from one computer.  On a single-host 
install, this number is a little two low.  Add the following to the 
$ZOOKEEPER_HOME/conf/zoo.cfg file:

   maxClientCnxns=100

Ensure you (or the some special hadoop user account) have accounts on all of
the machines in the cluster and that hadoop and accumulo install files can be
found in the same location on every machine in the cluster.  You will need to
have password-less ssh set up as described in the hadoop documentation. 

You will need to have hadoop installed and configured on your system.
Accumulo 1.5.0-incubating-SNAPSHOT has been tested with hadoop version 0.20.2.

Create a "slaves" file in $ACCUMULO_HOME/conf/.  This is a list of machines
where tablet servers and loggers will run.

Create a "masters" file in $ACCUMULO_HOME/conf/.  This is a list of
machines where the master server will run. 

Create conf/accumulo-env.sh following the template of
conf/accumulo-env.sh.example.  Set JAVA_HOME, HADOOP_HOME, and ZOOKEEPER_HOME.
These directories must be at the same location on every node in the cluster.
Note that zookeeper must be installed on every machine, but it should not be 
run on every machine.

Create the $ACCUMULO_LOG_DIR on every machine in the slaves file.

* Note that you will be specifying the Java heap space in accumulo-env.sh.  
You should make sure that the total heap space used for the accumulo tserver,
logger and the hadoop datanode and tasktracker is less than the available
memory on each slave node in the cluster.  On large clusters, it is recommended
that the accumulo master, hadoop namenode, secondary namenode, and hadoop
jobtracker all be run on separate machines to allow them to use more heap
space.  If you are running these on the same machine on a small cluster, make
sure their heap space settings fit within the available memory.  The zookeeper
instances are also time sensitive and should be on machines that will not be
heavily loaded, or over-subscribed for memory.

Create conf/accumulo-site.xml.  You must set the zookeeper servers in this
file (instance.zookeeper.host).  Look at docs/config.html to see what
additional variables you can modify and what the defaults are.

Create the write-ahead log directory on all slaves.  The directory is set in 
the accumulo-site.xml as the "logger.dir.walog" parameter.  It is a local 
directory that will be used to log updates which will be used in the event of
tablet server failure, so it is important that it have sufficient space and
reliability.  It is possible to specify a comma-separated list of directories 
to use for write-ahead logs, in which case each directory in the list must be
created on all slaves.

Synchronize your accumulo conf directory across the cluster.  As a precaution
against mis-configured systems, servers using different configuration files
will not communicate with the rest of the cluster.

******************************************************************************
5. Running Accumulo

Make sure hadoop is configured on all of the machines in the cluster, including
access to a shared hdfs instance.  Make sure hdfs is running.

Make sure zookeeper is configured and running on at least one machine in the
cluster.

Run "bin/accumulo init" to create the hdfs directory structure
(hdfs:///accumulo/*) and initial zookeeper settings. This will also allow you
to also configure the initial root password. Only do this once. 

Start accumulo using the bin/start-all.sh script.

Use the "bin/accumulo shell -u <username>" command to run an accumulo shell
interpreter.  Within this interpreter, run "createtable <tablename>" to create
a table, and run "table <tablename>" followed by "scan" to scan a table.

In the example below a table is created, data is inserted, and the table is
scanned.

    $ ./bin/accumulo shell -u root
    Enter current password for 'root'@'accumulo': ******

    Shell - Accumulo Interactive Shell
    - 
    - version: 1.5.0-incubating-SNAPSHOT
    - instance name: accumulo
    - instance id: f5947fe6-081e-41a8-9877-43730c4dfc6f
    - 
    - type 'help' for a list of available commands
    - 
    root@ac> createtable foo
    root@ac foo> insert row1 colf1 colq1 val1
    root@ac foo> insert row1 colf1 colq2 val2
    root@ac foo> scan
    row1 colf1:colq1 []    val1
    row1 colf1:colq2 []    val2

The example below start the shell, switches to table foo, and scans for a
certain column.

    $ ./bin/accumulo shell -u root
    Enter current password for 'root'@'accumulo': ******

    Shell - Accumulo Interactive Shell
    - 
    - version: 1.5.0-incubating-SNAPSHOT
    - instance name: accumulo
    - instance id: f5947fe6-081e-41a8-9877-43730c4dfc6f
    - 
    - type 'help' for a list of available commands
    - 
    root@ac> table foo
    root@ac foo> scan -c colf1:colq2
    row1 colf1:colq2 []    val2




******************************************************************************
6. Monitoring Accumulo

You can point your browser to the master host, on port 50095 to see the status
of accumulo across the cluster.  You can even do this with the text-based
browser "links":

 $ links http://localhost:50095

From this GUI, you can ensure that tablets are assigned, tables are online,
tablet servers are up. You can monitor query and ingest rates across the
cluster.

******************************************************************************
7. Stopping Accumulo

Do not kill the tabletservers or run bin/tdown.sh unless absolutely necessary.
Recovery from a catastrophic loss of servers can take a long time. To shutdown
cleanly, run "bin/stop-all.sh" and the master will orchestrate the shutdown of
all the tablet servers.  Shutdown waits for all writes to finish, so it may
take some time for particular configurations.  

******************************************************************************
8. Logging

DEBUG and above are logged to the logs/ dir.  To modify this behavior change
the scripts in conf/.  To change the logging dir, set ACCUMULO_LOG_DIR in
conf/accumulo-env.sh.  Stdout and stderr of each accumulo process is
redirected to the log dir.

******************************************************************************
9. API

The public accumulo API is composed of everything in the org.apache.accumulo.core.client
package (excluding the org.apache.accumulo.core.client.impl package) and the following
classes from org.apache.accumulo.core.data : Key, Mutation, Value, and Range.  To get
started using accumulo review the example and the javadoc for the packages and
classes mentioned above. 

******************************************************************************
10. Performance Tuning

Accumulo has exposed several configuration properties that can be changed. 
These properties and configuration management are described in detail in 
docs/config.html.  While the default value is usually optimal, there are cases 
where a change can increase query and ingest performance.

Before changing a property from its default in a production system, you should 
develop a good understanding of the property and consider creating a test to 
prove the increased performance.

******************************************************************************
tlpinney / accumulo Goto Github PK

accumulo's Introduction

accumulo's People

Contributors

Stargazers

Watchers

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent