
hadoop-mini-clusters

hadoop-mini-clusters provides an easy way to test Hadoop projects directly in your IDE, without the need for a full blown development cluster or container orchestration. It allows the user to debug with the full power of the IDE. It provides a consistent API around the existing Mini Clusters across the ecosystem, eliminating the tedious task of learning the nuances of each project's approach.

Modules:

The project structure changed with 0.1.0. Each mini cluster now resides in a module of its own. See the module names below.

Modules Included:

  • hadoop-mini-clusters-hdfs - Mini HDFS Cluster
  • hadoop-mini-clusters-yarn - Mini YARN Cluster (no MR)
  • hadoop-mini-clusters-mapreduce - Mini MapReduce Cluster
  • hadoop-mini-clusters-hbase - Mini HBase Cluster
  • hadoop-mini-clusters-zookeeper - Curator based Local Cluster
  • hadoop-mini-clusters-hiveserver2 - Local HiveServer2 instance
  • hadoop-mini-clusters-hivemetastore - Derby backed HiveMetaStore
  • hadoop-mini-clusters-storm - Storm LocalCluster
  • hadoop-mini-clusters-kafka - Local Kafka Broker
  • hadoop-mini-clusters-oozie - Local Oozie Server - Thanks again Vladimir
  • hadoop-mini-clusters-mongodb - I know... not Hadoop
  • hadoop-mini-clusters-activemq - Thanks Vladimir Zlatkin!
  • hadoop-mini-clusters-hyperscaledb - For testing various databases
  • hadoop-mini-clusters-knox - Local Knox Gateway
  • hadoop-mini-clusters-kdc - Local Key Distribution Center (KDC)

Tests:

Tests are included to show how to configure and use each of the mini clusters. See the *IntegrationTest classes.
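The usual pattern in those tests is to start the cluster once before the test methods and stop it afterwards. A minimal JUnit 4 sketch of that lifecycle, reusing the HDFS builder settings from the example below (the class name and test body are illustrative, not lifted from the test sources):

```java
import org.apache.hadoop.conf.Configuration;
import org.junit.AfterClass;
import org.junit.BeforeClass;
import org.junit.Test;

import com.github.sakserv.minicluster.impl.HdfsLocalCluster;

public class ExampleHdfsIntegrationTest {

    private static HdfsLocalCluster hdfsLocalCluster;

    @BeforeClass
    public static void setUp() throws Exception {
        // Build and start the mini cluster once for the whole test class
        hdfsLocalCluster = new HdfsLocalCluster.Builder()
            .setHdfsNamenodePort(12345)
            .setHdfsNamenodeHttpPort(12341)
            .setHdfsTempDir("embedded_hdfs")
            .setHdfsNumDatanodes(1)
            .setHdfsEnablePermissions(false)
            .setHdfsFormat(true)
            .setHdfsEnableRunningUserAsProxyUser(true)
            .setHdfsConfig(new Configuration())
            .build();
        hdfsLocalCluster.start();
    }

    @AfterClass
    public static void tearDown() throws Exception {
        // Shut the cluster down when the test class is done
        hdfsLocalCluster.stop();
    }

    @Test
    public void testSomethingAgainstHdfs() throws Exception {
        // interact with hdfsLocalCluster.getHdfsFileSystemHandle() here
    }
}
```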

Using:

  • Maven Central - latest release. Use the all-in-one hadoop-mini-clusters artifact, or an individual module together with hadoop-mini-clusters-common:
<dependency>
    <groupId>com.github.sakserv</groupId>
    <artifactId>hadoop-mini-clusters</artifactId>
    <version>0.1.16</version>
</dependency>

<dependency>
    <groupId>com.github.sakserv</groupId>
    <artifactId>hadoop-mini-clusters-common</artifactId>
    <version>0.1.16</version>
</dependency>

Profile Support:

Multiple versions of HDP are supported via Maven profiles. The current list is:

  • HDP 2.6.5.0 (default)
  • HDP 2.6.3.0
  • HDP 2.6.2.0
  • HDP 2.6.1.0
  • HDP 2.6.0.3
  • HDP 2.5.3.0
  • HDP 2.5.0.0
  • HDP 2.4.2.0
  • HDP 2.4.0.0
  • HDP 2.3.4.0
  • HDP 2.3.2.0
  • HDP 2.3.0.0

To use a different profile, add the profile name to your Maven build:

mvn test -P2.3.0.0

Note that backwards compatibility is not guaranteed.

Examples:

HDFS Example

<dependency>
    <groupId>com.github.sakserv</groupId>
    <artifactId>hadoop-mini-clusters-hdfs</artifactId>
    <version>0.1.16</version>
</dependency>
HdfsLocalCluster hdfsLocalCluster = new HdfsLocalCluster.Builder()
    .setHdfsNamenodePort(12345)
    .setHdfsNamenodeHttpPort(12341)
    .setHdfsTempDir("embedded_hdfs")
    .setHdfsNumDatanodes(1)
    .setHdfsEnablePermissions(false)
    .setHdfsFormat(true)
    .setHdfsEnableRunningUserAsProxyUser(true)
    .setHdfsConfig(new Configuration())
    .build();
                
hdfsLocalCluster.start();
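Once started, the embedded HDFS instance can be reached through the cluster's FileSystem handle (the same getHdfsFileSystemHandle() used by the Oozie example). A short write/read round trip, with an illustrative file path:

```java
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Obtain the FileSystem backed by the running mini cluster
FileSystem hdfsFsHandle = hdfsLocalCluster.getHdfsFileSystemHandle();

// Write a file...
try (FSDataOutputStream out = hdfsFsHandle.create(new Path("/tmp/test_file.txt"))) {
    out.writeUTF("hello embedded HDFS");
}

// ...and read it back
try (FSDataInputStream in = hdfsFsHandle.open(new Path("/tmp/test_file.txt"))) {
    System.out.println(in.readUTF());
}
```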

YARN Example

<dependency>
    <groupId>com.github.sakserv</groupId>
    <artifactId>hadoop-mini-clusters-yarn</artifactId>
    <version>0.1.16</version>
</dependency>
YarnLocalCluster yarnLocalCluster = new YarnLocalCluster.Builder()
    .setNumNodeManagers(1)
    .setNumLocalDirs(Integer.parseInt("1"))
    .setNumLogDirs(Integer.parseInt("1"))
    .setResourceManagerAddress("localhost:37001")
    .setResourceManagerHostname("localhost")
    .setResourceManagerSchedulerAddress("localhost:37002")
    .setResourceManagerResourceTrackerAddress("localhost:37003")
    .setResourceManagerWebappAddress("localhost:37004")
    .setUseInJvmContainerExecutor(false)
    .setConfig(new Configuration())
    .build();
   
yarnLocalCluster.start();

MapReduce Example

<dependency>
    <groupId>com.github.sakserv</groupId>
    <artifactId>hadoop-mini-clusters-mapreduce</artifactId>
    <version>0.1.16</version>
</dependency>
MRLocalCluster mrLocalCluster = new MRLocalCluster.Builder()
    .setNumNodeManagers(1)
    .setJobHistoryAddress("localhost:37005")
    .setResourceManagerAddress("localhost:37001")
    .setResourceManagerHostname("localhost")
    .setResourceManagerSchedulerAddress("localhost:37002")
    .setResourceManagerResourceTrackerAddress("localhost:37003")
    .setResourceManagerWebappAddress("localhost:37004")
    .setUseInJvmContainerExecutor(false)
    .setConfig(new Configuration())
    .build();

mrLocalCluster.start();

HBase Example

<dependency>
    <groupId>com.github.sakserv</groupId>
    <artifactId>hadoop-mini-clusters-hbase</artifactId>
    <version>0.1.16</version>
</dependency>
HbaseLocalCluster hbaseLocalCluster = new HbaseLocalCluster.Builder()
    .setHbaseMasterPort(25111)
    .setHbaseMasterInfoPort(-1)
    .setNumRegionServers(1)
    .setHbaseRootDir("embedded_hbase")
    .setZookeeperPort(12345)
    .setZookeeperConnectionString("localhost:12345")
    .setZookeeperZnodeParent("/hbase-unsecure")
    .setHbaseWalReplicationEnabled(false)
    .setHbaseConfiguration(new Configuration())
    .activeRestGateway()
        .setHbaseRestHost("localhost")
        .setHbaseRestPort(28000)
        .setHbaseRestReadOnly(false)
        .setHbaseRestThreadMax(100)
        .setHbaseRestThreadMin(2)
        .build()
    .build();

hbaseLocalCluster.start();
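A client can then connect with a plain HBase Configuration pointed at the embedded ZooKeeper, using the standard HBase 1.x client API (the table name here is illustrative):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

// Client settings match the builder values above
Configuration clientConf = HBaseConfiguration.create();
clientConf.set("hbase.zookeeper.quorum", "localhost");
clientConf.set("hbase.zookeeper.property.clientPort", "12345");
clientConf.set("zookeeper.znode.parent", "/hbase-unsecure");

try (Connection connection = ConnectionFactory.createConnection(clientConf);
     Admin admin = connection.getAdmin()) {
    System.out.println(admin.tableExists(TableName.valueOf("test_table")));
}
```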

Zookeeper Example

<dependency>
    <groupId>com.github.sakserv</groupId>
    <artifactId>hadoop-mini-clusters-zookeeper</artifactId>
    <version>0.1.16</version>
</dependency>
ZookeeperLocalCluster zookeeperLocalCluster = new ZookeeperLocalCluster.Builder()
    .setPort(12345)
    .setTempDir("embedded_zookeeper")
    .setZookeeperConnectionString("localhost:12345")
    .setMaxClientCnxns(60)
    .setElectionPort(20001)
    .setQuorumPort(20002)
    .setDeleteDataDirectoryOnClose(false)
    .setServerId(1)
    .setTickTime(2000)
    .build();

zookeeperLocalCluster.start();
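Since the module is Curator based, a CuratorFramework client is a natural fit for talking to the embedded ensemble (the znode path and payload are illustrative):

```java
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;

// Connect to the embedded ZooKeeper using the connection string above
CuratorFramework client = CuratorFrameworkFactory.newClient(
    "localhost:12345", new ExponentialBackoffRetry(1000, 3));
client.start();

// Create a znode and read it back
client.create().forPath("/test-node", "payload".getBytes());
byte[] data = client.getData().forPath("/test-node");
System.out.println(new String(data));

client.close();
```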

HiveServer2 Example

<dependency>
    <groupId>com.github.sakserv</groupId>
    <artifactId>hadoop-mini-clusters-hiveserver2</artifactId>
    <version>0.1.16</version>
</dependency>
HiveLocalServer2 hiveLocalServer2 = new HiveLocalServer2.Builder()
    .setHiveServer2Hostname("localhost")
    .setHiveServer2Port(12348)
    .setHiveMetastoreHostname("localhost")
    .setHiveMetastorePort(12347)
    .setHiveMetastoreDerbyDbDir("metastore_db")
    .setHiveScratchDir("hive_scratch_dir")
    .setHiveWarehouseDir("warehouse_dir")
    .setHiveConf(new HiveConf())
    .setZookeeperConnectionString("localhost:12345")
    .build();

hiveLocalServer2.start();
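The running instance can be queried over JDBC with the standard Hive driver; the URL below is derived from the hostname and port set above, and the credentials and DDL are illustrative:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

Class.forName("org.apache.hive.jdbc.HiveDriver");
try (Connection connection = DriverManager.getConnection(
         "jdbc:hive2://localhost:12348/default", "user", "");
     Statement stmt = connection.createStatement()) {
    stmt.execute("CREATE TABLE IF NOT EXISTS test_table (id INT, msg STRING)");
}
```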

HiveMetastore Example

<dependency>
    <groupId>com.github.sakserv</groupId>
    <artifactId>hadoop-mini-clusters-hivemetastore</artifactId>
    <version>0.1.16</version>
</dependency>
HiveLocalMetaStore hiveLocalMetaStore = new HiveLocalMetaStore.Builder()
    .setHiveMetastoreHostname("localhost")
    .setHiveMetastorePort(12347)
    .setHiveMetastoreDerbyDbDir("metastore_db")
    .setHiveScratchDir("hive_scratch_dir")
    .setHiveWarehouseDir("warehouse_dir")
    .setHiveConf(new HiveConf())
    .build();

hiveLocalMetaStore.start();

Storm Example

<dependency>
    <groupId>com.github.sakserv</groupId>
    <artifactId>hadoop-mini-clusters-storm</artifactId>
    <version>0.1.16</version>
</dependency>
StormLocalCluster stormLocalCluster = new StormLocalCluster.Builder()
    .setZookeeperHost("localhost")
    .setZookeeperPort(12345)
    .setEnableDebug(true)
    .setNumWorkers(1)
    .setStormConfig(new Config())
    .build();

stormLocalCluster.start();

Kafka Example

<dependency>
    <groupId>com.github.sakserv</groupId>
    <artifactId>hadoop-mini-clusters-kafka</artifactId>
    <version>0.1.16</version>
</dependency>
KafkaLocalBroker kafkaLocalBroker = new KafkaLocalBroker.Builder()
    .setKafkaHostname("localhost")
    .setKafkaPort(11111)
    .setKafkaBrokerId(0)
    .setKafkaProperties(new Properties())
    .setKafkaTempDir("embedded_kafka")
    .setZookeeperConnectionString("localhost:12345")
    .build();

kafkaLocalBroker.start();
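Messages can be published with the standard Kafka Java producer, assuming a kafka-clients version compatible with the embedded broker (the topic name is illustrative):

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Point the producer at the embedded broker configured above
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:11111");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
    producer.send(new ProducerRecord<>("test-topic", "key", "value"));
}
```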

Oozie Example

<dependency>
    <groupId>com.github.sakserv</groupId>
    <artifactId>hadoop-mini-clusters-oozie</artifactId>
    <version>0.1.16</version>
</dependency>
OozieLocalServer oozieLocalServer = new OozieLocalServer.Builder()
    .setOozieTestDir("embedded_oozie")
    .setOozieHomeDir("oozie_home")
    .setOozieUsername(System.getProperty("user.name"))
    .setOozieGroupname("testgroup")
    .setOozieYarnResourceManagerAddress("localhost")
    .setOozieHdfsDefaultFs("hdfs://localhost:8020/")
    .setOozieConf(new Configuration())
    .setOozieHdfsShareLibDir("/tmp/oozie_share_lib")
    .setOozieShareLibCreate(Boolean.TRUE)
    .setOozieLocalShareLibCacheDir("share_lib_cache")
    .setOoziePurgeLocalShareLibCache(Boolean.FALSE)
    .setOozieShareLibFrameworks(
        Lists.newArrayList(Framework.MAPREDUCE_STREAMING, Framework.OOZIE))
    .build();

OozieShareLibUtil oozieShareLibUtil = new OozieShareLibUtil(
    oozieLocalServer.getOozieHdfsShareLibDir(),
    oozieLocalServer.getOozieShareLibCreate(), 
    oozieLocalServer.getOozieLocalShareLibCacheDir(),
    oozieLocalServer.getOoziePurgeLocalShareLibCache(), 
    hdfsLocalCluster.getHdfsFileSystemHandle(),
    oozieLocalServer.getOozieShareLibFrameworks());
oozieShareLibUtil.createShareLib();

oozieLocalServer.start();

MongoDB Example

<dependency>
    <groupId>com.github.sakserv</groupId>
    <artifactId>hadoop-mini-clusters-mongodb</artifactId>
    <version>0.1.16</version>
</dependency>
MongodbLocalServer mongodbLocalServer = new MongodbLocalServer.Builder()
    .setIp("127.0.0.1")
    .setPort(11112)
    .build();

mongodbLocalServer.start();
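The embedded instance accepts any MongoDB Java driver connection; a quick insert-and-read sketch using the 3.x driver (database and collection names are illustrative):

```java
import com.mongodb.MongoClient;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;

import org.bson.Document;

// Connect to the embedded server on the IP/port configured above
MongoClient mongo = new MongoClient("127.0.0.1", 11112);
MongoDatabase database = mongo.getDatabase("test_db");
MongoCollection<Document> collection = database.getCollection("test_collection");

collection.insertOne(new Document("msg", "hello"));
System.out.println(collection.find().first().toJson());

mongo.close();
```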

ActiveMQ Example

<dependency>
    <groupId>com.github.sakserv</groupId>
    <artifactId>hadoop-mini-clusters-activemq</artifactId>
    <version>0.1.16</version>
</dependency>
ActivemqLocalBroker amq = new ActivemqLocalBroker.Builder()
    .setHostName("localhost")
    .setPort(11113)
    .setQueueName("defaultQueue")
    .setStoreDir("activemq-data")
    .setUriPrefix("vm://")
    .setUriPostfix("?create=false")
    .build();

amq.start();
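Standard JMS clients work against the broker; the URI below combines the prefix, hostname, and postfix from the builder, and the message payload is illustrative:

```java
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.Destination;
import javax.jms.MessageProducer;
import javax.jms.Session;
import javax.jms.TextMessage;

import org.apache.activemq.ActiveMQConnectionFactory;

// vm:// prefix + hostname + ?create=false postfix, as configured above
ConnectionFactory factory = new ActiveMQConnectionFactory("vm://localhost?create=false");
Connection connection = factory.createConnection();
connection.start();

Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
Destination queue = session.createQueue("defaultQueue");

MessageProducer producer = session.createProducer(queue);
producer.send(session.createTextMessage("hello"));

TextMessage received = (TextMessage) session.createConsumer(queue).receive(5000);
System.out.println(received.getText());

connection.close();
```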

HyperSQL DB Example

<dependency>
    <groupId>com.github.sakserv</groupId>
    <artifactId>hadoop-mini-clusters-hyperscaledb</artifactId>
    <version>0.1.16</version>
</dependency>
HsqldbLocalServer hsqldbLocalServer = new HsqldbLocalServer.Builder()
    .setHsqldbHostName("127.0.0.1")
    .setHsqldbPort("44111")
    .setHsqldbTempDir("embedded_hsqldb")
    .setHsqldbDatabaseName("testdb")
    .setHsqldbCompatibilityMode("mysql")
    .setHsqldbJdbcDriver("org.hsqldb.jdbc.JDBCDriver")
    .setHsqldbJdbcConnectionStringPrefix("jdbc:hsqldb:hsql://")
    .build();

hsqldbLocalServer.start();
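The database can then be reached over JDBC using the driver and connection string prefix configured above; the SA user with an empty password is HSQLDB's stock default, and the DDL is illustrative:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

Class.forName("org.hsqldb.jdbc.JDBCDriver");
try (Connection connection = DriverManager.getConnection(
         "jdbc:hsqldb:hsql://127.0.0.1:44111/testdb", "SA", "");
     Statement stmt = connection.createStatement()) {
    stmt.execute("CREATE TABLE test_table (id INT, msg VARCHAR(64))");
    stmt.execute("INSERT INTO test_table VALUES (1, 'hello')");
}
```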

Knox Example

<dependency>
    <groupId>com.github.sakserv</groupId>
    <artifactId>hadoop-mini-clusters-knox</artifactId>
    <version>0.1.16</version>
</dependency>
KnoxLocalCluster knoxCluster = new KnoxLocalCluster.Builder()
    .setPort(8888)
    .setPath("gateway")
    .setHomeDir("embedded_knox")
    .setCluster("mycluster")
    .setTopology(XMLDoc.newDocument(true)
        .addRoot("topology")
            .addTag("gateway")
                .addTag("provider")
                    .addTag("role").addText("authentication")
                    .addTag("enabled").addText("false")
                    .gotoParent()
                .addTag("provider")
                    .addTag("role").addText("identity-assertion")
                    .addTag("enabled").addText("false")
                    .gotoParent()
                .gotoParent()
            .addTag("service")
                .addTag("role").addText("NAMENODE")
                .addTag("url").addText("hdfs://localhost:8020")
                .gotoParent()
            .addTag("service")
                .addTag("role").addText("WEBHDFS")
                .addTag("url").addText("http://localhost:50070/webhdfs")
        .gotoRoot().toString())
    .build();

knoxCluster.start();

KDC Example

<dependency>
    <groupId>com.github.sakserv</groupId>
    <artifactId>hadoop-mini-clusters-kdc</artifactId>
    <version>0.1.16</version>
</dependency>
KdcLocalCluster kdcLocalCluster = new KdcLocalCluster.Builder()
    .setPort(34340)
    .setHost("127.0.0.1")
    .setBaseDir("embedded_kdc")
    .setOrgDomain("ORG")
    .setOrgName("ACME")
    .setPrincipals("hdfs,hbase,yarn,oozie,oozie_user,zookeeper,storm,mapreduce,HTTP".split(","))
    .setKrbInstance("127.0.0.1")
    .setInstance("DefaultKrbServer")
    .setTransport("TCP")
    .setMaxTicketLifetime(86400000)
    .setMaxRenewableLifetime(604800000)
    .setDebug(false)
    .build();

kdcLocalCluster.start();

Examples of integrating the KDC with HDFS, ZooKeeper, or HBase can be found in the tests under hadoop-mini-clusters-kdc/src/test/java/com/github/sakserv/minicluster/impl.

Modifying Properties

To change the defaults used to construct the mini clusters, modify src/main/resources/default.properties as needed.

Intellij Testing

To run the full test suite from IntelliJ, make sure Fork Mode is set to method (Run -> Edit Configurations -> fork mode).

InJvmContainerExecutor

YarnLocalCluster now supports Oleg Z's InJvmContainerExecutor. See Oleg Z's GitHub for more details.
