Git Product home page Git Product logo

kylin-docker's Introduction

Kylin on Docker

This repository trackes the code and files for building docker images with Apache Kylin.

Please note: this is the master branch, which doesn't have the scripts and Dockerfile; You need checkout the specific branch which named with Kylin version and Hadoop release name.

Background

Apache Kylin is an open source Distributed Analytics Engine designed to provide SQL interface and multi-dimensional analysis (OLAP) on Hadoop supporting extremely large datasets. For more information you can visit Kylin home page at http://kylin.apache.org

Usually Kylin is deployed in a dedicated Hadoop client node, on which the Hadoop, HBase, Hive and other clients have been properly configured to communicate with the cluster; Kylin will use the client Jars and configuration files to work with other components; Besides, all Kylin metadata and cube data are persistended in HBase and HDFS, not in local, so all these make it very reasonable to build Kylin as a docker image for quick deploy.

How to make it

The main idea is building Hadoop/HBase/Hive clients and Kylin binary package into one image; User can pull this image, and then just add client configuration files like core-site.xml, hdfs-site.xml, yarn-site.xml, hbase-site.xml, hive-site.xml and kylin.properties to the effective paths to make a new image (has verified), or upload these files during starting up (not verified yet);

Before start, you need do some preparations:

  • check the Hadoop versions, and make sure the client libs in the image are compitable with the cluster;
  • prepare a kylin.properties file for this deployment;
  • ensure the Hadoop security constraint will not block Docker's adoption; you may need run additional component in the container if kerberos is enabled.

Steps

Below is a sample of building and running a docker image for Hortonworks HDP 2.2 cluster.

  1. Collect the client configuration files Get the *-site.xml files from a working Hadoop client node, to a local folder say "~/hadoop-conf/";

  2. Prepare kylin.properties

The kylin.properties file is the main configuration file for Kylin; you need prepare such a file and put it to the "~/hadoop-conf/" folder, together with other conf files; suggest to double check the parameters in it; e.g, the "kylin.metadata.url" points to the right metadata table, "kylin.hdfs.working.dir" is an existing HDFS folder and you have permission to write, etc.

  1. Clone this repository, checkout the correct branch;
git clone https://github.com/Kyligence/kylin-docker  
cd kylin-docker  
git checkout kylin152-hdp22  
  1. Copy the client configuration files to "kylin-docker/conf" folder, overwriting those template files;
cp -rf ~/hadoop-conf/* conf/
  1. Build docker image, which may take a while, just take a cup of tea;
docker build -t kyligence/kylin:152 .

After the build finished, should be able to see the image with "docker images" commmand;

[root@ip-10-0-0-38 ~]# docker images  
REPOSITORY          TAG                 IMAGE ID            CREATED             VIRTUAL SIZE  
kyligence/kylin     152                 7ece32097fa3        About an hour ago   3.043 GB  
  1. Now you can run a contianer with the bootstrap command (in which will start Kylin server). The "-bash" argument is telling to keep in bash so you can continue to run bash commands; If don't need, you can use the "-d" argument:
[root@ip-10-0-0-38 ~]# docker run -i -t -p 7070:7070 kyligence/kylin:152 /etc/bootstrap.sh -bash  
Generating SSH1 RSA host key:                              [  OK  ]  
Starting sshd:                                             [  OK  ]  
KYLIN_HOME is set to /usr/local/kylin/bin/../  
kylin.security.profile is set to ldap  
16/06/30 04:50:31 WARN conf.HiveConf: HiveConf of name hive.optimize.mapjoin.mapreduce does not exist  
16/06/30 04:50:31 WARN conf.HiveConf: HiveConf of name hive.heapsize does not exist  
16/06/30 04:50:31 WARN conf.HiveConf: HiveConf of name hive.server2.enable.impersonation does not exist  
16/06/30 04:50:31 WARN conf.HiveConf: HiveConf of name hive.auto.convert.sortmerge.join.noconditionaltask does not exist  
Logging initialized using configuration in file:/etc/hive/conf/hive-log4j.properties  
HCAT_HOME not found, try to find hcatalog path from hadoop home  
A new Kylin instance is started by , stop it using "kylin.sh stop"  
Please visit http://<ip>:7070/kylin  
You can check the log at /usr/local/kylin/bin/..//logs/kylin.log  
  1. After a minute, you can open web browser with address http://host:7070/kylin , here the "host" is the hostname or IP address of the hosting machine which runs Docker; Its 7070 port will redirect to the contianer's 7070 port as we specified in the "docker run" command; You can change to other port as you like.

  2. Now you can use Kylin as usually: import Hive tables, design cubes, build, query, etc.

Thanks

Thanks to SequenceIQ's hadoop-docker and other projects, which inspires us on developing this.

kylin-docker's People

Contributors

shaofengshi avatar zhaoyongjie avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.