
hive-flume-root's Introduction

Analysing Collected Server Log Data

This example demonstrates the following steps, in order:

  • Generate log data in the Hortonworks Sandbox
  • Load the log data into Hadoop with Flume each time it changes
  • Fetch the data with a Python Hive client and visualize the analysis in Root

Inspired by the official Hortonworks example: How to Refine and Visualize Server Log Data

Install and Run Flume

I use WinSCP to access the server because it makes the file layout easier to follow. You can find the server's IP address with the ifconfig command.

[Image: win_scp]

To install Flume, run the command below. In my case it was already installed the first time I started the virtual machine.

yum install -y flume

Put the configuration file into the /etc/flume/conf/ directory, then start Flume with the command below. Using PuTTY is optional.

flume-ng agent -c /etc/flume/conf -f /etc/flume/conf/flume.conf -n sandbox

According to our flume.conf configuration file, sandbox is the agent name and the source is /var/log/eventlog-demo.log, so Flume listens for changes to this log. When the source receives an event, it stores the event in the channel, which is also defined in the configuration file. The channel keeps the event until it is consumed by the Flume sink. The sink removes the event from the channel and, in this example, puts it into HDFS.
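To make the source, channel, and sink roles easier to picture, here is a toy model in plain Python. It is only a sketch of the hand-off described above and does not use Flume's API; the class and function names, and the list standing in for HDFS, are made up for illustration.

```python
from collections import deque


class Channel:
    """Toy stand-in for a Flume channel: buffers events between source and sink."""

    def __init__(self):
        self._events = deque()

    def put(self, event):
        self._events.append(event)

    def take(self):
        # Taking an event removes it from the channel, as the sink does.
        return self._events.popleft() if self._events else None

    def __len__(self):
        return len(self._events)


def source_receive(channel, log_line):
    """Source side: each new log line becomes one event stored in the channel."""
    channel.put(log_line.rstrip("\n"))


def sink_drain(channel, destination):
    """Sink side: move every buffered event to the destination."""
    moved = 0
    while True:
        event = channel.take()
        if event is None:
            break
        destination.append(event)
        moved += 1
    return moved


channel = Channel()
hdfs_stand_in = []  # hypothetical: a list playing the role of HDFS

source_receive(channel, "first line\n")
source_receive(channel, "second line\n")
buffered = len(channel)  # events wait here until the sink consumes them
moved = sink_drain(channel, hdfs_stand_in)
print(buffered, moved, len(channel), hdfs_stand_in)
```

The point of the channel is that the source and the sink never talk to each other directly, so a slow sink only makes the buffer grow instead of blocking the source.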

Generating Server Log

After copying the generate_logs.py file to the server, generate a new log line with the python generate_logs.py command. I made this script write a single line so that beginners can follow each event easily; a real-world example would create much larger logs.

[Image: generate_logs]

Our generate_logs.py file is now on the server. All we have to do is run it with the python command.

[Image: generate_log_data]
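For readers who want a rough idea of what a one-line generator could look like, here is a minimal sketch. It is not the actual generate_logs.py: the time|ip|country layout is assumed from the COUNTRY_LOGS table created in the next section, and the sample countries, function names, and the temporary demo path are hypothetical.

```python
import os
import random
import tempfile
import time

# Assumed sample values; the real script may pick different countries.
SAMPLE_COUNTRIES = ["TR", "US", "DE", "FR"]


def make_log_line(now=None):
    """Build one 'time|ip|country' line (field order assumed from COUNTRY_LOGS)."""
    timestamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    ip = ".".join(str(random.randint(1, 254)) for _ in range(4))
    country = random.choice(SAMPLE_COUNTRIES)
    return "|".join([timestamp, ip, country])


def append_log_line(path, line):
    """Append a single line to the log file that Flume is watching."""
    with open(path, "a") as log_file:
        log_file.write(line + "\n")


# Demo against a temporary file; on the sandbox the path would be
# /var/log/eventlog-demo.log, the source named in the Flume section.
demo_path = os.path.join(tempfile.mkdtemp(), "eventlog-demo.log")
line = make_log_line()
append_log_line(demo_path, line)
print(line)
```

Opening the file in append mode matters here: rewriting the whole file on every run would make it harder to see which change produced which event.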

Creating HCatalog Table

The next step is to create the table that stores the logs.

hcat -e "CREATE TABLE COUNTRY_LOGS(time STRING, ip STRING, country STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LOCATION '/flume/events';"
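As a rough illustration of what FIELDS TERMINATED BY '|' means for each stored line, the sketch below splits one line into the three columns in Python. The sample line is made up, and the padding of missing fields only approximates how a delimited text table treats short or long rows.

```python
from collections import namedtuple

# Column names and order copied from the CREATE TABLE statement above.
CountryLog = namedtuple("CountryLog", ["time", "ip", "country"])


def parse_row(line, delimiter="|"):
    """Split one stored line into the three STRING columns of COUNTRY_LOGS."""
    fields = line.rstrip("\n").split(delimiter)
    # Pad missing trailing fields with None and drop any extra fields.
    fields = (fields + [None] * 3)[:3]
    return CountryLog(*fields)


row = parse_row("2014-05-01 10:15:00|10.0.0.1|TR\n")  # made-up sample line
print(row.country)
```

Because the table only points at the /flume/events location, no separate load step is needed: every line Flume writes there is read through this column layout at query time.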

[Image: table_creation_error_and_success]

As you can see from the image above, I got an error at first. The command succeeded after I ran usermod -aG hdfs hue and/or usermod -aG hdfs root on the server and restarted the services.

[Image: hcat_country_logs_table]

Eventually, the data resides peacefully in HDFS...

Fetching Data for Visualization with Python Client for Hive

The Python client pyhs2 executes a query via the HiveServer2 Thrift API and then fetches the query result. Finally, the result is written to data files. Those data files are possible candidates as input for a Root histogram.

To install the pyhs2 dependencies, run the python setup.py install command in the ./pyhs2 directory.

In the root-analysis directory, run the python generate_data.py command. Once it finishes, the data files are in the ./data directory.
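The query-then-store flow can be sketched roughly as follows. This is not the actual generate_data.py: the host, port, and credentials are placeholders, the per-country count is just one example of a query result worth plotting, and the helper names and output layout are assumptions. The pyhs2 import happens inside fetch_rows so that the offline demo at the bottom runs without a Hive server.

```python
import os
import tempfile
from collections import Counter


def fetch_rows(query, host="localhost", port=10000, user="root", password="changeme"):
    """Run a query through HiveServer2 and return the rows as a list.

    Host, port, and credentials are placeholders; adjust them for your sandbox.
    """
    import pyhs2  # imported lazily so the helpers below work without it

    with pyhs2.connect(host=host, port=port, authMechanism="PLAIN",
                       user=user, password=password, database="default") as conn:
        with conn.cursor() as cur:
            cur.execute(query)
            return cur.fetch()


def count_by_country(rows):
    """Count rows per country; each row is expected as (time, ip, country)."""
    return Counter(row[2] for row in rows)


def write_counts(counts, path):
    """Write one 'country count' pair per line, sorted by country code."""
    with open(path, "w") as out:
        for country, count in sorted(counts.items()):
            out.write("%s %d\n" % (country, count))


# Offline demo with made-up rows; on the sandbox they would come from
# fetch_rows("SELECT time, ip, country FROM country_logs").
sample_rows = [["t1", "10.0.0.1", "TR"], ["t2", "10.0.0.2", "US"], ["t3", "10.0.0.3", "TR"]]
counts = count_by_country(sample_rows)
out_path = os.path.join(tempfile.mkdtemp(), "countries.dat")
write_counts(counts, out_path)
```

Keeping the aggregation separate from the connection code makes it possible to check the output format locally before pointing the script at the sandbox.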

Visualization with Root

CERN's Root is a great framework for data analysis, especially for scientific analyses. Running the root htraffic.C command displays our histogram.

[Image: root_analysis]
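If you prefer to stay in Python, a similar histogram could probably be drawn with PyROOT instead of the htraffic.C macro. The sketch below is an alternative under assumptions, not a translation of the macro: it assumes a data file with one "label count" pair per line, and the file name, histogram title, and function names are hypothetical. ROOT is imported only inside draw_histogram, so the parsing demo runs without it.

```python
import os
import tempfile


def read_counts(path):
    """Read 'label count' pairs, one per line (an assumed file layout)."""
    pairs = []
    with open(path) as data_file:
        for line in data_file:
            parts = line.split()
            if len(parts) == 2:
                pairs.append((parts[0], int(parts[1])))
    return pairs


def draw_histogram(pairs, image_path):
    """Fill a labelled histogram and save it as an image (requires PyROOT)."""
    import ROOT  # imported lazily; only needed for the drawing step

    ROOT.gROOT.SetBatch(True)  # render without opening a window
    hist = ROOT.TH1F("htraffic", "Log lines per label", len(pairs), 0, len(pairs))
    for index, (label, count) in enumerate(pairs, start=1):
        hist.GetXaxis().SetBinLabel(index, label)
        hist.SetBinContent(index, count)
    canvas = ROOT.TCanvas("c", "c")
    hist.Draw()
    canvas.SaveAs(image_path)


# Offline demo of the parsing step with made-up data.
demo_file = os.path.join(tempfile.mkdtemp(), "countries.dat")
with open(demo_file, "w") as handle:
    handle.write("TR 2\nUS 1\n")
pairs = read_counts(demo_file)
```

On a machine with PyROOT installed, calling draw_histogram(pairs, "htraffic.png") would write the plot to an image file.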


References

  1. Flume User Guide
  2. How to Refine and Visualize Server Log Data
