
logeventsprocessing's Introduction

Description:

A log processing storm topology that, from http logs:

  1. counts the log events per minute
  2. counts the response status codes
  3. finds out which country & city each request is coming from

and persists the resulting events. This is an example storm topology illustrating the integration between storm, kafka, logstash and cassandra.
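The first two counters can be illustrated outside of storm. The following is a minimal sketch in Python, not the topology's actual bolt code: it assumes the events are plain Apache common-log lines (in the real pipeline logstash wraps each line in a JSON event), and the names `LOG_PATTERN`, `minute_bucket` and `count_events` are hypothetical.

```python
import re
from collections import Counter

# Assumed input: Apache common log format,
# host ident user [timestamp] "request" status size
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)


def minute_bucket(timestamp):
    """Truncate '06/Dec/2013:03:22:23 +0000' to '06/Dec/2013:03:22'."""
    date_and_time = timestamp.split(" ")[0]   # drop the UTC offset
    return date_and_time.rsplit(":", 1)[0]    # drop the seconds


def count_events(lines):
    """Return (events per minute, events per status code) as Counters."""
    per_minute = Counter()
    per_status = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that are not in the assumed format
        per_minute[minute_bucket(match.group("ts"))] += 1
        per_status[match.group("status")] += 1
    return per_minute, per_status
```

In the topology itself these counts are kept in cassandra rather than in memory; the sketch only shows the bucketing idea.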

Storm Topology

Dependencies:

This storm topology depends on the following components:

  1. logstash, which aggregates logs on individual machines and ships them off to kafka
  2. kafka, which acts as a message queue from which storm fetches the log events
  3. cassandra, which storm uses to persist event counters such as logEventsPerMinute and logResponseCodes

Configuring Dependencies:

LogStash:

logstash does not support kafka input/output types yet, so get a build that does, using either of the following two options:

  1. Install logstash with a branch that supports kafka:
git clone https://github.com/ashrithr/logstash.git
cd logstash
git checkout feature/kafka

rvm install jruby-1.7.2
rvm use jruby-1.7.2

ruby gembag.rb logstash.gemspec #to install ruby dependencies

make #to create logstash jar
  2. Use the already built jar with kafka support, found in this project's root: logstash-1.1.10.dev-monolithic.jar

Configure logstash agent

Then configure logstash to ship the logs, using the configuration below as a baseline:

shipper.conf

input {
  file {
    type => "syslog"
    path => "/tmp/apache.log"
    debug => true
  }
}

filter {
  multiline {
    type => "syslog"
    pattern => "^\t"
    what => "previous"
  }
}

output {
  stdout {
    debug => true
    debug_format => "json"
  }
  kafka {
    host => "127.0.0.1"
    port => 9092
    topic => "logstash"
  }
}
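The multiline filter in this configuration folds any line that starts with a tab into the event before it (`what => "previous"`). A minimal Python sketch of that behaviour, for illustration only: `merge_multiline` is a hypothetical helper, not part of logstash.

```python
import re


def merge_multiline(lines, pattern=r"^\t"):
    """Fold each line matching `pattern` into the event before it."""
    continuation = re.compile(pattern)
    events = []
    for line in lines:
        if events and continuation.search(line):
            # Continuation line: append it to the previous event.
            events[-1] += "\n" + line
        else:
            # Any other line (or a continuation with nothing before it)
            # starts a new event.
            events.append(line)
    return events
```

With this rule, a log entry followed by tab-indented continuation lines reaches kafka as a single event instead of several.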

How to run in local mode:

  1. Start a local instance of kafka & zookeeper (installation instructions are available in the kafka documentation)
${KAFKA_HOME}/bin/zookeeper-server-start.sh ${KAFKA_HOME}/config/zookeeper.properties
${KAFKA_HOME}/bin/kafka-server-start.sh ${KAFKA_HOME}/config/server.properties
  2. Start a local instance of logstash
java -jar logstash-<version>-monolithic.jar agent -f shipper.conf #if using jar
${LOGSTASH_HOME}/bin/logstash agent -f shipper.conf #if using source
  3. Generate mock random apache logs and write them to /tmp/apache.log, the file that shipper.conf reads

  4. Start a local cassandra instance (installation instructions are available in the cassandra documentation)

${CASSANDRA_HOME}/bin/cassandra -f
  5. Create the cassandra keyspace and column families from the file resources/cassandra_schema.txt:
${CASSANDRA_HOME}/bin/cassandra-cli -host localhost -port 9160 -f resources/cassandra_schema.txt
  6. Finally, run the storm topology in LocalCluster mode
mvn compile exec:java -Dexec.classpathScope=compile -Dexec.mainClass=com.cloudwick.log.LogTopology
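For the mock log generation step above, any script that appends Apache-style lines to the file logstash reads will do. The following Python sketch is one possible stand-in, not the generator this project originally linked to. The helper names, the paths and the status code mix are all assumptions made for illustration.

```python
import random
from datetime import datetime, timezone

# Hypothetical sample values, chosen only for illustration.
STATUS_CODES = [200, 200, 200, 301, 404, 500]
PATHS = ["/", "/index.html", "/login", "/images/logo.png"]
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def random_log_line(rng, now):
    """Build one Apache common-log line with a random IP, path and status."""
    ip = ".".join(str(rng.randint(1, 254)) for _ in range(4))
    timestamp = "%02d/%s/%d:%02d:%02d:%02d +0000" % (
        now.day, MONTHS[now.month - 1], now.year,
        now.hour, now.minute, now.second)
    path = rng.choice(PATHS)
    status = rng.choice(STATUS_CODES)
    size = rng.randint(100, 5000)
    return '%s - - [%s] "GET %s HTTP/1.1" %d %d' % (
        ip, timestamp, path, status, size)


def append_logs(log_path, count, rng=None):
    """Append `count` random lines to `log_path`, stamped with the current UTC time."""
    rng = rng or random.Random()
    with open(log_path, "a") as log_file:
        for _ in range(count):
            now = datetime.now(timezone.utc)
            log_file.write(random_log_line(rng, now) + "\n")
```

Calling `append_logs("/tmp/apache.log", 100)` while logstash is running should push 100 events through to kafka, assuming the shipper.conf shown earlier.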

Component versions tested on:

  • kafka - 0.7.2
  • storm - 0.8.2
  • cassandra - 1.0.12
  • storm-kafka - 0.8.0-wip4

logeventsprocessing's People

Contributors

ashrithr


logeventsprocessing's Issues

Error when logstash ships data to kafka

The logstash error message is as follows:

Output thread exception {:plugin=><LogStash::Outputs::Kafka host=>"127.0.0.1", topic=>"logstash">, :exception=>#<Kafka::SocketError: cannot write: Broken pipe>, :backtrace=>["/home/csy/dev/src/logstash/logstash/vendor/bundle/jruby/1.9/gems/kafka-rb-0.0.12/lib/kafka/io.rb:53:in `rescue in write'", "/home/csy/dev/src/logstash/logstash/vendor/bundle/jruby/1.9/gems/kafka-rb-0.0.12/lib/kafka/io.rb:49:in `write'", "/home/csy/dev/src/logstash/logstash/vendor/bundle/jruby/1.9/gems/kafka-rb-0.0.12/lib/kafka/producer.rb:32:in `send'", "/home/csy/dev/src/logstash/logstash/lib/logstash/outputs/kafka.rb:36:in `receive'", "/home/csy/dev/src/logstash/logstash/lib/logstash/outputs/base.rb:55:in `handle'", "/home/csy/dev/src/logstash/logstash/lib/logstash/agent.rb:765:in `run_output'", "/home/csy/dev/src/logstash/logstash/lib/logstash/agent.rb:386:in `block in start_output'"], :level=>:warn}

[2013-12-06 03:22:23,851] ERROR Closing socket for /127.0.0.1 because of error (kafka.network.Processor)
java.nio.BufferUnderflowException
at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:127)
at java.nio.ByteBuffer.get(ByteBuffer.java:675)
at kafka.api.ApiUtils$.readShortString(ApiUtils.scala:38)
at kafka.api.ProducerRequest$.readFrom(ProducerRequest.scala:33)
at kafka.api.RequestKeys$$anonfun$1.apply(RequestKeys.scala:34)
at kafka.api.RequestKeys$$anonfun$1.apply(RequestKeys.scala:34)
at kafka.network.RequestChannel$Request.<init>(RequestChannel.scala:49)
at kafka.network.Processor.read(SocketServer.scala:353)
at kafka.network.Processor.run(SocketServer.scala:245)
at java.lang.Thread.run(Thread.java:619)

My config is:

output{
kafka {
host => "127.0.0.1"
port => 9092
topic => "logstash"
}
}
