Origin Security - Telemetry Pipeline

Security organisations are continuously challenged to collect more logs from more devices, a problem that is typically solved with a Security Information and Event Management (SIEM) platform.

Many traditional SIEMs try to solve every problem in a single product. At Origin, we decided to break that model into discrete components and combine the best free and open source software with cloud services, so we can run our SIEM more efficiently and at lower cost while avoiding vendor lock-in.

  1. Shipping and Parsing:
    • We use Elastic Beats and Logstash, plus cloud-native pipelines where they make sense (for example, for AWS CloudTrail or VPC Flow Logs).
  2. Analytics:
    • We send only the subset of logs we need for day-to-day operations and alerting to Splunk.
    • We use Amazon Athena to query historical logs directly from the archive, along with any sources that aren't in Splunk.
  3. Archive:
    • We partition and compress our logs in Logstash before storing them in Amazon S3 for low-cost, long-term retention.
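To make the partition-then-compress idea concrete, here is a minimal Python sketch. It assumes each event is a dict with a Unix `timestamp` and a `source` field, and it uses a Hive-style `key=value` layout (`logs/source=<source>/year=<YYYY>/month=<MM>/day=<DD>/hour=<HH>`) so that Athena can skip partitions a query does not need. The field names, the `logs` root and the layout are illustrative assumptions, not this repo's actual Logstash configuration.

```python
import gzip
import json
from datetime import datetime, timezone


def partition_prefix(event: dict, root: str = "logs") -> str:
    """Build a Hive-style S3 key prefix from an event's time and source.

    Assumes hypothetical `timestamp` (Unix seconds) and `source` fields.
    """
    ts = datetime.fromtimestamp(event["timestamp"], tz=timezone.utc)
    return (
        f"{root}/source={event['source']}/"
        f"year={ts:%Y}/month={ts:%m}/day={ts:%d}/hour={ts:%H}"
    )


def compress_events(events: list[dict]) -> bytes:
    """Serialise events as newline-delimited JSON, then gzip the result."""
    body = "\n".join(json.dumps(e, separators=(",", ":")) for e in events)
    return gzip.compress(body.encode("utf-8"))


event = {"timestamp": 1700000000, "source": "cloudtrail", "action": "ConsoleLogin"}
print(partition_prefix(event))  # logs/source=cloudtrail/year=2023/month=11/day=14/hour=22
blob = compress_events([event])
print(len(blob), "bytes")
```

Storing one gzipped object per prefix keeps S3 storage small, and (assuming the Athena table is declared with matching partition columns) lets a query filtered on source and time scan only the relevant objects.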

This repo contains:

  • An AWS CDK app that helps provision AWS Fargate and the associated AWS components needed to run Logstash without managing servers.
  • Two Docker images and their associated Logstash configuration files to get you started.

Architecture

  1. We run a separate Logstash pipeline for each log source we ingest, each as its own microservice (a listener service) on Fargate.
  2. The listener services push events into a central Amazon Kinesis data stream, which acts as a buffer.
  3. A processor service, which is also Logstash running on Fargate, pulls events from the data stream in batches and processes them.
    • This service parses any unstructured events (typically from syslog sources), partitions events by time and event attributes, and compresses each partitioned batch before uploading it to S3.
    • It also filters a subset of the event stream off to a Splunk Universal Forwarder (not included in this stack).
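The processor's batch handling can be approximated in a few lines of Python. This is a hedged sketch, not the project's Logstash pipeline: it assumes events arrive as dicts with `timestamp` and `source` fields, that every event is archived, and that a hypothetical `SPLUNK_SOURCES` allow-list decides which events are also forwarded to Splunk. Syslog parsing and the actual S3 upload are left out.

```python
import gzip
import json
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical routing rule: events from these sources also go to Splunk.
SPLUNK_SOURCES = {"cloudtrail", "firewall"}


def partition_prefix(event: dict) -> str:
    """Hive-style prefix by source and UTC hour (illustrative layout)."""
    ts = datetime.fromtimestamp(event["timestamp"], tz=timezone.utc)
    return (
        f"source={event['source']}/"
        f"year={ts:%Y}/month={ts:%m}/day={ts:%d}/hour={ts:%H}"
    )


def process_batch(batch: list[dict]) -> tuple[dict[str, bytes], list[dict]]:
    """Handle one batch of events pulled from the stream.

    Returns:
      archive: partition prefix -> gzipped newline-delimited JSON (bound for S3)
      splunk:  the subset of events to hand to the Splunk forwarder
    """
    groups: dict[str, list[dict]] = defaultdict(list)
    splunk: list[dict] = []
    for event in batch:
        # Every event is archived under its time/source partition.
        groups[partition_prefix(event)].append(event)
        # Only allow-listed sources are also forwarded to Splunk.
        if event["source"] in SPLUNK_SOURCES:
            splunk.append(event)

    # Compress each partition's events separately, one object per prefix.
    archive = {
        prefix: gzip.compress("\n".join(json.dumps(e) for e in events).encode("utf-8"))
        for prefix, events in groups.items()
    }
    return archive, splunk


batch = [
    {"timestamp": 1700000000, "source": "cloudtrail", "msg": "a"},
    {"timestamp": 1700000100, "source": "cloudtrail", "msg": "b"},
    {"timestamp": 1700003600, "source": "dns", "msg": "c"},
]
archive, splunk = process_batch(batch)
for prefix, blob in archive.items():
    print(prefix, len(blob), "bytes")
print(len(splunk), "events forwarded to Splunk")
```

Grouping before compressing means each object holds events from a single partition, which is what makes partition pruning possible later; the Splunk decision is made independently per event, so it does not affect what gets archived.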

[Diagram: pipeline components and their corresponding stacks]

Cost

AWS pricing can vary significantly from region to region. You must review and understand the cost of the resources defined in the CloudFormation templates this app outputs before you deploy them.

Your cost will depend on how many services you deploy, the size and number of tasks you run for each, and the region you deploy to.

For general guidance only: Origin Security's own telemetry pipeline costs around USD 800/month to run in the Sydney region and handles around 400,000,000 events/day, with regular peaks exceeding 10,000 events/second.
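The figures above can be turned into rough per-unit rates. The Python sketch below assumes a 30-day month and treats the quoted numbers as exact; with those inputs it works out to an average of roughly 4,630 events/second (so the quoted peaks are a little over twice the average) and about USD 0.07 per million events. These are derived estimates for sizing your own deployment, not measured costs.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def pipeline_rates(events_per_day: float, usd_per_month: float,
                   peak_eps: float, days_per_month: float = 30.0) -> dict[str, float]:
    """Derive average throughput, peak headroom and unit cost.

    Assumes a fixed-length month (30 days by default).
    """
    avg_eps = events_per_day / SECONDS_PER_DAY
    events_per_month = events_per_day * days_per_month
    return {
        "avg_events_per_sec": avg_eps,
        "peak_to_avg_ratio": peak_eps / avg_eps,
        "usd_per_million_events": usd_per_month / (events_per_month / 1e6),
    }


# Inputs taken from the figures quoted above.
rates = pipeline_rates(events_per_day=400_000_000, usd_per_month=800, peak_eps=10_000)
for name, value in rates.items():
    print(f"{name}: {value:,.3f}")
```

The peak-to-average ratio is the number worth watching when sizing tasks: the Kinesis buffer absorbs short bursts, but sustained peaks need enough processor capacity behind it.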

Getting Started

Review the documentation to get started.

License

The code in this repo that helps you build the required AWS infrastructure, along with the sample Logstash configuration files, is licensed under the MIT license. Logstash itself is not.

Refer to https://github.com/elastic/logstash/blob/master/LICENSE.txt for details on Logstash's license.

Contributors

glennbolton
