pfilourenco / security-analytics-accelerators

This project forked from googlecloudplatform/aicoe

This repository contains an end-to-end walkthrough to leverage Google Cloud services to demonstrate a Solution Accelerator for Security Analytics.

License: Apache License 2.0

Shell 10.39% Python 52.45% PureBasic 0.53% Smarty 10.51% HCL 26.11%

Solution Accelerator for Security Analytics

This repository contains an end-to-end walkthrough that uses Google Cloud services to demonstrate a Solution Accelerator for Security Analytics. The Solution Accelerator covers ingesting real-time (streaming) and batch data into BigQuery, developing models with Vertex AI and BQML to detect and report anomalies that indicate security attacks, and visualizing the details in Looker Studio. A subset of public datasets and generated data is used to simulate the flow.

Introduction

The Solution Accelerator creates the core Google Cloud infrastructure for a data ingestion pipeline that supports both batch and real-time data and applies transformations using the out-of-the-box templates. The data then flows through an MLOps pipeline that covers data collection, processing, modeling, anomaly detection, and visualization. These key elements can be applied in the Security and Analytics domain to the use cases below.

  • Analyzing network traffic to identify patterns that indicate a potential attack
  • Detecting insider threats or malicious activity
  • Incident response and forensics (using logs)
  • Managing third- and fourth-party vendor risk
  • User behavior analysis
  • Data exfiltration detection (perhaps in conjunction with VPC Service Controls)

High Level Architecture

The labelled numbers (1-6) in the diagram correspond to the sprints, each of which explains one stage of the data journey in detail.

[Figure: HighLevelFlow]

  • Sprint 1 - Realtime Ingestion: Google Cloud PubSub is used to stream data in real time to BigQuery in JSON log format

  • Sprint 2 - Enrichment: Dataflow is used with PubSub to stream data to BigQuery

  • Sprint 3 - Feature Store: Dataflow is used to store data from Google Cloud Storage into Vertex AI FeatureStore.

  • Sprint 4 - Anomaly detection: Anomaly detection is demonstrated using FeatureStore, AutoML, and Vertex AI Model Registry

  • Sprint 5 - BigQueryML: Data stored in BigQuery is leveraged to develop a BigQueryML model for Anomaly detection

  • Sprint 6 - Visualization: An anomaly detection dashboard is developed using Looker Studio that shows the various data paths and trigger patterns. Custom dashboards can be developed depending on the use case.

The data journey above, from log ingestion through log enrichment to inference, can be walked through using different data sets.

Tech Stack

  • Python 3.7
  • Terraform / HCL (HashiCorp Configuration Language)
  • Shell scripting
  • Google Cloud services

Hands-on

Bootstrap

This is the first step. It creates the foundational infrastructure needed by the remaining sprints. Click here for instructions.

Note: This is a mandatory pre-step.

Realtime Ingestion

This sprint reads data from a file to simulate a real-time feed, publishes it to a Cloud PubSub topic, and stores it in a BigQuery table. Ingestion from Cloud PubSub to BigQuery is done via a PubSub BigQuery subscription. Click here for instructions.
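
The file replay described above can be sketched roughly as follows. This is a minimal illustration, not the repository's script: the function name `replay_records`, the record fields, and the in-memory publisher are all hypothetical. It assumes the input is one JSON object per line and takes the publish step as a plain callable so the sketch runs without cloud credentials.

```python
import json
import time
from typing import Callable, Iterable, List


def replay_records(
    lines: Iterable[str],
    publish: Callable[[bytes], None],
    delay_seconds: float = 0.0,
) -> int:
    """Publish each non-empty JSON line as one message; return the count sent."""
    sent = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the input
        record = json.loads(line)  # fail early on malformed JSON
        publish(json.dumps(record).encode("utf-8"))
        sent += 1
        if delay_seconds > 0:
            time.sleep(delay_seconds)  # pace messages to mimic a live stream
    return sent


# Stand-in publisher that collects messages in memory (an assumption for
# this sketch). With the google-cloud-pubsub client, the callable would
# instead wrap PublisherClient.publish(topic_path, data).
outbox: List[bytes] = []
sample_lines = [
    '{"src_ip": "10.0.0.5", "bytes_sent": 512}',  # hypothetical fields
    "",
    '{"src_ip": "8.8.8.8", "bytes_sent": 90210}',
]
sent_count = replay_records(sample_lines, outbox.append)
print(sent_count, "messages published")
```

Passing the publisher in as a callable keeps the replay logic testable on its own; a small `delay_seconds` value spaces the messages out so the downstream table fills gradually, as it would with live traffic.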

Data Enrichment

This sprint reads data from a file to simulate a real-time feed, publishes it to a Cloud PubSub topic, and stores it in a BigQuery table. Ingestion from Cloud PubSub to BigQuery is done via Dataflow, which also enriches the data. Click here for instructions.
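
To make the enrichment step concrete, here is a minimal sketch of a per-message enrichment function. The README does not say which fields the pipeline adds, so the `src_ip` input field and the `src_is_private` and `ingest_time` output fields are assumptions chosen for illustration. In an Apache Beam pipeline running on Dataflow, a function of this shape would typically be applied to each element, for example with `beam.Map`.

```python
import ipaddress
import json
from datetime import datetime, timezone
from typing import Any, Dict


def enrich(message: bytes, now: datetime) -> Dict[str, Any]:
    """Decode one JSON message and add derived fields (hypothetical ones)."""
    record = json.loads(message.decode("utf-8"))
    address = ipaddress.ip_address(record["src_ip"])
    enriched = dict(record)  # copy so the input record is left untouched
    enriched["src_is_private"] = address.is_private
    enriched["ingest_time"] = now.isoformat()
    return enriched


# A fixed timestamp keeps the example deterministic.
fixed_now = datetime(2023, 8, 1, 12, 0, tzinfo=timezone.utc)
row = enrich(b'{"src_ip": "10.0.0.5", "bytes_sent": 512}', fixed_now)
print(row)
```

Taking the timestamp as a parameter rather than calling the clock inside the function keeps the transformation pure, which makes it easier to test outside the pipeline.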

Feature Store

This sprint shows a feature engineering platform for Security Analytics. The milestone involves building an enrichment pipeline in which a Dataflow job reads data from GCS and writes it to Vertex AI Feature Store. Click here for instructions.

Anomaly Detection

This sprint demonstrates anomaly detection using FeatureStore, AutoML, and Vertex AI Model Registry. Click here for instructions.

BigQuery ML

This sprint uses data from the streaming and batch datasets to train a K-Means clustering model. Anomaly detection is demonstrated, and the results are stored in a BigQuery table. Alerts for all anomalies are sent using PubSub. Click here for instructions.
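
One common way to turn a K-Means model into an anomaly detector is to flag points that lie far from every cluster centroid. The sketch below illustrates that idea in plain Python; it is not the BigQuery ML model from this sprint, and the centroids, points, threshold, and the name `flag_anomalies` are made up for the example. Whether the repository applies exactly this rule is an assumption.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def flag_anomalies(
    points: Sequence[Point],
    centroids: Sequence[Point],
    threshold: float,
) -> List[Tuple[Tuple[float, ...], float, bool]]:
    """Return (point, distance to nearest centroid, is_anomaly) per point."""
    results = []
    for point in points:
        distance = min(euclidean(point, c) for c in centroids)
        results.append((tuple(point), distance, distance > threshold))
    return results


# Hypothetical centroids, as if taken from an already trained model.
centroids = [(0.0, 0.0), (10.0, 10.0)]
points = [(0.0, 1.0), (10.0, 9.0), (5.0, 5.0)]
results = flag_anomalies(points, centroids, threshold=3.0)
anomalies = [p for p, _, is_anomaly in results if is_anomaly]
print(anomalies)
```

Here the point midway between the two clusters is the only one flagged, because it is far from both centroids. In practice the threshold is usually derived from the data, for example from an expected fraction of anomalous records, rather than fixed by hand.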

Visualization

This sprint demonstrates a dashboard developed using Looker Studio that shows the various data paths and trigger patterns of Anomaly detection. Click here for instructions.

Cleanup

For resources created and managed by Terraform, execute terraform destroy in reverse order. For resources created and managed outside of Terraform (created by the pipelines and predictions / models), execute the relevant scripts from the utils directory. Click here for instructions.
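
The reverse-order teardown can be sketched as a small helper that builds one destroy command per Terraform directory, last applied first. The directory names and the helper `destroy_commands` are hypothetical, and reading "reverse order" as the reverse of the order in which the sprints were applied is an assumption. The sketch only prints the commands (a dry run) and does not execute them.

```python
from typing import List, Sequence


def destroy_commands(apply_order: Sequence[str]) -> List[List[str]]:
    """Build one `terraform destroy` command per directory, last applied first."""
    return [
        ["terraform", f"-chdir={directory}", "destroy", "-auto-approve"]
        for directory in reversed(apply_order)
    ]


# Hypothetical directory names, listed in the order they were applied.
applied = ["bootstrap", "realtime-ingestion", "data-enrichment"]
commands = destroy_commands(applied)
for command in commands:
    print(" ".join(command))  # dry run: print instead of executing
```

To actually run the teardown, each command list could be passed to `subprocess.run(command, check=True)`. Note that `-auto-approve` skips Terraform's confirmation prompt; drop it to review each destroy plan first.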

Versioning

Initial version: August 2023

Code of Conduct

View

Contributing

View

License

View
