
Anomaly Detector

This document describes how we detect anomalies using RIPE Atlas DNS CHAOS measurements.

Input files:

  • Ground truth: data/k-root-ddos-20151130.csv : this file covers k-root during the DDoS of Nov. 30th, 2015, when we know for sure there were anomalies, as described in the Root DDoS paper.
  • Random day: data/k-root-20170401.csv

Possible algorithms

There are many possible algorithms for time series anomaly detection. Most of them rely on periodic data to operate. In this hackathon, we used three algorithms:

  1. Twitter's Robust Time Series Anomaly Detection
  2. ARIMA
  3. An ad-hoc algorithm derived from our experience
  4. others?

We have run both ARIMA and Twitter's algorithm. Even though they detect anomalies automatically, we still need to define what counts as an anomaly. For example, a variation of 3 ms in median RTT values is not necessarily an anomaly.
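One way to encode this distinction is to keep a statistically flagged point only when it also deviates from the baseline by a practically relevant amount. The following is a minimal sketch under stated assumptions, not the method used in this repository: the function name `filter_flags`, the 10 ms floor, and the sample values are all hypothetical.

```python
from statistics import median


def filter_flags(values, flagged_idx, min_delta_ms=10.0):
    """Keep only the flagged indices whose value differs from the
    series median by at least min_delta_ms (hypothetical floor)."""
    baseline = median(values)
    return [i for i in flagged_idx if abs(values[i] - baseline) >= min_delta_ms]


# Made-up median RTT series (ms); indices 3 and 5 stand in for points
# that some detector (e.g. ARIMA residuals) has already flagged.
rtts = [32.0, 33.0, 31.5, 35.0, 32.5, 90.0]
flags = [3, 5]

kept = filter_flags(rtts, flags)
print(kept)  # index 3 is only about 2 ms off the median, so it is dropped
```

The floor is a judgment call that depends on the service being measured; it is exposed as a parameter for that reason.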

Architecture

This module takes as input CSV files generated by the downloader and parser, located at <$ADD PATH HERE>

The data flow is as follows:

  1. A DNS measurement ID is read from chaos.id
  2. downloader.sh downloads and parses the measurement
  3. The anomalyDetector module runs on the parsed output and generates another file that specifies where the anomalies are
  4. This information is then stored and plotted by the visualization module

Ad-hoc algorithm

This section documents the criteria we use to define anomalies.

Terminology

We use the terminology of the Root DDoS paper (Fig. 1):

  • Letter: IP address of an anycast server (e.g.: k-root, or ns1.dns.nl)
  • Site: a physical location of an anycast site (e.g.: kroot-ams)
  • Server: a server under a site (e.g.: k-root-ams-srv1)

Letter-level anomalies

We know that under normal conditions anycast routing is stable (see this paper), meaning that probes should reach the same site over and over.

Below is a sample of the letter-level input file (we only consider rcode=0 answers):

timestamp nProbes nSites nQueries nResponses q25RTT q50RTT q75RTT q90RTT
1448841600 8890 24 22759 13887 15.7070 32.9710 58.8360 135.8580
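The sample above can be loaded into typed records with a few lines of Python. This is a sketch that assumes whitespace-separated columns exactly as shown here; the actual .csv files may use a different delimiter, and the names `parse_rows` and `to_utc` are hypothetical helpers, not part of the repository.

```python
from datetime import datetime, timezone

SAMPLE = """timestamp nProbes nSites nQueries nResponses q25RTT q50RTT q75RTT q90RTT
1448841600 8890 24 22759 13887 15.7070 32.9710 58.8360 135.8580
"""

# Columns that hold counts or epoch seconds; the rest are RTTs in ms.
INT_FIELDS = {"timestamp", "nProbes", "nSites", "nQueries", "nResponses"}


def parse_rows(text):
    """Parse whitespace-separated text with a header line into dicts."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    header, data = lines[0], lines[1:]
    rows = []
    for fields in data:
        row = {}
        for name, raw in zip(header, fields):
            row[name] = int(raw) if name in INT_FIELDS else float(raw)
        rows.append(row)
    return rows


def to_utc(ts):
    """Convert a Unix timestamp (seconds) to an aware UTC datetime."""
    return datetime.fromtimestamp(ts, tz=timezone.utc)


rows = parse_rows(SAMPLE)
print(to_utc(rows[0]["timestamp"]).isoformat(), rows[0]["nProbes"], rows[0]["q50RTT"])
```

The timestamp in the sample row corresponds to midnight UTC on 2015-11-30, which matches the date in the ground-truth file name.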

As seen in the Root DDoS paper, nProbes goes down during an event. At the letter level, we therefore define the following anomalies:

  • F1: Reachability failure: nProbes goes down; RTT values may or may not go up (if the server is mostly down and only a few probes get a response, the RTT might not change much)
  • F2: Performance issues: nProbes does not go down, but the RTT goes up

To detect them, we propose:

  • F1: nProbes drops at least 3x the standard deviation below the median
  • F2: q50RTT (the 50th percentile, i.e. the median RTT) rises at least 3x the standard deviation above the median
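The two rules can be sketched as follows. This is an illustration under assumptions, not the contents of letter-level-detector.py: it reads the rules as "median minus 3 standard deviations" for nProbes and "median plus 3 standard deviations" for q50RTT, it computes both statistics over a baseline window chosen by the caller, and the names `thresholds` and `classify` as well as all sample values are hypothetical.

```python
from statistics import median, pstdev


def thresholds(values, k=3.0):
    """Return (low, high) bounds: median -/+ k population std deviations."""
    m = median(values)
    s = pstdev(values)
    return m - k * s, m + k * s


def classify(baseline, point, k=3.0):
    """Label one measurement row against a baseline window of rows.

    Returns "F1" (reachability failure), "F2" (performance issue),
    or None when neither rule fires.
    """
    probes_low, _ = thresholds([r["nProbes"] for r in baseline], k)
    _, rtt_high = thresholds([r["q50RTT"] for r in baseline], k)
    if point["nProbes"] < probes_low:
        return "F1"  # probes dropped; RTT may or may not have changed
    if point["q50RTT"] > rtt_high:
        return "F2"  # probes are stable, but median RTT went up
    return None


# Made-up baseline rows from a quiet period.
baseline = [
    {"nProbes": 8890, "q50RTT": 32.9},
    {"nProbes": 8900, "q50RTT": 33.1},
    {"nProbes": 8880, "q50RTT": 32.8},
    {"nProbes": 8895, "q50RTT": 33.0},
    {"nProbes": 8885, "q50RTT": 33.2},
]

print(classify(baseline, {"nProbes": 6000, "q50RTT": 34.0}))
print(classify(baseline, {"nProbes": 8888, "q50RTT": 60.0}))
print(classify(baseline, {"nProbes": 8888, "q50RTT": 33.1}))
```

F1 is checked first because F2 is defined only for the case where nProbes does not go down. How the baseline window is chosen (for example a previous quiet day) is left open here.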

To run it:

  • python letter-level-detector.py $input $output

Site-level anomalies

  • F3: the number of probes goes at least 3x the standard deviation above or below the median: sites can take over the load from others if those go down (see Figure 5 of the Root DDoS paper).
  • F2: same as for letters
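Unlike F1, the F3 rule is two-sided, because a site may lose probes or gain them when it absorbs traffic from a failing site. A minimal sketch of that check, assuming per-site probe counts are available as plain lists; this is not the contents of site-level-detector.py, and the name `probe_shift`, the site names other than kroot-ams, and all counts are hypothetical.

```python
from statistics import median, pstdev


def probe_shift(history, current, k=3.0):
    """Compare a site's current probe count with its history.

    Returns "up" or "down" when the count is more than k population
    standard deviations away from the median, else None.
    """
    m = median(history)
    s = pstdev(history)
    if current > m + k * s:
        return "up"    # site is likely absorbing load from other sites
    if current < m - k * s:
        return "down"  # site is likely losing probes
    return None


# Made-up probe counts per site for a quiet period and one new sample.
history = {
    "kroot-ams": [400, 405, 398, 402, 401],
    "site-b": [300, 302, 299, 301, 298],
    "site-c": [200, 201, 199, 200, 202],
}
current = {"kroot-ams": 650, "site-b": 40, "site-c": 201}

shifts = {site: probe_shift(history[site], current[site]) for site in history}
print(shifts)
```

In this made-up sample one site collapses while another gains probes, which is the kind of load shift the F3 rule is meant to surface.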

To run it:

  • python site-level-detector.py $input $output

@Jan Harm
