
RiskLoc

This repository contains the code for the paper RiskLoc: Localization of Multi-dimensional Root Causes by Weighted Risk. It includes the implementation of RiskLoc itself, all baseline multi-dimensional root cause localization methods evaluated in the paper, and the code to generate the synthetic datasets described therein.

[Figure: RiskLoc architecture]

Short problem description:
RiskLoc solves the problem of identifying the root cause of an anomaly occurring in a time series with multi-dimensional attributes. These types of time series can be regarded as aggregations (the total sum in the simplest case) of numerous underlying, more fine-grained, time series.

For example, a time series T with 2 dimensions (d1 and d2), each with 3 possible values:

  • d1: [a, b, c]
  • d2: [d, e, f]

is built up of 9 fine-grained time series (two examples of these are the time series corresponding to {d1: a, d2: d} and {d1: b, d2: f}).

The goal is to find the specific dimensions and dimensional values (the elements) of the root cause when an error occurs in the fully aggregated time series T. This is a search problem where any combination of dimensions and values is considered, and there can be multiple elements in the final root cause set. For the example time series above, one potential root cause set is {{d1: a, d2: [d, e]}, {d1: b, d2: e}}. Since any combination and any number of elements need to be considered, the total search space is huge, which is the main challenge.
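
To make the setting concrete, here is a minimal pandas sketch of such a multi-dimensional measure; the 'real' (actual) and 'predict' (forecast) column names follow the format discussed in the issues below, and all values are made up:

import pandas as pd

# One snapshot of the fine-grained series: each row is one leaf element.
df = pd.DataFrame({
    'd1':      ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'],
    'd2':      ['d', 'e', 'f', 'd', 'e', 'f', 'd', 'e', 'f'],
    'real':    [10,   2,   4,   7,   1,   6,   5,   9,   3],
    'predict': [10,   8,   4,   7,   6,   6,   5,   9,   3],
})

# The fully aggregated series T is the sum over all nine leaf elements.
print(df['real'].sum(), df['predict'].sum())

# Aggregating away d2 yields the coarser per-d1 series.
print(df.groupby('d1')[['real', 'predict']].sum())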

Requirements

  • pandas
  • numpy
  • scipy
  • kneed (for squeeze)
  • loguru (for squeeze)
  • pyyaml
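
Assuming a standard Python 3 environment, all dependencies can be installed with pip:

pip install pandas numpy scipy kneed loguru pyyaml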

How to run

To run, use the run.py file. There are two options: either run a single file, or run all files in a directory (including all subdirectories).

Example of running a single file using RiskLoc in debug mode:

python run.py riskloc --run-path /data/B0/B_cuboid_layer_1_n_ele_1/1450653900.csv --debug

Example of running all files in a particular setting for a dataset (with --derived set to True):

python run.py riskloc --run-path /data/D/B_cuboid_layer_3_n_ele_3 --derived

Example of running all files in a dataset:

python run.py riskloc --run-path /data/B0

Example of running all datasets with 20 threads:

python run.py riskloc --n-threads 20

Changing riskloc to any of the other supported algorithms will run that algorithm instead; see the list below.
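
For example, to run Squeeze on the same dataset:

python run.py squeeze --run-path /data/B0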

Algorithms

Implemented algorithms: RiskLoc, AutoRoot, RobustSpot, Squeeze, HotSpot, and Adtributor (normal and recursive).

They can be run by specifying the algorithm name as the first input parameter to the run.py file:

$ python run.py --help
usage: run.py [-h] {riskloc,autoroot,robustspot,squeeze,hotspot,r_adtributor,adtributor} ...

RiskLoc

positional arguments:
  {riskloc,autoroot,robustspot,squeeze,hotspot,r_adtributor,adtributor}
                        algorithm specific help
    riskloc             riskloc help
    autoroot            autoroot help
    robustspot          robustspot help
    squeeze             squeeze help
    hotspot             hotspot help
    r_adtributor        r_adtributor help
    adtributor          adtributor help

optional arguments:
  -h, --help            show this help message and exit

The code for Squeeze is adapted from the released code from the original publication: https://github.com/NetManAIOps/Squeeze. The code for RobustSpot is similarly adapted from their recently released code: https://github.com/robustspotproject/RobustSpot.

To see the algorithm-specific arguments run: python run.py 'algorithm' --help. For example, for RiskLoc:

$ python run.py riskloc --help
usage: run.py riskloc [-h] [--data-root DATA_ROOT] [--run-path RUN_PATH] [--derived [DERIVED]] [--n-threads N_THREADS] [--output-suffix OUTPUT_SUFFIX] [--debug [DEBUG]] [--risk-threshold RISK_THRESHOLD] [--pep-threshold PEP_THRESHOLD] [--n-remove N_REMOVE] [--remove-relative [REMOVE_RELATIVE]] [--prune-elements [PRUNE_ELEMENTS]]

options:
  -h, --help                           show this help message and exit
  --data-root DATA_ROOT                root directory for all datasets (default ./data)
  --run-path RUN_PATH                  directory or file to be run; if a directory, any subdirectories will be considered as well;
                                       must contain data-root as a prefix
  --derived [DERIVED]                  derived dataset (defaults to True for the D and RS datasets and False for others)
  --n-threads N_THREADS                number of threads to run
  --output-suffix OUTPUT_SUFFIX        suffix for output csv file
  --debug [DEBUG]                      debug mode
  --risk-threshold RISK_THRESHOLD      risk threshold
  --pep-threshold PEP_THRESHOLD        proportional explanatory power threshold
  --n-remove N_REMOVE                  number of elements to ignore when computing the cutoff point
  --remove-relative [REMOVE_RELATIVE]  if true then n_remove is a percentage value
  --prune-elements [PRUNE_ELEMENTS]    use element pruning (True/False)

The --risk-threshold argument and those below it are specific to RiskLoc, while the rest are shared by all algorithms. To see the algorithm-specific arguments for other algorithms, simply run them with the --help flag or check the code in run.py.
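
For example, a RiskLoc run that overrides some of these arguments could look as follows (the values here are purely illustrative, not recommended settings):

python run.py riskloc --run-path /data/B0 --risk-threshold 0.5 --pep-threshold 0.02 --n-remove 5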

Datasets

The real-world dataset with derived measures from RobustSpot (RS) is already present in the data folder and can be used immediately.

The semi-synthetic datasets can be downloaded from: https://github.com/NetManAIOps/Squeeze. To run these, place them within the data/ directory and name them: A, B0, B1, B2, B3, B4, and D, respectively.
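
After downloading, the data/ directory should then look roughly as follows (RS ships with the repository):

data/
├── A/
├── B0/
├── B1/
├── B2/
├── B3/
├── B4/
├── D/
└── RS/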

The three synthetic datasets used in the paper can be generated using generate_dataset.py as follows.

S dataset:

python generate_dataset.py --num 1000 --dataset-name S --seed 121

L dataset:

python generate_dataset.py --num 1000 --dataset-name L --seed 122 --dims 10 24 10 15 --noise-level 0.0 0.1 --anomaly-severity 0.5 1.0 --anomaly-deviation 0.0 0.0 --num-anomaly 1 5 --num-anomaly-elements 1 1 --only-last-layer

H dataset:

python generate_dataset.py --num 100 --dataset-name H --seed 123 --dims 10 5 250 20 8 12

In addition, new datasets can be created using generate_dataset.py for extended empirical verification and research purposes. The supported input arguments can be found at the beginning of the generate_dataset.py file or shown using the --help flag.
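
For example, a small custom dataset with three dimensions could be generated as follows (all values are illustrative):

python generate_dataset.py --num 50 --dataset-name custom --seed 42 --dims 5 10 8 --noise-level 0.0 0.05 --num-anomaly 1 3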

Citation

If you find this code useful, please cite the following paper:

@article{riskloc,
  title={RiskLoc: Localization of Multi-dimensional Root Causes by Weighted Risk},
  author={Kalander, Marcus},
  journal={arXiv preprint arXiv:2205.10004},
  year={2022}
}


Issues

input data

Hi, I have a question about the input data: how is the data fed into the different algorithms? Thanks a lot in advance.

Question about the value of "n_remove" in riskloc

Hi,

I hope you are well.

When I used riskloc on my dataset, I noticed that it can precisely find the root cause. However, my goal is to find anomalies that occur more frequently, so I consider the rare root causes it finds to be outliers. I tried increasing the value of "n_remove", but still did not get the expected result.

Also, when I decreased "n_remove" to 1, the "cutoff" value shifted a lot and the output returned null. When I did the same on another dataset, the result was not affected. Comparing the distributions of the measurements of the two datasets, the first is closer to a normal distribution, while the second is long-tailed.

Here are my questions:

  1. Is adjusting n_remove the right way to achieve this? If so, is there a more reliable approach than setting constants arbitrarily?
  2. Does the distribution of the measurement values affect the performance of the algorithm?

I am looking forward to your reply.

About dataset generation

Hello,
I have a question about the anomaly injection method (scale_anomaly) in generate_dataset.py. Why is the larger of row*(1-r) and 0 taken? This causes the predicted value of some anomalous combinations to be 0, so they are filtered out when using Squeeze.

demo for squeeze

Hi,

I am trying to understand the Squeeze algorithm and someone recommended your project. However, I can't understand the details of how each function works, especially the meaning of the inputs to def squeeze(df, attributes, delta_threshold=0.9, debug=False) at line 124 in squeeze.py. Could you provide a demo in the future? Thank you.

Best,
YYL
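
A minimal call sketch for reference, assuming df follows the dataset format (attribute columns plus 'real' and 'predict') and that attributes is the list of dimension column names; the import path and the return value are assumptions, not confirmed by the repository:

from squeeze import squeeze  # assumed import path; the function is defined in squeeze.py

# df built as in the pandas example in the introduction above.
result = squeeze(df, attributes=['d1', 'd2'], delta_threshold=0.9, debug=True)
print(result)  # assumed to be the set of localized root-cause elements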

the result of adtributor

Hi, I ran the Adtributor algorithm on the B0 dataset but got 0 TP (true positives). Is there something wrong with the code?

HotSpot method: on the interpretability of the PS metric as a measure of root cause confidence

Hello, the PS method uses RE (the ripple effect) to measure the confidence that an element is the root cause. How should the principle behind the PS method be understood?

Many people's interpretation is similar to the following:
if an attribute value is the cause, then the change in the attribute value and the change in its samples follow the ripple effect;
if the change in an attribute value and the change in its samples follow the ripple effect, then the attribute value is the cause.

Is this understanding correct?

About the forecasts

Thanks for your excellent work! It really helped me a lot.

Recently, I have also been focusing on issues in this area (multi-dimensional root cause analysis), and as you mentioned:

In practice, I found that the most difficult step is to get accurate forecasting values for all leaf elements. Since these are usually quite fine-grained, they don't actually have much data and any forecasts are often inaccurate. This can skew the results.

I've also found it very difficult to get the forecast values of all leaves. In particular, certain combinations have only a small number of values or are almost 0. Is there any suitable forecasting method worth recommending in this case?

Or have you tried using the RiskLoc algorithm in a real industrial scenario? If so, can you share what forecasting method you used?

question of surprise in adtributor

The calculation of the surprise value in adtributor does not seem correct to me.

The JS divergence formula should be:

JS(P || Q) = 1/2 * D_KL(P || M) + 1/2 * D_KL(Q || M),  where M = (P + Q) / 2

So, the code should be:

# p and q are the forecast and actual distributions, normalized by the
# forecast total F and the actual total A, respectively
p = df['predict'] / F
q = df['real'] / A
m = (p + q) / 2
# surprise is the JS divergence between p and q
df['surprise'] = 0.5 * np.sum(p * np.log(p / m)) + 0.5 * np.sum(q * np.log(q / m))

What do you think? Thanks.
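
As a quick sanity check of this formula, the manual computation can be compared against scipy's built-in jensenshannon, which returns the JS distance, i.e. the square root of the JS divergence (natural log by default):

import numpy as np
from scipy.spatial.distance import jensenshannon

p = np.array([0.4, 0.4, 0.2])
q = np.array([0.1, 0.6, 0.3])
m = (p + q) / 2

# JS divergence computed directly from the formula above.
js = 0.5 * np.sum(p * np.log(p / m)) + 0.5 * np.sum(q * np.log(q / m))

# scipy returns sqrt(JS divergence), so square it before comparing.
assert np.isclose(js, jensenshannon(p, q) ** 2)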
