ARTime detector for the Numenta Anomaly Benchmark

License: GNU Affero General Public License v3.0

Julia 100.00%

artimenab.jl's Introduction

ARTimeNAB

This is an open source (AGPL3) and simplified version of ARTime, an anomaly detection algorithm. It supports the Numenta anomaly benchmark (NAB) ARTime detector.

NAB includes the ARTime detector; start there to see and reproduce the ARTime results. The NAB environment uses PythonCall to install ARTime from this repository.

ARTime was developed by Mark Hampton.

Running ARTime with NAB

NAB runs ARTime through the Python JuliaCall module. The version of JuliaCall we are using installs the latest stable version of Julia by default, ignoring the Julia version in the juliacalldeps.json file at the root of NAB. ARTime is no longer compatible with the most recent Julia release. To use Julia 1.7.0 with JuliaCall, install Julia 1.7.0 and set the environment variable PYTHON_JULIACALL_EXE to the Julia 1.7.0 binary executable before running the ARTime detector.
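
As a minimal sketch of setting that variable from Python, the snippet below assigns PYTHON_JULIACALL_EXE before JuliaCall is imported. The helper name pin_julia_executable and the example install path are assumptions for illustration only; they are not part of NAB or ARTime.

```python
import os
import shutil


def pin_julia_executable(julia_path: str) -> str:
    """Point JuliaCall at a specific Julia binary via PYTHON_JULIACALL_EXE."""
    # Resolve a bare command name such as "julia" through PATH if possible;
    # otherwise fall back to the path exactly as given.
    resolved = shutil.which(julia_path) or julia_path
    if not os.path.isfile(resolved):
        raise FileNotFoundError(f"Julia executable not found: {julia_path}")
    os.environ["PYTHON_JULIACALL_EXE"] = resolved
    return resolved


# Hypothetical usage; adjust the path to wherever Julia 1.7.0 is installed:
#   pin_julia_executable("/opt/julia-1.7.0/bin/julia")
#   from juliacall import Main as jl  # import only after the variable is set
```

The variable has to be set before JuliaCall is first imported in the process. Exporting it in the shell that launches run.py should have the same effect.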

There is a fork of the NAB repo with a Docker environment that runs ARTime, along with a README, at https://github.com/markNZed/NAB/tree/docker

Acknowledgements

Stephen Grossberg and Gail Carpenter developed adaptive resonance theory (ART). Grossberg's 2021 book Conscious Mind, Resonant Brain: How Each Brain Makes a Mind was the major inspiration for ARTime.

Numenta provided NAB to inspire innovation in anomaly detection. It was very valuable in testing ARTime. The paper introducing NAB is: Ahmad, S., Lavin, A., Purdy, S., & Agha, Z. (2017). Unsupervised real-time anomaly detection for streaming data. Neurocomputing, available online 2 June 2017, ISSN 0925-2312, https://doi.org/10.1016/j.neucom.2017.04.070

The excellent Julia package AdaptiveResonance.jl was extremely useful in getting ARTime off the ground. Modifications in the DVFA implementation of AdaptiveResonance.jl led to a compact version of AdaptiveResonance being included in ARTime.

@isentropic was a great help in introducing me to Julia and improving the quality of the code.

Where to from here

Unfortunately deep learning catastrophically forgot why it was not a panacea in the 1980s. A talented team of computational neuroscientists could push ART much further in machine learning... For an introduction to Grossbergian Neuroscience look no further than Yohan John's Neurologos channel on YouTube.

artimenab.jl's Issues

Threshold of anomaly

Hi, I have a quick question about how to decide the threshold for the predicted anomaly. I see there is always a pre-defined threshold in thresholds.json for NAB. How do you decide the threshold in ARTime? Do you find it with the ROC curve or something? What do you think is a good threshold strategy (e.g., a dynamic threshold) for online settings?

Cannot find reference 'Main' in '__init__.py'

Hi @markNZed

When I tried to merge ARTime into the NAB library, it did not work and produced some error messages.
The output from the IDE is: [output not preserved]

The error in the code is reported as: [output not preserved]

Does the above error report have anything to do with the Python version? Or am I missing something?
After I added some printing to the above code, the output of the IDE was: [output not preserved]

Unbounded memory growth on stream data

Hi @markNZed
Congrats on taking the top spot in NAB! Your contribution pointed my interest towards this entirely new (to me) area of neuroscience, for which you have my sincerest thanks.

I took ARTime for a spin, and wanted to see how it fares in a streaming scenario (i.e. ~infinite series), but I noticed something which slightly worries me:
It seems that the internal DVFA structures are growing without any limits.

Please take a look at this snippet:

julia> using ARTime, Random, Distributions
julia> Random.seed!(123)

julia> p = ARTime.P(); ARTime.init(-2,2,210000,p)
julia> size(p.cs.art.W)[1] * size(p.cs.art.W)[2]  + size(p.cs.art.M)[1] + size(p.cs.art.Me)[1]
0 #Size before any processing

# Let's say we have a slightly noisy sine wave
julia> for x in range(0, 200π, length=10000) 
    y = sin(x) + 0.1 * randn() 
    ARTime.process_sample!(y, p)
end


julia> size(p.cs.art.W)[1] * size(p.cs.art.W)[2]  + size(p.cs.art.M)[1] + size(p.cs.art.Me)[1]
2788  # Internal struct size after 10K points

# Now there's a longer period where the noise is more pronounced
julia> for x in range(0, 2000π, length=100000) 
    y = sin(x) + 0.2 * randn()
    ARTime.process_sample!(y, p)
end 

julia> size(p.cs.art.W)[1] * size(p.cs.art.W)[2]  + size(p.cs.art.M)[1] + size(p.cs.art.Me)[1]
27336 # Internal struct size after 110K points

# Noise-levels are down, but the frequency of sine wave has changed
julia>  for x in range(0, 200π, length=100000) 
    y = sin(x) + 0.1 * randn() 
    ARTime.process_sample!(y, p)
end 

julia> size(p.cs.art.W)[1] * size(p.cs.art.W)[2]  + size(p.cs.art.M)[1] + size(p.cs.art.Me)[1]
237422 # Internal struct size after 210K points

julia> p.cs.art.n_categories
6983

julia> p.cs.art.n_clusters
739

(I'm aware that there are more internal state variables than W, M, Me, but they grow at similar pace so I omitted them here)

I know that this example is a bit nasty, but this is just to illustrate something that I also see on my real data i.e., that with enough time, the ARTime process will eventually run out of memory and crash (which is not the case for e.g., HTM). It seems that the DVFA never ceases to create new clusters and categories.

Is this an intentional behavior (or maybe some sort of optimization for NAB)?

Is there any way to limit the memory usage (or e.g., somehow compact the current state) without forgetting catastrophically (i.e. full state reset)?

I would like the algorithm to keep on adapting to the stream (rather than use learned state) - but it seems to have infinite appetite for memory.

Non NAB Version

Hello, is there a version of ARTime that can be used outside of NAB? I am looking for an online structural break detection algorithm and do not necessarily want to use NAB.

Thanks

Allocation problem stops running.

Hi, I really like your method and want to use it as a comparison method. Currently, I'm running ARTime under NAB. I think I have set everything up correctly and tried to run the following command:

python run.py -d ARTime --detect --optimize --score --normalize --windowsFile labels/combined_windows_new.json

However, I encounter the following error. I have spent a lot of time on it and don't know how to fix it. I hope you can provide some guidance.

Running detection step
0: Beginning detection with ARTime for realAWSCloudwatch/ec2_cpu_utilization_77c1ca.csv
2: Beginning detection with ARTime for realAWSCloudwatch/ec2_network_in_5abac7.csv
1: Beginning detection with ARTime for realAWSCloudwatch/ec2_disk_write_bytes_1ef3de.csv

signal (11): Segmentation fault
in expression starting at none:0
Allocations: 2494562 (Pool: 2493386; Big: 1176); GC: 2
Segmentation fault (core dumped)

[Q] processing larger batch - initial phase

Hi @markNZed

Firstly, congratulations on taking 1st place on the NAB scoreboard, that is quite an achievement 🎉 and great contribution.

It is quite amazing that Julia code can run directly in Python, a fine testament to a community effort.

I have a few questions that perhaps you could answer for me.

The implementation here, or more specifically in https://github.com/markNZed/NAB/tree/ARTimeNAB, is aimed at running through the dataset iteratively (as per NAB) to score each data point. Is it possible to process the dataset in large batches? For example, could one process 90% of the data in one shot for training/learning, without being concerned with anomaly scores (p) in this phase, and then iterate over the last 10% of the dataset as per the ARTimeNAB method to determine anomaly scores (p)?
If so, would that be quicker than the iteration method?

I ran a test passing a list of values to jl.ARTime rather than a single value. It returned an object with all the expected data, just as if it had been given one value. I then iterated over the final part of the data but did not get the expected result (an anomaly that is present with the iterative method). So the method I tried does not work, and I am wondering if there is a way to do it that will work.
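
For comparison, the warm-up-then-score idea can be sketched while still feeding samples one at a time, which is the mode the NAB detector is known to use. This is only an illustration: process_sample is a hypothetical stand-in for whatever per-sample call the detector exposes, and nothing here confirms that ARTime supports true batch input or that this approach would be faster.

```python
from typing import Callable, Iterable, List


def warm_up_then_score(
    values: Iterable[float],
    process_sample: Callable[[float], float],
    warm_up_fraction: float = 0.9,
) -> List[float]:
    """Feed every value to the detector in order, keeping only late scores."""
    data = list(values)
    split = int(len(data) * warm_up_fraction)
    scores: List[float] = []
    for i, value in enumerate(data):
        # The (hypothetical) detector sees every sample, so it keeps learning.
        score = process_sample(value)
        # Scores from the warm-up portion are discarded.
        if i >= split:
            scores.append(score)
    return scores
```

Because each sample still passes through the detector individually, this only changes which scores are kept. It does not reduce the amount of work done.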

Can I understand your method as an online learning method which updates the model step-by-step over time?

I'm really impressed by this method and am going to use it as a comparison method in my experiment.

Can I interpret it as an advanced version of the Hierarchical Temporal Memory method? Also, is this method an unsupervised and online learning method?

Julia error when running ARTime in NAB

Sorry to bother you. I got an error after running python run.py -d ARTime --detect --optimize --score --normalize --skipConfirmation.

ERROR: LoadError: setfield!: const field .name of type TypeName cannot be changed
Stacktrace:
 [1] setproperty!(x::Core.TypeName, f::Symbol, v::Symbol)
   @ Base ./Base.jl:39
 [2] top-level scope
   @ ~/.julia/packages/RedefStructs/JMYNd/src/RedefStructs.jl:138
 [3] include
   @ ./Base.jl:419 [inlined]
 [4] include_package_for_output(pkg::Base.PkgId, input::String, depot_path::Vector{String}, dl_load_path::Vector{String}, load_path::Vector{String}, concrete_deps::Vector{Pair{Base.PkgId, UInt64}}, source::Nothing)
   @ Base ./loading.jl:1554
 [5] top-level scope
   @ stdin:1