hello / anomaly-detection
Python scripts for anomaly detection.
For example:
On 2016-03-21, log messages say account 51846 has an anomaly, but rerunning the same anomaly detection algorithm with the same parameters produces a "not anomaly" result. Note: the current run of DBSCAN says that NO date in March is an anomaly, which rules out off-by-one timezone confusion.
2016-03-21 09:20:07,534 - root - INFO - anom_date=2016-03-20 account_id=51846
2016-03-21 09:20:07,554 - root - INFO - Success insertion into anomaly_results_raw for account_id=51846 target_date=2016-03-21 00:00:00 alg_id=1 row_id=1469045.
2016-03-21 09:20:07,558 - root - INFO - anom_date=2016-03-20 account_id=51846
2016-03-21 09:20:07,566 - root - INFO - Success insertion into anomaly_results_raw for account_id=51846 target_date=2016-03-21 00:00:00 alg_id=4 row_id=1469046.
2016-03-21 09:20:07,569 - root - INFO - anom_date=2016-03-20 account_id=51846
2016-03-21 09:20:07,579 - root - INFO - Success insertion into anomaly_results_raw for account_id=51846 target_date=2016-03-21 00:00:00 alg_id=3 row_id=1469047.
2016-03-21 09:20:24,611 - root - INFO - one_data_length=176
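To compare a logged run against a rerun systematically, it can help to pull the flagged dates out of the log first. Below is a minimal sketch, assuming the March log format in the excerpt (an anom_date= line followed by its "Success insertion" line); the logged_anomalies helper and its regexes are hypothetical, not part of the repo.

```python
import re

# Hypothetical patterns matching the two log-line shapes in the excerpt.
ANOM_RE = re.compile(r"anom_date=(\S+) account_id=(\d+)")
INSERT_RE = re.compile(
    r"Success insertion .*account_id=(\d+) target_date=(\S+) .*alg_id=(\d+)"
)


def logged_anomalies(lines):
    """Pair each 'anom_date=' line with the insertion line that follows it.

    Returns a set of (account_id, alg_id, target_date, anom_date) tuples.
    Lines matching neither pattern (e.g. 'one_data_length=...') are skipped.
    """
    results = set()
    pending = None  # (account_id, anom_date) waiting for its insertion line
    for line in lines:
        m = ANOM_RE.search(line)
        if m:
            pending = (m.group(2), m.group(1))
            continue
        m = INSERT_RE.search(line)
        if m and pending and pending[0] == m.group(1):
            account_id, target_date, alg_id = m.group(1), m.group(2), int(m.group(3))
            results.add((account_id, alg_id, target_date, pending[1]))
            pending = None
    return results
```

The resulting set can then be diffed against what a rerun reports for the same account and alg_id. Keeping both dates in each tuple avoids mixing them up, since the excerpt logs anom_date=2016-03-20 together with target_date=2016-03-21.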
This was found by rerunning the algorithms for past user responses:
actu: [1, 0, 0, 1, 0, 0, 0, 0, 1, 0]
pred: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0] <= This should be all 1's. Bug here.
actu: [0, 0, 1, 0, 1, 0, 0, 0]
pred: [0, 0, 0, 0, 1, 1, 0, 0]
pred: [0, 0, 1, 0, 0, 0, 1, 0] <= minus 1 day
pred: [0, 1, 1, 0, 0, 0, 0, 0] <= minus 2 days
The problem seems to disappear on 2016-03-28, which may explain the sudden improvement in accuracy:
actu: [1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1]
pred: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
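Accuracy alone can hide both failure modes above (an all-zero prediction and an all-one prediction). A minimal sketch that scores an actu/pred pair with a confusion matrix, using plain Python lists; confusion_counts and summarize are illustrative helper names, not functions from the repo.

```python
def confusion_counts(actual, predicted):
    """Return (tp, fp, fn, tn) for two equal-length 0/1 label lists."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    return tp, fp, fn, tn


def summarize(actual, predicted):
    """Accuracy plus recall and precision (None when the denominator is zero)."""
    tp, fp, fn, tn = confusion_counts(actual, predicted)
    return {
        "accuracy": (tp + tn) / len(actual),
        "recall": tp / (tp + fn) if tp + fn else None,
        "precision": tp / (tp + fp) if tp + fp else None,
    }


# Pairs copied from the examples above.
actu_a = [1, 0, 0, 1, 0, 0, 0, 0, 1, 0]
pred_a = [0] * 10  # the all-zero prediction flagged as a bug
actu_b = [1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1]
pred_b = [1] * 22  # the all-one prediction after 2016-03-28

print(summarize(actu_a, pred_a))
print(summarize(actu_b, pred_b))
```

With these inputs the all-zero prediction still scores accuracy 0.7 with recall 0.0, and the all-one prediction reaches recall 1.0 and accuracy of about 0.77 only because 17 of the 22 actual labels are 1. That is consistent with the suspicion that the accuracy jump after 2016-03-28 may not reflect a real improvement.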
The current query for pulling data is as follows:
SELECT SUM(ambient_light), count(1), date_trunc('hour', local_utc_ts) AS hour
FROM prod_sense_data
WHERE account_id = 26173
AND local_utc_ts > '2016-01-25'
AND local_utc_ts < '2016-01-26'
AND extract('hour' from local_utc_ts) < 6
GROUP BY hour
ORDER BY hour ASC;
The problem is that there are duplicate rows, so count(1) can exceed 60 for a single hour (more than one row per minute). The duplication is also uneven across the hour for a given user, which means sum(ambient_light) can be biased towards whichever points happen to be duplicated.
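The bias can be illustrated in a few lines of Python. This is a hypothetical sketch, not the repo's query logic: it assumes duplicate rows share an identical local_utc_ts, keeps the last reading per timestamp, and reports the naive aggregates next to de-duplicated ones.

```python
from collections import defaultdict
from datetime import datetime


def hourly_light(rows):
    """Aggregate (local_utc_ts, ambient_light) rows per hour.

    Returns {hour: (raw_count, distinct_count, naive_sum, distinct_mean)}.
    raw_count and naive_sum mirror count(1) and SUM(ambient_light) in the
    query; the distinct_* values keep one row per timestamp (the last seen).
    """
    raw_count = defaultdict(int)
    naive_sum = defaultdict(float)
    distinct = defaultdict(dict)  # hour -> {timestamp: ambient_light}
    for ts, light in rows:
        hour = ts.replace(minute=0, second=0, microsecond=0)  # like date_trunc('hour', ...)
        raw_count[hour] += 1
        naive_sum[hour] += light
        distinct[hour][ts] = light  # a duplicated timestamp overwrites its earlier copy
    out = {}
    for hour, by_ts in distinct.items():
        values = list(by_ts.values())
        out[hour] = (raw_count[hour], len(values), naive_sum[hour], sum(values) / len(values))
    return out


rows = [
    (datetime(2016, 1, 25, 1, 0), 10),
    (datetime(2016, 1, 25, 1, 0), 10),  # duplicate of the row above
    (datetime(2016, 1, 25, 1, 0), 10),  # and again
    (datetime(2016, 1, 25, 1, 1), 40),
]
print(hourly_light(rows))
```

Here three copies of the 01:00 reading push the naive sum to 70 and the naive mean (70 / 4) to 17.5, while the mean over the two distinct minutes is 25.0.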
Some options to combat this:
SUMMARY
jyfan/multi_alg with supervisor: I get a spawn error.
jyfan/multi_alg with python run.py configs/prod.yml instead of with supervisor: the process is successful.
master with supervisor: the process is successful.
So the cause of the supervisor error is some change I made in the branch jyfan/multi_alg. I'm trying to figure out which change did it, but I don't understand the differences between running the Python script via supervisor and running it directly.
cc @pims
ubuntu@ip-10-0-0-47:~/anomaly-detection$ git branch
* jyfan/multi_alg
master
ubuntu@ip-10-0-0-47:~/anomaly-detection$ git checkout master
Switched to branch 'master'
Your branch is up-to-date with 'origin/master'.
ubuntu@ip-10-0-0-47:~/anomaly-detection$ supervisorctl start anomaly:*
anomaly: started
ubuntu@ip-10-0-0-47:~/anomaly-detection$ supervisorctl status
anomaly RUNNING pid 15331, uptime 0:00:21
ubuntu@ip-10-0-0-47:~/anomaly-detection$ supervisorctl stop anomaly:*
anomaly: stopped
ubuntu@ip-10-0-0-47:~/anomaly-detection$ supervisorctl status
anomaly STOPPED Jan 13 07:19 PM
ubuntu@ip-10-0-0-47:~/anomaly-detection$ git checkout jyfan/multi_alg
Switched to branch 'jyfan/multi_alg'
Your branch is up-to-date with 'origin/jyfan/multi_alg'.
ubuntu@ip-10-0-0-47:~/anomaly-detection$ git pull
Username for 'https://github.com': jykfan
Password for 'https://jykfan@github.com':
Already up-to-date.
ubuntu@ip-10-0-0-47:~/anomaly-detection$ supervisorctl status
anomaly STOPPED Jan 13 07:19 PM
ubuntu@ip-10-0-0-47:~/anomaly-detection$ supervisorctl start anomaly:*
anomaly: ERROR (spawn error)
ubuntu@ip-10-0-0-47:~/anomaly-detection$ supervisorctl status
anomaly FATAL Exited too quickly (process log may have details)
ubuntu@ip-10-0-0-47:~/anomaly-detection$ python run.py configs/prod.yml
2016-01-13 19:24:07,558 - __main__ - INFO - test
2016-01-13 19:24:07,781 - root - DEBUG - Found 17500 account_ids
2016-01-13 19:24:16,727 - root - INFO - 2016-01-01 is an anomaly for account 32769
2016-01-13 19:24:16,727 - root - INFO - query: INSERT INTO anomaly_results_raw (account_id, target_date, anomaly_days, alg_id) VALUES ('32769', '2016-01-13T00:00:00'::timestamp, ARRAY['2016-01-01T00:00:00'::timestamp], '1') RETURNING id
2016-01-13 19:24:16,734 - root - INFO - Success insertion into anomaly_results_raw for account_id=32769 target_date=2016-01-13 00:00:00 alg_id=1 row_id=281150.
^CTraceback (most recent call last):
File "run.py", line 47, in <module>
main()
File "run.py", line 41, in main
app.run(account_id, conn_sensors, conn_anomaly, dbscan_params[dbscan_params_i])
File "/home/ubuntu/anomaly-detection/app/logic.py", line 100, in run
ORDER BY hour ASC""", dict(account_id=account_id, start=thirty_days_ago, end=now))
File "<string>", line 8, in __new__
KeyboardInterrupt
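supervisord does not run programs through a shell, so the working directory, interpreter, user, and environment variables can all differ from an interactive run. One hedged way to narrow this down is to print the runtime context at the very top of run.py, before the app imports, and compare the output from both launch methods. The runtime_snapshot helper and the list of variables are assumptions chosen for illustration.

```python
import json
import os
import sys


def runtime_snapshot(env_keys=("PATH", "HOME", "USER", "VIRTUAL_ENV", "PYTHONPATH")):
    """Collect process details that often differ between a supervisord-managed
    run and an interactive shell run. env_keys is an illustrative choice."""
    return {
        "cwd": os.getcwd(),
        "python": sys.executable,
        "python_version": sys.version.split()[0],
        "argv": list(sys.argv),
        "uid": os.getuid() if hasattr(os, "getuid") else None,  # getuid is Unix-only
        "env": {key: os.environ.get(key) for key in env_keys},
    }


if __name__ == "__main__":
    # Write to stderr so the snapshot is captured in the program's stderr log
    # even if the script fails right after this point.
    print(json.dumps(runtime_snapshot(), indent=2, sort_keys=True), file=sys.stderr)
```

Because the snapshot goes to stderr, the supervisor-launched copy should appear in the process log that the FATAL message points to, even when the process exits too quickly to stay running.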