Comments (2)
Hello, @RahimKh !
Thank you for your remarks! Since we are still working on the methodology for evaluating algorithms, your comment is helpful.
We see 3 possible ways of model fitting (that is, of selecting a fault-free train set):
- (priority option) Use a separate file (data/anomaly-free/anomaly-free.csv) containing a relatively long period of fault-free operation. The model is trained once and applied to all datasets. The problem here is that some issues arose during data collection, which made the fault-free dataset too different from most of the other datasets. As a result, the anomaly detection algorithms learned the wrong patterns. We are currently working on collecting a proper fault-free dataset for model fitting in the future.
- Use the beginning of one dataset as the fault-free mode. The model is trained once and applied to all datasets.
- Use the beginning of each dataset as the fault-free mode. A model is trained on and applied to every single dataset separately.
We have selected the 3rd way for now, using the first 400 points of each dataset (approximately 1/3 of the total number of points) as the train set. This is not entirely fair (it reduces the number of points unknown to the model, which makes the problem easier to solve), but it is still acceptable for the changepoint detection problem. As for the outlier detection problem: you are generally right in saying that "results need to be done on only data that is unknown to the model" when calculating metrics (FAR, MAR, F1), but the current approach can still be an option. In this case, the results are just slightly overstated.
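A minimal sketch of the 3rd way, with metrics computed only on the points the model has not seen, might look like the following. It is not the SKAB evaluation code: the function names, the toy threshold detector, and the synthetic data are all hypothetical, and FAR and MAR are assumed here to mean FP / (FP + TN) and FN / (FN + TP) respectively. Only the 400-point train prefix comes from the comment above.

```python
from statistics import mean, stdev

N_TRAIN = 400  # first 400 points of each dataset, as described in the comment


def split_fault_free_prefix(values, labels, n_train=N_TRAIN):
    """Return (train values, held-out values, held-out labels) for one dataset.

    The first n_train points are assumed to be fault-free.
    """
    return values[:n_train], values[n_train:], labels[n_train:]


def fit_threshold_detector(train_values, k=3.0):
    """Toy stand-in for a real model (an assumption, not a SKAB algorithm).

    Flags a point as anomalous when it lies more than k standard
    deviations away from the mean of the train prefix.
    """
    mu, sigma = mean(train_values), stdev(train_values)

    def predict(x):
        return int(abs(x - mu) > k * sigma)

    return predict


def far_mar(y_true, y_pred):
    """False alarm rate and missing alarm rate (assumed definitions)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    far = fp / (fp + tn) if fp + tn else 0.0
    mar = fn / (fn + tp) if fn + tp else 0.0
    return far, mar


# Synthetic single dataset: 800 normal points followed by 400 anomalous ones.
values = [0.0, 1.0] * 400 + [10.0] * 400
labels = [0] * 800 + [1] * 400

train, test_values, test_labels = split_fault_free_prefix(values, labels)
predict = fit_threshold_detector(train)
predictions = [predict(x) for x in test_values]

# Metrics use only the held-out points, never the train prefix.
far, mar = far_mar(test_labels, predictions)
print(f"FAR={far:.2f}, MAR={mar:.2f}")
```

With real data, the same split would be repeated for every dataset and the per-dataset metrics aggregated. Scoring the train prefix as well would add points the model has already seen, which is the source of the overstatement mentioned above.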
We definitely want to switch to the 1st way of model fitting. We will probably switch to the 2nd way while a proper separate fault-free dataset is unavailable.
from skab.
The answer has been moved to the slides about the project.