In this project, an LSTM Autoencoder is trained and applied in the context of outlier detection. The dataset used for this task is the Ozone Level Detection dataset. The task is to use weather and environmental data to predict an extreme weather event.
Due to the limited number of extreme weather events, anomaly detection is the appropriate tool in this context. A model can be constructed which learns from the normal weather events, and applied with the aim of recognising extreme events as anomalies.
The core idea behind an autoencoder is to learn a representation that can be used to reconstruct the original input sequence. For our problem, detecting extreme weather events, we train the model to reconstruct normal weather events only. The model thereby learns what 'normality' looks like, since the distribution used to train the model differs from the distribution of extreme weather events.
In the case of an extreme weather event, the model has not 'seen' sequences of that type before and has trouble reconstructing them, which leads to a high reconstruction error. The reconstruction error therefore acts as a detection metric: we set a threshold, and if the reconstruction error exceeds it, the sequence is flagged as an extreme weather event.
The threshold can be set algorithmically; for now, I have used manual inspection: I compute the distribution of the reconstruction loss on the training set and pick a cutoff point (see below).
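As a sketch of this step, the per-sample losses can be histogrammed for manual inspection, with a high percentile serving as one algorithmic stand-in for the hand-picked cutoff. The `losses` array and the 95th-percentile choice below are illustrative assumptions, not the values used in the project:

```python
import numpy as np

# Hypothetical per-sample reconstruction losses on the (normal) training set.
rng = np.random.default_rng(0)
losses = rng.gamma(shape=2.0, scale=0.035, size=1000)

# Manual inspection: histogram the losses and pick a cutoff by eye.
counts, bin_edges = np.histogram(losses, bins=50)

# Algorithmic stand-in: a high percentile of the normal losses.
threshold = np.percentile(losses, 95)

# Flag anything the model reconstructs poorly as a candidate extreme event.
is_extreme = losses > threshold
print(f"threshold={threshold:.3f}, flagged={int(is_extreme.sum())} of {len(losses)}")
```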
Why an LSTM autoencoder? Because weather events exhibit a temporal nature, an LSTM can make use of the long-range dependencies inherent in the dataset. This approach assumes there is temporal context relevant to predicting an event.
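A minimal sketch of such a model in PyTorch, assuming a sequence-to-sequence encode/decode shape; the feature count, layer sizes, and the repeat-the-summary decoding scheme are illustrative choices, not the project's exact architecture:

```python
import torch
import torch.nn as nn

class LSTMAutoencoder(nn.Module):
    """Minimal LSTM autoencoder sketch; widths and depths are illustrative."""
    def __init__(self, n_features, hidden_size=16, num_layers=2):
        super().__init__()
        self.encoder = nn.LSTM(n_features, hidden_size, num_layers, batch_first=True)
        self.decoder = nn.LSTM(hidden_size, hidden_size, num_layers, batch_first=True)
        self.output = nn.Linear(hidden_size, n_features)

    def forward(self, x):
        # Encode the sequence; keep the final hidden state as its summary.
        _, (h_n, _) = self.encoder(x)
        # Repeat the summary across time steps, then decode back to feature space.
        seq_len = x.size(1)
        latent = h_n[-1].unsqueeze(1).repeat(1, seq_len, 1)
        decoded, _ = self.decoder(latent)
        return self.output(decoded)

model = LSTMAutoencoder(n_features=8)
x = torch.randn(4, 24, 8)                # (batch, time, features)
recon = model(x)
loss = nn.functional.mse_loss(recon, x)  # reconstruction error to minimise
```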
I did consider other approaches. Rather than take the non-parametric discriminative route, one could use a generative approach to model the underlying distribution, such as a variational LSTM autoencoder, which would additionally provide uncertainty estimates. Alternatively, a classical approach would be to construct a distribution of normal weather events and use it as a reference when testing for extreme weather events.
The specific Python version is 3.9, using PyTorch built for CUDA 11. Packages can be installed from the included requirements.txt. To run the code, execute main.py with the desired parameters.
This section describes the end-to-end logic of the pipeline:
- Load the ozone data, append the labels and perform some basic cleaning.
- Remove features whose absolute pairwise correlation exceeds a threshold.
- Split the data into train, validation, test and anomaly sets.
- Impute missing values with Multiple Imputation by Chained Equations (MICE).
- Normalise and standardise the dataset.
- Convert the data to PyTorch tensors.
- Train/load an LSTM-AE with or without a dropout layer.
- Construct the distribution of the training reconstruction loss.
- Pick a threshold using the training distribution.
- Compute the loss per anomaly sample; if the loss exceeds the threshold, flag an extreme weather event.
- Perform the usual evaluation.
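The preprocessing steps above can be sketched as follows. The column names, toy data, and the 0.95 correlation cutoff are assumptions for illustration, and scikit-learn's `IterativeImputer` stands in for the MICE step:

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.preprocessing import StandardScaler

# Toy frame standing in for the ozone features; names and values are illustrative.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(200, 4)), columns=["WSR0", "WSR1", "T0", "T1"])
df["WSR1"] = df["WSR0"] * 0.99 + rng.normal(scale=0.01, size=200)  # near-duplicate
df.iloc[::17, 2] = np.nan                                          # missing values

# Drop one of each highly correlated pair (absolute correlation above the cutoff).
corr = df.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [c for c in upper.columns if (upper[c] > 0.95).any()]
df = df.drop(columns=to_drop)

# MICE-style imputation, then standardisation.
imputed = IterativeImputer(random_state=0).fit_transform(df)
scaled = StandardScaler().fit_transform(imputed)
```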
This section discusses the evaluation of the chosen approach. For reference, I did not perform any hyperparameter exploration such as a grid search; I simply tinkered with the architecture manually.
The image below depicts the training and validation loss curves. The model converges, while still generalising, at around 75 epochs.
The image below depicts the reconstruction-loss distribution for the training set. Based on this distribution, we can set a threshold near the peak, at around 0.07.
The image below depicts the reconstruction loss for the anomaly set. As seen, the anomaly-loss distribution overlaps the chosen threshold: some anomalous sequences are reconstructed with a loss below it.
Using both the test and anomaly sets, I am able to compute the micro-averaged F1 metric, which takes FP/FN/TP/TN into account. A perfect score is 1.
| Encoder Layers | Decoder Layers | Hidden Units | F1-Micro |
|---|---|---|---|
| 2 | 2 | 64 | 0.61 |
| 2 | 2 | 32 | 0.70 |
| 2* | 2* | 16* | 0.75* |
| 1 | 1 | 64 | 0.58 |
| 1 | 1 | 32 | 0.64 |
| 1 | 1 | 16 | 0.61 |
The most accurate model (marked * above) had 2 layers in the encoder, 2 layers in the decoder and 16 hidden units. The network predicts all extreme weather events; however, due to false positives the micro F1 is 0.75.
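For reference, micro-averaged F1 can be computed with scikit-learn's `f1_score`; the labels below are illustrative, not the project's actual predictions:

```python
from sklearn.metrics import f1_score

# Illustrative labels: 1 = extreme event, 0 = normal.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
# Thresholded predictions: all extremes caught, but two normals flagged (FPs).
y_pred = [0, 0, 1, 1, 1, 1, 1, 1]

# For single-label classification, micro-averaged F1 equals accuracy.
score = f1_score(y_true, y_pred, average="micro")
print(score)  # 0.75
```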
Given more time, one would also explore the following:
- Exhaustive hyperparameter exploration, i.e. grid search over model architecture and regularisation (I added dropout, but could experiment further).
- Feature engineering: engineering robust features, principal component analysis, representation learning, etc.
- Type of approach: one could use other approaches, such as an LSTM classifier with SMOTE upsampling.
- Consult the literature! There is most likely a wealth of useful academic work in this space.