This document serves as a plan for the final publication of my PhD. This work is a continuation of the novelty-based detection scheme proposed in the last two publications. Here we discuss some of the foreseeable difficulties in both constructing the dataset and training models for anomaly detection. The following sections are in order of completion, i.e. creating the dataset needs to happen before training the models.
Dataset creation
We need to translate the spreadsheet-based labels to spectrograms that can be used for training the anomaly detection models. I began this process late last year, but discovered a bug in the station-name extraction code. The following to-do list serves as a guide to the steps needed to complete this process.
TODO
Pull Jorrit's patch
Separate the anomalies from autocorrelations and cross correlations
Maybe it is sufficient to detect anomalies in the autocorrelation spectrum, because then we can assume that all baselines containing that station will also contain the anomaly.
Ensure that there is only 1 type of anomaly per class.
There should be only 1 training set for all the anomalies; this training set should contain no anomalies
Should the training set consist only of autocorrelations? If so, we have no phase information
To detect only one specific anomaly we would need to sample anomalies from the other dataset. The scheme I propose is shown below.
One of the problems with this approach is that if not all other known anomalies are contained in the training set, then we may detect an unseen anomaly as the class we are trying to detect
For example, if we were training an oscillating-tile detector and used only the two classes shown in the diagram below, then if the model were exposed to lightning it might classify it as an oscillating tile
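The split logic implied above can be sketched as follows. This is a minimal illustration with hypothetical helper and label names (the real labels come from the spreadsheet): samples of the target anomaly form the positive set, other known classes form the negative set, and anything unrecognised is set aside rather than silently treated as normal, to avoid exactly the lightning-vs-oscillating-tile confusion described above.

```python
def make_detector_split(samples, target_class, known_classes):
    """Assign each labelled sample to one side of a one-vs-rest
    detector for `target_class`.

    samples: list of (sample_id, label) pairs.
    Returns (normal, anomalous, unassigned) lists of sample ids;
    samples whose label is neither the target nor a known class are
    set aside, since an unseen anomaly could otherwise be mistaken
    for the target class.
    """
    normal, anomalous, unassigned = [], [], []
    for sample_id, label in samples:
        if label == target_class:
            anomalous.append(sample_id)
        elif label in known_classes:
            normal.append(sample_id)
        else:
            unassigned.append(sample_id)
    return normal, anomalous, unassigned
```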
Other considerations
Focus on 4 anomalies initially
High noise element
Data loss
Oscillating tile
Scintillation
then expand to the rarer classes that may require higher-resolution data
Unlike the RFI work, the labels only need to be on a per-spectrogram level
There will be multiple testing classes but 1 training class (that contains only "normal" LOFAR data).
Structure
LOFARAD/
|-- train
|   |-- crosscorrelations
|   |   |-- Baseline ID
|   |   |-- Baseline ID
|   |   |-- ...
|   |-- autocorrelations
|   |   |-- Baseline ID
|   |   |-- Baseline ID
|   |   |-- ...
|-- test
|   |-- high_noise
|   |   |-- crosscorrelations
|   |   |   |-- Baseline ID
|   |   |   |-- Baseline ID
|   |   |   |-- ...
|   |   |-- autocorrelations
|   |   |   |-- Baseline ID
|   |   |   |-- Baseline ID
|   |   |   |-- ...
|   |-- oscillating_tile
|   |   |-- crosscorrelations
|   |   |   |-- Baseline ID
|   |   |   |-- Baseline ID
|   |   |   |-- ...
|   |   |-- autocorrelations
|   |   |   |-- Baseline ID
|   |   |   |-- Baseline ID
|   |   |   |-- ...
|   |-- <etc etc>
Training procedure
Multi class anomaly detection methodology for LOFAR
Train excluding a single class and all others, evaluate how well it detects that class
This means that to detect 4 different anomalies we need 4 different models that are trained on completely different data (similar to how we evaluated on MNIST in the NLN paper)
This may also give an opportunity for efficient computation: since the models can be run in parallel, there could be some interesting implementation details
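The leave-one-class-out training loop might be sketched as follows, with `train_model` and `evaluate` as hypothetical stand-ins for the actual NLN training and evaluation code:

```python
def train_per_class_detectors(classes, train_model, evaluate):
    """Train one detector per target class, each on data that excludes
    that class, then evaluate how well it flags the held-out class.

    train_model(include=...) and evaluate(model, held_out_class=...)
    are placeholders for the real pipeline.
    """
    results = {}
    for target in classes:
        training_classes = [c for c in classes if c != target]
        model = train_model(include=training_classes)
        results[target] = evaluate(model, held_out_class=target)
    return results
```

Because the four trainings are independent, they could also be dispatched to separate workers, which is the parallelism opportunity mentioned above.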
Model selection
Evaluate NLN on this dataset
Limitations of the method: we predict a scalar for a given input spectrogram; using AEs for this means that we need to average the reconstruction error, which will probably cause poor sensitivity
I think a single mapping using self-supervised or contrastive losses will work best
Investigate alternative self-supervised losses that produce a scalar per spectrogram without averaging the reconstruction error
We have extra information about the baseline (e.g. get a model to predict which baseline an input comes from)
We can include another subtask?
Do we use a contrastive loss? Are the spectrograms sufficiently different to do so? Maybe if we include a contrastive loss per baseline?
Get the model to fill in the blanks like the paper we referenced for our NLN paper?
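One way the baseline-prediction subtask could be set up is to derive a free pretext label from each spectrogram's baseline. A minimal sketch, where `baseline_of` is a hypothetical lookup from spectrogram id to baseline id:

```python
def make_pretext_dataset(spectrogram_ids, baseline_of):
    """Pair each spectrogram with the index of its baseline, giving a
    self-supervised classification label at no labelling cost.

    baseline_of: callable mapping a spectrogram id to a baseline id.
    Returns a list of (spectrogram_id, baseline_index) pairs.
    """
    baselines = sorted({baseline_of(s) for s in spectrogram_ids})
    index = {b: i for i, b in enumerate(baselines)}
    return [(s, index[baseline_of(s)]) for s in spectrogram_ids]
```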
Other thoughts
Timeline? When do I need to start writing my thesis?
Potential venues: a short conference paper at SPIE, a journal paper for MNRAS?
I have mostly finished labelling the LOFAR anomaly detection dataset; there are certainly going to be some inconsistencies, but for now it is sufficient
Classes such as "High_Noise_Elements" are ill-defined, but these can either be discarded or included in the "non-anomalous" class
1) Note: these values are changing as more corrections are made to each class. 2) Note: the class labels are not necessarily correct; for example, what I labelled as a strong radio emitter is an A-team source in the sidelobes.
Class                  # Samples
Other                  6953
scintillation          2991
strong_radio_emitter   2444
unknown                976
high_noise_elements    790
lightning              727
data_loss              462
solar_storm            147
electric_fence         142
oscillating_tile       82
empty                  73
Labelling interface/the way things are plotted
I make use of Jorrit's code to produce the "Adder Plots"
We use only the magnitude spectrum, but for all 4 polarisations
Each plot is normalised based on the 1st and 99th percentiles across a given SAP
To improve dynamic range, a 3rd-degree polynomial is fit to each time slice (1 subband) and the data are divided by the polynomial to reduce the dynamic range
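The polynomial normalisation described above might look roughly like this in NumPy. This is a sketch, not the actual labelling code: the (time, freq) axis convention and strictly positive magnitude data are assumptions.

```python
import numpy as np


def poly_normalise(spectrogram, degree=3):
    """Divide each time slice by a fitted polynomial to flatten the
    bandpass and reduce the dynamic range.

    spectrogram: (time, freq) array of strictly positive magnitudes.
    """
    x = np.arange(spectrogram.shape[1])
    out = np.empty_like(spectrogram, dtype=float)
    for t in range(spectrogram.shape[0]):
        coeffs = np.polyfit(x, spectrogram[t], degree)
        out[t] = spectrogram[t] / np.polyval(coeffs, x)
    return out
```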
NOTE: the code was written in Python 2.7 and there is a bug in it such that the station numbers do not correspond to the correct baselines.
Example:
The domain that we labelled in (with polynomial normalisation)
Unprocessed data
Potential directions
We have very imbalanced classes (this is expected in anomaly detection scenarios)
We could technically make a classifier for the classes with enough data (scintillation and strong radio emitter), but for the others, such as oscillating tile, we would either need to collect more data or use novelty detection/clustering
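As one mitigation for the imbalance, inverse-frequency class weights could be derived directly from the sample counts in the table above; a minimal sketch (the helper name is hypothetical):

```python
def class_weights(counts):
    """Return inverse-frequency weights so that rare classes (e.g.
    oscillating_tile) contribute as much to the loss as common ones.

    counts: dict mapping class name -> number of samples.
    Weights are normalised so a perfectly balanced dataset gives 1.0
    for every class.
    """
    total = sum(counts.values())
    n = len(counts)
    return {c: total / (n * k) for c, k in counts.items()}
```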
Option 1:
Train a classifier per class and measure the accuracy on some mixed test set
The classifier would have to be multi-label, as a single spectrogram can have multiple classes present
Alternatively, we could take the top-N predictions of a single classifier to determine which classes are most likely to be present
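The top-N alternative could be as simple as ranking the per-class scores of a single classifier; a minimal sketch (the function name is hypothetical, and scores are assumed to be comparable across classes, e.g. per-class sigmoid outputs):

```python
import numpy as np


def top_n_classes(scores, class_names, n=3):
    """Return the n class names with the highest scores, ordered from
    most to least likely."""
    order = np.argsort(scores)[::-1][:n]
    return [class_names[i] for i in order]
```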
Option 2:
Use AE-based novelty detection schemes like NLN
My suspicion is that their success will be limited, because training with the MSE will produce blurry reconstructions that are not meaningful
Furthermore, integrating over each spectrogram/patch to give an anomaly score will most likely produce many false negatives.
It is easy to evaluate given the amount of work I've put into it, but I don't think these methods are the best option
Option 3:
Self-supervised approaches, such as predicting the baseline length or some other subtask, may produce meaningful representations for our data.
I think many of the features (lightning, oscillating tile, etc.) will not be reflected in an embedding generated by baseline distance (in fact, it is not even possible, because we are currently only using autocorrelations)
What about training a classifier on a subset of the data and then using it to create an embedding for unseen classes?
We have made progress in dataset creation, self-supervised anomaly detection, and supervised anomaly detection. However, several issues need to be addressed before this work is ready for publication.
Model fine tuning
Fine-tuning is not working as well as expected: fine-tuning the ResNet decreases performance compared to training a ResNet in a supervised manner on the full spectrograms
My suspicion is that it is due to the way we re-assemble patches after projection, i.e. some spatial relationships are discarded.
Unfortunately, as we only have labels at a per-spectrogram level, we need to reconstruct the latent embedding of each patch into its corresponding dimensionality
Each input is 256x256; it is broken into (256/n)^2 patches of size nxn, which are then projected to some latent-space dimensionality
e.g. with patch size = 64 and latent dim = 128, we have 16 patches of 64x64, which when projected down become a matrix of size (16, 128)
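A minimal shape walk-through of the patch pipeline described above; the projection here is a placeholder matrix standing in for the learned encoder:

```python
import numpy as np


def patchify(x, n):
    """Split a square image into non-overlapping n x n patches,
    flattened to rows of length n*n."""
    t = x.reshape(x.shape[0] // n, n, x.shape[1] // n, n)
    return t.transpose(0, 2, 1, 3).reshape(-1, n * n)


x = np.zeros((256, 256))          # one input spectrogram
patches = patchify(x, 64)         # (16, 4096): (256/64)^2 patches
proj = np.zeros((64 * 64, 128))   # placeholder for the learned encoder
latents = patches @ proj          # (16, 128) per-patch embeddings
flat = latents.reshape(1, -1)     # (1, 2048) flattened, as in fine-tuning
```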
Things to try:
Experiment with the reshaping of the vector; currently we flatten (16, 128) -> (1, 2048)
Apply an SVM to the latent projection to see if it does better than the MLP
The SVM does approximately the same as the MLP
Change the classification head to a CNN for example.
Outcome:
I don't think fine-tuning makes sense
The reason it does not work is that we are:
Destroying the spatial relationship between patches
Only using the representations learnt from the "normal" data.
It seems that it decreases performance in comparison to just training a ResNet on the full spectrograms
Furthermore, in both cases we need some supervision, so technically we could just use the self-supervised model to detect anomalies and then identify the specific anomalies using the supervised ResNet on the full data.
Finish results for URSI abstract:
Make all three models evaluated on the same data
Label the last few examples in the dataset:
...
Supervised model:
Currently the simple ResNet achieves pretty good performance; however, it is failing on the high-noise-element class
I suspect that this is due to a number of incorrect labels, so I think I need to investigate which samples it is breaking on
There are currently two separate branches being used for evaluation: one for SSL pretraining and the other for evaluating the fine-tuned models. I need to merge them so that all fine-tuning and pretraining is done simultaneously.
TO-DO:
Merge main and the code from the class removal
Update resnet structure (remove position classifier from the model)
Add code for changing model backbones
Determine best way to evaluate supervised classifier and detector
Refactor VAE evaluation code by using the eval_knn function
Tests:
Ensure the performance of the models is the same as before refactoring
Experiments for validation of models before committing to writing.
Determine which samples from each class we are misclassifying (if it seems unreasonable, then reassess model)
Obtain an estimate of how long inference takes per spectrogram
Evaluate OOD results in a more appropriate way
I think doing it at a per-class level is a little weird
I think we should randomly drop a class, train the classifier, and see its overall performance at detecting anomalies
Then we can do the same experiments with multiple classes, to see the overall performance with respect to the number of classes dropped.
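The class-dropping experiment described above might be sketched as a sweep over the number of dropped classes, with `train_and_score` a hypothetical stand-in for the full train-and-evaluate pipeline:

```python
import random


def ood_sweep(classes, train_and_score, max_dropped, trials=5, seed=0):
    """For k = 1..max_dropped, repeatedly drop k random classes from
    training, treat them as OOD at test time, and record the mean
    detection score as a function of k.

    train_and_score(kept, dropped) is a placeholder returning a scalar
    performance metric (e.g. F-2).
    """
    rng = random.Random(seed)
    curve = {}
    for k in range(1, max_dropped + 1):
        scores = []
        for _ in range(trials):
            dropped = rng.sample(classes, k)
            kept = [c for c in classes if c not in dropped]
            scores.append(train_and_score(kept, dropped))
        curve[k] = sum(scores) / len(scores)
    return curve
```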
Dataset finalisation
Electric fence?
Third order high noise fused with first order?
Results needed for paper:
The purpose of this paper is to show how we can solve the two problems, these being classification of known failures as well as anomaly detection for unseen anomalies.
Comparison of different anomaly detection models on our dataset vs our method.
Note: this might be a little weird, as vision-based AD models only evaluate in a single-class setting.
I think the single-class setting doesn't really make sense, so what we should compare against is the fine-tuned model's ability to do anomaly detection, in the sense that we use all classes and get it to predict whether an input is anomalous or not. We could also technically do it without subsampling, on all available data.
Comparison of the classifier with and without SSL corrections, showing that our approach is best
This will probably be in the form of the bar graph from the presentation, where we show that we can get a 3% overall increase
Comparing the combined anomaly detection performance of the supervised backbone against the SSL augmentation when some classes are OOD
I envisage this as a line chart with combined F-2 anomaly detection performance on the y-axis against the number of OOD classes on the x-axis.
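For reference, the F-2 score plotted on that chart can be computed from confusion counts as follows; with beta = 2, recall is weighted more heavily than precision, matching the preference for catching anomalies over avoiding false alarms:

```python
def f_beta(tp, fp, fn, beta=2.0):
    """F-beta score from true positives, false positives, and false
    negatives; beta > 1 favours recall (beta = 2 gives F-2)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```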
How many epochs of pre-training/fine-tuning, respectively, are needed to obtain the best performance?
Model ablations:
If we remove parts of the SSL loss we prescribe, how does it affect the overall training objective in terms of anomaly detection?
Combination function of classification and detection