The goal is to detect anomalous behavior in stock price data collected over several years, using an unsupervised learning method, namely an autoencoder
- I downloaded GE stock price data from the Yahoo Finance website
- The data is read into a pandas DataFrame; it has almost 15,000 observations and 7 features
- I used seaborn to visualize the data by plotting the closing price against time
- I divided the data into train and test sets
- The 'Last' column is separated out into a new DataFrame and converted to a NumPy array
- The data is then scaled (I used StandardScaler, which standardizes features by removing the mean and scaling to unit variance)
- I used a sequence size of 60 and converted the observations into sequences, with features and labels, for both the train and test sets
- I used an autoencoder model architecture
- The model is compiled with the default values, using mean absolute error (MAE) as the loss
- The model is trained on the features and labels with a batch_size of 32 for 50 epochs
- The training and validation losses are plotted using matplotlib
- An anomaly is flagged where the reconstruction error is large, so we can define a threshold value beyond which an observation is called an anomaly
- To pick that value, I looked at the MAE of the model's predictions on the training data
- I gathered all the details in a DataFrame and used seaborn to draw a lineplot of them
- I also plotted the anomalies as colored dots on top of the data
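
The scaling and sequence steps above can be sketched roughly as follows. This is a dependency-free illustration rather than the project code: the project uses StandardScaler and NumPy, while here the same standardization is written out by hand. The helper names (`fit_scaler`, `scale`, `to_sequences`) and the toy prices are made up for the example. It also makes two assumptions that the summary does not state: the scaling statistics are learned from the train set only, and each label is the observation that immediately follows its window.

```python
from statistics import fmean, pstdev


def fit_scaler(train_values):
    """Return (mean, std) of the training values.

    Uses the population standard deviation, which is also what
    StandardScaler uses.
    """
    return fmean(train_values), pstdev(train_values)


def scale(values, mean, std):
    """Standardize values with the statistics learned from the train set."""
    return [(v - mean) / std for v in values]


def to_sequences(values, seq_size):
    """Slide a window of seq_size observations over values.

    ASSUMPTION: the label for each window is the observation that
    comes right after it.
    """
    features, labels = [], []
    for start in range(len(values) - seq_size):
        features.append(values[start:start + seq_size])
        labels.append(values[start + seq_size])
    return features, labels


# Toy stand-in for the price column (hypothetical numbers, not GE data).
prices = [float(p) for p in range(1, 11)]  # 1.0 .. 10.0
train, test = prices[:7], prices[7:]

# ASSUMPTION: fit on the train set only, then apply to both sets.
mean, std = fit_scaler(train)
train_scaled = scale(train, mean, std)
test_scaled = scale(test, mean, std)

# The project uses a sequence size of 60; 3 keeps the toy example small.
train_x, train_y = to_sequences(train_scaled, seq_size=3)
```

With 7 training observations and a window of 3, this produces 4 sequences; in general a series of length n yields n - seq_size sequences.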
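
The thresholding idea from the last few steps can be sketched in the same spirit. The sequences, the reconstructions and the threshold of 0.3 below are hypothetical values picked for the example, not results from the project, and the function names are made up. The sketch assumes the error for each sequence is the MAE between the sequence and the model's output for it.

```python
def mae_per_sequence(actual, reconstructed):
    """Mean absolute error between each sequence and its reconstruction."""
    return [
        sum(abs(a - r) for a, r in zip(seq, rec)) / len(seq)
        for seq, rec in zip(actual, reconstructed)
    ]


def flag_anomalies(errors, threshold):
    """Mark every sequence whose error exceeds the threshold."""
    return [error > threshold for error in errors]


# Hypothetical scaled sequences and model outputs (not project data).
actual = [[0.0, 0.1, 0.2], [0.1, 0.2, 0.3], [0.2, 0.3, 2.0]]
reconstructed = [[0.0, 0.1, 0.2], [0.1, 0.1, 0.3], [0.2, 0.3, 0.5]]

errors = mae_per_sequence(actual, reconstructed)

# Hypothetical threshold; in practice it would be chosen by
# inspecting the distribution of MAE values on the training data.
threshold = 0.3
anomalies = flag_anomalies(errors, threshold)

# Positions of the flagged sequences, e.g. for drawing them as dots.
anomaly_positions = [i for i, flagged in enumerate(anomalies) if flagged]
```

Only the third toy sequence is flagged, because its last value is reconstructed poorly. The positions in `anomaly_positions` are what would be mapped back to dates and drawn as colored dots over the price line.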