AI-Driven EEG Schizophrenia Diagnosis

Project Overview

What is schizophrenia?

Schizophrenia is a serious mental disorder that affects how a person thinks, feels, and behaves. It is often described as a type of psychosis, in which individuals may have difficulty distinguishing their own thoughts and ideas from reality. Symptoms can include hallucinations, delusions, and muddled thinking and speech. The exact cause is believed to involve a combination of genetic and environmental factors, and schizophrenia is considered a brain disorder.

What is an electroencephalogram?

An electroencephalogram (EEG) is a diagnostic test that assesses the electrical activity in the brain by placing small metal discs (electrodes) on the scalp. The brain constantly communicates through electrical impulses, remaining active even during sleep. This ongoing activity is depicted as wavy lines on the EEG recording, providing valuable insights into brain function.

How can machine learning be used to detect schizophrenia from EEG?

Combining machine learning with electroencephalogram (EEG) analysis can support earlier detection of schizophrenia. EEG is a non-invasive method of measuring the brain's electrical activity, and it can capture activity patterns under different conditions such as rest, sleep, listening, and cognitive tasks. This versatility is useful in schizophrenia research, where EEG patterns in patients have been reported to differ from those of control groups. Examining EEG across several conditions gives a fuller picture of schizophrenia-related brain activity and strengthens its diagnostic potential. Machine learning algorithms are well suited to analyzing these complex patterns and identifying potential markers of schizophrenia, and a number of studies have reported high accuracy when classifying schizophrenia from EEG data.

Dataset

Metadata (data about the data)

The raw data comprises 32 participants: 17 diagnosed with schizophrenia and 15 in the control group.

The EEG data were recorded with a reference-free montage using the 10-20 system, with the electrodes shown in the table and image below:

Electrode	Location
Fp1	Frontopolar, Left Hemisphere
Fp2	Frontopolar, Right Hemisphere
F3	Frontal, Left Hemisphere
F4	Frontal, Right Hemisphere
C3	Central, Left Hemisphere
C4	Central, Right Hemisphere
P3	Parietal, Left Hemisphere
P4	Parietal, Right Hemisphere
O1	Occipital, Left Hemisphere
O2	Occipital, Right Hemisphere
F7	Frontotemporal, Left Hemisphere
F8	Frontotemporal, Right Hemisphere
T3	Temporal, Left Hemisphere
T4	Temporal, Right Hemisphere
T5	Temporal, Left Hemisphere (posterior to T3)
T6	Temporal, Right Hemisphere (posterior to T4)
Fz	Frontal Midline
Pz	Parietal Midline
Cz	Central Midline

For most participants, EEG data were recorded in four phases. In the first and third phases, the participant was at rest. In the second phase, the participant performed an arithmetic task. In the fourth phase, the participant was exposed to frequency-modulated auditory stimuli. There are multiple trials for each participant.
All EEG data were saved in the European Data Format (EDF), a common file standard for storing multichannel biological and physical signals.

Data cleaning

The following steps were taken, with the outputs saved to the processed_data folder:

The data for each participant's trial phase was extracted and saved as a CSV file in the eeg_data folder; each file serves as one datapoint in the dataset.
All invalid EDF files were discarded.
A CSV file, Participants Trial Data.csv, lists all datapoints with their corresponding metadata (i.e., phase and trial).
Another CSV file, Participants Data.csv, lists all participants with their respective categories (i.e., Patient or Control).
Event marker data, i.e. the markers pertaining to the fourth (auditory stimulus) phase, were retrieved and stored in the event_markers folder.

Dataset preprocessing

In this phase of the project, statistical analysis was applied to each datapoint within specific EEG recording phases. The mean, median, variance, standard deviation, and range were computed, and their histograms were inspected.
The goal was to identify a statistic whose values were concentrated across datapoints, indicating consistent measurements. Outliers, which could indicate erroneous EEG data, were targeted for removal.
Because EEG measurements are sensitive and precision is critical, outliers were treated as possible instances of inaccurate recordings.
Range emerged as the most suitable statistic: it produced clearly concentrated regions on the histograms, which made potential outliers easier to identify. Removing these outliers improves the overall reliability of the EEG dataset for subsequent analyses and machine learning models. The result of this step was stored in Filtered Range Participant Trial Data.csv.
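The statistics and the range-based outlier removal can be sketched as below. The README does not state the exact cutoff, so this example assumes Tukey's fences (1.5 times the interquartile range) applied to the range statistic separately within each recording phase, and it pools all channels of a datapoint when computing the statistics. The function names and the `phase` and `range` column names are hypothetical.

```python
import numpy as np
import pandas as pd


def datapoint_stats(eeg: pd.DataFrame) -> pd.Series:
    """Summary statistics of one datapoint (one trial phase), pooled over all channels."""
    values = eeg.to_numpy(dtype=float).ravel()
    return pd.Series({
        "mean": values.mean(),
        "median": np.median(values),
        "variance": values.var(),
        "std": values.std(),
        "range": values.max() - values.min(),
    })


def filter_by_range(stats: pd.DataFrame, k: float = 1.5, phase_col: str = "phase") -> pd.DataFrame:
    """Drop datapoints whose range falls outside Tukey's fences within their phase."""
    by_phase = stats.groupby(phase_col)["range"]
    q1 = by_phase.transform(lambda s: s.quantile(0.25))
    q3 = by_phase.transform(lambda s: s.quantile(0.75))
    iqr = q3 - q1
    # Keep rows inside [Q1 - k*IQR, Q3 + k*IQR] for their own phase.
    keep = (stats["range"] >= q1 - k * iqr) & (stats["range"] <= q3 + k * iqr)
    return stats[keep]
```

Filtering per phase matters because, for example, the auditory phase may have a systematically different range from the resting phases.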

Dataset generation

The following features can be derived from the EEG data and have been reported to correlate with a diagnosis of schizophrenia:

Fuzzy Entropy
- Definition: Fuzzy entropy is a measure of the degree of irregularity or fuzziness in time series data.
- Significance: Higher fuzzy entropy values indicate increased irregularity in the EEG signal, which may be associated with schizophrenia.
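As an illustration, fuzzy entropy in a widely used formulation (embedding dimension m, tolerance r scaled by the signal's standard deviation, exponential membership function with power n) can be sketched with NumPy. The defaults m = 2, r = 0.2, n = 2 are typical choices assumed here, not values taken from the project's notebooks. The pairwise distance matrix makes memory use quadratic in the signal length, so this suits short or downsampled segments, and a constant signal (zero standard deviation) is not handled.

```python
import numpy as np


def fuzzy_entropy(x, m: int = 2, r: float = 0.2, n: int = 2) -> float:
    """Fuzzy entropy of a 1-D signal: ln(phi_m) - ln(phi_{m+1})."""
    x = np.asarray(x, dtype=float)
    count = x.size - m          # same number of templates for dim m and m + 1
    tol = r * x.std()           # tolerance relative to signal spread

    def phi(dim: int) -> float:
        idx = np.arange(count)[:, None] + np.arange(dim)
        templates = x[idx]
        # Remove each template's own mean (baseline) before comparing.
        templates = templates - templates.mean(axis=1, keepdims=True)
        # Chebyshev (max-abs) distance between every pair of templates.
        dist = np.abs(templates[:, None, :] - templates[None, :, :]).max(axis=2)
        similarity = np.exp(-(dist ** n) / tol)
        np.fill_diagonal(similarity, 0.0)  # exclude self-matches
        return similarity.sum() / (count * (count - 1))

    return float(np.log(phi(m)) - np.log(phi(m + 1)))
```

A regular signal such as a sine wave yields a lower value than white noise, matching the interpretation that higher values indicate more irregularity.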
MMN (Mismatch Negativity)
- Definition: MMN measures the mismatch between the power of the response to a frequent, baseline auditory stimulus and the power of the response to an infrequent, deviant auditory stimulus.
- Significance: Deviations in MMN can reveal the brain's sensitivity to unexpected stimuli, a factor often associated with schizophrenia.
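The power-based MMN described above could be computed roughly as follows: average the epochs that follow standard and deviant stimuli separately, take the mean power of each averaged response within a post-stimulus window, and subtract standard from deviant. This is a sketch under stated assumptions: the signal is a single channel, the event onsets (in samples) are assumed to come from the event_markers data, and the 100 to 250 ms window is a typical MMN latency range assumed here rather than the project's setting. Both function names are hypothetical.

```python
import numpy as np


def averaged_window_power(signal, onsets, sfreq, tmin=0.1, tmax=0.25):
    """Mean power of the averaged response in [tmin, tmax) seconds after each onset."""
    start, stop = int(round(tmin * sfreq)), int(round(tmax * sfreq))
    # Keep only events whose whole window lies inside the recording.
    valid = [o for o in onsets if o + start >= 0 and o + stop <= len(signal)]
    if not valid:
        raise ValueError("no event has a complete post-stimulus window")
    epochs = np.stack([signal[o + start:o + stop] for o in valid])
    erp = epochs.mean(axis=0)  # averaging across events suppresses unrelated activity
    return float(np.mean(erp ** 2))


def mmn_power_difference(signal, standard_onsets, deviant_onsets, sfreq, tmin=0.1, tmax=0.25):
    """Deviant minus standard power of the averaged responses."""
    signal = np.asarray(signal, dtype=float)
    deviant = averaged_window_power(signal, deviant_onsets, sfreq, tmin, tmax)
    standard = averaged_window_power(signal, standard_onsets, sfreq, tmin, tmax)
    return deviant - standard
```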
Wave Power
- Alpha (8 to 13 Hz): Prominent in a relaxed, resting state.
- Beta (12 to 30 Hz): Prominent during cognitive exercises, e.g., arithmetic.
- Gamma (30 to 100+ Hz): Observed in response to auditory stimuli.
- Significance: Aberrations in wave power, especially in specific frequency ranges, can provide insights into cognitive and sensory processing abnormalities linked to schizophrenia.
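Band power can be estimated from a Welch power spectral density, as in the sketch below. It uses `scipy.signal.welch` and integrates the density over each band with a simple rectangle rule. The band edges follow the list above (including the 12 to 13 Hz overlap between alpha and beta), while the 100 Hz gamma upper bound and the 2-second segment length are assumptions. Measuring gamma up to 100 Hz requires a sampling rate above 200 Hz.

```python
import numpy as np
from scipy.signal import welch

# Band edges in Hz from the list above; 100 Hz is an assumed upper bound for gamma.
BANDS = {"alpha": (8.0, 13.0), "beta": (12.0, 30.0), "gamma": (30.0, 100.0)}


def band_powers(signal: np.ndarray, sfreq: float, bands: dict = BANDS) -> dict:
    """Absolute power per frequency band for a single-channel signal."""
    # Welch PSD with 2-second segments (or the whole signal if it is shorter).
    nperseg = min(len(signal), int(2 * sfreq))
    freqs, psd = welch(signal, fs=sfreq, nperseg=nperseg)
    resolution = freqs[1] - freqs[0]
    powers = {}
    for name, (low, high) in bands.items():
        in_band = (freqs >= low) & (freqs < high)
        # Rectangle-rule integral of the density over the band.
        powers[name] = float(psd[in_band].sum() * resolution)
    return powers
```

For a unit-amplitude 10 Hz sine wave, nearly all power (about 0.5, the sine's mean square) lands in the alpha band.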

For each of these features, a Jupyter notebook (named in snake case) in the dataset_generators folder implements its computation.
A combined dataset then merges all the features, with each participant that has a complete set of features forming one record.
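A minimal sketch of the combining step, assuming each feature table has one row per participant identified by a hypothetical `participant` column: an inner join keeps only participants present in every table, and dropping rows with missing values leaves only complete records. The function name `combine_features` is hypothetical.

```python
from functools import reduce

import pandas as pd


def combine_features(feature_tables: list[pd.DataFrame], key: str = "participant") -> pd.DataFrame:
    """Inner-join feature tables on the participant key and keep complete records only."""
    combined = reduce(lambda left, right: left.merge(right, on=key, how="inner"), feature_tables)
    return combined.dropna().reset_index(drop=True)
```

For example, a participant who appears in the fuzzy entropy table but has a missing alpha power value would be excluded from the combined dataset.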

Model prediction

For each of the features created in the previous section (Dataset generation), a Jupyter notebook implements a machine learning prediction pipeline for diagnosing schizophrenia. The same is done for the combined dataset.

The model pipeline is implemented as follows (using MLflow):

An experiment is created, and the name of the feature(s) used to predict schizophrenia, along with a description, are supplied. See the MLflow documentation for more information on what an experiment is.
Within the experiment, a run is where the actual model logging occurs. Each run has a description comprising:
- Model Description: The model being used.
- Model Rationale: The reasoning behind the choice of model and any parameters selected.
- Dataset Description: Information about the dataset.
- Dataset Rationale: A hypothesis about the result the dataset is expected to yield.
The model is then run, and the relevant metrics are recorded.
The following data is logged: model and dataset parameters, datasets, and metrics.
The results are then interpreted and logged in a conclusion.

Information about the model implementations can be found in the models folder.

How to use

Install the dependencies by running the command below (preferably in a virtual environment):

pip install -r requirements.txt

Each notebook explains in detail, both programmatically and mathematically, the steps taken for its feature.

Notes

A comprehensive report is in the works.
References will be included in that report.

m1ndb0ts / ai-driven-eeg-schizophrenia-diagnosis