AI-Driven EEG Schizophrenia Diagnosis

Project Overview

What is schizophrenia?

Schizophrenia is a serious mental disorder that affects how a person thinks, feels, and behaves. It is often described as a type of psychosis, in which individuals may have difficulty distinguishing their own thoughts and ideas from reality. Symptoms can include hallucinations, delusions, and muddled thoughts and speech. Schizophrenia is considered a brain disorder, and its cause is believed to involve a combination of genetic and environmental factors.

What is an electroencephalogram?

An electroencephalogram (EEG) is a diagnostic test that assesses the electrical activity in the brain by placing small metal discs (electrodes) on the scalp. The brain constantly communicates through electrical impulses, remaining active even during sleep. This ongoing activity is depicted as wavy lines on the EEG recording, providing valuable insights into brain function.

How can machine learning be used to detect schizophrenia using EEG?

Combining machine learning with electroencephalogram (EEG) analysis can support earlier detection of schizophrenia. EEG is a non-invasive method of measuring the brain's electrical activity, and it can capture different patterns under different conditions such as rest, sleep, listening, and cognitive activity. This matters for schizophrenia research because patients show EEG patterns that differ from those of a control group, and examining several conditions gives a fuller picture of schizophrenia-related brain activity. Machine learning algorithms are well suited to analyzing these intricate patterns and to identifying potential markers for schizophrenia. Several studies report that machine learning models classify schizophrenia from diverse EEG data with high accuracy.

Dataset

Metadata (data about the data)

  • The raw data comprises 32 participants: 17 diagnosed with schizophrenia and 15 forming the control group.

  • The EEG data was recorded with a reference-free montage based on the 10-20 system, using the electrodes shown in the table and image below:

    Electrode   Location
    Fp1         Frontopolar, Left Hemisphere
    Fp2         Frontopolar, Right Hemisphere
    F3          Frontal, Left Hemisphere
    F4          Frontal, Right Hemisphere
    C3          Central, Left Hemisphere
    C4          Central, Right Hemisphere
    P3          Parietal, Left Hemisphere
    P4          Parietal, Right Hemisphere
    O1          Occipital, Left Hemisphere
    O2          Occipital, Right Hemisphere
    F7          Frontotemporal, Left Hemisphere
    F8          Frontotemporal, Right Hemisphere
    T3          Temporal, Left Hemisphere
    T4          Temporal, Right Hemisphere
    T5          Temporal, Left Hemisphere (posterior to T3)
    T6          Temporal, Right Hemisphere (posterior to T4)
    Fz          Frontal Midline
    Pz          Parietal Midline
    Cz          Central Midline

EEG electrodes placement using 10-20 system, Source: Wikipedia

  • For most participants, four phases of EEG data were recorded. In the first and third phases the participant was at rest. In the second phase the participant performed an arithmetic task. In the fourth phase the participant was subjected to frequency-modulated auditory stimuli. There are multiple trials for each participant.
  • All EEG data was saved in the European Data Format (EDF), a common file standard for recording multichannel biological and physical data.
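
As a rough illustration of what an EDF file looks like on disk, the sketch below parses only the fixed 256-byte header that every EDF file begins with, using the Python standard library. The function names (read_edf_fixed_header, is_plausible_edf) are hypothetical and are not taken from this project; a real pipeline would more likely rely on a dedicated EDF reader.

```python
from typing import BinaryIO, Dict

# (field name, width in bytes) of the fixed part of an EDF header.
# The widths add up to 256 bytes; every field is space-padded ASCII.
EDF_FIXED_HEADER = [
    ("version", 8),
    ("patient_id", 80),
    ("recording_id", 80),
    ("start_date", 8),         # dd.mm.yy
    ("start_time", 8),         # hh.mm.ss
    ("header_bytes", 8),       # total header size in bytes
    ("reserved", 44),
    ("n_data_records", 8),
    ("record_duration_s", 8),  # duration of one data record, in seconds
    ("n_signals", 4),          # number of channels
]


def read_edf_fixed_header(stream: BinaryIO) -> Dict[str, object]:
    """Parse the fixed 256-byte EDF header from a binary stream."""
    header: Dict[str, object] = {}
    for name, width in EDF_FIXED_HEADER:
        raw = stream.read(width)
        if len(raw) != width:
            raise ValueError(f"file ends inside header field {name!r}")
        header[name] = raw.decode("ascii").strip()
    # Convert the numeric fields from text.
    for name in ("header_bytes", "n_data_records", "n_signals"):
        header[name] = int(header[name])
    header["record_duration_s"] = float(header["record_duration_s"])
    return header


def is_plausible_edf(header: Dict[str, object]) -> bool:
    """Basic sanity check: the header holds 256 bytes plus 256 per signal."""
    n_signals = header["n_signals"]
    return n_signals > 0 and header["header_bytes"] == 256 + 256 * n_signals
```

A file would be opened in binary mode (`open(path, "rb")`) and passed to read_edf_fixed_header; a ValueError or a failed is_plausible_edf check is one possible signal that the file is not usable.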

Data cleaning

The following steps were taken, with all outputs written to the processed_data folder:

  • Each participant's trial phase data was extracted and saved as a CSV file, which serves as one datapoint in the dataset. These files are stored in the eeg_data folder.
  • All invalid EDF files were discarded.
  • A CSV file, Participants Trial Data.csv, lists all datapoints with their corresponding metadata (i.e. phase, trial).
  • Another CSV file, Participants Data.csv, lists all participants with their respective categories (i.e. Patient or Control).
  • Event marker data (i.e. the markers pertaining to the fourth phase) was retrieved and stored in the event_markers folder.
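
The index of datapoints described above could be produced along the following lines. This is only a sketch using the standard csv module: the TrialRecord class, its column names, and the example file paths are assumptions made for illustration and do not reflect the actual layout of Participants Trial Data.csv.

```python
import csv
from dataclasses import asdict, dataclass, fields
from typing import Iterable, TextIO

VALID_PHASES = (1, 2, 3, 4)  # the four recording phases described above


@dataclass
class TrialRecord:
    """One datapoint: a single trial phase of one participant (hypothetical schema)."""
    participant_id: str
    phase: int
    trial: int
    csv_path: str  # where the extracted EEG samples for this datapoint live


def write_trial_index(records: Iterable[TrialRecord], stream: TextIO) -> int:
    """Write one CSV row per datapoint and return the number of rows written."""
    writer = csv.DictWriter(stream, fieldnames=[f.name for f in fields(TrialRecord)])
    writer.writeheader()
    written = 0
    for record in records:
        if record.phase not in VALID_PHASES:
            raise ValueError(f"unknown phase {record.phase} for {record.participant_id}")
        writer.writerow(asdict(record))
        written += 1
    return written
```

When writing to a real file, opening it with `open(path, "w", newline="")` avoids blank lines between rows on some platforms.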

Dataset preprocessing

  • In this phase of the project, statistical analysis was applied to each datapoint within specific EEG recording phases. The mean, median, variance, standard deviation, and range were computed, and their histograms were inspected.
  • The goal was to find a statistic whose values cluster tightly across datapoints, indicating consistent measurements. Datapoints far from that cluster (outliers) were targeted for removal.
  • Because EEG measurements are sensitive and precision is critical, outliers were treated as possibly erroneous recordings.
  • Range proved the most suitable statistic: its histograms showed clearly concentrated regions, which made potential outliers easy to identify. Removing them improves the reliability of the EEG dataset for the subsequent analyses and machine learning models. The result of this step was stored in Filtered Range Participant Trial Data.csv.
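
A minimal sketch of range-based filtering is shown below. The source describes identifying outliers by inspecting histograms; the interquartile-range (IQR) rule used here is a swapped-in, automated stand-in and is an assumption, as are the function names and the default multiplier k.

```python
from statistics import quantiles
from typing import Dict, List


def signal_range(samples: List[float]) -> float:
    """Range of one datapoint: largest sample minus smallest sample."""
    return max(samples) - min(samples)


def filter_by_range(datapoints: Dict[str, List[float]], k: float = 1.5) -> Dict[str, List[float]]:
    """Keep datapoints whose range lies within [Q1 - k*IQR, Q3 + k*IQR].

    The IQR rule is an assumption for illustration; it is not the
    histogram-based procedure used in the project.
    """
    ranges = {name: signal_range(samples) for name, samples in datapoints.items()}
    q1, _median, q3 = quantiles(list(ranges.values()), n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return {
        name: datapoints[name]
        for name, value in ranges.items()
        if lower <= value <= upper
    }
```

Each key in datapoints stands for one trial phase recording; in practice the range would likely be computed per channel or over all channels, which this sketch leaves open.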

Dataset generation

The following features can be derived from the EEG data and have been found to correlate with a schizophrenia diagnosis:

  1. Fuzzy Entropy

    • Definition: Fuzzy entropy is a measure of the degree of irregularity or fuzziness in time series data.
    • Significance: Higher fuzzy entropy values indicate increased irregularity in the EEG signal, which may be associated with schizophrenia.
  2. MMN (Mismatch Negativity)

    • Definition: MMN measures the difference between the power recorded in response to a baseline auditory stimulus, which is presented frequently, and the power recorded in response to a deviant auditory stimulus, which is presented infrequently.
    • Significance: Deviations in MMN can reveal the brain's sensitivity to unexpected stimuli, a factor often associated with schizophrenia.
  3. Wave Power

    • Alpha (8 to 13 Hz): Detected in a restful state.
    • Beta (12 to 30 Hz): Detected in the performance of cognitive exercises, e.g., arithmetic.
    • Gamma (30 to 100+ Hz): Detected when subjected to auditory stimuli.
    • Significance: Aberrations in wave power, especially in specific frequency ranges, can provide insights into cognitive and sensory processing abnormalities linked to schizophrenia.
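
To make the first feature above more concrete, here is a pure-Python sketch of fuzzy entropy in a commonly used formulation: embedded windows with their own mean removed, Chebyshev distance, and an exponential similarity function exp(-(d^n) / r). The parameter defaults (m, r, n) are assumptions and may differ from those used in the project's notebooks; r is often chosen relative to the signal's standard deviation.

```python
import math
from typing import List, Sequence


def fuzzy_entropy(signal: Sequence[float], m: int = 2, r: float = 0.2, n: int = 2) -> float:
    """Fuzzy entropy of a 1-D signal (O(N^2) reference implementation).

    m: embedding dimension, r: tolerance, n: exponent of the fuzzy
    similarity function. All defaults are illustrative assumptions.
    """
    length = len(signal)
    if length < m + 2:
        raise ValueError("signal is too short for the chosen embedding dimension")

    def phi(dim: int) -> float:
        count = length - m  # same number of windows for dim = m and dim = m + 1
        windows: List[List[float]] = []
        for i in range(count):
            window = signal[i:i + dim]
            mean = sum(window) / dim
            windows.append([x - mean for x in window])  # remove local baseline
        total = 0.0
        for i in range(count):
            for j in range(count):
                if i == j:
                    continue
                # Chebyshev distance between the two windows
                d = max(abs(a - b) for a, b in zip(windows[i], windows[j]))
                total += math.exp(-(d ** n) / r)  # fuzzy degree of similarity
        return total / (count * (count - 1))

    return math.log(phi(m)) - math.log(phi(m + 1))
```

A perfectly regular signal scores close to zero, while an irregular one scores higher, which matches the interpretation given in the list above.
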
  • For each of these features, a Jupyter notebook (named in snake case) implements its computation. The notebooks are in the dataset_generators folder.
  • A combined dataset then merges all features per participant; each participant with a complete set of features forms one record.
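
The wave power feature can be sketched with a plain discrete Fourier transform, summing the power of the frequency bins that fall inside each band. The band edges are taken from the list above (note that alpha and beta overlap between 12 and 13 Hz as written); the 100 Hz upper limit for gamma, the sampling rate argument, and the function name are assumptions for illustration. A real implementation would normally use an FFT library instead of this O(N^2) loop.

```python
import cmath
import math
from typing import Dict, Sequence, Tuple

# Band edges in Hz, as listed above; the gamma upper edge is an assumption.
BANDS_HZ: Dict[str, Tuple[float, float]] = {
    "alpha": (8.0, 13.0),
    "beta": (12.0, 30.0),
    "gamma": (30.0, 100.0),
}


def band_powers(signal: Sequence[float], sampling_rate_hz: float) -> Dict[str, float]:
    """Return a value proportional to the signal power in each band."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]  # drop the DC offset
    powers = {name: 0.0 for name in BANDS_HZ}
    for k in range(1, n // 2 + 1):  # positive frequencies up to Nyquist
        freq = k * sampling_rate_hz / n
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        power = abs(coeff) ** 2 / n ** 2  # proportional to power in this bin
        for name, (low, high) in BANDS_HZ.items():
            if low <= freq < high:
                powers[name] += power
    return powers
```

For a one-second, 10 Hz sine wave sampled at 256 Hz, almost all of the power lands in the alpha band.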

Model prediction

  • For each of the features created in the previous section (Dataset generation), a Jupyter notebook implements a machine learning prediction pipeline for diagnosing schizophrenia. The same is done for the combined dataset.

The pipeline for the model is implemented as follows (using MLflow):

  • An experiment is created, and the name of the feature(s) being used to predict schizophrenia and a description are supplied. See the MLflow documentation for more information on what an experiment is.
  • A run is then created, which is where the actual model logging occurs. Each run has a description comprising:
    • Model Description: States the model being used.
    • Model Rationale: Gives the reasons for choosing the model and its parameters.
    • Dataset Description: States the dataset information.
    • Dataset Rationale: Provides a hypothesis about the result the dataset is expected to yield.
  • The model is then run and the relevant metrics are recorded.
  • The following data is logged: model and dataset parameters, datasets, metrics.
  • The results are then interpreted and logged in a conclusion.
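
The run description and metrics in the steps above could be prepared as plain Python values before being handed to MLflow. The sketch below deliberately avoids calling MLflow itself; the RunDescription class, the label convention (1 = patient, 0 = control), and the choice of accuracy, precision, and recall are all assumptions, since the source only speaks of "relevant metrics".

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class RunDescription:
    """The four parts of a run description listed above (hypothetical container)."""
    model_description: str
    model_rationale: str
    dataset_description: str
    dataset_rationale: str

    def to_text(self) -> str:
        """Render the description as one block of text for the run."""
        return "\n".join([
            f"Model Description: {self.model_description}",
            f"Model Rationale: {self.model_rationale}",
            f"Dataset Description: {self.dataset_description}",
            f"Dataset Rationale: {self.dataset_rationale}",
        ])


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, and recall for binary labels (1 = patient, 0 = control)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }
```

The resulting dictionary has the shape expected by MLflow's metric-logging functions, and the text from to_text could be attached to the run as its description.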

Information about the model implementations can be found in the models folder.

How to use

  • Install the dependencies by running the command below (preferably in a virtual environment):
pip install -r requirements.txt
  • Each notebook contains detailed explanations, both programmatic and mathematical, of the steps taken for each feature.

Notes

  • A comprehensive report is in the works.
  • References will be included in that report.
