ovarian-tumor-aggregation's Introduction

Solving the problem of incomplete data in medical diagnosis via interval modeling

Introduction

This repository contains the files that make up the research implementation of a method for supporting medical diagnosis under incomplete data. The approach comprises interval modeling of incomplete data, uncertaintification of classical models, and aggregation of incomplete results. It is evaluated on medical data for ovarian tumor diagnosis, where missing data is a common problem.
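As a rough illustration of the interval-modeling idea (a minimal sketch, not the project's actual implementation), a known attribute value can be represented as a degenerate interval and a missing value as the whole range of admissible values. The function name `to_interval` and the example attribute domain of [0, 10] are assumptions made for this sketch.

```r
# Hypothetical helper: represent each observation as an interval [lower, upper].
# A known value v becomes [v, v]; a missing value (NA) becomes the whole
# attribute domain [domain_min, domain_max].
to_interval <- function(x, domain_min, domain_max) {
  stopifnot(is.numeric(x), domain_min <= domain_max)
  is_missing <- is.na(x)
  data.frame(
    lower = ifelse(is_missing, domain_min, x),
    upper = ifelse(is_missing, domain_max, x)
  )
}

# Example with an assumed attribute domain of [0, 10]
values <- c(3, NA, 7.5)
intervals <- to_interval(values, domain_min = 0, domain_max = 10)
print(intervals)
```

The width of each interval then expresses how much is unknown about the corresponding value: zero for observed values, the full domain width for missing ones.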

Citation

If you use the code or results of this project in your research, please cite the following article:

 Wójtowicz, A., Żywica, P., Stachowiak, A., & Dyczkowski, K. (2016).
 Solving the problem of incomplete data in medical diagnosis via interval modeling.
 Applied Soft Computing, 47, 424-437.

Technical details

All scripts are written in R 3.1.2. The RStudio project is managed by packrat to keep the R package versions compatible. Documents are generated with knitr.

Experiment at a glance

The research consists of 3 steps:

  1. making analytic datasets,
  2. training and evaluation,
  3. visualizing results.
--< datasets/db-2015-04-30.csv
|
|              STEP 1                                  STEP 2                                STEP 3
|
|  ##############################     #########################################     #########################
|  #  make-datasets.Rmd         #     #  training-and-evaluation.Rmd          #     #  results-overview.Rmd #
|  ##############################     #########################################     #########################
|  #                            #     #                                       #     #                       #
----> make-datasets.R           #  ----> training-and-evaluation.R            #  ----> results-overview.R   #
   #   |                        #  |  #   |                                   #  |  #                       #
   #   -> datasets/training.csv >---  #   -> datasets/evaluation-output.RData >---  #                       #
   #   -> datasets/test.csv     >---  #                                       #     #                       #
   #                            #     #                                       #     #                       #
   ##############################     #########################################     #########################
     |                                  |                                             |
     -> make-datasets.html              -> training-and-evaluation.html               -> results-overview.html

Downloading the results

To view the outputs of the experiment, run the download-data.R script. It downloads the CSV datasets and the binary RData output:

  • datasets/training.csv,
  • datasets/test.csv,
  • datasets/evaluation-output.RData.
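Once downloaded, the files can be loaded into an R session. The helper below is a minimal sketch: the name `load_outputs` is hypothetical, and the RData file name is assumed to be spelled `evaluation-output.RData`, as in the diagram above.

```r
# Hypothetical helper: load the downloaded outputs from a datasets directory.
load_outputs <- function(dir = "datasets") {
  paths <- file.path(dir, c("training.csv", "test.csv", "evaluation-output.RData"))
  absent <- paths[!file.exists(paths)]
  if (length(absent) > 0) {
    stop("Missing files (run download-data.R first): ",
         paste(absent, collapse = ", "))
  }
  # load() restores the saved objects into a separate environment, so
  # nothing in the global workspace is overwritten.
  results <- new.env()
  load(paths[3], envir = results)
  list(
    training = read.csv(paths[1]),
    test     = read.csv(paths[2]),
    results  = as.list(results)
  )
}
```

Calling `outputs <- load_outputs()` from the project root then gives access to `outputs$training`, `outputs$test` and the objects stored in the RData file under `outputs$results`.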

Reproducing the research

To prepare the software environment for the experiment, open the ovarian-tumor-aggregation.Rproj file in RStudio; this launches packrat, which downloads the necessary libraries. The installation may take up to several minutes.
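If the libraries are not installed automatically, they can be restored by hand from the R console. This is a minimal sketch, assuming the packrat package is installed and the project keeps its lockfile at the usual packrat location, `packrat/packrat.lock`; the README itself does not describe this manual route.

```r
# Usual location of the packrat lockfile, relative to the project root.
lockfile <- file.path("packrat", "packrat.lock")

if (requireNamespace("packrat", quietly = TRUE) && file.exists(lockfile)) {
  # Install the package versions recorded in the lockfile ...
  packrat::restore()
  # ... and report any differences between the lockfile and the library.
  packrat::status()
} else {
  message("packrat or ", lockfile, " not found; run this from the project root.")
}
```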

Due to legal restrictions, the initial database datasets/db-2015-04-30.csv cannot be published, so the first step is not reproducible. The remaining steps can be reproduced in two ways (A or B, described in the sections below). To reflect the whole experiment, the non-reproducible steps are also listed.

A. Creating datasets and final results

To create only the datasets and results, which can then be investigated further, execute the following scripts:

  1. make-datasets.R (not reproducible),
  2. training-and-evaluation.R.

The created CSV and RData files are listed in the section Downloading the results.

Caution: running training-and-evaluation.R is very time-consuming and resource-intensive. It is recommended to run it in an environment with 32 x 2.0 GHz cores and at least 200 GB of RAM; in such a setting, the calculation should take approximately 18 hours.

B. Creating datasets, final results and documents

To create the datasets and results and additionally generate the documentation (which explains the implementation of the experiment and the results), knit the following .Rmd files:

  1. make-datasets.Rmd (not reproducible),
  2. training-and-evaluation.Rmd,
  3. results-overview.Rmd.

The created CSV and RData files are listed in the section Downloading the results. The created HTML files are shown in the diagram in the section Experiment at a glance.
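Outside the RStudio interface, the three documents could also be rendered in order from the R console. The following is a minimal sketch, assuming the rmarkdown package (which runs knitr under the hood) is available in the project library; whether the documents were originally knitted this way is not stated here.

```r
# The documents in the order given above; each step depends on files
# written by the previous one.
rmd_files <- c(
  "make-datasets.Rmd",            # not reproducible: needs the unpublished database
  "training-and-evaluation.Rmd",  # very time-consuming, see the caution in section A
  "results-overview.Rmd"
)

for (rmd in rmd_files) {
  if (!file.exists(rmd)) {
    message("Skipping ", rmd, " (file not found in the working directory)")
    next
  }
  # Writes the HTML document next to the source file.
  rmarkdown::render(rmd)
}
```

Without access to the initial database, the first file has to be left out of `rmd_files`, and the datasets it would produce must come from download-data.R instead.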

ovarian-tumor-aggregation's Issues

Error in n >= lp : default method not implemented for type 'list'

The following code, from the section "Training and evaluation of aggregation strategies", does not work:

pvals = matrix(p.adjust(pvals, method = "BH", n=sum(!is.na(pvals)|is.nan(pvals))), nrow=nrow(pvals), dimnames=list(rownames(pvals), colnames(pvals)))

It fails with the error:

Error in n >= lp : default method not implemented for type 'list'

Could you take a look, please?
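One possible cause, offered as a guess rather than a confirmed diagnosis: `is.nan()` has no method for lists, so an error of this kind appears when `pvals` is a data frame rather than a numeric matrix. The sketch below uses made-up p-values to show a conversion with `as.matrix()` before the adjustment; the data and the row and column names are purely illustrative.

```r
# Made-up p-values stored in a data frame (illustrative data only).
pvals <- data.frame(m1 = c(0.01, NA, 0.20),
                    m2 = c(0.03, 0.04, NaN),
                    row.names = c("a", "b", "c"))

# is.nan(pvals) would fail at this point, because a data frame is a list.
# Converting to a numeric matrix first avoids that.
pvals <- as.matrix(pvals)

# The adjustment from the report, unchanged apart from line breaks.
pvals <- matrix(p.adjust(pvals, method = "BH",
                         n = sum(!is.na(pvals) | is.nan(pvals))),
                nrow = nrow(pvals),
                dimnames = list(rownames(pvals), colnames(pvals)))
print(pvals)
```

If `pvals` is already a matrix in the failing session, this explanation does not apply and the cause lies elsewhere.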
