proteins-stability

Evaluation of feature selection and ranking methods' stability in the context of discovery proteomics data

Scripts for the Double Cross Validation described in my thesis.

The script was designed to handle datasets with a small number of samples (~50) and a larger number of variables (~800). It uses many classifiers and heavy feature selection procedures. On a laptop with an i7 processor, the ranking may take minutes to hours, and the prioritization based on small sets of features may take days. In my study, it took 5 days.
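To illustrate what a double (nested) cross-validation looks like structurally, here is a minimal sketch using only the Python standard library. It only produces index splits: the outer loop holds out a test fold, and the inner loop splits the remaining samples into training and validation folds. The function names and the fold counts (`outer_k=5`, `inner_k=4`) are assumptions made for this example and are not taken from the thesis or from the scripts in this repository.

```python
import random


def k_fold_indices(indices, k, seed=0):
    """Shuffle the indices and split them into k nearly equal folds."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def double_cross_validation(n_samples, outer_k=5, inner_k=4, seed=0):
    """Yield (outer_test, inner_train, inner_validation) index lists.

    Hypothetical helper: fold counts are illustrative assumptions.
    """
    outer_folds = k_fold_indices(range(n_samples), outer_k, seed)
    for o, outer_test in enumerate(outer_folds):
        # Everything outside the outer test fold is available for the inner loop.
        outer_train = [i for j, fold in enumerate(outer_folds) if j != o for i in fold]
        inner_folds = k_fold_indices(outer_train, inner_k, seed + o + 1)
        for v, inner_val in enumerate(inner_folds):
            inner_train = [i for j, fold in enumerate(inner_folds) if j != v for i in fold]
            yield outer_test, inner_train, inner_val


# About 50 samples, as in the dataset size mentioned above.
splits = list(double_cross_validation(n_samples=50))
```

In a setup like this, feature selection and model tuning would use only the inner training and validation folds, so the outer test fold stays untouched until the final evaluation.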

Please cite my work; see the Citation section.

Dependencies

# Tested with Python 3.6.8
# go to stability_selection and install packages
cd lib/stability_selection
pip install -r requirements.txt
python setup.py install

# go back to the proteins-stability folder and install packages
cd ../..
pip install -r requirements.txt

Would you like to contribute?

Any performance and design improvements are welcome! When my thesis is publicly available, I will post a link here.

Figures

Some of the following figures were generated by the scripts, and some were built from the .csv files that the scripts output.

Figure 1

General pipeline to score the proteins and then rank them using all the information.

Figure 2

The RFA (Recursive Feature Addition) version developed for my thesis proved to be more stable and ranked our true biomarkers better than the classic RFE (Recursive Feature Elimination) method.

If you use RFA or would like to contribute, please refer to its own repository: https://github.com/heberleh/recursive-feature-addition

Figure 3

For each training data set (color), the figure counts the number of times a protein appeared in the top 10 of the 40 ranks. The more colors a protein has and the larger its bars, the more stable the protein is.
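The counting behind this kind of figure can be sketched in a few lines of Python. This is an illustrative example, not the code used by the scripts: the function names, the input layout (a dictionary mapping each training set to its list of ranks) and the toy protein identifiers are all assumptions. Each rank is assumed to list every protein at most once, ordered from best to worst.

```python
from collections import Counter


def top_k_counts(ranks_by_train_set, k=10):
    """Map each training set to a Counter of proteins seen in the top k of its ranks."""
    counts = {}
    for train_set, ranks in ranks_by_train_set.items():
        counter = Counter()
        for rank in ranks:
            counter.update(rank[:k])  # one count per rank in which the protein is top-k
        counts[train_set] = counter
    return counts


def stability_summary(counts):
    """Per protein: (number of training sets it appears in, total top-k appearances)."""
    summary = {}
    for counter in counts.values():
        for protein, n in counter.items():
            sets_seen, total = summary.get(protein, (0, 0))
            summary[protein] = (sets_seen + 1, total + n)
    return summary


# Toy data: 2 training sets with 2 ranks each (the study used 40 ranks and k=10).
ranks = {
    "train_1": [["P1", "P2", "P3"], ["P1", "P3", "P2"]],
    "train_2": [["P1", "P4", "P2"], ["P4", "P1", "P3"]],
}
counts = top_k_counts(ranks, k=2)
summary = stability_summary(counts)
```

In the toy data, `P1` is in the top 2 of every rank in both training sets, so it has the highest values in `summary`, which corresponds to a protein with many colors and large bars in the figure.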

Figure 4

Heatmap showing the highest 50% average scores of each protein for each training data set (column). Some proteins are not selected for processing with some training sets (columns), or obtain extremely low scores (blue).

Figure 5

Top-10 proteins selected from a rank of good and stable proteins, such as the one in Figure 3.

Citation

Heberle, Henry. Computational methods in Biology: cancer biomarkers, protein networks and lateral gene transfer. (University of São Paulo, 2019).

Heberle, H. Computational methods in Biology: cancer biomarkers, protein networks and lateral gene transfer. (University of São Paulo, 2019).

@phdthesis{Heberle2019,
    author = {Heberle, Henry},
    pages = {164},
    school = {University of S{\~{a}}o Paulo},
    title = {{Computational methods in Biology: cancer biomarkers, protein networks and lateral gene transfer}},
    type = {Doctoral Dissertation},
    year = {2019}
}
