rfscorevs's Introduction

RF-Score-VS 1.0

RF-Score-VS is a novel Random Forest-based scoring function for virtual screening that predicts binding affinity. Its descriptors are based on RF-Score, developed by Pedro Ballester et al. The binary provided here implements RF-Score-VS v2, meaning that it counts atoms of certain types within a 12 Å radius, divided into 2 Å bins.
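
The binned counting described above can be sketched in a few lines of Python. This is a minimal illustration, not the code shipped in the binary: the function name, the input layout, and the default element lists are assumptions made for the example, and the real descriptor may define atom types and bin edges differently.

```python
import math
from itertools import product


def binned_contact_counts(ligand_atoms, protein_atoms,
                          ligand_types=("C", "N", "O"),
                          protein_types=("C", "N", "O", "S"),
                          cutoff=12.0, bin_width=2.0):
    """Count ligand-protein atom pairs per element pair and distance bin.

    Each atom is a tuple (element, (x, y, z)) with coordinates in angstroms.
    The element lists are an illustrative subset (assumption), not the
    actual type list used by RF-Score-VS.
    """
    n_bins = int(round(cutoff / bin_width))  # 12 / 2 -> 6 bins
    pairs = list(product(ligand_types, protein_types))
    counts = dict((pair, [0] * n_bins) for pair in pairs)

    for l_elem, l_xyz in ligand_atoms:
        for p_elem, p_xyz in protein_atoms:
            if (l_elem, p_elem) not in counts:
                continue  # element pair not tracked in this sketch
            dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(l_xyz, p_xyz)))
            if dist < cutoff:
                counts[(l_elem, p_elem)][int(dist // bin_width)] += 1

    # Flatten to one feature vector, ordered by element pair, then by bin.
    features = []
    for pair in pairs:
        features.extend(counts[pair])
    return features
```

With the defaults above the vector has 3 x 4 x 6 = 72 entries. A vector of this kind, computed per protein-ligand complex, is what a Random Forest would take as input.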

This repository contains the scripts required to reproduce the results reported in the publication introducing RF-Score-VS.

Standalone scoring function

RF-Score-VS is available as a standalone scoring function with no dependencies required. Usage instructions and detailed information about the binary are available in the README.md file alongside the binaries and in a separate repository.
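
For scripted use, the command line can be assembled from Python. The sketch below relies only on the flags that appear in the binary's own usage message (quoted in the issues further down); what each flag means is inferred from its name, and the helper name and all file names are hypothetical. Check the binary's README.md before relying on it.

```python
import subprocess


def build_rfscorevs_command(receptor, ligand_files, out_file,
                            binary="./rf-score-vs",
                            in_format="sdf", out_format="csv", n_cpu=None):
    """Assemble an rf-score-vs argument list (hypothetical helper)."""
    cmd = [binary,
           "--receptor", receptor,   # receptor structure file
           "-i", in_format,          # one of sdf, mol2, pdb, pdbqt
           "-o", out_format,         # one of csv, sdf, mol2, pdb, pdbqt
           "-O", out_file]
    if n_cpu is not None:
        cmd += ["-n", str(n_cpu)]
    cmd += list(ligand_files)        # positional in_file arguments
    return cmd


if __name__ == "__main__":
    # File names below are placeholders for illustration only.
    command = build_rfscorevs_command("receptor.pdb", ["ligands.sdf"],
                                      "scores.csv", n_cpu=4)
    print(" ".join(command))
    # subprocess.check_call(command)  # uncomment once the binary is in place
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with paths that contain spaces.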

Download RF-Score-VS for your platform:

Retraining scoring function / reusing features

The features used to train RF-Score-VS are available in the head1_full directory. They are stored as compressed CSV files (*.csv.gz), divided into subdirectories by DUD-E target.

If you want to use all the data, we provide convenient flat CSV files.
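
As a starting point for reusing the features, the per-target files can be read with Pandas. This is a sketch under assumptions: it only relies on the layout described above (one subdirectory per target containing *.csv.gz files), the function name and the added source_file column are made up for the example, and the actual column names in the files are not assumed.

```python
import glob
import os

import pandas as pd


def load_target_features(root_dir, target):
    """Concatenate all *.csv.gz feature files for one target (hypothetical helper)."""
    pattern = os.path.join(root_dir, target, "*.csv.gz")
    frames = []
    for path in sorted(glob.glob(pattern)):
        df = pd.read_csv(path, compression="gzip")
        df["source_file"] = os.path.basename(path)  # remember where rows came from
        frames.append(df)
    if not frames:
        raise IOError("no *.csv.gz files found for pattern %s" % pattern)
    return pd.concat(frames, ignore_index=True)
```

For example, `load_target_features("head1_full", target_name)` would return one DataFrame for the chosen target subdirectory, which can then be passed to Scikit-Learn after selecting the feature and label columns.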

Requirements for running the IPython Notebooks

Required software:

  • Python 2.7
  • ODDT 0.2+
  • OpenBabel 2.4.1+
  • Scikit-Learn 0.17+
  • Seaborn
  • Pandas

Additional software:

  • sklearn-compiledtrees 1.3+ (compiling RFs for final scoring function)
  • dask / ipyparallel / ipython-cluster-helper (parallel computations on cluster)

References:

  • Wójcikowski M, Ballester PJ, Siedlecki P. Performance of machine-learning scoring functions in structure-based virtual screening. Sci Rep. Nature Publishing Group; 2017;7: 46710. doi:10.1038/srep46710

  • Wójcikowski M, Zielenkiewicz P, Siedlecki P. Open Drug Discovery Toolkit (ODDT): a new open-source player in the drug discovery field. J Cheminform. 2015;7: 26. doi:10.1186/s13321-015-0078-2

  • Ballester PJ, Mitchell JBO. A machine learning approach to predicting protein-ligand binding affinity with applications to molecular docking. Bioinformatics. 2010;26: 1169–1175. doi:10.1093/bioinformatics/btq112

  • Ballester PJ, Schreyer A, Blundell TL. Does a more precise chemical description of protein-ligand complexes lead to more accurate prediction of binding affinity? J Chem Inf Model. 2014;54: 944–955. doi:10.1021/ci500091r

  • Li H, Leung K-S, Wong M-H, Ballester PJ. Improving AutoDock Vina Using Random Forest: The Growing Accuracy of Binding Affinity Prediction by the Effective Exploitation of Larger Data Sets. Mol Inform. WILEY-VCH Verlag; 2015;34: 115–126. doi:10.1002/minf.201400132

rfscorevs's People

Contributors

mwojcikowski

rfscorevs's Issues

How to train new data?

I found that rf-score-vs is only a scoring function and cannot be retrained.

./rf-score-vs 
usage: rf-score-vs [-h] [-n N_CPU] [--version] [-i {sdf,mol2,pdb,pdbqt}]
                   [-o {csv,sdf,mol2,pdb,pdbqt}] [-O OUT_FILE]
                   [--receptor RECEPTOR] [--field SAVE_FIELDS]
                   in_file [in_file ...]
rf-score-vs: error: too few arguments

I want to train on the DUD-E database, but I found that the features in head1_full cannot be used by rf-train from https://github.com/HongjianLi/RF-Score. The DUD-E datasets differ from the PDBbind datasets: the former contain active/decoy labels, while the latter contain experimental data. How did you train on the DUD-E data, and how exactly did you obtain RF-Score-VS? Could you give me some guidance?

Integration with smina

Fantastic work! Are these scoring functions available/able to be integrated with Smina, the fork of vina that allows for use of custom scoring functions?

000_MASTER_plots.ipynb not working because dude_classical_final.csv and dude_classical_final_nfolds.csv are missing

Hello, I am highly interested in your work and I want to compare my method with your method.

As mentioned in the title, 000_MASTER_plots.ipynb raises errors because dude_classical_final.csv and dude_classical_final_nfolds.csv are missing.
I could not recreate them, since the docking score information is not included in the dataset.
Could you provide dude_classical_final.csv and dude_classical_final_nfolds.csv?
