ratar's People

Contributors

dominiquesydow

ratar's Issues

Include HETATM entries

Molecules can contain atoms that do not belong to standard amino acids:

  • protonated amino acids
  • non-standard amino acids
  • ions
  • water
  • other solvent molecules

For calculations including z-scales, all atoms that do not belong to a standard amino acid are removed.
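A minimal sketch of that filtering step, written with plain Python records so it runs without any dependencies. The key name `residue_name`, the helper `keep_standard_residues`, and the toy atoms are assumptions for illustration and not taken from the ratar code.

```python
# The 20 standard amino acids as three-letter PDB residue names.
STANDARD_AMINO_ACIDS = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
}


def keep_standard_residues(atoms, key="residue_name"):
    """Return only the atoms whose residue is a standard amino acid."""
    return [atom for atom in atoms if atom[key] in STANDARD_AMINO_ACIDS]


# Toy atom records (hypothetical data, not from a real structure).
atoms = [
    {"atom_name": "CA", "residue_name": "ALA"},
    {"atom_name": "CA", "residue_name": "HIP"},  # protonated histidine
    {"atom_name": "O", "residue_name": "HOH"},   # water
    {"atom_name": "ZN", "residue_name": "ZN"},   # ion
    {"atom_name": "N", "residue_name": "GLY"},
]

filtered = keep_standard_residues(atoms)
print([atom["residue_name"] for atom in filtered])
```

With a filter like this, protonated residues, water, and ions are dropped together, which is exactly the information this issue asks about keeping.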

What about other encoding methods such as pdbqt?

Packages used

Here is a list of the packages used in the refactoring PR #1, found by querying https://github.com/volkamerlab/ratar/search?p=1&q=import

biopandas
pandas
numpy
seaborn
pymol

Testing or env scripts:
pytest
yaml

Used, but not needed in the conda env, since they are already part of the Python standard library:

  • sys
  • pathlib
  • datetime
  • argparse
  • os
  • glob
  • re
  • shutil
  • subprocess
  • typing
  • contextlib
  • tempfile
  • pickle

Test robustness of reference points

How robust are reference points in a binding site?

  • Within one binding site: how do different reference points affect distance distributions?
  • Across similar binding sites (e.g. kinases): similar reference points?

Define binding sites and their size

Binding site definition/size varies between datasets:

  • KLIFS (~1300 atoms)
  • scPDB (~500 atoms)
  • TOUGH (~150 atoms)

What size is needed for good performance of the encoding method?
Can we compare performance on these datasets with each other?

Check units and scaling of moments

First three moments of distribution:

  • Same units: 2nd moment: standard deviation; 3rd moment: cube root of skewness
  • Scaling: none (should fingerprint/moments be normalised?)
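To make the units question concrete, here is a small sketch that computes the three moments for a list of distances. It assumes that "skewness" above means the unnormalised third central moment, so that its cube root has the same unit as the distances; if the normalised (dimensionless) skewness is meant, the third value would not scale with the distances. The function name is hypothetical and this is not the ratar implementation.

```python
import math
import statistics


def first_three_moments(distances):
    """Return (mean, standard deviation, cube root of 3rd central moment)."""
    values = list(distances)
    mean = statistics.fmean(values)
    std = statistics.pstdev(values, mu=mean)
    third_central = statistics.fmean([(v - mean) ** 3 for v in values])
    # Signed cube root, so that a negative asymmetry stays negative.
    cbrt = math.copysign(abs(third_central) ** (1.0 / 3.0), third_central)
    return mean, std, cbrt


distances = [1.0, 2.0, 3.0, 6.0]
moments = first_three_moments(distances)

# Rescaling the distances (e.g. a change of unit) rescales all three moments
# by the same factor, which is what "same units" means here.
scaled_moments = first_three_moments([10.0 * d for d in distances])
print(moments)
print(scaled_moments)
```

Because all three values scale together, a normalisation (if wanted) could be applied to the whole fingerprint with a single factor.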

Absolute and relative paths in this repo

Hi @AndreaVolkamer,

I think the paths that you mentioned in your email only occur where the code reads input data or writes output data.
The inputs are not part of this repo but must be on your lab's hard drive. Note:

  • Regarding the KLIFS data, I strongly recommend downloading the latest dataset from their website.
  • The benchmark data can be used from the hard drive.

Examples that I found:

"/home/dominique/Documents/data/benchmarking/fuzcav/sim_dis_pairs/similar_pairs.txt"

data_dir="/home/dominique/Documents/data"

The folder structures on the hard drive should be untouched, hence updating the path prefix /home/dominique/Documents to wherever you copy the data should be enough. Happy to help here if you send me a directory tree once you have found the data on the hard drive.
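If it helps, the prefix swap could look roughly like this. It is only a sketch: the helper `rebase` and the target location `/mnt/lab_drive` are placeholders, so replace the latter with wherever the data actually ends up.

```python
from pathlib import PurePosixPath

# Prefix that is hard-coded in the repo.
OLD_PREFIX = PurePosixPath("/home/dominique/Documents")


def rebase(path, new_prefix, old_prefix=OLD_PREFIX):
    """Swap old_prefix for new_prefix; leave unrelated paths unchanged."""
    path = PurePosixPath(path)
    try:
        relative = path.relative_to(old_prefix)
    except ValueError:
        # The path does not start with the old prefix.
        return path
    return PurePosixPath(new_prefix) / relative


# "/mnt/lab_drive" is a placeholder for the new data location.
new_data_dir = rebase("/home/dominique/Documents/data", "/mnt/lab_drive")
print(new_data_dir)
```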

One big note: Until 2019 (I think), we still worked with pickle files in the lab, which is not helpful when you do not have fixed versions of a package. So you likely won't be able to load any of the pickle files anymore. As we stopped development on ratar shortly after, we did not update that output file format.

Add `from_path` class method to all ratar.encoding classes

Write class method from_path, analogous to from_molecule. Current problem: files can contain multiple molecules, thus from_path would return a list of molecule objects instead of a single molecule object as in the case of from_molecule.

Differing behaviour here will not work well downstream, right? Check this.
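One possible way around the differing return types, sketched with stand-in classes: let from_path always return a list, even for single-molecule files, so downstream code only has to handle one type. `Molecule`, `Encoder`, and the `loader` argument are hypothetical and do not mirror the actual ratar.encoding classes.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Molecule:
    """Hypothetical stand-in for a loaded molecule."""
    code: str
    n_atoms: int


class Encoder:
    """Hypothetical stand-in for a ratar.encoding class."""

    def __init__(self, molecule_code: str, n_atoms: int):
        self.molecule_code = molecule_code
        self.n_atoms = n_atoms

    @classmethod
    def from_molecule(cls, molecule: Molecule) -> "Encoder":
        """Build one encoder from one molecule."""
        return cls(molecule.code, molecule.n_atoms)

    @classmethod
    def from_path(
        cls, path: str, loader: Callable[[str], List[Molecule]]
    ) -> List["Encoder"]:
        """Build one encoder per molecule in the file; always a list."""
        return [cls.from_molecule(molecule) for molecule in loader(path)]


def fake_loader(path: str) -> List[Molecule]:
    """Pretend the file holds two molecules (no file is actually read)."""
    return [Molecule("pocket_a", 150), Molecule("pocket_b", 500)]


encoders = Encoder.from_path("pockets.mol2", loader=fake_loader)
print([encoder.molecule_code for encoder in encoders])
```

Whether a list-only return works well downstream still needs to be checked, as noted above.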

Required `ratar` updates

Updates needed for this code base:

December 2021

See PR #14 for details.

  • Update environment file as suggested by @t-kimber in #12
  • Check if the CLI is still running (now it is again).
  • Update README + installation + usage instructions (update tutorial!!!!)
  • Fill follow-up section below

Follow-up

Methodology / encoding

  • So far only full pocket encoding; we probably need subpocket encoding (overlapping patches)
  • Define binding sites and their size #7
  • Include non-standard amino acids #8
  • Check units of 4th to 6th dimensions of reference points
  • Check units and scaling of moments - First three moments of distribution:
    • Same units: 2nd moment: standard deviation; 3rd moment: cube root of skewness
    • Scaling: none (should fingerprint/moments be normalised?)
  • We started to look into pdbqt files to be added as "physchem" properties to our fingerprint; take a look at this notebook if still of interest
  • We already started to benchmark the method against similar/dissimilar pocket pairs from FuzCav, ProSPECCTs, and TOUGH-M1 (see README)
  • Encoding works fine for mol2 files; pdb files may not work and need revision
  • Since we probably have to move to NGLview anyway, PyMol functions have not been checked since 2019; they probably do not work anymore.

Testing and CI

  • Add unit tests for similarity module
  • CI: Add back Windows + MacOS support, lint package, format+lint+test docs tutorials

Code

  • Check similarity module - refactoring needed?
  • Address #FIXMEs and #TODOs in code (left-overs of major refactoring in PR #1); to be done after setting up unit tests
  • Remove pymol dependency (not on conda-forge; currently installed from tpeulen)
  • Remove flatten-dict dependency (only pip-installable)
  • Add from_path class method to all ratar.encoding classes: Write class method from_path, analogous to from_molecule. Current problem: files can contain multiple molecules, thus from_path would return a list of molecule objects instead of a molecule object as in the case of from_molecule.
  • We set up a logging.conf file to fine-grain our logging. Include back into the package if of interest.

Packaging

  • Update ratar environment - enable conda packaging

Installation update for 3.12

The installation instructions in the README only work for Python <3.12.

I recommend the following two steps:

  1. Get everything running on Python 3.8 or 3.9 (at least the CI tests run on those, see here). It might work for any version <3.12. How can you get this running? Pin the Python version here as python=3.8 after downloading the package and follow the instructions.
  2. Then move to Python 3.12, which is now the up-to-date Python version. This involves updating the installation* and most probably the code itself, as there have been quite a few API updates in our dependencies. *Update the repo as described here: wolberlab/dynophores#66

    Update setup.py Python version from >=3.8 to >=3.12
    Update versioneer.py so that it is compatible with Python 3.12

@AndreaVolkamer can you please ping your student here?
