The ratar from volkamerlab

ratar's Issues

Check units of 4th to 6th dimensions of reference points

Include HETATM entries

Molecules can contain atoms that do not belong to standard amino acids:

protonated amino acids
non-standard amino acids
ions
water
other solvent molecules

For calculations including z-scales, all atoms that do not belong to a standard amino acid are removed.
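The removal step described above could look roughly like the following sketch. It is not the ratar implementation: the atom records are plain dicts with a hypothetical `residue_name` key, and the example residue names (`HOH`, `ZN`, `HIP`, `MSE`) are only illustrations of the categories listed above.

```python
# The 20 standard amino acids by three-letter residue name.
STANDARD_AMINO_ACIDS = frozenset({
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
})


def keep_standard_residues(atoms):
    """Return only the atoms whose residue is a standard amino acid.

    `atoms` is a list of dicts with a hypothetical "residue_name" key.
    """
    return [
        atom for atom in atoms
        if atom["residue_name"].strip().upper() in STANDARD_AMINO_ACIDS
    ]


atoms = [
    {"atom_name": "CA", "residue_name": "ALA"},
    {"atom_name": "O", "residue_name": "HOH"},   # water
    {"atom_name": "ZN", "residue_name": "ZN"},   # ion
    {"atom_name": "CA", "residue_name": "HIP"},  # protonated histidine
    {"atom_name": "CA", "residue_name": "MSE"},  # non-standard amino acid
]
kept = keep_standard_residues(atoms)
print(f"kept {len(kept)} of {len(atoms)} atoms")
```

With a filter like this, protonated variants such as `HIP` are dropped as well, which is exactly the behaviour this issue questions.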

What about other encoding methods such as pdbqt?

Update `ratar_tutorial.ipynb` after PR #1 and #2

Packages used

Here is a list of the packages used in the refactoring PR #1, queried using https://github.com/volkamerlab/ratar/search?p=1&q=import

biopandas
pandas
numpy
seaborn
pymol

Testing or env scripts:
pytest
yaml

Used, but not needed in the conda env, since they are already part of the Python standard library (see here):

sys
pathlib
datetime
argparse
os
glob
re
shutil
subprocess
typing
contextlib
tempfile
pickle

Integrate pdbqt code into ratar

Test robustness of reference points

How robust are reference points in binding site?

Within one binding site: how do different reference points affect distance distributions?
Across similar binding sites (e.g. kinases): similar reference points?
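One way to start on the first question is to shift a reference point slightly and look at how much each atom distance changes. The sketch below uses toy coordinates and the centroid as the reference point; both are assumptions made for illustration and are not taken from the ratar code.

```python
import math


def centroid(coordinates):
    """Arithmetic mean of a list of (x, y, z) coordinates."""
    n = len(coordinates)
    return tuple(sum(c[i] for c in coordinates) / n for i in range(3))


def distances_to(reference, coordinates):
    """Euclidean distance of every atom to one reference point."""
    return [math.dist(reference, c) for c in coordinates]


# Toy binding site with four atoms (coordinates in Angstrom).
coordinates = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 2.0)]

reference = centroid(coordinates)
shifted = (reference[0] + 0.5, reference[1], reference[2])  # move 0.5 A along x

original = distances_to(reference, coordinates)
perturbed = distances_to(shifted, coordinates)
changes = [p - o for o, p in zip(original, perturbed)]

print("distances to centroid:        ", [round(d, 3) for d in original])
print("distances to shifted centroid:", [round(d, 3) for d in perturbed])
print("largest change:               ", round(max(abs(c) for c in changes), 3))
```

By the triangle inequality, no single distance can change by more than the size of the shift, so the interesting part is how the shape of the whole distribution (and thus its moments) reacts.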

Define binding sites and their size

Binding site definition/size varies between datasets:

KLIFS (~1300 atoms)
scPDB (~500 atoms)
TOUGH (~150 atoms)

What size is needed for good performance of encoding method?
Can we compare performance on these datasets with each other?

Check units and scaling of moments

First three moments of distribution:

Same units: 2nd moment - standard deviation; 3rd moment: 3rd root of skewness
Scaling: none (should fingerprint/moments be normalised?)
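To make the unit question concrete, here is a minimal sketch of three moments that all keep the unit of the input distances. It assumes that the third moment is the cube root of the (unnormalised) third central moment, which is one reading of "3rd root of skewness" above; the function names are made up for this example.

```python
import math


def signed_cube_root(value):
    """Cube root that keeps the sign of negative inputs."""
    return math.copysign(abs(value) ** (1.0 / 3.0), value)


def first_three_moments(distances):
    """Return (mean, standard deviation, cube root of third central moment).

    All three values carry the same unit as the input distances.
    """
    n = len(distances)
    mean = sum(distances) / n
    variance = sum((d - mean) ** 2 for d in distances) / n
    third_central = sum((d - mean) ** 3 for d in distances) / n
    return mean, math.sqrt(variance), signed_cube_root(third_central)


distances = [1.0, 1.0, 4.0]
moments = first_three_moments(distances)
scaled = first_three_moments([10.0 * d for d in distances])

print("moments:       ", moments)
print("moments (x 10):", scaled)
```

Because all three values share the unit of the distances, multiplying the distances by 10 multiplies every moment by 10. This is why the normalisation question matters when comparing binding sites of different size.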

Absolute and relative paths in this repo

Hi @AndreaVolkamer,

I think the paths that you mentioned in your email only appear in places where the code uses input or output data paths.
The inputs are not part of this repo but must be on your lab's hard drive. Note:

Regarding the KLIFS data, I strongly recommend downloading the latest dataset from their website.
The benchmark data can be used from the hard drive.

Examples that I found:

ratar/scripts/benchmarking.py

Line 43 in 197d3f4

 "/home/dominique/Documents/data/benchmarking/fuzcav/sim_dis_pairs/similar_pairs.txt" 

ratar/scripts/ratar.sh

Line 7 in 197d3f4

data_dir="/home/dominique/Documents/data"

The folder structures on the hard drive should be untouched, hence updating the path prefix /home/dominique/Documents to wherever you copy the data should be enough. Happy to help here if you send me a directory tree once you have found the data on the hard drive.
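Swapping the prefix could be done with a small helper like the one sketched below. The helper name `rebase` and the new location `/mnt/lab` are hypothetical; only the old prefix and the example path come from the files quoted above.

```python
from pathlib import PurePosixPath

# Prefix that is hard-coded in the scripts quoted above.
OLD_PREFIX = PurePosixPath("/home/dominique/Documents")


def rebase(path, new_prefix, old_prefix=OLD_PREFIX):
    """Replace `old_prefix` at the start of `path` with `new_prefix`.

    Raises ValueError if `path` does not start with `old_prefix`.
    """
    relative = PurePosixPath(path).relative_to(old_prefix)
    return PurePosixPath(new_prefix) / relative


old_path = (
    "/home/dominique/Documents/data/benchmarking/fuzcav/"
    "sim_dis_pairs/similar_pairs.txt"
)
new_path = rebase(old_path, "/mnt/lab")  # "/mnt/lab" is a made-up location
print(new_path)
```

Keeping the prefix in one variable (or reading it from a config file) avoids having to touch every script again the next time the data moves.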

One big note: Until 2019 (I think), we still worked with pickle files in the lab, which is not helpful when package versions are not pinned. So you likely won't be able to load any of the pickle files anymore. As we stopped development on ratar shortly after, we did not update that output file format.

Add `from_path` class method to all ratar.encoding classes

Write a class method from_path, analogous to from_molecule. Current problem: files can contain multiple molecules, thus from_path would return a list of molecule objects instead of a single molecule object as in the case of from_molecule.

Differing behaviour here will not work well downstream, right? Check this.
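One possible design, sketched below, is to let from_path always return a list (also for files with a single molecule), so that downstream code only has to handle one return type. The classes `Molecule` and `BindingSite` and the `loader` argument are stand-ins invented for this sketch; they are not the actual ratar.encoding classes.

```python
from dataclasses import dataclass


@dataclass
class Molecule:
    """Stand-in for a loaded molecule (hypothetical)."""
    code: str
    n_atoms: int


@dataclass
class BindingSite:
    """Stand-in for a ratar.encoding class (hypothetical)."""
    code: str
    n_atoms: int

    @classmethod
    def from_molecule(cls, molecule):
        """Build one object from one molecule."""
        return cls(code=molecule.code, n_atoms=molecule.n_atoms)

    @classmethod
    def from_path(cls, path, loader):
        """Build one object per molecule in the file; always returns a list.

        `loader` is any callable that maps a path to a list of molecules.
        """
        return [cls.from_molecule(molecule) for molecule in loader(path)]


def fake_loader(path):
    """Pretend that the file at `path` contains two molecules."""
    return [Molecule("site_a", 150), Molecule("site_b", 500)]


binding_sites = BindingSite.from_path("example.mol2", fake_loader)
print([site.code for site in binding_sites])
```

The price of this design is that callers with single-molecule files need to index the result, e.g. `binding_sites[0]`.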

Required `ratar` updates

Updates needed for this code base:

December 2021

See PR #14 for details.

Update environment file as suggested by @t-kimber in #12
Check if the CLI is still running (now it is again).
Update README + installation + usage instructions (update tutorial!!!!)
Fill follow-up section below

Follow-up

Methodology / encoding

So far only full pocket encoding; we probably need subpocket encoding (overlapping patches)
Define binding sites and their size #7
Include non-standard amino acids #8
Check units of 4th to 6th dimensions of reference points
Check units and scaling of moments - First three moments of distribution:
- Same units: 2nd moment - standard deviation; 3rd moment: 3rd root of skewness
- Scaling: none (should fingerprint/moments be normalised?)
We started to look into pdbqt files to be added as "physchem" properties to our fingerprint, take a look at this notebook if still of interest
We already started to benchmark the method against similar/dissimilar pocket pairs from FuzCav, ProSPECCTs, and TOUGH-M1 (see README)
Encoding works fine for mol2 files; pdb files may not work and need revision
Since we probably have to move to NGLview anyway, the PyMol functions have not been checked since 2019; they probably do not work anymore.

Testing and CI

Add unit tests for similarity module
CI: Add back Windows + MacOS support, lint package, format+lint+test docs tutorials

Code

Check similarity module - refactoring needed?
Address #FIXMEs and #TODOs in code (left-overs of major refactoring in PR #1); to be done after setting up unit tests
Remove pymol dependency (not on conda-forge; currently installed from tpeulen)
Remove flatten-dict dependency (only pip-installable)
Add from_path class method to all ratar.encoding classes: Write class method from_path, analogous to from_molecule. Current problem: files can contain multiple molecules, thus from_path would return a list of molecule objects instead of a molecule object as in the case of from_molecule.
We set up a logging.conf file to fine-tune our logging. Add it back into the package if of interest.

Packaging

Update ratar environment - enable conda packaging

Installation update for 3.12

The installation instructions in the README only work for Python <3.12.

I recommend the following two steps:

Get everything running on Python 3.8 or 3.9 (at least the CI tests run on those versions, see here). It might work for all versions <3.12. How can you get this running? Pin the Python version here as python=3.8 after downloading the package and follow the instructions.
Then move to Python 3.12, which is now the up-to-date version. This involves updating the installation* and most probably the code itself, as there have been quite a few API updates in our dependencies. *Update the repo as described here: wolberlab/dynophores#66

Update setup.py Python version from >=3.8 to >=3.12
Update versioneer.py so that it is compatible with Python 3.12

@AndreaVolkamer can you please ping your student here?
