fcd's Introduction

Fréchet ChemNet Distance

Code for the paper "Fréchet ChemNet Distance: A Metric for Generative Models for Molecules in Drug Discovery" JCIM / ArXiv

Installation

You can install FCD using

pip install fcd

or run the example notebook on Google Colab.

Requirements

numpy
torch
scipy
rdkit

Updates

Version 1.1 changes

  • Got rid of unneeded imports
  • load_ref_model doesn't need an argument any more to load a model.
  • canonical and canonical_smiles now return None for invalid SMILES.
  • Added get_fcd as a quick way to get the FCD score from two lists of SMILES.

Version 1.2 changes

fcd's People

Contributors

avaucher, gklambauer, kristinapreuer, renzph

fcd's Issues

<Bug> Issue when using the function get_fcd

Hi, I am getting the following error when calling this function with two lists of 10,000 molecules each.

UnknownError: Graph execution error:

2 root error(s) found.
 (0) UNKNOWN:  IndexError: string index out of range
Traceback (most recent call last):

 File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/ops/script_ops.py", line 271, in __call__
   ret = func(*args)

 File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/autograph/impl/api.py", line 642, in wrapper
   return func(*args, **kwargs)

 File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/data/ops/dataset_ops.py", line 1004, in generator_py_func
   values = next(generator_state.get_iterator(iterator_id))

 File "/usr/local/lib/python3.7/dist-packages/keras/engine/data_adapter.py", line 830, in wrapped_generator
   for data in generator_fn():

 File "/usr/local/lib/python3.7/dist-packages/fcd/FCD.py", line 156, in myGenerator_predict
   smiEnc = get_one_hot(currentSmiles, pad_len=nn)

 File "/usr/local/lib/python3.7/dist-packages/fcd/FCD.py", line 127, in get_one_hot
   if smiles[i + 1] in ['r', 'i', 'l']:

IndexError: string index out of range


    [[{{node PyFunc}}]]
    [[IteratorGetNext]]
    [[IteratorGetNext/_2]]
 (1) UNKNOWN:  IndexError: string index out of range
Traceback (most recent call last):

 File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/ops/script_ops.py", line 271, in __call__
   ret = func(*args)

 File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/autograph/impl/api.py", line 642, in wrapper
   return func(*args, **kwargs)

 File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/data/ops/dataset_ops.py", line 1004, in generator_py_func
   values = next(generator_state.get_iterator(iterator_id))

 File "/usr/local/lib/python3.7/dist-packages/keras/engine/data_adapter.py", line 830, in wrapped_generator
   for data in generator_fn():

 File "/usr/local/lib/python3.7/dist-packages/fcd/FCD.py", line 156, in myGenerator_predict
   smiEnc = get_one_hot(currentSmiles, pad_len=nn)

 File "/usr/local/lib/python3.7/dist-packages/fcd/FCD.py", line 127, in get_one_hot
   if smiles[i + 1] in ['r', 'i', 'l']:

IndexError: string index out of range


    [[{{node PyFunc}}]]
    [[IteratorGetNext]]
0 successful operations.
0 derived errors ignored. [Op:__inference_predict_function_2176]

I have a feeling that it might be related to the SMILES I generated, but I am not 100% sure. Could you confirm whether this is the case?
Thanks
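
The traceback above ends at a lookahead, smiles[i + 1], that runs past the end of the string. As a minimal sketch of a guard for that kind of lookahead (an illustration only, not the fcd implementation; the name split_tokens and the merging rule are assumptions), the index can be checked before the next character is read:

```python
from typing import List

# Second letters checked in the traceback above (as in Br, Si, Cl).
TWO_CHAR_SECOND = {"r", "i", "l"}


def split_tokens(smiles: str) -> List[str]:
    """Split a SMILES string into tokens (hypothetical helper, not part of fcd).

    A character followed by 'r', 'i' or 'l' is merged with it into one token.
    """
    tokens = []
    i = 0
    while i < len(smiles):
        # Only look ahead when a next character actually exists.
        if i + 1 < len(smiles) and smiles[i + 1] in TWO_CHAR_SECOND:
            tokens.append(smiles[i:i + 2])
            i += 2
        else:
            tokens.append(smiles[i])
            i += 1
    return tokens
```

With this guard an empty string or a string ending on a single character yields a short token list instead of raising IndexError.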

Dataset used to train Chemnet

Hello!
Can I get the dataset used to train ChemNet? (Your paper says the model was trained to predict bioactivities of about 6,000 assays available in three major drug discovery databases: ChEMBL, ZINC, and PubChem.)
Thank you in advance!

ValueError: Imaginary component

Hello, I have an issue with the get_fcd function.

I reproduced this error in Google Colab:

!pip install fcd
from fcd import get_fcd

smiles_list1 = ['COc1cccc(NC(=O)Cc2coc3ccc(OC)cc23)c1', 'Cc1noc(C)c1CN(C)C(=O)Nc1cc(F)cc(F)c1']
smiles_list2 = ['Oc1ccccc1-c1cccc2cnccc12', 'Cc1noc(C)c1CN(C)C(=O)Nc1cc(F)cc(F)c1']
get_fcd(smiles_list1, smiles_list2)

and I got this result:


ValueError Traceback (most recent call last)
in <cell line: 3>()
1 smiles_list1 = ['COc1cccc(NC(=O)Cc2coc3ccc(OC)cc23)c1', 'Cc1noc(C)c1CN(C)C(=O)Nc1cc(F)cc(F)c1']
2 smiles_list2 = ['Oc1ccccc1-c1cccc2cnccc12', 'Cc1noc(C)c1CN(C)C(=O)Nc1cc(F)cc(F)c1']
----> 3 get_fcd(smiles_list1, smiles_list2)

1 frames
/usr/local/lib/python3.10/dist-packages/fcd/utils.py in calculate_frechet_distance(mu1, sigma1, mu2, sigma2, eps)
171 if not np.allclose(np.diagonal(covmean).imag, 0, atol=1e-3):
172 m = np.max(np.abs(covmean.imag))
--> 173 raise ValueError("Imaginary component {}".format(m))
174 covmean = covmean.real
175

ValueError: Imaginary component 1.90603044681631e+39

Do you have any idea how to solve this?
I need the FCD value for my project.
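
One possible explanation, offered as an assumption rather than a confirmed diagnosis: each list above holds only two molecules, so the sample covariance of the activations has rank at most one, and a matrix square root of the product of two such covariances is numerically ill-conditioned. The sketch below uses NumPy only and evaluates the Fréchet distance between two Gaussians, taking the trace term from the eigenvalues of sigma1 @ sigma2 and clipping small negative or complex residue. The function name frechet_distance is hypothetical, and this is not how fcd's calculate_frechet_distance is implemented.

```python
import numpy as np


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet distance between N(mu1, sigma1) and N(mu2, sigma2).

    d^2 = |mu1 - mu2|^2 + Tr(sigma1) + Tr(sigma2) - 2 Tr((sigma1 sigma2)^(1/2))
    Hypothetical sketch, not the fcd library routine.
    """
    mu1, mu2 = np.asarray(mu1, dtype=float), np.asarray(mu2, dtype=float)
    sigma1, sigma2 = np.asarray(sigma1, dtype=float), np.asarray(sigma2, dtype=float)

    diff = mu1 - mu2
    # For positive semi-definite inputs the eigenvalues of the product are
    # real and non-negative in exact arithmetic; discard numerical residue.
    eigvals = np.linalg.eigvals(sigma1 @ sigma2)
    tr_covmean = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_covmean)
```

Even when the computation goes through, an estimate from two molecules per list says little about either distribution, so larger sample lists are the more meaningful fix.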

An error in the function 'get_one_hot'

When __getitem__ is called, it calls get_one_hot(smiles, 350). Inside get_one_hot, array_length is capped at 350, but the token positions in numeric can go beyond 350, which raises an IndexError on axis 0 of one_hot.

class SmilesDataset(Dataset):
    __PAD_LEN = 350

    def __init__(self, smiles_list):
        super().__init__()
        self.smiles_list = smiles_list

    def __getitem__(self, idx):
        smiles = self.smiles_list[idx]
        features = get_one_hot(smiles, 350)  # set pad_len = 350
        return features / features.shape[1]

    def __len__(self):
        return len(self.smiles_list)


def get_one_hot(smiles: str, pad_len: int = -1) -> np.ndarray:
    """Generate one-hot representation of a Smiles string.

    Args:
        smiles (str): Input molecule as Smiles
        pad_len (int, optional): Whether or not to pad to a given size. Defaults to -1.

    Returns:
        np.ndarray: Array containing the one-hot encoded Smiles
    """
    smiles = smiles + "."

    # initialize array
    array_length = len(smiles) if pad_len < 0 else pad_len
    vocab_size = len(__vocab)
    one_hot = np.zeros((array_length, vocab_size))

    tokens = tokenize(smiles)
    numeric = [__vocab_c2i.get(token, __unk) for token in tokens]

    for pos, num in enumerate(numeric):  # pos can exceed 350
        one_hot[pos, num] = 1  # IndexError

    return one_hot
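
One way to avoid the out-of-range write described above is to stop at the padded length. The following is a minimal, self-contained sketch under stated assumptions: the toy vocabulary, the names VOCAB, C2I and UNK, and the function one_hot_truncated are all hypothetical and stand in for fcd's private tokenizer and vocabulary.

```python
import numpy as np

# Hypothetical toy vocabulary for illustration; the real one is larger.
VOCAB = ["C", "N", "O", "(", ")", "=", ".", "<unk>"]
C2I = {tok: i for i, tok in enumerate(VOCAB)}
UNK = C2I["<unk>"]


def one_hot_truncated(tokens, pad_len=-1):
    """One-hot encode a token list, never writing past the padded length."""
    array_length = len(tokens) if pad_len < 0 else pad_len
    one_hot = np.zeros((array_length, len(VOCAB)))
    # Slicing keeps every position inside the array bounds.
    for pos, tok in enumerate(tokens[:array_length]):
        one_hot[pos, C2I.get(tok, UNK)] = 1
    return one_hot
```

Truncation silently drops the tail of an overlong molecule, so filtering out SMILES longer than the padded length before encoding may be the better choice, depending on the use case.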

Shape mismatch

Following the tutorial, I do the following:

mu_chembl, cov_chembl = pickle.load(open("chembl_50k_stats.p", 'rb')).values()
gen_mol_act = get_predictions(gen_mol)

In this case mu_chembl is an array of shape (512, 512), and gen_mol_act has shape (4640, 512).

Now, calling
calculate_frechet_distance(mu1=np.mean(gen_mol_act, axis=0), mu2=mu_chembl, sigma1=np.cov(gen_mol_act.T), sigma2=cov_chembl) yields an assertion error: Training and test mean vectors have different lengths

Essentially, mu1 is of shape (512, ) while mu2 is of shape (512, 512).

What would be the correct way to solve this?
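
The reported shapes suggest that the two values came out of .values() in the opposite order from what the tuple unpacking expected, although the keys of the pickle are not shown here, so this is an assumption. A minimal sketch that does not depend on dictionary order picks the mean and covariance by their number of dimensions; the helper name split_stats is hypothetical.

```python
import numpy as np


def split_stats(stats: dict):
    """Return (mean, covariance) from a dict holding one 1-D and one 2-D array.

    Hypothetical helper: it selects by dimensionality, not by key or order.
    """
    arrays = [np.asarray(v) for v in stats.values()]
    mu = next(a for a in arrays if a.ndim == 1)
    cov = next(a for a in arrays if a.ndim == 2)
    return mu, cov
```

Applied to the loaded pickle, this would replace the direct unpacking, for example mu_chembl, cov_chembl = split_stats(pickle.load(f)), after which mu_chembl would have shape (512,) if the file holds what the tutorial implies.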

How to install it?

Could you please give us some examples showing how to use it? Many thanks!

Add package to PyPI

fcd is a dependency of our package guacamol, which users can install from PyPI.

Pulling the fcd dependency from the GitHub repo works, but is a bit cumbersome since it often leads to errors and / or requires special flags when installing guacamol with pip.

It would be great if you could upload fcd to PyPI so that it can be installed directly with

pip install fcd

Here are the steps; it shouldn't take you more than ~10 minutes. A more complete tutorial can be found here.

  1. Register on https://pypi.org/
  2. Verify e-mail
  3. python3 -m pip install --upgrade setuptools wheel twine
  4. In the FCD project root: python3 setup.py sdist bdist_wheel
  5. twine upload dist/*

Let me know if I can help you in any way.
