
img2mol's Introduction

Img2Mol: inferring molecules from pictures

Welcome to Img2Mol! 👋

👉 For the Img2Mol web app switch to the "deployment-example" branch.

Overview

Here we provide the implementation of the Img2Mol model using PyTorch and PyTorch Lightning for training and inference, along with an example Jupyter notebook.

This repository is organized as follows:

  • examples/: contains example images to apply our proposed model on
  • img2mol/: contains necessary python modules for our proposed model
  • model/: stores the trained model weights as pickled files. See "Download Model Weights" below for the download link.

Installation

Requirements

python=3.8.5
pip=20.2.4
notebook=6.4.2
pillow=8.0.1
numpy=1.19.2
rdkit=2020.03.1
cudatoolkit=11.0
torchvision=0.8.0
torchaudio=0.7.0
pytorch=1.7.0
pytorch-lightning=1.0.8
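
Errors at import or inference time can come from an installed package whose version differs from the pins above. The following is a minimal sketch, not part of Img2Mol, that compares pins written as name=version with what is installed in the current environment. It relies on importlib.metadata from the standard library; the helper names installed_version and check_pins and the lookup parameter are illustrative assumptions.

```python
from importlib import metadata


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins, lookup=installed_version):
    """Compare 'name=version' pins against installed versions.

    Returns {name: (wanted, found)} for every pin that does not match;
    found is None when the package could not be located.
    """
    mismatches = {}
    for line in pins:
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank lines and unpinned entries
        name, wanted = line.split("=", 1)
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches
```

Calling check_pins(["pillow=8.0.1", "numpy=1.19.2", "torchvision=0.8.0"]) inside the activated environment returns an empty dict when everything matches. conda package names do not always match the names pip reports (pytorch is distributed on pip as torch, for example), and a version with a build suffix is reported as a mismatch, so treat any result as a prompt to check by hand.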

Environment

Create a new environment:

git clone git@github.com:bayer-science-for-a-better-life/Img2Mol.git
cd Img2Mol
conda env create -f environment.yml
conda activate img2mol
pip install .

If you want to run Img2Mol as a standalone version with a locally loaded CDDD model instead of sending requests to our CDDD server, install the environment from environment.local-cddd.yml instead of environment.yml.
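
Because the default setup sends requests to a remote CDDD server, inference fails with a connection error when that server cannot be reached. The sketch below is a generic TCP reachability check, not part of the Img2Mol code; host and port are placeholders for whatever address your CDDD request object is configured with, and the function name server_reachable is an assumption for illustration.

```python
import socket


def server_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS lookup failures.
        return False
```

If the check returns False, switching to the local CDDD environment avoids the network dependency altogether.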

Download Model Weights

You can download the trained parameters for the default model described in our paper (~2.4GB) from the following link: https://drive.google.com/file/d/1pk21r4Zzb9ZJkszJwP9SObTlfTaRMMtF/view
Please move the downloaded file model.ckpt into the model/ directory.

If you are working with the local CDDD installation, please download and unzip the CDDD model and move the directory default_model to path/to/anaconda3/envs/img2mol/lib/python3.6/site-packages/cddd/data/
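
The target path depends on the environment name and Python version, so it is easy to place default_model in the wrong site-packages directory. As a hedged sketch, the helper below asks the current interpreter where a package is installed and looks for data/default_model inside it. It is not how Img2Mol itself detects the local CDDD installation; the function name find_default_model and its parameters are assumptions for illustration.

```python
import importlib.util
from pathlib import Path


def find_default_model(package="cddd", subdir=("data", "default_model")):
    """Return the default_model directory inside an installed package, or None.

    The lookup uses the *current* interpreter, so run it from the activated
    environment that will execute the notebook.
    """
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return None  # package is not importable from this environment
    for root in spec.submodule_search_locations:
        candidate = Path(root).joinpath(*subdir)
        if candidate.is_dir():
            return candidate
    return None
```

A return value of None means either that cddd is not installed in the active environment or that default_model has not been moved into its data directory.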

Alternatively, we provide a bash script that will download and move the file automatically.

bash download_model.sh

If you have problems downloading the file using the bash script, please manually download the file using the browser.
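
Whichever download route is used, it can help to confirm that the checkpoint landed in the expected place before starting the notebook. This is a minimal sketch using only the standard library; the default path follows the instructions above, while the function name checkpoint_problem and the size threshold are assumptions (the README only states that the file is about 2.4GB).

```python
from pathlib import Path


def checkpoint_problem(path="model/model.ckpt", min_bytes=1_000_000_000):
    """Return a short description of what looks wrong, or None if the file looks plausible."""
    ckpt = Path(path)
    if not ckpt.is_file():
        return f"{ckpt} not found"
    size = ckpt.stat().st_size
    if size < min_bytes:
        # A file far below the advertised size suggests an interrupted download.
        return f"{ckpt} is only {size} bytes; the download may be incomplete"
    return None
```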

Examples

Check the example notebook example_inference.ipynb to see how the inference class can be used. Usage with the local CDDD model is demonstrated in example_inference_local_cddd.ipynb.
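
The notebooks process one image per call. For a folder of images, a loop along the following lines could collect the results and keep going when a single image fails. This is a sketch under assumptions: predict stands for any callable that takes a file path and returns a dict with a 'smiles' key, and the helper name batch_convert is hypothetical.

```python
import os


def batch_convert(image_dir, predict, extensions=(".png", ".jpg", ".jpeg")):
    """Run predict(filepath) on every image in image_dir.

    Returns (results, failures): results maps each file stem to its SMILES
    string, failures maps each file stem to the error message.
    """
    results, failures = {}, {}
    for filename in sorted(os.listdir(image_dir)):
        stem, ext = os.path.splitext(filename)
        if ext.lower() not in extensions:
            continue  # ignore files that are not images
        try:
            res = predict(os.path.join(image_dir, filename))
            results[stem] = res["smiles"]
        except Exception as exc:  # one bad image should not stop the whole run
            failures[stem] = str(exc)
    return results, failures
```

With the objects created in example_inference.ipynb, predict could be lambda path: img2mol(filepath=path, cddd_server=cddd_server). Collecting failures separately makes it possible to retry only the images that hit a connection error.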

Reference

Please cite our manuscript if you use our model in your work.

D.-A. Clevert, T. Le, R. Winter, F. Montanari, Chem. Sci., 2021, DOI: 10.1039/D1SC01839F

Img2Mol Code License

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at https://www.apache.org/licenses/LICENSE-2.0.

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Model Parameters License

The Img2Mol parameters are made available for non-commercial use only, under the terms of the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license. You can find details at: https://creativecommons.org/licenses/by-nc/4.0/legalcode

img2mol's People

Contributors

djork, obrink, sdvillal, tuanle618


img2mol's Issues

Reference files for benchmark datasets

Thank you for uploading the benchmark images. Could you also provide the SD files or a list of SMILES or some sort of reference file that contains the underlying chemical structure? The reference files for the USPTO, UoB, CLEF and JPO sets are available elsewhere but I cannot find references for the STAKER and Img2Mol sets.

Additionally, it is stated that only half of the images from the two bigger sets were used. Could you specify which images were used?

Thanks in advance!

Local CDDD installation has not been found.

I want to use the local CDDD model, so I put the directory default_model into "\wsl.localhost\Ubuntu-18.04\home\XXX\anaconda3\envs\ldm\lib\python3.8\site-packages\cddd\data", but it does not run and reports: Local CDDD installation has not been found.

image

Training Img2Mol on a new dataset

Hello,
Is it possible to train Img2Mol on a new dataset containing pairs of images and SMILES? Would it be possible to release the code used for training?
Thanks for your help,
Lucas

Try to batch convert the image to smiles

Hi:
I have 4k pictures of molecular depictions that I need to translate into their SMILES representation.
I used the following code in example_inference.ipynb:

smiles_list = {}
for i in picture_list:
    res = img2mol(filepath=picture_path + "/" + i, cddd_server=cddd_server)
    smile = res['smiles']
    name = os.path.splitext(i)[0]
    smiles_list[name] = smile
    time.sleep(20)

I got the following error:

TimeoutError Traceback (most recent call last)
~/anaconda3/envs/img2mol/lib/python3.8/site-packages/urllib3/connection.py in _new_conn(self)
158 try:
--> 159 conn = connection.create_connection(
160 (self._dns_host, self.port), self.timeout, **extra_kw

~/anaconda3/envs/img2mol/lib/python3.8/site-packages/urllib3/util/connection.py in create_connection(address, timeout, source_address, socket_options)
83 if err is not None:
---> 84 raise err
85

~/anaconda3/envs/img2mol/lib/python3.8/site-packages/urllib3/util/connection.py in create_connection(address, timeout, source_address, socket_options)
73 sock.bind(source_address)
---> 74 sock.connect(sa)
75 return sock

TimeoutError: [Errno 110] Connection timed out

During handling of the above exception, another exception occurred:

NewConnectionError Traceback (most recent call last)

Benchmark dataset filtering rules

Thank you for uploading the exact benchmark datasets that you used to validate Img2Mol!
I have a question regarding the size of the different sets. It appears that you have removed some images from the original benchmark sets. For example, the JPO dataset normally consists of 450 images but here it is 365. Could you share the criteria for the removal of images from the sets?
Thanks in advance! :)

Kernel restart when running first cell of `example_inference_local_cddd.ipynb`

Hi, I have created a local CDDD conda environment and am trying to run example_inference_local_cddd.ipynb, however my Jupyter Notebook kernel keeps on restarting when I try to run the first cell within the notebook. In particular, this line of code

from img2mol.inference import *

Unfortunately there's no error code, just this screenshot of the error I get from Jupyter Notebook itself. The same issue repeats after pressing the OK button and running the first cell again.

I believe the CDDD server is also down, as I am getting a connection error that has been reported in Issues before.

kernal_restart

Issue--How to train this Img2mol model

Hi, OBrink:
When I was testing this model, I met one problem:
test5
test6
I used the trained model but got wrong predictions.

Do you have any advice on how to get high accuracy on these pictures? We want to input pictures like these from patents and get the correct SMILES.
In my opinion, the predictions are wrong because the training data contained no samples of this kind, so can I still use the Img2Mol model that has already been trained?

example_inference.ipynb not working

Hi
I have a problem with your code, specifically cddd_server.py.

When I run example_inference.ipynb, it raises a ConnectionError like this:
ConnectionError: HTTPConnectionPool(host='ec2-18-157-240-87.eu-central-1.compute.amazonaws.com', port=8892): Max retries exceeded with url: /cddd_to_smiles/ (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ff999ca6668>: Failed to establish a new connection: [Errno 110] Connection timed out',))

I'm waiting for your help.

Thanks :)

Issue with installing dependencies from yml file on M1 mac

After following the install directions I could not create the environment from the environment.yml file. I got an error about cudatoolkit=11.0.
I tried removing that line from the yml file, proceeding with the install instructions, activating the env and then installing cudatoolkit with the following command: conda install pytorch torchvision torchaudio cudatoolkit=11.0 -c pytorch -c conda-forge. It still did not work. Just a note: I am on an M1 Mac and I am not sure if that's part of the problem. Has anyone else had this problem or have any advice?

Inference error

Hi,

I was trying out the inference notebook (example_inference.ipynb) you provided but encountered an error at this call:

res = img2mol(filepath="examples/digital_example1.png", cddd_server=cddd_server)

I am copying the error message here. I just followed the installation through the readme and just tried this script with the images you provided.

I think there are some changes in the arguments of the transforms.RandomRotation and transforms.RandomAffine.

----> [1]res = img2mol(filepath="examples/digital_example1.png", cddd_server=cddd_server)

/Img2Mol/img2mol/inference.py:136, in Img2MolInference.__call__(self, filepath, cddd_server, return_cddd)
    131 def __call__(self,
    132              filepath: str,
    133              cddd_server: CDDDRequest = None,
    134              return_cddd: bool = False,
    135              ) -> dict:
--> 136     images = self.read_image_to_tensor(filepath, repeats=50)
    137     with torch.no_grad():
    138         cddd = self.model(images).detach().cpu().numpy()

/Img2Mol/img2mol/inference.py:126, in Img2MolInference.read_image_to_tensor(self, filepath, repeats)
    124     return "Image must be jpg or png format!"
    125 image = self.read_imagefile(filepath)
--> 126 images = torch.cat([torch.unsqueeze(self.transform_image(image), 0)
    127                     for _ in range(repeats)], dim=0)
    128 images = images.to(self.device)
    129 return images

/Img2Mol/img2mol/inference.py:126, in <listcomp>(.0)
    124     return "Image must be jpg or png format!"
...
--> 107     img_PIL = transforms.RandomRotation((-15, 15), resample=3, expand=True, center=None, fill=255)(image)
    108     img_PIL = transforms.ColorJitter(brightness=[0.75, 2.0], contrast=0, saturation=0, hue=0)(img_PIL)
    109     shear_value = np.random.uniform(0.1, 7.0)

TypeError: RandomRotation.__init__() got an unexpected keyword argument 'resample'
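
The TypeError indicates that the installed torchvision no longer accepts resample, which means it is newer than the torchvision=0.8.0 pinned in the requirements. In later torchvision releases this argument of RandomRotation and RandomAffine is named interpolation. Recreating the environment from environment.yml is the most direct fix. If the code has to run against a newer torchvision, one option is to look up which keyword the installed class accepts. The helper below is a sketch using only the standard library, and its name interpolation_keyword is hypothetical.

```python
import inspect


def interpolation_keyword(transform_cls):
    """Return the name of the interpolation argument accepted by transform_cls.

    Prefers 'interpolation' and falls back to 'resample'; returns None if the
    constructor accepts neither.
    """
    params = inspect.signature(transform_cls.__init__).parameters
    for name in ("interpolation", "resample"):
        if name in params:
            return name
    return None
```

The result can be used to build the keyword arguments for transforms.RandomRotation before calling it. The value 3 in the traceback is the PIL constant for bicubic resampling; with the interpolation keyword, the matching value is assumed to be transforms.InterpolationMode.BICUBIC.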

When I run the code with `res = img2mol(filepath="examples/digital_example1.png", cddd_server=cddd_server)` in example_inference.ipynb, I get the following error:

TimeoutError Traceback (most recent call last)
~/anaconda3/envs/img2mol/lib/python3.8/site-packages/urllib3/connection.py in _new_conn(self)
158 try:
--> 159 conn = connection.create_connection(
160 (self._dns_host, self.port), self.timeout, **extra_kw

~/anaconda3/envs/img2mol/lib/python3.8/site-packages/urllib3/util/connection.py in create_connection(address, timeout, source_address, socket_options)
83 if err is not None:
---> 84 raise err
85

~/anaconda3/envs/img2mol/lib/python3.8/site-packages/urllib3/util/connection.py in create_connection(address, timeout, source_address, socket_options)
73 sock.bind(source_address)
---> 74 sock.connect(sa)
75 return sock

TimeoutError: [Errno 110] Connection timed out

During handling of the above exception, another exception occurred:

NewConnectionError Traceback (most recent call last)
~/anaconda3/envs/img2mol/lib/python3.8/site-packages/urllib3/connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
669 # Make the request on the httplib connection object.
--> 670 httplib_response = self._make_request(
671 conn,

~/anaconda3/envs/img2mol/lib/python3.8/site-packages/urllib3/connectionpool.py in _make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
391 else:
--> 392 conn.request(method, url, **httplib_request_kw)
393

~/anaconda3/envs/img2mol/lib/python3.8/http/client.py in request(self, method, url, body, headers, encode_chunked)
1254 """Send a complete request to the server."""
-> 1255 self._send_request(method, url, body, headers, encode_chunked)
1256

~/anaconda3/envs/img2mol/lib/python3.8/http/client.py in _send_request(self, method, url, body, headers, encode_chunked)
1300 body = _encode(body, 'body')
-> 1301 self.endheaders(body, encode_chunked=encode_chunked)
1302

~/anaconda3/envs/img2mol/lib/python3.8/http/client.py in endheaders(self, message_body, encode_chunked)
1249 raise CannotSendHeader()
-> 1250 self._send_output(message_body, encode_chunked=encode_chunked)
1251

~/anaconda3/envs/img2mol/lib/python3.8/http/client.py in _send_output(self, message_body, encode_chunked)
1009 del self._buffer[:]
-> 1010 self.send(msg)
1011

~/anaconda3/envs/img2mol/lib/python3.8/http/client.py in send(self, data)
949 if self.auto_open:
--> 950 self.connect()
951 else:

~/anaconda3/envs/img2mol/lib/python3.8/site-packages/urllib3/connection.py in connect(self)
186 def connect(self):
--> 187 conn = self._new_conn()
188 self._prepare_conn(conn)

~/anaconda3/envs/img2mol/lib/python3.8/site-packages/urllib3/connection.py in _new_conn(self)
170 except SocketError as e:
--> 171 raise NewConnectionError(
172 self, "Failed to establish a new connection: %s" % e

NewConnectionError: <urllib3.connection.HTTPConnection object at 0x2b1c5e957700>: Failed to establish a new connection: [Errno 110] Connection timed out

During handling of the above exception, another exception occurred:

MaxRetryError Traceback (most recent call last)
~/anaconda3/envs/img2mol/lib/python3.8/site-packages/requests/adapters.py in send(self, request, stream, timeout, verify, cert, proxies)
438 if not chunked:
--> 439 resp = conn.urlopen(
440 method=request.method,

~/anaconda3/envs/img2mol/lib/python3.8/site-packages/urllib3/connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
725
--> 726 retries = retries.increment(
727 method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]

~/anaconda3/envs/img2mol/lib/python3.8/site-packages/urllib3/util/retry.py in increment(self, method, url, response, error, _pool, _stacktrace)
445 if new_retry.is_exhausted():
--> 446 raise MaxRetryError(_pool, url, error or ResponseError(cause))
447

MaxRetryError: HTTPConnectionPool(host='ec2-18-157-240-87.eu-central-1.compute.amazonaws.com', port=8892): Max retries exceeded with url: /cddd_to_smiles/ (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x2b1c5e957700>: Failed to establish a new connection: [Errno 110] Connection timed out'))

During handling of the above exception, another exception occurred:

ConnectionError Traceback (most recent call last)
in
----> 1 res = img2mol(filepath="examples/digital_example1.png", cddd_server=cddd_server)

~/project/lig_amine/Img2Mol-main/img2mol/inference.py in __call__(self, filepath, cddd_server, return_cddd)
122 cddd = np.median(cddd, axis=0)
123
--> 124 smiles = cddd_server.cddd_to_smiles(cddd.tolist())
125 mol = Chem.MolFromSmiles(smiles, sanitize=True)
126 # if the molecule is valid, i.e. can be parsed with the rdkit

~/project/lig_amine/Img2Mol-main/img2mol/cddd_server.py in cddd_to_smiles(self, embedding)
48 url = "{}:{}/cddd_to_smiles/".format(self.host, self.port)
49 req = json.dumps({"cddd": embedding})
---> 50 response = requests.post(url, data=req, headers=self.headers, verify=False)
51 return json.loads(response.content.decode("utf-8"))
52

~/anaconda3/envs/img2mol/lib/python3.8/site-packages/requests/api.py in post(url, data, json, **kwargs)
117 """
118
--> 119 return request('post', url, data=data, json=json, **kwargs)
120
121

~/anaconda3/envs/img2mol/lib/python3.8/site-packages/requests/api.py in request(method, url, **kwargs)
59 # cases, and look like a memory leak in others.
60 with sessions.Session() as session:
---> 61 return session.request(method=method, url=url, **kwargs)
62
63

~/anaconda3/envs/img2mol/lib/python3.8/site-packages/requests/sessions.py in request(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert, json)
528 }
529 send_kwargs.update(settings)
--> 530 resp = self.send(prep, **send_kwargs)
531
532 return resp

~/anaconda3/envs/img2mol/lib/python3.8/site-packages/requests/sessions.py in send(self, request, **kwargs)
641
642 # Send the request
--> 643 r = adapter.send(request, **kwargs)
644
645 # Total elapsed time of the request (approximately)

~/anaconda3/envs/img2mol/lib/python3.8/site-packages/requests/adapters.py in send(self, request, stream, timeout, verify, cert, proxies)
514 raise SSLError(e, request=request)
515
--> 516 raise ConnectionError(e, request=request)
517
518 except ClosedPoolError as e:

ConnectionError: HTTPConnectionPool(host='ec2-18-157-240-87.eu-central-1.compute.amazonaws.com', port=8892): Max retries exceeded with url: /cddd_to_smiles/ (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x2b1c5e957700>: Failed to establish a new connection: [Errno 110] Connection timed out'))
