
find_sds's Introduction


FIND MISSING SAFETY DATA SHEET (SDS)

This program is designed to find and download the safety data sheets (SDS) of chemicals using their CAS numbers.


DETAILS

  • Given a list of CAS numbers, this program searches for and downloads the safety data sheets (SDS) into a designated folder. If no download folder is provided, the SDS are downloaded into the folder 'SDS' inside the folder find_sds.
  • This program uses multithreading to speed up the download process. By default, ten threads are used, but this can be changed depending on the computer running it.
  • Downloaded SDS are saved as '<CAS_Number>-SDS.pdf'.
  • Lookup databases include:
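
The multithreaded download and the file naming described in the list above can be sketched as follows. This is a minimal illustration, not the program's actual code: it assumes a thread pool from the standard library, and `sds_filename`, `fetch_sds`, and `download_all` are hypothetical names. `fetch_sds` is only a placeholder that checks whether the file already exists, standing in for the real database lookup.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def sds_filename(cas_nr: str) -> str:
    """Build the '<CAS_Number>-SDS.pdf' file name for one CAS number."""
    return f"{cas_nr}-SDS.pdf"


def fetch_sds(cas_nr: str, folder: Path) -> bool:
    """Hypothetical placeholder for the real lookup and download of one SDS.

    It only reports whether the file is already present in the folder.
    """
    return (folder / sds_filename(cas_nr)).exists()


def download_all(cas_list, download_path="SDS", pool_size=10):
    """Process every CAS number with a pool of worker threads.

    Returns the set of CAS numbers for which no SDS was obtained.
    """
    cas_list = list(cas_list)
    folder = Path(download_path)
    folder.mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        # pool.map keeps the results in the same order as cas_list.
        results = pool.map(lambda cas: fetch_sds(cas, folder), cas_list)
        return {cas for cas, ok in zip(cas_list, results) if not ok}
```

Downloads are mostly network-bound, which is why threads can help here; a suitable pool size depends on the machine and the connection.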

REQUIREMENTS


USAGE

  1. Clone this repository:

    $ git clone https://github.com/khoivan88/find_sds.git    #if you have git
    # if you don't have git, you can download the zip file then unzip
  2. Change into the directory of the program:

    $ cd find_sds
  3. (Optional): Create a virtual environment for Python to install the dependencies. Note: you can change find_sds_venv to another name if desired.

    $ python -m venv find_sds_venv   # Create virtual environment
    $ source find_sds_venv/bin/activate    # Activate the virtual environment on Linux
    # find_sds_venv\Scripts\activate    # Activate the virtual environment on Windows
  4. Install python dependencies:

    $ pip install -r requirements.txt
  5. Example usage:

    $ python
    >>> from find_sds.find_sds import find_sds
    >>> cas_list = ['141-78-6', '110-82-7', '67-63-0', '75-09-2', '109-89-7',
    ...     '872-50-4', '68-12-2', '96-47-9', '111-66-0', '110-54-3',
    ...     '00000-00-0',    # invalid CAS number, or unknown CAS
    ... ]
    >>> download_path = 'SDS'
    >>> find_sds(cas_list=cas_list, download_path=download_path, pool_size=10)
    Downloading missing SDS files. Please wait!
    
    Searching for 96-47-9-SDS.pdf ...
    
    Searching for 110-82-7-SDS.pdf ...
    
    Searching for 141-78-6-SDS.pdf ...
    
    Searching for 872-50-4-SDS.pdf ...
    
    Searching for 00000-00-0-SDS.pdf ...
    
    Searching for 111-66-0-SDS.pdf ...
    
    Searching for 110-54-3-SDS.pdf ...
    
    Searching for 75-09-2-SDS.pdf ...
    
    Searching for 68-12-2-SDS.pdf ...
    
    Searching for 67-63-0-SDS.pdf ...
    
    Searching for 109-89-7-SDS.pdf ...
    
    Still missing SDS:
    {'00000-00-0'}
    
    Summary:
            1 SDS files are missing.
            10 SDS files downloaded.
    
    
    (Optional): you can turn on debug mode (more error printing during search) by running the script from the shell with the following command:

    $ python find_sds/find_sds.py --debug
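
The example list above includes '00000-00-0' as an invalid or unknown CAS number. Such entries could be filtered out before searching with a format and check-digit test, sketched below. Whether find_sds itself performs this check is not stated here; `is_valid_cas` is a hypothetical helper, and rejecting a leading zero in the first group is an assumption of this sketch (without it, '00000-00-0' would pass the check-digit test, since all its digits are zero).

```python
import re

# Assumed format: 2 to 7 digits (no leading zero), 2 digits, 1 check digit.
CAS_PATTERN = re.compile(r"([1-9]\d{1,6})-(\d{2})-(\d)")


def is_valid_cas(cas_nr: str) -> bool:
    """Check the format and the check digit of a CAS number."""
    match = CAS_PATTERN.fullmatch(cas_nr.strip())
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    # Weight the digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(int(d) * w for w, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(match.group(3))


cas_list = ['141-78-6', '110-82-7', '00000-00-0']
print([cas for cas in cas_list if is_valid_cas(cas)])
```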

VERSIONS

See here for the most up-to-date

find_sds's People

Contributors

khoivan88, pyup-bot

find_sds's Issues

Missing SDS Sheets

Hello,

When I run the program on a list of 378 CAS numbers, it reports 354 downloads and 15 that it cannot find. However, when I compare the downloads to my original CAS list, I identify 24 missing.

Here is the list, excluding 3 non-typical numbers that the program skipped.

10257-55-3
106-93-4
110489-05-9
111-87-5
124-73-2
1323-83-7
139-02-6
15022-08-9
18586-22-6
1859-08-1
2156-97-0
3687-18-1
39389-20-3
558-20-3
63316-43-8
68441-33-8
70900-21-9
7440-06-4
75-47-8
75-69-4
853-68-9
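
A comparison like the one described in this issue can be scripted. The sketch below assumes only the '<CAS_Number>-SDS.pdf' naming from the README; `missing_sds` is a hypothetical helper, not part of find_sds.

```python
from pathlib import Path


def missing_sds(cas_list, download_path="SDS"):
    """Return, sorted, the CAS numbers with no '<CAS_Number>-SDS.pdf' in the folder."""
    folder = Path(download_path)
    downloaded = {p.name for p in folder.glob("*-SDS.pdf")}
    # set() drops duplicate CAS numbers so each one is reported at most once.
    return sorted(cas for cas in set(cas_list) if f"{cas}-SDS.pdf" not in downloaded)
```

Calling `missing_sds(cas_list, 'SDS')` after a run gives a list to compare against the program's own "Still missing SDS" report.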

use with argparse

Hello! This issue is concerning your comments on using argparse. I haven't run your code, but here are a few comments.

I see you using global debug a lot. This should not be necessary. The variable debug as declared should already be "global" (i.e., readable inside of the functions). Since you are only reading it, I believe you do not need the global debug statement. This should be true when the file is run as a script, whether or not the definition is in __main__. However, if the definition is in __main__ and you run with pytest (or run functions from an import), debug will not be defined, so tests will fail.

You might try putting everything with argparse in __main__ and adding a default argument (debug=False) to all of your function definitions which can be overridden with the value from argparse when the script is run. This way, debug is defined automatically to be False. When you use it, you will call it with the appropriate values (in __main__, this would be debug=args.debug).
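
A minimal sketch of that suggestion follows. `search_sds` and `parse_args` are hypothetical names used only for illustration, not functions from the repository.

```python
import argparse


def search_sds(cas_list, debug=False):
    """Hypothetical stand-in for a search function taking an explicit debug flag."""
    if debug:
        print(f"[debug] searching {len(cas_list)} CAS numbers")
    return list(cas_list)


def parse_args(argv=None):
    """Parse the command line; argv=None means argparse reads sys.argv."""
    parser = argparse.ArgumentParser(description="Find missing SDS files.")
    parser.add_argument("--debug", action="store_true",
                        help="print more error details during the search")
    return parser.parse_args(argv)


if __name__ == "__main__":
    # argparse is only touched when the file is run as a script.
    args = parse_args()
    search_sds(["141-78-6"], debug=args.debug)
```

Because debug defaults to False, the functions can be imported and called from tests without any command-line parsing.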

find_sds, installation issue

Using find_sds within a virtual environment with Python 3.9.7 (the default provided by Linux Debian 12/bookworm, branch testing), I notice that the installation halts while resolving the dependencies, at the explicit version requirement idna==3.1.

The installation, however, proceeds well (and provides a functional application) if the third line of the requirements file reads only idna. In the venv created, help(idna) indicates that version 2.10 is loaded (based on idna-2.10-py2.py3-none-any.whl).

The observation is based on fetching a pristine clone of the repository (6cf1565 by May 9, 2021).
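
As an alternative to help(idna), the installed version can be read with the standard library (Python 3.8 or later). This is a small sketch; `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print("idna:", installed_version("idna"))
```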

Very useful repository

Hi @khoivan88, first of all, thanks for your valuable work. It really helps me a lot. I hope in the near future you can add more sources 🙏.

Regards,
Sitanshu.
