
data_in_life_blog's Introduction

A brief introduction to my background is available on my blog, which showcases several cheminformatics, machine learning and data science projects built with various software toolkits. The main project I've been working on lately is the tree series in machine learning on ChEMBL-derived data (decision tree 1, decision tree 2, decision tree 3, random forest, random forest classifier, boosted trees).

There are also several other side projects that I've worked on over the past year, such as:

Open-source contributions: practical_cheminformatics_tutorials, chembl_downloader

data_in_life_blog's People

Contributors

jhylin


data_in_life_blog's Issues

Suggestions for using ChEMBL-Downloader

Great to see you're using chembl-downloader in https://jhylin.github.io/Data_in_life_blog/posts/17_ML2-2_Random_forest/2_random_forest_classifier.html! I have a few suggestions to take your reproducibility to the next level.

Generating SQL

# Query chembl_downloader to show SQL required 
# to extract ChEMBL data for a specific protein target
# e.g. target_chembl_id for acetylcholinesterase (AChE): CHEMBL220
queries.markdown(queries.get_target_sql(target_id="CHEMBL220", target_type="SINGLE PROTEIN"))

queries.get_target_sql actually returns a string containing the SQL. I saw that in the next cell you redefine this string explicitly. I'd suggest changing this cell to:

# Generate SQL for a query on the given target_id
# for acetylcholinesterase (AChE): CHEMBL220
sql = queries.get_target_sql(target_id="CHEMBL220", target_type="SINGLE PROTEIN")

# Pretty-print the SQL in Jupyter
queries.markdown(sql)

Then, you don't have to redefine it in the next cell by hand!

Querying

The next three cells (paraphrased below) run the query, save the result, and load it back. There's a more reproducible way that doesn't require commenting out code, which means anyone can re-run the notebook. Here are the current cells:

df = chembl_downloader.query(sql)

...

# Save df as .csv file (uncomment last line to run)
#df.to_csv("chembl_d_ache", sep=",", index=False)

...

# Load dataset from saved .csv file
df_ache = pd.read_csv("chembl_d_ache")
print(df_ache.shape)
df_ache.head()

Try it like this:

from pathlib import Path

import chembl_downloader
import pandas as pd

# `sql` and `latest_version` are assumed to be defined in earlier cells

# Pick any directory, but make sure it's relative to your home directory
directory = Path.home().joinpath(".data", "blog")
# Create the directory if it doesn't exist
directory.mkdir(exist_ok=True, parents=True)

# Create a file path that corresponds to the version, since this could change
path = directory.joinpath(f"chembl_d_ache_{latest_version}.csv")

if path.is_file():
    # If the file already exists, load it
    df_ache = pd.read_csv(path, sep=",")
else:
    # If the file doesn't already exist, make the query then cache it
    df_ache = chembl_downloader.query(sql)
    df_ache.to_csv(path, sep=",", index=False)
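If you end up repeating this load-or-query pattern in several notebooks, it could be wrapped in a small helper. This is only a sketch: `load_or_compute` and its parameters are hypothetical names made up for illustration, not part of chembl-downloader or pandas. It uses only the standard library and takes the loading, computing and saving steps as functions.

```python
from pathlib import Path
from typing import Callable, TypeVar, Union

T = TypeVar("T")


def load_or_compute(
    path: Union[str, Path],
    compute: Callable[[], T],
    load: Callable[[Path], T],
    save: Callable[[T, Path], object],
) -> T:
    """Return the cached result at ``path``, computing and caching it if missing.

    Hypothetical helper for illustration only.
    """
    path = Path(path)
    if path.is_file():
        # Cache hit: read the file instead of re-running the expensive step
        return load(path)
    # Cache miss: run the expensive step once
    result = compute()
    # Make sure the target directory exists before writing
    path.parent.mkdir(parents=True, exist_ok=True)
    save(result, path)
    return result


# Possible use with the names from the cell above (assumes pandas and
# chembl_downloader are imported, and `sql` and `path` are defined):
#
# df_ache = load_or_compute(
#     path,
#     compute=lambda: chembl_downloader.query(sql),
#     load=pd.read_csv,
#     save=lambda df, p: df.to_csv(p, index=False),
# )
```

Passing the steps in as functions keeps the helper independent of any one file format, so the same code would work if you switched from CSV to another format later.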
