TOX
Final project for CS272: Introduction to Biomedical Informatics Research Methodology at Stanford University.
Team
Ayush, Kevin, Shreyas, Tom
Setup
Configuring your system for the project.
Conda Environment
Preparing the local environment.
- conda create -n tox python=3.9
- conda activate tox
- conda install -c conda-forge biopython
- conda install -c pytorch pytorch
- conda install scikit-learn
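As an optional sanity check, which is not part of the original setup steps, a short Python sketch can confirm that the packages installed above are importable from the active tox environment. It assumes the standard import names Bio (Biopython), torch (PyTorch) and sklearn (scikit-learn); the helper name missing_modules is hypothetical.

```python
import importlib.util

# Import names for the packages installed above (conda package -> Python module):
# biopython -> Bio, pytorch -> torch, scikit-learn -> sklearn
REQUIRED_MODULES = ["Bio", "torch", "sklearn"]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the subset of `modules` that cannot be found by the current interpreter."""
    # find_spec returns None for a top-level module that is not installed.
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

Running it with `python` inside the activated environment should report no missing packages; any name it prints points to an install step that did not complete.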
Jupyter Notebook
Making Conda available in Jupyter.
- conda install -c anaconda ipykernel
- python -m ipykernel install --user --name=tox
Files
Contains data from previous papers,
including ToxIBTL, ToxDL & ToxinPred.
Contains Python files for exploratory data analysis.
This includes reading in the data and wrangling it into
a standard format (sequences labeled toxic or non-toxic),
identifying duplicate sequences, dividing the data into
training and test sets, and analyzing sequence similarity.
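The data-wrangling steps described above can be sketched with the standard library alone. This is a minimal illustration, not the project's actual code: it assumes the source datasets are available as FASTA text, with toxic and non-toxic sequences in separate inputs, and all function names are hypothetical. The policy of dropping sequences that appear under both labels is also an assumption made for the sketch.

```python
import random


def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())  # normalize residues to upper case
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def to_standard_format(toxic_fasta, nontoxic_fasta):
    """Combine toxic (label 1) and non-toxic (label 0) FASTA text into (sequence, label) rows."""
    rows = [(seq, 1) for _, seq in parse_fasta(toxic_fasta)]
    rows += [(seq, 0) for _, seq in parse_fasta(nontoxic_fasta)]
    return rows


def deduplicate(rows):
    """Keep the first copy of each sequence; drop sequences seen with both labels (assumed policy)."""
    labels = {}
    for seq, label in rows:
        labels.setdefault(seq, set()).add(label)
    seen, unique = set(), []
    for seq, label in rows:
        if seq in seen or len(labels[seq]) > 1:
            continue
        seen.add(seq)
        unique.append((seq, label))
    return unique


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle rows with a fixed seed and hold out `test_fraction` of them as the test set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]
```

For example, `deduplicate(to_standard_format(toxic_text, nontoxic_text))` yields one row per unique sequence, which `train_test_split` then divides reproducibly. An exact-match check like this catches only identical duplicates; near-duplicates are what the CD-HIT similarity analysis is for.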
Contains data related to CD-HIT, which we use to
identify sequences that share at least 40% sequence
identity.
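CD-HIT records its clustering in a `.clstr` file next to its output FASTA. A typical protein invocation at a 40% threshold is `cd-hit -i input.fasta -o output -c 0.4 -n 2` (the CD-HIT user guide recommends word size `-n 2` for thresholds between 0.4 and 0.5); this is an illustration, not necessarily the command used in this project, and the file names are placeholders. The sketch below parses the usual `.clstr` layout, where `*` marks each cluster's representative and other members carry an "at NN%" identity, and lists the clusters whose members are similar to at least one other sequence. The function names are hypothetical, and note that CD-HIT truncates long IDs in this file unless `-d 0` is given.

```python
import re

# Matches member lines such as "0\t30aa, >seq_id... *" or "1\t28aa, >seq_id... at 45.00%".
MEMBER_RE = re.compile(r">(?P<id>.+?)\.\.\.\s*(?P<rep>\*|at\s+.*)$")


def parse_clstr(text):
    """Parse CD-HIT .clstr text into {cluster_number: [(sequence_id, is_representative), ...]}."""
    clusters, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            current = int(line.split()[1])
            clusters[current] = []
            continue
        match = MEMBER_RE.search(line)
        if match and current is not None:
            clusters[current].append((match.group("id"), match.group("rep") == "*"))
    return clusters


def multi_member_clusters(clusters):
    """Return the member IDs of every cluster that contains two or more sequences."""
    return [
        [seq_id for seq_id, _ in members]
        for _, members in sorted(clusters.items())
        if len(members) > 1
    ]
```

Sequences that land in the same multi-member cluster are the ones CD-HIT considers at least 40% identical, so they are candidates to keep on the same side of a train/test split.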
Contains the Jupyter notebooks used to develop
our ToxIN model.
The /ToxIBTL/ folder contains the original code from ToxIBTL.