Interactive lessons in bioinformatics.

Home Page: http://readIAB.org

License: Other

Python 97.95% TeX 1.95% CSS 0.10%

an-introduction-to-applied-bioinformatics's Issues

improve formatting of substitution matrices

this is related to scikit-bio/scikit-bio#161

add Travis-CI testing

Depends on #18 so that we have stuff to test.

MSA big-O tweaks

Few comments:

increasing from 25 to 150 to highlight how bad this can be for just two of your millions of MiSeq sequences
using a log-linear plot may be awesome and will show the diffs between the curves better
the y-axis is not seconds, but an undefined variable that is proportional to time

link to big-o cheatsheet

http://bigocheatsheet.com/

thanks for the link @wasade.

Embedded LaTeX doesn't always render properly on iPad

I'm sure this is an upstream notebook issue, but thought it might be worth tracking here. I noticed it when reading through the pairwise section. If someone else can confirm that would be good.

add suggested reading in a new "getting started with biology" section

possible suggestions:
http://www.amazon.com/Processes-Life-Introduction-Molecular-Biology/dp/0262013053
NIH bookshelf

presentation of notebook start page needs some work

Currently when launching the notebook server from the top-level directory, the sub-directories are un-ordered. It's also confusing, because some of the directories (licenses, iab) don't contain any notebooks, so show up as blank. Need a better way to handle this so users aren't confused.


add back beer example and real-world msa

real-world msa could be based on some subset of Greengenes (derived from the cookbook example)

removed these as i didn't get to update them for new msa code yet as part of #81

add automated testing of notebooks

We need a way to automatically run all notebook code cells and make sure there aren't any errors. This could then be hooked up to Travis once #19 is completed.

I started looking into this for QIIME, but ran into issues because QIIME's notebooks only really run external commands (e.g., !validate_mapping_file.py -m ...) and I couldn't easily find a way to check if the external commands failed. However, I think the things I link to here may be helpful for testing the iab notebooks, which are AFAIK all Python code.

Who knows, there might even be a way to do this directly using IPython 2.0 (there wasn't when I looked into it a few months ago).
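A minimal sketch of the idea, not a tested solution for IAB: read a notebook file as JSON, execute each code cell in a shared namespace, and collect any exceptions. It assumes the nbformat 4 layout (a top-level "cells" list); older notebook formats nest cells differently. The function name run_notebook_code_cells is hypothetical, and cells that use IPython magics or ! shell commands would fail here because they are not plain Python.

```python
import json


def run_notebook_code_cells(notebook_path):
    """Execute every code cell of a notebook in one shared namespace.

    Returns a list of (cell_index, exception) pairs, one per failing cell.
    Assumes the nbformat 4 JSON layout (a top-level "cells" list).
    """
    with open(notebook_path) as handle:
        notebook = json.load(handle)

    namespace = {}
    failures = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # The source is stored either as one string or as a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        try:
            code = compile(source, f"{notebook_path}[cell {index}]", "exec")
            exec(code, namespace)
        except Exception as error:
            failures.append((index, error))
    return failures
```

A test runner could call this on each notebook and fail the build whenever the returned list is non-empty.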

Mention that iterative/progressive alignment can take a while in the exercises

Just so students aren't surprised if it takes 5 min or so

clean up matrix formatting

The code for formatting dynamic programming and traceback matrices is scattered and ugly. It should be consolidated, and all output rows should fit on a single line. Maybe the way to go with this is to load into numpy arrays and use the built-in formatting?
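One possible shape for the numpy approach, as a rough sketch: np.array2string accepts a max_line_width and a threshold, so a wide matrix can be rendered with one text line per row and without elision. The helper name format_matrix and the width of 10,000 characters are assumptions for illustration, not existing IAB code.

```python
import numpy as np


def format_matrix(matrix):
    """Render a 2-D matrix as text with exactly one output line per row."""
    data = np.asarray(matrix)
    return np.array2string(
        data,
        max_line_width=10_000,      # wide enough that rows are never wrapped
        threshold=data.size + 1,    # never summarize large matrices with "..."
    )


# Example: a small dynamic programming matrix (values are made up).
print(format_matrix([[0, -8, -16], [-8, 5, -3], [-16, -3, 10]]))
```

Row and column labels (the two sequences) would still need to be added separately, for example by prefixing each output line with the corresponding character.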

update contents

drop the "General Molecular Biology" section, but maybe merge some of that information into "Getting Started" - removed from README.md, but notebooks are still in place so links don't change for students who are using them this semester
drop all of the python stuff (beyond the scope of what I can do here, and there are a lot of resources out there for learning python - I'll link to those from "Getting Started") - removed from README.md, but notebooks are still in place so links don't change for students who are using them this semester
rename algorithms to "Fundamentals" - done in README.md, but notebooks are still in algorithms directory, so links don't change for students who are using them this semester
add an "Applications" section, where I add new notebooks on measuring diversity (i.e., qiime-like stuff), genome assembly, ...;
drop the 'Statistics' section (beyond the scope, and will link to other materials);
drop the "Other topics" section (pending notebooks that'd fall under that category, but might add some of my reproducibility in computing type stuff there at some point);
remove numbers from all chapter names, now that the chapter layout is ordered in the Index.ipynb (waiting a couple of weeks on this so links don't change for students who are using them this semester)

port project timeline to milestones and issues

This will initially derive content from this document and should be easily editable by me.

add "developer notes"

I think it would make sense to highlight some of the discussions that are included as Developer notes, where we briefly describe things that you'd want to think about if you were developing the functionality described in the text. These could be highlighted in some kind of box to stand out from the rest of the text. An example is in the pairwise alignment chapter:

Next steps: All of those steps are a bit ugly, so as a developer you'd want to make this functionality generally accessible to users. To do that, you'd want to define a function that takes all of the necessary input and provides the aligned sequences and the score as output, without requiring the user to make several function calls. What are the required inputs? What steps would this function need to perform?

1-pairwise-alignment: linkrot in BLOSUM link to wikipedia

this is the wrong link
wikipedia article on this topic.
I think you meant this:
wikipedia article on this topic

add Travis build icon to README.md

1-pairwise-alignment needs some copyediting

Various minor typos, thinkos, etc. Mainly submitting this for bookkeeping to associate with pull-request for commits 3ba3f7c and d445c57.

Genome Assembly/Analysis

http://evomicsorg.wpengine.netdna-cdn.com/wp-content/uploads/2015/01/2015-evomics-assembly.pdf
https://www.coursera.org/course/ads1

remove iab.algorithms.DNA when skbio dependency is updated beyond 0.1.4

This light wrapper is required for the functionality introduced in skbio's #507, which is used in the MSA chapter, while IAB depends on skbio 0.1.4.

lifehacker u

We should try to get IAB listed by Lifehacker U

make iab pip-installable

pairwise-alignment working directory is algorithms. When trying to import iab.algorithms, need to change back to project root

http://stackoverflow.com/questions/15514593/importerror-no-module-named-when-trying-to-run-python-script

In my case,

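import os
import sys

# Check which directory the notebook is currently running from.
print(os.getcwd())

# '~' is not expanded automatically in sys.path entries, so expand it explicitly.
sys.path.append(os.path.expanduser('~/iab/An-Introduction-To-Applied-Bioinformatics/'))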

move the database searching chapter to applications

Minor typo

There is a minor typing error in getting started - index.ipynb

Variables is written as varaibles.

library code clean-up and testing

functions need to be tested, and many ported to skbio

make notebooks run faster

The tests take ~40 minutes via Travis, which is running through all of the notebooks. Once we hit 50 minutes, Travis will abort the tests. Most of the cells run instantly, or with little delay, but some cells take several minutes to complete. @gregcaporaso thoughts on this?

Travis also requires that there is some sort of output printed within a 10-minute window, otherwise the tests will be killed. We're currently okay, but there are some cells that are likely close to this threshold.

clustering chapter to do items

better name for cluster_greedy - this really should be something like cluster_centroid_distance
images to illustrate the different clustering algorithms - see phone for photos of whiteboard from lecture
add open reference discussion
move cluster functions to iab/algorithms/__init__.py
add some discussion of real world run time for OTU picking (several people asked questions about doing this iteratively - like iterative msa - which is interesting, but runtime would be a limiting factor)
instead of computing all kmer distances in cluster_greedy_kmer, just compute the kmer distances to the cluster centroids (Rob M. pointed this out)
add discussion of why approximations are required (i.e., why can't you compute distances between all pairs of sequences, build a tree, and define OTUs based on clades in the tree?) - this should go in the top of the notebook so it's clear why we don't compare all sequences against all other sequences.
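The centroid-only comparison suggested above could look roughly like the following sketch. Each new sequence is compared only to the centroid (here, the first member) of each existing cluster rather than to every sequence seen so far. The function names, the use of Jaccard distance between k-mer sets, and the default k=3 are illustrative assumptions, not the current cluster_greedy_kmer implementation.

```python
def kmer_set(sequence, k):
    """Return the set of all k-mers that occur in a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def kmer_distance(seq_a, seq_b, k=3):
    """Jaccard distance between the k-mer sets of two sequences."""
    kmers_a, kmers_b = kmer_set(seq_a, k), kmer_set(seq_b, k)
    union = kmers_a | kmers_b
    if not union:
        return 0.0
    return 1.0 - len(kmers_a & kmers_b) / len(union)


def cluster_by_centroid(sequences, threshold, k=3):
    """Greedy clustering that only compares each sequence to cluster centroids.

    Returns a list of (centroid, members) tuples. The centroid is the
    first sequence assigned to the cluster.
    """
    clusters = []
    for sequence in sequences:
        for centroid, members in clusters:
            if kmer_distance(sequence, centroid, k) <= threshold:
                members.append(sequence)
                break
        else:
            # No centroid was close enough, so this sequence starts a new cluster.
            clusters.append((sequence, [sequence]))
    return clusters
```

With this layout the number of distance computations grows with the number of clusters rather than with the number of sequences already processed, which is the saving the to-do item is after.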

bug in guide_tree_from_query_sequences

guide_tree_from_query_sequences fails when passed correct data.

    686
    687     guide_dm = DistanceMatrix(guide_dm, seq_ids)
--> 688     guide_lm = average(guide_dm.condensed_form())
    689     guide_tree = to_tree(guide_lm)
    690     if display_tree:

AttributeError: 'DistanceMatrix' object has no attribute 'condensed_form'
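Until the installed DistanceMatrix provides condensed_form, one possible workaround is to build the condensed vector directly from the square array. This is only a sketch: condensed_from_square is a hypothetical helper, and how the square array is obtained from the DistanceMatrix object depends on the skbio version in use. The output follows the row-major upper-triangle order that scipy's hierarchical clustering functions expect; scipy.spatial.distance.squareform performs the same conversion.

```python
import numpy as np


def condensed_from_square(square):
    """Convert a square, symmetric distance matrix to condensed (1-D) form.

    The result lists the entries above the diagonal in row-major order:
    d(0,1), d(0,2), ..., d(1,2), ...
    """
    square = np.asarray(square, dtype=float)
    if square.ndim != 2 or square.shape[0] != square.shape[1]:
        raise ValueError("expected a square 2-D distance matrix")
    rows, cols = np.triu_indices(square.shape[0], k=1)
    return square[rows, cols]
```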

scikit currently requires numpy==1.8 (API version 9), release ipython comes with version 7?

I updated ipython just before starting into iab, but found this as one of my first dependencies that needed addressing. Should these versions be pinned or maintained in requirements.py?

better real-world example of creating tree from biological sequences

The mammal tree based on hemoglobin doesn't work well, I think because the sequences are too similar, but maybe because we need a better distance metric for the aligned sequences that counts using blosum50. Basically want something that has organisms that students can relate to (in that they'll have an intuitive feel for who should be more/less closely related) and that is well annotated so those names are evident.

Add reference to github issue tracker in Disclaimer, alongside email and pull requests

I feel bad having sent a bunch of email to Greg, but I didn't know the issue tracker was a thing until after I sent him a bunch of email about dependencies in the "Getting Started" page.

add suggested reading in a new "getting started with python" section

possible suggestions:
http://nbviewer.ipython.org/gist/anonymous/11250965
http://learnpythonthehardway.org/
http://www.amazon.com/Practical-Computing-Biologists-Steven-Haddock/dp/0878933913
http://www.amazon.com/Practical-Programming-Introduction-Pragmatic-Programmers/dp/1934356271

phylogeny chapter text corrections

@EvolDoc is working on proof-reading and critical review of this chapter.

Some other thoughts that I sent to @EvolDoc by email:

I reviewed the phylogeny chapter this morning and remember just how basic/short it actually is (much less developed than the other chapters). So, one thing that would be great to get your input on is what other methods you think are important to introduce students to at this stage. I specifically try to focus it on methods where the math isn't too challenging, and then point them to other resources for learning more. One great thing to add to this chapter would be a discussion of the limitations of a simple method like UPGMA, and what has been done to address those with other methods.

add acknowledgements to readme

link to Bayesian Methods for Hackers, as a source of inspiration

rename "alignment-exercises" as "pairwise-alignment-exercises" after BIO 299 assignment based on this is due

don't want to deal with changing url before then

updates to teaching web site to consolidate material here

#38
#39
link to IAB

add dependencies, instructions for use to readme.md

six dependency currently requires future==1.13

Details on stack overflow, but this definitely affects the person trying to get started

http://stackoverflow.com/questions/26247431/future-utils-six-not-found-when-trying-to-import-skbio-modules

iab setup

Hi there, when I downloaded the book and started working with it, I already had all the necessary packages except iab. As I didn't need to install any modules, I didn't see the "pip install ." line. It would be great if you could mention it in the installation section of the readme, so others can see that the iab module must be installed using pip.

Thanks !

also, should we re-direct appliedbioinformatics.us to this page, or should we leave as a caporasolab.us url, to bring attention to the lab website?

check out Daniel's python tutorial

Review tutorial generated by @wasade here.

MSA chapter intro text suggestions

@kschwarzberg is currently reviewing this one.
