applied-bioinformatics / an-introduction-to-applied-bioinformatics
Interactive lessons in bioinformatics.
Home Page: http://readIAB.org
License: Other
this is related to scikit-bio/scikit-bio#161
Depends on #18 so that we have stuff to test.
Few comments:
thanks for the link @wasade.
I'm sure this is an upstream notebook issue, but thought it might be worth tracking here. I noticed it when reading through the pairwise section. If someone else can confirm, that would be good.
possible suggestions:
http://www.amazon.com/Processes-Life-Introduction-Molecular-Biology/dp/0262013053
NIH bookshelf
Currently when launching the notebook server from the top-level directory, the sub-directories are un-ordered. It's also confusing, because some of the directories (licenses, iab) don't contain any notebooks, so show up as blank. Need a better way to handle this so users aren't confused.
Screenshot:
real-world msa could be based on some subset of Greengenes (derived from the cookbook example)
removed these as i didn't get to update them for new msa code yet as part of #81
We need a way to automatically run all notebook code cells and make sure there aren't any errors. This could then be hooked up to Travis once #19 is completed.
I started looking into this for QIIME, but ran into issues because QIIME's notebooks only really run external commands (e.g., !validate_mapping_file.py -m ...) and I couldn't easily find a way to check if the external commands failed. However, I think the things I link to here may be helpful for testing the iab notebooks, which are AFAIK all Python code.
Who knows, there might even be a way to do this directly using IPython 2.0 (there wasn't when I looked into it a few months ago).
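A minimal sketch of what such a check could look like, using only the standard library. It assumes the notebooks are stored in the nbformat 4 JSON layout (a top-level "cells" list) and contain plain Python; run_notebook_cells is a hypothetical helper name, and lines starting with "!" or "%" are skipped rather than executed, so this would not catch failures in external commands.

```python
import json
import traceback


def run_notebook_cells(notebook_json):
    """Execute every code cell of a notebook and collect failures.

    Assumes the nbformat 4 layout (top-level "cells") and plain-Python
    cells; IPython magics and "!" shell lines are skipped, not run.
    Returns a list of (cell_index, traceback_text) tuples.
    """
    notebook = json.loads(notebook_json)
    namespace = {}  # shared across cells, like a running kernel
    failures = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):  # source may be stored as a list of lines
            source = "".join(source)
        # Drop lines that only IPython understands.
        lines = [line for line in source.splitlines()
                 if not line.lstrip().startswith(("!", "%"))]
        try:
            exec("\n".join(lines), namespace)
        except Exception:
            failures.append((index, traceback.format_exc()))
    return failures


# Hypothetical three-cell notebook: the last cell references an undefined name.
demo = json.dumps({"cells": [
    {"cell_type": "markdown", "source": "# Title"},
    {"cell_type": "code", "source": ["x = 1\n", "y = x + 1\n"]},
    {"cell_type": "code", "source": "undefined_name + y"},
]})
failures = run_notebook_cells(demo)
for index, message in failures:
    print("cell", index, "failed:", message.splitlines()[-1])
```

A CI job could then exit non-zero whenever the returned list is non-empty.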
Just so students aren't surprised if it takes 5 min or so
The code for formatting dynamic programming and traceback matrices is scattered and ugly. It should be consolidated, and all output rows should fit on a single line. Maybe the way to go with this is to load into numpy arrays and use the built-in formatting?
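As a rough illustration of the "one matrix row per output line" goal, here is a dependency-free sketch. format_matrix and its parameters are hypothetical names, and the scores are made-up values. If the matrices were loaded into numpy arrays instead, np.array2string with a large max_line_width would be one built-in alternative.

```python
def format_matrix(matrix, row_labels, col_labels, width=4):
    """Return a DP or traceback matrix as text, one matrix row per line."""
    def cell(value):
        # Right-justify every entry to a fixed width so columns line up.
        return str(value).rjust(width)

    # Header: a blank corner cell followed by the column labels.
    header = " " * width + "".join(cell(label) for label in col_labels)
    lines = [header]
    for label, row in zip(row_labels, matrix):
        lines.append(cell(label) + "".join(cell(value) for value in row))
    return "\n".join(lines)


# Made-up scores for illustration only.
scores = [[0, -8, -16],
          [-8, 5, -3],
          [-16, -3, 10]]
table = format_matrix(scores, row_labels=["-", "A", "C"],
                      col_labels=["-", "A", "C"])
print(table)
```

The same function works for traceback matrices, since every entry is converted with str before padding.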
algorithms directory, so links don't change for students who are using them this semester
This will initially derive content from this document and should be easily editable by me.
I think it would make sense to highlight some of the discussions that are included as Developer notes, where we briefly describe things that you'd want to think about if you were developing the functionality described in the text. These could be highlighted in some kind of box to stand out from the rest of the text. An example is in the pairwise alignment chapter:
Next steps: All of those steps are a bit ugly, so as a developer you'd want to make this functionality generally accessible to users. To do that, you'd want to define a function that takes all of the necessary input and provides the aligned sequences and the score as output, without requiring the user to make several function calls. What are the required inputs? What steps would this function need to perform?
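One way to answer those questions is sketched below: a single function that takes the two sequences plus scoring parameters, and performs initialization, matrix filling, and traceback internally. This is only an illustration, not the chapter's code. global_align and its parameters are hypothetical names, and it assumes simple match/mismatch scores with a linear gap penalty rather than a substitution matrix.

```python
def global_align(seq1, seq2, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns (aligned_seq1, aligned_seq2, score).
    """
    n_rows, n_cols = len(seq1) + 1, len(seq2) + 1
    # Step 1: initialize the score matrix; first row/column are all gaps.
    scores = [[0] * n_cols for _ in range(n_rows)]
    for i in range(1, n_rows):
        scores[i][0] = i * gap
    for j in range(1, n_cols):
        scores[0][j] = j * gap

    # Step 2: fill each cell with the best of diagonal, up, and left moves.
    for i in range(1, n_rows):
        for j in range(1, n_cols):
            sub = match if seq1[i - 1] == seq2[j - 1] else mismatch
            scores[i][j] = max(scores[i - 1][j - 1] + sub,
                               scores[i - 1][j] + gap,
                               scores[i][j - 1] + gap)

    # Step 3: trace back from the bottom-right cell to the top-left.
    aligned1, aligned2 = [], []
    i, j = len(seq1), len(seq2)
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if seq1[i - 1] == seq2[j - 1] else mismatch
            if scores[i][j] == scores[i - 1][j - 1] + sub:
                aligned1.append(seq1[i - 1])
                aligned2.append(seq2[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and scores[i][j] == scores[i - 1][j] + gap:
            # Gap in seq2: consume one character of seq1.
            aligned1.append(seq1[i - 1])
            aligned2.append("-")
            i -= 1
        else:
            # Gap in seq1: consume one character of seq2.
            aligned1.append("-")
            aligned2.append(seq2[j - 1])
            j -= 1
    return ("".join(reversed(aligned1)),
            "".join(reversed(aligned2)),
            scores[-1][-1])


aligned1, aligned2, score = global_align("ACGT", "AGT")
print(aligned1)
print(aligned2)
print(score)
```

The user makes one call and gets both aligned sequences and the score back, which is the interface the note describes.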
this is the wrong link
wikipedia article on this topic.
I think you meant this:
wikipedia article on this topic
This light wrapper is required for the functionality introduced in skbio's #507, which is used in the MSA chapter, while IAB depends on skbio 0.1.4.
We should try to get IAB listed by Lifehacker U
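In my case,
import os
import sys
os.getcwd()  # confirm which directory the notebook is running from
# '~' is not expanded automatically in sys.path entries, so expand it explicitly
sys.path.append(os.path.expanduser('~/iab/An-Introduction-To-Applied-Bioinformatics/'))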
There is a minor typing error in getting started - index.ipynb
Variables is written as varaibles.
functions need to be tested, and many ported to skbio
The tests take ~40 minutes via Travis, which is running through all of the notebooks. Once we hit 50 minutes, Travis will abort the tests. Most of the cells run instantly, or with little delay, but some cells take several minutes to complete. @gregcaporaso thoughts on this?
Travis also requires that there is some sort of output printed within a 10-minute window, otherwise the tests will be killed. We're currently okay, but there are some cells that are likely close to this threshold.
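One possible workaround for the output window, sketched under assumptions: wrap a slow call so that a background thread prints a short message at a fixed interval while it runs. run_with_heartbeat is a hypothetical helper, not part of iab or the test runner, and it would not help with the overall 50-minute limit.

```python
import threading


def run_with_heartbeat(func, interval=60.0, message="still running..."):
    """Call func() while a background thread prints `message` periodically."""
    done = threading.Event()

    def heartbeat():
        # Event.wait returns False on timeout and True once `done` is set,
        # so the loop prints every `interval` seconds until func() finishes.
        while not done.wait(interval):
            print(message, flush=True)

    thread = threading.Thread(target=heartbeat, daemon=True)
    thread.start()
    try:
        return func()
    finally:
        done.set()
        thread.join()


# Stand-in for a slow notebook cell; a real call would take minutes.
result = run_with_heartbeat(lambda: sum(range(1000)), interval=0.01)
print(result)
```

Choosing an interval well under ten minutes keeps output flowing without flooding the log.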
cluster_greedy - this really should be something like cluster_centroid_distance
iab/algorithms/__init__.py
cluster_greedy_kmer, just compute the kmer distances to the cluster centroids (Rob M. pointed this out)
guide_tree_from_query_sequences fails when passed correct data.
686
687 guide_dm = DistanceMatrix(guide_dm, seq_ids)
--> 688 guide_lm = average(guide_dm.condensed_form())
689 guide_tree = to_tree(guide_lm)
690 if display_tree:
AttributeError: 'DistanceMatrix' object has no attribute 'condensed_form'
I updated ipython just before starting into iab, but found this as one of my first dependencies that needed addressing. Should these versions be pinned or maintained in requirements.py?
The mammal tree based on hemoglobin doesn't work well, I think because the sequences are too similar, but maybe because we need a better distance metric for the aligned sequences that counts using blosum50. Basically want something that has organisms that students can relate to (in that they'll have an intuitive feel for who should be more/less closely related) and that is well annotated so those names are evident.
I feel bad having sent a bunch of email to Greg, but I didn't know the issue tracker was a thing until after I sent him a bunch of email about dependencies in the "Getting Started" page.
@EvolDoc is working on proof-reading and critical review of this chapter.
Some other thoughts that I sent to @EvolDoc by email:
I reviewed the phylogeny chapter this morning and remember just how basic/short it actually is (much less developed than the other chapters). So, one thing that would be great to get your input on is what other methods you think are important to introduce students to at this stage. I specifically try to focus it on methods where the math isn't too challenging, and then point them to other resources for learning more. One great thing to add to this chapter would be a discussion of the limitations of a simple method like UPGMA, and what has been done to address those with other methods.
link to Bayesian Methods for Hackers, as a source of inspiration
don't want to deal with changing url before then
Details on stack overflow, but this definitely affects the person trying to get started
Hi there, when I downloaded the book and started working with it, I already had all the necessary packages except iab. As I didn't need to install any modules, I didn't see the pip install . line. Maybe you could add a reference to it in the installation section of the README, so others can see that the iab module must be installed using pip.
Thanks !
The algorithms section is what I want to present as the example of where I'm hoping this project will lead. Clean up the various chapters to be more presentable.
this is a little cleaner, and code has syntax highlighting. see the getting-started/overview notebook for example.
@ElDeveloper created a nice xkcd-style plot, and it'd be great to include the dates of introduction of the different technologies from this page which @walterst pointed me at.
this could use seaborn heatmaps, e.g. where high values get darker colors.
related to #73
...and the ssw aligner local_pairwise_align_ssw (addresses #30) in iab module
auto-generated gh-pages is now posted here:
http://caporasolab.us/An-Introduction-To-Applied-Bioinformatics/
i don't really like how narrow the theme is - probably need some work on a better theme. @ebolyen, we should chat about this a little bit.
also, should we re-direct appliedbioinformatics.us to this page, or should we leave as a caporasolab.us url, to bring attention to the lab website?
@kschwarzberg is currently reviewing this one.