caseywdunn / phylogenetic_biology
A book for my course
For figure 2.10, I suppose b) should be "soft polytomy" and c) should be "hard polytomy" instead of "with polytomy" --- probably just a typo.
Dear Prof. Dunn,
These are such minor questions, but are there options to highlight bookdown text and read bookdown as two pager (viewing "two pages" at the same time)? That is, more like reading a book...
Best,
phylogenetic_biology/06-evaluation.rmd
Line 99 in 98e8bae
Not sure if it is too much to mention that AIC and BIC can deal with non-nested models (as they are not based on a test statistic). Maybe it's too complicated, but for students who are interested, Burnham and Anderson is a great resource.
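To make the suggestion concrete, here is a minimal sketch of how AIC and BIC can rank two models that need not be nested. The model names, log likelihoods, parameter counts, and sample size are all hypothetical, not values from the book.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln(n) - 2 ln L."""
    return n_params * math.log(n_obs) - 2 * log_likelihood

# Hypothetical fits; nothing here requires one model to be a special case of the other.
fits = {
    "model_1": {"log_likelihood": -1234.5, "n_params": 5},
    "model_2": {"log_likelihood": -1230.1, "n_params": 9},
}
n_sites = 1000  # hypothetical number of sites, used as the sample size for BIC

scores = {
    name: {
        "AIC": aic(fit["log_likelihood"], fit["n_params"]),
        "BIC": bic(fit["log_likelihood"], fit["n_params"], n_sites),
    }
    for name, fit in fits.items()
}

# Lower is better for both criteria.
best_by_aic = min(scores, key=lambda name: scores[name]["AIC"])
best_by_bic = min(scores, key=lambda name: scores[name]["BIC"])
print(scores, best_by_aic, best_by_bic)
```

With these made-up numbers the two criteria disagree, because BIC penalizes the extra parameters more heavily once the sample size is large.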
phylogenetic_biology/03-simulation.Rmd
Line 453 in 6547f83
There seems to be a typo here: From here on out a I
phylogenetic_biology/03-simulation.Rmd
Line 148 in 6547f83
I love the way you explain this fundamental concept of phylogenetics, but this is precisely one concept that I often see as being completely misunderstood by students and even by phylogeneticists. It might help to add some adjectives or something similar to highlight the critical importance of this central "problem" of phylogenetics (and how, as you mention at the end, dating with fossils or tip dating with viruses helps disentangle time and rate).
phylogenetic_biology/06-evaluation.rmd
Line 18 in 98e8bae
Here: "The GTR model 11 parameters"; maybe "The GTR model has 11 parameters"?
phylogenetic_biology/03-simulation.Rmd
Line 512 in 6547f83
a word missing here, big ?? We could draw the starting state from a big
phylogenetic_biology/bayes.rmd
Line 32 in bd3f8a3
This sentence does not read well. Maybe a typo or missing words?
phylogenetic_biology/03-simulation.Rmd
Line 602 in 6547f83
Change "text" to "next".
phylogenetic_biology/06-evaluation.rmd
Line 103 in 98e8bae
"Home columns will be": change "Home" to "Some".
phylogenetic_biology/03-simulation.Rmd
Line 52 in 6547f83
I think that adding one sentence about what parameters "mean" could help to create an even better intuition for what parameters do in models. In this specific example, you could add a sentence explaining that m is the value that captures the relative change in the variables, or the value you would need to multiply x by to get the value of y (this is evident from the equation, but it might help); and b is the value of y when x is 0. Not sure if you want to even add that to the plot.
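A small sketch of what such a sentence could be backed by, assuming the linear model y = mx + b discussed in the comment. The values chosen for m and b are hypothetical.

```python
def linear_model(x, m, b):
    """Predict y from x under the linear model y = m * x + b."""
    return m * x + b

m, b = 2.0, 1.0  # hypothetical slope and intercept

# b is the value of y when x is 0.
y_at_zero = linear_model(0, m, b)

# m is the change in y for each one-unit increase in x.
change_per_unit_x = linear_model(4, m, b) - linear_model(3, m, b)

print(y_at_zero, change_per_unit_x)
```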
phylogenetic_biology/06-evaluation.rmd
Line 24 in 98e8bae
"to vary, than values other than 6": change the first "than" to "then".
Also, I wonder if, in the following sentences, the explanation for the relationship between the relative rate and the related rate is super clear. I mean, the equivalencies 1 to 10 to 6. Maybe it is just the grammatical structure of the sentences?
phylogenetic_biology/04-inference.Rmd
Line 197 in d638f89
Not sure if you want to add here (a few lines above) the "rules of probability": when events are "and" == multiplication, when events are "or" == summation. This mnemonic works for me.
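The mnemonic can be checked by brute force. The sketch below uses two fair dice as an assumed example (not taken from the book): "and" multiplies when the events are independent, and "or" adds when the events are mutually exclusive.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice.
rolls = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event, given as a predicate over (first, second) rolls."""
    return Fraction(sum(1 for roll in rolls if event(roll)), len(rolls))

p_first_six = prob(lambda r: r[0] == 6)
p_second_six = prob(lambda r: r[1] == 6)
p_first_one = prob(lambda r: r[0] == 1)

# "and" of two independent events: probabilities multiply.
p_both_six = prob(lambda r: r[0] == 6 and r[1] == 6)

# "or" of two mutually exclusive events: probabilities add.
p_first_one_or_six = prob(lambda r: r[0] == 1 or r[0] == 6)

print(p_both_six == p_first_six * p_second_six)
print(p_first_one_or_six == p_first_one + p_first_six)
```

The conditions matter: the sum rule overcounts if the two events can happen together, and the product rule needs independence.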
phylogenetic_biology/02-phylogenies.Rmd
Line 69 in b582d02
As part of the section on abstraction, it might be worth mentioning that parent nodes are rarely the immediate parents like they are in a genealogy. They are some ancestral lineage in common to the two child nodes, and could be thousands or millions of years removed.
Hi, I just want to raise a few minor suggestions:
In section 2.1, under figure 2.3, this sentence has two "it" that are a bit confusing
" Because it makes it easier to learn from adjacent fields when using mathematical conventions that are shared across fields, I will tend to use mathematical notation for phylogenies rather than the classical botanical nomenclatures."
Right after figure 2.5
"Rectangular layouts are the most common, because the entire edge length is along one axis of the plot. In a rectangular tree, each node is depicted as a line that is orthogonal to the edges. The confusing thing is that, because this line has the same width and color as the edges, it looks as if it is part of the edge. It isn’t though– its length is arbitrary, and it just shows which edges attach to that node. It also adds right-degree elbows where the ends of the node lines connect to the edges, forming a corner. "
I am an undergrad who has never taken a phylogenetics class before. I found the paragraph rather confusing. What is "this line" referring to? I understand the node and the edge are perpendicular to each other, but which one is which? Perhaps it would help to have arrows on figure 2.6!
Gladys Fang
Show two trees that share two splits but differ on a third, eg:
'((A,B),((C,D),E),F);'
'((A,B),(C,D),(E,F));'
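As a rough check that these two trees behave as intended, the sketch below lists the non-trivial splits of each and compares them. The tiny parser is an assumption made for illustration: it only handles Newick strings like the two above (tip names, commas, and parentheses, with no edge lengths).

```python
def parse_newick(newick):
    """Parse a simple Newick string (no edge lengths) into nested lists of tip names."""
    s = newick.strip().rstrip(";")
    pos = 0

    def parse_node():
        nonlocal pos
        if s[pos] == "(":
            pos += 1  # skip "("
            children = [parse_node()]
            while s[pos] == ",":
                pos += 1  # skip ","
                children.append(parse_node())
            pos += 1  # skip ")"
            return children
        start = pos
        while pos < len(s) and s[pos] not in ",()":
            pos += 1
        return s[start:pos]

    return parse_node()

def tips(node):
    """Return the set of tip names below a node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(tips(child) for child in node))

def splits(newick):
    """Return the non-trivial bipartitions of the tips implied by the tree."""
    tree = parse_newick(newick)
    all_tips = tips(tree)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return
        clade = tips(node)
        other = all_tips - clade
        if len(clade) > 1 and len(other) > 1:
            found.add(frozenset([clade, other]))
        for child in node:
            visit(child)

    visit(tree)
    return found

splits_1 = splits("((A,B),((C,D),E),F);")
splits_2 = splits("((A,B),(C,D),(E,F));")
shared = splits_1 & splits_2

print(len(shared), len(splits_1 - shared), len(splits_2 - shared))
```

Both trees contain the splits AB|CDEF and CD|ABEF. The first also has CDE|ABF, while the second has EF|ABCD.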
phylogenetic_biology/06-evaluation.rmd
Line 132 in 98e8bae
"It jsut indicates" --> "just"
phylogenetic_biology/04-inference.Rmd
Line 7 in d638f89
Change "as" to "is" in "likely hypothesis as referred to".
phylogenetic_biology/04-inference.Rmd
Line 237 in c1b3106
"the log likelihoods will range from
Optimize edge lengths on each topology, then show optimized likelihood of each topology and indicate ML tree. May need to add more than one site for this to be interesting.
Comments on chapter 6:
"The GTR model 11 parameters." Missing the verb.
"so that they can differ from each other." You go straight into how a-f have to add to 6, but a simple example like you do for π (Just something simple like 0.5 means equal probability of transition from X to Y while 0.2 means ...) would really help explain.
"In typical use, mu=1" Should that be µ?
"That means that the best possible likelihood under HKY85 is also available under GTR" -- good message.
"there are challenges far short of this extreme example" - little bit awkward?
"I and G" maybe use the Γ symbol and put (gamma) in parentheses? G is usually a lazy shortcut for finding the gamma key, right?
"the topology we are evaluating is the focal topology" -- should be "as the" -- and maybe bold focal topology.
"focal tree as a whole be asking how frequent" -- "by asking"
"It jsut indicates how frequently an edge is recovered" --> "just"
\mathbf{\Pi} not rendering in chapter 3.
phylogenetic_biology/06-evaluation.rmd
Line 28 in 98e8bae
All other parameters are either constant (set to a specific value ahead of the analysis; boxes with straight corners) or deterministic (their value depends on the value of other parameters according to specified relationships; dashed boxes)
phylogenetic_biology/03-simulation.Rmd
Line 99 in 6547f83
Two things:
In the figure you don't have reversals to A after there has been a substitution to another nucleotide - is this on purpose?
It's interesting you start this explanation in the context of DNA replication. When I talk about how this works, I explain it in the context of changes from the "ancestral" nucleotide site to the current site. "given that the ancestral sequence was A and today we see a C, through time it could have changed to A (no change), C, G, T, back to A, etc..." I mean, it's the same after all because DNA replication & inheritance from ancestor to descendant is the underlying process.
phylogenetic_biology/04-inference.Rmd
Line 106 in d638f89
Not sure if you want to add "and branch lengths" at the end of this sentence, as this is the other relevant parameter to specify the "complete" history of a site: "This history is the full set of states at all nodes"
Add linear model to chapter 1 as example of relationship between observed values (y), unobserved values (x), model, and model parameters.
phylogenetic_biology/06-evaluation.rmd
Line 121 in 98e8bae
"Of it is zero": change "Of" to "If".
phylogenetic_biology/bayes.rmd
Line 1 in bd3f8a3
Phyloge
netics
phylogenetic_biology/06-evaluation.rmd
Line 36 in 98e8bae
"a smaller subset of the values that the more complex model can": change "that" to "than".
I would also change "values" to "parameters".
phylogenetic_biology/bayes.rmd
Line 42 in 55e1b03
phylogenetic_biology/bayes.rmd
Line 9 in 55e1b03
phylogenetic_biology/bayes.rmd
Line 66 in 55e1b03
Figure 2.11 has cropped labels.
Tried a couple fixes, but no luck... https://www.biostars.org/p/312389/
In chapter 2, explicitly pick apart these tree thinking issues, with reference to basal/ early diverging/ etc...
Specify the name of matrix P in chapter 3
Probably to chapter 2
References are in both the references section and at the end of each chapter. Need to remove them at the end of each chapter.
phylogenetic_biology/04-inference.Rmd
Line 15 in d638f89
Root this tree on an internal node rather than edge. Then add demonstration that re-rooting does not impact likelihood despite asymmetry of P. This builds on explanation from @mtholder:
'The probability matrix doesn't have to be symmetric. The constraint is similar to the "detailed balance" constraint in MCMC, namely:
\pi_i \Pr(destination=j | start=i) = \pi_j \Pr(destination=i | start=j)
So if the ratio of the probabilities of the "forward" and "reverse" substitutions is identical to the ratio of the equilibrium frequencies of the destination and source states, then placing the root at any point on the tree will result in the same likelihood. So, you can't infer the root position from the character state data alone.'
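A minimal numerical sketch of this argument follows. It swaps in the F81 model with assumed unequal equilibrium frequencies, and it uses a two-tip tree rather than the tree in the chapter, so it illustrates the idea and is not the book's own example. It checks that P is asymmetric, that the detailed balance constraint holds, and that sliding the root along the path between the tips leaves the site likelihood unchanged.

```python
import math

BASES = "ACGT"
# Assumed unequal equilibrium frequencies (hypothetical values).
PI = {"A": 0.1, "C": 0.2, "G": 0.3, "T": 0.4}

def p_f81(i, j, t):
    """F81 probability of ending in state j after time t, given a start in state i.

    The rate is fixed at 1 for simplicity, so t is in arbitrary units.
    """
    decay = math.exp(-t)
    return PI[j] * (1 - decay) + (decay if i == j else 0.0)

def site_likelihood(tip1, tip2, t1, t2):
    """Likelihood of two tip states with the root t1 from tip1 and t2 from tip2."""
    return sum(PI[r] * p_f81(r, tip1, t1) * p_f81(r, tip2, t2) for r in BASES)

t = 0.5
# P is not symmetric when the equilibrium frequencies differ ...
forward = p_f81("A", "C", t)
reverse = p_f81("C", "A", t)
# ... but detailed balance holds: pi_i P(i -> j) = pi_j P(j -> i).
balanced = math.isclose(PI["A"] * forward, PI["C"] * reverse)

# Same total path length (0.5) between the tips, three different root placements.
likelihoods = [
    site_likelihood("A", "C", 0.1, 0.4),
    site_likelihood("A", "C", 0.3, 0.2),
    site_likelihood("A", "C", 0.5, 0.0),
]
print(forward, reverse, balanced, likelihoods)
```

The three likelihoods agree, which is why the character state data alone cannot place the root under a reversible model like this one.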
Confusion in class how placing root in outgroup gives you the root of ingroup, since now you are discussing two roots in one tree but only one point can be the oldest. Talk more about how each subtree (clade) has a root, and we can talk about those roots in the context of a single tree.
Section 0.1 - guthub
Figure 1.1 Estimate
Silhouettes
Two concepts of relatedness: "if" should be "of"
I think it might be prudent to use a box or a table to quickly summarize monophyly, polyphyly, and paraphyly in a manner that students could easily find during a study period, i.e.,
monophyly-group including common ancestor and all descendants
paraphyly-group including common ancestor but only some descendants
polyphyly-group including tips but not the common ancestor nor all descendants of the common ancestor shared by the included tips
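The monophyly definition above can be tested mechanically, which might also make a useful exercise next to such a table. The sketch below assumes a hypothetical five-tip tree written as nested tuples. It only checks monophyly: it does not attempt to separate paraphyly from polyphyly, since under the definitions above that depends on whether the common ancestor is counted in the group.

```python
def tip_set(node):
    """Return the set of tip names below a node (a tip is a plain string)."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(tip_set(child) for child in node))

def clades(node):
    """Yield the tip set of every node in the tree, including the tips themselves."""
    yield tip_set(node)
    if not isinstance(node, str):
        for child in node:
            yield from clades(child)

def is_monophyletic(tree, group):
    """True if the group is exactly the set of tips descended from some node."""
    return frozenset(group) in set(clades(tree))

# Hypothetical rooted tree: (((A,B),C),(D,E))
tree = ((("A", "B"), "C"), ("D", "E"))

abc = is_monophyletic(tree, {"A", "B", "C"})  # a common ancestor and all its descendants
ac = is_monophyletic(tree, {"A", "C"})        # leaves out B, a descendant of the same ancestor
print(abc, ac)
```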
phylogenetic_biology/03-simulation.Rmd
Line 508 in 6547f83
I would change the order of clauses in this sentence, for example, I would start with Now what?
phylogenetic_biology/03-simulation.Rmd
Line 392 in 6547f83
At the end of this line, there seems to be a typo: because the keep the average
Would like to change the main font to Gyre Schola.
Info on font selection in bookdown here - https://bookdown.org/yihui/rmarkdown-cookbook/latex-variables.html
Output specifics are set in https://github.com/caseywdunn/phylogenetic_biology/blob/master/_output.yml . There you can see that I am using the LaTeX engine xelatex, so the font should be set with mainfont. That is how I am setting it now, but it doesn't work. Maybe this font is not installed on the local system?
In https://github.com/caseywdunn/phylogenetic_biology/blob/master/docker/Dockerfile, I do install it with tlmgr install tex-gyre. But that installs it as a TeX package. I probably need to install it at a system level.
phylogenetic_biology/06-evaluation.rmd
Line 128 in 98e8bae
"be resampling" --> "by resampling"
Through chapter 6, I had only run Build Book with gitbook output. Now I want to Build Book to pdf and ensure both output formats work moving forward. I resolved some issues, but these still remain:
Table of contents not populated
Citation, equation, and maybe other references are broken. They show up as ?? in the text, and phylogenetic_biology.log has many warnings about these. Since they work fine for gitbook, it seems like there is some global LaTeX problem.
There is a math formatting problem that aborts pdf build on page 52. This can be seen in tail of log file:
LaTeX Warning: Reference `eq:jc69' on page 53 undefined on input line 980.
! Missing $ inserted.
<inserted text>
$
l.983
Here is how much of TeX's memory you used:
18584 strings out of 479465
323958 string characters out of 5881418
785041 words of memory out of 5000000
37582 multiletter control sequences out of 15000+600000
541685 words of font info for 88 fonts, out of 8000000 for 9000
14 hyphenation exceptions out of 8191
84i,7n,118p,1251b,568s stack positions out of 5000i,500n,10000p,200000b,80000s
Output written on phylogenetic_biology.pdf (52 pages).
Some relevant links on this error:
https://www.overleaf.com/learn/latex/Errors/Missing%20$%20inserted
https://tex.stackexchange.com/questions/52804/missing-inserted-inserted-text (suggests that it could be a math character pulled in from bib)
phylogenetic_biology/06-evaluation.rmd
Line 105 in 98e8bae
I would describe Gamma slightly differently.
G refers to the actual rate heterogeneity across variant sites. This is modeled with a continuous Gamma distribution that is discretized, usually into 4 rate categories. Sites are assigned to these different rate categories. To specify the shape of the Gamma distribution, we use a parameter referred to as \alpha.
the discrete Gamma model of rate heterogeneity (Yang 1994), where each site is assigned to one of a set number (usually 4) rate categories. The distribution of rates across these categories is modeled with a parameter referred to as \alpha
Rows are analysis types, columns are features (tree topology, tree edge lengths, tip node states, internal node, etc...) and cells are how each of these features is handled in the analysis (clamp, marginalize, estimate, etc...)
for example:
Sometimes estimates are nuisances
Clamping.
Estimate
Marginalizing.
and here is a start at the table:
Goal,tree topology,tree edge lengths,tip node states,internal node
inference,estimate,estimate,clamp,marginalize,clamp,estimate
simulate data,clamp,clamp,estimate,estimate,clamp,clamp
independent contrast,clamp,clamp,clamp,estimate,clamp,estimate
simulate tree,estimate,estimate,,,,