
vegan's Introduction

vegan: an R package for community ecologists

Ordination methods, diversity analysis and other functions for community and vegetation ecologists.


Website for the development version of the vegan package.

Vignettes are available on R-universe

Installation

To install the development version of vegan you can use the usual git and R CMD build -> R CMD INSTALL dance on the cloned repo (or downloaded sources). You'll need to be able to install packages from source for that to work; if you don't have the relevant developer tools, you won't be able to install vegan this way.
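Concretely, the dance looks something like the following (a sketch; it assumes you have git and the R build tools on your PATH, and uses the main GitHub repository):

```shell
# clone the development sources
git clone https://github.com/vegandevs/vegan.git
# build the source tarball
R CMD build vegan
# install the built tarball (needs C and Fortran compilers)
R CMD INSTALL vegan_*.tar.gz
```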

Using remotes

If you do have the developer tools installed but don't want the hassle of keeping a local source code tree up-to-date, use the remotes package:

install.packages("remotes")
remotes::install_github("vegandevs/vegan")

Installing binaries from R Universe

If you just want to install a binary version of the package, as you would from CRAN, you can install from our R Universe repository. Run the following in your R session:

install.packages('vegan',
    repos = c('https://vegandevs.r-universe.dev','https://cloud.r-project.org'))

vegan's People

Contributors

adrianstier, antagomir, bbolker, brendanf, danielborcard, dmcglinn, eduardszoecs, friendly, fvd, gavinsimpson, henrikbengtsson, hezibu, j-tweed, jarioksa, kasperskytte, legendre, michaelchirico, microbiology, norival, olivroy, prminchin, psolymos, richfitz, salix-d, tuomasborman, zhilongjia


vegan's Issues

Methods to identify the number of nontrivial axes in wcmdscale

Paul and Anderson (2013) discuss four methods to assess how many axes might be useful/interpretable/[put what you want here]:

a) broken-stick [implemented in vegan]
b) bootstrap CI for eigenvalues
c) bootstrap from null model
d) variation of c)

I wondered if b) and c) are already implemented in vegan? And if not, whether it would be worth implementing them?

I have somewhere on my machine a more veganized version of their code...

Paul WL, Anderson MJ (2013) Causal modeling with multivariate species data. Journal of Experimental Marine Biology and Ecology 448:72–84. doi: 10.1016/j.jembe.2013.05.028

simper with identical classes

In the following, we make a 'dummy' class that is a copy of 'ungrazed' class. The last 'ungrazed_dummy'
contrast compares two identical copies of the data:

library(vegan)
data(varespec)
x <- varespec
x <- rbind(x, x[19:24,])
z <- c(rep("grazed",18), rep("ungrazed", 6), rep("dummy",6))
mod <- simper(x, z, permu=99)
summary(mod)

We do still get variable P-values.

In addition, although the rows are copies of each other and the means are identical, we still get simper
contributions to the differences (which in this case actually are to within-class variability):

> mod
cumulative contributions of most influential species:
...
$ungrazed_dummy
 Cla.ste  Cla.ran  Vac.vit  Emp.nig  Ple.sch 
39.28130 53.28941 61.39104 68.85289 76.28894 

This casts some doubt on the whole method.

Parallel processing

One big change in the upcoming vegan 2.2-0 is parallel processing within some functions. Parallel processing is linked with permutations: these often take a long time and can be parallelized. The getPermuteMatrix API (issue #31) is built so that it is easily put into parallel processing. Basically, if you can write the test procedure so that it uses apply, sapply or lapply over permutations, you can replace these with their parallelized counterparts. That was the reason why I wanted to get rid of the for loop and put the test in one function when putting mso into the permute API (commit b67a8d2): now it is parallel-ready.

Not all permutation functions need be parallelized for the v2.2-0 release. Firstly, writing parallel code is more tedious than permutation code, because we must cater for so many different cases. For instance, Windows and Unix (including MacOS) have different preferred frameworks. That adds several lines of code. Secondly, the advantages of parallel processing can often be small. There is an overhead in launching parallel processing, and for fast and easy tasks a non-parallel process can be faster. Parallel processing is useful only for long calculations. Thirdly, parallel processes need much more memory, and large problems can exhaust the memory; in that case the computer freezes. Therefore I don't think it is necessary to parallelize every process that can be parallelized. Specifically, this is not a blocking issue for the release. This issue only outlines the current situation so that we know what we could or should still do.

Currently the following permutation routines can use parallel processing: adonis, anosim, mantel, mantel.partial, mrpp, permutest.cca (and hence anova.cca, ordistep etc.), and simper.

The following permutation routines are not parallelized, but they are parallel-ready as they use the permute framework: CCorA, factorfit + vectorfit (usually called from envfit), mso, protest and permutest.betadisper. These (or some of them) could perhaps be moved to parallel processing, but I do not think this is release-critical. Even if they are parallelized, that can happen after the 2.2-0 release.

The following routines do not use permutation, but can use parallel processing: bioenv, which makes an exhaustive search of all permissible models and can do this in parallel; oecosimu, which can evaluate the test statistic in parallel for an array of simulated community matrices (and has argument batchsize to limit memory use); and metaMDS, which can launch several random starts in parallel. Actually, I have seen the greatest benefits in these three cases. However, oecosimu is faster in parallel only if the evaluation of the statistic is expensive, and its risk of memory exhaustion is large.

There may be other functions that could be parallelized easily. Among the functions taking the longest time in R CMD check, cascadeKM looks easy to parallelize (K-means clustering for different group sizes can be launched in parallel). kendall.global etc. also take a long time, but it looks like more architectural work is needed before they can be turned parallel. If you have other candidates for parallel processing, feel free to discuss them or implement parallel processing. As a rule, any loop that can be turned into apply, sapply or lapply can also be parallelized, and if that single apply, sapply or lapply can take a long time, the function should be parallelized.
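As a sketch of that rule (names here are illustrative, not vegan internals): a permutation loop written with sapply over a list of permutations can be switched to its parallel counterpart almost mechanically:

```r
library(parallel)

## toy "test statistic" evaluated for each permutation (illustrative only)
stat <- function(perm, x) cor(x, x[perm])

x <- rnorm(50)
permat <- replicate(99, sample.int(50), simplify = FALSE)

## serial version
res.serial <- sapply(permat, stat, x = x)

## parallel version: same call shape, parallelized over permutations
## (mclapply forks on Unix; on Windows one would use parLapply on a cluster)
res.par <- unlist(mclapply(permat, stat, x = x, mc.cores = 2))

identical(res.serial, res.par)  # TRUE: same results, only the loop changed
```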

The currently used design of parallel processing in vegan is discussed in the current vignette on Design decisions (in github or R-Forge mirrors, not yet in CRAN).

How to use R function `prc` in package `vegan` to produce the tables in van den Brink, P.J. & ter Braak, C.J.F. (1999)

Note this is a duplicate of my question on Cross Validated

This is also not a bug report but a question.

The paper is:

van den Brink, P.J. & ter Braak, C.J.F. (1999). Principal response curves: Analysis of time-dependent multivariate responses of biological community to stress. Environmental Toxicology and Chemistry, 18, 138-148.

The example data used in the paper:

spdta <- structure(list(`Species 1` = c(100, 100, 100, 100, 100, 110, 
110, 137.5, 157.143, 157.143, 120, 120, 150, 240, 240, 130, 130, 
144.444, 216.667, 216.667), `Species 2` = c(100, 100, 100, 100, 
100, 90, 90, 90, 90, 90, 80, 80, 80, 80, 80, 100, 100, 100, 100, 
100), `Species 3` = c(100, 100, 100, 100, 100, 120, 120, 96, 
84, 84, 140, 140, 112, 70, 70, 160, 160, 144, 96, 96), `Species 4` = c(100, 
100, 100, 100, 100, 90, 90, 72, 63, 63, 80, 80, 64, 40, 40, 70, 
70, 63, 42, 42), `Species 5` = c(200, 200, 200, 200, 200, 240, 
240, 153.6, 117.6, 117.6, 240, 240, 153.6, 60, 60, 200, 200, 
162, 72, 72), `Species 6` = c(100, 100, 100, 100, 100, 130, 130, 
83.2, 63.7, 63.7, 70, 70, 44.8, 17.5, 17.5, 100, 100, 81, 36, 
36)), .Names = c("Species 1", "Species 2", "Species 3", "Species 4", 
"Species 5", "Species 6"), class = "data.frame", row.names = c(NA, 
-20L))

# time
week <- gl(4, 5, labels=0:3)
# treatment
dose <- factor(rep(c("C","C","L","H","H"), 4),levels=c("C","L","H"),ordered=FALSE)

Perform a principal response curve analysis:

require(vegan)
#the species data are natural log(x) transformed
mod <- prc(response = log(spdta), treatment = dose, time = week)
plot(mod)

However, the species scores differ from the ones listed in the paper, but I can recover them with the function scores:

# Species scores:
scores.sps.1999 <- scores (mod, choices=1, scaling=1, const=-sqrt(nrow(spdta)), dis='sp')
scores.sps.1999

My question is:

  1. The canonical coefficients are also different from the paper. I can manage to transform my results into those in the paper, but I don't know why this works.

    sum_prc <- summary(mod,scaling=1)#, const=-sqrt(nrow(spdta)))

    sum_prc

    -(sum_prc$coefficients)/(sqrt(nrow(spdta))/2)

  2. Also, the paper gives the standardized canonical coefficients rdt and standard deviations sdt. How do I get this piece of information from my PRC results?

    rdt <- c(0,0,0,0,0.1805,0.3970,0,0.1805,0.7716,0,0.0852,0.5686)
    sdt <- c(1e-10,1e-10,1e-10,1e-10,0.2179,0.3,1e-10,0.2179,0.3,1e-10,0.2179,0.3)
    Cdt <- 0.2*rdt/sdt

Oecosimu: is the p-value not in accordance with alternative hypothesis?

Consider the following example of an analysis made with the oecosimu function:

> analysis
oecosimu object

Call: oecosimu(comm = ba40.xtabs, nestfun = nestedchecker, method =
"tswap", nsimul = 1000, burnin = 10000, thin = 1000, statistic =
"C.score", alternative = "less")

simulation method tswap with 1000 simulations
options:  thin 25661, burnin 10000
alternative hypothesis: simulated median is less than the statistic

Checkerboard Units    : 1815 
C-score (species mean): 40.33333 

        statistic       z   mean     0%    50%    95% Pr(sim.)
C.score    40.333 -1.7485 41.491 39.756 41.444 42.734    0.97

Most details of this example are not very important; what confuses me is the way the alternative hypothesis is articulated in relation to the resulting "p-value" (under the Pr(sim.) header at the end of the table).

The alternative hypothesis (AH) in this output reads:

simulated median is less than the statistic

Which is consistent with the option used for the argument alternative. So, given this AH, my interpretation is that the p-value should be the probability that simulated values (i.e. values produced in a "null hypothesis scenario") are equal to or less than the observed one[*]. Rephrasing it, the p-value is the probability of committing a type-I error (accepting a false positive), the "positive" here being the case where the statistic (C.score) is smaller than under the null hypothesis.

In this particular case, you can see that the observed value (40.333) and its normalized counterpart (z = -1.7485) are clearly smaller than most of the simulated ones. So the p-value should be small. As you can see, this is not the case; indeed the opposite is true. I also made an image: a histogram of the simulated values with a vertical line where the observed C.score falls: histogram
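What I expect is the conventional one-sided permutation p-value, which would be computed roughly like this (the numbers here are made up for illustration, not the actual simulation results):

```r
set.seed(42)
## stand-in for 1000 simulated C-scores under the null model
sim  <- rnorm(1000, mean = 41.5, sd = 0.9)
stat <- 40.333                  # observed C-score

## alternative = "less": P(simulated value <= observed)
p.less    <- (sum(sim <= stat) + 1) / (length(sim) + 1)
## alternative = "greater": P(simulated value >= observed)
p.greater <- (sum(sim >= stat) + 1) / (length(sim) + 1)

c(less = p.less, greater = p.greater)  # p.less is small, p.greater is large
```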

I have given this a lot of thought, and even though I feel that I am most certainly right, there must be something that I am not seeing. Because vegan is such a widely used package and has been for so long, it really baffles me that this could be a mistake, silly as it may be.

Thank you for your time,
Juan Manuel

[*] after reading the Wikipedia article on the p-value I thought that maybe "significance level" might be a more appropriate term; I'm not really sure anyway.

Adonis-function

Hello,
I'm a PhD student from Germany and I have a question concerning the adonis function from the vegan package. I tried to run a multivariate analysis of covariance on my nonparametric dataset, but a problem arose. R says this every time I try to run the analysis:

ergebnis <- adonis(SICIEEG ~ Gruppe + AlterTMS, data=daten)
Fehler in rowSums(x, na.rm = TRUE) :
'x' must be an array of at least two dimensions

SICIEEG stands for the Y and is a continuous variable, Gruppe (= group) is a nominal variable, and AlterTMS (= age) is also continuous. Does someone have any idea what the problem with the analysis could be? I would really appreciate it if somebody could help me, because this analysis isn't very common and other statistical software such as SPSS isn't able to run an analysis of covariance on a nonparametric dataset. Thank you for your help.

Kind regards

Anna Lisa

envfit(..., na.rm = TRUE)

envfit throws an error when NAs occur in a whole factor-level:

data(varespec)
data(varechem)
ord <- metaMDS(varespec, trace = 0)
varechem$fac <- gl(2,12)   # create factor
varechem$na <- c(rep(NA, 12), rnorm(12))   # create NAs so that a whole factor-level drops
(fit <- envfit(ord, varechem, na.rm = TRUE, perm = 999))
# Error in `colnames<-`(`*tmp*`, value = "fac") : 
#  attempt to set colnames on object with less than two dimensions

The reason is probably that the factor level with NAs is not dropped; this causes the error in centroids.cca() [especially line 15: tmp <- lapply(tmp, function(z) sapply(z, rbind)) does not work as it should, which then leads to the error in line 26].

A quick fix would be to add env <- droplevels(env) after line 16 in envfit.default, or in centroids.cca().
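A minimal base-R illustration of why droplevels() helps (the objects here mirror the example above):

```r
fac    <- gl(2, 12)                    # two-level factor
na.var <- c(rep(NA, 12), rnorm(12))    # NAs cover all of level 1
keep   <- !is.na(na.var)               # rows kept after na.rm = TRUE

table(fac[keep])                # level "1" survives as an empty level
table(droplevels(fac[keep]))    # droplevels() removes it
```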

meandist - standard deviations

A user asked me by email whether the standard deviations of the dissimilarities can also be computed (similar to the means in meandist()).

A quick solution would be to change/add
tapply(dist, list(cl1, cl2), mean)
to
tapply(dist, list(cl1, cl2), sd)
within meandist().

I wondered whether

  • this makes sense
  • and it is worth implementing as additional output of meandist() or as a new function sddist()?
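A rough base-R sketch of what such a sddist() could look like (not vegan code; note that with only the lower triangle, one ordering of each between-group pair stays NA):

```r
## hypothetical sddist(): sd of dissimilarities within and between groups
sddist <- function(d, grouping) {
  grouping <- factor(grouping)
  m  <- as.matrix(d)
  lt <- lower.tri(m)
  cl1 <- grouping[row(m)[lt]]    # group of the first member of each pair
  cl2 <- grouping[col(m)[lt]]    # group of the second member
  tapply(m[lt], list(cl1, cl2), sd)
}

d <- dist(matrix(rnorm(40), nrow = 10))
g <- gl(2, 5)
sddist(d, g)   # 2 x 2 matrix; diagonal = within-group sd
```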

orditkplot in Windows

Exporting orditkplot graphics worked differently in Windows and other OSes (Linux, MacOS): in Linux and MacOS the selected file type (pdf, png, jpg etc.) was appended automatically to the name, but no type was appended to the Windows output. This was an error in Windows, and the OS complained about an unknown graphics type until the user gave the output file name with the extension: the filename fig failed, but it had to be given as fig.png (if it was a png file). NB, Windows does not usually show the extension after the dot, but it still has to be given when exporting the graph. The reason for the different behaviour was not in R nor in vegan, but in the external Tcl/Tk library. PR #97 fixed this behaviour, and now Windows should behave similarly to other platforms. However, I have only limited access to Windows PCs and I could not test the change with different variants of Windows and R. So I urge any Windows user out there to try this fix.

The fix is now included in source files of master and cran-2.2 branches in github as well as in R-Forge which provides binary Windows packages. Please try the fixed version and report all problems back to us with information on the platform where you had the problem.

Versions, ChangeLog, NEWS and R-Forge

These have in common that they are linear. In contrast, git is a branching and merging web. How to combine these linear structures with git?

So far we have used version numbering in the development branch to have some internal structuring of development versions. On R-Forge we had a separate release branch where versions were numbered by CRAN releases. So the version numbers changed much more rapidly in development: we had 10 CRAN releases in 2.0, and 41 development versions in the same time.

Version numbering is linear. If we make a "feature fork", we cannot change versions there; they must always be changed in the main branch after merging the feature. However, I suggest we drop the practice of frequent version changes in the development version, and align the development version numbers with CRAN releases. I have two alternative numbering schemes:

  1. First release is 2.2-0 and the development (main) branch will be 2.2-1. We merge things to the release branch and then release it as 2.2-2, renumber development to 2.2-3, and so on.
  2. First release is 2.2-0 and development (main) branch will be 2.3-0. After publishing 2.2-0 in CRAN, we renumber release branch as 2.2-1 which would be a "patched" version. Then we release it as 2.2-2.

In both schemes, only even-numbered versions would be in CRAN. It may be possible (I haven't checked) to have decimal minor numbers (2.2-0.9), but text is not allowed (2.2-0-patched). Separating development (github main branch), CRAN release, and patched CRAN release should somehow be guaranteed.

Another issue is ChangeLog and NEWS. Now we have two separate things for about the same purpose. The idea was that ChangeLog is more fine grained and intended for fellow developers, and NEWS are only updated for the release and intended for users. In addition, version control adds its own very fine grained logs. Three levels is too much. R manages with svn logs and NEWS which are updated continuously as features are added and bugs fixed. We could do the same and drop the ChangeLog (this is my suggestion). In particular, if we do not have frequent version number changes, ChangeLog has little advantage over release NEWS. Naturally, we must keep NEWS linear: they cannot be changed in feature forks but only in the main branch after the merge.

The last issue is R-Forge. I have googled for advice on maintaining repos in parallel in svn and in git. The most usual piece of advice is "don't!". The major problem is again that R-Forge is linear. So far I have managed to keep R-Forge in sync with git within a day or two. If we want to keep R-Forge, we could change policy so that we only sync the release branch to R-Forge. This branch would change rather infrequently and be linear, minimizing the pain of syncing. The most important advantage of R-Forge is that it offers Windows binaries that can be installed directly with standard R tools. With this scheme, the stable versions are in CRAN, patched versions in R-Forge and unstable development in github. The first two can be installed without any special tools also in Windows (but no longer for Mac on R-Forge), but the github version needs special R packages plus C and Fortran compilers.

Xref to an archived package (mvpart)

Brian Ripley sent this message:

Date: Sun, 14 Dec 2014 09:25:38 +0000
From: Prof Brian Ripley
To: Jari Oksanen
CC: CRAN
Subject: CRAN package vegan

Its checks are now giving warnings:

  • checking Rd cross-references ... WARNING
    Unknown package 'mvpart' in Rd xrefs

That package is not mentioned in the DESCRIPTION file (it should have
been), and has finally been archived.

Please submit an update correcting this.

permustats for anova.cca

I just noticed that there is no permustats method for anova.cca.

Shouldn't be hard to implement: just return F.perm from permutest with the anova.cca object and write a permustats.anova.cca method.

Why is this missing? Maybe to avoid cluttering the anova.cca object?

Incorrect number of neighbors when plotting isomap

When plotting an isomap object, I observed that some points have more neighbors than expected. I have checked that it was not due to overlapping points.

Minimal working example:

library(vegan)
set.seed(15)
dataset <- data.frame(matrix(rnorm(1000), ncol=8))
imap <- isomap(dist(dataset), k=4)
plot(imap)
which(imap$points[,2] > 4) # check for overlap

The top point has five neighbors whereas k was set to 4.

suggested plots to have (feature request)

library(vegan)

The dune data set is not ideal, but since it is used for illustration purposes in the package, we will use it here to show plots that would be good additions to the package.

data(dune)
dim(dune)
dune[1,]
dune[,1]

Load other libraries

library(ggplot2)
library(dplyr)
library(tidyr)

Plot the marginals for both Species and Site.
The purpose is to examine the differences in abundance across Species and Sites. Any with counts too small might be better removed before analysis.

dune$Site <- as.character(1:20)
d.g <- gather(dune, Species, Count, -Site)
d.m <- summarise(group_by(d.g, Species), med = median(Count, na.rm=T))
d.g$Species <- factor(d.g$Species, levels=d.m$Species[order(d.m$med)])

Use a boxplot on the raw scale.

qplot(Species, Count, data=d.g, geom="boxplot") + coord_flip()
d.s <- summarise(group_by(d.g, Species), sum = sum(Count, na.rm=T))
d.g$Species <- factor(d.g$Species, levels=d.s$Species[order(d.s$sum)])

Show as a barchart of counts across sites

qplot(Species, weight=Count, data=d.g, geom="bar") + coord_flip()

d.m2 <- summarise(group_by(d.g, Site), med = median(Count, na.rm=T))
d.g$Site <- factor(d.g$Site, levels=d.m2$Site[order(d.m2$med)])

Use a boxplot on the raw scale.

qplot(Site, Count, data=d.g, geom="boxplot") + coord_flip()
d.s2 <- summarise(group_by(d.g, Site), sum = sum(Count, na.rm=T))
d.g$Site <- factor(d.g$Site, levels=d.s2$Site[order(d.s2$sum)])

Show as a barchart of counts across species

qplot(Site, weight=Count, data=d.g, geom="bar") + coord_flip()

Use a fluctuation diagram to show species by count
The purpose is to see species that are prevalent everywhere vs. specialists, and sites which host only particular species vs. sites where most species can be found, i.e. looking at associations between sites and species. These plots can help to digest and diagnose the MDS results.

qplot(Site, Species, data=d.g, size=Count) + scale_size_area()
qplot(Site, Species, data=d.g, fill=Count, geom="tile")

Remove `ordirgl()` and related from vegan to veganrgl pkg?

rgl is a pain to use on Macs because, in Apple's infinite wisdom, having provided their users with an OS with a strong Unix heritage, they lobotomise it by not including a lot of important tools/compilers etc.

How about we rip ordirgl() and related functions out of vegan and stick them in a new package, say veganrgl and then we include this package in the Enhances: field for vegan?

To be honest, rgl is a little gimmicky anyway, so perhaps we don't lose much by relegating this to its own package. I made a similar change to my analogue package in response to reports that it was hard to install because of rgl. We could try the same fix here with vegan or just carve off this functionality into a new package?

Thoughts?

anova.prc

anova.prc gives an error when used with the argument by = "axis".

require(vegan)
data(pyrifos)  
week <- gl(11, 12, labels=c(-4, -1, 0.1, 1, 2, 4, 8, 12, 15, 19, 24))   
dose <- factor(rep(c(0.1, 0, 0, 0.9, 0, 44, 6, 0.1, 44, 0.9, 0, 6), 11)) 

pyr_prc <- prc(response = pyrifos, treatment = dose, time = week)
anova(pyr_prc, by = "axis")
# Error in eval(expr, envir, enclos) : 
#  could not find function "rda.formula"

The problem is perhaps that rda.formula is not exported by vegan.
A quick fix is to change "rda.formula" on line 18 to "rda".

Check packages dependent on vegan

Here is an update on checking packages that depend on vegan.

The check summary is

Check status summary:
                  ERROR WARN NOTE OK
  Source packages     0    0    0  1
  Reverse depends     5    1   18 37

However, most of the reported problems are not related to vegan. Below I list all cases which seem to be related to vegan. The following two failures are caused by changes in vegan:

  • analogue install fails due to splitting vegan3d from vegan:
Error : object ‘ordirgl’ is not exported by 'namespace:vegan'

I assume @gavinsimpson already fixed this, but CRAN still has the old version of analogue.

  • BiodiversityR fails in its examples because the dune data moved to 4+4 character names:
> dune.env$Agrsto<- dune$Agrsto
> Count.model1 <- glm(Agrsto ~ Management + A1, family=quasipoisson(link=log), 
+     data=dune.env, na.action=na.omit)
Error in eval(expr, envir, enclos) : object 'Agrsto' not found

The rest are all cases where a package uses vegan functions but does not import them properly. These are not caused by vegan, but should be fixed within those packages.

  • betaper refers to non-exported vegan functions. Should be fixed there. NB cca.formula and rda.formula are not exported from vegan.
* checking R code for possible problems ... NOTE
cca.pertables : cca.var: no visible binding for global variable
  ‘cca.formula’
cca.pertables: no visible binding for global variable ‘cca.formula’
rda.pertables : rda.var: no visible binding for global variable
  ‘rda.formula’
rda.pertables: no visible binding for global variable ‘rda.formula’
  • cocorresp @gavinsimpson should check NAMESPACE (vectorfit and factorfit are exported):
* checking R code for possible problems ... NOTE
envfit.coca: no visible global function definition for ‘vectorfit’
envfit.coca: no visible global function definition for ‘factorfit’
  • isopam should check its imports:
* checking R code for possible problems ... NOTE
isopam : core: no visible global function definition for
  ‘distconnected’
  • mpmcorrelogram should check its imports:
* checking R code for possible problems ... NOTE
mpmcorrelogram: no visible global function definition for
  ‘mantel.partial’
  • rich should check its imports (there are four other references to specpool):
c2cv : SRobs: no visible global function definition for ‘specpool’
  • vegclust should check its imports:
* checking R code for possible problems ... NOTE
vegdiststruct: no visible global function definition for ‘decostand’
  • yaImpute should check its imports:
yai: no visible global function definition for ‘cca’
yai: no visible global function definition for ‘rda’
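For the packages above, the fix is the standard declaration pattern (a generic sketch, using specpool as the example function):

```r
## In the dependent package's NAMESPACE:
importFrom(vegan, specpool)
```

together with `Imports: vegan` in its DESCRIPTION file.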

Typo leading to bug in conditional test in linestack()

In linestack() we have a conditional test for backwards compatibility regarding the labels argument (Line 5):

    if (!missing(labels) && length(labels == 1) && pmatch(labels, 
                                   c("right", "left"), nomatch = FALSE)) {
        side <- labels
        labels <- NULL
        warning("argument 'label' is deprecated: use 'side'")
    }

The bug induced by the typo is in the part length(labels == 1), which should be length(labels) == 1.

This causes problems when you actually supply a vector of labels, as depending on what you supply, the == binary operator may not make sense.

I'm happy to fix this alongside some other changes in issue #80 .

tolerance.cca fails with unconstrained CA

The tolerance.cca function is documented to work with both constrained (CCA) and unconstrained (CA) correspondence analysis, but it fails with unconstrained:

> data(dune)
> mod <- cca(dune)
> tolerance(mod)
Error in tolerance.cca(mod) : 
  dims [product 600] do not match the length of object [0]

Variance estimator for Chao-1 index in estimateR

vegan uses the unbiased estimator for the Chao-1 index but a biased estimator for its variance in estimateR. The issue was raised by Jose M. Blanco Moreno on the R-sig-ecology mailing list. Moreover, it seems that the EstimateS software has introduced a small-sample correction, and probably we should follow. These are documented on the EstimateS web pages, but it looks to me that we need to go to the original papers to check the information on that page and also find the justification for these equations.

Using anova.ccabyterm within function?

I am trying to use anova.ccabyterm (but this also concerns *byaxis and *bymargin) within a function.

Example
For example, using the pyrifos data to run an RDA for every week, I could do something along these lines:

data(pyrifos)
week <- gl(11, 12, labels=c(-4, -1, 0.1, 1, 2, 4, 8, 12, 15, 19, 24))
dose <- factor(rep(c(0.1, 0, 0, 0.9, 0, 44, 6, 0.1, 44, 0.9, 0, 6), 11))

# RDA per week
out <- NULL
for (i in levels(week)) {
  out[[i]] <- anova(rda(pyrifos[week == i, ] ~ dose[week==i]), by = "terms")
}
# works!

However, wrapping this into a function breaks:

foo2 <- function(response, treatment, time){
  out <- NULL
  for (i in levels(time)) {
    out[[i]] <- anova(rda(response[time == i, ] ~ treatment[time==i]), by = 'terms')
  }
  return(out)
}
foo2(response = pyrifos, treatment = dose, time = week)
#  Error in eval(expr, envir, enclos) : object 'response' not found 

The error seems to be thrown somewhere around model.frame.cca. Any ideas?

labels argument of plot.envfit not doing what it is documented to do

There is a bug in the documentation or the code relating to the labels argument in plot.envfit. labels is supposed to

  labels: Change plotting labels. The argument should be a list with
          elements ‘vectors’ and ‘factors’ which give the new plotting
          labels. If either of these elements is omitted, the default
          labels will be used. If there is only one type of elements
          (only ‘vectors’ or only ‘factors’), the labels can be given
          as vector. The default labels can be displayed with ‘labels’
          command.

but in the case where there is only one of vectors or factors the labels can't be passed as a vector and have any effect on what is drawn. Here is a reproducible example:

data(varespec)
data(varechem)
library(MASS)
ord <- metaMDS(varespec)
fit <- envfit(ord, varechem, perm = 999)
plot(fit, labels = paste0("label", seq_len(ncol(varechem))), add = FALSE)

which draws

plot envfit-labels-issue

But it should have changed the labels because

> is.null(fit$vectors)
[1] FALSE
> is.null(fit$factors)
[1] TRUE

pcnm should fail earlier if passed something clearly not a distance matrix

This is a little unfriendly:

> data(mite.xy)
> foo <- pcnm(mite.xy)
Error in as.matrix.dist(d^2) : 
  length of 'dimnames' [1] not equal to array extent

Suggest we change pcnm to have the following check

if (!inherits(dis, "dist")) {
    dims <- dim(dis)
    if (length(unique(dims)) >1) {
        stop("'dis' doesn't appear to be a square distance matrix.")
    }
    dis <- as.dist(dis)
}

Thoughts?

Switch vegan permutations to use permute package

I am currently working in the branch jarioksa/use-permute-pkg to make all vegan permutation functions use the permute package.

The user should be able to define permutations using

  1. number of permutations in which case simple free permutations are generated, possibly with strata.
  2. a permutation matrix
  3. a definition of permutations for the permute package, usually generated by permute::how(). I call this a how structure later.
From the programmer's point of view, all permutations use a permutation matrix (alternative 2), so the task is to produce a permutation matrix (2) if the user gave (1) or (3).

The solution in the use-permute-pkg branch is to have a non-exported interface function GetPermuteMatrix. With that, the switch to permute becomes a one-liner, and replaces many lines of variable code in the functions. It seems that all vegan functions already use permutation matrices internally, so this change should be easy (expect trouble when you say so). The solution is modular and easy to maintain.
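A rough sketch of what such an interface function could look like (illustrative only; the actual GetPermuteMatrix differs in details, e.g. it would delegate the first case to permute as well):

```r
getPermuteMatrix <- function(perm, n) {
  ## (1) a plain count: generate free permutations
  if (is.numeric(perm) && length(perm) == 1)
    perm <- t(replicate(perm, sample.int(n)))
  ## (3) a 'how' structure: delegate to the permute package
  else if (inherits(perm, "how"))
    perm <- permute::shuffleSet(n, control = perm)
  ## (2) by now 'perm' must be a permutation matrix
  if (!is.matrix(perm) || ncol(perm) != n)
    stop("'perm' cannot be interpreted as a permutation matrix")
  perm
}

m <- getPermuteMatrix(99, 10)
dim(m)  # 99 10
```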

At the moment, the following vegan functions still use non-exported vegan:::permuted.index(n, strata) to generate permutation matrices: adonis, anosim, CCorA, envfit (and factorfit and vectorfit: these are called from envfit, but also can be used independently), mantel, mantel.partial, mrpp and nesteddisc. The following use permute::shuffleSet() for the same task: permutest.betadisper, protest, simper. These have various hacks to handle different user input, and should also be changed to use GetPermuteMatrix. Functions permutest.cca, anova.cca and anova.cca* already use GetPermuteMatrix in the use-permute-pkg branch. The others will follow. I have not yet decided if this should be merged to upstream by stages or after finishing everything.

This change also means that the strata argument will go. Instead, users should use how() to generate constrained permutations. The strata argument has been in vegan "always" (since vegan 1.4-1, released in October 2002), and we must remove it gracefully:

  1. In the release notes of 2.2-0 we tell that strata will be deprecated in the future.
  2. We add a deprecation message ("use how() instead of 'strata'") in GetPermuteMatrix.
  3. We remove strata from the argument lists of functions, but may still allow its use via ....
  4. We deprecate and remove handling of strata in GetPermuteMatrix.

That would mean four releases to phase out strata.
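For illustration, the old strata-style restriction maps onto permute's blocks argument roughly like this (a sketch assuming the permute package is attached; the toy strata factor is made up):

```r
library(permute)

strata <- gl(3, 4)                 # three blocks of four samples each
## old style: f(..., permutations = 999, strata = strata)
## new style: pass a 'how' structure instead
ctrl <- how(blocks = strata, nperm = 999)
p <- shuffle(12, control = ctrl)
split(p, strata)                   # permuted indices never cross block boundaries
```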

*byaxis and prc()

Using anova.ccabyaxis with a prc-object breaks.

However defining a PRC as partial RDA seems to work.
I think this is a scoping issue - it breaks on the line

sol <- anova(update(object, fla, data = lc), ...)

and may be related to update() (see Gavin's email from 11/22/2013 08:01 PM).
However, I don't know exactly why this happens, since the prc object should be similar to an rda object?

data(pyrifos)
week <- gl(11, 12, labels=c(-4, -1, 0.1, 1, 2, 4, 8, 12, 15, 19, 24))
dose <- factor(rep(c(0.1, 0, 0, 0.9, 0, 44, 6, 0.1, 44, 0.9, 0, 6), 11))
# PRC
mod <- prc(pyrifos, dose, week)

anova.cca(mod, by = 'axis')
# Error in eval(expr, envir, enclos) : object 'lc' not found
anovacca(mod, by = 'axis')
# Error in eval(expr, envir, enclos) : object 'LC' not found

# PRC as partial RDA (usual model matrix)
mod2 <- rda(pyrifos ~ dose * week + Condition(week))
anova.cca(mod2, by = 'axis', step = 10)
anovacca(mod2, by = 'axis')

plot.cca can sometimes not set xlim correctly to contain the requested scores

plot.cca considers only

  • species
  • sites
  • constraints
  • default

components of the requested scores when computing xlim and ylim for plotting. This causes issues at least when all the variables are categorical and the user requests "cn" with, say, "species" also drawn. For example, consider the plots in the image below

plot.cca output

The plot on the left was created with plot(c1) whilst the plot on the right was created with plot(c1, display = c("species","cn")). c1 is the result of a call to rda().

    if (missing(xlim)) 
        xlim <- range(g$spe[, 1], g$sit[, 1], g$con[, 1], g$default[, 1], na.rm = TRUE)
    if (!any(is.finite(xlim))) 
        stop("no finite scores to plot")
    if (missing(ylim)) 
        ylim <- range(g$spe[, 2], g$sit[, 2], g$con[, 2], g$default[, 2], na.rm = TRUE)

The offending lines are above; we don't consider all the scores that the user requests. We should consider all the types of possible scores allowed for the scores() method.

There is also another issue above: we rely on partial matching of the components in the use of $, which we probably shouldn't be doing.
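A minimal base-R sketch of a safer range computation, taking the range over every score matrix actually returned instead of a hard-coded, partially matched subset (the component names below are toy stand-ins, not the real internals):

```r
## toy stand-ins for the list of score matrices plot.cca collects
g <- list(species   = matrix(rnorm(10), ncol = 2),
          centroids = matrix(rnorm(4),  ncol = 2))

## range over *all* requested score types, accessed with `[[` not `$`
xlim <- range(unlist(lapply(g, function(s) s[, 1])), na.rm = TRUE)
ylim <- range(unlist(lapply(g, function(s) s[, 2])), na.rm = TRUE)
if (!any(is.finite(xlim)))
    stop("no finite scores to plot")
```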

Prepare vegan for the CRAN release 2.2-0

Here we list various minor issues that must be fixed before the release. The major problems have their own issue numbers.

  • Decide how we do it: in R-Forge we had a separate branches/2.0 branch for releases, and we merged from the main (development) branch into it. The argument was that we could be freer with development and maintain a stable release. The same argument would apply here as well, and branching is easier and cheaper in git. The idea was also that minor releases (2.0-1, 2.0-2 etc) could be made by gradual merging, but major releases would directly fork from the main (development) branch. We also edited the release branch to differ from the main branch: for instance, tests/ were removed. Resolution: have a separate release branch. Real Life(™) has demonstrated that development is too lively to be release ready.
  • Write NEWS
  • Decide what to do with scores.hclust (remove?) -- removed separate Rd page so that the function is less visible (aef5c54). Still regarded as vulnerable or nearly endangered.
  • scoverage needs better documentation (also on justification), or else it can be dropped. We had some discussion about this with @psolymos years ago, but I have to refresh my memory (also on the method). Removed with 51db4da.
  • permutest.betadisper(..., pairwise=TRUE) bug must be fixed. This is discussed in commit a4793c4. Fixed with 47f3482.
  • remove u.eig and other *.eig items from the cca result object and test that other functions still work after this. This seems to be doable within vegan, but there are 60 packages depending on, importing or suggesting vegan in CRAN (as of Sep 30, 2014). The only case where vegan functions still use *.eig is the imaginary axes in capscale (i.e., those associated with negative eigenvalues). These could be left as such. -- Done now in my private fork. I will make a PR after some checks. I am not sure this should be merged to 2.2-0: we have said that there is no guarantee to keep *.eig items as long as we have documented cca.object, or since 2005, but a clear warning of really removing the items has not been in any CRAN release (only in R-Forge versions since March 2013). Removed in 70e5542.
  • update man/permutations.Rd to the new permute infrastructure (describes the old way).
  • some anova.cca* cases fail with missing data. Some of these could be fixed by listwise deletion of observations: do so, or clearly decide to delay until a later (minor) release. Resolution: nothing will be done here for the release. There is a need to do something, but it seems to require too much work at the moment. I also think that handling of missing values would need serious rethinking at all levels: now permutest.cca omits NA cases both from the community and the constraints, and only non-missing values are permuted. We should permute all values so that missing values are shuffled, too. Both need thought and work, and both are too much just now.
  • ordiconsensus of @guiblanchet. This is a separate body of code and could be cleanly merged. What worries me is that there are more than 1000 lines of code and 14 new files. I would like to have a better grasp of the code before going to a merge. Moreover, it should be checked if this body of code should be veganified. Some functions look pretty long and seem to have pieces that could be independently usable, and then the vegan style is to allow user access. Further, it seems to introduce new community null models, but these should be made to fit in the vegan nullmodel framework. We do not have only nullmodel here, but also simulate.rda etc use that framework. Resolution: This is not release critical. We can merge the new functions later in the 2.2 releases. That's something for @guiblanchet ...

linestack() doesn't allow user to successfully pass a vector of expressions to argument 'labels'

Because of the implementation in linestack(), whereby we set whatever is passed to labels as the names() of the vector of values plotted (and then proceed to operate on the assumption that these names are set and are valid), a user cannot pass a vector of expressions to use as the labels for the stacked markers.

Well, they can pass such a vector but the conversion that happens in setting these labels as the names attribute results in them being displayed explicitly on the plot, including the plotmath markup, rather than as plotmath-interpreted values.

This will need a minor rewrite of the logic in the function so that it works without relying on setting names and then using those names for plotting.
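An illustrative reproduction (assumes vegan is attached; exact display may vary):

```r
library(vegan)

## expressions intended as plotmath labels...
labs <- expression(alpha[1], beta[2], gamma[3])
## ...are coerced to character when stored via names(), so the plot
## shows the literal strings "alpha[1]" etc. rather than Greek letters
linestack(1:3, labels = labs)
```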

Make `scaling` more user friendly

I am forever forgetting which way round the scaling numbers go for "site"- or "species"-focussed scalings. The numerical concept for scalings is canonical because of Canoco (& related software) but even that has long-since dropped that convention in the main user interface.

Proposal is to allow informative, non-numeric, indicators for the type of scaling within all functions that support a scaling argument and where such a change makes sense.

Thoughts on the initial idea before we speak to specifics of how to implement and what to use for the scaling strings?
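For concreteness, the change might look like this from the user's side (mod is an existing rda() or cca() fit; the string values are placeholders, not a decided API):

```r
## current, Canoco-style numeric codes
scores(mod, scaling = 2)

## proposed, self-describing equivalents (names illustrative only)
scores(mod, scaling = "species")
scores(mod, scaling = "sites")
```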

Places we need to change this are, at a minimum:

  • R/biplot.rda.R
  • R/plot.cca.R
  • R/plot.prc.R
  • R/points.cca.R
  • R/predict.cca.R
  • R/predict.rda.R
  • R/scores.cca.R
  • R/scores.rda.R
  • R/summary.cca.R
  • R/summary.prc.R
  • R/text.cca.R
  • R/tolerance.cca.R

simulate.nullmodel or commsimulator: attrition during simulation?

Hi,

I am trying to simulate bipartite network with known degree sequences using commsimulator (or simulate.nullmodel, whatever works...). It looks like the function actually decreases the number of 1's in the matrix (see code below).

Any help in finding out whether this is a real bug of commsimulator or a mistake of mine will be greatly appreciated.
Cheers,

François

Here is the offending code:

## packages needed by the code below
library(vegan)    # nullmodel(), simulate()
library(lattice)  # histogram()

## function alpha: computes parameter necessary for adjusting the power law degree sequences

alpha<-function(s,beta,c,d){
    locfun<-function(x){
        ((rep(1,s)%*%sapply(sapply(floor(x*(1:s)^(-(1/(beta-1)))),function(z) min(z,d)),function(y) max(y,1)))-c)
    }
    uniroot(f=locfun,interval = c(0,2*c))$root
}

## function powerlawdegreeseq: yields a sequence of non-increasing degrees such that the ensuing distribution follows a power law of parameter beta, following Chung et al. 2003

powerlawdegreeseq<-function(s,beta,c,d){
    a <- alpha(s,beta,c,d)
    sapply(sapply(floor(a*(1:s)^(-(1/(beta-1)))),function(z) min(z,d)),function(y) max(y,1))
}

## function generateZ: takes two power law degree sequences and make it a random bipartite network

generateZ<-function(sa,sp,betaa,betap,c){
    delta <- powerlawdegreeseq(sa,betaa,c,sp)
    d <- powerlawdegreeseq(sp,betap,c,sa)
    z <- r2dtable(1, delta, d)[[1]]
    simulate(nullmodel(z, "quasiswap"))
}

## the problem with commsimulator...
histogram(sapply(rep(40,1000),function(x) sum(generateZ(20,20,2.2,2.2,x))))
## compare with what is obtained from Patefield's algorithm (not a binary matrix)
histogram(sapply(rep(40,1000),function(x) sum(r2dtable(1, powerlawdegreeseq(20,2.2,x,20), powerlawdegreeseq(20,2.2,x,20))[[1]])))

Option to not use the biased estimator of a p-value in permutation tests

I think we need to add an argument or include a test in the permutation codes used in vegan to allow users to not use the biased estimator of a p value in the situation where complete enumeration of all permutations is requested (or initiated through the permute heuristics). In such cases, we have an exact test and thus have the tail probability exactly and don't need to estimate it.

Which would we prefer?

  1. an argument to each function (say biased = TRUE) which allows a user to turn this off at any time if they so wish, or
  2. we add a test in each function which checks to see if complete enumeration is active and if it is we don't use the biased estimator of the p value.

Other suggestions or thoughts?
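The difference between the two estimators is easy to see in base R (toy numbers; under complete enumeration the observed statistic is itself one of the enumerated values, so the +1 correction double-counts it):

```r
stat  <- 2.0                                  # observed statistic
perms <- c(2.0, 2.5, 1.0, 0.5, 3.0, 1.5)      # complete enumeration (toy),
                                              # including the identity permutation
hits <- sum(perms >= stat)                    # 3 values in the tail

## sampling estimator: appropriate when permutations are randomly sampled
p_biased <- (hits + 1) / (length(perms) + 1)  # 4/7, biased for an exact test

## exact tail probability: appropriate under complete enumeration
p_exact  <- hits / length(perms)              # 3/6 = 0.5
```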

Error running [R] vegan function 'simper' analysis on OS Windows 7, R v 2.15.3 in RStudio


Function: simper(comm, group, ...)

Run simper as per the example in its help file (see below for an explanation of the variables):

    (simp <- with(site.cluster, simper(an.t.m, clust)))

Error in matrix(ncol = P, nrow = n.a * n.b) : invalid 'nrow' value (too large or NA)

In my script, the matrix an.t.m is equivalent to comm, and the data frame site.cluster contains the variable clust, equivalent to group, as in the help-file example using dune and dune.env.

Data structure of the full data set:

    > str(an.t.m)
    'data.frame': 16 obs. of 17 variables:
    > str(site.cluster)
    'data.frame': 16 obs. of 18 variables:

E.g. below is a subset of the data frames; some of the columns are omitted for brevity.

In an.t.m, row.names show the sites and column headings show the benthic cover, equivalent to species; in site.cluster, clust shows the cluster, equivalent to group in the function.

Subsetted using: an.t.m[, c("SAND..SND.", "CORAL..C.", "SEAGRASS..SG.", "ALGAL.ASSEMBLAGE..AA.")]

row.names SAND..SND. CORAL..C. SEAGRASS..SG. ALGAL.ASSEMBLAGE..AA.
test_AN_20140613 75.6410256 8.717949 0.00000 6.153846
AN13_s1 0.0000000 57.142857 0.00000 11.904762
AN12_closer_to_shore 74.2021277 0.000000 0.00000 0.000000
AN13 0.0000000 58.974359 0.00000 15.384615
AN14 11.2531969 8.439898 42.45524 0.000000
AN14_to_AN15 37.8640777 3.236246 0.00000 0.000000
AN17_deeper 6.3218391 12.643678 0.00000 0.000000
AN18 37.0748299 35.714286 1.70068 0.000000
AN04_bac 16.8421053 16.842105 0.00000 15.789474
AN04_but 65.8682635 19.760479 0.00000 1.197605
AN04_slo 10.0334448 39.464883 0.00000 16.722408
AN07 3.0386740 96.132597 0.00000 0.000000
AN09_back 0.7936508 29.365079 0.00000 22.486772
AN09_butt 27.2972973 57.297297 0.00000 7.027027
AN09_cres 9.0909091 3.030303 0.00000 63.636364
AN09_slop 6.7073171 30.792683 0.00000 13.109756

site.cluster, subsetted using: site.cluster[, "clust"]

row.names clust
test_AN_20140613 1
AN13_s1 2
AN12_closer_to_shore 1
AN13 2
AN14 3
AN14_to_AN15 3
AN17_deeper 4
AN18 2
AN04_bac 2
AN04_but 1
AN04_slo 2
AN07 5
AN09_back 4
AN09_butt 2
AN09_cres 4
AN09_slop 2

Checking packages dependent on vegan

I ran the following check with vegan fba9b24 with PR #51 merged, and found quite a few problems we have brought along with vegan. Annoyingly, these also concern our own packages (analogue and cocorresp of @gavinsimpson). The output of the test will be pretty long. This message will contain only that output, and a follow-up message will comment on some packages that we broke. Thanks to the R and CRAN team for this excellent and powerful tool! But here is the test in its full extent:

> library(tools)
## I had vegan_2.1-43.tar.gz in /tmp
> uh <- check_packages_in_dir("/tmp", reverse = list(repos = getOption("repos")["CRAN"]), Ncpus=6)
> summary(uh)
Check results for packages in dir '/tmp':

Check status summary:
                  ERROR WARN NOTE OK
  Source packages     0    0    0  1
  Reverse depends     7    2   13 21

Check results summary:
vegan ... OK
rdepends_analogue ... ERROR
* checking whether package 'analogue' can be installed ... ERROR
rdepends_BAT ... OK
rdepends_betaper ... NOTE
* checking dependencies in R code ... NOTE
* checking R code for possible problems ... NOTE
* checking Rd \usage sections ... NOTE
rdepends_bipartite ... WARN
* checking examples ... WARNING
rdepends_biplotbootGUI ... NOTE
* checking dependencies in R code ... NOTE
* checking R code for possible problems ... NOTE
rdepends_blender ... OK
rdepends_cocorresp ... ERROR
* checking R code for possible problems ... NOTE
* checking Rd cross-references ... NOTE
* checking examples ... ERROR
rdepends_CommunityCorrelogram ... OK
rdepends_dave ... NOTE
* checking R code for possible problems ... NOTE
rdepends_Demerelate ... OK
rdepends_FD ... OK
rdepends_forams ... OK
rdepends_FreeSortR ... OK
rdepends_geomorph ... OK
rdepends_GUniFrac ... OK
rdepends_iDynoR ... NOTE
* checking dependencies in R code ... NOTE
* checking R code for possible problems ... NOTE
rdepends_isopam ... NOTE
* checking R code for possible problems ... NOTE
rdepends_lefse ... OK
rdepends_loe ... OK
rdepends_memgene ... NOTE
* checking R code for possible problems ... NOTE
rdepends_metacom ... WARN
* checking examples ... WARNING
rdepends_mpmcorrelogram ... ERROR
* checking dependencies in R code ... NOTE
* checking R code for possible problems ... NOTE
* checking examples ... ERROR
rdepends_MVPARTwrap ... OK
rdepends_palaeoSig ... NOTE
* checking package dependencies ... NOTE
* checking dependencies in R code ... NOTE
rdepends_paleoMAS ... NOTE
* checking dependencies in R code ... NOTE
* checking data for non-ASCII characters ... NOTE
rdepends_PCPS ... OK
rdepends_picante ... NOTE
* checking R code for possible problems ... NOTE
rdepends_PopGenReport ... OK
rdepends_poppr ... NOTE
* checking R code for possible problems ... NOTE
rdepends_RAM ... ERROR
* checking package dependencies ... ERROR
rdepends_rareNMtests ... OK
rdepends_recluster ... OK
rdepends_rich ... NOTE
* checking dependencies in R code ... NOTE
* checking R code for possible problems ... NOTE
rdepends_rioja ... ERROR
* checking package dependencies ... ERROR
rdepends_sharpshootR ... ERROR
* checking package dependencies ... ERROR
rdepends_simba ... OK
rdepends_soundecology ... ERROR
* checking package dependencies ... ERROR
rdepends_SYNCSA ... OK
rdepends_taxize ... OK
rdepends_tmap ... OK
rdepends_tspmeta ... OK
rdepends_vegclust ... NOTE
* checking dependencies in R code ... NOTE
* checking R code for possible problems ... NOTE
rdepends_xergm ... NOTE
* checking R code for possible problems ... NOTE

anova.cca doesn't pass relevant arguments on to permutest.cca in the basic overall test

anova.cca() calls permutest.cca() in the basic overall test mode as follows:

    ## basic overall test
    tst <- permutest.cca(object, permutations = permutations, ...)

This does not pass relevant arguments on to permutest.cca(), especially parallel. This is because ... doesn't contain those arguments, as they are listed after ... in the definition of anova.cca. We need to pass the relevant arguments to permutest.cca if we want to have parallel processing in the general case. Likewise, the current code doesn't allow passing model to permutest.cca, nor strata.

The above call should be changed to

    ## basic overall test
    tst <- permutest.cca(object, permutations = permutations, parallel = parallel,
                         model = model, strata = strata, ...)

Export `ordiArrowMul()`

Would there be any objection to exporting the currently internal function ordiArrowMul()? ggvegan makes use of this to rescale arrows in (tri|bi)plots of constrained ordinations. I'm currently accessing this as vegan:::ordiArrowMul but R CMD check complains about this usage as a Note.

If it's not suitable for export in its current form, what would be required before we'd be happy exporting it?

anova.cca of partial model fails with strata

The following fails:

> m <- cca(dune ~ Moisture + Condition(Management), dune.env)
> anova(m, strata=dune.env$Management)
Error in getPermuteMatrix(permutations, N, strata = strata) : 
  'strata' can be used only with simple permutation or with 'how()'

It seems that getPermuteMatrix is called twice: first in the main anova.cca, which then passes the generated permutation matrix to permutest.cca together with the strata argument, and this triggers the error in the test:

    if (!missing(strata) && !is.null(strata)) {
        if (!inherits(perm, "how")) 
            stop("'strata' can be used only with simple permutation or with 'how()'")
    }

pass permutation matrix to anova.cca / hooking permute into vegan

Currently, permutest.cca accepts it if the user passes a permutation matrix.

If a user wants to use a restricted permutation scheme, they can construct the permutation matrix with the permute package and pass this to permutest.cca.

However, this does not work with anova.cca: anova.cca tries to be efficient with permutations and does not pass a given permutation matrix on.

A quick fix would be to bypass the while-loop if the permutations argument is a matrix of the correct dimensions. I haven't checked how this could be done in the anova.ccaby* functions.
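A minimal sketch of the proposed bypass (names such as nobs and permat are hypothetical; the toy matrix stands in for a user-supplied one):

```r
nobs <- 24
permutations <- t(replicate(99, sample(nobs)))  # 99 x 24 user-supplied matrix

if (is.matrix(permutations) && ncol(permutations) == nobs) {
    permat <- permutations  # use the matrix as given; skip the while-loop
}
## else: fall through to the existing permutation-generating code
dim(permat)
```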

Oecosimu/bipartite C.score

Hi! I am quite new to bioinformatics and I have difficulty understanding the meaning of the output of oecosimu. What do the values of "statistics", the different percentages and the "Pr (sim)" mean? I was told that if Pr(sim) = 0.014, that means 1.4% is explained by randomization, but I am not totally sure.

And about the C.score: I read that the size of the data set will influence the value of the C score, with larger data meaning a lower C score. A paper found that a lot of randomization models rejected the null hypothesis with the non-corrected C score, but after correction the null hypothesis was accepted. I am studying microbial communities, so I was wondering if there are correction patterns, as I have a large dataset (1094 OTUs for 20 samples).

Thank you!

bug in rda()

Hello,

I have recently had problems running the rda function (vegan version 2.0-7), even with datasets/scripts that worked in earlier versions. All attempts to use rda result in the error message

"Warning message:
In is.na(x$error.rate[1]) :
  is.na() applied to non-(list or vector) of type 'NULL'"

even though the same datasets compute easily with cca() or princomp() functions.

see example below

thanks,
Lukas

> data
                 F2B2   Gposneg  cy17prec
ER 1 2 L   0.90895926 0.3577752 0.3430286
ER 2 1 L   1.11370941 0.3557982 0.3400728
ER 3 1 L   0.95430094 0.3850514 0.3464962
GC NBW 2 L 1.08125909 0.3640970 0.5118545
GC NBE 1 L 0.49672687 0.4687129 0.4660554
GC O'R 1 L 0.71866998 0.4029527 0.5032395
ER 1 2 F   0.25949388 0.4377208 0.2837508
ER 2 1 F   0.31283615 0.3919091 0.2901602
ER 3 1 F   0.27872644 0.3930680 0.3054990
GC NBW 2 F 0.30475400 0.4553372 0.3621558
GC NBE 1 F 0.18854520 0.4689930 0.3740901
GC O'R 1 F 0.31850585 0.4845143 0.4645049
ER 1 2 H   0.12427040 0.3673476 0.4045438
ER 2 1 H   0.20619640 0.4336769 0.2231765
ER 3 1 H   0.10008428 0.3694892 0.3556774
GC NBW 2 H 0.17673029 0.3870191 0.3308577
GC NBE 1 H 0.12834551 0.4877205 0.3697912
GC O'R 1 H 0.18291568 0.4356605 0.4215446
ER 1 2 B   0.03497051 0.2322700 0.7843062
ER 2 1 B   0.02904011 0.3686987 0.7139940
ER 3 1 B   0.03024040 0.3768345 0.6292142
GC NBW 2 B 0.01942259 0.2897734 0.7365536
GC NBE 1 B 0.02849548 0.2792693 0.7661750
GC O'R 1 B 0.02954817 0.3244115 0.4464901

> princomp(data)
Call:
princomp(x = data)
Standard deviations:
    Comp.1     Comp.2     Comp.3 
0.34953160 0.15425761 0.04774404 
 3  variables and  24 observations.

> rda(data)
Call: 
rda(X = data)
Regularization parameters: 
NULL
Prior probabilities of groups: 
NULL
Misclassification rate: 
       apparent:  %
Warning message:
In is.na(x$error.rate[1]) :
  is.na() applied to non-(list or vector) of type 'NULL'

> cca(data)
Call: cca(X = data)
              Inertia Rank
Total          0.2447     
Unconstrained  0.2447    2
Inertia is mean squared contingency coefficient 
Eigenvalues for unconstrained axes:
    CA1     CA2 
0.20872 0.03603 

> R.version
platform       x86_64-pc-linux-gnu         
arch           x86_64                      
os             linux-gnu                   
system         x86_64, linux-gnu           
status                                     
major          3                           
minor          0.2                         
year           2013                        
month          09                          
day            25                          
svn rev        63987                       
language       R                           
version.string R version 3.0.2 (2013-09-25)
nickname       Frisbee Sailing        

Scoping issue in `anova.ccabymargin`

Not sure if we can do much about this now, but I'm recording it here in case we update something in the future.

The following example illustrates a scoping bug/infelicity that, ideally, we'd be able to handle in vegan

library("vegan")
data(dune, dune.env)
foo <- function(x, env) {
    m <- rda(x ~ Manure + A1, data = env)
    anova(m, by = "margin")
}
out <- lapply(dune, foo, env = dune.env)

which fails with:

> out <- lapply(dune, foo, env = dune.env)
Error in eval(expr, envir, enclos) : object 'x' not found

the traceback() is:

> traceback()
18: eval(expr, envir, enclos)
17: eval(expr, p)
16: eval.parent(specdata, n = envdepth)
15: ordiParseFormula(formula, data, na.action = na.action, subset = substitute(subset))
14: rda.formula(formula = x ~ A1, data = env)
13: rda(formula = x ~ A1, data = env)
12: eval(expr, envir, enclos)
11: eval(call, parent.frame())
10: update.default(object, paste(".~.-", nm))
9: update(object, paste(".~.-", nm))
8: permutest(update(object, paste(".~.-", nm)), permutations, ...)
7: FUN(c("Manure", "A1")[[1L]], ...)
6: lapply(trmlab, function(nm, ...) permutest(update(object, paste(".~.-", 
       nm)), permutations, ...), ...)
5: anova.ccabymargin(object, permutations = permutations, model = model, 
       parallel = parallel, scope = scope)
4: anova.cca(m, by = "margin")
3: anova(m, by = "margin") at #3
2: FUN(X[[1L]], ...)
1: lapply(dune, foo, env = dune.env)

The obvious workaround is

outs <- vector(mode = "list", length = NCOL(dune))
for (i in seq_len(NCOL(dune))) {
    resp <- dune[, i, drop = FALSE]
    m <- rda(resp ~ Manure + A1, data = dune.env)
    outs[[i]] <- anova(m, by = "margin")
}

or, retaining lapply():

bar <- function(i, env) {
    m <- rda(dune[,i] ~ Manure + A1, data = env)
    anova(m, by = "margin")
}
out <- lapply(seq_len(NCOL(dune)), bar, env = dune.env)

`scores` method returns only two site-axes by default, should return all

The documentation for the scores method choices argument states that, "If missing, default method returns all axes." However, recently, I've found that it is actually defaulting to only the first two axes. I would prefer the stated default because otherwise I will need to add an additional line accessing some aspect of the cca object in order to determine (at least) the number of axes I should request explicitly in a choices argument. This is somewhat antithetical to (one of) the goal(s) of scores, which is to abstract the action of accessing the axes values of an ordination result from the internal structure of various ordination objects.

Reproducible example: Perform correspondence analysis on dune

library(vegan)
data(dune)
dune.cca <- cca(dune)

Attempt to get all site scores using default

scores(dune.cca, display="sites")

But this just returns a 2-column matrix.

One undesirable work-around that I can use, for now, is to dig into the CA$eig element and ask how long that vector is:

scores(dune.cca, choices=1:length(dune.cca$CA$eig), display="sites")

However, this won't work if the object is the result of a constrained correspondence analysis, because in that case there should be CCA axes as well, and so the hack would need to be extended. Is there an obvious replacement / alternative? Is this a bug, and the current documentation is describing the expected (but missing) behavior?
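One workaround that also covers constrained ordinations is to count both the unconstrained and constrained eigenvalues; length(NULL) is 0, so it degrades gracefully for plain CA (assumes vegan is attached):

```r
library(vegan)
data(dune)
dune.cca <- cca(dune)

## total number of axes across CCA (constrained) and CA (unconstrained)
nax <- length(dune.cca$CCA$eig) + length(dune.cca$CA$eig)
scores(dune.cca, choices = seq_len(nax), display = "sites")
```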

Bug in p-value calculations in adonis

The p-value calculated by adonis is sometimes wrong.
For example, the following code

library("vegan")
options(digits = 12)
g = as.factor(c("Control", "Control", "Group1", "Group2", "Group2"))
d = as.dist(matrix(c(0,3,7,2,1, 3,0,5,4,1, 7,5,0,2,6, 4,2,4,0,2, 1,1,6,2,0), ncol = 5))
pp = adonis(d ~ g)
pp

returns


Call:
adonis(formula = d ~ g) 

Permutation: free
Number of permutations: 120

Terms added sequentially (first to last)

          Df SumsOfSqs MeanSqs     F.Model           R2  Pr(>F)
g          2      23.3   11.65 3.584615385 0.7818791946 0.20833
Residuals  2       6.5    3.25             0.2181208054        
Total      4      29.8                     1.0000000000  

The p-value is wrong (only 24 permutations with high f instead of 39). The correct p-value can be computed from the definition to be

(rowSums(t(pp$f.perms >= pp$aov.tab$F.Model[1]))+1) / (length(pp$f.perms) + 1)

which correctly returns

[1] 0.333333333333

It looks like the bug is due to rounding. When comparing f.perms and F.mod, the value of f.perms is rounded but the value of F.mod is not (although it is rounded afterwards). This makes the >= comparison fail in this example because the value of f.perms is rounded down.
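The effect is easy to see with toy numbers in base R: rounding only one side of the >= comparison turns a tie into a miss:

```r
f_obs  <- 3.584615385        # observed F statistic, unrounded
f_perm <- 3.584615385        # a permutation reproducing the observed value

f_perm >= f_obs              # TRUE: the tie belongs in the tail
round(f_perm, 4) >= f_obs    # FALSE: rounding one side loses the tie
```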

diversity accumulation models do not use standard permutation API

We claim in latest NEWS that all permutation based functions use a unified API based on the permute package or alternatively take a permutation matrix. This is not true of the following functions: specaccum, poolaccum, estaccumR, renyiaccum and tsallisaccum. They all accept only a single integer which is used as the number of permutations in simple non-restricted permutation with sample(). This is documented, but still inconsistent, and should be changed. The question (apart from time to do this) is whether there ever is a need for restricted permutation in these models, or should we just change the argument name and use simple permutations.

Another aspect is that these functions can potentially benefit from parallel processing. I have only looked at poolaccum which is lightning fast even now, and estaccumR which is really slow and could be parallelized (and probably also sped up within estimateR.default).

Problem with a conditional test in rarecurve()

This showed up whilst checking the current github master branch of vegan. I don't have time to look at this just now, but we're seeing a lot of warnings like this

Warning in if (sample > minsample) warning(gettextf("sample = %d is larger than smallest site maximum (%d)",  :
  the condition has length > 1 and only the first element will be used

indicating that sample or minsample (probably the former given the object names) is of length > 1. The full Example check log output for this example is shown below

> ### Name: diversity
> ### Title: Ecological Diversity Indices and Rarefaction Species Richness
> ### Aliases: diversity rarefy rrarefy drarefy rarecurve fisher.alpha
> ###   specnumber
> ### Keywords: univar
> 
> ### ** Examples
> 
> data(BCI)
> H <- diversity(BCI)
> simp <- diversity(BCI, "simpson")
> invsimp <- diversity(BCI, "inv")
> ## Unbiased Simpson of Hurlbert 1971 (eq. 5):
> unbias.simp <- rarefy(BCI, 2) - 1
> ## Fisher alpha
> alpha <- fisher.alpha(BCI)
> ## Plot all
> pairs(cbind(H, simp, invsimp, unbias.simp, alpha), pch="+", col="blue")
> ## Species richness (S) and Pielou's evenness (J):
> S <- specnumber(BCI) ## rowSums(BCI > 0) does the same...
> J <- H/log(S)
> ## beta diversity defined as gamma/alpha - 1:
> data(dune)
> data(dune.env)
> alpha <- with(dune.env, tapply(specnumber(dune), Management, mean))
> gamma <- with(dune.env, specnumber(dune, Management))
> gamma/alpha - 1
       BF        HF        NM        SF 
0.5483871 0.6666667 1.6250000 1.2909091 
> ## Rarefaction
> (raremax <- min(rowSums(BCI)))
[1] 340
> Srare <- rarefy(BCI, raremax)
> plot(S, Srare, xlab = "Observed No. of Species", ylab = "Rarefied No. of Species")
> abline(0, 1)
> rarecurve(BCI, step = 20, sample = raremax, col = "blue", cex = 0.6)
Warning in if (sample > minsample) warning(gettextf("sample = %d is larger than smallest site maximum (%d)",  :
  the condition has length > 1 and only the first element will be used
[the same warning is repeated 50 times in all, once for each of the 50 BCI sites]

suggests/imports contradictions, and tcltk in particular

Vegan currently suggests two packages in its DESCRIPTION: tcltk and parallel. At the same time, vegan imports their namespaces in its NAMESPACE file. The idea of a suggested package is that it is not essential to vegan, but is only needed by particular functions and loaded (by require()) on demand. However, import() loads the package namespace when vegan itself is loaded, so loading vegan will fail if the package is not available -- which defeats the purpose of merely suggesting a package that is not necessarily needed.
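The contradiction is visible by comparing the two files; a schematic excerpt (fields abbreviated, not the complete files):

```
# DESCRIPTION says the packages are optional ...
Suggests: tcltk, parallel

# ... but NAMESPACE loads both namespaces unconditionally at load time
import(tcltk)
import(parallel)
```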

We only suggest and import recommended and base packages, and we have assumed this can be done safely. However, because all namespace imports are required at load time, we should list those packages as imports rather than suggests in DESCRIPTION. We discussed tcltk specifically when we split off vegan3d, and since tcltk is a base package we assumed it was safe to keep in vegan. However, tcltk depends on external system libraries (Tcl/Tk) that may be absent, and Tcl/Tk support is a configurable option in R. If R has been compiled without tcltk capability, installing current vegan will fail:

** preparing package for lazy loading
Warning: S3 methods ‘as.character.tclObj’, ‘as.character.tclVar’, ‘as.double.tclObj’, 
‘as.integer.tclObj’, ‘as.logical.tclObj’, ‘as.raw.tclObj’, ‘print.tclObj’, ‘[[.tclArray’, ‘[[<-.tclArray’, 
‘$.tclArray’, ‘$<-.tclArray’, ‘names.tclArray’, ‘names<-.tclArray’, ‘length.tclArray’, ‘length<-.tclArray’, 
‘tclObj.tclVar’, ‘tclObj<-.tclVar’, ‘tclvalue.default’, ‘tclvalue.tclObj’, ‘tclvalue.tclVar’, ‘tclvalue<-.default’, 
‘tclvalue<-.tclVar’, ‘close.tkProgressBar’ were declared in NAMESPACE but not found
Error : .onLoad failed in loadNamespace() for 'tcltk', details:
  call: fun(libname, pkgname)
  error: Tcl/Tk support is not available on this system
ERROR: lazy loading failed for package ‘vegan’
* removing ‘/home/jarioksa/R/x86_64-unknown-linux-gnu-library/3.2/vegan’
* restoring previous ‘/home/jarioksa/R/x86_64-unknown-linux-gnu-library/3.2/vegan’

If you manage to get through the installation process, you will get the same error from library(vegan). This should not happen with a package that is merely suggested.

I will submit PR #85 to fix this issue. It removes import(tcltk) from the NAMESPACE and prefixes every Tcl/Tk call with tcltk::. I avoided this when making the changes for the new namespace rules because there are 126 tcltk calls, and changing them all was tedious and took several iterations to catch every case.

parallel is the other suggested package. It is a recommended package, and a lean R can be built without recommended packages. I don't expect such a lean R to host vegan, so we can fairly safely assume that any R running vegan has the recommended packages. Still, there is a contradiction between DESCRIPTION (which suggests parallel) and NAMESPACE (which unconditionally imports it), and this should be resolved. Either we

  1. change DESCRIPTION to import parallel and remove require(parallel) from the eleven functions that call it (the namespace will be loaded anyway), or
  2. keep parallel in suggests, but remove import(parallel) from NAMESPACE and prefix every parallel call with parallel:: in the functions.
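Inside a function body, option 2 would look roughly like this, with requireNamespace() guarding the prefixed calls (a sketch of the pattern, not vegan's actual code; psum is a hypothetical name):

```r
## Option 2 pattern: 'parallel' stays in Suggests, NAMESPACE does not
## import it, and every call is guarded and prefixed with parallel::
psum <- function(x) {
    if (requireNamespace("parallel", quietly = TRUE)) {
        cl <- parallel::makeCluster(2)
        on.exit(parallel::stopCluster(cl))       # always release workers
        unlist(parallel::parLapply(cl, x, function(i) i^2))
    } else {
        vapply(x, function(i) i^2, numeric(1))   # serial fallback
    }
}
psum(1:4)
```

The serial fallback is what makes the package genuinely optional: vegan would still work, just without parallel processing.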

Currently we import selectively from cluster, mgcv and MASS, which means these packages must be available to install and load vegan. All three are recommended packages, but they could be merely suggested. cluster::daisy() would be the easiest case, used in three files. We use mgcv only in ordisurf, but there the prefixing, mgcv::gam(y ~ mgcv::s(...)), looks ugly. MASS usage is more dispersed, but it too could be replaced with MASS::-prefixed calls.
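The MASS case would follow the same guard-and-prefix pattern as above; for instance (a sketch with arbitrary example data, not vegan's actual code):

```r
## Calling a recommended package via :: instead of importing its
## namespace, so MASS is not needed when vegan itself is loaded
if (requireNamespace("MASS", quietly = TRUE)) {
    set.seed(1)
    d <- dist(matrix(runif(40), nrow = 8))       # distances among 8 points
    fit <- MASS::isoMDS(d, k = 2, trace = FALSE) # non-metric MDS
    dim(fit$points)                              # 8 points in 2 dimensions
}
```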

function bioenvdist() seems to be missing

I ran into this problem when trying to extract some information from a bioenv() object.
Reproduced here with code copied from the manual PDF.

require(vegan)
data(varespec)
data(varechem)
sol <- bioenv(wisconsin(varespec) ~ log(N) + P + K + Ca + pH + Al, varechem)

sol

bioenvdist(sol)

Error: could not find function "bioenvdist"
