Git Product home page Git Product logo

grunwaldlab / poppr Goto Github PK

View Code? Open in Web Editor NEW
67.0 13.0 26.0 160.7 MB

đŸŒ¶ An R package for genetic analysis of populations with mixed (clonal/sexual) reproduction

Home Page: https://grunwaldlab.github.io/poppr

Makefile 0.13% R 79.81% C 16.22% Shell 0.09% TeX 3.72% CSS 0.03%
r multilocus-genotypes genetic-distances population-genetics multilocus-lineages genetic-analysis populations minimum-spanning-networks clonality

poppr's Introduction

Poppr version 2

R-CMD-check

What is poppr?

Poppr is an R package designed for analysis of populations with mixed modes of sexual and clonal reproduction. It is built around the framework of adegenet's genind and genlight objects and offers the following implementations:

  • clone censoring of populations at any of multiple levels of a hierarchy
  • convenient counting of multilocus genotypes and sub-setting of populations with multiple levels of hierarchy
  • define multilocus genotypes
  • calculation of indices of genotypic diversity, evenness, richness, and rarefaction
  • drawing of dendrograms with bootstrap support for genetic distances
  • drawing of minimum spanning networks for genetic distances
  • calculation of the index of association (Index of association) or (Standardized index of association)
  • batch processing on any server that has R (≄ 2.15.1) installed
  • calculation of Bruvo's distance for microsatellite (SSR) markers (implemented in C for speed)
  • import of data from and export to GenAlEx

New in version 2.0:

  • handling of genomic SNP data
  • custom multilocus genotype definitions
  • collapse multilocus lineages by genetic distance
  • calculate reticulate minimum spanning networks
  • calculate index of association in a sliding window across snps
  • bootstrapping of MLG diversity statistics
  • interactive exploration of minimum spanning networks
  • and more!

For full details, see the NEWS file or type in your R console:

news(Version >= "2.0.0", package = "poppr")

Citation

If you use poppr at all, please specify the version and cite:

Kamvar ZN, Tabima JF, GrĂŒnwald NJ. (2014) Poppr: an R package for genetic analysis of populations with clonal, partially clonal, and/or sexual reproduction. PeerJ 2:e281 https://doi.org/10.7717/peerj.281

If you use poppr in a presentation please mention it as the poppr R package, specify the version, and use our logo: (png) | (svg).

Additionally, if you use any following functionalities:

  • minimum spanning networks with reticulation
  • collapsing multilocus genotypes into multilocus lineages with mlg.filter()
  • custom multilocus genotype definitions with mlg.custom()
  • index of association for genomic data with win.ia() or samp.ia()
  • bootstrapping any genetic distance with genind, genlight, or genpop objects with aboot()

Please also cite:

Kamvar ZN, Brooks JC and GrĂŒnwald NJ (2015) Novel R tools for analysis of genome-wide population genetic data with emphasis on clonality. Front. Genet. 6:208. doi: 10.3389/fgene.2015.00208

You can obtain citation information in R by typing:

citation(package = "poppr")

Installation

From CRAN

Binary versions for mac and windows are available for R ≄ 2.15.1 here.

To install, make sure R is at least version 2.15.1 (the authors recommend ≄ 3.0), and in your console, type:

install.packages("poppr")

If you want the absolute latest version of poppr, see about installing unreleased versions below.


Stable version

New features are occasionally added to {poppr}, but it can take time for it to get to CRAN. If you know that you want the latest version of {poppr}, (which will contain bug fixes and new features to be included in future releases), then you can use the custom R-Universe repository, which is updated hourly: https://zkamvar.r-universe.dev/builds

To install poppr from the R-Universe, you can use the following code:

universe <- c("https://zkamvar.r-universe.dev", "https://cloud.r-project.org")
install.packages("poppr", repos = universe)

The universe repository also contain up-to-date versions of {adegenet} and {hierfstat}, which are commonly used in conjunction with {poppr} and are notoriously out of date on CRAN.

Unstable/Development versions

All Development versions of {poppr} will be on GitHub, but need to be compiled.

To install this package from github, make sure you have the following:

For Linux users, make sure that the function getOption("unzip") returns "unzip" or "internal". If it does not, then run options(unzip = "internal").

Once you have {remotes} and a C compiler installed, you can use the install_github() function to install the current version from github.

All new features in testing will be released on different branches. These features will be in various stages of development and may or may not be documented. Install with caution. The below command would install features on the branch called "devel". Note that these branches might be out of date from the main branch. Note: if you don't have LaTeX installed, you should set build_vignettes = FALSE.

remotes::install_github(repo = "grunwaldlab/poppr@devel", build_vignettes = TRUE)
library("poppr")

Help / Documentation

R documentation

To access a descriptive index of help files in poppr, type in your console:

package?poppr

Vignettes

A few vignettes have been written for poppr:

Title Command
Algorightms and Equations vignette("algo", "poppr")
Data import and manipulation vignette("poppr_manual", "poppr")
Multilocus Genotype Analysis vignette("mlg", "poppr")

User Group

Users who have any questions/comments/suggestions regarding any version of poppr (stable or development) should direct their comments to the Poppr google group

Book/Primer

In Spring of 2014, Dr. Niklaus J. GrĂŒnwald, Dr. Sydney E. Everhart and Zhian N. Kamvar wrote a primer for population genetic analysis in R located at https://grunwaldlab.github.io/Population_Genetics_in_R.

Contributing

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms. If you wish to contribute code to poppr, please fork the repository and create a pull request with your added feature.

poppr's People

Contributors

brookjon avatar davefol avatar fdchevalier avatar grunwald avatar jimhester avatar jonahbrooks avatar kant avatar katrinleinweber avatar klausvigo avatar knausb avatar robchoudhury avatar tabima avatar zachary-foster avatar zkamvar avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

poppr's Issues

[.table method in R-devel breaks poppr.amova

This is an issue that will not affect most of the user base at the moment, but will do so when R releases a new version:

From Kurt Hornik:

Dear maintainer,

Current r-devel has added a [.table method which by default preserves
the table class in "most" cases.

This breaks your package with

amova.result <- poppr.amova(agc, ~Pop/Subpop)

No missing values detected.
Warning in Ops.factor(left, right) : ‘<’ not meaningful for factors
Warning in Ops.factor(left, right) : ‘<’ not meaningful for factors
Error in if (any(samples < 0)) stop("Negative value in samples") : 
 missing value where TRUE/FALSE needed
Calls: poppr.amova -> <Anonymous>

I think the problem is that in mlg.table() you have

 mlgtab <- mlgtab[, which(colSums(mlgtab) > 0)]
 return(mlgtab)

where subscripting used to drop the table class, but no longer does, and
so

 xtab    <- t(mlg.table(x, bar = FALSE, quiet = TRUE, mlgsub = unique(mlg.vector(x))))
 xtab    <- as.data.frame(xtab)

ends up calling as.data.frame.table().

To fix, use e.g.

 xtab    <- as.data.frame(unclass(xtab))

I think.

Can you pls fix this as necessary, and upload a new version as quickly
as possible?

Alongside, pls also fix

The Title field should be in title case, current version then in title case:
‘Genetic Analysis of Populations With Mixed Reproduction’
‘Genetic Analysis of Populations with Mixed Reproduction’

and single quote 'adegenet' in the Description.

Best
-k

Add 2.0-rc tag after merge

This will allow compatibility for the link in the frontiers paper.

  • merge 2.0-rc into master and devel
  • remove 2.0-rc branch
  • change README and NEWS
  • tag as 2.0-rc so that link in frontiers paper still works and make pre-release.
  • manually change version and submit to CRAN
  • tag version once it's accepted on CRAN.

Pairwise AMOVA

GenAlex generates pairwise PhiPT values for any AMOVA analysis run with more than one between group comparison. We have many between group comparisons to make. Is there an easy way using Poppr to generate something similar to what GenAlex produces? Please add pairwise_amova

Thanks!
S

Include tests to be skipped on CRAN

The package testthat has a function called skip_on_cran() that will skip tests on CRAN to save run time. This will allow the construction of a bare set of tests to make sure the package works and an expanded set to increase code coverage to test for caveats.

Add names to colors in *.msn functions

Currently the colors of the pies will be represented by unnamed character vectors of hexadecimal colors. This vector of colors is used as a lookup table when the color palette needs to be changed, but the assumption is that all colors are different. In the case that not all colors are different, the names of the populations are important at differentiating the colors. If the vector resulting from the color palette is named with the population names, then the colors in the pie.colors will be named as well resulting in an easier lookup.

example here can add the additional line:

names(color) <- [email protected]

This should be changed for the functions bruvo.msn and poppr.msn and the functions below update_poppr_graph should also be updated.

Explicitly document model choice options in Bruvo's distance

Bruvo's distance currently has the options to turn on and off the genome addition and genome loss models, but does not explicitly state what happens when both are on and both are off. This should be documented in all bruvo functions as well as the Algorithms and Equations vignette.

Implement index selection (filtering) in bitwise.IA

bitwise.IA accepts as input a vector of locus indices to be used in its calculations. Currently this vector is ignored. The final version should be able to compute the index of association over this limited set of loci.

Add support for custom MLG definitions

Create a slot that allows users to customize MLG definitions such that they would be able to increase or decrease MLG diversity based on their knowledge of the system.

Allow text connections for read.genalex

read.genalex currently reads in a file from disc formatted as a genalex csv. Unfortunately, it does not work well on connection objects. To alleviate this, the argument skip = 2 on this line in read.genalex should be changed to skip = ifelse("connection" %in% class(genalex), 0, 2), which would allow connections to be read in thusly:

library(poppr)
data(partial_clone)
pc <- file()
genind2genalex(partial_clone, filename = pc)
partial_clone2 <- read.genalex(pc)
close(pc)

This would make it easier to create reproducible examples.

Single locus genalex files cannot be imported

This post in the poppr forum showed that a single locus data set could not be imported. Here's a reproducible example.

# Note, this is a subset of the monpop data set, which is haploid. I'm treating it as a diploid here
zz <- "1    6   1   6
7_09_BB         
Ind Pop CHMFc4  CHMFc5
A004    7_09_BB 224 85
A002    7_09_BB 224 97
A011    7_09_BB 224 97
A009    7_09_BB 224 97
A006    7_09_BB 224 97
A013    7_09_BB 224 97"

read.genalex(textConnection(z), sep = "\t")
 Error in gena2[, ((x - 1)/ploidy) + 1] <<- pop_combiner(gena, hier = x:(x +  : 
  incorrect number of subscripts on matrix 

The reason for this error was because of our old friend drop = TRUE being default for subsetting matrices. I am scouring the code and setting drop = FALSE everywhere.

Bug in permutation scheme with triploids

There is a bug in the default shuffling algorithm for shufflepop and ia that comes down to numerical precision. The general process is:

  1. Take the sum of the columns (alleles) at a locus and multiply them by the ploidy to get the counts of each allele.
  2. Create a vector of indices of alleles where each index is replicated the number of times that allele is represented.
  3. Use that to shuffle the locus.

The problem occurs when there are triploids and 1/3 by 3 doesn't always equal 1.

mat <- structure(c(0.333333333333333, 0.333333333333333, 0.333333333333333, 
0.333333333333333, 0, 0, 0.333333333333333, 0, 0.666666666666667, 
0.666666666666667, 0.333333333333333, 0.666666666666667), .Dim = c(4L, 
3L), .Dimnames = list(structure(c("1", "2", "3", "5"), .Names = c("1", 
"2", "3", "4")), c("D13.000", "D13.144", "D13.190")))

bucket <- colSums(mat) * 3
bucket
## D13.000 D13.144 D13.190
##       4       1       7

rep(1:length(bucket), bucket)
## [1]11112333333

print(bucket, dig = 20)
## D13.000 D13.144 D13.190
## 4.0000000000000000000 1.0000000000000000000 6.9999999999999991118

A control scheme to check for the correct number of alleles is needed.

Allow named vector of repeat lengths for Bruvo's distance

Bruvo's distance requires knowledge of repeat lengths for calculations. The current method has the user supply a numeric vector with repeat lengths, but they must be in the exact order as the loci. A nice feature to have would be to have the option to submit named repeat lengths and match them to the loci.

add function to recalcuate MLGs

If the user has a genclone object, the MLGs are static from when the data is read in to allow for identification downstream. There is no obvious way to reset those MLGs currently. It would be good to provide a small function that would do this.

add backwards compatibility for hierarchy methods in version 2.0

Since the hierarchy methods were moved to adegenet 2.0, all of them have changed names slightly to account for the preferred style (camelCase in adegenet). To make the transition smoother, the hierarchy functions from poppr should be kept, but should only serve as wrappers to the methods in adegenet (note that both the assignment and functional methods need to have replacers):

poppr version adegenet version
sethierarchy strata
gethierarchy strata
splithierarchy splitStrata
addhierarchy addStrata
namehierarchy nameStrata

add skip_on_cran() to test-filter.R

I got a lovely email from BDR today:

On Jul 18, 2015, at 03:45 , Prof Brian Ripley [email protected] wrote:

See https://cran.r-project.org/web/checks/check_results_poppr.html . There are several separate errors, but for fedora-gcc it seems you are testing floating-point equality which the manual warns you not to do.

Please fix ASAP.

Brian D. Ripley, [email protected]
Emeritus Professor of Applied Statistics, University of Oxford
1 South Parks Road, Oxford OX1 3TG, UK

Because I'm already testing on coveralls, I think I can safely add skip_on_cran() to that particular test. Since I will also be uploading the version that fixes #47, I will increase the version number hopefully the last time.

Allow poppr.amova to take subsetted genclone objects

Currently poppr.amova partitions the mlg table by numerical index at this line. In the genclone objects, unfortunately, this index doesn't work as some MLGs might be excluded. The error produced is this:

Error in xtab[unique(mlg.vector(x)), ] : subscript out of bounds

To get around this, mlg.table should be called with the flag mlgsub = unique(mlg.vector(x, quiet = TRUE)), where x is a genclone object. This will automatically sort the MLG table to prevent this line from being needed.

poppr breaking with new adegenet on CRAN

CRAN's maintainer has reported the following issue caused by the new version of adegenet on CRAN:

** preparing package for lazy loading
Error in setIs(Class, class2, classDef = classDef, where = where) :
  class “genclone” cannot extend class “genind”: slot in class “genclone” must extend corresponding slot in class “genind”: fails for hierarchy
Error in setClass("genclone", contains = "genind", representation = representation(mlg = "numeric",  :
  error in contained classes ("genind") for class “genclone”; class definition removed from ‘poppr’
Error : unable to load R code in package ‘poppr’
ERROR: lazy loading failed for package ‘poppr’

User model choice being ignored in bruvo.msn

Current workaround solution: calculate bruvo's distance with bruvo.dist() and then feed it into poppr.msn()

Problem: all calls to bruvo.dist() from bruvo.msn() and poppr:::singlepop_msn() do not give the add or loss parameters, thus defaulting to average over genome loss and genome addition models, despite the user choice.

I will fix these in poppr's current devel branch and then make a pull request for @JonahBrooks so that he can incorporate it into his code.

Memory leak with valgrind

I haven't gotten an email yet, but there's a memtest note on CRAN. It's coming from bitwise.c, which makes me think it might be a threadding issue :(

==21510== Memcheck, a memory error detector
==21510== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==21510== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info
==21510== Command: /data/blackswan/ripley/R/R-devel-vg/bin/exec/R -f test-all.R --restore --save --no-readline --vanilla
==21510== 

R Under development (unstable) (2015-07-03 r68625) -- "Unsuffered Consequences"
Copyright (C) 2015 The R Foundation for Statistical Computing
Platform: x86_64-unknown-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(testthat)
> test_check("poppr")
Loading required package: poppr
Loading required package: adegenet
Loading required package: ade4

   /// adegenet 2.0.0 is loaded ////////////

   > overview: '?adegenet'
   > tutorials/doc/questions: 'adegenetWeb()' 
   > bug reports/feature resquests: adegenetIssues()


This is poppr version 2.0.0. To get started, type package?poppr
==21510== Invalid read of size 4
==21510==    at 0x1BD52E0F: bitwise_distance_haploid._omp_fn.0 (packages/tests-vg/poppr/src/bitwise_distance.c:232)
==21510==    by 0x35D0809BBE: GOMP_parallel (in /usr/lib64/libgomp.so.1.0.0)
==21510==    by 0x1BD50F08: bitwise_distance_haploid (packages/tests-vg/poppr/src/bitwise_distance.c:175)
==21510==    by 0x479A92: do_dotcall (svn/R-devel/src/main/dotcode.c:1251)
==21510==    by 0x4B1C3E: Rf_eval (svn/R-devel/src/main/eval.c:655)
==21510==    by 0x4B4C35: do_set (svn/R-devel/src/main/eval.c:2106)
==21510==    by 0x4B1AB0: Rf_eval (svn/R-devel/src/main/eval.c:627)
==21510==    by 0x4B36C8: do_begin (svn/R-devel/src/main/eval.c:1716)
==21510==    by 0x4B1AB0: Rf_eval (svn/R-devel/src/main/eval.c:627)
==21510==    by 0x4B1AB0: Rf_eval (svn/R-devel/src/main/eval.c:627)
==21510==    by 0x4B36C8: do_begin (svn/R-devel/src/main/eval.c:1716)
==21510==    by 0x4B1AB0: Rf_eval (svn/R-devel/src/main/eval.c:627)
==21510==  Address 0x1ce20844 is 772 bytes inside a block of size 1,968 alloc'd
==21510==    at 0x4A06BCF: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==21510==    by 0x4DE26D: GetNewPage (svn/R-devel/src/main/memory.c:865)
==21510==    by 0x4DFD71: Rf_allocVector3 (svn/R-devel/src/main/memory.c:2437)
==21510==    by 0x541032: Rf_allocVector (svn/R-devel/src/include/Rinlinedfuns.h:194)
==21510==    by 0x541032: HashLookup (svn/R-devel/src/main/unique.c:801)
==21510==    by 0x542C10: match5 (svn/R-devel/src/main/unique.c:902)
==21510==    by 0x4A3DFA: bcEval (svn/R-devel/src/main/eval.c:5524)
==21510==    by 0x4B178F: Rf_eval (svn/R-devel/src/main/eval.c:558)
==21510==    by 0x4B6DBC: Rf_applyClosure (svn/R-devel/src/main/eval.c:1039)
==21510==    by 0x4AA831: bcEval (svn/R-devel/src/main/eval.c:5496)
==21510==    by 0x4B178F: Rf_eval (svn/R-devel/src/main/eval.c:558)
==21510==    by 0x4B6DBC: Rf_applyClosure (svn/R-devel/src/main/eval.c:1039)
==21510==    by 0x4AA831: bcEval (svn/R-devel/src/main/eval.c:5496)
==21510== 
==21510== Thread 2:
==21510== Invalid read of size 4
==21510==    at 0x1BD52EE0: bitwise_distance_haploid._omp_fn.0 (packages/tests-vg/poppr/src/bitwise_distance.c:248)
==21510==    by 0x35D080DB65: ??? (in /usr/lib64/libgomp.so.1.0.0)
==21510==    by 0x35B5607529: start_thread (in /usr/lib64/libpthread-2.20.so)
==21510==    by 0x35B4F0022C: clone (in /usr/lib64/libc-2.20.so)
==21510==  Address 0x1ce20844 is 772 bytes inside a block of size 1,968 alloc'd
==21510==    at 0x4A06BCF: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==21510==    by 0x4DE26D: GetNewPage (svn/R-devel/src/main/memory.c:865)
==21510==    by 0x4DFD71: Rf_allocVector3 (svn/R-devel/src/main/memory.c:2437)
==21510==    by 0x541032: Rf_allocVector (svn/R-devel/src/include/Rinlinedfuns.h:194)
==21510==    by 0x541032: HashLookup (svn/R-devel/src/main/unique.c:801)
==21510==    by 0x542C10: match5 (svn/R-devel/src/main/unique.c:902)
==21510==    by 0x4A3DFA: bcEval (svn/R-devel/src/main/eval.c:5524)
==21510==    by 0x4B178F: Rf_eval (svn/R-devel/src/main/eval.c:558)
==21510==    by 0x4B6DBC: Rf_applyClosure (svn/R-devel/src/main/eval.c:1039)
==21510==    by 0x4AA831: bcEval (svn/R-devel/src/main/eval.c:5496)
==21510==    by 0x4B178F: Rf_eval (svn/R-devel/src/main/eval.c:558)
==21510==    by 0x4B6DBC: Rf_applyClosure (svn/R-devel/src/main/eval.c:1039)
==21510==    by 0x4AA831: bcEval (svn/R-devel/src/main/eval.c:5496)
==21510== 
==21510== Thread 1:
==21510== Invalid read of size 4
==21510==    at 0x1BD5331F: bitwise_distance_diploid._omp_fn.1 (packages/tests-vg/poppr/src/bitwise_distance.c:439)
==21510==    by 0x35D0809BBE: GOMP_parallel (in /usr/lib64/libgomp.so.1.0.0)
==21510==    by 0x1BD5125F: bitwise_distance_diploid (packages/tests-vg/poppr/src/bitwise_distance.c:374)
==21510==    by 0x479A92: do_dotcall (svn/R-devel/src/main/dotcode.c:1251)
==21510==    by 0x4B1C3E: Rf_eval (svn/R-devel/src/main/eval.c:655)
==21510==    by 0x4B4C35: do_set (svn/R-devel/src/main/eval.c:2106)
==21510==    by 0x4B1AB0: Rf_eval (svn/R-devel/src/main/eval.c:627)
==21510==    by 0x4B36C8: do_begin (svn/R-devel/src/main/eval.c:1716)
==21510==    by 0x4B1AB0: Rf_eval (svn/R-devel/src/main/eval.c:627)
==21510==    by 0x4B1AB0: Rf_eval (svn/R-devel/src/main/eval.c:627)
==21510==    by 0x4B36C8: do_begin (svn/R-devel/src/main/eval.c:1716)
==21510==    by 0x4B1AB0: Rf_eval (svn/R-devel/src/main/eval.c:627)
==21510==  Address 0x1d27d254 is 724 bytes inside a block of size 1,968 alloc'd
==21510==    at 0x4A06BCF: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==21510==    by 0x4DE26D: GetNewPage (svn/R-devel/src/main/memory.c:865)
==21510==    by 0x4DFD71: Rf_allocVector3 (svn/R-devel/src/main/memory.c:2437)
==21510==    by 0x446671: Rf_allocVector (svn/R-devel/src/include/Rinlinedfuns.h:194)
==21510==    by 0x446671: do_nzchar (svn/R-devel/src/main/character.c:123)
==21510==    by 0x4A3DFA: bcEval (svn/R-devel/src/main/eval.c:5524)
==21510==    by 0x4B178F: Rf_eval (svn/R-devel/src/main/eval.c:558)
==21510==    by 0x4B6DBC: Rf_applyClosure (svn/R-devel/src/main/eval.c:1039)
==21510==    by 0x4AA831: bcEval (svn/R-devel/src/main/eval.c:5496)
==21510==    by 0x4B178F: Rf_eval (svn/R-devel/src/main/eval.c:558)
==21510==    by 0x4B6DBC: Rf_applyClosure (svn/R-devel/src/main/eval.c:1039)
==21510==    by 0x4AA831: bcEval (svn/R-devel/src/main/eval.c:5496)
==21510==    by 0x4B178F: Rf_eval (svn/R-devel/src/main/eval.c:558)
==21510== 
==21510== Thread 2:
==21510== Invalid read of size 4
==21510==    at 0x1BD533F0: bitwise_distance_diploid._omp_fn.1 (packages/tests-vg/poppr/src/bitwise_distance.c:455)
==21510==    by 0x35D080DB65: ??? (in /usr/lib64/libgomp.so.1.0.0)
==21510==    by 0x35B5607529: start_thread (in /usr/lib64/libpthread-2.20.so)
==21510==    by 0x35B4F0022C: clone (in /usr/lib64/libc-2.20.so)
==21510==  Address 0x1d27d254 is 724 bytes inside a block of size 1,968 alloc'd
==21510==    at 0x4A06BCF: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==21510==    by 0x4DE26D: GetNewPage (svn/R-devel/src/main/memory.c:865)
==21510==    by 0x4DFD71: Rf_allocVector3 (svn/R-devel/src/main/memory.c:2437)
==21510==    by 0x446671: Rf_allocVector (svn/R-devel/src/include/Rinlinedfuns.h:194)
==21510==    by 0x446671: do_nzchar (svn/R-devel/src/main/character.c:123)
==21510==    by 0x4A3DFA: bcEval (svn/R-devel/src/main/eval.c:5524)
==21510==    by 0x4B178F: Rf_eval (svn/R-devel/src/main/eval.c:558)
==21510==    by 0x4B6DBC: Rf_applyClosure (svn/R-devel/src/main/eval.c:1039)
==21510==    by 0x4AA831: bcEval (svn/R-devel/src/main/eval.c:5496)
==21510==    by 0x4B178F: Rf_eval (svn/R-devel/src/main/eval.c:558)
==21510==    by 0x4B6DBC: Rf_applyClosure (svn/R-devel/src/main/eval.c:1039)
==21510==    by 0x4AA831: bcEval (svn/R-devel/src/main/eval.c:5496)
==21510==    by 0x4B178F: Rf_eval (svn/R-devel/src/main/eval.c:558)
==21510== 
Loading required package: parallel
testthat results ================================================================
OK: 221 SKIPPED: 52 FAILED: 0
> 
> proc.time()
   user  system elapsed 
385.387  78.932 391.750 
==21510== 
==21510== HEAP SUMMARY:
==21510==     in use at exit: 237,692,259 bytes in 121,847 blocks
==21510==   total heap usage: 814,885 allocs, 693,038 frees, 1,242,042,234 bytes allocated
==21510== 
==21510== Thread 1:
==21510== 18,352 bytes in 31 blocks are possibly lost in loss record 4,689 of 5,786
==21510==    at 0x4A08946: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==21510==    by 0x35B46129E4: _dl_allocate_tls (in /usr/lib64/ld-2.20.so)
==21510==    by 0x35B5607EC7: pthread_create@@GLIBC_2.2.5 (in /usr/lib64/libpthread-2.20.so)
==21510==    by 0x35D080E0B6: ??? (in /usr/lib64/libgomp.so.1.0.0)
==21510==    by 0x35D0809BB9: GOMP_parallel (in /usr/lib64/libgomp.so.1.0.0)
==21510==    by 0x1BD50F08: bitwise_distance_haploid (packages/tests-vg/poppr/src/bitwise_distance.c:175)
==21510==    by 0x479A92: do_dotcall (svn/R-devel/src/main/dotcode.c:1251)
==21510==    by 0x4B1C3E: Rf_eval (svn/R-devel/src/main/eval.c:655)
==21510==    by 0x4B4C35: do_set (svn/R-devel/src/main/eval.c:2106)
==21510==    by 0x4B1AB0: Rf_eval (svn/R-devel/src/main/eval.c:627)
==21510==    by 0x4B36C8: do_begin (svn/R-devel/src/main/eval.c:1716)
==21510==    by 0x4B1AB0: Rf_eval (svn/R-devel/src/main/eval.c:627)
==21510== 
==21510== 60,656 bytes in 3,791 blocks are definitely lost in loss record 5,303 of 5,786
==21510==    at 0x4A08946: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==21510==    by 0x4D9140: R_chk_calloc (svn/R-devel/src/main/memory.c:3186)
==21510==    by 0x1BD55344: bruvo_dist (packages/tests-vg/poppr/src/poppr_distance.c:425)
==21510==    by 0x1BD56050: bruvo_distance (packages/tests-vg/poppr/src/poppr_distance.c:296)
==21510==    by 0x479A92: do_dotcall (svn/R-devel/src/main/dotcode.c:1251)
==21510==    by 0x4B1C3E: Rf_eval (svn/R-devel/src/main/eval.c:655)
==21510==    by 0x4B4C35: do_set (svn/R-devel/src/main/eval.c:2106)
==21510==    by 0x4B1AB0: Rf_eval (svn/R-devel/src/main/eval.c:627)
==21510==    by 0x4B36C8: do_begin (svn/R-devel/src/main/eval.c:1716)
==21510==    by 0x4B1AB0: Rf_eval (svn/R-devel/src/main/eval.c:627)
==21510==    by 0x4B6DBC: Rf_applyClosure (svn/R-devel/src/main/eval.c:1039)
==21510==    by 0x4B18E8: Rf_eval (svn/R-devel/src/main/eval.c:674)
==21510== 
==21510== 68,128 (34,064 direct, 34,064 indirect) bytes in 2,129 blocks are definitely lost in loss record 5,347 of 5,786
==21510==    at 0x4A08946: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==21510==    by 0x4D9140: R_chk_calloc (svn/R-devel/src/main/memory.c:3186)
==21510==    by 0x1BD55356: bruvo_dist (packages/tests-vg/poppr/src/poppr_distance.c:432)
==21510==    by 0x1BD56050: bruvo_distance (packages/tests-vg/poppr/src/poppr_distance.c:296)
==21510==    by 0x479A92: do_dotcall (svn/R-devel/src/main/dotcode.c:1251)
==21510==    by 0x4B1C3E: Rf_eval (svn/R-devel/src/main/eval.c:655)
==21510==    by 0x4B4C35: do_set (svn/R-devel/src/main/eval.c:2106)
==21510==    by 0x4B1AB0: Rf_eval (svn/R-devel/src/main/eval.c:627)
==21510==    by 0x4B36C8: do_begin (svn/R-devel/src/main/eval.c:1716)
==21510==    by 0x4B1AB0: Rf_eval (svn/R-devel/src/main/eval.c:627)
==21510==    by 0x4B6DBC: Rf_applyClosure (svn/R-devel/src/main/eval.c:1039)
==21510==    by 0x4B18E8: Rf_eval (svn/R-devel/src/main/eval.c:674)
==21510== 
==21510== LEAK SUMMARY:
==21510==    definitely lost: 94,720 bytes in 5,920 blocks
==21510==    indirectly lost: 34,064 bytes in 4,258 blocks
==21510==      possibly lost: 18,352 bytes in 31 blocks
==21510==    still reachable: 237,545,123 bytes in 111,638 blocks
==21510==         suppressed: 0 bytes in 0 blocks
==21510== Reachable blocks (those to which a pointer was found) are not shown.
==21510== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==21510== 
==21510== For counts of detected and suppressed errors, rerun with: -v
==21510== ERROR SUMMARY: 11 errors from 7 contexts (suppressed: 0 from 0)

Integrate bootstrap confidence intervals for diversity stats

Introduction

The poppr command returns a table that contains diversity summary statistics and linkage summary statistics. Currently, the only summary statistics that give a level of confidence are rarefaction (eMLG) and linkage (Ia and rbarD). It only makes sense to provide confidence intervals or standard errors for the other diversity statistics present in the diversity table.

Current implementation

A working example can be found in this bit of code. It performs bootstrapping using the boot package and returns confidence intervals. Since it uses the boot package, it has inherent multicore functionality.

Tasks

  • integrate bootstrapping for summary statistics
  • allow users to select confidence intervals or standard errors
  • add option to keep or remove analyses from the table
  • document function

Fix definition of Hexp

Currently, Hexp, produced by the function poppr is cited as Nei's expected heterozygosity (Nei 1978). Equation 2 in this paper translates to the implementation here if you consider x to be a genotype instead of an allele. How this came to be is a mystery. This is actually an unbiased measure of genotypic diversity representing the probability of drawing two genotypes from a population and having them be different. It's implementation in the function locus_table is accurate.

Tasks:

  • Change citation
  • Add average expected heterozygosity (equation 3 in Nei 1978) to poppr output
  • Fix equation in "algo" reference manual.

UBSAN shows a memory-access error

After submission of poppr version 1.1.0 to CRAN, a notification was sent from Dr. Ripley notifying me of an error in UBSAN while running tests:

Analytical value tests : poppr_distance.c:664:5: runtime error: index 1 out of bounds for type 'int [zerocatch[miss_ind]]'

This error stems from illegal memory allocation. I will sanitize the code and post a patch.

implement pgen and psex

The current implementation is internal and slightly broken. It would be good to implement this even if it's only in R.

This will be closed when the following tasks are complete:

  • function to estimate round-robin allele frequencies is in place
  • pgen works for genind objects
  • code is documented
  • tests are written
  • Stacy Krueger-Hadfield and Erik Sotka are added as contributors on the functions and DESCRIPTION
  • add NEWS
  • add functions to package man page
  • add functions to main vignette

Text encoding errors

Several files have UTF-8 em-dashes and directional quotes.

From Dr. Ripley:

There are also issues when checking in a Latin-1 locale on Linux:

  • checking whether package 'poppr' can be installed ... [10s/11s] WARNING
    Found the following significant warnings:
    Warning: unable to re-encode 'amova.r' line 175
    Warning: unable to re-encode 'distances.r' lines 158, 161, 167, 171, 178, 181, 187, 191, 197, 200
    Warning: unable to re-encode 'internal.r' lines 96, 111, 112
    Warning: unable to re-encode 'poppr.R' lines 65, 67, 68, 69, 70, 71, 76, 96

use stored function/object name as a memory for mlg.filter

Currently, mlg.filter has a default behavior of using nei.dist to calculate the distance matrix. If the user utilizes mlg.filter as an assignment method, currently @distname slot in the object carries the name of the object or the function and the @distargs slot contains the arguments for that function.

In this way there are a couple of procedures that could happen for this.

  1. check if @distname is a function or a distance matrix:
is.function(eval(x@mlg@distname))
is.dist(eval(x@mlg@distname))
  1. if @distname is a function, use do.call
do.call(eval(x@mlg@distname), c(x, x@mlg@distargs))

Parallelize bitwise.IA

bitwise.IA was designed with multi-threading in mind, but the functionality hasn't been incorporated into the function yet. It needs to be added before this function can be considered complete.

Hexp calcuation is erroneous

The problem:

After re-reading Nei (1978), I realized that my implementation of Hexp is:

N/(N - 1) * 1 - sum(p^2)

Where N is the number of allelic states and p is the vector of allele frequencies. Nei's definition is:

kn/(kn - 1) * 1 - sum(p^2)

Where n is the number of observed samples at a locus and k is the ploidy (to account for dosage).

User-facing impacts:

  • poppr()
  • locus_table()

What needs to be fixed:

  • internal function locus_table_pegas()

Impacts after fix:

  • Polyploids - Because the calculation is dependent on ploidy, this means that it would inappropriate to calculate this for polyploids due to ambiguous dosage.
  • locus_table(type = "genotype") - Hexp will change to unbiased simpson's index.

Unfortunately, I might have to wait until just before August to submit this patch, lest I anger our CRAN overlords.

Fix poppr for upcoming release of adegenet/pegas/hierfstat

The adegenet package is going through major changes on github and many functions in poppr will break due to these changes.

Three main things will change:

  1. the tab slot of genind objects will store data as integers as opposed to floats
  2. the ploidy slot will contain the ploidy for all samples
  3. genind and genlight objects will gain genclone's hierarchy slot

Changes that need to be implemented in poppr:

  • remove hierarchy slots from genclone objects as they will be inherited
  • remove hierarchy methods
  • change all references to $tab and @tab to tab().
  • change na.replace(object) in missingno to tab(object) and deprecate.
  • ensure that distance methods are compatible.
  • change SEXP pairdiffs to take integer input.
  • change SEXP permute_shuff to take integer input.
  • add tab method to bootgen objects.

This is not an exhaustive list as I know bugs will crop up as I encounter them, so I'm putting this here:

  • WRITE A TEST FOR EVERY BUG ENCOUNTERED

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❀ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.