
dartr's People

Contributors

biomatix, carlopacioni, green-striped-gecko, jdyen, mijangos81, olivroy

dartr's Issues

gl2related deleted

Deleted gl2related.

It was not working well anyway, and installation of the whole package was being stopped because of it.

Now deprecated until a better version exists.

Bug in gl.filter.callrate -- subscript out of bounds

gl.filter.callrate(testset.gl, method="ind", t=0.95)
Reporting for a genlight object
Note: Missing values most commonly arise from restriction site mutation.
Initial no. of individuals = 250

Error in x@gen[[1]] : subscript out of bounds

Filter for missing data per population

Hi Arthur and Bernd,

I am trying to use the DIYABC software and they require an input file which has been filtered for loci with missing values per population.

The error message I received was: "Loci 314 in population Bredbo has only missing values. This is not allowed. Please remove this locus from your data file."

Using dartR, I had filtered my dataset by call rate for loci and individuals but I was wondering if there is a way I can filter for call rate based on populations in dartR. If not, could you please suggest a possible workaround?

Thank you,

Yael
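
One possible workaround, sketched in base R on a toy genotype matrix rather than with any dartR function: flag every locus that has no called genotype in at least one population, then drop those loci. The names `geno`, `pops` and `geno_filtered` are made up for illustration. With a real genlight object, the matrix and the population factor would presumably come from `as.matrix(gl)` and `pop(gl)`.

```r
# Toy data: rows = individuals, columns = loci, NA = missing call
geno <- matrix(c(0, 1,  NA, 2,
                 1, NA, NA, 0,
                 2, 0,  1,  NA,
                 0, 2,  1,  NA),
               nrow = 4, byrow = TRUE,
               dimnames = list(paste0("ind", 1:4), paste0("loc", 1:4)))
pops <- factor(c("Bredbo", "Bredbo", "Other", "Other"))

# For each population, flag loci where every call is missing
# (result: one row per locus, one column per population)
all_missing <- sapply(levels(pops), function(p) {
  colSums(!is.na(geno[pops == p, , drop = FALSE])) == 0
})

# Drop a locus if it is entirely missing in at least one population
drop_loci <- rowSums(all_missing) > 0
geno_filtered <- geno[, !drop_loci, drop = FALSE]
```

In this toy set, loc3 is entirely missing in Bredbo and loc4 in Other, so only loc1 and loc2 are kept. The same logical vector could be used to index the loci of a genlight object.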

Pierre Feutry: Error installing

Error message below.
Any idea how to fix this? Cheers
Pierre

  • installing source package ‘dartR’ ...
    ** R
    ** data
    *** moving datasets to lazyload DB
    ** inst
    ** preparing package for lazy loading
    Warning: namespace ‘DBI’ is not available and has been replaced
    by .GlobalEnv when processing object ‘testset.gl’
    Warning: namespace ‘DBI’ is not available and has been replaced
    by .GlobalEnv when processing object ‘testset.gl’
    Warning: namespace ‘DBI’ is not available and has been replaced
    by .GlobalEnv when processing object ‘testset.gl’
    Warning: namespace ‘DBI’ is not available and has been replaced
    by .GlobalEnv when processing object ‘testset.gl’
    Warning: namespace ‘DBI’ is not available and has been replaced
    by .GlobalEnv when processing object ‘testset.gl’
    Error : .onLoad failed in loadNamespace() for 'rgl', details:
    call: dyn.load(file, DLLpath = DLLpath, ...)
    error: unable to load shared object '/Library/Frameworks/R.framework/Versions/3.4/Resources/library/rgl/libs/rgl.so':
    dlopen(/Library/Frameworks/R.framework/Versions/3.4/Resources/library/rgl/libs/rgl.so, 6): Library not loaded: /opt/X11/lib/libGLU.1.dylib
    Referenced from: /Library/Frameworks/R.framework/Versions/3.4/Resources/library/rgl/libs/rgl.so
    Reason: image not found
    ERROR: lazy loading failed for package ‘dartR’
  • removing ‘/Library/Frameworks/R.framework/Versions/3.4/Resources/library/dartR’
    Installation failed: Command failed (1)

gl.report.monomorphs value incorrect

I think I may have found a bug in gl.report.monomorphs.
When running gl.report.monomorphs after running gl.filter.monomorphs on a genlight object, the report indicates that there are still monomorphs present in the object.
Looking at the code in gl.report.monomorphs, it appears that when a locus has only 1 (heterozygote) values, or a combination of 1 and NA (i.e. no 0 or 2 present), it is being counted as monomorphic:
line 50: c[i] <- all(xmat[,i]==1,na.rm=TRUE)
which is different to the code in gl.filter.monomorphs used to determine monomorphs:
line 46: a[i] <- all(xmat[,i]==0,na.rm=TRUE) || all(xmat[,i]==2,na.rm=TRUE)
I have used a genlight object and excel to test this and can provide files if required.
Regards
Rob
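
The discrepancy can be reproduced on a toy matrix without dartR. This is only a sketch: it assumes the report treats a locus as monomorphic when any of the three `all()` tests is true, while the filter uses only the two homozygote tests quoted above. The helper names `is_fixed` and `is_all_het` are hypothetical.

```r
# Toy genotype matrix: 0/2 = homozygotes, 1 = heterozygote, NA = missing
xmat <- cbind(locA = c(0, 0, NA),   # fixed for one allele
              locB = c(2, 2, 2),    # fixed for the other allele
              locC = c(1, 1, NA),   # heterozygotes only
              locD = c(0, 1, 2))    # clearly polymorphic

# Rule quoted from gl.filter.monomorphs
is_fixed <- function(x) all(x == 0, na.rm = TRUE) || all(x == 2, na.rm = TRUE)
# Extra test quoted from gl.report.monomorphs
is_all_het <- function(x) all(x == 1, na.rm = TRUE)

filter_rule <- apply(xmat, 2, is_fixed)
report_rule <- apply(xmat, 2, function(x) is_fixed(x) || is_all_het(x))

# Loci on which the two rules disagree
disagree <- names(which(filter_rule != report_rule))
```

Here only locC is counted differently, which matches the behaviour described in the report. Note that with `na.rm = TRUE` a locus that is entirely NA passes every `all()` test, so it would be flagged by both rules.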

adding color and shape options to gl.pcoa.plot

Given how frequently questions about modifying colors and shapes come up in the dartR Google group, it is probably worth adding these options to gl.pcoa.plot().

There are three ways to do this with varying levels of flexibility and implementation cost.

  1. Add scale_color/shape_manual to the current function. I did this for a previous project and the code is here: https://github.com/Maschette/redartR/blob/master/plot_pcoa.R. It adds the options col and shape to the current function so these can be changed. It also sets the default theme to theme_bw because that's what I needed for a publication.

  2. Add a new function which returns a ggplot object using ggbuilder, which can be edited to modify a range of the ggplot settings. @raymondben had a first pass of this which I forked here: https://github.com/Maschette/redartR

  3. Rewrite the function following the ggplot2 style guide (https://ggplot2.tidyverse.org/dev/articles/ggplot2-in-packages.html#referring-to-ggplot2-functions) in combination with ggbuilder to give a more flexible function. The advantage is that people would be able to write things such as:
    gl.pcoa.plot(glPca, gl)+scale_color_manual(values=...) to change things.

My recommendation for the short term would be to implement 1 and explore 2-3.

Filter heterozygosity

Hi DARTr team,

Are we able to exclude loci based on heterozygosity?

How can we estimate heterozygosity across individuals per population?

Thanks, Jenny
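
As a rough illustration of what such a calculation involves, here is a base R sketch on a toy genotype matrix (0 and 2 homozygous, 1 heterozygous, NA missing). It does not use any dartR function, and the object names and the 0.75 threshold are arbitrary choices for the example.

```r
# Toy data: rows = individuals, columns = loci
geno <- matrix(c(0, 1, 1,
                 1, 1, NA,
                 2, 0, 1,
                 0, 0, 1),
               nrow = 4, byrow = TRUE,
               dimnames = list(paste0("ind", 1:4), paste0("loc", 1:3)))
pops <- factor(c("A", "A", "B", "B"))

# Observed heterozygosity per locus: share of non-missing calls that are 1
ho_locus <- colMeans(geno == 1, na.rm = TRUE)

# Observed heterozygosity per individual, then averaged within populations
ho_ind <- rowMeans(geno == 1, na.rm = TRUE)
ho_pop <- tapply(ho_ind, pops, mean)

# Exclude loci whose observed heterozygosity exceeds an arbitrary threshold
keep <- ho_locus <= 0.75
geno_kept <- geno[, keep, drop = FALSE]
```

With these toy values loc3 is heterozygous in every called individual and is the only locus excluded.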

Vignette: Add in a section on reporting and filtering on Linkage Disequilibrium

The vignette does not currently have a section on the analysis of linkage disequilibrium. It needs to provide advice on this issue (single population, sample size, etc.), then show how to report on departure from linkage equilibrium (with and without Bonferroni correction) and how to filter out all but one SNP in a linkage group.
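
For the Bonferroni part, a small sketch of the kind of example the section could include, using `p.adjust()` from base R's stats package. The p-values and locus-pair names are invented for illustration and do not come from any dartR output.

```r
# Hypothetical p-values from pairwise tests of linkage equilibrium
p_raw <- c(locA_locB = 0.001, locA_locC = 0.020, locB_locC = 0.040)

# Bonferroni multiplies each p-value by the number of tests (capped at 1)
p_bonf <- p.adjust(p_raw, method = "bonferroni")

# Pairs judged significant at 0.05, without and with the correction
significant_raw  <- names(p_raw)[p_raw < 0.05]
significant_bonf <- names(p_bonf)[p_bonf < 0.05]
```

All three pairs are significant before correction, but only locA_locB remains so afterwards.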

Add ctb names

Olly Berry, Jason Bragg, Peter J. Unmack, Aaron T Adamack

gl.filter.hamming removes information from gl@other$loc.metrics (rdepth missing)

Hi,

I noticed while going through the analysis that the gl.filter.hamming command removes information from gl@other$loc.metrics.
I wanted to look at the final read depth of the SNPs that I had retained and compare it to the average read depth at the beginning, but I noticed that it was missing from the loc metrics after the hamming filter step. Basically, it's replaced with the MAF in the loc metrics.

I assume this information is just dropped then? Is there some way to retain it when using the command?

Thanks

dartR - Installation problems using Ubuntu

Hi,
I have recently started to use Linux (I'm still not very familiar with it) and I am having problems installing dartR. I'm using RStudio, and when I try to follow the recommended steps I receive the following message:

"package dartR is not available (for R version 3.2.3)"

I checked for updates of RStudio and it says that I am using the newest version. Therefore I am not sure how to fix this problem. Do you have any recommendations?

Thanks!!

ind.metrics File -- independent of order of OTUs

Currently the order of the individuals in the ind.metrics file needs to match the order in the DArT input file. Need to make it so that the order of the individuals in the ind.metrics file does not matter, while retaining the checks for all individuals in the DArT file present in the ind.metrics file, and vice versa.
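
A minimal sketch of the requested behaviour in base R, assuming the individual names from the DArT file are available as a character vector. The names `dart_ids` and `ind_metrics` are placeholders, not the variables used inside dartR.

```r
# Individual names in the order they appear in the DArT file
dart_ids <- c("ind3", "ind1", "ind2")

# ind.metrics read from file, in a different order
ind_metrics <- data.frame(id  = c("ind1", "ind2", "ind3"),
                          pop = c("A", "B", "A"),
                          stringsAsFactors = FALSE)

# Keep the two-way check: every individual must be present in both sources
missing_in_metrics <- setdiff(dart_ids, ind_metrics$id)
missing_in_dart    <- setdiff(ind_metrics$id, dart_ids)
stopifnot(length(missing_in_metrics) == 0, length(missing_in_dart) == 0)

# Reorder the metrics to follow the DArT file, whatever order they came in
ind_metrics <- ind_metrics[match(dart_ids, ind_metrics$id), ]
rownames(ind_metrics) <- NULL
```

Using `match()` means the order of rows in the ind.metrics file no longer matters, while the `setdiff()` checks still catch individuals that appear in only one of the two files.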

gl.filter.repavg -- not working

Hi DARTR team,

The tutorial says "CloneID is essential (with its very special format), and dartR scripts for loading your data sets will terminate with an error message if this is not present."

I have DART data without 'CloneID' field. The data loaded OK into a genlight object. The console said
.
.
Try to add covariate file: xxx_2018_metadata.csv .
Ids of covariate file (at least a subset of) are matching!
Found 147 matching ids out of 147 ids provided in the covariate file. Subsetting snps now!.
Added pop factor.
Please note:there is no lat column
Please note:there is no lon column
Added id to the other$ind.metrics slot.
Added pop to the other$ind.metrics slot.
Warning message:
In .local(.Object, ...) :
Miss-formed strings in loc.all (must be e.g. 'c/g') - storing this argument in @other.

Could the lack of CloneID be causing gl.filter.repavg to return "[1] NA" when I request number of loci (nLoc) after using filter? The number of individuals (nInd) is correct after filtering.

I duplicated the AlleleID column and called it CloneID in the DART.csv but this didn't allow the filter to proceed.

Any ideas?

Thanks, Jenny

Outliers for downstream analysis

Hi,

I've run the gl.outflank and was able to produce a report on the outliers in my dataset of 40,746 SNPs. 303 loci were flagged as outliers. I'd like to now subset my data into outliers and non-outliers to run downstream analyses (e.g. PCA, etc.).

However, I've been unable to figure out how to pull out those 303 loci to run downstream analyses. Is this function already available, or do you have recommendations on how to do it? If it can't be done to the gl object, I'm assuming there is a way to add to the info of a vcf file, but it is beyond my abilities. Any advice would be much appreciated.

Please let me know if I need to clarify anything or provide further information. Thanks in advance for your help!
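
One way this could be approached, sketched in base R on toy objects. The results table is assumed to have `LocusName` and `OutlierFlag` columns, as in an OutFLANK-style results table; check the column names in your own output. Locus names are assumed to match the column names of the genotype matrix.

```r
# Toy genotype matrix: rows = individuals, columns = loci
geno <- matrix(c(0, 1, 2, 1,
                 1, 1, 0, 2,
                 2, 0, 0, 1),
               nrow = 3, byrow = TRUE,
               dimnames = list(paste0("ind", 1:3), paste0("loc", 1:4)))

# Toy results table (column names are an assumption, see above)
results <- data.frame(LocusName   = paste0("loc", 1:4),
                      OutlierFlag = c(FALSE, TRUE, NA, TRUE),
                      stringsAsFactors = FALSE)

# which() drops loci whose flag is NA, so they count as non-outliers here
outlier_names <- results$LocusName[which(results$OutlierFlag)]
is_outlier <- colnames(geno) %in% outlier_names

geno_outliers <- geno[, is_outlier, drop = FALSE]
geno_neutral  <- geno[, !is_outlier, drop = FALSE]
```

The logical vector `is_outlier` could likewise be used to index the loci of a genlight object, provided its locus names are in the same format as those in the results table.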

Renee Catullo: Add an option 5 to gl2fasta

Most phylogenetic methods that analyse SNPs (e.g. IQtree/SNAPP) function better if there are no constant sites. These programs define "constant" as no individual being homozygous for the minor allele. So it would be great to have an option 5, which is option 3 but with no "constant" sites. IQtree rejects datasets containing these SNPs outright, even though they are very useful for popgen.
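
A sketch of the site selection this would need, in base R on a toy matrix. It reads the definition above as "a site is non-constant only if both homozygous genotypes (0 and 2) are observed", which is an interpretation and not something taken from the IQtree or SNAPP documentation.

```r
# Toy genotype matrix: 0/2 = homozygotes, 1 = heterozygote, NA = missing
geno <- cbind(locA = c(0, 0, 1, NA),  # no individual homozygous for allele 2
              locB = c(0, 1, 2, 2),   # both homozygotes present
              locC = c(2, 2, 2, 1),   # no individual homozygous for allele 0
              locD = c(0, 2, NA, 1))  # both homozygotes present

# Keep loci where at least one 0 and at least one 2 are observed
has_both_homozygotes <- apply(geno, 2, function(x) {
  any(x == 0, na.rm = TRUE) && any(x == 2, na.rm = TRUE)
})
geno_variable <- geno[, has_both_homozygotes, drop = FALSE]
```

In this toy set, locA and locC would count as "constant" under that reading and be dropped before export.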

gl.report.ld crash

Hi Bernd & Arthur,

I am struggling to get the gl.report.ld function to work, as it always crashes at some point. It is also not clear to me what the command is to restart the function. I tried reusing the same command, but it seems to start from the beginning rather than from the last completed chunk. I got 3 chunks; how do I restart using the chunks?
Finally, in an old post I found the function gl.filter.ld, but it seems to be gone in the current version. Is there a way to filter based on LD in dartR?

ld_rep <- gl.report.ld(gl, save = TRUE, nchunks = 4, name = ld_test, ncores = 16, chunkname = NULL, probar = TRUE)

The gl is relatively big, but on a machine with 16 cores and 128 GB of RAM it shouldn't be too much of a problem, I guess.

/// GENLIGHT OBJECT /////////

// 257 genotypes, 24,010 binary SNPs, size: 39 Mb
80052 (1.3 %) missing data

// Basic content
@gen: list of 257 SNPbin
@ploidy: ploidy of each individual (range: 2-2)

// Optional content
@ind.names: 257 individual labels
@loc.names: 24010 locus labels
@loc.all: 24010 alleles
@position: integer storing positions of the SNPs
@pop: population of each individual (group size range: 17-101)
@other: a list containing: loc.metrics latlong ind.metrics

sessioninfo::session_info()

Session info
setting value
version R version 3.5.2 (2018-12-20)
os Ubuntu 14.04.6 LTS
system x86_64, linux-gnu
ui RStudio
language (EN)
collate en_NZ.UTF-8
ctype en_NZ.UTF-8
tz Pacific/Auckland
date 2019-03-27

─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
package * version date lib source
ade4 * 1.7-13 2018-08-31 [1] CRAN (R 3.5.2)
adegenet * 2.1.1 2018-02-02 [1] CRAN (R 3.5.2)
ape 5.3 2019-03-17 [1] CRAN (R 3.5.2)
assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.5.2)
backports 1.1.3 2018-12-14 [1] CRAN (R 3.5.2)
boot 1.3-20 2017-07-30 [4] CRAN (R 3.5.0)
broom 0.5.1 2018-12-05 [1] CRAN (R 3.5.2)
calibrate 1.7.2 2013-09-10 [1] CRAN (R 3.5.2)
callr 3.2.0 2019-03-15 [1] CRAN (R 3.5.2)
class 7.3-15 2019-01-01 [4] CRAN (R 3.5.2)
classInt 0.3-1 2018-12-18 [1] CRAN (R 3.5.2)
cli 1.1.0 2019-03-19 [1] CRAN (R 3.5.2)
cluster 2.0.7-1 2018-04-09 [4] CRAN (R 3.5.0)
coda 0.19-2 2018-10-08 [1] CRAN (R 3.5.2)
codetools 0.2-16 2018-12-24 [4] CRAN (R 3.5.2)
colorspace 1.4-1 2019-03-18 [1] CRAN (R 3.5.2)
combinat 0.0-8 2012-10-29 [1] CRAN (R 3.5.2)
crayon 1.3.4 2017-09-16 [1] CRAN (R 3.5.2)
crosstalk 1.0.0 2016-12-21 [1] CRAN (R 3.5.2)
dartR * 1.2.0 2019-03-25 [1] Github (10b4b5e)
data.table 1.12.0 2019-01-13 [1] CRAN (R 3.5.2)

Thanks,
Flo

Bug in gl.pcoa.plot -- no visible binding for global variable 'pt.labels'

gl.pcoa.plot: no visible binding for global variable 'pt.labels'
gl.percent.freq: no visible binding for global variable 'snp'

These are ones that are not easy for me to fix (I need your input on what kind of labels you wanted to supply, etc.).

Can you check? I think pt.labels is not defined anywhere, hence the error.

Join the read dart and dart2gl scripts into one gl.read.dart.r

There needs to be one script to take the DArT csv files (1-row or 2-row) and convert to a genlight object -- gl.read.dart(). At present, the user must read the DArT file in and then convert to a genlight object as a two step process. Leave the two scripts without export.

Vector Size cannot be NA?

Hey Bernd,

Having some trouble with the vector.

fmat <- as.matrix(ggbfmales)[father,]

fhet <- sum(fmat==1)
father.half <- ifelse(fmat==1, sample(c(0,2), fhet, replace = T), fmat)

Error in sample.int(length(x), size, replace, prob) :
vector size cannot be NA

Where am I going wrong here?
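
One likely cause, assuming the genotype vector contains missing values: `sum(fmat==1)` returns NA as soon as one comparison is NA, and `sample()` then receives NA as its size. The sketch below uses a toy vector in place of `as.matrix(ggbfmales)[father,]` and counts heterozygotes with the NAs excluded.

```r
# Toy genotype vector for one individual: 0/2 = homozygous, 1 = heterozygous
fmat <- c(0, 1, 2, NA, 1, 0, NA, 2, 1)

# With missing data the plain count is NA, which sample() cannot use as a size
plain_count <- sum(fmat == 1)

# Logical index of heterozygous calls, treating NA as "not heterozygous"
het <- !is.na(fmat) & fmat == 1

# Replace each heterozygous call with a random homozygote; keep the rest
father.half <- fmat
father.half[het] <- sample(c(0, 2), sum(het), replace = TRUE)
```

Assigning into `father.half[het]` also avoids relying on `ifelse()` to recycle a shorter vector of random draws across the whole genotype vector.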

gl.diversity inoperable on Linux

Hi Bernd,

I am using a fresh install of R and RStudio on a brand new computer with Pop!_OS 18.04, and I noticed that when computing Shannon Diversity, the function outputs straight 1s for every metric. I have not had the same issue on my Mac, even with the same version of R and dartR. The remainder of dartR appears to function as normal, that is filtering and other functions produce identical outputs to my Mac. I have also used gl.diversity on Ubuntu/Pop!_OS before without any issue, though that was quite a while ago. I am using the same dataset that I provided you in 2018 to help write the function. Any ideas? Thanks in advance for helping with this.

gl.outflank settings

Hi Bernd & Arthur,
I was just running gl.outflank and realised that I'd written one of the settings, "LeftTrimFraction", as "eftTrimFraction", but it didn't give me any error. Is that because you've built in some cunning typo-tolerance, or is it not reading that setting? Cheers, Olly
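
One common way this happens in R, shown here with two made-up functions rather than gl.outflank itself: if a function accepts `...`, a misspelled argument name is silently absorbed and the default is used, whereas a function without `...` raises an "unused argument" error. Whether this is what happens inside gl.outflank is an assumption.

```r
# Hypothetical function that accepts extra arguments through ...
lenient <- function(x, LeftTrimFraction = 0.05, ...) {
  LeftTrimFraction
}

# Same function without ..., so unknown argument names are rejected
strict <- function(x, LeftTrimFraction = 0.05) {
  LeftTrimFraction
}

# The misspelled name is swallowed by ... and the default is returned
lenient_value <- lenient(1, eftTrimFraction = 0.2)

# Here the misspelling raises an error, captured as a message string
strict_result <- tryCatch(strict(1, eftTrimFraction = 0.2),
                          error = function(e) conditionMessage(e))
```

If that is the mechanism, the misspelled setting would simply not have been read and the default trim fraction would have applied.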

gl2treemix problems opening outfile

Hi there,

I was very excited to try out the new script for converting a gl to a treemix file but it doesn't seem to have worked. It took a long time to process (20812 snps in the gl) and it created a .gz file as intended but when I try to open the file, it says "does not appear to be a valid archive". The size of the .gz file is 1385KB - does that seem right?
I was a bit confused by the description of the gl2treemix function itself as it says "The file needs to be gzipped before it will be recognised by treemix." but the outfile we need to specify has to be xxx.gz so is this being done within the script?

Apologies if this is a silly question - I am new to this.

Thanks,

Yael
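
A quick way to check from R whether a file is already gzip-compressed is to look at its first two bytes and to read it back through `gzfile()`. The sketch below writes a toy file first so that it is self-contained. The file name and the treemix-like content are invented, and nothing here is taken from the gl2treemix code.

```r
# Write a small toy text file through a gzip connection
out <- file.path(tempdir(), "toy_treemix.gz")
lines_out <- c("popA popB", "10,2 8,4", "0,12 3,9")
con <- gzfile(out, open = "wt")
writeLines(lines_out, con)
close(con)

# Gzip files start with the two magic bytes 1f 8b
con <- file(out, open = "rb")
magic <- readBin(con, what = "raw", n = 2)
close(con)
is_gzip <- identical(magic, as.raw(c(0x1f, 0x8b)))

# Read the compressed file back as plain text
con <- gzfile(out, open = "rt")
lines_in <- readLines(con)
close(con)
```

If the same check on the real output file gives `TRUE` and `readLines()` returns sensible text, the file is probably fine. A gzip-compressed text file is not a tar or zip archive, so some archive managers may report it as invalid even when it is intact.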

Outlier analysis

Hi ArthR and grubR,

For your digestion: some things I've encountered where I wasn't sure whether they needed tweaking or I just wasn't using them correctly:

  1. When I output results from the Outflank analysis (i.e. the spreadsheet with the He, Fst, outlier flags etc) I see that both alleles at each locus are present in the table and the total number of outlier loci identified and reported in the table is this number - i.e. 2 x the number of loci. Is there a reason for this? The stats for each allele are of course identical.
  2. I was interested in pulling out the sequences of the outlier loci so I could blast them for homology to genes of known function (long shot I know). I thought I'd take the list of outliers from dartR/Outflank and then use that as a lookup table in Excel to pull out the sequences (sorry - I use Excel). The thing is the locus names in the dart raw data file (CloneID) have a different format (e.g. 13451614|F|0-43:G>A-43:G>A) to the locus names in the genlight file (e.g. 13469410-11292-A/G.A). It seems as though the allele names have lost some of their content (i.e. their position [43 in the above example]) and gained a unique number that is probably their number in the sequence of loci in the whole dataset. This makes it unclear whether I'm looking at the same locus in the dart and genlight file because, as you know, there can be multiple SNPs with the same starting number in their CloneID. Have I missed how dartR/adegenet renames loci, or is that information still accessible?
  3. I'm sure you would do the above in a more elegant way than using Excel. Perhaps that is a suggestion for an addition to dartR - to enable pulling out the trimmed sequences after an outlier analysis so that they can be used in downstream analysis like blasting etc.

Adios,

Olly
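
For the lookup in point 2, the clone number, SNP position and alleles can be pulled out of the raw DArT identifier with base R string functions. This sketch assumes every identifier follows the pattern of the single example quoted above (fields separated by `|`, the last field ending in `position:ref>alt`), which has not been checked against DArT documentation.

```r
# Identifier as quoted in the post
allele_id <- "13451614|F|0-43:G>A-43:G>A"

# Split on the literal "|" into clone number, strand-like field, SNP description
parts <- strsplit(allele_id, "|", fixed = TRUE)[[1]]
clone_id <- parts[1]

# Keep the text after the last "-" of the third field, e.g. "43:G>A"
snp <- sub("^.*-", "", parts[3])

# Position is the number before ":", alleles are on either side of ">"
snp_position <- as.integer(sub(":.*$", "", snp))
alleles <- strsplit(sub("^.*:", "", snp), ">", fixed = TRUE)[[1]]
```

A key built from `clone_id` and `snp_position` distinguishes multiple SNPs on the same clone, which the clone number alone does not.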

Problems with gl2svdquartets

Hi,

I'm trying to use gl2svdquartets but I keep getting the following error message:

gl2svdquartets(gl, outfile="svd.nex", outpath=getwd(), method=1)
Starting gl2svdquartets: Create nexus file
Extacting SNP bases and creating records for each individual

Error in strsplit(snp, ">") : non-character argument

I'm converting my vcf file to a genlight object using vcfR2genlight, and then converting that to the svdquartets format. Do you have any suggestions about what could be causing this error message?

Thank you,
Ana
