green-striped-gecko / dartr
Importing and Analysing DArT type snp and silicodart data
License: GNU General Public License v3.0
Automate the test examples script, to ensure all examples work with testset.gl
Deleted gl2related.
It was not working well anyway, and installation of the whole package was stopped because of it.
Now deprecated until a better version exists.
gl.filter.callrate(testset.gl, method="ind", t=0.95)
Reporting for a genlight object
Note: Missing values most commonly arise from restriction site mutation.
Initial no. of individuals = 250
Error in x@gen[[1]] : subscript out of bounds
Hi Arthur and Bernd,
I am trying to use the DIYABC software and they require an input file which has been filtered for loci with missing values per population.
The error message I received was: "Loci 314 in population Bredbo has only missing values. This is not allowed. Please remove this locus from your data file."
Using dartR, I had filtered my dataset by call rate for loci and individuals but I was wondering if there is a way I can filter for call rate based on populations in dartR. If not, could you please suggest a possible workaround?
Thank you,
Yael
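One possible workaround, sketched in base R on a toy matrix rather than as a dartR feature: compute the call rate of each locus within each population and drop loci that are entirely missing in any population. The matrix, the population labels other than "Bredbo", and all object names are hypothetical; with a real genlight object `gl`, `as.matrix(gl)` and `pop(gl)` from adegenet would supply the inputs.

```r
# Toy genotype matrix (individuals x loci), coded 0/1/2 with NA for missing.
# With a real genlight object, as.matrix(gl) and pop(gl) would supply these.
geno <- matrix(c(0, NA, 2,
                 1, NA, 2,
                 2,  1, NA,
                 0,  2,  1),
               nrow = 4, byrow = TRUE,
               dimnames = list(paste0("ind", 1:4), paste0("loc", 1:3)))
pops <- factor(c("Bredbo", "Bredbo", "Other", "Other"))

# Call rate of each locus within each population (rows = populations).
callrate_by_pop <- apply(geno, 2, function(locus) tapply(!is.na(locus), pops, mean))

# Keep loci scored in at least one individual of every population.
keep <- colnames(geno)[apply(callrate_by_pop > 0, 2, all)]
geno_filtered <- geno[, keep, drop = FALSE]
```

Replacing `> 0` with a threshold such as `>= 0.8` would turn this into a per-population call rate filter.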
Output fastA records differ in length. Rarely, records are one base short. Suggest looking at script behaviour when SNPosition = 1 or SNPosition is the last base of the sequence tag. @green-striped-gecko
even if only individuals are used for labels
Change the name to gl.filter.secondaries to avoid confusion re duplicated sequence tags.
Error message below.
Any idea how to fix this? Cheers
Pierre
I think I may have found a bug in gl.report.monomorphs.
When running gl.report.monomorphs after running gl.filter.monomorphs on a genlight object the report indicates that there are still monomorphs present in the object.
Looking at the code in gl.report.monomorphs, it appears that when a locus has only 1 (heterozygote) values, or a combination of 1 and NA (i.e. no 0 or 2 present), it is being counted as monomorphic:
line 50: c[i] <- all(xmat[,i]==1,na.rm=TRUE)
which is different to the code in gl.filter.monomorphs used to determine monomorphs:
line 46: a[i] <- all(xmat[,i]==0,na.rm=TRUE) || all(xmat[,i]==2,na.rm=TRUE)
I have used a genlight object and excel to test this and can provide files if required.
Regards
Rob
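The discrepancy described above can be reproduced on a toy matrix. This is a minimal sketch in base R that applies the two quoted conditions side by side; the matrix and object names are made up for illustration, and only the two `all(...)` conditions come from the report.

```r
# Toy genotype matrix: 0/2 = homozygotes, 1 = heterozygote, NA = missing.
xmat <- cbind(all_ref  = c(0, 0, NA),
              all_alt  = c(2, 2, 2),
              all_het  = c(1, NA, 1),
              variable = c(0, 1, 2))

# Criterion quoted from gl.filter.monomorphs (line 46 in the report above).
filter_rule <- apply(xmat, 2, function(x)
  all(x == 0, na.rm = TRUE) || all(x == 2, na.rm = TRUE))

# Additional criterion quoted from gl.report.monomorphs (line 50).
het_rule <- apply(xmat, 2, function(x) all(x == 1, na.rm = TRUE))

# Loci that the report would count but the filter would leave in place.
flagged_only_by_report <- names(which(het_rule & !filter_rule))
```

Here only the all-heterozygote locus ends up in `flagged_only_by_report`, which matches the behaviour described in the report.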
Given how frequently questions about modifying color and shapes appear in the dartR google group, it is probably worth adding these options to gl.pcoa.plot()
There are three ways to do this with varying levels of flexibility and implementation cost.
1. Add scale_color/shape_manual to the current function. I did this for a previous project and the code is here: https://github.com/Maschette/redartR/blob/master/plot_pcoa.R. It adds the options col and shape to the current function so these can be changed. It also sets the default theme to theme_bw because that's what I needed for a publication.
2. Add a new function which returns a ggplot object using ggbuilder, which can be edited to modify a range of the ggplot settings. @raymondben had a first pass of this which I forked here: https://github.com/Maschette/redartR
3. Rewrite the function implementing the ggplot2 style guide (https://ggplot2.tidyverse.org/dev/articles/ggplot2-in-packages.html#referring-to-ggplot2-functions) in combination with ggbuilder to give a more flexible function. The advantage of this is that people would be able to implement things such as:
gl.pcoa.plot(glPca, gl)+scale_color_manual(values=...)
to change things.
My recommendation for the short term would be to implement 1 and explore 2-3.
Hi DARTr team,
Are we able to exclude loci based on heterozygosity?
How can we estimate heterozygosity across individuals per population?
Thanks, Jenny
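As a rough illustration of both questions, here is a minimal base R sketch on a toy 0/1/2 genotype matrix, where 1 codes a heterozygote. The data, the threshold `max_ho` and all object names are hypothetical; with a genlight object, `as.matrix(gl)` and `pop(gl)` would provide the inputs.

```r
geno <- matrix(c(0, 1, 1,
                 1, 1, 2,
                 0, 1, NA,
                 2, 1, 0),
               nrow = 4, byrow = TRUE,
               dimnames = list(paste0("ind", 1:4), paste0("loc", 1:3)))
pops <- factor(c("popA", "popA", "popB", "popB"))

# Observed heterozygosity per locus: proportion of scored genotypes equal to 1.
ho_locus <- colMeans(geno == 1, na.rm = TRUE)

# Exclude loci above a chosen (hypothetical) threshold.
max_ho <- 0.75
geno_kept <- geno[, ho_locus <= max_ho, drop = FALSE]

# Observed heterozygosity per locus within each population (rows = populations),
# then averaged across loci to give one value per population.
ho_by_pop_locus <- apply(geno == 1, 2, function(h) tapply(h, pops, mean, na.rm = TRUE))
ho_pop <- rowMeans(ho_by_pop_locus, na.rm = TRUE)
```

This gives observed heterozygosity only; expected heterozygosity would need allele frequencies and is not covered here.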
Spaces in OTU names present a range of problems, particularly trailing spaces. We need to remove all spaces from OTU names at the point when the data are input from DArT.
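A minimal sketch of the kind of clean-up this would involve, in base R. The labels are invented, and replacing internal spaces with an underscore (rather than deleting them) is an assumption made here to keep names readable.

```r
# Hypothetical labels as they might arrive from a data file.
otu <- c("Bredbo ", " Upper Murray", "Lake  Eyre ")

# Option 1: strip only leading and trailing whitespace.
trimmed <- trimws(otu)

# Option 2: also replace any internal run of whitespace with "_".
no_spaces <- gsub("\\s+", "_", trimws(otu))
```

Using `gsub("\\s+", "", otu)` instead would remove every space outright.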
The vignette does not currently have a section on analysis of linkage disequilibrium. It needs to provide advice on this issue (single population, sample size, etc.), then on how to report departure from linkage equilibrium (with and without Bonferroni correction), and then on how to filter out all but one SNP in a linkage group.
Minimum allele frequency
Olly Berry, Jason Bragg, Peter J. Unmack, Aaron T Adamack
Hi,
I noticed that the gl.filter.hamming command removes information from gl@other$loc.metrics when I went through the analysis.
I wanted to look at the final read depth of the SNPs that I had retained and compare it to the average read depth at the beginning, but I noticed that it was missing from the loc metrics after the hamming filter step. Basically it's replaced with the MAF in the loc metrics.
I assume this information is just dropped then? Is there some way to retain it when using the command?
Thanks
Hi,
I have recently started to use Linux (I'm still not very familiar with it) and I am having problems installing dartR. I'm using RStudio and when I try to follow the recommended steps I receive the following message:
"package dartR is not available (for R version 3.2.3)"
I checked for updates of RStudio and it says that I am using the newest version. Therefore I am not sure how to fix this problem. Do you have any recommendations?
Thanks!!
Currently the order of the individuals in the ind.metrics file needs to match the order in the DArT input file. Need to make it so that the order of the individuals in the ind.metrics file does not matter, while retaining the checks for all individuals in the DArT file present in the ind.metrics file, and vice versa.
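One way this could work, sketched in base R: check the two sets of IDs in both directions, then reorder the metadata rows with `match()`. The IDs, the column names `id` and `pop`, and the object names are hypothetical and do not reflect the actual import script.

```r
# IDs in genotype-file order, and a metadata table listed in a different order.
geno_ids <- c("A1", "B2", "C3")
ind_metrics <- data.frame(id  = c("C3", "A1", "B2"),
                          pop = c("east", "west", "west"),
                          stringsAsFactors = FALSE)

# Check both directions before reordering.
missing_in_metrics <- setdiff(geno_ids, ind_metrics$id)
missing_in_geno    <- setdiff(ind_metrics$id, geno_ids)
stopifnot(length(missing_in_metrics) == 0, length(missing_in_geno) == 0)

# Reorder the metadata rows to follow the genotype order.
ind_metrics <- ind_metrics[match(geno_ids, ind_metrics$id), , drop = FALSE]
```

In a real script the two `setdiff()` results would probably be reported to the user by name rather than stopping without detail.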
testset.gl does not have trimmed sequences in it, so some of the examples will fall over. May need to recreate the testset.gl.
Hi DARTR team,
The tutorial says "CloneID is essential (with its very special format), and dartR scripts for loading your data sets will terminate with an error message if this is not present."
I have DART data without 'CloneID' field. The data loaded OK into a genlight object. The console said
.
.
Try to add covariate file: xxx_2018_metadata.csv .
Ids of covariate file (at least a subset of) are matching!
Found 147 matching ids out of 147 ids provided in the covariate file. Subsetting snps now!.
Added pop factor.
Please note:there is no lat column
Please note:there is no lon column
Added id to the other$ind.metrics slot.
Added pop to the other$ind.metrics slot.
Warning message:
In .local(.Object, ...) :
Miss-formed strings in loc.all (must be e.g. 'c/g') - storing this argument in @other.
Could the lack of CloneID be causing gl.filter.repavg to return "[1] NA" when I request number of loci (nLoc) after using filter? The number of individuals (nInd) is correct after filtering.
I duplicated the AlleleID column and called it CloneID in the DART.csv but this didn't allow the filter to proceed.
Any ideas?
Thanks, Jenny
Homologous fragments are expected to be the same length, so the computations can be sped up by only comparing fragments that are the same length. Implement this change to the scripts.
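A minimal sketch of the idea in base R, not the existing script: split the tags by length and compute Hamming distances only within each length group. The sequences and all names are hypothetical.

```r
# Hypothetical trimmed sequence tags of mixed length.
tags <- c(t1 = "ACGTAC", t2 = "ACGTTC", t3 = "ACG", t4 = "ACT", t5 = "GGGGGG")

# Number of mismatched positions between two equal-length strings.
hamming <- function(a, b) sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])

# Compare pairs only within groups of identical length.
by_length <- split(names(tags), nchar(tags))
pairs <- do.call(rbind, lapply(by_length, function(ids) {
  if (length(ids) < 2) return(NULL)   # nothing to compare in this group
  p <- t(combn(ids, 2))               # one row per pair of tag names
  data.frame(tag1 = p[, 1], tag2 = p[, 2],
             dist = vapply(seq_len(nrow(p)),
                           function(k) hamming(tags[[p[k, 1]]], tags[[p[k, 2]]]),
                           integer(1)),
             stringsAsFactors = FALSE)
}))
```

For these five tags the grouping reduces the work from 10 pairwise comparisons to 4.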
Currently the reporting includes populations that are amalgamated on the basis of no fixed differences, but it does not list the populations that do not amalgamate. The end report needs to list all surviving OTUs.
Hi,
I've run the gl.outflank and was able to produce a report on the outliers in my dataset of 40,746 SNPs. 303 loci were flagged as outliers. I'd like to now subset my data into outliers and non-outliers to run downstream analyses (e.g. PCA, etc.).
However, I've been unable to figure out how to pull out those 303 loci to run downstream analyses. Is this function already available, or do you have recommendations on how to do it? If it can't be done to the gl object, I'm assuming there is a way to add to the info of a vcf file, but it is beyond my abilities. Any advice would be much appreciated.
Please let me know if I need to clarify anything or provide further information. Thanks in advance for your help!
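A possible approach, sketched in base R on a toy matrix: build a logical index from the flagged locus names and use it to split the columns. The data and the vector `outlier_names` are hypothetical. A genlight object from adegenet can be indexed by column in the same way (`gl[, is_outlier]`), with `locNames(gl)` in place of `colnames()`. Whether `loc.metrics` follows along automatically is an assumption worth checking.

```r
# Toy genotype matrix with named loci.
geno <- matrix(c(0, 1, 2, 0,
                 1, 1, 0, 2,
                 2, 0, 1, 1),
               nrow = 3, byrow = TRUE,
               dimnames = list(paste0("ind", 1:3), paste0("loc", 1:4)))

# Hypothetical vector of locus names flagged as outliers.
outlier_names <- c("loc2", "loc4")

# Logical index over loci, then the two complementary subsets.
is_outlier    <- colnames(geno) %in% outlier_names
geno_outliers <- geno[, is_outlier, drop = FALSE]
geno_neutral  <- geno[, !is_outlier, drop = FALSE]
```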
Most phylogenetic methods that analyse SNPs (e.g. IQtree/SNAPP) function better if there are no constant sites. These programs define "constant" as no individual being homozygous for the minor allele. So it would be great to have option 5, which is option 3 but with no "constant" sites. IQtree fully rejects datasets with these SNPs, even though they are very useful for popgen.
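One way to express the requested filter, as a base R sketch under the definition given above: keep only loci where both homozygous states (0 and 2) are observed in at least one individual. The matrix and names are invented, and this is not an existing dartR option.

```r
geno <- cbind(locA = c(0, 1, 2, 0),   # both homozygotes present
              locB = c(0, 1, 1, NA),  # minor allele only in heterozygotes
              locC = c(2, 2, 1, 2),   # one homozygous state absent
              locD = c(0, 2, NA, 2))  # both homozygotes present

# TRUE where a locus has at least one 0 and at least one 2.
has_both_homs <- apply(geno, 2, function(x)
  any(x == 0, na.rm = TRUE) && any(x == 2, na.rm = TRUE))

geno_variable <- geno[, has_both_homs, drop = FALSE]
```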
Hi Bernd & Arthur,
I am struggling to get the gl.report.ld function to work, as it always crashes at some point. It is also not clear to me what the command is to restart the function. I tried using the same command as before, but it seems to start from the beginning rather than from the last completed chunk. I got 3 chunks; how do I restart using the chunks?
Finally, in an old post I found the function gl.filter.ld, it seems it is gone in the current version. Is there a way to filter based on ld in dartR?
ld_rep <- gl.report.ld(gl, save = TRUE, nchunks = 4, name = ld_test, ncores = 16, chunkname = NULL, probar = TRUE)
The gl is relatively big, but working on a machine with 16 cores and 128GB ram it shouldn't be too much of a problem I guess.
 /// GENLIGHT OBJECT /////////

 // 257 genotypes,  24,010 binary SNPs, size: 39 Mb
 80052 (1.3 %) missing data

 // Basic content
   @gen: list of 257 SNPbin
   @ploidy: ploidy of each individual  (range: 2-2)

 // Optional content
   @ind.names:  257 individual labels
   @loc.names:  24010 locus labels
   @loc.all:  24010 alleles
   @position: integer storing positions of the SNPs
   @pop: population of each individual (group size range: 17-101)
   @other: a list containing: loc.metrics  latlong  ind.metrics
sessioninfo::session_info()
Session info
setting value
version R version 3.5.2 (2018-12-20)
os Ubuntu 14.04.6 LTS
system x86_64, linux-gnu
ui RStudio
language (EN)
collate en_NZ.UTF-8
ctype en_NZ.UTF-8
tz Pacific/Auckland
date 2019-03-27

─ Packages ───────────────────────────────────────────────
package * version date lib source
ade4 * 1.7-13 2018-08-31 [1] CRAN (R 3.5.2)
adegenet * 2.1.1 2018-02-02 [1] CRAN (R 3.5.2)
ape 5.3 2019-03-17 [1] CRAN (R 3.5.2)
assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.5.2)
backports 1.1.3 2018-12-14 [1] CRAN (R 3.5.2)
boot 1.3-20 2017-07-30 [4] CRAN (R 3.5.0)
broom 0.5.1 2018-12-05 [1] CRAN (R 3.5.2)
calibrate 1.7.2 2013-09-10 [1] CRAN (R 3.5.2)
callr 3.2.0 2019-03-15 [1] CRAN (R 3.5.2)
class 7.3-15 2019-01-01 [4] CRAN (R 3.5.2)
classInt 0.3-1 2018-12-18 [1] CRAN (R 3.5.2)
cli 1.1.0 2019-03-19 [1] CRAN (R 3.5.2)
cluster 2.0.7-1 2018-04-09 [4] CRAN (R 3.5.0)
coda 0.19-2 2018-10-08 [1] CRAN (R 3.5.2)
codetools 0.2-16 2018-12-24 [4] CRAN (R 3.5.2)
colorspace 1.4-1 2019-03-18 [1] CRAN (R 3.5.2)
combinat 0.0-8 2012-10-29 [1] CRAN (R 3.5.2)
crayon 1.3.4 2017-09-16 [1] CRAN (R 3.5.2)
crosstalk 1.0.0 2016-12-21 [1] CRAN (R 3.5.2)
dartR * 1.2.0 2019-03-25 [1] Github (10b4b5e)
data.table 1.12.0 2019-01-13 [1] CRAN (R 3.5.2)
Thanks,
Flo
Create gl.report.ld
Create gl.filter.ld
Bring gl2gi inside the ld scripts
gl.pcoa.plot: no visible binding for global variable 'pt.labels'
gl.percent.freq: no visible binding for global variable 'snp'
Those are ones which are not easy for me to fix (as I need your input on what kind of labels you wanted to supply, etc.).
Can you check? I think pt.labels is not defined anywhere, hence the error.
There needs to be one script to take the DArT csv files (1-row or 2-row) and convert to a genlight object -- gl.read.dart(). At present, the user must read the DArT file in and then convert to a genlight object as a two step process. Leave the two scripts without export.
When the script finds two fragments within the threshold distance, it deletes the first. We need to amend the script so that it deletes the fragment with the lowest PIC | Call Rate | Reproducibility.
Hey Bernd,
Having some trouble with the vector.
fmat <- as.matrix(ggbfmales)[father,]
fhet <- sum(fmat==1)
father.half <- ifelse(fmat==1, sample(c(0,2), fhet, replace = T), fmat)
Error in sample.int(length(x), size, replace, prob) :
vector size cannot be NA
Where am I going wrong here?
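One possible cause, assuming the father's genotype vector contains missing values: `sum(fmat == 1)` is then NA, and `sample()` cannot take NA as its size. The sketch below uses an invented vector in place of `as.matrix(ggbfmales)[father,]` and replaces the heterozygous positions by index instead of using `ifelse()`.

```r
set.seed(1)  # only to make the example repeatable

# Hypothetical father genotype vector containing a missing value.
fmat <- c(loc1 = 0, loc2 = 1, loc3 = NA, loc4 = 2, loc5 = 1)

fhet <- sum(fmat == 1)     # NA, because one genotype is missing
het  <- which(fmat == 1)   # which() skips the NA position

# Draw one random homozygous state per heterozygous locus.
father.half <- fmat
father.half[het] <- sample(c(0, 2), length(het), replace = TRUE)
```

Missing genotypes stay NA in `father.half`; `sum(fmat == 1, na.rm = TRUE)` would also give a usable count if the original `ifelse()` form is preferred.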
Hi Bernd,
I am using a fresh install of R and RStudio on a brand new computer with Pop!_OS 18.04, and I noticed that when computing Shannon Diversity, the function outputs straight 1s for every metric. I have not had the same issue on my Mac, even with the same version of R and dartR. The remainder of dartR appears to function as normal, that is filtering and other functions produce identical outputs to my Mac. I have also used gl.diversity on Ubuntu/Pop!_OS before without any issue, though that was quite a while ago. I am using the same dataset that I provided you in 2018 to help write the function. Any ideas? Thanks in advance for helping with this.
Hi Bernd & Arthur,
I was just running gl.outflank and realised that I'd written one of the settings, "LeftTrimFraction", as "eftTrimFraction", but it didn't give me any error. Is that because you've built in some cunning typo-tolerance, or is it not reading that setting? Cheers, Olly
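A general R behaviour that could explain this, shown with a hypothetical function rather than the gl.outflank source: when a function accepts `...`, a misspelled argument name is silently absorbed and the default value is used. Whether gl.outflank is written this way is an assumption.

```r
# Hypothetical wrapper; this is not the gl.outflank source.
trim_settings <- function(LeftTrimFraction = 0.05, ...) {
  LeftTrimFraction
}

used    <- trim_settings(LeftTrimFraction = 0.1)  # the named argument is used
ignored <- trim_settings(eftTrimFraction = 0.1)   # absorbed by ..., default returned
```

Without `...` in the signature, R would stop with an "unused argument" error instead.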
Currently it is quite tedious to have two repositories (thought it would be easier, but we need to learn to use only one).
Hi there,
I was very excited to try out the new script for converting a gl to a treemix file but it doesn't seem to have worked. It took a long time to process (20812 snps in the gl) and it created a .gz file as intended but when I try to open the file, it says "does not appear to be a valid archive". The size of the .gz file is 1385KB - does that seem right?
I was a bit confused by the description of the gl2treemix function itself as it says "The file needs to be gzipped before it will be recognised by treemix." but the outfile we need to specify has to be xxx.gz so is this being done within the script?
Apologies if this is a silly question - I am new to this.
Thanks,
Yael
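A quick check that might help narrow this down, sketched in base R: a gzip-compressed file starts with the two bytes 1f 8b, so reading them shows whether a file is really compressed or just plain text with a .gz name. The file written here is a stand-in with made-up contents, not gl2treemix output. Note that a single gzipped text file is not an archive of several files, so some archive managers may refuse it even when it is valid.

```r
# Write a small gzip-compressed text file as a stand-in.
path <- tempfile(fileext = ".gz")
con <- gzfile(path, open = "wt")
writeLines(c("popA popB", "10,2 8,4"), con)
close(con)

# Gzip files start with the two magic bytes 1f 8b.
magic   <- readBin(path, what = "raw", n = 2)
is_gzip <- identical(magic, as.raw(c(0x1f, 0x8b)))

# gzfile() reads the compressed contents back as text.
con2 <- gzfile(path, open = "rt")
lines_back <- readLines(con2)
close(con2)
```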
Hi DARTr team
Can we calculate the heterozygosity of single individuals (across large number of SNP loci)?
Thanks, Jenny
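A minimal base R sketch of one way to do this: for each individual, divide the number of heterozygous calls (coded 1) by the number of loci actually scored. The matrix is invented; `as.matrix(gl)` on a genlight object would give the same individuals-by-loci layout.

```r
geno <- matrix(c(0, 1, 1, NA,
                 1, 1, 2, 1,
                 0, 0, NA, 2),
               nrow = 3, byrow = TRUE,
               dimnames = list(paste0("ind", 1:3), paste0("loc", 1:4)))

n_scored <- rowSums(!is.na(geno))           # loci with a genotype call
n_het    <- rowSums(geno == 1, na.rm = TRUE) # heterozygous calls
ho_ind   <- n_het / n_scored                 # observed heterozygosity per individual
```

Keeping `n_scored` alongside the result makes it easy to spot individuals whose estimate rests on few loci.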
Hi ArthR and grubR,
For your digestion - Some things I've encountered and wasn't sure they needed tweaking or I just wasn't using them correctly:
Adios,
Hi,
I'm trying to use gl2svdquartets but I keep getting the following error message:
gl2svdquartets(gl, outfile="svd.nex", outpath=getwd(), method=1)
Starting gl2svdquartets: Create nexus file
Extacting SNP bases and creating records for each individual
Error in strsplit(snp, ">") : non-character argument
I'm converting my vcf file to a genlight object using vcfR2genlight, and then converting that to the svdquartets format. Do you have any suggestions about what could be causing this error message?
Thank you,
Ana