waldronlab / cnvranger Goto Github PK
View Code? Open in Web Editor NEWFunctionality for CNV analysis
Home Page: https://bioconductor.org/packages/CNVRanger
Functionality for CNV analysis
Home Page: https://bioconductor.org/packages/CNVRanger
Hi, @lgeistlinger @viniciushdasilva
Perhaps we can rename the package to popRanges
?
What do you think?
Regards,
Marcel
I am contacting you because I am following the CNVRanger manual called "Summarization and quantitative trait analysis of CNV ranges" but I am unable to complete the final GWAS.
The problem is:
segs.pvalue.gr <- cnvGWAS(phen.info, chr.code.name=chr.code.name, method.m.test="none")
Error: logical subscript contains NAs
I have checked all the input files and there are no NA values in any of them.
I follow all the step without any problem, and with your files I can get the final results. I tried to make the GDS and import the LRR/BAF values, but an error ocurr "CNV.gds' has been created or opened."
Do you know which could be the problem?
I hope you can help me, thank you in advance!
Hi! I have used CNVRanger to analysis my data with different phenotype. Now, i have a question about the results. After i run the following code : "segs.pvalue.gr <- cnvGWAS(phen.info, chr.code.name=chr.code.name, method.m.test="none")", I get a the segs.pvalue.gr. However, about the results, i have questions. What the meaning of "SegName", and what can I know the results of CNV is AMP of DEL for corresponding Phenotype?
Thank you very much!
I have been have been trying to use the package CNVRanger.
The package works well except at the :
plotRecurrentRegions(cnvrs, genome="mm10", chr="chr1")
It shoots an error:
Error in .getGenomeFromGRange(range, defaults[["genome"]]) :
A genome must be supplied when creating this object.
The vignette, does not mention supplying genome information before this step.
Does the package not work for mm10?
Any other previous steps give me no error.
Could you please help me out on what might be going on here? As even in the vignette, the organism is mentioned for the first time only at this step.(
plotRecurrentRegions)
I will be grateful for any suggestions.
Thanks for your time
Best,
Yashna Paul
PhD student
DKFZ
Some comments on your .createFolderTree
function:
- always check file.exists
before dir.create
—> otherwise a warning is issued
- avoid redundant code (do case division at lowest possible level)
- unlist(list())
—> c()
- paste
-> file.path
for file paths
- vectorize
- if … else if … else
instead of if ... if ... if
- avoid code nesting … hardly readable —> make atomic statements
An improved implementation could look like this:
.createFolderTree <- function(name, folder=NULL){
# data dir
if(is.null(folder)) folder <- rappdirs::user_data_dir("cnvAnalyzeR")
if(!file.exists(folder)) dir.create(folder, recursive=TRUE)
# project directory
proj.dir <- file.path(folder, name)
if(!file.exists(proj.dir)) dir.create(proj.dir)
# subdirs
dirs <- c("Inputs", "PLINK", "Results")
all.paths <- file.path(proj.dir, dirs)
for(d in all.paths) if(!file.exists(d)) dir.create(d)
return(all.paths)
}
Thank you for making nice tool.
Is there any way for statistical test for binary phenotype, not quantitative phenotype?
I want to compare CNV profiles between samples with phenotype A group and samples with phenotype B group.
Hi @viniciushdasilva - I think it will be good to have a timeline for the migration. Will you be able to move things over until May 18th? That's the deadline for changes included in the upcoming Bioc3.13 release https://bioconductor.org/developers/release-schedule/? After that it will be another 6 months for the package being in the current state, so I think we should make an effort to meet the deadline.
I wonder, how this package describe Loss of Heterogeneity(LOH) ?
R
's own lm
be used instead of plink
(to avoid external dependencies)? Are results the same?VariantExperiment
?Please see https://yihui.name/rd2roxygen/ for a tutorial.
This package is available on CRAN
Hi everyone,
I'm pretty new to CNV ranger, however I was able to follow the vignette for most of the steps until the chapter 7 (CNV-expression association analysis).
I tried many times with my data but was always facing the same error:
Restricting analysis to 24 intersecting samples
Preprocessing RNA-seq data ...
Summarizing per-sample CN state in each CNV region
Excluding 4881 cnvrs not satisfying min.samples threshold
Error in .getStates(cnvrs, calls, multi.calls, min.samples, verbose = verbose) :
No CNV region satisfying min.samples threshold
I also tried to create a subset for testing since my dataset is pretty huge, restricting the analysis to chromosome 1 and 2 with the command:
sel.cnvrs <- subset(cnvrs, seqnames %in% paste0("chr", 1:2))
GRanges object with 4881 ranges and 3 metadata columns:
seqnames ranges strand | freq type pvalue
<Rle> <IRanges> <Rle> | <numeric> <character> <numeric>
[1] chr1 3073253-3074322 * | 3 both 0.911548
[2] chr1 3102016-3102125 * | 3 both 0.911548
[3] chr1 3205901-3671498 * | 3 both 0.753688
[4] chr1 3680155-3681788 * | 3 both 0.911548
[5] chr1 3752010-3754360 * | 3 both 0.911548
... ... ... ... . ... ... ...
[4877] chr2 181724042-181724942 * | 5 gain 0.265666
[4878] chr2 181763332-181827797 * | 5 gain 0.265666
[4879] chr2 181837854-181857461 * | 5 gain 0.265666
[4880] chr2 181864337-181870830 * | 5 gain 0.265666
[4881] chr2 181896823-181898712 * | 5 gain 0.265666
-------
seqinfo: 21 sequences from an unspecified genome; no seqlengths
and also reducing the number of min.samples when running the command:
res <- cnvEQTL(sel.cnvrs, grl, rse, window = "1Mbp", verbose = TRUE, min.samples = 2)
Just for better understanding i will print here my objects that i put inside the cnvEQTL function:
grl
GRangesList object of length 24:
$KPC438
GRanges object with 55235 ranges and 5 metadata columns:
seqnames ranges strand | Mean Gene GeneID state Phenotype
<Rle> <IRanges> <Rle> | <numeric> <character> <character> <numeric> <character>
[1] chr1 3073253-3074322 * | -0.1428 4933401J01Rik ENSMUSG00000102693 2 epithelial
[2] chr1 3102016-3102125 * | -0.1428 Gm26206 ENSMUSG00000064842 2 epithelial
[3] chr1 3205901-3671498 * | -0.1428 Xkr4 ENSMUSG00000051951 2 epithelial
[4] chr1 3252757-3253236 * | -0.1428 Gm18956 ENSMUSG00000102851 2 epithelial
[5] chr1 3365731-3368549 * | -0.1428 Gm37180 ENSMUSG00000103377 2 epithelial
... ... ... ... . ... ... ... ... ...
[55231] chrY 90603501-90605864 * | -1.4738 Gm28300 ENSMUSG00000099619 1 epithelial
[55232] chrY 90665346-90667625 * | -1.4738 Gm28301 ENSMUSG00000099399 1 epithelial
[55233] chrY 90752427-90755467 * | -1.4738 Gm21860 ENSMUSG00000095366 1 epithelial
[55234] chrY 90753057-90763485 * | -1.4738 Mid1-ps1 ENSMUSG00000095134 1 epithelial
[55235] chrY 90784738-90816465 * | -1.4738 Gm47283 ENSMUSG00000096768 1 epithelial
-------
seqinfo: 21 sequences from an unspecified genome; no seqlengths
<23 more elements>
rse
class: RangedSummarizedExperiment
dim: 2963 24
metadata(0):
assays(1): rcounts
rownames(2963): ENSMUSG00000051951 ENSMUSG00000025900 ... ENSMUSG00000038628 ENSMUSG00000098505
rowData names(0):
colnames(24): KPC438 KPC479B ... KPCZ532 KPCZ746
colData names(0):
However the result never changed and I'm always getting the same error.
Before sharing any data i would like to know if any of you ever got the same error as me
Thank you so much in advance
Sessioninfo:
R version 4.3.1 (2023-06-16)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu 20.04.6 LTS
Matrix products: default
BLAS/LAPACK: /data/research/restools/miniconda/envs/R.4.3.1/lib/libopenblasp-r0.3.24.so; LAPACK version 3.11.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: Etc/UTC
tzcode source: system (glibc)
attached base packages:
[1] grid stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] ComplexHeatmap_2.16.0 EnsDb.Mmusculus.v79_2.99.0 ensembldb_2.24.1
[4] AnnotationFilter_1.24.0 GenomicFeatures_1.52.2 AnnotationDbi_1.62.2
[7] Gviz_1.44.2 SummarizedExperiment_1.30.2 Biobase_2.60.0
[10] MatrixGenerics_1.12.3 matrixStats_1.0.0 BSgenome.Mmusculus.UCSC.mm10_1.4.3
[13] BSgenome_1.68.0 rtracklayer_1.60.1 Biostrings_2.68.1
[16] XVector_0.40.0 regioneR_1.32.0 AnnotationHub_3.8.0
[19] BiocFileCache_2.8.0 dbplyr_2.3.4 tibble_3.2.1
[22] CNVRanger_1.16.5 RaggedExperiment_1.24.2 GenomicRanges_1.52.1
[25] GenomeInfoDb_1.36.4 IRanges_2.34.1 S4Vectors_0.38.2
[28] BiocGenerics_0.46.0
loaded via a namespace (and not attached):
[1] RColorBrewer_1.1-3 shape_1.4.6 rstudioapi_0.15.0 magrittr_2.0.3
[5] rmarkdown_2.25 GlobalOptions_0.1.2 BiocIO_1.10.0 zlibbioc_1.46.0
[9] vctrs_0.6.4 memoise_2.0.1 Rsamtools_2.16.0 RCurl_1.98-1.12
[13] base64enc_0.1-3 htmltools_0.5.6.1 S4Arrays_1.0.6 progress_1.2.2
[17] curl_5.1.0 Formula_1.2-5 htmlwidgets_1.6.2 cachem_1.0.8
[21] GenomicAlignments_1.36.0 iterators_1.0.14 mime_0.12 lifecycle_1.0.3
[25] pkgconfig_2.0.3 Matrix_1.6-1.1 R6_2.5.1 fastmap_1.1.1
[29] clue_0.3-65 GenomeInfoDbData_1.2.10 shiny_1.7.5.1 digest_0.6.33
[33] colorspace_2.1-0 Hmisc_5.1-1 RSQLite_2.3.1 filelock_1.0.2
[37] fansi_1.0.5 httr_1.4.7 abind_1.4-5 compiler_4.3.1
[41] doParallel_1.0.17 bit64_4.0.5 htmlTable_2.4.1 backports_1.4.1
[45] BiocParallel_1.34.2 DBI_1.1.3 biomaRt_2.56.1 rappdirs_0.3.3
[49] DelayedArray_0.26.7 rjson_0.2.21 tools_4.3.1 foreign_0.8-85
[53] interactiveDisplayBase_1.38.0 httpuv_1.6.12 nnet_7.3-19 glue_1.6.2
[57] restfulr_0.0.15 promises_1.2.1 checkmate_2.2.0 cluster_2.1.4
[61] generics_0.1.3 gtable_0.3.4 data.table_1.14.8 hms_1.1.3
[65] xml2_1.3.5 utf8_1.2.4 foreach_1.5.2 BiocVersion_3.17.1
[69] pillar_1.9.0 stringr_1.5.0 limma_3.56.2 later_1.3.1
[73] circlize_0.4.15 dplyr_1.1.3 lattice_0.22-5 deldir_1.0-9
[77] bit_4.0.5 biovizBase_1.48.0 tidyselect_1.2.0 locfit_1.5-9.8
[81] knitr_1.44 gridExtra_2.3 ProtGenerics_1.32.0 edgeR_3.42.4
[85] xfun_0.40 stringi_1.7.12 lazyeval_0.2.2 yaml_2.3.7
[89] evaluate_0.22 codetools_0.2-19 interp_1.1-4 BiocManager_1.30.22
[93] cli_3.6.1 rpart_4.1.21 xtable_1.8-4 munsell_0.5.0
[97] dichromat_2.0-0.1 Rcpp_1.0.11 png_0.1-8 XML_3.99-0.14
[101] parallel_4.3.1 ellipsis_0.3.2 ggplot2_3.4.4 blob_1.2.4
[105] prettyunits_1.2.0 jpeg_0.1-10 latticeExtra_0.6-30 bitops_1.0-7
[109] VariantAnnotation_1.46.0 scales_1.2.1 crayon_1.5.2 GetoptLong_1.0.5
[113] rlang_1.1.1 KEGGREST_1.40.1
Now that we have the core of the package, it's time to start getting familiar with building and installing of the package -- the ultimate test whether you package is working or not.
On the command line:
R CMD build cnvAnalyzeR
--> checks dependencies, builds the vignette, and creates the source tar.gz
R CMD check --no-build-vignettes cnvAnalyzeR_0.99.1.tar.gz
(resulting warnings
and errors
need to be resolved before submission to Bioconductor
)R CMD INSTALL cnvAnalyzeR_0.99.1.tar.gz
(installs the package on your system so that it can be loaded via library
from within R
)Corresponding functionality for building, checking, and installing is also available in devtools
and RStudio
(@LiNk-NY knows more about that).
Responsibilities as discussed:
DESCRIPTION (@lgeistlinger)
NAMESPACE
Looks good. You might want to rename importLRR_BAF to use 'camelCase' so
that it is consistent with other functions. (@viniciushdasilva rename to importLrrBaf
)
R/
Consider providing support for MultiAssayExperiment inputs that contain
appropriate data for running cnvExprAssoc. @lgeistlinger
.largest can be simplified to: @lgeistlinger
.largest <- function(scores, ranges, qranges) {
ind <- IRanges::which.max(GenomicRanges::width(ranges))
vapply(seq_along(scores), function(i) scores[[i]][ind[i]], scores[[1]])
}
Check on your @examples
sections in pheno_assoc.R
.
Can they be directly executed when starting R
and only loading your package via library
?
(This is automatically checked via R CMD check
and running examples for all exported functions are a prerequisite for acceptance in Bioconductor
)
Whenever example data is needed, make sure to place small data slices in our extdata
folder of the package. This data can then be incorporated in your @examples
sections via:
dat <- system.file("extdata/myData.txt", package="cnvAnalyzeR")
# do something
Note that there is a limit in package size:
https://www.bioconductor.org/developers/package-guidelines/#correctness
The source package resulting from running R CMD build should occupy less than 4MB on disk. The package should require less than 5 minutes to run R CMD check --no-build-vignettes. Using the --no-build-vignettes option ensures that the vignette is built only once.
I got error "'seqnames' must have the length of the object to construct (1) or length 1" when running cnvGWAS
.
I recoded the original chromosome names from chr1...chrX, chrY
to 1: 24 as required for integer and provided a chr.code.name
as this
chr.code.name <- data.frame(V1 = c(1:24),V2 = paste0('chr', c(1:22, 'X', 'Y')))
res_comp2 <- cnvGWAS(phen.info, chr.code.name = chr.code.name, method.m.test="BH")
Error in stop_if_wrong_length("'seqnames'", ans_len) :
'seqnames' must have the length of the object to construct (1) or length 1
Do you have some suggestions?
I get this error when trying to run cnvGWAS
:
> segs.pvalue.gr <- cnvGWAS(phen.info, chr.code.name=chr.code.name, method.m.test="none")
Error in stop_if_wrong_length("'seqnames'", ans_len) :
'seqnames' must have the length of the object to construct (1) or length 1
I've been searching on the internet, if someone had encountered the same error, but haven't found anything :C
Here's the pipeline I've been running before this step:
cnv.calls <- read.csv("cnvs.for.CNVRanger.csv", as.is=TRUE, sep = "\t")
cnv.calls <- GenomicRanges::makeGRangesListFromDataFrame(cnv.calls,
split.field="sample.id", keep.extra.columns=TRUE)
sort(cnv.calls)
phen.df <- read.csv("pheno.for.CNVRanger.csv", as.is=TRUE, sep = "\t")
re <- RaggedExperiment::RaggedExperiment(cnv.calls, colData=phen.df)
phen.info <- setupCnvGWAS("project1", phen.loc="pheno.for.CNVRanger.csv",
cnv.out.loc=re, map.loc = NULL)
chr.code.name <- data.frame(
V1 = c(23, 24),
V2 = c("X", "Y"))
segs.pvalue.gr <- cnvGWAS(phen.info, chr.code.name=chr.code.name, method.m.test="none")
Hope, you could help me out with this one :)
Dear Developers,
The functionality of populationRanges() looks very useful for our research, but I got the following problems while trying to build on section "Identifying recurrent regions" from your vignette. Can you help or suggest workarounds?
In addition to Gains/Losses we have CN-LOH type of CNVs. Technically it is "state=2", but it makes sense to include them as a third type of CNVs instead of just removing them. Do you have any suggestion how to do it?
Looking into your source code, populationRanges(..., est.recur = TRUE) calls .estimateRecurrence(pranges, grl) internally, which seems to give you a choice of mode = c("approx", "perm"). Unfortunately populationRanges() itself now has a parameter mode = c("density", "RO"), which makes methods "approx" and "perm" inaccessible.
It seems logical that populationRanges(..., mode = "RO", ro.thresh=0.1, classify.ranges=FALSE, est.recur = TRUE) would estimate recurrence without breaking CNVs into types. At least this is how the meaning of "classify.ranges=FALSE" is explained in your help.
But it will try to use types during .estimateRecurrence() stage anyway and die with "Error in seq_len(len - 1) : argument must be coercible to non-negative integer" even on toy GRangesList example from populationRanges() help.
We would like to process ~15000 CNVs.
But populationRanges(grl.IntOGen, mode = "RO", ro.thresh=0.1, classify.ranges=FALSE, est.recur = TRUE) keep dying with "Error in seq.default(1, length(both.scores) - 1, by = 2) : wrong sign in 'by' argument" even for the simplistic case of 3337 CNVs. We downloaded the list of genes from https://intogen.org/search, adding genomic regions using Bioconductor's Ensembl annotation and fabricated state=3 for ROLE=Act / state=1 for ROLE=LoF. I can upload you the 1.1M .rds file with GRangesList if you need reproducible example.
Thank you in advance,
Daniil Sarkisyan
Dear researcher,
Thank you so much for providing this useful tool!
I have two question about "Summarizing individual CNV calls across a population" in CNVRanger.
(1) Once we finish the step of "Trimming low-density areas", "Reciprocal overlap" and "Identifying recurrent regions", how can I get the list of the remaining ranges with the sample information. Currently, I only have the information on "Seqnames, ranges, strand, freq, type and/or pval" for each range, such as "chr1: 1896112-2004603". But I have no idea which samples these CNV exactly occurs in.
(2) I have got a lot of CNV regions in the population of healthy people and patients with a rare disease. In this case, among the three method for filtration: (1)"Trimming low-density areas", (2)"Reciprocal overlap" and (3)"Identifying recurrent regions", which one do you suggest to use?
Best
Hi:
Thank you very much for sharing a very useful tool, I really appreciate it.
First of all, I followed your instructions (https://bioconductor.org/packages/devel/bioc/vignettes/AnnotationHub/inst/doc/AnnotationHub-HOWTO.html),
and all worked well.
Then, I started to anlyze my data and got two errors.
1.
cnvrs <- populationRanges(grl, density=0.1, est.recur=TRUE)
seq.default(1, length(both.scores) - 1, by = 2) error:
wrong sign in "by" argument
This is sort of weird to me, because the default paremter (est.recur=FALSE) ran without any errors , which is for the "Trimming low-density areas" accourding to your website.
cnvrs <- populationRanges(grl, density=0.1)
On top of that, I have encountered the second error when I performed the below code to visualize the oncoprint.
2.
cnvOncoPrint(grl, cgenes)
normalizeDoubleBracketSubscript(i, x, exact = exact, allow.NA = TRUE, error:
subscript is out of bounds
I am analyzing human data and my session info are as follows.
3.
sessionInfo()
R version 4.3.2 (2023-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)
Matrix products: default
locale:
[1] LC_COLLATE=Japanese_Japan.utf8 LC_CTYPE=Japanese_Japan.utf8 LC_MONETARY=Japanese_Japan.utf8
[4] LC_NUMERIC=C LC_TIME=Japanese_Japan.utf8
time zone: Asia/Tokyo
tzcode source: internal
attached base packages:
[1] grid stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] SummarizedExperiment_1.32.0 MatrixGenerics_1.14.0
[3] matrixStats_1.2.0 BSgenome.Hsapiens.UCSC.hg38_1.4.5
[5] BSgenome_1.70.2 rtracklayer_1.62.0
[7] BiocIO_1.12.0 Biostrings_2.70.3
[9] XVector_0.42.0 regioneR_1.34.0
[11] ensembldb_2.26.0 AnnotationFilter_1.26.0
[13] GenomicFeatures_1.54.4 AnnotationDbi_1.64.1
[15] Biobase_2.62.0 AnnotationHub_3.10.0
[17] BiocFileCache_2.10.1 dbplyr_2.5.0
[19] Gviz_1.46.1 CNVRanger_1.18.1
[21] RaggedExperiment_1.26.0 GenomicRanges_1.54.1
[23] GenomeInfoDb_1.38.8 IRanges_2.36.0
[25] S4Vectors_0.40.2 BiocGenerics_0.48.1
[27] dplyr_1.1.4
loaded via a namespace (and not attached):
[1] RColorBrewer_1.1-3 shape_1.4.6.1 rstudioapi_0.15.0
[4] magrittr_2.0.3 rmarkdown_2.26 GlobalOptions_0.1.2
[7] zlibbioc_1.48.2 vctrs_0.6.5 memoise_2.0.1
[10] Rsamtools_2.18.0 RCurl_1.98-1.14 base64enc_0.1-3
[13] BiocBaseUtils_1.4.0 htmltools_0.5.7 S4Arrays_1.2.1
[16] progress_1.2.3 curl_5.2.1 SparseArray_1.2.4
[19] Formula_1.2-5 htmlwidgets_1.6.4 cachem_1.0.8
[22] GenomicAlignments_1.38.2 iterators_1.0.14 mime_0.12
[25] lifecycle_1.0.4 pkgconfig_2.0.3 Matrix_1.6-1.1
[28] R6_2.5.1 fastmap_1.1.1 clue_0.3-65
[31] GenomeInfoDbData_1.2.11 shiny_1.8.0 digest_0.6.35
[34] colorspace_2.1-0 Hmisc_5.1-2 RSQLite_2.3.5
[37] filelock_1.0.3 fansi_1.0.6 httr_1.4.7
[40] abind_1.4-5 compiler_4.3.2 doParallel_1.0.17
[43] bit64_4.0.5 withr_3.0.0 htmlTable_2.4.2
[46] backports_1.4.1 BiocParallel_1.36.0 DBI_1.2.2
[49] biomaRt_2.58.2 rappdirs_0.3.3 DelayedArray_0.28.0
[52] rjson_0.2.21 tools_4.3.2 foreign_0.8-85
[55] interactiveDisplayBase_1.40.0 httpuv_1.6.14 nnet_7.3-19
[58] glue_1.7.0 restfulr_0.0.15 promises_1.2.1
[61] checkmate_2.3.1 cluster_2.1.4 generics_0.1.3
[64] gtable_0.3.4 data.table_1.15.2 hms_1.1.3
[67] xml2_1.3.6 utf8_1.2.4 foreach_1.5.2
[70] BiocVersion_3.18.1 pillar_1.9.0 stringr_1.5.1
[73] later_1.3.2 circlize_0.4.16 lattice_0.21-9
[76] deldir_2.0-4 bit_4.0.5 biovizBase_1.50.0
[79] tidyselect_1.2.1 ComplexHeatmap_2.18.0 knitr_1.45
[82] gridExtra_2.3 ProtGenerics_1.34.0 xfun_0.42
[85] stringi_1.8.3 lazyeval_0.2.2 yaml_2.3.8
[88] evaluate_0.23 codetools_0.2-19 interp_1.1-6
[91] tibble_3.2.1 BiocManager_1.30.22 cli_3.6.2
[94] rpart_4.1.21 xtable_1.8-4 munsell_0.5.0
[97] dichromat_2.0-0.1 Rcpp_1.0.12 png_0.1-8
[100] XML_3.99-0.16.1 parallel_4.3.2 ellipsis_0.3.2
[103] ggplot2_3.5.0 blob_1.2.4 prettyunits_1.2.0
[106] jpeg_0.1-10 latticeExtra_0.6-30 bitops_1.0-7
[109] VariantAnnotation_1.48.1 scales_1.3.0 purrr_1.0.2
[112] crayon_1.5.2 GetoptLong_1.0.5 rlang_1.1.3
[115] KEGGREST_1.42.0
Thank you in advance!
Hi @lgeistlinger,
In the help page of cnvEQTL
function it reads:
... it is typically recommended to exclude from the analysis (i) genes with fewer than r reads per million reads mapped ...
... Use the min.cpm and min.samples arguments, respectively....
I was not able to find the min.cpm argument. However, it seems that the number of genes in my output was drastically reduced in comparison with my input object (i.e. RangedSummarizedExperiment). Initially I had 23k but only 4k remained in the output.
Could you check on that issue please?
I have used Salmon quantification (https://combine-lab.github.io/salmon/) and not read counts per million. Should we think in a different filter parameter for Salmon quantification?
Best
Hi! I can't find how to add covariate file such as peer,batch...And it's neccessuary to add them,as cnveqtl doesn't support Negative cpm.so it is a good choice to add them in eqtl model but not to use residuals by peer (negative possibly)
if i can receive my answer,I'll appericiate it sincerely!thanks!
Demonstrate interface with oncoplot on a cnvrs x sample matrix.
Instead of CNVRs this could be eg CNV-containing genes - which is likely a frequent use case.
This should make use of RaggedExperiment::qreduceAssay.
Transform CNV calls to a RaggedExperiment as in CNVRanger's vignette, Section 4.3
Use your summarized population ranges (putatively mapped to altered genes as described #17 depending on what the rows of the oncoplot should be) as query regions for RaggedExperiment::qreduceAssay
Apply RaggedExperiment::qreduceAssay as described in RaggedExperiment's vignette, Section 6.4
This will result in a matrix (query regions x samples) that stores CN state in this region and should be ready for oncoplot.
@DarioS moving over from email to issue tracker:
If there are more than two copy number levels for a gene, CNVRanger does an ANOVA-like test, which is a test for differences in means. Is there a way to treat the copy number as a numeric measurement and test for linearity of response? I have copy number estimates from whole genome sequencing and a gene may have copy numbers such as two, four and nine in different samples and I want to know if increasing copy number is associated with increased abundance, rather than just different abundance. I am not interested in analysing distant copy numbers, but the copy number of the gene itself.
Hi,
I was following the vignette to get the association between the CNVs of my samples and the expression and came up with the following error:
res <- cnvEQTL(cnvrs, grl, rse, window = NULL, verbose = F)
Error in .testCnvExpr(y, cgenes[[i]], cnv.states[i, ], min.state.freq = min.samples, :
nr.states > 1 is not TRUE
For what understood this is related with the number of states left after filtering by the argument min.samples. So there need to be more that 1 state with more than "min.samples"(default=10) samples and if all the samples present a certain region the error will occur. This error can be bypassed the problem just setting the argument min.samples = 1, which it doesn't make much sense, or filtering the cnv regions like this: cnvrs <-subset(cnvrs, freq < ncol(rcounts)-min.samples)
It would be nice that you add this extra filtering step in the vignette or maybe it would be better to exclude just the problematic regions or throw a warning instead of an error.
Thanks for the package
From @lgeistlinger on August 8, 2017 17:54
(In CNV analysis, I assume also for similar data types such as somatic mutation), it is often of interest to summarize invidual calls across the
population, (i.e. define CNV regions), for subsequent association analysis with e.g. phenotype data.
https://www.ncbi.nlm.nih.gov/pubmed/19812545
https://www.ncbi.nlm.nih.gov/pubmed/22539667 (Figure 1 for a sketch of the concept)
In the simplest case, this just merges overlapping individual calls into summarized regions.
However, this typically inflates CNVR size and trimming low-density areas
(usually <10% of the total contributing individual calls within a summarized region)
is advisable.
Alternatively, a threshold for reciprocal overlap between any two individual calls
can also be applied.
That means such a function would take a RaggedExperiment as Input and would return
a GRanges object storing the summarized genomic regions.
grl <- GRangesList(
sample1 = GRanges( c("chr1:1-10", "chr2:15-18", "chr2:25-34") ),
sample2 = GRanges( c("chr1:1-10", "chr2:11-18" , "chr2:25-36") ),
sample3 = GRanges( c("chr1:2-11", "chr2:14-18", "chr2:26-36") ),
sample4 = GRanges( c("chr1:1-12", "chr2:18-35" ) ),
sample5 = GRanges( c("chr1:1-12", "chr2:11-17" , "chr2:26-34") ) ,
sample6 = GRanges( c("chr1:1-12", "chr2:12-18" , "chr2:25-35") )
)
re <- RaggedExperiment(grl)
populationRanges(re, density=0.1)
GRanges object with 3 ranges and 0 metadata columns:
seqnames ranges strand
[1] chr1 [ 1, 12] *
[2] chr2 [11, 18] *
[3] chr2 [25, 36] *
populationRanges(re, density=0)
GRanges object with 2 ranges and 0 metadata columns:
seqnames ranges strand
[1] chr1 [ 1, 12] *
[2] chr2 [11, 36] *
populationRanges(re, density=1)
GRanges object with 13 ranges and 0 metadata columns:
seqnames ranges strand
[1] chr1 [ 1, 1] *
[2] chr1 [ 2, 10] *
[3] chr1 [11, 11] *
[4] chr1 [12, 12] *
[5] chr2 [11, 13] *
... ... ... ...
[9] chr2 [19, 24] *
[10] chr2 [25, 25] *
[11] chr2 [26, 34] *
[12] chr2 [35, 35] *
[13] chr2 [36, 36] *
Where as merging and disjoining are suited for the ideal case of matching calls between individuals, technical and biological variation typically results in data with all kinds of overlaps and fuzzy region borders. That would be a way to account for that.
Copied from original issue: Bioconductor/RaggedExperiment#9
Vignette Section 6 (Overlap analysis) would benefit from some basic overlap functionality in the beginning of the section, to demonstrate eg how to carry over CNV type from summarized population ranges (CNVRs) to some query regions such as protein coding genes, ie something along the lines:
genes <- genes(TxDb)
cnvrs <- populationRanges(grl)
hits <- findOverlaps(genes, cnvrs, ignore.strand=TRUE)
cgenes <- genes[queryHits(hits)]
cgenes$type <- cnvrs$type[subjectHits(hits)]
cgenes
@viniciushdasilva think about a small illustrative example analysis of your phenotype association functionality for the corresponding section in our vignette
.
Using an appropriate subset of the Nelore CNV calls the association analysis with MT as for the PONE paper might be nice.
In order to make the analysis fast and illustrative and the needed data slices small (as given in the extdata
folder), you might also only wanna use a subset of samples.
Hi @lgeistlinger, first of all thank you for providing such a cool tool! Unfortunately, I come across following issue:
Restricting analysis to 18 intersecting samples
Preprocessing RNA-seq data ...
Summarizing per-sample CN state in each CNV region
Excluding 492 cnvrs not satisfying min.samples threshold
Analyzing 25 regions with >=1 gene in the given window
1 of 25
Error in .testCnvExpr(y, cgenes[[i]], cnv.states[i, ], min.state.freq = min.samples, :
nr.states > 1 is not TRUE
This issue has been discussed before in issue #30 and in issue #40. I send you all the necessary data via mail (gmail account). It would be amazing if you could look into it. These are the packages used:
attached base packages:
[1] grid stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] MultiAssayExperiment_1.28.0 SummarizedExperiment_1.32.0 MatrixGenerics_1.14.0
[4] matrixStats_1.2.0 plyranges_1.20.0 TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
[7] ensembldb_2.24.1 AnnotationFilter_1.24.0 GenomicFeatures_1.54.1
[10] AnnotationDbi_1.64.0 Biobase_2.62.0 **CNVRanger_1.19.0**
[13] devtools_2.4.5 usethis_2.2.2 circlize_0.4.15
[16] snpsel_0.0.0.9003 plotly_4.10.4 vcfR_1.15.0
[19] ComplexHeatmap_2.18.0 glue_1.7.0 lubridate_1.9.3
[22] forcats_1.0.0 stringr_1.5.1 dplyr_1.1.4
[25] purrr_1.0.2 readr_2.1.5 tidyr_1.3.0
[28] tibble_3.2.1 ggplot2_3.4.4 tidyverse_2.0.0
[31] BSgenome_1.68.0 rtracklayer_1.62.0 Biostrings_2.70.1
[34] XVector_0.42.0 maftools_2.18.0 RaggedExperiment_1.26.0
[37] GenomicRanges_1.54.1 GenomeInfoDb_1.38.5 IRanges_2.36.0
[40] S4Vectors_0.40.2 AnnotationHub_3.8.0 BiocFileCache_2.10.1
[43] dbplyr_2.4.0 BiocGenerics_0.48.1
loaded via a namespace (and not attached):
[1] splines_4.3.1 later_1.3.2 BiocIO_1.12.0 bitops_1.0-7 filelock_1.0.3
[6] rpart_4.1.23 XML_3.99-0.16 lifecycle_1.0.4 edgeR_4.0.7 doParallel_1.0.17
[11] processx_3.8.3 vroom_1.6.5 lattice_0.22-5 MASS_7.3-60.0.1 backports_1.4.1
[16] magrittr_2.0.3 rmarkdown_2.25 Hmisc_5.1-1 limma_3.58.1 yaml_2.3.8
[21] remotes_2.4.2.1 httpuv_1.6.13 sessioninfo_1.2.2 pkgbuild_1.4.3 Gviz_1.44.2
[26] DBI_1.2.1 RColorBrewer_1.1-3 abind_1.4-5 pkgload_1.3.3 zlibbioc_1.48.0
[31] biovizBase_1.48.0 RCurl_1.98-1.14 nnet_7.3-19 VariantAnnotation_1.46.0 rappdirs_0.3.3
[36] GenomeInfoDbData_1.2.11 ggrepel_0.9.5 vegan_2.6-4 permute_0.9-7 codetools_0.2-19
[41] DelayedArray_0.28.0 xml2_1.3.6 DNAcopy_1.76.0 tidyselect_1.2.0 shape_1.4.6
[46] base64enc_0.1-3 GenomicAlignments_1.38.0 jsonlite_1.8.8 GetoptLong_1.0.5 Formula_1.2-5
[51] ellipsis_0.3.2 pinfsc50_1.3.0 survival_3.5-7 iterators_1.0.14 foreach_1.5.2
[56] tools_4.3.1 progress_1.2.3 Rcpp_1.0.12 gridExtra_2.3 BiocBaseUtils_1.4.0
[61] SparseArray_1.2.3 xfun_0.41 mgcv_1.9-1 withr_2.5.2 BiocManager_1.30.22
[66] fastmap_1.1.1 latticeExtra_0.6-30 fansi_1.0.6 callr_3.7.3 digest_0.6.34
[71] timechange_0.2.0 R6_2.5.1 mime_0.12 colorspace_2.1-0 gtools_3.9.5
[76] jpeg_0.1-10 dichromat_2.0-0.1 biomaRt_2.58.0 RSQLite_2.3.4 utf8_1.2.4
[81] generics_0.1.3 data.table_1.14.10 prettyunits_1.2.0 httr_1.4.7 htmlwidgets_1.6.4
[86] S4Arrays_1.2.0 pkgconfig_2.0.3 gtable_0.3.4 blob_1.2.4 htmltools_0.5.7
[91] profvis_0.3.8 ProtGenerics_1.32.0 clue_0.3-65 scales_1.3.0 png_0.1-8
[96] knitr_1.45 rstudioapi_0.15.0 tzdb_0.4.0 rjson_0.2.21 checkmate_2.3.1
[101] nlme_3.1-164 curl_5.2.0 cachem_1.0.8 GlobalOptions_0.1.2 BiocVersion_3.17.1
[106] parallel_4.3.1 miniUI_0.1.1.1 foreign_0.8-86 desc_1.4.3 restfulr_0.0.15
[111] pillar_1.9.0 vctrs_0.6.5 urlchecker_1.0.1 promises_1.2.1 xtable_1.8-4
[116] cluster_2.1.6 htmlTable_2.4.2 evaluate_0.23 locfit_1.5-9.8 cli_3.6.2
[121] compiler_4.3.1 Rsamtools_2.18.0 rlang_1.1.3 crayon_1.5.2 interp_1.1-5
[126] ps_1.7.5 fs_1.6.3 stringi_1.8.3 deldir_2.0-2 viridisLite_0.4.2
[131] BiocParallel_1.36.0 munsell_0.5.0 lazyeval_0.2.2 Matrix_1.6-5 hms_1.1.3
[136] bit64_4.0.5 statmod_1.5.0 KEGGREST_1.42.0 shiny_1.8.0 interactiveDisplayBase_1.38.0
[141] memoise_2.0.1 bit_4.0.5 ape_5.7-1
Thanks for your help!
Best,
Moritz
Thank you for providing this tool and it seems very useful especially for it's ability to work with any diploid species as compared to GISTIC. I am working with CNVkit and have segmented CNV calls from multiple samples. Please see the snapshot below:
Chromosome | start | end | sample | cn |
---|---|---|---|---|
chr1 | 7,435 | 1,129,028 | T10 | 2 |
chr1 | 1,129,028 | 122,678,785 | T10 | 2 |
chr2 | 100 | 4,368,377 | T10 | 2 |
chr2 | 4,368,377 | 4,440,739 | T10 | 2 |
chr2 | 4,440,739 | 4,469,134 | T10 | 2 |
chr2 | 4,469,134 | 4,657,825 | T10 | 2 |
chr1 | 7,435 | 122,678,785 | T2 | 2 |
chr2 | 100 | 14,439,486 | T2 | 1 |
chr2 | 14,439,486 | 14,449,562 | T2 | 2 |
chr2 | 14,449,562 | 21,877,348 | T2 | 1 |
chr2 | 21,877,348 | 21,881,012 | T2 | 2 |
chr2 | 21,881,012 | 23,457,436 | T2 | 1 |
chr2 | 23,457,436 | 23,464,764 | T2 | 2 |
chr2 | 23,464,764 | 26,984,008 | T2 | 1 |
chr2 | 26,984,008 | 26,990,420 | T2 | 3 |
chr1 | 7,435 | 122,678,785 | T3 | 2 |
chr2 | 100 | 2,517,195 | T3 | 2 |
chr2 | 2,517,195 | 4,747,590 | T3 | 2 |
chr2 | 4,747,590 | 4,808,045 | T3 | 2 |
chr2 | 4,808,045 | 5,288,931 | T3 | 3 |
chr2 | 5,288,931 | 5,322,822 | T3 | 2 |
chr2 | 5,322,822 | 5,533,496 | T3 | 3 |
chr2 | 5,533,496 | 5,841,062 | T3 | 1 |
chr2 | 5,841,062 | 5,924,417 | T3 | 2 |
chr2 | 5,924,417 | 6,366,842 | T3 | 2 |
chr2 | 6,366,842 | 18,423,139 | T3 | 2 |
chr2 | 18,423,139 | 18,980,063 | T3 | 2 |
chr2 | 18,980,063 | 20,078,338 | T3 | 2 |
chr2 | 20,078,338 | 21,716,133 | T3 | 2 |
chr2 | 21,716,133 | 23,024,171 | T3 | 2 |
chr1 | 7,435 | 122,678,785 | T4 | 2 |
chr2 | 100 | 2,426,513 | T4 | 2 |
chr2 | 2,426,513 | 3,714,372 | T4 | 2 |
chr2 | 3,714,372 | 4,259,376 | T4 | 2 |
chr2 | 4,259,376 | 4,308,839 | T4 | 2 |
chr2 | 4,308,839 | 4,348,226 | T4 | 2 |
chr2 | 4,348,226 | 4,447,151 | T4 | 2 |
chr2 | 4,447,151 | 4,560,732 | T4 | 2 |
chr2 | 4,560,732 | 4,621,186 | T4 | 2 |
chr2 | 4,621,186 | 5,070,013 | T4 | 2 |
chr2 | 5,070,013 | 5,088,332 | T4 | 2 |
chr2 | 5,088,332 | 5,272,443 | T4 | 2 |
chr2 | 5,272,443 | 5,668,855 | T4 | 2 |
chr2 | 5,668,855 | 5,682,595 | T4 | 3 |
chr2 | 5,682,595 | 5,777,858 | T4 | 2 |
chr2 | 5,777,858 | 5,896,937 | T4 | 2 |
This is my first project with CNVkit and the segments I got seems to be very large while the example shown for the CNVRanger has very short intervals.
Hello waldronlab,
I found a bug in CNVRanger
due to an unhandled edge case. When calling the populationRanges
function with default parameters, if a recurrent region has a majority of normal copy number (i.e., state == 2), then CNVRanger:::.getType
returns character(0)
, breaking the vapply statement in CNVRanger::.classifyRegs
:
types <- vapply(stateL, .getType, character(1), type.thresh = type.thresh,
USE.NAMES = FALSE)
Minimal reprex:
states <- rep(2, 10)
CNVRanger:::.getType(states, type.thresh=0.1)
# character(0)
For the sample case, the only situation where the function would return is if type.thresh=0
since both fract.amp
and fract.del
will be equal to zero.
Best,
Christopher Eeles
Software Developer
Haibe-Kains Lab @ PMCC
Some comments on your getPLINK
function:
getPLINK
does not need all.paths[c(1,3)]
)setwd
: use absolute file pathsPLINK.zip
)as.character
not needed for get_os
switch
when several options availableunzip
-> use exdir
argumentchmod 755
for executables (corresponds to rwxr-xr-x
)system2
instead of system
(avoids need for quoting)grepl
: logical grepping (TRUE
/ FALSE
)An improved implementation could look like this:
.getPLINK <- function(plink.path, version="1.07"){
plink.url <- "http://zzz.bwh.harvard.edu/plink/dist/plink-"
plink.url <- paste0(plink.url, version, "-")
plink.file <- file.path(plink.path, "PLINK.zip")
# get os-specific binary
os <- rappdirs:::get_os()
os <- switch(os,
unix = "x86_64",
mac = "mac-intel",
win = "dos",
os)
plink.url <- paste0(plink.url, os, ".zip")
# download & unzip
download.file(plink.url, plink.file)
unzip(plink.file, exdir=plink.path)
# remove zip
file.remove(plink.file)
# path to plink binary
plink.file <- dir(plink.path, pattern = "plink*")
plink.path <- file.path(plink.path, plink.file, "plink")
Sys.chmod(plink.path, mode = "755")
suppressWarnings(
res <- system2(plink.path, args="--noweb", stdout=TRUE, stderr=FALSE)
)
res <- any( grepl(version, res) )
return(res)
}
Dear @lgeistlinger,
moving from email where I tried to reach you for a specific error appeared when trying to implement CNVRanger for 34 WES samples from colorectal cancer patients:
briefly, based on an international collaborative project between DKFZ and other Greek research institutions regarding personalized medicine in cancer (http://www.accc.gr/aboutACCC_info.html), we have acquired around 34 samples with DNA and RNA which were sequenced and pre-processed in the DKFZ bioinformatics facility. Overall, my major goal is to investigate, if there any specific mutational patterns in any of the 3 separated groups of patients, that might interrelate the presence of specific mutational patterns (KRASonly, BRAFonly and wild type), as the ultimate goal is to investigate the molecular landscape of these 3 defined groups-we also utilize additionally public TCGA data to enhance our sample size-
My major question would be for the actual analysis of the CNV data from our 34 patients-as already copy number alterations were called with cnvkit, and as a further step purity and ploidy estimated; for a start, I tried to analyze all data collectively, based on the segmented log2 rations, as in your vignette, with the following code:
head(DT)
# A tibble: 6 x 5
file chromosome start end log2
<chr> <chr> <int> <int> <dbl>
1 CRC_ACCC_01 1 12080 3743427 0.191
2 CRC_ACCC_01 1 3745760 12942340 0.00996
3 CRC_ACCC_01 1 12942900 13802754 -0.278
4 CRC_ACCC_01 1 13803254 43766223 -0.0250
5 CRC_ACCC_01 1 43766723 44684459 0.107
6 CRC_ACCC_01 1 44684588 103345463 -0.0443
smean <- DT$log2
state <- round(2^smean * 2)
state[state > 4] <- 4
DT$log2 <- state
names(DT)[names(DT) == "log2"] <- "state"
DT <- DT[DT$state != 2,]
head(DT)
# A tibble: 6 x 5
file chromosome start end state
<chr> <chr> <int> <int> <dbl>
1 CRC_ACCC_01 1 109778935 109823258 3
2 CRC_ACCC_01 1 152080055 152329130 3
3 CRC_ACCC_01 2 10500 28856096 3
4 CRC_ACCC_01 2 28863705 71742899 3
5 CRC_ACCC_01 2 71743258 74786985 3
6 CRC_ACCC_01 2 74787273 92325671 3
grl <- GenomicRanges::makeGRangesListFromDataFrame(DT, split.field="file", keep.extra.columns=TRUE)
saveRDS(grl, file = "Input.test.grl.CRC.rds")
grl.2 <- populationRanges(grl, density=0.1, est.recur=FALSE)
sel.freq.grl <- subset(grl.2, freq > 1)
library(EnsDb.Hsapiens.v75)
human.hg19.genes <- ensembldb::genes(EnsDb.Hsapiens.v75)
sel.hg19.genes <- subset(human.hg19.genes, gene_biotype == "protein_coding")
olaps <- GenomicRanges::findOverlaps(sel.hg19.genes, sel.freq.grl, ignore.strand=TRUE)
qh <- S4Vectors::queryHits(olaps)
sh <- S4Vectors::subjectHits(olaps)
cgenes <- sel.hg19.genes[qh]
cgenes$type <- sel.freq.grl$type[sh]
However, initially in the following step of trying the visualization through the function cnvOncoPrint:
cnvOncoPrint(sel.freq.grl, cgenes, top.features=15, top.samples = 34)
Error in as(calls, "RaggedExperiment") :
no method or default for coercing "GRanges" to "RaggedExperiment"
cnvOncoPrint(grl, cgenes, top.features=15, top.samples = 34)
Following `at` are removed: GAIN1, GAIN2, because no color was defined for them.
Error: Length of `graphics` should be the same as number of labels.
Thus, in your opinion, how this error could be fixed? It is something wrong with any of the input files? For simplicity I have attached also the initial created GRangeslist object;
sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] EnsDb.Hsapiens.v75_2.99.0 ensembldb_2.17.4 AnnotationFilter_1.17.1
[4] GenomicFeatures_1.45.2 AnnotationDbi_1.55.1 Biobase_2.53.0
[7] CNVRanger_1.9.0 RaggedExperiment_1.17.4 GenomicRanges_1.45.0
[10] GenomeInfoDb_1.29.8 IRanges_2.27.2 S4Vectors_0.31.5
[13] BiocGenerics_0.39.2 data.table_1.14.2 forcats_0.5.1
[16] stringr_1.4.0 dplyr_1.0.7 purrr_0.3.4
[19] readr_2.0.2 tidyr_1.1.4 tibble_3.1.5
[22] ggplot2_3.3.5 tidyverse_1.3.1
With Kind Regards,
Efstathios
Hi!
I have obtained unexpected qqplots. Checking the code, I noticed that in the pheno_assoc.R file, at line 126, the qqplot are generated using the adjusted p-values.
print(.qqunifPlot(segs.pvalue.gr$MinPvalueAdjusted, auto.key = list(corner = c(0.95, 0.05))))
Here an example:
Shouldn't the QQ plot be generated using the unadjusted p-values instead of the adjusted p-values? In fact, if I try to manually plot the raw unadjusted pvalues, I obtain:
I might not have all the information, but based on my understanding, this plot appears to be accurate.
Thank you in advance,
Davide
Good work on the phenotype functionality so far and breaking things into pieces and functions!
Here are some more suggestions on writing robust and efficient code and coding style in general:
https://bioconductor.org/developers/how-to/efficient-code/
https://www.bioconductor.org/developers/how-to/coding-style/
@LiNk-NY any additional helpful resources you can link us to?
@viniciushdasilva : the file should be rather called pheno_assoc.R
, as we will have also association with other features such as expression and genomic regions, which will go in other R
files. Please rename.
Please also check on my suggestions on getPLINK
and createFolderTree
(separate issues) and resolve similar things accordingly throughout your code.
Some more specific suggestions:
base::function
not needed (is an exception as base
is always the when R
is installed) (also: GenomicRanges
and S4Vectors
as I plan to import them entirely)
setwd
/ getwd
—> absolute file paths!packageStartupMessage
—> message
1:n
—> use seq_len
and seq_along
for robustness insteadas.integer(as.character(x))
—> as.integer(x)
as.data.frame(cbind())
—> data.frame()
if(PlinkVer != "TRUE")
—> if(!PlinkVer)
read.table -> write.table
?? —> use strsplit
BiocParallel
: registered()
defaultsreturn
: unreachable statementallow.fork=T
—> allow.fork=TRUE
(no partial argument matching for sake of robustness)read.gdsn
: can this be vectorized?cnvGWAS
: far too huge function, break into modular pieces (helper functions)Let me know in case things are unclear
Dear authors,
Thanks for the great package! However, I am wondering if the association analysis implementation can include co-variables such as age, gender, and genotype PCs? I tried to consult the document and also the code but found it is not implemented.
Do you plan to add this function in the near future? Do you have any suggestions for including covariables in the current implementations?
Best regards,
Zhenhua
Dear @lgeistlinger,
When I try to plot recurrentRegions with plotRecurrentRegions() function for using hg19. I found that most chromosomes works perfect, and return with nice image. But in some of chromosomes, I get the following errors:
> plotRecurrentRegions(cnvrs, genome="hg19", chr="chr8")
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'x' in selecting a method for function 'rev': argument must be coercible to non-negative integer
> plotRecurrentRegions(cnvrs, genome="hg19", chr="chr9")
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'x' in selecting a method for function 'rev': argument must be coercible to non-negative integer
Have you ever met this problem, and do you know how to solve them?
Thank you and best wishes,
SJ
I started editing the overleaf document.
In particular, I filled in on Introduction and structured the Methods according to the package vignette.
When adding content, please make sure to follow the format style of the references (specifically: list up to three authors, if there are more use et al.)
Furthermore, according to
https://academic.oup.com/bioinformatics/pages/instructions_for_authors
we have
Application notes: approximately 1,300 words or 1,000 words plus one figure
where I would go for the second option, ie. including a figure.
I suggest a two panel figure with a) workflow diagram, and b) illustration of results for an example dataset.
Hello;
I am trying to do association analysis for with CNV using CNV ranger. I am getting the following issue;
segs.pvalue.gr <- cnvGWAS(phen.info, chr.code.name=chr.code.name, method.m.test="none")
sh: /Users/sara/Library/Application Support/CNVRanger/example/PLINK/plink-1.07-mac-intel/plink: Bad CPU type in executable
Error in file(file, "rt") : cannot open the connection
In addition: Warning messages:
1: In file.rename(from, to) :
cannot rename file '/Users/sara/Library/Application Support/CNVRanger/example/PLINK/plink-1.07-mac-intel/plink.gvar.summary' to '/Users/sara/Library/Application Support/CNVRanger/example/PLINK/plink-1.07-mac-intel/plink.gvar.summary.BT', reason 'No such file or directory'
2: In file(file, "rt") :
cannot open file '/Users/sara/Library/Application Support/CNVRanger/example/PLINK/plink-1.07-mac-intel/plink.gvar.summary.BT': No such file or directory
segs.pvalue.gr
Error: object 'segs.pvalue.gr' not found
I see that it can not access the PLINK 1.07. Can you help me to solve this issue? I am using the example data given with CNV Ranger.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.