waldronlab / cnvranger Goto Github PK

View Code? Open in Web Editor NEW

7.0 7.0 7.0 16.24 MB

Functionality for CNV analysis

Home Page: https://bioconductor.org/packages/CNVRanger

R 100.00%

bioconductor-package

cnvranger's People

Contributors

Stargazers

Watchers

Forkers

inambioinfo hanielcedraz ajjeon ddalfovo marromesc hasanalanya zhenhua-zhang

cnvranger's Issues

package name ideas

Hi, @lgeistlinger @viniciushdasilva
Perhaps we can rename the package to popRanges?
What do you think?

Regards,
Marcel

Problem with cnvGWAS

I am contacting you because I am following the CNVRanger manual called "Summarization and quantitative trait analysis of CNV ranges" but I am unable to complete the final GWAS.

The problem is:

segs.pvalue.gr <- cnvGWAS(phen.info, chr.code.name=chr.code.name, method.m.test="none")
Error: logical subscript contains NAs

I have checked all the input files and there are no NA values in any of them.

I follow all the step without any problem, and with your files I can get the final results. I tried to make the GDS and import the LRR/BAF values, but an error ocurr "CNV.gds' has been created or opened."

Do you know which could be the problem?

I hope you can help me, thank you in advance!

Analysis of the results of CNV-GWAS

Hi! I have used CNVRanger to analysis my data with different phenotype. Now, i have a question about the results. After i run the following code : "segs.pvalue.gr <- cnvGWAS(phen.info, chr.code.name=chr.code.name, method.m.test="none")", I get a the segs.pvalue.gr. However, about the results, i have questions. What the meaning of "SegName", and what can I know the results of CNV is AMP of DEL for corresponding Phenotype?

Thank you very much!

plotRecurrentRegions: error when using mm10 assembly

I have been have been trying to use the package CNVRanger.
The package works well except at the :

plotRecurrentRegions(cnvrs, genome="mm10", chr="chr1")

It shoots an error:

Error in .getGenomeFromGRange(range, defaults[["genome"]]) :
A genome must be supplied when creating this object.

The vignette, does not mention supplying genome information before this step.
Does the package not work for mm10?
Any other previous steps give me no error.
Could you please help me out on what might be going on here? As even in the vignette, the organism is mentioned for the first time only at this step.(
plotRecurrentRegions)

I will be grateful for any suggestions.
Thanks for your time
Best,
Yashna Paul
PhD student
DKFZ

CNVassoc: createFolderTree

Hi @viniciushdasilva

Some comments on your .createFolderTree function:
- always check file.exists before dir.create —> otherwise a warning is issued
- avoid redundant code (do case division at lowest possible level)
- unlist(list()) —> c()
- paste -> file.path for file paths
- vectorize
- if … else if … else instead of if ... if ... if
- avoid code nesting … hardly readable —> make atomic statements

An improved implementation could look like this:

.createFolderTree <- function(name, folder=NULL){

    # data dir
    if(is.null(folder)) folder <- rappdirs::user_data_dir("cnvAnalyzeR")
    if(!file.exists(folder)) dir.create(folder, recursive=TRUE)

    # project directory
    proj.dir <- file.path(folder, name)
    if(!file.exists(proj.dir)) dir.create(proj.dir)

    # subdirs
    dirs <- c("Inputs", "PLINK", "Results")
    all.paths <- file.path(proj.dir, dirs)
    for(d in all.paths) if(!file.exists(d)) dir.create(d)
  
    return(all.paths)
}

Ranger vignette

Statistical test for Binary phenotype (A group- B group comparison)?

Thank you for making nice tool.

Is there any way for statistical test for binary phenotype, not quantitative phenotype?
I want to compare CNV profiles between samples with phenotype A group and samples with phenotype B group.

cnvGWAS migration

Hi @viniciushdasilva - I think it will be good to have a timeline for the migration. Will you be able to move things over until May 18th? That's the deadline for changes included in the upcoming Bioc3.13 release https://bioconductor.org/developers/release-schedule/? After that it will be another 6 months for the package being in the current state, so I think we should make an effort to meet the deadline.

LOH?

I wonder, how this package describe Loss of Heterogeneity(LOH) ?

Future TODOs

phenotype association: can R's own lm be used instead of plink (to avoid external dependencies)? Are results the same?
synchronize approach for phenotype association and expression association
can we make use of VariantExperiment?

roxygen documentation

Please see https://yihui.name/rd2roxygen/ for a tutorial.
This package is available on CRAN

cnvEQTL error in .getStates(), no CNV region satisfying min.samples threshold

Hi everyone,
I'm pretty new to CNV ranger, however I was able to follow the vignette for most of the steps until the chapter 7 (CNV-expression association analysis).

I tried many times with my data but was always facing the same error:

Restricting analysis to 24 intersecting samples
Preprocessing RNA-seq data ...
Summarizing per-sample CN state in each CNV region
Excluding 4881 cnvrs not satisfying min.samples threshold
Error in .getStates(cnvrs, calls, multi.calls, min.samples, verbose = verbose) : 
  No CNV region satisfying min.samples threshold

I also tried to create a subset for testing since my dataset is pretty huge, restricting the analysis to chromosome 1 and 2 with the command:

sel.cnvrs <- subset(cnvrs, seqnames %in% paste0("chr", 1:2))

GRanges object with 4881 ranges and 3 metadata columns:
         seqnames              ranges strand |      freq        type    pvalue
            <Rle>           <IRanges>  <Rle> | <numeric> <character> <numeric>
     [1]     chr1     3073253-3074322      * |         3        both  0.911548
     [2]     chr1     3102016-3102125      * |         3        both  0.911548
     [3]     chr1     3205901-3671498      * |         3        both  0.753688
     [4]     chr1     3680155-3681788      * |         3        both  0.911548
     [5]     chr1     3752010-3754360      * |         3        both  0.911548
     ...      ...                 ...    ... .       ...         ...       ...
  [4877]     chr2 181724042-181724942      * |         5        gain  0.265666
  [4878]     chr2 181763332-181827797      * |         5        gain  0.265666
  [4879]     chr2 181837854-181857461      * |         5        gain  0.265666
  [4880]     chr2 181864337-181870830      * |         5        gain  0.265666
  [4881]     chr2 181896823-181898712      * |         5        gain  0.265666
  -------
  seqinfo: 21 sequences from an unspecified genome; no seqlengths

and also reducing the number of min.samples when running the command:

res <- cnvEQTL(sel.cnvrs, grl, rse, window = "1Mbp", verbose = TRUE, min.samples = 2)

Just for better understanding i will print here my objects that i put inside the cnvEQTL function:

grl

GRangesList object of length 24:
$KPC438
GRanges object with 55235 ranges and 5 metadata columns:
          seqnames            ranges strand |      Mean          Gene             GeneID     state   Phenotype
             <Rle>         <IRanges>  <Rle> | <numeric>   <character>        <character> <numeric> <character>
      [1]     chr1   3073253-3074322      * |   -0.1428 4933401J01Rik ENSMUSG00000102693         2  epithelial
      [2]     chr1   3102016-3102125      * |   -0.1428       Gm26206 ENSMUSG00000064842         2  epithelial
      [3]     chr1   3205901-3671498      * |   -0.1428          Xkr4 ENSMUSG00000051951         2  epithelial
      [4]     chr1   3252757-3253236      * |   -0.1428       Gm18956 ENSMUSG00000102851         2  epithelial
      [5]     chr1   3365731-3368549      * |   -0.1428       Gm37180 ENSMUSG00000103377         2  epithelial
      ...      ...               ...    ... .       ...           ...                ...       ...         ...
  [55231]     chrY 90603501-90605864      * |   -1.4738       Gm28300 ENSMUSG00000099619         1  epithelial
  [55232]     chrY 90665346-90667625      * |   -1.4738       Gm28301 ENSMUSG00000099399         1  epithelial
  [55233]     chrY 90752427-90755467      * |   -1.4738       Gm21860 ENSMUSG00000095366         1  epithelial
  [55234]     chrY 90753057-90763485      * |   -1.4738      Mid1-ps1 ENSMUSG00000095134         1  epithelial
  [55235]     chrY 90784738-90816465      * |   -1.4738       Gm47283 ENSMUSG00000096768         1  epithelial
  -------
  seqinfo: 21 sequences from an unspecified genome; no seqlengths

<23 more elements>

rse

class: RangedSummarizedExperiment 
dim: 2963 24 
metadata(0):
assays(1): rcounts
rownames(2963): ENSMUSG00000051951 ENSMUSG00000025900 ... ENSMUSG00000038628 ENSMUSG00000098505
rowData names(0):
colnames(24): KPC438 KPC479B ... KPCZ532 KPCZ746
colData names(0):

However the result never changed and I'm always getting the same error.

Before sharing any data i would like to know if any of you ever got the same error as me

Thank you so much in advance

Sessioninfo:

R version 4.3.1 (2023-06-16)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu 20.04.6 LTS

Matrix products: default
BLAS/LAPACK: /data/research/restools/miniconda/envs/R.4.3.1/lib/libopenblasp-r0.3.24.so;  LAPACK version 3.11.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: Etc/UTC
tzcode source: system (glibc)

attached base packages:
[1] grid      stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] ComplexHeatmap_2.16.0              EnsDb.Mmusculus.v79_2.99.0         ensembldb_2.24.1                  
 [4] AnnotationFilter_1.24.0            GenomicFeatures_1.52.2             AnnotationDbi_1.62.2              
 [7] Gviz_1.44.2                        SummarizedExperiment_1.30.2        Biobase_2.60.0                    
[10] MatrixGenerics_1.12.3              matrixStats_1.0.0                  BSgenome.Mmusculus.UCSC.mm10_1.4.3
[13] BSgenome_1.68.0                    rtracklayer_1.60.1                 Biostrings_2.68.1                 
[16] XVector_0.40.0                     regioneR_1.32.0                    AnnotationHub_3.8.0               
[19] BiocFileCache_2.8.0                dbplyr_2.3.4                       tibble_3.2.1                      
[22] CNVRanger_1.16.5                   RaggedExperiment_1.24.2            GenomicRanges_1.52.1              
[25] GenomeInfoDb_1.36.4                IRanges_2.34.1                     S4Vectors_0.38.2                  
[28] BiocGenerics_0.46.0               

loaded via a namespace (and not attached):
  [1] RColorBrewer_1.1-3            shape_1.4.6                   rstudioapi_0.15.0             magrittr_2.0.3               
  [5] rmarkdown_2.25                GlobalOptions_0.1.2           BiocIO_1.10.0                 zlibbioc_1.46.0              
  [9] vctrs_0.6.4                   memoise_2.0.1                 Rsamtools_2.16.0              RCurl_1.98-1.12              
 [13] base64enc_0.1-3               htmltools_0.5.6.1             S4Arrays_1.0.6                progress_1.2.2               
 [17] curl_5.1.0                    Formula_1.2-5                 htmlwidgets_1.6.2             cachem_1.0.8                 
 [21] GenomicAlignments_1.36.0      iterators_1.0.14              mime_0.12                     lifecycle_1.0.3              
 [25] pkgconfig_2.0.3               Matrix_1.6-1.1                R6_2.5.1                      fastmap_1.1.1                
 [29] clue_0.3-65                   GenomeInfoDbData_1.2.10       shiny_1.7.5.1                 digest_0.6.33                
 [33] colorspace_2.1-0              Hmisc_5.1-1                   RSQLite_2.3.1                 filelock_1.0.2               
 [37] fansi_1.0.5                   httr_1.4.7                    abind_1.4-5                   compiler_4.3.1               
 [41] doParallel_1.0.17             bit64_4.0.5                   htmlTable_2.4.1               backports_1.4.1              
 [45] BiocParallel_1.34.2           DBI_1.1.3                     biomaRt_2.56.1                rappdirs_0.3.3               
 [49] DelayedArray_0.26.7           rjson_0.2.21                  tools_4.3.1                   foreign_0.8-85               
 [53] interactiveDisplayBase_1.38.0 httpuv_1.6.12                 nnet_7.3-19                   glue_1.6.2                   
 [57] restfulr_0.0.15               promises_1.2.1                checkmate_2.2.0               cluster_2.1.4                
 [61] generics_0.1.3                gtable_0.3.4                  data.table_1.14.8             hms_1.1.3                    
 [65] xml2_1.3.5                    utf8_1.2.4                    foreach_1.5.2                 BiocVersion_3.17.1           
 [69] pillar_1.9.0                  stringr_1.5.0                 limma_3.56.2                  later_1.3.1                  
 [73] circlize_0.4.15               dplyr_1.1.3                   lattice_0.22-5                deldir_1.0-9                 
 [77] bit_4.0.5                     biovizBase_1.48.0             tidyselect_1.2.0              locfit_1.5-9.8               
 [81] knitr_1.44                    gridExtra_2.3                 ProtGenerics_1.32.0           edgeR_3.42.4                 
 [85] xfun_0.40                     stringi_1.7.12                lazyeval_0.2.2                yaml_2.3.7                   
 [89] evaluate_0.22                 codetools_0.2-19              interp_1.1-4                  BiocManager_1.30.22          
 [93] cli_3.6.1                     rpart_4.1.21                  xtable_1.8-4                  munsell_0.5.0                
 [97] dichromat_2.0-0.1             Rcpp_1.0.11                   png_0.1-8                     XML_3.99-0.14                
[101] parallel_4.3.1                ellipsis_0.3.2                ggplot2_3.4.4                 blob_1.2.4                   
[105] prettyunits_1.2.0             jpeg_0.1-10                   latticeExtra_0.6-30           bitops_1.0-7                 
[109] VariantAnnotation_1.46.0      scales_1.2.1                  crayon_1.5.2                  GetoptLong_1.0.5             
[113] rlang_1.1.1                   KEGGREST_1.40.1

Package building, checking, and installing

@viniciushdasilva

Now that we have the core of the package, it's time to start getting familiar with building and installing of the package -- the ultimate test whether you package is working or not.

On the command line:

R CMD build cnvAnalyzeR --> checks dependencies, builds the vignette, and creates the source tar.gz
optional (for now): R CMD check --no-build-vignettes cnvAnalyzeR_0.99.1.tar.gz (resulting warnings and errors need to be resolved before submission to Bioconductor)
R CMD INSTALL cnvAnalyzeR_0.99.1.tar.gz (installs the package on your system so that it can be loaded via library from within R)

Corresponding functionality for building, checking, and installing is also available in devtools and RStudio (@LiNk-NY knows more about that).

Bioc review

Responsibilities as discussed:

DESCRIPTION (@lgeistlinger)

Minor: Consider using an Authors@R field instead of Author and Maintainer fields
Add a BugReports field pointing to the GitHub issues page
BiocCheck will say to update your R dependency to 3.6.0

NAMESPACE

Looks good. You might want to rename importLRR_BAF to use 'camelCase' so
that it is consistent with other functions. (@viniciushdasilva rename to importLrrBaf)

Consider providing support for MultiAssayExperiment inputs that contain
appropriate data for running cnvExprAssoc. @lgeistlinger
.largest can be simplified to: @lgeistlinger

.largest <- function(scores, ranges, qranges) {
    ind <- IRanges::which.max(GenomicRanges::width(ranges))
    vapply(seq_along(scores), function(i) scores[[i]][ind[i]], scores[[1]]) 
}

Update the @param entry for all.paths @viniciushdasilva
Check inputs to avoid errors (plotManhattan) @viniciushdasilva
Consider using GRanges structures instead of data.frame in plotManhattan @viniciushdasilva
Avoid saving PDF files for the user by default, make the plot.pdf argument FALSE @viniciushdasilva
No need to use gc() explicitly @viniciushdasilva
Functions like setupCnvGWAS should make use of structures such as RaggedExperiment rather than a collection of files. @viniciushdasilva
Use names rather than integers to subset vectors and lists (ex. all.paths in prodGdsCnvLine:395) @viniciushdasilva
It is more efficient to make a call to rappdirs:::get_os() once than every time you need to check the OS, therefore, assign the result to an object and use the object for checking (prodGdsCnv). @viniciushdasilva
FWIW Perhaps GDSArray may help import GDS files (for importLRR_BAF)? @viniciushdasilva
Vectorize where possible and use maps for matching and replacing (e.g., in replace gain / loss sections of .recodeCNVgenotype) @lgeistlinger
Try to avoid reading in the data from a file and accept the dataset as an argument so that users can read as they choose. @viniciushdasilva
Use TRUE and FALSE and not character versions of TRUE and FALSE @viniciushdasilva

Examples

@viniciushdasilva

Check on your @examples sections in pheno_assoc.R.
Can they be directly executed when starting R and only loading your package via library?

(This is automatically checked via R CMD check and running examples for all exported functions are a prerequisite for acceptance in Bioconductor)

Whenever example data is needed, make sure to place small data slices in our extdata folder of the package. This data can then be incorporated in your @examples sections via:

dat <- system.file("extdata/myData.txt", package="cnvAnalyzeR") 
# do something

Note that there is a limit in package size:

https://www.bioconductor.org/developers/package-guidelines/#correctness

The source package resulting from running R CMD build should occupy less than 4MB on disk. The package should require less than 5 minutes to run R CMD check --no-build-vignettes. Using the --no-build-vignettes option ensures that the vignette is built only once.

Error in stop_if_wrong_length("'seqnames'", ans_len)

I got error "'seqnames' must have the length of the object to construct (1) or length 1" when running cnvGWAS.
I recoded the original chromosome names from chr1...chrX, chrY to 1: 24 as required for integer and provided a chr.code.name as this

chr.code.name <- data.frame(V1 = c(1:24),V2 = paste0('chr', c(1:22, 'X', 'Y')))
res_comp2 <- cnvGWAS(phen.info, chr.code.name = chr.code.name, method.m.test="BH")
Error in stop_if_wrong_length("'seqnames'", ans_len) :
'seqnames' must have the length of the object to construct (1) or length 1

Do you have some suggestions?

Error while running cnvGWAS

I get this error when trying to run cnvGWAS:

> segs.pvalue.gr <- cnvGWAS(phen.info, chr.code.name=chr.code.name, method.m.test="none")
Error in stop_if_wrong_length("'seqnames'", ans_len) :
  'seqnames' must have the length of the object to construct (1) or length 1

I've been searching on the internet, if someone had encountered the same error, but haven't found anything :C

Here's the pipeline I've been running before this step:

cnv.calls <- read.csv("cnvs.for.CNVRanger.csv", as.is=TRUE, sep = "\t")

cnv.calls <- GenomicRanges::makeGRangesListFromDataFrame(cnv.calls, 
                                                         split.field="sample.id", keep.extra.columns=TRUE)
sort(cnv.calls)


phen.df <- read.csv("pheno.for.CNVRanger.csv", as.is=TRUE, sep = "\t")

re <- RaggedExperiment::RaggedExperiment(cnv.calls, colData=phen.df)

phen.info <- setupCnvGWAS("project1", phen.loc="pheno.for.CNVRanger.csv",
                          cnv.out.loc=re, map.loc = NULL)

chr.code.name <- data.frame(   
  V1 = c(23, 24), 
  V2 = c("X", "Y"))

segs.pvalue.gr <- cnvGWAS(phen.info, chr.code.name=chr.code.name, method.m.test="none")

Hope, you could help me out with this one :)

populationRanges: multiple problems while I try to build on section "Identifying recurrent regions" of your vignette

Dear Developers,

The functionality of populationRanges() looks very useful for our research, but I got the following problems while trying to build on section "Identifying recurrent regions" from your vignette. Can you help or suggest workarounds?

In addition to Gains/Losses we have CN-LOH type of CNVs. Technically it is "state=2", but it makes sense to include them as a third type of CNVs instead of just removing them. Do you have any suggestion how to do it?
Looking into your source code, populationRanges(..., est.recur = TRUE) calls .estimateRecurrence(pranges, grl) internally, which seems to give you a choice of mode = c("approx", "perm"). Unfortunately populationRanges() itself now has a parameter mode = c("density", "RO"), which makes methods "approx" and "perm" inaccessible.
It seems logical that populationRanges(..., mode = "RO", ro.thresh=0.1, classify.ranges=FALSE, est.recur = TRUE) would estimate recurrence without breaking CNVs into types. At least this is how the meaning of "classify.ranges=FALSE" is explained in your help.
But it will try to use types during .estimateRecurrence() stage anyway and die with "Error in seq_len(len - 1) : argument must be coercible to non-negative integer" even on toy GRangesList example from populationRanges() help.
We would like to process ~15000 CNVs.
But populationRanges(grl.IntOGen, mode = "RO", ro.thresh=0.1, classify.ranges=FALSE, est.recur = TRUE) keep dying with "Error in seq.default(1, length(both.scores) - 1, by = 2) : wrong sign in 'by' argument" even for the simplistic case of 3337 CNVs. We downloaded the list of genes from https://intogen.org/search, adding genomic regions using Bioconductor's Ensembl annotation and fabricated state=3 for ROLE=Act / state=1 for ROLE=LoF. I can upload you the 1.1M .rds file with GRangesList if you need reproducible example.

Thank you in advance,
Daniil Sarkisyan

Summarizing individual CNV calls across a population

Dear researcher,

Thank you so much for providing this useful tool!

I have two question about "Summarizing individual CNV calls across a population" in CNVRanger.

(1) Once we finish the step of "Trimming low-density areas", "Reciprocal overlap" and "Identifying recurrent regions", how can I get the list of the remaining ranges with the sample information. Currently, I only have the information on "Seqnames, ranges, strand, freq, type and/or pval" for each range, such as "chr1: 1896112-2004603". But I have no idea which samples these CNV exactly occurs in.

(2) I have got a lot of CNV regions in the population of healthy people and patients with a rare disease. In this case, among the three method for filtration: (1)"Trimming low-density areas", (2)"Reciprocal overlap" and (3)"Identifying recurrent regions", which one do you suggest to use?

Best

Errors with populationRanges(grl, density=0.1, est.recur=TRUE) and cnvOncoPrint(grl, cgenes)

Hi:
Thank you very much for sharing a very useful tool, I really appreciate it.

First of all, I followed your instructions (https://bioconductor.org/packages/devel/bioc/vignettes/AnnotationHub/inst/doc/AnnotationHub-HOWTO.html),
and all worked well.

Then, I started to anlyze my data and got two errors.
1.
cnvrs <- populationRanges(grl, density=0.1, est.recur=TRUE)
seq.default(1, length(both.scores) - 1, by = 2) error:
wrong sign in "by" argument

This is sort of weird to me, because the default paremter (est.recur=FALSE) ran without any errors , which is for the "Trimming low-density areas" accourding to your website.
cnvrs <- populationRanges(grl, density=0.1)

On top of that, I have encountered the second error when I performed the below code to visualize the oncoprint.
2.
cnvOncoPrint(grl, cgenes)
normalizeDoubleBracketSubscript(i, x, exact = exact, allow.NA = TRUE, error:
subscript is out of bounds

I am analyzing human data and my session info are as follows.
3.

sessionInfo()
R version 4.3.2 (2023-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale:
[1] LC_COLLATE=Japanese_Japan.utf8 LC_CTYPE=Japanese_Japan.utf8 LC_MONETARY=Japanese_Japan.utf8
[4] LC_NUMERIC=C LC_TIME=Japanese_Japan.utf8

time zone: Asia/Tokyo
tzcode source: internal

attached base packages:
[1] grid stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] SummarizedExperiment_1.32.0 MatrixGenerics_1.14.0
[3] matrixStats_1.2.0 BSgenome.Hsapiens.UCSC.hg38_1.4.5
[5] BSgenome_1.70.2 rtracklayer_1.62.0
[7] BiocIO_1.12.0 Biostrings_2.70.3
[9] XVector_0.42.0 regioneR_1.34.0
[11] ensembldb_2.26.0 AnnotationFilter_1.26.0
[13] GenomicFeatures_1.54.4 AnnotationDbi_1.64.1
[15] Biobase_2.62.0 AnnotationHub_3.10.0
[17] BiocFileCache_2.10.1 dbplyr_2.5.0
[19] Gviz_1.46.1 CNVRanger_1.18.1
[21] RaggedExperiment_1.26.0 GenomicRanges_1.54.1
[23] GenomeInfoDb_1.38.8 IRanges_2.36.0
[25] S4Vectors_0.40.2 BiocGenerics_0.48.1
[27] dplyr_1.1.4

loaded via a namespace (and not attached):
[1] RColorBrewer_1.1-3 shape_1.4.6.1 rstudioapi_0.15.0
[4] magrittr_2.0.3 rmarkdown_2.26 GlobalOptions_0.1.2
[7] zlibbioc_1.48.2 vctrs_0.6.5 memoise_2.0.1
[10] Rsamtools_2.18.0 RCurl_1.98-1.14 base64enc_0.1-3
[13] BiocBaseUtils_1.4.0 htmltools_0.5.7 S4Arrays_1.2.1
[16] progress_1.2.3 curl_5.2.1 SparseArray_1.2.4
[19] Formula_1.2-5 htmlwidgets_1.6.4 cachem_1.0.8
[22] GenomicAlignments_1.38.2 iterators_1.0.14 mime_0.12
[25] lifecycle_1.0.4 pkgconfig_2.0.3 Matrix_1.6-1.1
[28] R6_2.5.1 fastmap_1.1.1 clue_0.3-65
[31] GenomeInfoDbData_1.2.11 shiny_1.8.0 digest_0.6.35
[34] colorspace_2.1-0 Hmisc_5.1-2 RSQLite_2.3.5
[37] filelock_1.0.3 fansi_1.0.6 httr_1.4.7
[40] abind_1.4-5 compiler_4.3.2 doParallel_1.0.17
[43] bit64_4.0.5 withr_3.0.0 htmlTable_2.4.2
[46] backports_1.4.1 BiocParallel_1.36.0 DBI_1.2.2
[49] biomaRt_2.58.2 rappdirs_0.3.3 DelayedArray_0.28.0
[52] rjson_0.2.21 tools_4.3.2 foreign_0.8-85
[55] interactiveDisplayBase_1.40.0 httpuv_1.6.14 nnet_7.3-19
[58] glue_1.7.0 restfulr_0.0.15 promises_1.2.1
[61] checkmate_2.3.1 cluster_2.1.4 generics_0.1.3
[64] gtable_0.3.4 data.table_1.15.2 hms_1.1.3
[67] xml2_1.3.6 utf8_1.2.4 foreach_1.5.2
[70] BiocVersion_3.18.1 pillar_1.9.0 stringr_1.5.1
[73] later_1.3.2 circlize_0.4.16 lattice_0.21-9
[76] deldir_2.0-4 bit_4.0.5 biovizBase_1.50.0
[79] tidyselect_1.2.1 ComplexHeatmap_2.18.0 knitr_1.45
[82] gridExtra_2.3 ProtGenerics_1.34.0 xfun_0.42
[85] stringi_1.8.3 lazyeval_0.2.2 yaml_2.3.8
[88] evaluate_0.23 codetools_0.2-19 interp_1.1-6
[91] tibble_3.2.1 BiocManager_1.30.22 cli_3.6.2
[94] rpart_4.1.21 xtable_1.8-4 munsell_0.5.0
[97] dichromat_2.0-0.1 Rcpp_1.0.12 png_0.1-8
[100] XML_3.99-0.16.1 parallel_4.3.2 ellipsis_0.3.2
[103] ggplot2_3.5.0 blob_1.2.4 prettyunits_1.2.0
[106] jpeg_0.1-10 latticeExtra_0.6-30 bitops_1.0-7
[109] VariantAnnotation_1.48.1 scales_1.3.0 purrr_1.0.2
[112] crayon_1.5.2 GetoptLong_1.0.5 rlang_1.1.3
[115] KEGGREST_1.42.0

Thank you in advance!

Is cnvEQTL function appropriated for transcript abundances from Salmon?

Hi @lgeistlinger,

In the help page of cnvEQTL function it reads:

... it is typically recommended to exclude from the analysis (i) genes with fewer than r reads per million reads mapped ...
... Use the min.cpm and min.samples arguments, respectively....

I was not able to find the min.cpm argument. However, it seems that the number of genes in my output was drastically reduced in comparison with my input object (i.e. RangedSummarizedExperiment). Initially I had 23k but only 4k remained in the output.

Could you check on that issue please?

I have used Salmon quantification (https://combine-lab.github.io/salmon/) and not read counts per million. Should we think in a different filter parameter for Salmon quantification?

Best

how to add Covariate data in CNVeqtl or CNVgwas

Hi! I can't find how to add covariate file such as peer,batch...And it's neccessuary to add them,as cnveqtl doesn't support Negative cpm.so it is a good choice to add them in eqtl model but not to use residuals by peer (negative possibly)
if i can receive my answer,I'll appericiate it sincerely!thanks!

Interface with Oncoplot

Demonstrate interface with oncoplot on a cnvrs x sample matrix.
Instead of CNVRs this could be eg CNV-containing genes - which is likely a frequent use case.

This should make use of RaggedExperiment::qreduceAssay.

Transform CNV calls to a RaggedExperiment as in CNVRanger's vignette, Section 4.3
Use your summarized population ranges (putatively mapped to altered genes as described #17 depending on what the rows of the oncoplot should be) as query regions for RaggedExperiment::qreduceAssay
Apply RaggedExperiment::qreduceAssay as described in RaggedExperiment's vignette, Section 6.4

This will result in a matrix (query regions x samples) that stores CN state in this region and should be ready for oncoplot.

cnvEQTL: test for linearity / correlation

@DarioS moving over from email to issue tracker:

If there are more than two copy number levels for a gene, CNVRanger does an ANOVA-like test, which is a test for differences in means. Is there a way to treat the copy number as a numeric measurement and test for linearity of response? I have copy number estimates from whole genome sequencing and a gene may have copy numbers such as two, four and nine in different samples and I want to know if increasing copy number is associated with increased abundance, rather than just different abundance. I am not interested in analysing distant copy numbers, but the copy number of the gene itself.

cnvEQTL error: "nr.states > 1 is not TRUE"

Hi,
I was following the vignette to get the association between the CNVs of my samples and the expression and came up with the following error:

res <- cnvEQTL(cnvrs, grl, rse, window = NULL, verbose = F)
Error in .testCnvExpr(y, cgenes[[i]], cnv.states[i, ], min.state.freq = min.samples,  : 
  nr.states > 1 is not TRUE

For what understood this is related with the number of states left after filtering by the argument min.samples. So there need to be more that 1 state with more than "min.samples"(default=10) samples and if all the samples present a certain region the error will occur. This error can be bypassed the problem just setting the argument min.samples = 1, which it doesn't make much sense, or filtering the cnv regions like this: cnvrs <-subset(cnvrs, freq < ncol(rcounts)-min.samples)

It would be nice that you add this extra filtering step in the vignette or maybe it would be better to exclude just the problematic regions or throw a warning instead of an error.
Thanks for the package

summarization of individual calls/ranges across population

From @lgeistlinger on August 8, 2017 17:54

(In CNV analysis, I assume also for similar data types such as somatic mutation), it is often of interest to summarize invidual calls across the
population, (i.e. define CNV regions), for subsequent association analysis with e.g. phenotype data.

https://www.ncbi.nlm.nih.gov/pubmed/19812545
https://www.ncbi.nlm.nih.gov/pubmed/22539667 (Figure 1 for a sketch of the concept)

In the simplest case, this just merges overlapping individual calls into summarized regions.
However, this typically inflates CNVR size and trimming low-density areas
(usually <10% of the total contributing individual calls within a summarized region)
is advisable.
Alternatively, a threshold for reciprocal overlap between any two individual calls
can also be applied.

That means such a function would take a RaggedExperiment as Input and would return
a GRanges object storing the summarized genomic regions.

an example:

grl <- GRangesList(
sample1 = GRanges( c("chr1:1-10", "chr2:15-18", "chr2:25-34") ),
sample2 = GRanges( c("chr1:1-10", "chr2:11-18" , "chr2:25-36") ),
sample3 = GRanges( c("chr1:2-11", "chr2:14-18", "chr2:26-36") ),
sample4 = GRanges( c("chr1:1-12", "chr2:18-35" ) ),
sample5 = GRanges( c("chr1:1-12", "chr2:11-17" , "chr2:26-34") ) ,
sample6 = GRanges( c("chr1:1-12", "chr2:12-18" , "chr2:25-35") )
)

re <- RaggedExperiment(grl)

Default (as you would typically like to have it in CNV analysis)

populationRanges(re, density=0.1)

GRanges object with 3 ranges and 0 metadata columns:
seqnames ranges strand
[1] chr1 [ 1, 12] *
[2] chr2 [11, 18] *
[3] chr2 [25, 36] *

density=0 merges all overlapping regions, equivalent to: reduce(unlist(grl))

populationRanges(re, density=0)

GRanges object with 2 ranges and 0 metadata columns:
seqnames ranges strand

[1] chr1 [ 1, 12] *
[2] chr2 [11, 36] *

density=1 disjoins all overlapping regions, equivalent to: disjoin(unlist(grl))

populationRanges(re, density=1)

GRanges object with 13 ranges and 0 metadata columns:
seqnames ranges strand

[1] chr1 [ 1, 1] *
[2] chr1 [ 2, 10] *
[3] chr1 [11, 11] *
[4] chr1 [12, 12] *
[5] chr2 [11, 13] *
... ... ... ...
[9] chr2 [19, 24] *
[10] chr2 [25, 25] *
[11] chr2 [26, 34] *
[12] chr2 [35, 35] *
[13] chr2 [36, 36] *

Where as merging and disjoining are suited for the ideal case of matching calls between individuals, technical and biological variation typically results in data with all kinds of overlaps and fuzzy region borders. That would be a way to account for that.

Copied from original issue: Bioconductor/RaggedExperiment#9

Simple overlap analysis

Vignette Section 6 (Overlap analysis) would benefit from some basic overlap functionality in the beginning of the section, to demonstrate eg how to carry over CNV type from summarized population ranges (CNVRs) to some query regions such as protein coding genes, ie something along the lines:

genes <- genes(TxDb)
cnvrs <- populationRanges(grl)

hits <- findOverlaps(genes, cnvrs, ignore.strand=TRUE)
cgenes <- genes[queryHits(hits)]
cgenes$type <- cnvrs$type[subjectHits(hits)]
cgenes

vignette section: phenotype assocation

@viniciushdasilva think about a small illustrative example analysis of your phenotype association functionality for the corresponding section in our vignette.

Using an appropriate subset of the Nelore CNV calls the association analysis with MT as for the PONE paper might be nice.

In order to make the analysis fast and illustrative and the needed data slices small (as given in the extdata folder), you might also only wanna use a subset of samples.

cnvEQTL error: "nr.states > 1 is not TRUE

Hi @lgeistlinger, first of all thank you for providing such a cool tool! Unfortunately, I come across following issue:

Restricting analysis to 18 intersecting samples
Preprocessing RNA-seq data ...
Summarizing per-sample CN state in each CNV region
Excluding 492 cnvrs not satisfying min.samples threshold
Analyzing 25 regions with >=1 gene in the given window
1 of 25
Error in .testCnvExpr(y, cgenes[[i]], cnv.states[i, ], min.state.freq = min.samples,  : 
  nr.states > 1 is not TRUE

This issue has been discussed before in issue #30 and in issue #40. I send you all the necessary data via mail (gmail account). It would be amazing if you could look into it. These are the packages used:

attached base packages:
[1] grid stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] MultiAssayExperiment_1.28.0 SummarizedExperiment_1.32.0 MatrixGenerics_1.14.0
[4] matrixStats_1.2.0 plyranges_1.20.0 TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
[7] ensembldb_2.24.1 AnnotationFilter_1.24.0 GenomicFeatures_1.54.1
[10] AnnotationDbi_1.64.0 Biobase_2.62.0 **CNVRanger_1.19.0**
[13] devtools_2.4.5 usethis_2.2.2 circlize_0.4.15
[16] snpsel_0.0.0.9003 plotly_4.10.4 vcfR_1.15.0
[19] ComplexHeatmap_2.18.0 glue_1.7.0 lubridate_1.9.3
[22] forcats_1.0.0 stringr_1.5.1 dplyr_1.1.4
[25] purrr_1.0.2 readr_2.1.5 tidyr_1.3.0
[28] tibble_3.2.1 ggplot2_3.4.4 tidyverse_2.0.0
[31] BSgenome_1.68.0 rtracklayer_1.62.0 Biostrings_2.70.1
[34] XVector_0.42.0 maftools_2.18.0 RaggedExperiment_1.26.0
[37] GenomicRanges_1.54.1 GenomeInfoDb_1.38.5 IRanges_2.36.0
[40] S4Vectors_0.40.2 AnnotationHub_3.8.0 BiocFileCache_2.10.1
[43] dbplyr_2.4.0 BiocGenerics_0.48.1

loaded via a namespace (and not attached):
[1] splines_4.3.1 later_1.3.2 BiocIO_1.12.0 bitops_1.0-7 filelock_1.0.3
[6] rpart_4.1.23 XML_3.99-0.16 lifecycle_1.0.4 edgeR_4.0.7 doParallel_1.0.17
[11] processx_3.8.3 vroom_1.6.5 lattice_0.22-5 MASS_7.3-60.0.1 backports_1.4.1
[16] magrittr_2.0.3 rmarkdown_2.25 Hmisc_5.1-1 limma_3.58.1 yaml_2.3.8
[21] remotes_2.4.2.1 httpuv_1.6.13 sessioninfo_1.2.2 pkgbuild_1.4.3 Gviz_1.44.2
[26] DBI_1.2.1 RColorBrewer_1.1-3 abind_1.4-5 pkgload_1.3.3 zlibbioc_1.48.0
[31] biovizBase_1.48.0 RCurl_1.98-1.14 nnet_7.3-19 VariantAnnotation_1.46.0 rappdirs_0.3.3
[36] GenomeInfoDbData_1.2.11 ggrepel_0.9.5 vegan_2.6-4 permute_0.9-7 codetools_0.2-19
[41] DelayedArray_0.28.0 xml2_1.3.6 DNAcopy_1.76.0 tidyselect_1.2.0 shape_1.4.6
[46] base64enc_0.1-3 GenomicAlignments_1.38.0 jsonlite_1.8.8 GetoptLong_1.0.5 Formula_1.2-5
[51] ellipsis_0.3.2 pinfsc50_1.3.0 survival_3.5-7 iterators_1.0.14 foreach_1.5.2
[56] tools_4.3.1 progress_1.2.3 Rcpp_1.0.12 gridExtra_2.3 BiocBaseUtils_1.4.0
[61] SparseArray_1.2.3 xfun_0.41 mgcv_1.9-1 withr_2.5.2 BiocManager_1.30.22
[66] fastmap_1.1.1 latticeExtra_0.6-30 fansi_1.0.6 callr_3.7.3 digest_0.6.34
[71] timechange_0.2.0 R6_2.5.1 mime_0.12 colorspace_2.1-0 gtools_3.9.5
[76] jpeg_0.1-10 dichromat_2.0-0.1 biomaRt_2.58.0 RSQLite_2.3.4 utf8_1.2.4
[81] generics_0.1.3 data.table_1.14.10 prettyunits_1.2.0 httr_1.4.7 htmlwidgets_1.6.4
[86] S4Arrays_1.2.0 pkgconfig_2.0.3 gtable_0.3.4 blob_1.2.4 htmltools_0.5.7
[91] profvis_0.3.8 ProtGenerics_1.32.0 clue_0.3-65 scales_1.3.0 png_0.1-8
[96] knitr_1.45 rstudioapi_0.15.0 tzdb_0.4.0 rjson_0.2.21 checkmate_2.3.1
[101] nlme_3.1-164 curl_5.2.0 cachem_1.0.8 GlobalOptions_0.1.2 BiocVersion_3.17.1
[106] parallel_4.3.1 miniUI_0.1.1.1 foreign_0.8-86 desc_1.4.3 restfulr_0.0.15
[111] pillar_1.9.0 vctrs_0.6.5 urlchecker_1.0.1 promises_1.2.1 xtable_1.8-4
[116] cluster_2.1.6 htmlTable_2.4.2 evaluate_0.23 locfit_1.5-9.8 cli_3.6.2
[121] compiler_4.3.1 Rsamtools_2.18.0 rlang_1.1.3 crayon_1.5.2 interp_1.1-5
[126] ps_1.7.5 fs_1.6.3 stringi_1.8.3 deldir_2.0-2 viridisLite_0.4.2
[131] BiocParallel_1.36.0 munsell_0.5.0 lazyeval_0.2.2 Matrix_1.6-5 hms_1.1.3
[136] bit64_4.0.5 statmod_1.5.0 KEGGREST_1.42.0 shiny_1.8.0 interactiveDisplayBase_1.38.0
[141] memoise_2.0.1 bit_4.0.5 ape_5.7-1

Thanks for your help!

Best,
Moritz

Questions - Input type

Thank you for providing this tool and it seems very useful especially for it's ability to work with any diploid species as compared to GISTIC. I am working with CNVkit and have segmented CNV calls from multiple samples. Please see the snapshot below:

Chromosome	start	end	sample	cn
chr1	7,435	1,129,028	T10	2
chr1	1,129,028	122,678,785	T10	2
chr2	100	4,368,377	T10	2
chr2	4,368,377	4,440,739	T10	2
chr2	4,440,739	4,469,134	T10	2
chr2	4,469,134	4,657,825	T10	2
chr1	7,435	122,678,785	T2	2
chr2	100	14,439,486	T2	1
chr2	14,439,486	14,449,562	T2	2
chr2	14,449,562	21,877,348	T2	1
chr2	21,877,348	21,881,012	T2	2
chr2	21,881,012	23,457,436	T2	1
chr2	23,457,436	23,464,764	T2	2
chr2	23,464,764	26,984,008	T2	1
chr2	26,984,008	26,990,420	T2	3
chr1	7,435	122,678,785	T3	2
chr2	100	2,517,195	T3	2
chr2	2,517,195	4,747,590	T3	2
chr2	4,747,590	4,808,045	T3	2
chr2	4,808,045	5,288,931	T3	3
chr2	5,288,931	5,322,822	T3	2
chr2	5,322,822	5,533,496	T3	3
chr2	5,533,496	5,841,062	T3	1
chr2	5,841,062	5,924,417	T3	2
chr2	5,924,417	6,366,842	T3	2
chr2	6,366,842	18,423,139	T3	2
chr2	18,423,139	18,980,063	T3	2
chr2	18,980,063	20,078,338	T3	2
chr2	20,078,338	21,716,133	T3	2
chr2	21,716,133	23,024,171	T3	2
chr1	7,435	122,678,785	T4	2
chr2	100	2,426,513	T4	2
chr2	2,426,513	3,714,372	T4	2
chr2	3,714,372	4,259,376	T4	2
chr2	4,259,376	4,308,839	T4	2
chr2	4,308,839	4,348,226	T4	2
chr2	4,348,226	4,447,151	T4	2
chr2	4,447,151	4,560,732	T4	2
chr2	4,560,732	4,621,186	T4	2
chr2	4,621,186	5,070,013	T4	2
chr2	5,070,013	5,088,332	T4	2
chr2	5,088,332	5,272,443	T4	2
chr2	5,272,443	5,668,855	T4	2
chr2	5,668,855	5,682,595	T4	3
chr2	5,682,595	5,777,858	T4	2
chr2	5,777,858	5,896,937	T4	2

This is my first project with CNVkit and the segments I got seems to be very large while the example shown for the CNVRanger has very short intervals.

What is the input type expected by CNVRanger (bin-level or segmented)? CNVkit has option to convert segmented calls to absolute integer copy numbers but not for the bin-level calls.
I am getting errors while running CNVRanger with above input file, Before I go into details, first I want to confirm my input is right.

`populationRanges` breaks if a recurrent region is mostly diploid

Hello waldronlab,

I found a bug in CNVRanger due to an unhandled edge case. When calling the populationRanges function with default parameters, if a recurrent region has a majority of normal copy number (i.e., state == 2), then CNVRanger:::.getType returns character(0), breaking the vapply statement in CNVRanger::.classifyRegs:

types <- vapply(stateL, .getType, character(1), type.thresh = type.thresh, 
    USE.NAMES = FALSE)

Minimal reprex:

states <- rep(2, 10)
CNVRanger:::.getType(states, type.thresh=0.1)
# character(0)

For the sample case, the only situation where the function would return is if type.thresh=0 since both fract.amp and fract.del will be equal to zero.

Best,
Christopher Eeles
Software Developer
Haibe-Kains Lab @ PMCC

error for cnvEQTL

Hi,

I am new to the CNVRanger software. When I running the code, I encountered some errors as follows:

So, how can I solve this problem?

Thanks!

CNVassoc: getPLINK

Hi @viniciushdasilva

Some comments on your getPLINK function:

give minimum needed information as function arguments
(getPLINK does not need all.paths[c(1,3)])
avoid setwd: use absolute file paths
plink version!!??
avoid repeated evaluation
find shared aspects between platforms (e.g. always PLINK.zip)
as.character not needed for get_os
switch when several options available
unzip -> use exdir argument
clean up: remove zip file after unzipping
chmod 755 for executables (corresponds to rwxr-xr-x)
use system2 instead of system (avoids need for quoting)
grepl: logical grepping (TRUE / FALSE)

An improved implementation could look like this:

.getPLINK <- function(plink.path, version="1.07"){

    plink.url <- "http://zzz.bwh.harvard.edu/plink/dist/plink-"
    plink.url <- paste0(plink.url, version, "-")

    plink.file <- file.path(plink.path, "PLINK.zip")

    # get os-specific binary 
    os <- rappdirs:::get_os()
    os <- switch(os,
                unix = "x86_64",
                mac = "mac-intel",
                win = "dos",
                os)

    plink.url <- paste0(plink.url, os, ".zip")

    # download & unzip
    download.file(plink.url, plink.file)
    unzip(plink.file, exdir=plink.path)

    # remove zip
    file.remove(plink.file)

    # path to plink binary
    plink.file <- dir(plink.path, pattern = "plink*")
    plink.path <- file.path(plink.path, plink.file, "plink")
    Sys.chmod(plink.path, mode = "755")

    suppressWarnings(
        res <- system2(plink.path, args="--noweb", stdout=TRUE, stderr=FALSE)
    )

    res <- any( grepl(version, res) )
    return(res)
}

Error with function cnvOncoPrint using a colorectal cancer dataset of 34 WES samples

Dear @lgeistlinger,

moving from email where I tried to reach you for a specific error appeared when trying to implement CNVRanger for 34 WES samples from colorectal cancer patients:

briefly, based on an international collaborative project between DKFZ and other Greek research institutions regarding personalized medicine in cancer (http://www.accc.gr/aboutACCC_info.html), we have acquired around 34 samples with DNA and RNA which were sequenced and pre-processed in the DKFZ bioinformatics facility. Overall, my major goal is to investigate, if there any specific mutational patterns in any of the 3 separated groups of patients, that might interrelate the presence of specific mutational patterns (KRASonly, BRAFonly and wild type), as the ultimate goal is to investigate the molecular landscape of these 3 defined groups-we also utilize additionally public TCGA data to enhance our sample size-

My major question would be for the actual analysis of the CNV data from our 34 patients-as already copy number alterations were called with cnvkit, and as a further step purity and ploidy estimated; for a start, I tried to analyze all data collectively, based on the segmented log2 rations, as in your vignette, with the following code:

head(DT)
# A tibble: 6 x 5
  file        chromosome    start       end     log2
  <chr>       <chr>         <int>     <int>    <dbl>
1 CRC_ACCC_01 1             12080   3743427  0.191  
2 CRC_ACCC_01 1           3745760  12942340  0.00996
3 CRC_ACCC_01 1          12942900  13802754 -0.278  
4 CRC_ACCC_01 1          13803254  43766223 -0.0250 
5 CRC_ACCC_01 1          43766723  44684459  0.107  
6 CRC_ACCC_01 1          44684588 103345463 -0.0443 

smean <- DT$log2
state <- round(2^smean * 2)
state[state > 4] <- 4
DT$log2 <- state
names(DT)[names(DT) == "log2"] <- "state"
DT <- DT[DT$state != 2,]
head(DT)
# A tibble: 6 x 5
  file        chromosome     start       end state
  <chr>       <chr>          <int>     <int> <dbl>
1 CRC_ACCC_01 1          109778935 109823258     3
2 CRC_ACCC_01 1          152080055 152329130     3
3 CRC_ACCC_01 2              10500  28856096     3
4 CRC_ACCC_01 2           28863705  71742899     3
5 CRC_ACCC_01 2           71743258  74786985     3
6 CRC_ACCC_01 2           74787273  92325671     3

grl <- GenomicRanges::makeGRangesListFromDataFrame(DT, split.field="file", keep.extra.columns=TRUE)

saveRDS(grl, file = "Input.test.grl.CRC.rds")

grl.2 <- populationRanges(grl, density=0.1, est.recur=FALSE)

sel.freq.grl <- subset(grl.2, freq > 1)

library(EnsDb.Hsapiens.v75)
human.hg19.genes <- ensembldb::genes(EnsDb.Hsapiens.v75)

sel.hg19.genes <- subset(human.hg19.genes, gene_biotype == "protein_coding")

olaps <- GenomicRanges::findOverlaps(sel.hg19.genes, sel.freq.grl, ignore.strand=TRUE)

qh <- S4Vectors::queryHits(olaps)

sh <- S4Vectors::subjectHits(olaps)

cgenes <- sel.hg19.genes[qh]

cgenes$type <- sel.freq.grl$type[sh]

However, initially in the following step of trying the visualization through the function cnvOncoPrint:

cnvOncoPrint(sel.freq.grl, cgenes, top.features=15, top.samples = 34)
Error in as(calls, "RaggedExperiment") : 
  no method or default for coercing "GRanges" to "RaggedExperiment"

cnvOncoPrint(grl, cgenes, top.features=15, top.samples = 34)
Following `at` are removed: GAIN1, GAIN2, because no color was defined for them.
Error: Length of `graphics` should be the same as number of labels.

Thus, in your opinion, how this error could be fixed? It is something wrong with any of the input files? For simplicity I have attached also the initial created GRangeslist object;

sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] EnsDb.Hsapiens.v75_2.99.0 ensembldb_2.17.4          AnnotationFilter_1.17.1  
 [4] GenomicFeatures_1.45.2    AnnotationDbi_1.55.1      Biobase_2.53.0           
 [7] CNVRanger_1.9.0           RaggedExperiment_1.17.4   GenomicRanges_1.45.0     
[10] GenomeInfoDb_1.29.8       IRanges_2.27.2            S4Vectors_0.31.5         
[13] BiocGenerics_0.39.2       data.table_1.14.2         forcats_0.5.1            
[16] stringr_1.4.0             dplyr_1.0.7               purrr_0.3.4              
[19] readr_2.0.2               tidyr_1.1.4               tibble_3.1.5             
[22] ggplot2_3.3.5             tidyverse_1.3.1

Input.test.grl.CRC.rds.zip

With Kind Regards,

Efstathios

QQ-plots using adjusted p-values

Hi!
I have obtained unexpected qqplots. Checking the code, I noticed that in the pheno_assoc.R file, at line 126, the qqplot are generated using the adjusted p-values.

print(.qqunifPlot(segs.pvalue.gr$MinPvalueAdjusted, auto.key = list(corner = c(0.95, 0.05))))
Here an example:

Shouldn't the QQ plot be generated using the unadjusted p-values instead of the adjusted p-values? In fact, if I try to manually plot the raw unadjusted pvalues, I obtain:

I might not have all the information, but based on my understanding, this plot appears to be accurate.

Thank you in advance,
Davide

pheno_assoc.R

Hi @viniciushdasilva

Good work on the phenotype functionality so far and breaking things into pieces and functions!

Here are some more suggestions on writing robust and efficient code and coding style in general:

https://bioconductor.org/developers/how-to/efficient-code/
https://www.bioconductor.org/developers/how-to/coding-style/

@LiNk-NY any additional helpful resources you can link us to?

@viniciushdasilva : the file should be rather called pheno_assoc.R, as we will have also association with other features such as expression and genomic regions, which will go in other Rfiles. Please rename.

Please also check on my suggestions on getPLINK and createFolderTree (separate issues) and resolve similar things accordingly throughout your code.

Some more specific suggestions:

base::function not needed (is an exception as base is always the when R is installed) (also: GenomicRanges and S4Vectors as I plan to import them entirely)
- no setwd / getwd —> absolute file paths!
- no nesting!
- packageStartupMessage —> message
- 1:n —> use seq_len and seq_along for robustness instead
- as.integer(as.character(x)) —> as.integer(x)
- as.data.frame(cbind()) —> data.frame()
- if(PlinkVer != "TRUE") —> if(!PlinkVer)
- read.table -> write.table?? —> use strsplit
- BiocParallel: registered() defaults
- code after return: unreachable statement
- allow.fork=T —> allow.fork=TRUE (no partial argument matching for sake of robustness)
- read.gdsn: can this be vectorized?
- cnvGWAS: far too huge function, break into modular pieces (helper functions)

Let me know in case things are unclear

Co-variables in association analyses

Dear authors,

Thanks for the great package! However, I am wondering if the association analysis implementation can include co-variables such as age, gender, and genotype PCs? I tried to consult the document and also the code but found it is not implemented.
Do you plan to add this function in the near future? Do you have any suggestions for including covariables in the current implementations?

Best regards,
Zhenhua

plotRecurrentRegions() doesn't work for some chromosomes.

Dear @lgeistlinger,

When I try to plot recurrentRegions with plotRecurrentRegions() function for using hg19. I found that most chromosomes works perfect, and return with nice image. But in some of chromosomes, I get the following errors:

> plotRecurrentRegions(cnvrs, genome="hg19", chr="chr8")
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'x' in selecting a method for function 'rev': argument must be coercible to non-negative integer

> plotRecurrentRegions(cnvrs, genome="hg19", chr="chr9")
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'x' in selecting a method for function 'rev': argument must be coercible to non-negative integer

Have you ever met this problem, and do you know how to solve them?

Thank you and best wishes,
SJ

Bioinformatics application note

Hi @viniciushdasilva

I started editing the overleaf document.

In particular, I filled in on Introduction and structured the Methods according to the package vignette.
When adding content, please make sure to follow the format style of the references (specifically: list up to three authors, if there are more use et al.)

Furthermore, according to

https://academic.oup.com/bioinformatics/pages/instructions_for_authors

we have

Application notes: approximately 1,300 words or 1,000 words plus one figure

where I would go for the second option, ie. including a figure.
I suggest a two panel figure with a) workflow diagram, and b) illustration of results for an example dataset.

no question

Issue with CNV GWAS

Hello;
I am trying to do association analysis for with CNV using CNV ranger. I am getting the following issue;

segs.pvalue.gr <- cnvGWAS(phen.info, chr.code.name=chr.code.name, method.m.test="none")
sh: /Users/sara/Library/Application Support/CNVRanger/example/PLINK/plink-1.07-mac-intel/plink: Bad CPU type in executable
Error in file(file, "rt") : cannot open the connection
In addition: Warning messages:
1: In file.rename(from, to) :
cannot rename file '/Users/sara/Library/Application Support/CNVRanger/example/PLINK/plink-1.07-mac-intel/plink.gvar.summary' to '/Users/sara/Library/Application Support/CNVRanger/example/PLINK/plink-1.07-mac-intel/plink.gvar.summary.BT', reason 'No such file or directory'
2: In file(file, "rt") :
cannot open file '/Users/sara/Library/Application Support/CNVRanger/example/PLINK/plink-1.07-mac-intel/plink.gvar.summary.BT': No such file or directory
segs.pvalue.gr
Error: object 'segs.pvalue.gr' not found

I see that it can not access the PLINK 1.07. Can you help me to solve this issue? I am using the example data given with CNV Ranger.

waldronlab / cnvranger Goto Github PK

cnvranger's People

Contributors

Stargazers

Watchers

Forkers

cnvranger's Issues

an example:

Default (as you would typically like to have it in CNV analysis)

density=0 merges all overlapping regions, equivalent to: reduce(unlist(grl))

density=1 disjoins all overlapping regions, equivalent to: disjoin(unlist(grl))

Recommend Projects

Recommend Topics

Recommend Org