mcorentin / vargen

VarGen is an R package designed to get a list of variants related to a disease. It just needs an OMIM morbid ID as input and, optionally, a list of tissues / GWAS traits of interest to complete the results. You can also use your own customised list of genes. VarGen can annotate the variants to help you identify the most impactful ones.

License: MIT License

R 100.00%

vargen's Issues

GWAS traits - ERROR- unable to find an inherited method for function ‘seqinfo’ for signature ‘"list"’

Dear all,

During the GWAS traits step I am getting this error, and so far I haven't managed to find its source.

The GWAS catalog .tsv file is present in the vargen_data folder: gwas_catalog_v1.0-associations_e108_r2022-11-01.tsv

gwas_cat <- create_gwas("./vargen_data/")
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘seqinfo’ for signature ‘"list"’

Many thanks for the help!

R version 4.2.1 (2022-06-23)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.0.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] vargen_0.2.1                devtools_2.4.5              usethis_2.1.6               data.table_1.14.4          
 [5] myvariant_1.26.0            VariantAnnotation_1.42.1    Rsamtools_2.12.0            Biostrings_2.64.1          
 [9] XVector_0.36.0              SummarizedExperiment_1.26.1 Biobase_2.56.0              MatrixGenerics_1.8.1       
[13] matrixStats_0.62.0          R.utils_2.12.1              R.oo_1.25.0                 R.methodsS3_1.8.2          
[17] rtracklayer_1.56.1          ggplot2_3.4.0               splitstackshape_1.4.8       stringr_1.4.1              
[21] httr_1.4.4                  jsonlite_1.8.3              gwascat_2.28.1              GenomicRanges_1.48.0       
[25] GenomeInfoDb_1.32.4         IRanges_2.30.1              S4Vectors_0.34.0            BiocGenerics_0.42.0        
[29] gtools_3.9.3                biomaRt_2.52.0             

loaded via a namespace (and not attached):
  [1] backports_1.4.1          Hmisc_4.7-1              BiocFileCache_2.4.0      plyr_1.8.7               splines_4.2.1           
  [6] BiocParallel_1.30.4      digest_0.6.30            htmltools_0.5.3          fansi_1.0.3              magrittr_2.0.3          
 [11] checkmate_2.1.0          memoise_2.0.1            BSgenome_1.64.0          cluster_2.1.4            tzdb_0.3.0              
 [16] remotes_2.4.2            readr_2.1.3              prettyunits_1.1.1        jpeg_0.1-9               colorspace_2.0-3        
 [21] blob_1.2.3               rappdirs_0.3.3           xfun_0.34                dplyr_1.0.10             callr_3.7.3             
 [26] crayon_1.5.2             RCurl_1.98-1.9           survival_3.4-0           glue_1.6.2               gtable_0.3.1            
 [31] zlibbioc_1.42.0          DelayedArray_0.22.0      pkgbuild_1.3.1           scales_1.2.1             DBI_1.1.3               
 [36] miniUI_0.1.1.1           Rcpp_1.0.9               xtable_1.8-4             progress_1.2.2           htmlTable_2.4.1         
 [41] foreign_0.8-83           bit_4.0.4                Formula_1.2-4            profvis_0.3.7            htmlwidgets_1.5.4       
 [46] RColorBrewer_1.1-3       ellipsis_0.3.2           urlchecker_1.0.1         pkgconfig_2.0.3          XML_3.99-0.12           
 [51] nnet_7.3-18              dbplyr_2.2.1             deldir_1.0-6             utf8_1.2.2               tidyselect_1.2.0        
 [56] rlang_1.0.6              later_1.3.0              AnnotationDbi_1.58.0     munsell_0.5.0            tools_4.2.1             
 [61] cachem_1.0.6             cli_3.4.1                generics_0.1.3           RSQLite_2.2.18           fastmap_1.1.0           
 [66] yaml_2.3.6               processx_3.8.0           knitr_1.40               bit64_4.0.5              fs_1.5.2                
 [71] purrr_0.3.5              KEGGREST_1.36.3          mime_0.12                xml2_1.3.3               compiler_4.2.1          
 [76] rstudioapi_0.14          filelock_1.0.2           curl_4.3.3               png_0.1-7                tibble_3.1.8            
 [81] stringi_1.7.8            ps_1.7.2                 GenomicFeatures_1.48.4   lattice_0.20-45          Matrix_1.5-1            
 [86] vctrs_0.5.0              pillar_1.8.1             lifecycle_1.0.3          BiocManager_1.30.19      snpStats_1.46.0         
 [91] bitops_1.0-7             httpuv_1.6.6             R6_2.5.1                 BiocIO_1.6.0             latticeExtra_0.6-30     
 [96] promises_1.2.0.1         gridExtra_2.3            sessioninfo_1.2.2        codetools_0.2-18         assertthat_0.2.1        
[101] pkgload_1.3.1            rjson_0.2.21             withr_2.5.0              GenomicAlignments_1.32.1 GenomeInfoDbData_1.2.8  
[106] parallel_4.2.1           hms_1.1.2                grid_4.2.1               rpart_4.1.19             shiny_1.7.3

Error on running Vargen pipeline

Hi!
When I try to run the VarGen pipeline as described in the tutorial, I get two different errors on two separate runs:

#Error_1

> obesity_variants <- vargen_pipeline(vargen_dir = "./vargen_data/", 
+                                     omim_morbid_ids = "601665", 
+                                     gtex_tissues = adipose_tissues, 
+                                     gwas_traits = obesity_traits, 
+                                     verbose = T)
[1] "Connecting to the gene mart..."
Error in textConnection(bmResult) : invalid 'text' argument

#Error_2

> obesity_variants <- vargen_pipeline(vargen_dir = "./vargen_data/", 
+                                     omim_morbid_ids = "601665", 
+                                     gtex_tissues = adipose_tissues, 
+                                     gwas_traits = obesity_traits, 
+                                     verbose = T)
[1] "Connecting to the gene mart..."
[1] "Connecting to the snp mart..."
[1] "Building the gwascat object..."
[1] "Reading the enhancer tss association file for FANTOM5... './vargen_data//enhancer_tss_associations.bed'"
Error in file(file, "rt") : cannot open the connection
In addition: Warning messages:
1: In vargen_pipeline(vargen_dir = "./vargen_data/", omim_morbid_ids = "601665",  :
  Gene mart not provided (or not a valid Mart object).We used one from connect_to_gene_ensembl() instead.
2: In vargen_pipeline(vargen_dir = "./vargen_data/", omim_morbid_ids = "601665",  :
  Snp mart not provided (or not a valid Mart object).We used one from connect_to_snp_ensembl() instead.
3: In file(file, "rt") :
  cannot open file './vargen_data//enhancer_tss_associations.bed': No such file or directory
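The second error comes from a file that is missing from the data folder. A minimal sketch of a pre-flight check, using base R only, could look like the following. The helper name `check_vargen_files` is hypothetical (it is not part of VarGen), and the only required file listed is the one named in the error message above; other files the pipeline needs would have to be added by hand.

```r
# Hypothetical helper (not part of VarGen): report which expected files
# are absent from the data directory before running the pipeline.
check_vargen_files <- function(vargen_dir,
                               required = c("enhancer_tss_associations.bed")) {
  paths <- file.path(vargen_dir, required)
  absent <- required[!file.exists(paths)]
  if (length(absent) > 0) {
    message("Missing from ", vargen_dir, ": ", paste(absent, collapse = ", "))
  }
  invisible(absent)
}

# Demonstration in a throwaway directory.
demo_dir <- tempfile("vargen_demo")
dir.create(demo_dir)
missing_before <- check_vargen_files(demo_dir)   # file not there yet
file.create(file.path(demo_dir, "enhancer_tss_associations.bed"))
missing_after <- check_vargen_files(demo_dir)    # file now present
```

In a real session the call would be `check_vargen_files("./vargen_data/")`; an empty result means the listed files were found.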

VarPhen pipeline - phenotypes

Due to a change in biomaRt, get_phenotype_terms() does not return a list of phenotypes anymore.

obesity_phens <- get_phenotype_terms(keyword = "obesity", snp_mart = snp_mart)

Error in biomaRt::getBM(attributes = c("chr_name", "chrom_start", "refsnp_id", :
Values argument contains no data.

However, get_variants_from_phenotypes() still works if the user knows the list of phenotypes to use as input.

Issues with annotation of variants

Hi!
I used the following command to run the VarGen pipeline but I am getting the following error:

> disease_variants <- vargen_pipeline(vargen_dir = "./vargen_data/", 
+                                     omim_morbid_ids = c("125853",
+                                                         "222100", 
+                                                         "601665", 
+                                                         "604302",
+                                                         "212750",
+                                                         "145500",
+                                                         "603813",
+                                                         "166710",
+                                                         "266600",
+                                                         "610938",
+                                                         "601367",
+                                                         "223100",
+                                                         "600807"),
+                                     gtex_tissues = gtex_tissue,
+                                     gwas_traits = disease_traits, 
+                                     verbose = T)
[1] "Connecting to the gene mart..."
[1] "Connecting to the snp mart..."
[1] "Building the gwascat object..."
[1] "Reading the enhancer tss association file for FANTOM5... './vargen_data//enhancer_tss_associations.bed'"
[1] "Starting the pipeline..."
[1] "Getting genes for OMIM: 125853"
[1] "Getting genes for OMIM: 222100"
[1] "Getting genes for OMIM: 601665"
[1] "Getting genes for OMIM: 604302"
[1] "Getting genes for OMIM: 212750"
[1] "Getting genes for OMIM: 145500"
[1] "Getting genes for OMIM: 603813"
[1] "Getting genes for OMIM: 166710"
[1] "Getting genes for OMIM: 266600"
[1] "Getting genes for OMIM: 610938"
[1] "Getting genes for OMIM: 601367"
[1] "Getting genes for OMIM: 223100"
[1] "Getting genes for OMIM: 600807"
[1] "Writing the list of genes to: .//genes_info.tsv"
[1] "Getting the GTEx variants..."
[1] "Loading GTEx lookup table... Please be patient"
|--------------------------------------------------|
|==================================================|
[1] "Number of GTEx ids removed (no corresponding rsid): 135"
Error in vecseq(f__, len__, if (allow.cartesian || notjoin || !anyDuplicated(f__,  : 
  Join results in 1895121 rows; more than 471669 = nrow(x)+nrow(i). Check for duplicate key values in i each of which join to the same group in x over and over again. If that's ok, try by=.EACHI to run j for each group to avoid the large allocation. If you are sure you wish to proceed, rerun with allow.cartesian=TRUE. Otherwise, please search for this error message in the FAQ, Wiki, Stack Overflow and data.table issue tracker for advice.
In addition: Warning messages:
1: In vargen_pipeline(vargen_dir = "./vargen_data/", omim_morbid_ids = c("125853",  :
  Gene mart not provided (or not a valid Mart object).We used one from connect_to_gene_ensembl() instead.
2: In vargen_pipeline(vargen_dir = "./vargen_data/", omim_morbid_ids = c("125853",  :
  Snp mart not provided (or not a valid Mart object).We used one from connect_to_snp_ensembl() instead.
3: In type.convert.default(X[[i]], ...) :
  'as.is' should be specified by the caller; using TRUE
4: In type.convert.default(X[[i]], ...) :
  'as.is' should be specified by the caller; using TRUE
5: In type.convert.default(X[[i]], ...) :
  'as.is' should be specified by the caller; using TRUE
6: In type.convert.default(X[[i]], ...) :
  'as.is' should be specified by the caller; using TRUE
7: In type.convert.default(X[[i]], ...) :
  'as.is' should be specified by the caller; using TRUE

The following GTEx tissues were selected:

gtex_tissue <- select_gtex_tissues("./vargen_data/GTEx_Analysis_v8_eQTL/", 
                                   c("adipose",
                                     "artery_coronary",
                                     "pancreas", 
                                     "breast", 
                                     "heart", 
                                     "liver",
                                     "stomach",
                                     "lung",
                                     "fibroblasts",
                                     "small_intestine", "whole_blood", "brain"))
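The join error in the log above is raised by data.table when duplicated key values on both sides of a join multiply the number of result rows. The sketch below, in base R, shows how the size of such a join can be estimated from the key columns alone. The function name `expected_join_rows` and the toy rsids are illustrative assumptions; they do not reproduce VarGen's internal tables.

```r
# Illustrative only: estimate how many matched rows a join on one key
# column would produce, given the key values of both tables.
expected_join_rows <- function(x_keys, i_keys) {
  x_counts <- table(x_keys)
  i_counts <- table(i_keys)
  shared <- intersect(names(x_counts), names(i_counts))
  # Each shared key contributes (occurrences in x) * (occurrences in i) rows.
  sum(as.integer(x_counts[shared]) * as.integer(i_counts[shared]))
}

# Toy keys: "rs1" appears three times on each side.
x_keys <- c("rs1", "rs1", "rs1", "rs2")
i_keys <- c("rs1", "rs1", "rs1", "rs2")

n_joined <- expected_join_rows(x_keys, i_keys)
n_limit  <- length(x_keys) + length(i_keys)   # nrow(x) + nrow(i)

# Keys that are duplicated on both sides are the ones that blow up the join.
dup_both <- intersect(x_keys[duplicated(x_keys)], i_keys[duplicated(i_keys)])
```

Here `n_joined` exceeds `n_limit`, which is the condition the error message describes. Checking for keys duplicated on both sides is one way to find out whether selecting many tissues at once introduces the repeated ids.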

Also, when I leave the GTEx tissues out, the pipeline runs with some warnings, but I am unable to perform the annotation:

Running the pipeline:

> disease_variants <- vargen_pipeline(vargen_dir = "./vargen_data/", 
+                                     omim_morbid_ids = c("125853",
+                                                         "222100", 
+                                                         "601665", 
+                                                         "604302",
+                                                         "212750",
+                                                         "145500",
+                                                         "603813",
+                                                         "166710",
+                                                         "266600",
+                                                         "610938",
+                                                         "601367",
+                                                         "223100",
+                                                         "600807"),
+                                     gwas_traits = disease_traits, 
+                                     verbose = T)
[1] "Connecting to the gene mart..."
[1] "Connecting to the snp mart..."
[1] "Building the gwascat object..."
[1] "Reading the enhancer tss association file for FANTOM5... './vargen_data//enhancer_tss_associations.bed'"
[1] "Starting the pipeline..."
[1] "Getting genes for OMIM: 125853"
[1] "Getting genes for OMIM: 222100"
[1] "Getting genes for OMIM: 601665"
[1] "Getting genes for OMIM: 604302"
[1] "Getting genes for OMIM: 212750"
[1] "Getting genes for OMIM: 145500"
[1] "Getting genes for OMIM: 603813"
[1] "Getting genes for OMIM: 166710"
[1] "Getting genes for OMIM: 266600"
[1] "Getting genes for OMIM: 610938"
[1] "Getting genes for OMIM: 601367"
[1] "Getting genes for OMIM: 223100"
[1] "Getting genes for OMIM: 600807"
[1] "Writing the list of genes to: .//genes_info.tsv"
[1] "No values for 'gtex_tissues', skipping GTEx step..."
[1] "Getting the gwas variants,,,"
[1] "Writing the variants to .//vargen_variants.tsv"
Warning messages:
1: In vargen_pipeline(vargen_dir = "./vargen_data/", omim_morbid_ids = c("125853",  :
  Gene mart not provided (or not a valid Mart object).We used one from connect_to_gene_ensembl() instead.
2: In vargen_pipeline(vargen_dir = "./vargen_data/", omim_morbid_ids = c("125853",  :
  Snp mart not provided (or not a valid Mart object).We used one from connect_to_snp_ensembl() instead.
3: In type.convert.default(X[[i]], ...) :
  'as.is' should be specified by the caller; using TRUE
4: In type.convert.default(X[[i]], ...) :
  'as.is' should be specified by the caller; using TRUE
5: In type.convert.default(X[[i]], ...) :
  'as.is' should be specified by the caller; using TRUE
6: In type.convert.default(X[[i]], ...) :
  'as.is' should be specified by the caller; using TRUE
7: In type.convert.default(X[[i]], ...) :
  'as.is' should be specified by the caller; using TRUE
8: In (function (seqlevels, genome, new_style)  :
  cannot switch GRCh38's seqlevels from NCBI to UCSC style

Performing annotation:

> disease_annotation <- annotate_variants(disease_variants$rsid, verbose = T)
Error in if (cj == upper) next : missing value where TRUE/FALSE needed
In addition: Warning message:
In (1:g) * nnm : NAs produced by integer overflow
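The integer overflow warning suggests the failure may be related to the size of the input. As an unverified workaround, one thing to try is removing missing and duplicated rsids and annotating in smaller batches. The sketch below only shows the batching step in base R; `split_into_batches` and the batch size are hypothetical, and whether this avoids the error has not been confirmed.

```r
# Hypothetical helper: drop NA and duplicated ids, then split the rest
# into consecutive batches of at most `batch_size` ids.
split_into_batches <- function(ids, batch_size = 1000L) {
  ids <- unique(ids[!is.na(ids)])
  split(ids, ceiling(seq_along(ids) / batch_size))
}

# Toy input with one NA and one duplicate.
batches <- split_into_batches(c("rs1", "rs2", NA, "rs2", "rs3"), batch_size = 2L)
```

Each element of `batches` could then be passed to `annotate_variants()` in turn, for example with `lapply()`, and the results combined afterwards.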

"convert_gtex_to_rsids" does not work for InDels.

GTEx and Ensembl do not use the same position to refer to the same InDels.

For example with 1_760811_CTCTT_C_b37 (rs200712425).

GTEx format: CTCTT becomes C.
Ensembl format: TCTTTCTTT becomes TCTTT.

In both cases "TCTT" gets deleted, but GTEx reports the deletion from the left (position 760811) and Ensembl from the right (position 760812).

Since VarGen uses the position to translate the GTEx id to rsid, "convert_gtex_to_rsids" does not work for InDels.
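The position shift can be illustrated with a small base R sketch. It parses the GTEx id from the example and strips the shared leading base from an InDel, which moves the position by one. Both function names are hypothetical, and stripping the anchor base is only one possible normalisation: it yields 760812 for this example, but it is an assumption that it matches Ensembl's coordinates in general.

```r
# Hypothetical helper: split a GTEx variant id into its components.
parse_gtex_id <- function(gtex_id) {
  parts <- strsplit(gtex_id, "_", fixed = TRUE)[[1]]
  list(chr = parts[1], pos = as.integer(parts[2]),
       ref = parts[3], alt = parts[4], build = parts[5])
}

# Hypothetical helper: for InDels whose ref and alt share the first base,
# remove that base and shift the position one to the right.
strip_anchor <- function(v) {
  v$is_indel <- nchar(v$ref) != nchar(v$alt)
  if (v$is_indel && substr(v$ref, 1, 1) == substr(v$alt, 1, 1)) {
    v$pos <- v$pos + 1L
    v$ref <- substring(v$ref, 2)
    v$alt <- substring(v$alt, 2)
  }
  v
}

v <- strip_anchor(parse_gtex_id("1_760811_CTCTT_C_b37"))
# v$pos is now 760812, v$ref is "TCTT" (the deleted bases), v$alt is "".
```

A lookup keyed on the original GTEx position (760811) would miss a variant stored at 760812, which is why position-based translation fails for InDels.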
