mcorentin / vargen

VarGen is an R package designed to get a list of variants related to a disease. It needs only an OMIM morbid ID as input and, optionally, a list of tissues or GWAS traits of interest to complete the results. You can also use your own customised list of genes. VarGen can annotate the variants to help you identify the most impactful ones.

License: MIT License

R 100.00%

vargen's Issues

GWAS traits - ERROR- unable to find an inherited method for function ‘seqinfo’ for signature ‘"list"’

Dear all,

During the GWAS traits step I receive the error below, and so far I haven't managed to understand its source.

The vargen_data folder contains the GWAS catalog .tsv file: gwas_catalog_v1.0-associations_e108_r2022-11-01.tsv

gwas_cat <- create_gwas("./vargen_data/")
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘seqinfo’ for signature ‘"list"’

Many thanks for the help!

R version 4.2.1 (2022-06-23)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.0.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] vargen_0.2.1                devtools_2.4.5              usethis_2.1.6               data.table_1.14.4          
 [5] myvariant_1.26.0            VariantAnnotation_1.42.1    Rsamtools_2.12.0            Biostrings_2.64.1          
 [9] XVector_0.36.0              SummarizedExperiment_1.26.1 Biobase_2.56.0              MatrixGenerics_1.8.1       
[13] matrixStats_0.62.0          R.utils_2.12.1              R.oo_1.25.0                 R.methodsS3_1.8.2          
[17] rtracklayer_1.56.1          ggplot2_3.4.0               splitstackshape_1.4.8       stringr_1.4.1              
[21] httr_1.4.4                  jsonlite_1.8.3              gwascat_2.28.1              GenomicRanges_1.48.0       
[25] GenomeInfoDb_1.32.4         IRanges_2.30.1              S4Vectors_0.34.0            BiocGenerics_0.42.0        
[29] gtools_3.9.3                biomaRt_2.52.0             

loaded via a namespace (and not attached):
  [1] backports_1.4.1          Hmisc_4.7-1              BiocFileCache_2.4.0      plyr_1.8.7               splines_4.2.1           
  [6] BiocParallel_1.30.4      digest_0.6.30            htmltools_0.5.3          fansi_1.0.3              magrittr_2.0.3          
 [11] checkmate_2.1.0          memoise_2.0.1            BSgenome_1.64.0          cluster_2.1.4            tzdb_0.3.0              
 [16] remotes_2.4.2            readr_2.1.3              prettyunits_1.1.1        jpeg_0.1-9               colorspace_2.0-3        
 [21] blob_1.2.3               rappdirs_0.3.3           xfun_0.34                dplyr_1.0.10             callr_3.7.3             
 [26] crayon_1.5.2             RCurl_1.98-1.9           survival_3.4-0           glue_1.6.2               gtable_0.3.1            
 [31] zlibbioc_1.42.0          DelayedArray_0.22.0      pkgbuild_1.3.1           scales_1.2.1             DBI_1.1.3               
 [36] miniUI_0.1.1.1           Rcpp_1.0.9               xtable_1.8-4             progress_1.2.2           htmlTable_2.4.1         
 [41] foreign_0.8-83           bit_4.0.4                Formula_1.2-4            profvis_0.3.7            htmlwidgets_1.5.4       
 [46] RColorBrewer_1.1-3       ellipsis_0.3.2           urlchecker_1.0.1         pkgconfig_2.0.3          XML_3.99-0.12           
 [51] nnet_7.3-18              dbplyr_2.2.1             deldir_1.0-6             utf8_1.2.2               tidyselect_1.2.0        
 [56] rlang_1.0.6              later_1.3.0              AnnotationDbi_1.58.0     munsell_0.5.0            tools_4.2.1             
 [61] cachem_1.0.6             cli_3.4.1                generics_0.1.3           RSQLite_2.2.18           fastmap_1.1.0           
 [66] yaml_2.3.6               processx_3.8.0           knitr_1.40               bit64_4.0.5              fs_1.5.2                
 [71] purrr_0.3.5              KEGGREST_1.36.3          mime_0.12                xml2_1.3.3               compiler_4.2.1          
 [76] rstudioapi_0.14          filelock_1.0.2           curl_4.3.3               png_0.1-7                tibble_3.1.8            
 [81] stringi_1.7.8            ps_1.7.2                 GenomicFeatures_1.48.4   lattice_0.20-45          Matrix_1.5-1            
 [86] vctrs_0.5.0              pillar_1.8.1             lifecycle_1.0.3          BiocManager_1.30.19      snpStats_1.46.0         
 [91] bitops_1.0-7             httpuv_1.6.6             R6_2.5.1                 BiocIO_1.6.0             latticeExtra_0.6-30     
 [96] promises_1.2.0.1         gridExtra_2.3            sessioninfo_1.2.2        codetools_0.2-18         assertthat_0.2.1        
[101] pkgload_1.3.1            rjson_0.2.21             withr_2.5.0              GenomicAlignments_1.32.1 GenomeInfoDbData_1.2.8  
[106] parallel_4.2.1           hms_1.1.2                grid_4.2.1               rpart_4.1.19             shiny_1.7.3

Error on running Vargen pipeline

Hi!
When I run the VarGen pipeline as described in the tutorial, I get two different errors on two separate runs:

#Error_1

> obesity_variants <- vargen_pipeline(vargen_dir = "./vargen_data/", 
+                                     omim_morbid_ids = "601665", 
+                                     gtex_tissues = adipose_tissues, 
+                                     gwas_traits = obesity_traits, 
+                                     verbose = T)
[1] "Connecting to the gene mart..."
Error in textConnection(bmResult) : invalid 'text' argument

#Error_2

> obesity_variants <- vargen_pipeline(vargen_dir = "./vargen_data/", 
+                                     omim_morbid_ids = "601665", 
+                                     gtex_tissues = adipose_tissues, 
+                                     gwas_traits = obesity_traits, 
+                                     verbose = T)
[1] "Connecting to the gene mart..."
[1] "Connecting to the snp mart..."
[1] "Building the gwascat object..."
[1] "Reading the enhancer tss association file for FANTOM5... './vargen_data//enhancer_tss_associations.bed'"
Error in file(file, "rt") : cannot open the connection
In addition: Warning messages:
1: In vargen_pipeline(vargen_dir = "./vargen_data/", omim_morbid_ids = "601665",  :
  Gene mart not provided (or not a valid Mart object).We used one from connect_to_gene_ensembl() instead.
2: In vargen_pipeline(vargen_dir = "./vargen_data/", omim_morbid_ids = "601665",  :
  Snp mart not provided (or not a valid Mart object).We used one from connect_to_snp_ensembl() instead.
3: In file(file, "rt") :
  cannot open file './vargen_data//enhancer_tss_associations.bed': No such file or directory
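
The second error reports that a file is missing from the data folder. A quick pre-flight check can confirm this before rerunning, sketched here under the assumption that the FANTOM5 file was simply never downloaded. `missing_files()` is a hypothetical helper, not part of VarGen; the file name is the one from the error message above.

```r
# Hypothetical helper (not a VarGen function): return the names of the
# expected files that are absent from a directory.
missing_files <- function(dir, expected) {
  paths <- file.path(dir, expected)
  expected[!file.exists(paths)]
}

# File name taken from the error message; extend the vector with any
# other files your own run expects.
missing_files("./vargen_data", "enhancer_tss_associations.bed")
```

An empty result means every listed file is present; any returned name still has to be downloaded into the folder.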

VarPhen pipeline - phenotypes

Due to a change in biomaRt, get_phenotype_terms() does not return a list of phenotypes anymore.

obesity_phens <- get_phenotype_terms(keyword = "obesity", snp_mart = snp_mart)

Error in biomaRt::getBM(attributes = c("chr_name", "chrom_start", "refsnp_id", :
Values argument contains no data.

However, get_variants_from_phenotypes() still works if the user knows the list of phenotypes to use as input.

Issues with annotation of variants

Hi!
I used the following command to run the VarGen pipeline but I am getting the following error:

> disease_variants <- vargen_pipeline(vargen_dir = "./vargen_data/", 
+                                     omim_morbid_ids = c("125853",
+                                                         "222100", 
+                                                         "601665", 
+                                                         "604302",
+                                                         "212750",
+                                                         "145500",
+                                                         "603813",
+                                                         "166710",
+                                                         "266600",
+                                                         "610938",
+                                                         "601367",
+                                                         "223100",
+                                                         "600807"),
+                                     gtex_tissues = gtex_tissue,
+                                     gwas_traits = disease_traits, 
+                                     verbose = T)
[1] "Connecting to the gene mart..."
[1] "Connecting to the snp mart..."
[1] "Building the gwascat object..."
[1] "Reading the enhancer tss association file for FANTOM5... './vargen_data//enhancer_tss_associations.bed'"
[1] "Starting the pipeline..."
[1] "Getting genes for OMIM: 125853"
[1] "Getting genes for OMIM: 222100"
[1] "Getting genes for OMIM: 601665"
[1] "Getting genes for OMIM: 604302"
[1] "Getting genes for OMIM: 212750"
[1] "Getting genes for OMIM: 145500"
[1] "Getting genes for OMIM: 603813"
[1] "Getting genes for OMIM: 166710"
[1] "Getting genes for OMIM: 266600"
[1] "Getting genes for OMIM: 610938"
[1] "Getting genes for OMIM: 601367"
[1] "Getting genes for OMIM: 223100"
[1] "Getting genes for OMIM: 600807"
[1] "Writing the list of genes to: .//genes_info.tsv"
[1] "Getting the GTEx variants..."
[1] "Loading GTEx lookup table... Please be patient"
|--------------------------------------------------|
|==================================================|
[1] "Number of GTEx ids removed (no corresponding rsid): 135"
Error in vecseq(f__, len__, if (allow.cartesian || notjoin || !anyDuplicated(f__,  : 
  Join results in 1895121 rows; more than 471669 = nrow(x)+nrow(i). Check for duplicate key values in i each of which join to the same group in x over and over again. If that's ok, try by=.EACHI to run j for each group to avoid the large allocation. If you are sure you wish to proceed, rerun with allow.cartesian=TRUE. Otherwise, please search for this error message in the FAQ, Wiki, Stack Overflow and data.table issue tracker for advice.
In addition: Warning messages:
1: In vargen_pipeline(vargen_dir = "./vargen_data/", omim_morbid_ids = c("125853",  :
  Gene mart not provided (or not a valid Mart object).We used one from connect_to_gene_ensembl() instead.
2: In vargen_pipeline(vargen_dir = "./vargen_data/", omim_morbid_ids = c("125853",  :
  Snp mart not provided (or not a valid Mart object).We used one from connect_to_snp_ensembl() instead.
3: In type.convert.default(X[[i]], ...) :
  'as.is' should be specified by the caller; using TRUE
4: In type.convert.default(X[[i]], ...) :
  'as.is' should be specified by the caller; using TRUE
5: In type.convert.default(X[[i]], ...) :
  'as.is' should be specified by the caller; using TRUE
6: In type.convert.default(X[[i]], ...) :
  'as.is' should be specified by the caller; using TRUE
7: In type.convert.default(X[[i]], ...) :
  'as.is' should be specified by the caller; using TRUE
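
The `vecseq` error comes from data.table's guard against joins that return more rows than nrow(x)+nrow(i). The toy example below reproduces that behaviour. It uses made-up tables, not the GTEx lookup table, and does not show where the duplicates arise inside VarGen. Whether `allow.cartesian = TRUE` would be the right fix there, or whether the duplicate keys should be removed first, is an open question.

```r
library(data.table)

# Toy tables (hypothetical data): the key "rs1" is repeated on both sides.
x <- data.table(rsid = c("rs1", "rs1", "rs1", "rs2"),
                tissue = c("a", "b", "c", "a"))
i <- data.table(rsid = c("rs1", "rs1", "rs1"), pos = 1:3)

# Key values that occur more than once in i; these drive the row blow-up.
dup_keys <- i[, .N, by = rsid][N > 1]

# Each of the 3 rows in i matches 3 rows in x, so the join has 9 rows,
# more than nrow(x) + nrow(i) = 7. data.table refuses this by default;
# the error message is captured instead of stopping the script.
unchecked <- tryCatch(x[i, on = "rsid"],
                      error = function(e) conditionMessage(e))

# Opting in explicitly lets the many-to-many join go through.
joined <- x[i, on = "rsid", allow.cartesian = TRUE]
nrow(joined)  # 9
```

Inspecting `dup_keys` first shows whether the repeated keys are expected (for example, one row per tissue) or a sign of duplicated input rows.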

The following GTEx tissues were used:

gtex_tissue <- select_gtex_tissues("./vargen_data/GTEx_Analysis_v8_eQTL/", 
                                   c("adipose",
                                     "artery_coronary",
                                     "pancreas", 
                                     "breast", 
                                     "heart", 
                                     "liver",
                                     "stomach",
                                     "lung",
                                     "fibroblasts",
                                     "small_intestine", "whole_blood", "brain"))

Also, when I leave the GTEx tissues out, the pipeline runs with some warnings, but I am unable to perform the annotation:

Running Pipeline :

> disease_variants <- vargen_pipeline(vargen_dir = "./vargen_data/", 
+                                     omim_morbid_ids = c("125853",
+                                                         "222100", 
+                                                         "601665", 
+                                                         "604302",
+                                                         "212750",
+                                                         "145500",
+                                                         "603813",
+                                                         "166710",
+                                                         "266600",
+                                                         "610938",
+                                                         "601367",
+                                                         "223100",
+                                                         "600807"),
+                                     gwas_traits = disease_traits, 
+                                     verbose = T)
[1] "Connecting to the gene mart..."
[1] "Connecting to the snp mart..."
[1] "Building the gwascat object..."
[1] "Reading the enhancer tss association file for FANTOM5... './vargen_data//enhancer_tss_associations.bed'"
[1] "Starting the pipeline..."
[1] "Getting genes for OMIM: 125853"
[1] "Getting genes for OMIM: 222100"
[1] "Getting genes for OMIM: 601665"
[1] "Getting genes for OMIM: 604302"
[1] "Getting genes for OMIM: 212750"
[1] "Getting genes for OMIM: 145500"
[1] "Getting genes for OMIM: 603813"
[1] "Getting genes for OMIM: 166710"
[1] "Getting genes for OMIM: 266600"
[1] "Getting genes for OMIM: 610938"
[1] "Getting genes for OMIM: 601367"
[1] "Getting genes for OMIM: 223100"
[1] "Getting genes for OMIM: 600807"
[1] "Writing the list of genes to: .//genes_info.tsv"
[1] "No values for 'gtex_tissues', skipping GTEx step..."
[1] "Getting the gwas variants,,,"
[1] "Writing the variants to .//vargen_variants.tsv"
Warning messages:
1: In vargen_pipeline(vargen_dir = "./vargen_data/", omim_morbid_ids = c("125853",  :
  Gene mart not provided (or not a valid Mart object).We used one from connect_to_gene_ensembl() instead.
2: In vargen_pipeline(vargen_dir = "./vargen_data/", omim_morbid_ids = c("125853",  :
  Snp mart not provided (or not a valid Mart object).We used one from connect_to_snp_ensembl() instead.
3: In type.convert.default(X[[i]], ...) :
  'as.is' should be specified by the caller; using TRUE
4: In type.convert.default(X[[i]], ...) :
  'as.is' should be specified by the caller; using TRUE
5: In type.convert.default(X[[i]], ...) :
  'as.is' should be specified by the caller; using TRUE
6: In type.convert.default(X[[i]], ...) :
  'as.is' should be specified by the caller; using TRUE
7: In type.convert.default(X[[i]], ...) :
  'as.is' should be specified by the caller; using TRUE
8: In (function (seqlevels, genome, new_style)  :
  cannot switch GRCh38's seqlevels from NCBI to UCSC style

Performing annotation:

> disease_annotation <- annotate_variants(disease_variants$rsid, verbose = T)
Error in if (cj == upper) next : missing value where TRUE/FALSE needed
In addition: Warning message:
In (1:g) * nnm : NAs produced by integer overflow
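
The warning and the error are consistent with each other: in R, an integer product that exceeds `.Machine$integer.max` becomes `NA`, and an `NA` inside `if ()` raises "missing value where TRUE/FALSE needed". The sketch below shows only that general mechanism. The values of `g` and `nnm` are made up, and the report does not show where in `annotate_variants()` the expression is evaluated.

```r
# Hypothetical values; any integer product above .Machine$integer.max
# (2147483647) behaves the same way.
g   <- 50000L
nnm <- 50000L

# Integer arithmetic overflows to NA (with the warning seen in the report).
prod_int <- suppressWarnings(g * nnm)
is.na(prod_int)  # TRUE

# Comparing NA inside if() fails; the message is captured for inspection.
msg <- tryCatch(if (prod_int == 1L) "equal",
                error = function(e) conditionMessage(e))

# Doing the arithmetic in double precision avoids the overflow.
prod_dbl <- as.numeric(g) * nnm
prod_dbl  # 2.5e+09
```

If the number of rsids is what drives the overflow, annotating them in smaller batches may sidestep it, but that is an untested assumption.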

"convert_gtex_to_rsids" does not work for InDels.

GTEx and Ensembl do not use the same position to refer to the same InDels.

For example with 1_760811_CTCTT_C_b37 (rs200712425).

  • GTEx format: CTCTT becomes C.
  • Ensembl format: TCTTTCTTT becomes TCTTT.

In both cases "TCTT" is deleted, but GTEx reports the deletion from the left (position 760811) and Ensembl from the right (position 760812).

Since VarGen uses the position to translate the GTEx id to rsid, "convert_gtex_to_rsids" does not work for InDels.
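
One way to see that the two records describe the same deletion is to trim the bases shared by both alleles and compare what is left. The sketch below does that for the example above. `trim_alleles()` is a hypothetical helper, not a VarGen function, and trimming alone is not a full normalisation: for indels in repeats, proper left-alignment against the reference sequence would be needed.

```r
# Hypothetical helper: strip bases shared by both alleles so that two
# representations of the same indel can be compared.
trim_alleles <- function(pos, ref, alt) {
  # Drop the shared suffix first; this does not move the position.
  while (nchar(ref) > 0 && nchar(alt) > 0 &&
         substring(ref, nchar(ref)) == substring(alt, nchar(alt))) {
    ref <- substr(ref, 1, nchar(ref) - 1)
    alt <- substr(alt, 1, nchar(alt) - 1)
  }
  # Then drop the shared prefix, shifting the position by one per base.
  while (nchar(ref) > 0 && nchar(alt) > 0 &&
         substr(ref, 1, 1) == substr(alt, 1, 1)) {
    ref <- substring(ref, 2)
    alt <- substring(alt, 2)
    pos <- pos + 1
  }
  list(pos = pos, ref = ref, alt = alt)
}

# The two representations of rs200712425 quoted above.
gtex    <- trim_alleles(760811, "CTCTT", "C")
ensembl <- trim_alleles(760812, "TCTTTCTTT", "TCTTT")

identical(gtex, ensembl)  # TRUE: both reduce to pos 760812, ref "TCTT", alt ""
```

Matching on the trimmed position and alleles rather than on the raw position is one possible direction for handling InDels.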
