rajlabmssm / echoannot


echoverse module: Annotate fine-mapping results

R 100.00%

echoverse epigenomics fine-mapping genome-annotation


echoannot's Issues

Datasets

PAINTOR

Find a way to distribute PAINTOR annotations:
https://github.com/gkichaev/PAINTOR_V3.0/wiki/2b.-Overlapping-annotations
https://ucla.app.box.com/s/x47apvgv51au1rlmuat8m4zdjhcniv2d

IMPACT

Annotations can be read directly from GitHub, but the files are large, which makes this slow. It would be better to tabix-index them and upload them myself:
https://github.com/immunogenomics/IMPACT

Tabix-indexed versions are now on Zenodo!
https://doi.org/10.5281/zenodo.7062238
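
For context, tabix-indexing a BED-like annotation file and querying a single region could look roughly like the sketch below. It assumes the Bioconductor packages Rsamtools and GenomicRanges are installed and that the input file is already sorted by chromosome and start position. `index_and_query_bed` is a hypothetical helper name, not a function from echoannot or IMPACT.

```r
# Sketch only: assumes Rsamtools and GenomicRanges (Bioconductor) are installed
# and that `bed_path` is a tab-delimited BED file sorted by chrom and start.
# `index_and_query_bed` is a hypothetical helper, not an echoannot function.
index_and_query_bed <- function(bed_path, chrom, start, end) {
  stopifnot(file.exists(bed_path))
  # bgzip-compress the file (Rsamtools writes <bed_path>.bgz by default)
  bgz_path <- Rsamtools::bgzip(bed_path, overwrite = TRUE)
  # Build the .tbi index that makes region queries possible
  Rsamtools::indexTabix(bgz_path, format = "bed")
  # Request only the lines that overlap the region of interest
  region <- GenomicRanges::GRanges(chrom, IRanges::IRanges(start, end))
  tbx <- Rsamtools::TabixFile(bgz_path)
  # scanTabix returns one character vector of raw lines per query range
  Rsamtools::scanTabix(tbx, param = region)[[1]]
}
```

Only the lines overlapping the requested region are read, which is what makes an indexed copy faster to query than a full download of a large annotation file.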

Storage options

GitHub (LFS)

Google Drive

googledrive

Zenodo

50 GB per project, unlimited projects.
zen4R

FigShare

20 GB total, 500-file limit (across all projects!)
rfigshare

Open Science Framework

osfr

`MOTIFBREAKR_plot`: `Error in grid.Call.graphics(C_unsetviewport, as.integer(n)) : cannot pop the top-level viewport ('grid' and 'graphics' output mixed?)`

1. Bug description

Running MOTIFBREAKR_plot causes an error in certain contexts. Documented here:
Simon-Coetzee/motifBreakR#31

Expected behaviour

Function should run in all contexts.

2. Reproducible example

Code

library(echoannot) # provides MOTIFBREAKR() and MOTIFBREAKR_plot()
library(BSgenome) ## <-- IMPORTANT!
library(BSgenome.Hsapiens.UCSC.hg19) ## <-- IMPORTANT!
#### Example fine-mapping results ####
merged_DT <- echodata::get_Nalls2019_merged()
#### Run motif analyses ####
mb_res <- MOTIFBREAKR(rsid_list = c("rs11175620"),
                      # limit the number of datasets tested
                      # for demonstration purposes only
                      pwmList_max = 5,
                      calculate_pvals = FALSE)
plot_paths <- MOTIFBREAKR_plot(mb_res = mb_res)

Console output

When run via CRAN checks:

 dat is already a GRanges object.
   Plotting 1 unique RSID(s).
   Plotting motif disruption results: rs11175620
   Error in grid.Call.graphics(C_unsetviewport, as.integer(n)) : 
     cannot pop the top-level viewport ('grid' and 'graphics' output mixed?)
   Calls: MOTIFBREAKR_plot ... drawGD -> .local -> popViewport -> grid.Call.graphics
   Execution halted

When run in R console:

genome_build set to hg19 by default.
Loading required namespace: SNPlocs.Hsapiens.dbSNP144.GRCh37
Using genome_build hg19
+ MOTIFBREAKR:: Converting SNP list into motifbreakR input format.

3. Session info

R version 4.2.1 (2022-06-23)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.4

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets 
[7] methods   base     

other attached packages:
 [1] echoannot_0.99.10                
 [2] BSgenome.Hsapiens.UCSC.hg19_1.4.3
 [3] BSgenome_1.65.2                  
 [4] rtracklayer_1.57.0               
 [5] Biostrings_2.65.6                
 [6] XVector_0.37.1                   
 [7] GenomicRanges_1.49.1             
 [8] GenomeInfoDb_1.33.13             
 [9] IRanges_2.31.2                   
[10] S4Vectors_0.35.4                 
[11] BiocGenerics_0.43.4              

loaded via a namespace (and not attached):
  [1] utf8_1.2.2                              
  [2] reticulate_1.26                         
  [3] R.utils_2.12.0                          
  [4] tidyselect_1.2.0                        
  [5] poweRlaw_0.70.6                         
  [6] RSQLite_2.2.18                          
  [7] AnnotationDbi_1.59.1                    
  [8] htmlwidgets_1.5.4                       
  [9] grid_4.2.1                              
 [10] BiocParallel_1.31.14                    
 [11] XGR_1.1.8                               
 [12] munsell_0.5.0                           
 [13] codetools_0.2-18                        
 [14] interp_1.1-3                            
 [15] DT_0.26                                 
 [16] colorspace_2.0-3                        
 [17] OrganismDbi_1.39.1                      
 [18] Biobase_2.57.1                          
 [19] filelock_1.0.2                          
 [20] knitr_1.40                              
 [21] supraHex_1.35.0                         
 [22] rstudioapi_0.14                         
 [23] DescTools_0.99.46                       
 [24] motifStack_1.41.1                       
 [25] MatrixGenerics_1.9.1                    
 [26] GenomeInfoDbData_1.2.9                  
 [27] bit64_4.0.5                             
 [28] echoconda_0.99.7                        
 [29] basilisk_1.9.11                         
 [30] vctrs_0.4.2                             
 [31] generics_0.1.3                          
 [32] xfun_0.34                               
 [33] biovizBase_1.45.0                       
 [34] BiocFileCache_2.5.2                     
 [35] R6_2.5.1                                
 [36] splitstackshape_1.4.8                   
 [37] grImport2_0.2-0                         
 [38] AnnotationFilter_1.21.0                 
 [39] bitops_1.0-7                            
 [40] cachem_1.0.6                            
 [41] reshape_0.8.9                           
 [42] DelayedArray_0.23.2                     
 [43] motifbreakR_2.11.2                      
 [44] assertthat_0.2.1                        
 [45] BiocIO_1.7.1                            
 [46] scales_1.2.1                            
 [47] nnet_7.3-18                             
 [48] rootSolve_1.8.2.3                       
 [49] gtable_0.3.1                            
 [50] lmom_2.9                                
 [51] ggbio_1.45.0                            
 [52] ensembldb_2.21.5                        
 [53] seqLogo_1.63.0                          
 [54] rlang_1.0.6                             
 [55] echodata_0.99.15                        
 [56] splines_4.2.1                           
 [57] lazyeval_0.2.2                          
 [58] dichromat_2.0-0.1                       
 [59] hexbin_1.28.2                           
 [60] checkmate_2.1.0                         
 [61] BiocManager_1.30.18                     
 [62] yaml_2.3.6                              
 [63] reshape2_1.4.4                          
 [64] GenomicFeatures_1.49.7                  
 [65] ggnetwork_0.5.10                        
 [66] backports_1.4.1                         
 [67] Hmisc_4.7-1                             
 [68] RBGL_1.73.0                             
 [69] tools_4.2.1                             
 [70] ggplot2_3.3.6                           
 [71] ellipsis_0.3.2                          
 [72] RColorBrewer_1.1-3                      
 [73] proxy_0.4-27                            
 [74] Rcpp_1.0.9                              
 [75] plyr_1.8.7                              
 [76] base64enc_0.1-3                         
 [77] progress_1.2.2                          
 [78] zlibbioc_1.43.0                         
 [79] purrr_0.3.5                             
 [80] RCurl_1.98-1.9                          
 [81] basilisk.utils_1.9.4                    
 [82] prettyunits_1.1.1                       
 [83] rpart_4.1.16                            
 [84] deldir_1.0-6                            
 [85] SummarizedExperiment_1.27.3             
 [86] ggrepel_0.9.1                           
 [87] cluster_2.1.4                           
 [88] fs_1.5.2                                
 [89] crul_1.3                                
 [90] magrittr_2.0.3                          
 [91] data.table_1.14.4                       
 [92] echotabix_0.99.8                        
 [93] dnet_1.1.7                              
 [94] openxlsx_4.2.5                          
 [95] MotifDb_1.39.0                          
 [96] mvtnorm_1.1-3                           
 [97] ProtGenerics_1.29.1                     
 [98] matrixStats_0.62.0                      
 [99] pkgload_1.3.0                           
[100] xtable_1.8-4                            
[101] patchwork_1.1.2                         
[102] hms_1.1.2                               
[103] XML_3.99-0.11                           
[104] jpeg_0.1-9                              
[105] readxl_1.4.1                            
[106] gridExtra_2.3                           
[107] compiler_4.2.1                          
[108] biomaRt_2.53.3                          
[109] tibble_3.1.8                            
[110] crayon_1.5.2                            
[111] R.oo_1.25.0                             
[112] htmltools_0.5.3                         
[113] tzdb_0.3.0                              
[114] TFBSTools_1.35.0                        
[115] Formula_1.2-4                           
[116] tidyr_1.2.1                             
[117] expm_0.999-6                            
[118] Exact_3.2                               
[119] DBI_1.1.3                               
[120] dbplyr_2.2.1                            
[121] MASS_7.3-58.1                           
[122] rappdirs_0.3.3                          
[123] boot_1.3-28                             
[124] ade4_1.7-19                             
[125] Matrix_1.5-1                            
[126] readr_2.1.3                             
[127] piggyback_0.1.4                         
[128] cli_3.4.1                               
[129] R.methodsS3_1.8.2                       
[130] Gviz_1.41.1                             
[131] parallel_4.2.1                          
[132] igraph_1.3.5                            
[133] SNPlocs.Hsapiens.dbSNP144.GRCh37_0.99.20
[134] pkgconfig_2.0.3                         
[135] TFMPvalue_0.0.9                         
[136] GenomicAlignments_1.33.1                
[137] dir.expiry_1.5.1                        
[138] RCircos_1.2.2                           
[139] foreign_0.8-83                          
[140] osfr_0.2.9                              
[141] xml2_1.3.3                              
[142] annotate_1.75.0                         
[143] DirichletMultinomial_1.39.0             
[144] stringr_1.4.1                           
[145] VariantAnnotation_1.43.3                
[146] digest_0.6.30                           
[147] pracma_2.4.2                            
[148] CNEr_1.33.0                             
[149] graph_1.75.0                            
[150] httpcode_0.3.0                          
[151] cellranger_1.1.0                        
[152] htmlTable_2.4.1                         
[153] gld_2.6.5                               
[154] restfulr_0.0.15                         
[155] curl_4.3.3                              
[156] gtools_3.9.3                            
[157] Rsamtools_2.13.4                        
[158] rjson_0.2.21                            
[159] lifecycle_1.0.3                         
[160] nlme_3.1-160                            
[161] jsonlite_1.8.3                          
[162] fansi_1.0.3                             
[163] downloadR_0.99.4                        
[164] pillar_1.8.1                            
[165] lattice_0.20-45                         
[166] GGally_2.1.2                            
[167] GO.db_3.16.0                            
[168] KEGGREST_1.37.3                         
[169] fastmap_1.1.0                           
[170] httr_1.4.4                              
[171] survival_3.4-0                          
[172] glue_1.6.2                              
[173] zip_2.2.1                               
[174] png_0.1-7                               
[175] bit_4.0.4                               
[176] Rgraphviz_2.41.1                        
[177] class_7.3-20                            
[178] stringi_1.7.8                           
[179] blob_1.2.3                              
[180] caTools_1.18.2                          
[181] latticeExtra_0.6-30                     
[182] memoise_2.0.1                           
[183] dplyr_1.0.10                            
[184] e1071_1.7-11                            
[185] ape_5.6-2

Call peaks from bigwig

Peaks can currently only be called from bedGraph files. Not sure whether MACSr can take bigWig input; if not, the bigWig could first be exported to bedGraph with rtracklayer.
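
A minimal sketch of that conversion step, assuming rtracklayer is installed (note that rtracklayer's BigWig import is not available on Windows). Both function names below are hypothetical helpers, not part of echoannot or MACSr.

```r
# Hypothetical helpers (not part of echoannot or MACSr).

# Derive an output name: swap a .bw/.bigwig extension for .bedGraph,
# or append .bedGraph when no such extension is present.
bedgraph_name <- function(bw_path) {
  out <- sub("\\.(bw|bigwig)$", ".bedGraph", bw_path, ignore.case = TRUE)
  if (identical(out, bw_path)) out <- paste0(bw_path, ".bedGraph")
  out
}

# Read a bigWig into a GRanges object and write it back out as bedGraph.
bigwig_to_bedgraph <- function(bw_path, bedgraph_path = bedgraph_name(bw_path)) {
  if (!requireNamespace("rtracklayer", quietly = TRUE)) {
    stop("Package 'rtracklayer' is required for this conversion.")
  }
  stopifnot(file.exists(bw_path))
  # The signal ends up in the 'score' metadata column of the GRanges
  gr <- rtracklayer::import(bw_path, format = "BigWig")
  rtracklayer::export(gr, con = bedgraph_path, format = "bedGraph")
  bedgraph_path
}
```

The returned path could then be handed to whatever bedGraph-based peak-calling step is already in place.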

roadmap error: `Cannot detect format (no extension found in file name)`

Reprex

Code

topSNPs <- echodata::topSNPs_Nalls2019
fullSS_path <- echodata::example_fullSS(dataset = "Nalls2019")

res <- echolocatoR::finemap_locus(
  fullSS_path = fullSS_path,
  topSNPs = topSNPs,
  # results_dir = "/Desktop/res",
  locus = "BST1",
  dataset_name = "Nalls2019",
  fullSS_genome_build = "hg19",
  zoom = c("1x","4x"),
  bp_distance = 25000,
  n_causal = 5,
  force_new_finemap = TRUE,
  plot_types = c("simple","fancy","LD"),
  roadmap = TRUE,
  roadmap_query = "E053",
  # nott_epigenome = TRUE,
  # nott_show_placseq = TRUE,
  munged = TRUE)

Console output

 ROADMAP:: 1 annotation(s) identified that match: E053
Querying subset from Roadmap API: E053 - 1/1
Constructing GRanges query using min/max ranges across one or more chromosomes.
+ as_blocks=TRUE: Will query a single range per chromosome that covers all regions requested (plus anything in between).
Downloading Roadmap Chromatin Marks: E053
Preexisting file detected. Set force_overwrite=TRUE to override this.
Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'con' in selecting a method for function 'import': Cannot detect format (no extension found in file name)

Session info

R version 4.2.1 (2022-06-23)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.4

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] echolocatoR_2.0.1 snpStats_1.46.0 Matrix_1.4-1 survival_3.4-0

loaded via a namespace (and not attached):
[1] rappdirs_0.3.3 rtracklayer_1.57.0 GGally_2.1.2
[4] R.methodsS3_1.8.2 ragg_1.2.2 tidyr_1.2.0
[7] echoLD_0.99.6 ggplot2_3.3.6 bit64_4.0.5
[10] knitr_1.40 irlba_2.3.5 DelayedArray_0.22.0
[13] R.utils_2.12.0 data.table_1.14.2 rpart_4.1.16
[16] KEGGREST_1.36.3 RCurl_1.98-1.8 AnnotationFilter_1.20.0
[19] generics_0.1.3 BiocGenerics_0.42.0 GenomicFeatures_1.48.3
[22] RSQLite_2.2.16 proxy_0.4-27 bit_4.0.4
[25] tzdb_0.3.0 xml2_1.3.3 SummarizedExperiment_1.26.1
[28] assertthat_0.2.1 viridis_0.6.2 gargle_1.2.0
[31] xfun_0.32 hms_1.1.2 evaluate_0.16
[34] fansi_1.0.3 restfulr_0.0.15 progress_1.2.2
[37] dbplyr_2.2.1 readxl_1.4.1 Rgraphviz_2.40.0
[40] igraph_1.3.4 DBI_1.1.3 htmlwidgets_1.5.4
[43] reshape_0.8.9 downloadR_0.99.4 stats4_4.2.1
[46] purrr_0.3.4 ellipsis_0.3.2 dplyr_1.0.9
[49] backports_1.4.1 biomaRt_2.52.0 deldir_1.0-6
[52] MatrixGenerics_1.8.1 MungeSumstats_1.5.9 vctrs_0.4.1
[55] Biobase_2.56.0 ensembldb_2.20.2 cachem_1.0.6
[58] BSgenome_1.64.0 checkmate_2.1.0 GenomicAlignments_1.32.1
[61] prettyunits_1.1.1 cluster_2.1.4 ape_5.6-2
[64] dir.expiry_1.4.0 lazyeval_0.2.2 crayon_1.5.1
[67] basilisk.utils_1.8.0 crul_1.2.0 labeling_0.4.2
[70] pkgconfig_2.0.3 GenomeInfoDb_1.32.3 nlme_3.1-159
[73] pkgload_1.3.0 ProtGenerics_1.28.0 XGR_1.1.8
[76] nnet_7.3-17 pals_1.7 rlang_1.0.4
[79] lifecycle_1.0.1 filelock_1.0.2 httpcode_0.3.0
[82] BiocFileCache_2.4.0 echotabix_0.99.7 dichromat_2.0-0.1
[85] cellranger_1.1.0 coloc_5.1.0.1 matrixStats_0.62.0
[88] graph_1.74.0 osfr_0.2.8 boot_1.3-28
[91] base64enc_0.1-3 png_0.1-7 viridisLite_0.4.1
[94] rjson_0.2.21 rootSolve_1.8.2.3 bitops_1.0-7
[97] R.oo_1.25.0 ggnetwork_0.5.10 Biostrings_2.64.1
[100] blob_1.2.3 mixsqp_0.3-43 stringr_1.4.1
[103] echoplot_0.99.4 dnet_1.1.7 readr_2.1.2
[106] jpeg_0.1-9 S4Vectors_0.34.0 echodata_0.99.12
[109] scales_1.2.1 memoise_2.0.1 magrittr_2.0.3
[112] plyr_1.8.7 hexbin_1.28.2 zlibbioc_1.42.0
[115] compiler_4.2.1 echoconda_0.99.7 BiocIO_1.6.0
[118] RColorBrewer_1.1-3 EnsDb.Hsapiens.v75_2.99.0 Rsamtools_2.12.0
[121] cli_3.3.0 XVector_0.36.0 echoannot_0.99.7
[124] patchwork_1.1.2 htmlTable_2.4.1 Formula_1.2-4
[127] MASS_7.3-58.1 tidyselect_1.1.2 stringi_1.7.8
[130] textshaping_0.3.6 yaml_2.3.5 supraHex_1.34.0
[133] latticeExtra_0.6-30 ggrepel_0.9.1 grid_4.2.1
[136] VariantAnnotation_1.42.1 tools_4.2.1 lmom_2.9
[139] parallel_4.2.1 rstudioapi_0.14 foreign_0.8-82
[142] piggyback_0.1.4 gridExtra_2.3 gld_2.6.5
[145] farver_2.1.1 RcppZiggurat_0.1.6 digest_0.6.29
[148] BiocManager_1.30.18 Rcpp_1.0.9 GenomicRanges_1.48.0
[151] OrganismDbi_1.38.1 httr_1.4.4 AnnotationDbi_1.58.0
[154] RCircos_1.2.2 ggbio_1.44.1 biovizBase_1.44.0
[157] colorspace_2.0-3 brio_1.1.3 XML_3.99-0.10
[160] fs_1.5.2 reticulate_1.25 IRanges_2.30.1
[163] splines_4.2.1 RBGL_1.72.0 expm_0.999-6
[166] seqminer_8.4 echofinemap_0.99.3 basilisk_1.8.1
[169] Exact_3.1 mapproj_1.2.8 systemfonts_1.0.4
[172] jsonlite_1.8.0 Rfast_2.0.6 testthat_3.1.4
[175] susieR_0.12.19 R6_2.5.1 Hmisc_4.7-1
[178] pillar_1.8.1 htmltools_0.5.3 glue_1.6.2
[181] fastmap_1.1.0 DT_0.24 BiocParallel_1.30.3
[184] class_7.3-20 codetools_0.2-18 maps_3.4.0
[187] mvtnorm_1.1-3 utf8_1.2.2 lattice_0.20-45
[190] tibble_3.1.8 curl_4.3.2 DescTools_0.99.45
[193] zip_2.2.0 openxlsx_4.2.5 interp_1.1-3
[196] rmarkdown_2.16 googleAuthR_2.0.0 munsell_0.5.0
[199] e1071_1.7-11 GenomeInfoDbData_1.2.8 reshape2_1.4.4
[202] gtable_0.3.0


Provide API access to Dey_DeepLearning dataset

Break up and store files from Dey_DeepLearning.tgz in a GitHub repo so users don't have to download the entire dataset (37GB).

Extend the DEEPLEARNING functions to connect to these remote resources by default.

Ideally, all files would also be tabix-indexed to improve query speed.

Reference:

Evaluating the informativeness of deep learning annotations for human complex diseases

Fix `ROADMAP_query`

Because of issues with importing remote bgz files (either via echotabix or rtracklayer), I've gone back to downloading the entire bed.bgz file and then querying the local copy:
lawremi/rtracklayer#76
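
The download-then-query-locally pattern can be sketched as follows, using base R only. `download_once` and its `force_overwrite` argument are hypothetical names chosen for illustration; this is not the code used inside `ROADMAP_query`.

```r
# Hypothetical helper (not the echoannot implementation): download a remote
# file once, then reuse the local copy unless force_overwrite = TRUE.
download_once <- function(url, dest, force_overwrite = FALSE) {
  if (file.exists(dest) && !force_overwrite) {
    message("Using existing local copy: ", dest)
    return(invisible(dest))
  }
  dir.create(dirname(dest), recursive = TRUE, showWarnings = FALSE)
  # mode = "wb" keeps compressed (binary) files intact on all platforms
  status <- utils::download.file(url, destfile = dest, mode = "wb", quiet = TRUE)
  if (status != 0 || !file.exists(dest)) stop("Download failed: ", url)
  invisible(dest)
}
```

Once the full file is on disk, region queries run against the local copy, which sidesteps the remote-import problems at the cost of one larger download per file.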

`echoannot::test_enrichment`: set seed

Expose the seed as an argument so that results are reproducible.
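
As a rough illustration of the idea, here is a generic permutation test with an optional `seed` argument, written in base R. It is a stand-in to show the pattern, not the algorithm used by `echoannot::test_enrichment`; the function and argument names are hypothetical.

```r
# Generic permutation-based enrichment sketch (NOT echoannot's implementation).
# is_hit:   logical vector, TRUE for SNPs of interest (e.g. credible set SNPs)
# in_annot: logical vector, TRUE for SNPs overlapping the annotation
permutation_enrichment <- function(is_hit, in_annot, n_perm = 1000, seed = NULL) {
  stopifnot(is.logical(is_hit), is.logical(in_annot),
            length(is_hit) == length(in_annot))
  # Fixing the seed makes the shuffles, and so the p-value, repeatable
  if (!is.null(seed)) set.seed(seed)
  observed <- sum(is_hit & in_annot)
  # Null distribution: shuffle the hit labels and recount the overlap
  null <- replicate(n_perm, sum(sample(is_hit) & in_annot))
  list(observed = observed,
       p_value = (sum(null >= observed) + 1) / (n_perm + 1))
}

is_hit   <- rep(c(TRUE, FALSE), times = c(10, 90))
in_annot <- rep(c(TRUE, FALSE, TRUE, FALSE), times = c(8, 2, 12, 78))
res1 <- permutation_enrichment(is_hit, in_annot, seed = 2022)
res2 <- permutation_enrichment(is_hit, in_annot, seed = 2022)
identical(res1, res2)  # same seed, same result
```

Defaulting `seed` to `NULL` leaves the caller's random number stream untouched unless reproducibility is explicitly requested.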

Improve speed of variant annotation

Getting variant annotations using biomaRt can be quite slow. Either figure out a way to improve this speed or switch to another tool, such as ANNOVAR.

Connect to ENCODE

Add API to import data from ENCODE directly.
DeepBlueR seems like a good candidate package to do this, but is extremely complicated to use:
https://www.bioconductor.org/packages/devel/bioc/vignettes/DeepBlueR/inst/doc/DeepBlueR.html

Add peak callers

Get XGR developers to update CRAN version

XGR seems to be available only on GitHub again. Has it not been updated since 2018?
hfang-bristol/XGR#13

This means that if I try to submit echoannot to CRAN, it can't depend on XGR.

Bug annotating finemap results with roadmap

1. Bug description

I am trying to annotate fine-mapping results against brain tissue marks from ROADMAP, but the query fails at the end (see the console output below).

Expected behaviour

2. Reproducible example

Code


columnsnames = echodata::construct_colmap(munged= FALSE,
                                          CHR = "CHR", POS = "BP",
                                          SNP = "SNP", P = "P",
                                          Effect = "BETA", StdErr = "SE", 
                                          A1 = "A1", A2 = "A2",
                                          N = "N", N_cases = "N_CAS",
                                          N_controls = "N_CON", MAF = "MAF")
                                          #Freq = "FREQ", N = "N",
                                          #N_cases = NULL,
                                          #N_controls = NULL,
                                          #proportion_cases = NULL,
                                          #MAF = "calculate")
#tstat = NULL)

finemap_loci(# GENERAL ARGUMENTS 
  topSNPs = topSNPs,
  results_dir = fullRS_path,
  loci = topSNPs$Locus,
  dataset_name = "LID_COX",
  dataset_type = "GWAS",  
  
  force_new_subset = TRUE,
  force_new_LD = FALSE,
  force_new_finemap = FALSE,
  remove_tmps = FALSE,
  
  finemap_methods = c("FINEMAP","SUSIE"),
  
  # Munge full sumstats first
  munged = FALSE,
  colmap = columnsnames,
  # SUMMARY STATS ARGUMENTS
  fullSS_path = newSS_name_colmap,
  fullSS_genome_build = "hg19",
  query_by ="tabix",
  
  
  bp_distance = 500000*2,
  min_MAF = 0.001, 
  trim_gene_limits = FALSE,
  case_control = TRUE,
  
  # FINE-MAPPING ARGUMENTS
  ## General
  n_causal = 5,
  credset_thresh = .95,
  consensus_thresh = 2,
  
  
  # LD ARGUMENTS 
  LD_reference = "1KGphase3",#"UKB",
  superpopulation = "EUR",
  download_method = "axel",
  LD_genome_build = "hg19",
  leadSNP_LD_block = FALSE,
  
  #### PLotting args ####
  plot_types = c("fancy"),
  show_plot = TRUE,
  zoom = c("1x", "10x", "20x"),
  #zoom = "1x",
  tx_biotypes = NULL,
  nott_epigenome = FALSE,
  nott_show_placseq = FALSE,
  nott_binwidth = 200,
  nott_bigwig_dir = NULL,
  #xgr_libnames =c("ENCODE_TFBS_ClusteredV3_CellTypes", "TFBS_Conserved", "Uniform_TFBS"),
  
  
  roadmap = TRUE,
  roadmap_query = c("brain"),
  
  #### General args ####
  seed = 2022,
  nThread = 20,
  verbose = TRUE
)

Console output

┌──────────────────────────────────────────┐
│                                          │
│   )))> 🦇 TRIM22 [locus 1 / 1] 🦇 <(((   │
│                                          │
└──────────────────────────────────────────┘

────────────────────────────────────────────────────────────────────────────────

── Step 1 ▶▶▶ Query 🔎 ─────────────────────────────────────────────────────────

────────────────────────────────────────────────────────────────────────────────
+ Query Method: tabix
Constructing GRanges query using min/max ranges within a single chromosome.
query_dat is already a GRanges object. Returning directly.
========= echotabix::convert =========
Converting full summary stats file to tabix format for fast querying.
Inferred format: 'table'
Explicit format: 'table'
Inferring comment_char from tabular header: 'SNP'
Determining chrom type from file header.
Chromosome format: 1
Detecting column delimiter.
Identified column separator: \t
Sorting rows by coordinates via bash.
Searching for header row with grep.
( grep ^'SNP' .../QC_SNPs_COLMAP.txt; grep
    -v ^'SNP' .../QC_SNPs_COLMAP.txt | sort
    -k2,2n
    -k3,3n ) > .../filedd61f50654_sorted.tsv
Constructing outputs
Using existing bgzipped file: /home/rstudio/echolocatoR/echolocatoR_will/QC_SNPs_COLMAP.txt.bgz 
Set force_new=TRUE to override this.
Tabix-indexing file using: Rsamtools
Data successfully converted to bgzip-compressed, tabix-indexed format.
========= echotabix::query =========
query_dat is already a GRanges object. Returning directly.
Inferred format: 'table'
Querying tabular tabix file using: Rsamtools.
Checking query chromosome style is correct.
Chromosome format: 1
Retrieving data.
Converting query results to data.table.
Processing query: 11:4724803-6724803
Adding 'query' column to results.
Retrieved data with 7,013 rows
Saving query ==> /home/rstudio/echolocatoR/echolocatoR_will/RESULTS/GWAS/LID_COX/TRIM22/TRIM22_LID_COX_subset.tsv.gz
+ Query: 7,013 SNPs x 12 columns.
Standardizing summary statistics subset.
Standardizing main column names.
++ Preparing A1,A1 cols
++ Preparing MAF,Freq cols.
++ Removing SNPs with MAF== 0 | NULL | NA or >1
++ Preparing N_cases,N_controls cols.
++ Preparing proportion_cases col.
++ Calculating proportion_cases from N_cases and N_controls.
Loading required namespace: MungeSumstats
Preparing sample size column (N).
Using existing 'N' column.
+ Imputing t-statistic from Effect and StdErr.
+ leadSNP missing. Assigning new one by min p-value.
++ Ensuring Effect,StdErr,P are numeric.
++ Ensuring 1 SNP per row and per genomic coordinate.
++ Removing extra whitespace
+ Standardized query: 7,013 SNPs x 15 columns.
++ Saving standardized query ==> /home/rstudio/echolocatoR/echolocatoR_will/RESULTS/GWAS/LID_COX/TRIM22/TRIM22_LID_COX_subset.tsv.gz

────────────────────────────────────────────────────────────────────────────────

── Step 2 ▶▶▶ Extract Linkage Disequilibrium 🔗 ────────────────────────────────

────────────────────────────────────────────────────────────────────────────────
LD_reference identified as: 1kg.
Previously computed LD_matrix detected. Importing: /home/rstudio/echolocatoR/echolocatoR_will/RESULTS/GWAS/LID_COX/TRIM22/LD/TRIM22.1KGphase3_LD.RDS
LD_reference identified as: r.
Converting obj to sparseMatrix.
+ FILTER:: Filtering by LD features.

────────────────────────────────────────────────────────────────────────────────

── Step 3 ▶▶▶ Filter SNPs 🚰 ───────────────────────────────────────────────────

────────────────────────────────────────────────────────────────────────────────
FILTER:: Filtering by SNP features.
+ FILTER:: Removing SNPs with MAF < 0.001
+ FILTER:: Post-filtered data: 6997 x 15
+ Subsetting LD matrix and dat to common SNPs...
Removing unnamed rows/cols
Replacing NAs with 0
+ LD_matrix = 6997 SNPs.
+ dat = 6997 SNPs.
+ 6997 SNPs in common.
Converting obj to sparseMatrix.

────────────────────────────────────────────────────────────────────────────────

── Step 4 ▶▶▶ Fine-map 🔊 ──────────────────────────────────────────────────────

────────────────────────────────────────────────────────────────────────────────
Gathering method sources.
Gathering method citations.
++ Previously multi-finemapped results identified. Importing: /home/rstudio/echolocatoR/echolocatoR_will/RESULTS/GWAS/LID_COX/TRIM22/Multi-finemap/1KGphase3_LD.Multi-finemap.tsv.gz
+ Fine-mapping with 'FINEMAP, SUSIE' completed:

────────────────────────────────────────────────────────────────────────────────

── Step 5 ▶▶▶ Plot 📈 ──────────────────────────────────────────────────────────

────────────────────────────────────────────────────────────────────────────────
+-------- Locus Plot:  TRIM22 --------+
+ support_thresh = 2
+ Calculating mean Posterior Probability (mean.PP)...
+ 2 fine-mapping methods used.
+ 3 Credible Set SNPs identified.
+ 0 Consensus SNPs identified.
+ Filling NAs in CS cols with 0.
+ Filling NAs in PP cols with 0.
LD_matrix detected. Coloring SNPs by LD with lead SNP.
++ echoplot:: GWAS full window track
++ echoplot:: GWAS track
++ echoplot:: Merged fine-mapping track
Melting PP and CS from 3 fine-mapping methods.
++ echoplot:: Adding Gene model track.
Converting dat to GRanges object.
Loading required namespace: EnsDb.Hsapiens.v75
max_transcripts= 1 . 
82  transcripts from  82  genes returned.
Loading required namespace: pals
Fetching data...OK
Parsing exons...OK
Defining introns...OK
Defining UTRs...OK
Defining CDS...OK
aggregating...
Done
Constructing graphics...
echoannot:: Plotting ROADMAP annotations.
Converting dat to GRanges object.
+ ROADMAP:: 13 annotation(s) identified that match: brain
Querying subset from Roadmap API: E053 - 1/13
Constructing GRanges query using min/max ranges across one or more chromosomes.
+ as_blocks=TRUE: Will query a single range per chromosome that covers all regions requested (plus anything in between).
Querying subset from Roadmap API: E054 - 2/13
Constructing GRanges query using min/max ranges across one or more chromosomes.
+ as_blocks=TRUE: Will query a single range per chromosome that covers all regions requested (plus anything in between).
Downloading Roadmap Chromatin Marks: E053
Preexisting file detected. Set force_overwrite=TRUE to override this.
Querying subset from Roadmap API: E067 - 3/13
Constructing GRanges query using min/max ranges across one or more chromosomes.
+ as_blocks=TRUE: Will query a single range per chromosome that covers all regions requested (plus anything in between).
Downloading Roadmap Chromatin Marks: E054
Preexisting file detected. Set force_overwrite=TRUE to override this.
Querying subset from Roadmap API: E068 - 4/13
Constructing GRanges query using min/max ranges across one or more chromosomes.
+ as_blocks=TRUE: Will query a single range per chromosome that covers all regions requested (plus anything in between).
Downloading Roadmap Chromatin Marks: E067
Preexisting file detected. Set force_overwrite=TRUE to override this.
Downloading Roadmap Chromatin Marks: E068
Preexisting file detected. Set force_overwrite=TRUE to override this.
Querying subset from Roadmap API: E069 - 5/13
Constructing GRanges query using min/max ranges across one or more chromosomes.
+ as_blocks=TRUE: Will query a single range per chromosome that covers all regions requested (plus anything in between).
Querying subset from Roadmap API: E070 - 6/13
Downloading Roadmap Chromatin Marks: E069
Constructing GRanges query using min/max ranges across one or more chromosomes.
Preexisting file detected. Set force_overwrite=TRUE to override this.
+ as_blocks=TRUE: Will query a single range per chromosome that covers all regions requested (plus anything in between).
Downloading Roadmap Chromatin Marks: E070
Querying subset from Roadmap API: E071 - 7/13
Preexisting file detected. Set force_overwrite=TRUE to override this.
Constructing GRanges query using min/max ranges across one or more chromosomes.
+ as_blocks=TRUE: Will query a single range per chromosome that covers all regions requested (plus anything in between).
Querying subset from Roadmap API: E072 - 8/13
Constructing GRanges query using min/max ranges across one or more chromosomes.
+ as_blocks=TRUE: Will query a single range per chromosome that covers all regions requested (plus anything in between).
Downloading Roadmap Chromatin Marks: E071
Preexisting file detected. Set force_overwrite=TRUE to override this.
Querying subset from Roadmap API: E073 - 9/13
Constructing GRanges query using min/max ranges across one or more chromosomes.
+ as_blocks=TRUE: Will query a single range per chromosome that covers all regions requested (plus anything in between).
Downloading Roadmap Chromatin Marks: E072
Preexisting file detected. Set force_overwrite=TRUE to override this.
Downloading Roadmap Chromatin Marks: E073
Preexisting file detected. Set force_overwrite=TRUE to override this.
Querying subset from Roadmap API: E074 - 10/13
Constructing GRanges query using min/max ranges across one or more chromosomes.
+ as_blocks=TRUE: Will query a single range per chromosome that covers all regions requested (plus anything in between).
Downloading Roadmap Chromatin Marks: E074
Preexisting file detected. Set force_overwrite=TRUE to override this.
Querying subset from Roadmap API: E081 - 11/13
Constructing GRanges query using min/max ranges across one or more chromosomes.
+ as_blocks=TRUE: Will query a single range per chromosome that covers all regions requested (plus anything in between).
Querying subset from Roadmap API: E082 - 12/13
Constructing GRanges query using min/max ranges across one or more chromosomes.
+ as_blocks=TRUE: Will query a single range per chromosome that covers all regions requested (plus anything in between).
Downloading Roadmap Chromatin Marks: E081
Preexisting file detected. Set force_overwrite=TRUE to override this.
Querying subset from Roadmap API: E125 - 13/13
Constructing GRanges query using min/max ranges across one or more chromosomes.
+ as_blocks=TRUE: Will query a single range per chromosome that covers all regions requested (plus anything in between).
Downloading Roadmap Chromatin Marks: E082
Preexisting file detected. Set force_overwrite=TRUE to override this.
Downloading Roadmap Chromatin Marks: E125
Preexisting file detected. Set force_overwrite=TRUE to override this.
Annotating chromatin states.
unable to find an inherited method for function 'mcols' for signature '"try-error"'
Locus TRIM22 complete in: 1.66 min

────────────────────────────────────────────────────────────────────────────────

── Step 6 ▶▶▶ Postprocess data 🎁 ──────────────────────────────────────────────

────────────────────────────────────────────────────────────────────────────────
Returning results as nested list.
All loci done in: 1.66 min
$TRIM22
NULL

$merged_dat
Null data.table (0 rows and 0 cols)

Warning message:
In parallel::mclapply(seq_len(length(eid_list)), function(i) { :
  all scheduled cores encountered errors in user code
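
The `mcols` error and the `mclapply` warning above are consistent with failed jobs having been passed downstream as `"try-error"` objects, which is what `parallel::mclapply()` stores for a job that errors. The following is a minimal sketch of how such results could be screened before further processing. It uses base R only; `query_one()` and the simulated failure are hypothetical stand-ins, not echoannot code, and `try()` inside `lapply()` is used here to reproduce the same object class portably.

```r
# Hypothetical stand-in for one per-epigenome query; the second call fails.
query_one <- function(eid) {
  if (eid == "E074") stop("simulated query failure")
  data.frame(EID = eid, n_ranges = 1L)
}

eid_list <- c("E072", "E074", "E125")

# try() returns an object of class "try-error" instead of stopping,
# the same class mclapply() stores for a job that failed.
res <- lapply(eid_list, function(eid) try(query_one(eid), silent = TRUE))
names(res) <- eid_list

# Flag failed elements so they are never handed to downstream functions.
failed <- vapply(res, inherits, logical(1), what = "try-error")
if (any(failed)) {
  message("Failed queries: ", paste(names(res)[failed], collapse = ", "))
}

# Keep only the successful results.
ok <- res[!failed]
```

Printing a failed element (for example `res[["E074"]]`) shows the underlying error message, which the summary warning from `mclapply` hides.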

Data

rstudio@3e36fec3eec9:~/echolocatoR/echolocatoR_will/RESULTS/GWAS/LID_COX/TRIM22$ head ../../../../QC_SNPs_COLMAP.txt
SNP     CHR     BP      A1      A2      MAF     BETA    SE      P       N       N_CAS   N_CON
rs3131972       1       752721  A       G       0.1806  0.07177 0.1482  0.6281  2696    588     2108
rs11240777      1       798959  A       G       0.2068  0.02904 0.1454  0.8417  2572    510     2062
rs28482280      1       834056  C       A       0.01188 -1.013  0.6109  0.09743 2610    568     2042
rs7518581       1       834956  A       G       0.01187 -1.02   0.6109  0.09504 2612    570     2042
rs149737509     1       837657  C       G       0.01343 -0.609  0.578   0.292   2606    560     2046
rs28678693      1       838665  C       T       0.01331 -0.7232 0.5261  0.1693  2630    580     2050
rs28477624      1       838732  A       G       0.01257 -0.6727 0.5515  0.2225  2626    578     2048
rs28437697      1       838890  G       A       0.01257 -0.6727 0.5515  0.2225  2626    578     2048
rs28539852      1       838916  T       A       0.0126  -0.6755 0.551   0.2202  2620    576     2044

3. Session info

> sessionInfo()
R version 4.2.0 (2022-04-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] SNPlocs.Hsapiens.dbSNP155.GRCh37_0.99.22 SNPlocs.Hsapiens.dbSNP144.GRCh37_0.99.20 BSgenome_1.65.2                         
 [4] rtracklayer_1.57.0                       Biostrings_2.65.3                        XVector_0.37.1                          
 [7] GenomicRanges_1.49.1                     GenomeInfoDb_1.33.5                      IRanges_2.31.2                          
[10] S4Vectors_0.35.3                         BiocGenerics_0.43.1                      forcats_0.5.2                           
[13] stringr_1.4.1                            dplyr_1.0.10                             purrr_0.3.4                             
[16] readr_2.1.2                              tidyr_1.2.0                              tibble_3.1.8                            
[19] ggplot2_3.3.6                            tidyverse_1.3.2                          data.table_1.14.2                       
[22] echolocatoR_2.0.1                       

loaded via a namespace (and not attached):
  [1] rappdirs_0.3.3              GGally_2.1.2                R.methodsS3_1.8.2           ragg_1.2.2                 
  [5] echoLD_0.99.7               bit64_4.0.5                 knitr_1.40                  irlba_2.3.5                
  [9] DelayedArray_0.23.1         R.utils_2.12.0              rpart_4.1.16                KEGGREST_1.37.3            
 [13] RCurl_1.98-1.8              AnnotationFilter_1.21.0     generics_0.1.3              GenomicFeatures_1.49.6     
 [17] RSQLite_2.2.16              proxy_0.4-27                bit_4.0.4                   tzdb_0.3.0                 
 [21] xml2_1.3.3                  lubridate_1.8.0             SummarizedExperiment_1.27.2 assertthat_0.2.1           
 [25] viridis_0.6.2               gargle_1.2.0                xfun_0.32                   hms_1.1.2                  
 [29] fansi_1.0.3                 restfulr_0.0.15             progress_1.2.2              dbplyr_2.2.1               
 [33] readxl_1.4.1                Rgraphviz_2.41.1            igraph_1.3.4                DBI_1.1.3                  
 [37] htmlwidgets_1.5.4           reshape_0.8.9               downloadR_0.99.4            googledrive_2.0.0          
 [41] ellipsis_0.3.2              backports_1.4.1             biomaRt_2.53.2              deldir_1.0-6               
 [45] MatrixGenerics_1.9.1        MungeSumstats_1.5.13        vctrs_0.4.1                 Biobase_2.57.1             
 [49] ensembldb_2.21.4            cachem_1.0.6                withr_2.5.0                 checkmate_2.1.0            
 [53] GenomicAlignments_1.33.1    prettyunits_1.1.1           cluster_2.1.3               ape_5.6-2                  
 [57] dir.expiry_1.5.0            lazyeval_0.2.2              crayon_1.5.1                basilisk.utils_1.9.2       
 [61] crul_1.2.0                  labeling_0.4.2              pkgconfig_2.0.3             nlme_3.1-159               
 [65] ProtGenerics_1.29.0         XGR_1.1.8                   gitcreds_0.1.1              pals_1.7                   
 [69] nnet_7.3-17                 rlang_1.0.5                 lifecycle_1.0.1             filelock_1.0.2             
 [73] httpcode_0.3.0              BiocFileCache_2.5.0         modelr_0.1.9                echotabix_0.99.8           
 [77] dichromat_2.0-0.1           cellranger_1.1.0            coloc_5.1.0                 matrixStats_0.62.0         
 [81] graph_1.75.0                Matrix_1.4-1                osfr_0.2.8                  boot_1.3-28                
 [85] reprex_2.0.2                base64enc_0.1-3             googlesheets4_1.0.1         png_0.1-7                  
 [89] viridisLite_0.4.1           rjson_0.2.21                rootSolve_1.8.2.3           bitops_1.0-7               
 [93] R.oo_1.25.0                 ggnetwork_0.5.10            blob_1.2.3                  mixsqp_0.3-43              
 [97] echoplot_0.99.5             dnet_1.1.7                  jpeg_0.1-9                  echodata_0.99.14           
[101] scales_1.2.1                memoise_2.0.1               magrittr_2.0.3              plyr_1.8.7                 
[105] hexbin_1.28.2               zlibbioc_1.43.0             compiler_4.2.0              echoconda_0.99.7           
[109] BiocIO_1.7.1                RColorBrewer_1.1-3          catalogueR_1.0.0            EnsDb.Hsapiens.v75_2.99.0  
[113] Rsamtools_2.13.4            cli_3.3.0                   echoannot_0.99.7            patchwork_1.1.2            
[117] htmlTable_2.4.1             Formula_1.2-4               MASS_7.3-58.1               tidyselect_1.1.2           
[121] stringi_1.7.8               textshaping_0.3.6           yaml_2.3.5                  supraHex_1.35.0            
[125] latticeExtra_0.6-30         ggrepel_0.9.1               grid_4.2.0                  VariantAnnotation_1.43.3   
[129] tools_4.2.0                 lmom_2.9                    parallel_4.2.0              rstudioapi_0.14            
[133] foreign_0.8-82              piggyback_0.1.3             gridExtra_2.3               gld_2.6.5                  
[137] farver_2.1.1                digest_0.6.29               snpStats_1.47.1             BiocManager_1.30.18        
[141] Rcpp_1.0.9                  broom_1.0.1                 OrganismDbi_1.39.1          httr_1.4.4                 
[145] AnnotationDbi_1.59.1        RCircos_1.2.2               ggbio_1.45.0                biovizBase_1.45.0          
[149] colorspace_2.0-3            rvest_1.0.3                 XML_3.99-0.10               fs_1.5.2                   
[153] reticulate_1.26             splines_4.2.0               RBGL_1.73.0                 expm_0.999-6               
[157] gh_1.3.0                    echofinemap_0.99.3          basilisk_1.9.2              Exact_3.1                  
[161] mapproj_1.2.8               systemfonts_1.0.4           jsonlite_1.8.0              susieR_0.12.27             
[165] R6_2.5.1                    Hmisc_4.7-1                 pillar_1.8.1                htmltools_0.5.3            
[169] glue_1.6.2                  fastmap_1.1.0               DT_0.24                     BiocParallel_1.31.12       
[173] class_7.3-20                codetools_0.2-18            maps_3.4.0                  mvtnorm_1.1-3              
[177] utf8_1.2.2                  lattice_0.20-45             curl_4.3.2                  DescTools_0.99.46          
[181] zip_2.2.0                   openxlsx_4.2.5              interp_1.1-3                survival_3.3-1             
[185] googleAuthR_2.0.0           munsell_0.5.0               e1071_1.7-11                GenomeInfoDbData_1.2.8     
[189] haven_2.5.1                 reshape2_1.4.4              gtable_0.3.1

Add ENFORMER annotations

It would be great to access all genome-wide ENFORMER predictions via an API. This should be possible, since the predictions are shared as h5 files here. They're rather massive (14-42 GB each), but the h5 format should mitigate that, because HDF5 files can be read in slices rather than loaded whole.

Alternatively, the predictions could be extracted on the fly from the pre-trained model. Usage examples here. But @Al-Murphy has mentioned that the pre-trained model provided with the paper is not actually the one described in the paper.
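
To illustrate why the h5 format helps with the file sizes mentioned above, here is a minimal sketch of a partial read using the Bioconductor package `rhdf5`. It builds a small toy file locally; the dataset name `predictions` and the positions-by-tracks layout are assumptions made for illustration and do not describe the actual ENFORMER files. Remote access via an API is not covered here.

```r
library(rhdf5)

# Toy stand-in for a predictions file: 1000 positions x 8 tracks.
# The dataset name and layout are assumptions, not the real file structure.
h5_path <- tempfile(fileext = ".h5")
preds <- matrix(seq_len(8000), nrow = 1000, ncol = 8)
rhdf5::h5createFile(h5_path)
rhdf5::h5write(preds, file = h5_path, name = "predictions")

# Read only positions 101-110 and tracks 2-3; the rest of the
# dataset is never loaded into memory.
slice <- rhdf5::h5read(h5_path, name = "predictions",
                       index = list(101:110, 2:3))
rhdf5::h5closeAll()

dim(slice)
```

The `index` argument takes one element per dimension, so a locus-sized window could in principle be pulled out of a much larger file with a single call.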
