drostlab / mytai

Evolutionary Transcriptomics with R

Home Page: https://drostlab.github.io/myTAI/

License: GNU General Public License v2.0

R 95.64% C++ 4.36%

evolutionary-transcriptomics r transcriptome evolution evo-devo studies-transcriptomes biological-processes gene-sets mytai gene-expression

mytai's People

academiq flopezo maggishaggy ljljolinq1010 altingia inambioinfo cnyuanh anandksrao dylansosa kullrich lotharukpongjs gallardoalba amit4mchiba lavakin josuebarrera

mytai's Issues

p-value in `PlotSignature()` as subtitle

Is your feature request related to a problem? Please describe.
The p-values (e.g. p_flt) from the PlotSignature() function are added via ggplot2::annotate(), and they are not always fully visible when the figure is small.

library(myTAI)
data(PhyloExpressionSetExample)
p <- PlotSignature(ExpressionSet = PhyloExpressionSetExample, permutations = 100)
p

Describe the solution you'd like
Perhaps the p-value could be passed as the plot subtitle instead. This would stop the text from being cut off in small figures. Here is an example implementation (ignoring the pre-existing p-value label).

p + ggplot2::labs(subtitle = ggplot2::ggplot_build(p)$data[[3]][["label"]])
# ggplot2::ggplot_build(p)$data[[3]][["label"]] extracts the annotation from the ggplot2 object.

However, this would take away some flexibility from users who want to set their own subtitle.
On the other hand, the subtitle is a natural place for the p-value, and if the text is too large, the subtitle size can be changed via theme():

p + ggplot2::labs(subtitle = ggplot2::ggplot_build(p)$data[[3]][["label"]]) + 
  ggplot2::theme(plot.subtitle = ggplot2::element_text(size=18, face="italic"))

By contrast, annotations added with ggplot2::annotate("text", ...) are non-trivial to edit (or, as in the case above, remove) after the fact.
https://stackoverflow.com/questions/20083700/how-to-have-annotated-text-style-to-inherit-from-theme-set-options

(Let me know if I am mistaken.)
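As a possible user-side workaround until the package changes, the text layer can be dropped from the ggplot object and its label reused as a subtitle. The sketch below uses a toy plot (hypothetical data, not real PlotSignature() output). It assumes the p-value annotation is the only GeomText layer in the plot, and it relies on ggplot2 3.4.x internals (`p$layers`) as in the reporter's session, which may behave differently in other ggplot2 versions.

```r
library(ggplot2)

# Toy stand-in for a PlotSignature() result (hypothetical data, not myTAI output)
toy <- data.frame(Stage = factor(c("Zygote", "Quadrant", "Globular"),
                                 levels = c("Zygote", "Quadrant", "Globular")),
                  TI = c(3.6, 3.4, 3.7))
p <- ggplot(toy, aes(x = Stage, y = TI, group = 1)) +
  geom_line() +
  annotate("text", x = 2, y = max(toy$TI) * 1.03,
           label = "p_flt = 0.0123", size = 6)

# annotate("text", ...) creates a GeomText layer; note this test would also
# match any geom_text() layers, so inspect the plot before dropping them
is_text <- vapply(p$layers, function(l) inherits(l$geom, "GeomText"), logical(1))

# Read the label from the built layer data (same idea as ggplot_build() above)
label <- layer_data(p, which(is_text)[1])$label

# Remove the annotation layer(s) and show the label as a styleable subtitle
p$layers <- p$layers[!is_text]
p <- p + labs(subtitle = label) +
  theme(plot.subtitle = element_text(size = 18, face = "italic"))
```

Because the label is read back from the built plot, the same lines should work regardless of which test produced it (p_flt, p_rht, ...), provided the layer assumption above holds.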

Describe alternatives you've considered
Moving the annotation's coordinates. However, this leads to similar issues: (1) the label may still not fit in the plot, and (2) it is unclear (to me) how annotations can be modified after the fact.

Additional context
Lines implicated in https://github.com/drostlab/myTAI/blob/master/R/PlotSignature.R

myTAI/R/PlotSignature.R

Lines 295 to 302 in e8c12b1

TI.ggplot <-
  TI.ggplot + ggplot2::annotate(
    "text",
    x = 2,
    y = max(TI$TI) + (max(TI$TI) / 30),
    label = paste0("p_flt = ", pval),
    size = 6
  )

myTAI/R/PlotSignature.R

Lines 310 to 317 in e8c12b1

TI.ggplot <-
  TI.ggplot + ggplot2::annotate(
    "text",
    x = 2,
    y = max(TI$TI) + (max(TI$TI) / 30),
    label = paste0("p_rht = ", pval),
    size = 6
  )

myTAI/R/PlotSignature.R

Lines 338 to 345 in e8c12b1

TI.ggplot <-
  TI.ggplot + ggplot2::annotate(
    "text",
    x = 2,
    y = max(TI$TI) + (max(TI$TI) / 30),
    label = paste0("p_reverse_hourglass = ", pval),
    size = 6
  )

etc.
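On the package side, the repeated annotate() blocks above could in principle be replaced by a single helper that sets the subtitle. The function name `add_pval_subtitle()` is hypothetical and not part of myTAI, and the plot below is a toy stand-in for `TI.ggplot`. This is a sketch of the proposal, not the package's code.

```r
library(ggplot2)

# Hypothetical helper (not part of myTAI): attach the test p-value as the plot
# subtitle instead of placing it with annotate("text", x = 2, ...)
add_pval_subtitle <- function(plot, test_label, pval) {
  plot + labs(subtitle = paste0(test_label, " = ", pval))
}

# Toy stand-in for TI.ggplot (hypothetical data)
toy <- data.frame(Stage = factor(c("S1", "S2", "S3")), TI = c(3.6, 3.4, 3.7))
TI.ggplot <- ggplot(toy, aes(x = Stage, y = TI, group = 1)) + geom_line()

# One call per test, e.g. "p_flt", "p_rht" or "p_reverse_hourglass"
TI.ggplot <- add_pval_subtitle(TI.ggplot, "p_flt", 0.0123)

# Users keep control: a later labs() call replaces the subtitle
custom <- TI.ggplot + labs(subtitle = "My own subtitle")
```

This partly addresses the flexibility concern raised above. In ggplot2, a later `labs()` call overrides an earlier one, so users could still replace or restyle the subtitle.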

OpenMP for parallelisation with the Apple M1 chip

Describe the bug
The speed-up that parallelisation gives on Intel Macs is not achieved with the Apple M1 chip. The chip difference affects both README.md and src/Makevars:

myTAI/README.md

Lines 35 to 37 in 699b78f

brew install llvm libomp
cd /usr/local/lib
ln -s /usr/local/opt/libomp/lib/libomp.dylib ./libomp.dylib

myTAI/src/Makevars

Lines 1 to 13 in 699b78f

# Disable long types from C99 or CPP11 extensions
PKG_CPPFLAGS = -I../src -DRCPP_DEFAULT_INCLUDE_CALL=false -DCOMPILING_MYTAI -DBOOST_NO_INT64_T -DBOOST_NO_INTEGRAL_INT64_T -DBOOST_NO_LONG_LONG -DRCPP_USING_UTF8_ERROR_STRING -DRCPP_USE_UNWIND_PROTECT ${MYTAI_COMPILER_FLAGS}

OPENMP_SUPPORTED := $(shell $(CC) -fopenmp -dM -E - < /dev/null 2>&1 | grep -c "openmp")
LIBOMP_SUPPORTED := $(shell [ -d /usr/local/opt/libomp/include ] && echo 1)
ifeq ($(OPENMP_SUPPORTED),1)
 ifeq ($(LIBOMP_SUPPORTED),1)
	PKG_CPPFLAGS += -I/usr/local/opt/libomp/include
	LDFLAGS=-L/usr/local/opt/libomp/lib
	PKG_CXXFLAGS += -Xpreprocessor -fopenmp
	PKG_LIBS += -lomp
 endif
endif

With the M1 chip, /usr/local/opt/libomp/lib/libomp.dylib, /usr/local/opt/libomp/include and /usr/local/opt/libomp/lib do not exist.

$ ls /usr/local/opt/libomp/lib/libomp.dylib
ls: /usr/local/opt/libomp/lib/libomp.dylib: No such file or directory

Instead, the corresponding locations are probably:

/usr/local/opt/libomp/lib/libomp.dylib -> /opt/homebrew/opt/libomp/lib/libomp.dylib
/usr/local/opt/libomp/include -> /opt/homebrew/opt/libomp/include
/usr/local/opt/libomp/lib -> /opt/homebrew/opt/libomp/lib
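One detail worth checking is which architecture R itself was built for, because the libomp library has to match it. The session info further down reports `Platform: x86_64-apple-darwin17.0`. That is an Intel build of R, which on an M1 Mac runs under Rosetta 2 and can only load x86_64 libraries. If so, an arm64 libomp under /opt/homebrew may not be usable by it. The helper below is hypothetical, not part of myTAI. It assumes the default Homebrew prefixes (/opt/homebrew on Apple Silicon, /usr/local on Intel) and prints matching Makevars-style flags.

```r
# Hypothetical helper: Homebrew libomp prefix that matches the architecture
# R was built for (assumes default Homebrew install locations)
libomp_prefix <- function(arch = R.version$arch) {
  brew_root <- if (arch %in% c("aarch64", "arm64")) "/opt/homebrew" else "/usr/local"
  file.path(brew_root, "opt", "libomp")
}

prefix <- libomp_prefix()
message("R architecture: ", R.version$arch)
message("PKG_CPPFLAGS += -I", file.path(prefix, "include"))
message("PKG_LIBS += -L", file.path(prefix, "lib"), " -lomp")
message("libomp directory present: ", dir.exists(prefix))
```

If this reports x86_64, either an x86_64 libomp (from an Intel Homebrew under /usr/local) or a native arm64 build of R would likely be needed for the flags to line up.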

In an attempt to solve this, I reinstalled libomp via Homebrew (arch -arm64 brew reinstall libomp) and changed the paths in src/Makevars to match the messages from the Homebrew installation:

For compilers to find libomp you may need to set:
  export LDFLAGS="-L/opt/homebrew/opt/libomp/lib"
  export CPPFLAGS="-I/opt/homebrew/opt/libomp/include"

Thus for src/Makevars:

# Disable long types from C99 or CPP11 extensions
PKG_CPPFLAGS = -I../src -DRCPP_DEFAULT_INCLUDE_CALL=false -DCOMPILING_MYTAI -DBOOST_NO_INT64_T -DBOOST_NO_INTEGRAL_INT64_T -DBOOST_NO_LONG_LONG -DRCPP_USING_UTF8_ERROR_STRING -DRCPP_USE_UNWIND_PROTECT ${MYTAI_COMPILER_FLAGS}

OPENMP_SUPPORTED := $(shell $(CC) -fopenmp -dM -E - < /dev/null 2>&1 | grep -c "openmp")
LIBOMP_SUPPORTED := $(shell [ -d /opt/homebrew/opt/libomp/include ] && echo 1)
ifeq ($(OPENMP_SUPPORTED),1)
 ifeq ($(LIBOMP_SUPPORTED),1)
	PKG_CPPFLAGS += -I/opt/homebrew/opt/libomp/include
	LDFLAGS=-L/opt/homebrew/opt/libomp/lib
	PKG_CXXFLAGS += -Xpreprocessor -fopenmp
	PKG_LIBS += -lomp
 endif
endif

I also added the symlink as suggested in README.md, with the modifications I thought were appropriate.

$ cd /usr/local/lib
$ ln -s /opt/homebrew/opt/libomp/lib/libomp.dylib ./libomp.dylib
$ ls -l libomp.dylib
lrwxr-xr-x  1 root  wheel  43 Aug 18 11:08 libomp.dylib -> /opt/homebrew/lib/gcc/current/libgomp.dylib
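The `ls -l` output shows libomp.dylib resolving to /opt/homebrew/lib/gcc/current/libgomp.dylib. That is GNU's OpenMP runtime, not the LLVM libomp that the `ln -s` command targeted. `ln -s` without `-f` does not overwrite an existing link, so an older link may have survived. libgomp does not provide the `__kmpc_*` entry points that clang's OpenMP code calls, which would be consistent with the `___kmpc_barrier` error below, though that remains an assumption. The following base-R sketch, using hypothetical helper names, checks where the link actually points.

```r
# Return the target of a symlink, or NA if `path` is missing or not a symlink
link_target <- function(path) {
  target <- Sys.readlink(path)  # "" for a non-link, NA on error (e.g. missing path)
  if (is.na(target) || !nzchar(target)) NA_character_ else target
}

# TRUE when the link resolves to GNU libgomp rather than LLVM libomp
points_to_libgomp <- function(path) {
  target <- link_target(path)
  !is.na(target) && grepl("libgomp", basename(target), fixed = TRUE)
}

path <- "/usr/local/lib/libomp.dylib"
message(path, " -> ", link_target(path))
if (points_to_libgomp(path)) {
  message("The link points to GNU libgomp; it could be replaced with `ln -sf` ",
          "pointing at the libomp.dylib from the libomp installation.")
}
```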

I then ran roxygen2::roxygenise(), which ended with the following error:

─  DONE (myTAI)
Error in dyn.load(dll_copy_file) : 
  unable to load shared object '/var/folders/p0/9gxqqj352q50zc6ssdncrhv80007sq/T//RtmpNEFRUA/pkgload770229cb13/myTAI.so':
  dlopen(/var/folders/p0/9gxqqj352q50zc6ssdncrhv80007sq/T//RtmpNEFRUA/pkgload770229cb13/myTAI.so, 0x0006): symbol not found in flat namespace '___kmpc_barrier'

Is there a way to resolve this?

Expected behaviour
Parallelisation gives the same speed-up on the M1 chip as on Intel Macs.

Session info:

> utils::sessionInfo()
R version 4.2.2 (2022-10-31)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Ventura 13.4.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] testthat_3.1.10

loaded via a namespace (and not attached):
 [1] fs_1.6.3            usethis_2.2.0       devtools_2.4.5      doParallel_1.0.17  
 [5] RColorBrewer_1.1-3  rprojroot_2.0.3     tools_4.2.2         profvis_0.3.8      
 [9] backports_1.4.1     utf8_1.2.3          R6_2.5.1            nortest_1.0-4      
[13] colorspace_2.1-0    urlchecker_1.0.1    withr_2.5.0         tidyselect_1.2.0   
[17] gridExtra_2.3       prettyunits_1.1.1   processx_3.8.2      myTAI_1.0.1.9000   
[21] compiler_4.2.2      cli_3.6.1           xml2_1.3.4          desc_1.4.2         
[25] scales_1.2.1        readr_2.1.4         callr_3.7.3         stringr_1.5.0      
[29] digest_0.6.33       pkgconfig_2.0.3     htmltools_0.5.5     sessioninfo_1.2.2  
[33] fastmap_1.1.1       htmlwidgets_1.6.2   rlang_1.1.1         rstudioapi_0.14    
[37] shiny_1.7.4         generics_0.1.3      farver_2.1.1        dplyr_1.1.2        
[41] car_3.1-2           magrittr_2.0.3      Matrix_1.6-0        Rcpp_1.0.11        
[45] munsell_0.5.0       fansi_1.0.4         abind_1.4-5         lifecycle_1.0.3    
[49] stringi_1.7.12      carData_3.0-5       MASS_7.3-60         decor_1.0.1        
[53] brio_1.1.3          pkgbuild_1.4.0      plyr_1.8.8          grid_4.2.2         
[57] parallel_4.2.2      promises_1.2.0.1    crayon_1.5.2        miniUI_0.1.1.1     
[61] lattice_0.21-8      cowplot_1.1.1       splines_4.2.2       hms_1.1.3          
[65] knitr_1.43          ps_1.7.5            pillar_1.9.0        ggpubr_0.6.0       
[69] ggsignif_0.6.4      reshape2_1.4.4      codetools_0.2-19    pkgload_1.3.2.1    
[73] glue_1.6.2          remotes_2.4.2       vctrs_0.6.3         tzdb_0.4.0         
[77] httpuv_1.6.11       foreach_1.5.2       gtable_0.3.3        purrr_1.0.2        
[81] tidyr_1.3.0         cachem_1.0.8        ggplot2_3.4.3       cpp11_0.4.6        
[85] xfun_0.39           mime_0.12           xtable_1.8-4        broom_1.0.5        
[89] roxygen2_7.2.3      rstatix_0.7.2       later_1.3.1         survival_3.5-5     
[93] tibble_3.2.1        iterators_1.0.14    memoise_2.0.1       fitdistrplus_1.1-11
[97] ellipsis_0.3.2

TAI for transcriptomes in different conditions

Hi,

I know TAI is used to estimate transcriptome age across stages of development. Can it also be used to estimate transcriptome age under different conditions, for example drought or cold? I sequenced 16 samples (8 control and 8 drought samples), calculated TAI for all 16, and got perfect results. So I would like to ask whether TAI is also suitable for evaluating the age of transcriptomes under different conditions.
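For reference, the standard TAI of a sample s is the expression-weighted mean phylostratum, TAI_s = sum_i(ps_i * e_is) / sum_i(e_is). Here ps_i is the phylostratum of gene i and e_is is its expression in sample s. The formula itself only needs one phylostratum per gene and one expression value per gene and sample. The base-R sketch below is not myTAI's implementation and does not answer whether a condition comparison is biologically meaningful. It uses hypothetical toy data.

```r
# Base-R sketch of the TAI formula (not myTAI's own implementation):
# expression-weighted mean phylostratum, computed separately for each sample
tai <- function(ps, expr) {
  expr <- as.matrix(expr)
  stopifnot(length(ps) == nrow(expr))
  # ps is recycled down each column, so ps * expr multiplies gene i's
  # expression in every sample by its phylostratum ps_i
  colSums(ps * expr) / colSums(expr)
}

# Hypothetical toy data: 3 genes in phylostrata 1-3, one control and one drought sample
ps <- c(1, 2, 3)
expr <- cbind(control = c(10, 0, 10), drought = c(0, 10, 10))
tai(ps, expr)  # control = 2, drought = 2.5
```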

Thanks,

Kai

`CollapseReplicates()` always returns `Phylostratum` as the first column.

Describe the bug
CollapseReplicates() renames the Divergence.stratum column to Phylostratum.

To Reproduce

> data("DivergenceExpressionSetExample")
> CollapseReplicates(ExpressionSet = DivergenceExpressionSetExample[1:5,1:9], 
+                    nrep          = c(2,2,3), 
+                    FUN           = mean, 
+                    stage.names   = c("S1","S2","S3"))
# A tibble: 5 × 5
  Phylostratum GeneID         S1    S2    S3
         <int> <fct>       <dbl> <dbl> <dbl>
1            1 at1g01050.1 1659. 1615. 1228.
2            1 at1g01120.1  816.  896.  869.
3            1 at1g01140.3  975. 1018.  997.
4            1 at1g01170.1 1202. 1219. 4824.
5            1 at1g01230.1  920.  949.  836.

Yet the input DivergenceExpressionSetExample has Divergence.stratum as its first column.

> head(DivergenceExpressionSetExample)
  Divergence.stratum      GeneID    Zygote  Quadrant  Globular     Heart   Torpedo      Bent     Mature
1                  1 at1g01050.1 1501.0141 1817.3086 1665.3089 1564.7612 1496.3207 1114.6435  1071.6555
2                  1 at1g01120.1  844.0414  787.5929  859.6267  931.6180  942.8453  870.2625   792.7542
3                  1 at1g01140.3 1041.4291  908.3929 1068.8832  967.7490 1055.1901 1109.4662   825.4633
4                  1 at1g01170.1 1361.6646 1042.1991 1225.5625 1211.7386 1674.5224 2136.4284 10662.4763
5                  1 at1g01230.1  894.1276  946.6993  933.0931  965.1859  870.9218  843.1814   794.6536
6                  1 at1g01540.2 1464.3065 1451.4255 2378.7054 1993.9326 1800.2420 2119.9220  1020.2640

Expected behaviour
The output should keep the name of the first column of the input dataset (the one that satisfies myTAI::is.ExpressionSet()).
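A minimal base-R sketch of the expected behaviour might look like the code below. It averages replicate columns per stage and carries the first two columns over with their original names. `collapse_replicates_sketch()` is a hypothetical function, not myTAI's code. It assumes the ExpressionSet layout shown above (age column, GeneID, then replicate columns in stage order) and an `nrep` vector giving the replicate count per stage.

```r
# Hypothetical sketch (not myTAI's implementation): collapse replicate columns
# per stage while keeping the original names of the first two columns
collapse_replicates_sketch <- function(ExpressionSet, nrep, FUN = mean, stage.names) {
  expr <- as.matrix(ExpressionSet[, -(1:2), drop = FALSE])
  stopifnot(sum(nrep) == ncol(expr), length(stage.names) == length(nrep))
  group <- rep(seq_along(nrep), times = nrep)  # stage index of each replicate column
  collapsed <- sapply(seq_along(nrep), function(k)
    apply(expr[, group == k, drop = FALSE], 1, FUN))
  collapsed <- matrix(collapsed, nrow = nrow(expr))  # keep matrix shape for 1 gene or 1 stage
  colnames(collapsed) <- stage.names
  # Age column (e.g. Divergence.stratum) and GeneID keep their input names
  cbind(ExpressionSet[, 1:2, drop = FALSE], as.data.frame(collapsed))
}

# Hypothetical toy input: two stages with 2 and 3 replicates
toy <- data.frame(Divergence.stratum = c(1L, 2L), GeneID = c("g1", "g2"),
                  a1 = c(1, 3), a2 = c(3, 5), b1 = c(2, 4), b2 = c(4, 6), b3 = c(6, 8))
collapse_replicates_sketch(toy, nrep = c(2, 3), stage.names = c("S1", "S2"))
```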

Session info:

Please note session info in R

> sessionInfo()
R version 4.2.2 (2022-10-31)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Ventura 13.0.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] DESeq2_1.38.3               SummarizedExperiment_1.28.0 Biobase_2.58.0              MatrixGenerics_1.10.0      
 [5] matrixStats_1.0.0           GenomicRanges_1.50.2        GenomeInfoDb_1.34.9         IRanges_2.32.0             
 [9] S4Vectors_0.36.2            BiocGenerics_0.44.0         myTAI_1.0.1.9000            lubridate_1.9.2            
[13] forcats_1.0.0               stringr_1.5.0               dplyr_1.1.2                 purrr_1.0.2                
[17] readr_2.1.4                 tidyr_1.3.0                 tibble_3.2.1                ggplot2_3.4.2              
[21] tidyverse_2.0.0            

loaded via a namespace (and not attached):
  [1] colorspace_2.1-0       ellipsis_0.3.2         rprojroot_2.0.3        XVector_0.38.0         fs_1.6.3              
  [6] rstudioapi_0.15.0      farver_2.1.1           remotes_2.4.2.1        bit64_4.0.5            AnnotationDbi_1.60.2  
 [11] fansi_1.0.4            codetools_0.2-19       splines_4.2.2          cachem_1.0.8           geneplotter_1.76.0    
 [16] knitr_1.43             pkgload_1.3.2.1        annotate_1.76.0        png_0.1-8              shiny_1.7.4.1         
 [21] compiler_4.2.2         httr_1.4.6             Matrix_1.5-4.1         fastmap_1.1.1          cli_3.6.1             
 [26] later_1.3.1            htmltools_0.5.5        prettyunits_1.1.1      tools_4.2.2            gtable_0.3.3          
 [31] glue_1.6.2             GenomeInfoDbData_1.2.9 Rcpp_1.0.11            Biostrings_2.66.0      vctrs_0.6.3           
 [36] iterators_1.0.14       xfun_0.40              ps_1.7.5               timechange_0.2.0       mime_0.12             
 [41] miniUI_0.1.1.1         lifecycle_1.0.3        devtools_2.4.5         XML_3.99-0.14          MASS_7.3-60           
 [46] zlibbioc_1.44.0        scales_1.2.1           vroom_1.6.3            hms_1.1.3              promises_1.2.1        
 [51] parallel_4.2.2         RColorBrewer_1.1-3     yaml_2.3.7             curl_5.0.1             memoise_2.0.1         
 [56] see_0.8.0              stringi_1.7.12         RSQLite_2.3.1          desc_1.4.2             foreach_1.5.2         
 [61] pkgbuild_1.4.2         BiocParallel_1.32.6    rlang_1.1.1            pkgconfig_2.0.3        bitops_1.0-7          
 [66] evaluate_0.21          lattice_0.21-8         htmlwidgets_1.6.2      labeling_0.4.2         bit_4.0.5             
 [71] processx_3.8.2         tidyselect_1.2.0       ggsci_3.0.0            magrittr_2.0.3         R6_2.5.1              
 [76] generics_0.1.3         profvis_0.3.8          DelayedArray_0.24.0    DBI_1.1.3              pillar_1.9.0          
 [81] withr_2.5.0            fitdistrplus_1.1-11    survival_3.5-5         KEGGREST_1.38.0        RCurl_1.98-1.12       
 [86] crayon_1.5.2           utf8_1.2.3             tzdb_0.4.0             rmarkdown_2.23         urlchecker_1.0.1      
 [91] usethis_2.2.2          locfit_1.5-9.8         grid_4.2.2             blob_1.2.4             callr_3.7.3           
 [96] digest_0.6.33          xtable_1.8-4           httpuv_1.6.11          munsell_0.5.0          sessioninfo_1.2.2

Retrieving parent taxonomy for a specific Taxonomy ID

Is it possible to retrieve the parent of a specific taxonomy ID? It is clear how to retrieve all the children of a specific parent, but not the other way around.

Suggestion

Thanks for the package, it works great!

I suggest an option that avoids the "which row do you want" prompt when there is more than one match (I ran into this with "Camel", which matches both the Bactrian camel and the dromedary). For example, if more than one row matches, take the first one or fill in a "More than one option" value.

My loop stops because of this prompt.

This is meant entirely constructively. Thanks!

Motivation - method bias

Hi,

I understand that, due to co-author commitments, you changed the motivation to name only GenERA as the primary gene-age classifier.

However, in my opinion this is not valid scientific practice. Earlier versions mentioned a whole range of software tools that all produce gene-age maps for use with myTAI. These can still be found here:
https://drostlab.github.io/myTAI/articles/Introduction.html#retrieval-of-phylogenetic-or-taxonomic-information

Since myTAI is not restricted to GenERA gene-age maps, please change this accordingly.

For example, I recently extracted gene-age maps for the entire eggNOG 6 and PLAZA databases. Should we now mention all ~5,000 species as pre-calculated phylomaps?

Best regards

Kristian

What exactly is meant by expression data/level?

Dear Dr. Hajk-Georg Drost
Great work on integrating gene evolution and transcriptomics. However, I'm confused about what the expression data should be: TPM, FPKM, or just read counts?
Many thanks for your help!
Yours sincerely,
Lei Chen

using an API key

Is there a way to add an API key to searches with myTAI? I am searching a large number of entries, and without an API key associated with my searches, my script keeps failing.
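One possible approach is sketched below. It assumes that myTAI's taxonomy lookups go through the taxize package, and that taxize (like rentrez) reads an NCBI API key from the `ENTREZ_KEY` environment variable. Neither assumption is confirmed in this thread. The key value is a placeholder.

```r
# Assumption: NCBI queries are issued via taxize/rentrez, which look up the
# NCBI API key in the ENTREZ_KEY environment variable.
# Replace the placeholder with your own key from your NCBI account settings.
Sys.setenv(ENTREZ_KEY = "your-ncbi-api-key")

# For a persistent setting, the same line (without quotes) can go in ~/.Renviron:
# ENTREZ_KEY=your-ncbi-api-key
nzchar(Sys.getenv("ENTREZ_KEY"))
```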

FlatLineTest `Error in gamma_MME[[3]] : subscript out of bounds. `

Describe the bug
When FlatLineTest() is run with a high number of permutations (> 50,000), it fails with `Error in gamma_MME[[3]] : subscript out of bounds`. To my knowledge, the other permutation tests do not have this problem.

To Reproduce
I first detected this issue with the tfStability() function when run with the default test, TestStatistic = "FlatLineTest". One hundred thousand permutations were specified, i.e. permutations = 100000.
Using the PhyloExpressionSetExample,

> tfStability(
+     PhyloExpressionSetExample,
+     transforms = c("log2", "rank"),
+     permutations = 100000
+ )
Proceeding with the FlatLineTest

[ Number of Eigen threads that are employed on your machine: 8 ]

[ Computing age assignment permutations for test statistic ... ]
[=========================================] 100%    69.2%   
[ Computing variances of permuted transcriptome signatures ... ]


Total runtime of your permutation test: 7.252  seconds.Error in gamma_MME[[3]] : subscript out of bounds

Then, I checked the FlatLineTest().

> FlatLineTest(myTAI::tf(PhyloExpressionSetExample, FUN = log2), permutations = 100000)

[ Number of Eigen threads that are employed on your machine: 8 ]

[ Computing age assignment permutations for test statistic ... ]
[=========================================] 100%   .8%   
[ Computing variances of permuted transcriptome signatures ... ]


Total runtime of your permutation test: 6.979  seconds.Error in gamma_MME[[3]] : subscript out of bounds

By contrast, this does not happen with a lower (but still high) number of permutations (permutations = 20000).

> FlatLineTest(myTAI::tf(PhyloExpressionSetExample, FUN = log2), permutations = 20000)

[ Number of Eigen threads that are employed on your machine: 8 ]

[ Computing age assignment permutations for test statistic ... ]
[=========================================] 100%   
[ Computing variances of permuted transcriptome signatures ... ]


Total runtime of your permutation test: 15.191  seconds.$p.value
[1] 2.192776e-33

$std.dev
[1] 0.05381242 0.05381653 0.05382267 0.05382233 0.05381785 0.05382214 0.05382146

$ks.test

	Asymptotic one-sample Kolmogorov-Smirnov test

data:  filtered_vars
D = 0.020072, p-value = 2.391e-07
alternative hypothesis: two-sided

Expected behaviour
I would have expected the test to proceed ahead with 100,000 permutations just as it does with 20,000.

Screenshots or code
I think the issue relates to:

myTAI/R/FlatLineTest.R

Lines 151 to 156 in 3168e75

gamma_MME = GetGamma(var_values, permutations)
### estimate shape:
shape <- gamma_MME[[1]]
### estimate the rate:
rate <- gamma_MME[[2]]
ks_test = gamma_MME[[3]]
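The error suggests that, for large permutation counts, `GetGamma()` returns fewer than three elements, so `gamma_MME[[3]]` fails. Why that happens is not established here. The hypothetical guard below is not in myTAI. It only shows how the bare "subscript out of bounds" could be turned into an informative error before the result is indexed.

```r
# Hypothetical guard (not in myTAI): validate the fit before indexing into it
extract_gamma_fit <- function(gamma_MME) {
  if (length(gamma_MME) < 3) {
    stop("GetGamma() returned ", length(gamma_MME),
         " element(s) instead of shape, rate and KS test; ",
         "the gamma fit to the permuted variances may have failed.", call. = FALSE)
  }
  list(shape = gamma_MME[[1]], rate = gamma_MME[[2]], ks_test = gamma_MME[[3]])
}
```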

Session info:

Please note session info in R

> devtools::session_info()
─ Session info ─────────────────────────────────────────────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.2.2 (2022-10-31)
 os       macOS Ventura 13.0.1
 system   x86_64, darwin17.0
 ui       RStudio
 language (EN)
 collate  en_GB.UTF-8
 ctype    en_GB.UTF-8
 tz       Europe/Berlin
 date     2023-07-25
 rstudio  2022.07.2+576 Spotted Wakerobin (desktop)
 pandoc   2.19.2 @ /Applications/RStudio.app/Contents/MacOS/quarto/bin/tools/ (via rmarkdown)

─ Packages ─────────────────────────────────────────────────────────────────────────────────────────────────────────
 package      * version    date (UTC) lib source
 bit            4.0.5      2022-11-15 [1] CRAN (R 4.2.0)
 bit64          4.0.5      2020-08-30 [1] CRAN (R 4.2.0)
 cachem         1.0.8      2023-05-01 [1] CRAN (R 4.2.0)
 callr          3.7.3      2022-11-02 [1] CRAN (R 4.2.0)
 cli            3.6.1      2023-03-23 [1] CRAN (R 4.2.0)
 codetools      0.2-19     2023-02-01 [1] CRAN (R 4.2.0)
 colorspace     2.1-0      2023-01-23 [1] CRAN (R 4.2.0)
 crayon         1.5.2      2022-09-29 [1] CRAN (R 4.2.0)
 devtools       2.4.5      2022-10-11 [1] CRAN (R 4.2.0)
 digest         0.6.33     2023-07-07 [1] CRAN (R 4.2.0)
 dplyr        * 1.1.2      2023-04-20 [1] CRAN (R 4.2.0)
 ellipsis       0.3.2      2021-04-29 [1] CRAN (R 4.2.0)
 evaluate       0.21       2023-05-05 [1] CRAN (R 4.2.0)
 fansi          1.0.4      2023-01-22 [1] CRAN (R 4.2.0)
 farver         2.1.1      2022-07-06 [1] CRAN (R 4.2.0)
 fastmap        1.1.1      2023-02-24 [1] CRAN (R 4.2.0)
 fitdistrplus   1.1-11     2023-04-25 [1] CRAN (R 4.2.0)
 forcats      * 1.0.0      2023-01-29 [1] CRAN (R 4.2.0)
 foreach        1.5.2      2022-02-02 [1] CRAN (R 4.2.0)
 fs             1.6.3      2023-07-20 [1] CRAN (R 4.2.0)
 generics       0.1.3      2022-07-05 [1] CRAN (R 4.2.0)
 ggplot2      * 3.4.2      2023-04-03 [1] CRAN (R 4.2.0)
 ggsci          3.0.0      2023-03-08 [1] CRAN (R 4.2.0)
 glue           1.6.2      2022-02-24 [1] CRAN (R 4.2.0)
 gtable         0.3.3      2023-03-21 [1] CRAN (R 4.2.0)
 hms            1.1.3      2023-03-21 [1] CRAN (R 4.2.0)
 htmltools      0.5.5      2023-03-23 [1] CRAN (R 4.2.0)
 htmlwidgets    1.6.2      2023-03-17 [1] CRAN (R 4.2.0)
 httpuv         1.6.11     2023-05-11 [1] CRAN (R 4.2.2)
 iterators      1.0.14     2022-02-05 [1] CRAN (R 4.2.0)
 knitr          1.43       2023-05-25 [1] CRAN (R 4.2.2)
 labeling       0.4.2      2020-10-20 [1] CRAN (R 4.2.0)
 later          1.3.1      2023-05-02 [1] CRAN (R 4.2.2)
 lattice        0.21-8     2023-04-05 [1] CRAN (R 4.2.0)
 lifecycle      1.0.3      2022-10-07 [1] CRAN (R 4.2.0)
 lubridate    * 1.9.2      2023-02-10 [1] CRAN (R 4.2.0)
 magrittr       2.0.3      2022-03-30 [1] CRAN (R 4.2.0)
 MASS           7.3-60     2023-05-04 [1] CRAN (R 4.2.2)
 Matrix         1.5-4.1    2023-05-18 [1] CRAN (R 4.2.0)
 memoise        2.0.1      2021-11-26 [1] CRAN (R 4.2.0)
 mime           0.12       2021-09-28 [1] CRAN (R 4.2.0)
 miniUI         0.1.1.1    2018-05-18 [1] CRAN (R 4.2.0)
 munsell        0.5.0      2018-06-12 [1] CRAN (R 4.2.0)
 myTAI        * 1.0.1.9000 2023-07-25 [1] Github (drostlab/myTAI@3168e75)
 pillar         1.9.0      2023-03-22 [1] CRAN (R 4.2.0)
 pkgbuild       1.4.2      2023-06-26 [1] CRAN (R 4.2.0)
 pkgconfig      2.0.3      2019-09-22 [1] CRAN (R 4.2.0)
 pkgload        1.3.2.1    2023-07-08 [1] CRAN (R 4.2.0)
 prettyunits    1.1.1      2020-01-24 [1] CRAN (R 4.2.0)
 processx       3.8.2      2023-06-30 [1] CRAN (R 4.2.0)
 profvis        0.3.8      2023-05-02 [1] CRAN (R 4.2.0)
 promises       1.2.0.1    2021-02-11 [1] CRAN (R 4.2.0)
 ps             1.7.5      2023-04-18 [1] CRAN (R 4.2.0)
 purrr        * 1.0.1      2023-01-10 [1] CRAN (R 4.2.0)
 R6             2.5.1      2021-08-19 [1] CRAN (R 4.2.0)
 Rcpp           1.0.11     2023-07-06 [1] CRAN (R 4.2.0)
 readr        * 2.1.4      2023-02-10 [1] CRAN (R 4.2.0)
 remotes        2.4.2.1    2023-07-18 [1] CRAN (R 4.2.2)
 rlang          1.1.1      2023-04-28 [1] CRAN (R 4.2.0)
 rmarkdown      2.23       2023-07-01 [1] CRAN (R 4.2.0)
 rstudioapi     0.15.0     2023-07-07 [1] CRAN (R 4.2.0)
 scales         1.2.1      2022-08-20 [1] CRAN (R 4.2.0)
 sessioninfo    1.2.2      2021-12-06 [1] CRAN (R 4.2.0)
 shiny          1.7.4.1    2023-07-06 [1] CRAN (R 4.2.0)
 stringi        1.7.12     2023-01-11 [1] CRAN (R 4.2.0)
 stringr      * 1.5.0      2022-12-02 [1] CRAN (R 4.2.0)
 survival       3.5-5      2023-03-12 [1] CRAN (R 4.2.0)
 tibble       * 3.2.1      2023-03-20 [1] CRAN (R 4.2.0)
 tidyr        * 1.3.0      2023-01-24 [1] CRAN (R 4.2.0)
 tidyselect     1.2.0      2022-10-10 [1] CRAN (R 4.2.0)
 tidyverse    * 2.0.0      2023-02-22 [1] CRAN (R 4.2.0)
 timechange     0.2.0      2023-01-11 [1] CRAN (R 4.2.0)
 tzdb           0.4.0      2023-05-12 [1] CRAN (R 4.2.2)
 urlchecker     1.0.1      2021-11-30 [1] CRAN (R 4.2.0)
 usethis        2.2.2      2023-07-06 [1] CRAN (R 4.2.0)
 utf8           1.2.3      2023-01-31 [1] CRAN (R 4.2.0)
 vctrs          0.6.3      2023-06-14 [1] CRAN (R 4.2.0)
 vroom          1.6.3      2023-04-28 [1] CRAN (R 4.2.0)
 withr          2.5.0      2022-03-03 [1] CRAN (R 4.2.0)
 xfun           0.39       2023-04-20 [1] CRAN (R 4.2.0)
 xtable         1.8-4      2019-04-21 [1] CRAN (R 4.2.0)
 yaml           2.3.7      2023-01-23 [1] CRAN (R 4.2.0)

 [1] /Library/Frameworks/R.framework/Versions/4.2/Resources/library

Additional note:
When this error occurs, the reported total runtime is wrong: these tests took much longer than reported. With permutations = 20000, the reported runtime was correct (at the moment it was printed on the screen). By the way, I really like the progress bar.

Defining Divergence Stratum

Dear Dr. Drost,

I am writing to ask how to define the Divergence Stratum. I am using myTAI to calculate the TDI of chili pepper genes. I have computed Ka and Ks values for each gene using tomato genes as the reference, but I don't know how to define the Divergence Stratum. In your example, you assigned a value (1-10) to each gene (first column). Could you let me know how to define them? Thank you very much.
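One common construction, assumed here rather than confirmed in this thread, is to sort genes into deciles of their Ka/Ks ratio. Stratum 1 would hold the most conserved genes and stratum 10 the most divergent, which would match the 1-10 range in DivergenceExpressionSetExample. The helper name is hypothetical, and the Ka/Ks vector would come from your own `ka / ks` values after removing genes with Ks = 0 or missing values.

```r
# Hypothetical helper: assign each gene to one of n_strata equal-sized
# divergence strata by ranking its Ka/Ks value (1 = lowest Ka/Ks)
divergence_strata <- function(kaks, n_strata = 10) {
  r <- rank(kaks, ties.method = "first")
  as.integer(ceiling(n_strata * r / length(kaks)))
}

# Hypothetical toy Ka/Ks values for 20 genes -> two genes per stratum
kaks <- seq(0.01, 0.2, by = 0.01)
divergence_strata(kaks)
```

Ranking with `ties.method = "first"` keeps the strata equal in size. The trade-off is that genes with identical Ka/Ks can end up in neighbouring strata.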

~Xiuxu

Citations in the DESCRIPTION file

Hi Hajk,

From the conda recipe file in #13, I noticed that the citations in the package summary could be improved, i.e.

  summary: Investigate the evolution of biological processes by capturing evolutionary signatures in transcriptomes (Drost et al. (2017) <doi:10.1093/bioinformatics/btx835>). The aim of this tool is to provide a transcriptome analysis environment to quantify the average evolutionary age of genes contributing to a transcriptome of interest (Drost et al. (2016) <doi:10.1101/051565>).

This summary was most likely generated automatically from this line in the DESCRIPTION file.

myTAI/DESCRIPTION

Line 15 in 8e92367

 Description: Investigate the evolution of biological processes by capturing evolutionary signatures in transcriptomes (Drost et al. (2017) <doi:10.1093/bioinformatics/btx835>). The aim of this tool is to provide a transcriptome analysis environment to quantify the average evolutionary age of genes contributing to a transcriptome of interest (Drost et al. (2016) <doi:10.1101/051565>). 

Perhaps the reference to Drost et al. (2017) could be changed to Drost et al. (2018). Furthermore, I am unsure whether the citation of the bioRxiv preprint (Drost et al. (2016) doi:10.1101/051565) is needed.

Let me know what you think.

Best,
Sodai

Conda package

Hi,

I'm trying to create the Bioconda package, but I'm not sure why it is not working. Do you have experience with that? Here is the PR: bioconda/bioconda-recipes#41628

Regards

Significance status of signature: NA

Dear Dr. Hajk-Georg Drost,
Let me thank you for the awesome and very important myTAI library. In my project, I used myTAI to determine the transcriptome age indices of the life cycle stages of different flatworm species. With the complete transcriptomes, no problems arose, and we obtained very interesting results. However, we get a "NaN produced" warning from PlotSignature() when analysing the “reduced” dataset, which includes only genes of the last common ancestor of the studied species:

Plot signature: ' TAI ' and test statistic: ' FlatLineTest ' running 1000 permutations.
$start.arg
$start.arg$shape
[1] Inf
$start.arg$rate
[1] Inf
$fix.arg
NULL
Significance status of signature: NA
Warning:
1: In dgamma(c(0.000122768450181325, 0.000122768450181325, 0.000122768450181325, :
NaN produced
2: In stats::pgamma(real.var, shape = shape, rate = rate, lower.tail = FALSE) :
NaN produced

Code used:

library(edgeR)
library(myTAI)
library(dplyr)
library(phylotools)

fgig_phylostratr <- read.csv2("../../Phylostratr/Fgigantica/Fgigantica_phylostratr_results.tsv", sep="\t", header = T)
fgig_digenea_ancestor <- get.fasta.name(infile="../../Ancestral_pyHAM/Fgigantica_100aa.Digenea_ancestor.fasta")
fgig_expr <- read.csv2("../../Expr_quant/Fgig_decontaminated_salmon_quant/Fgigantica_100aa_unaveraged_TPMs.tsv", sep="\t", header = T)

fgig_exp_filtered <- subset(fgig_expr, GeneIDs %in% fgig_digenea_ancestor)
fgig_exp_filtered_mut <- mutate_all(fgig_exp_filtered[, 1:length(colnames(fgig_exp_filtered))], function(x) as.numeric(as.character(x)))
fgig_exp_filtered_mut$GeneIDs <- fgig_exp_filtered$GeneIDs
fgig_exp_filtered_mut_mean <- data.frame(
  "GeneIDs" = fgig_exp_filtered_mut$GeneIDs,
  "Egg" = rowMeans(fgig_exp_filtered_mut[, c("Fgig_egg_rep1", "Fgig_egg_rep2", "Fgig_egg_rep3")]),
  "Miracidium" = rowMeans(fgig_exp_filtered_mut[, c("Fgig_miracidium_rep1", "Fgig_miracidium_rep2", "Fgig_miracidium_rep3")]),
  "Redia" = rowMeans(fgig_exp_filtered_mut[, c("Fgig_redia_rep1", "Fgig_redia_rep2", "Fgig_redia_rep3")]),
  "Cercaria" = rowMeans(fgig_exp_filtered_mut[, c("Fgig_cerc_rep1", "Fgig_cerc_rep2", "Fgig_cerc_rep3")]),
  "Metacercaria" = rowMeans(fgig_exp_filtered_mut[, c("Fgig_metacerc_rep1", "Fgig_metacerc_rep2", "Fgig_metacerc_rep3")]),
  "Juvenile_42_days" = rowMeans(fgig_exp_filtered_mut[, c("Fgig_juv_42d_rep1", "Fgig_juv_42d_rep2", "Fgig_juv_42d_rep3")]),
  "Juvenile_70_days" = rowMeans(fgig_exp_filtered_mut[, c("Fgig_juv_70d_rep1", "Fgig_juv_70d_rep2", "Fgig_juv_70d_rep3")]),
  "Adult" = rowMeans(fgig_exp_filtered_mut[, c("Fgig_adult_rep1", "Fgig_adult_rep2", "Fgig_adult_rep3")]))
fgig_phylostratr_filtered <- subset(fgig_phylostratr, qseqid %in% fgig_digenea_ancestor)

colnames(fgig_phylostratr_filtered) <- c("GeneIDs", "MRCA", "Phylostratum", "MRCA_name")
fgig_phylomap <- select(fgig_phylostratr_filtered, "Phylostratum", "GeneIDs")
fgig_phyloexp <- MatchMap(fgig_phylomap, fgig_exp_filtered_mut_mean)

fgig_phyloexp_tf <- tf(fgig_phyloexp, function(x) log2(x+1))
pdf("Fgigantica_logTPMs_TAI.digenea_ancestor.pdf", width = 14)
PlotSignature(ExpressionSet = fgig_phyloexp_tf, measure = "TAI",
              TestStatistic = "FlatLineTest", xlab = "F.gigantica complex life cycle",
              ylab = "TAI: Digenean ancestor genome model", permutations = 1000)
dev.off()

As a result, the plot shows “p_flt = NaN”. My guess is that fgig_phyloexp_tf (3,772 observations) has too few observations to run the test. Is this possible, or is there another explanation? I would be grateful for any help!
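One detail in the output above may point elsewhere. `start.arg` reports shape = Inf and rate = Inf (these names look like fitdistrplus::fitdist() arguments, which is an assumption), and the first values passed to dgamma() are identical. Method-of-moments (MME) estimates for a gamma distribution are shape = mean^2 / var and rate = mean / var. Both become Inf when the variance is zero, which would happen if the permuted variances were all the same. The base-R sketch below illustrates this. It is not myTAI's GetGamma().

```r
# Method-of-moments (MME) estimates for a gamma distribution:
#   shape = mean^2 / var,  rate = mean / var
# With zero variance both estimates would be Inf, mirroring start.arg above.
gamma_mme <- function(x) {
  m <- mean(x)
  v <- stats::var(x)
  if (!is.finite(v) || v == 0) {  # exactly zero, or undefined for < 2 values
    warning("Variance of the values is zero or undefined; no gamma fit possible.")
    return(c(shape = NA_real_, rate = NA_real_))
  }
  c(shape = m^2 / v, rate = m / v)
}

gamma_mme(c(1, 2, 3))  # mean 2, variance 1 -> shape 4, rate 2
```

If this is the cause, it is worth checking whether the permuted TAI variances for the reduced gene set are (near-)constant, rather than whether 3,772 genes are too few.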
Thanks a lot!
Yours sincerely,
Maksim

LICENSE file missing

I couldn't find a License file. Can you please add one to the repo?

Thanks!

Phylostrata for other organisms

First of all, congratulations on the very vivid explanation of every single step. Now the problem: I am trying to generate phylostrata for Oryza sativa. I have modified the headers as suggested, as follows:
[Oryza sativa] | [Eukaryota; Virdiplantae; Streptophyta; Streptophytina; Embryophyta; Tracheophyta; Euphyllophyta; Spermatophyta; Magnoliophyta; Liliopsida, core Liliopsida; commelinids, Poales; Poaceae; Oryzae; Oryza].

However, the Perl script does not assign these genes to any of the strata, whereas the Arabidopsis file works fine.

Could you please suggest something?
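The header above differs from NCBI-style lineages in two visible ways. It spells "Virdiplantae" (NCBI uses "Viridiplantae"), and it has two commas where other ranks are separated by semicolons ("Liliopsida, core Liliopsida" and "commelinids, Poales"). Whether the Perl script is sensitive to either is an assumption. The hypothetical base-R helper below splits such a header into its ranks so that these entries are easy to spot.

```r
# Hypothetical helper: split a "[species] | [rank; rank; ...]" header into ranks
parse_lineage <- function(header) {
  parts <- strsplit(header, " | ", fixed = TRUE)[[1]]
  lineage <- gsub("^\\[|\\]\\.?$", "", trimws(parts[2]))  # drop surrounding [ ]
  trimws(strsplit(lineage, ";", fixed = TRUE)[[1]])
}

header <- paste0("[Oryza sativa] | [Eukaryota; Virdiplantae; Streptophyta; ",
                 "Streptophytina; Embryophyta; Tracheophyta; Euphyllophyta; ",
                 "Spermatophyta; Magnoliophyta; Liliopsida, core Liliopsida; ",
                 "commelinids, Poales; Poaceae; Oryzae; Oryza]")
taxa <- parse_lineage(header)

taxa[grepl(",", taxa, fixed = TRUE)]  # ranks that still contain a comma
"Viridiplantae" %in% taxa             # FALSE: the header spells it differently
```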
