
tbasco's Issues

Add association rules

Some kind of enhancement where the parameters support and confidence are set automatically would be a nice addition.
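This is a minimal base-R sketch of what automatic thresholds could look like. It computes two measures for every pairwise rule in a presence/absence matrix:

- support: the fraction of genomes carrying both items.
- confidence: the fraction of genomes carrying the left-hand item that also carry the right-hand item.

It then sets the minimum support and confidence automatically as a quantile of the observed values. The function names, the matrix layout and the quantile rule are assumptions for illustration, not part of TbasCO.

```r
# Hypothetical sketch: support and confidence for every pairwise rule "lhs => rhs"
# in a presence/absence matrix (rows = genomes, columns = traits or KOs).
rule_stats <- function(pa) {
  pa <- as.matrix(pa) > 0
  items <- colnames(pa)
  out <- list()
  for (lhs in items) {
    for (rhs in setdiff(items, lhs)) {
      both <- sum(pa[, lhs] & pa[, rhs])
      n_lhs <- sum(pa[, lhs])
      out[[length(out) + 1]] <- data.frame(
        lhs = lhs, rhs = rhs,
        support = both / nrow(pa),                                # P(lhs and rhs)
        confidence = if (n_lhs > 0) both / n_lhs else NA_real_,  # P(rhs | lhs)
        stringsAsFactors = FALSE
      )
    }
  }
  do.call(rbind, out)
}

# Assumption: minimum support and confidence are set automatically to a
# quantile (default: the 75th percentile) of the observed values.
auto_filter_rules <- function(rules, prob = 0.75) {
  min_support <- quantile(rules$support, prob, na.rm = TRUE, names = FALSE)
  min_conf <- quantile(rules$confidence, prob, na.rm = TRUE, names = FALSE)
  keep <- !is.na(rules$confidence) &
    rules$support >= min_support &
    rules$confidence >= min_conf
  rules[keep, , drop = FALSE]
}

pa <- matrix(c(1, 1, 0,
               1, 1, 1,
               1, 0, 0,
               0, 1, 1), nrow = 4, byrow = TRUE,
             dimnames = list(paste0("genome", 1:4), c("A", "B", "C")))
auto_filter_rules(rule_stats(pa))
```

A quantile makes the cut-off adapt to each dataset. The probability itself is still a user choice.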

Parse KO database

As exemplified by the pyruvate oxidation pathways, which have many alternative sub-modules that may result in the same function, care needs to be taken when determining whether a module is complete. However, there is unlikely to be a single solution, as there are 4 different types of modules:

  1. pathway modules
  2. structural complexes
  3. functional sets
  4. signature modules

http://www.kegg.jp/kegg/module.html

Currently there is a hard filter that requires a module to be at least 75% 'complete'. Completeness is determined by the percentage of the module's genes that are present. However, not all genes in a module are necessary for the function, so this filter is too strict.

  1. For pathway modules, one simple possibility is to use the Reaction Modules database to count how many of the pathway's reactions are filled, and then keep modules in which at least 75% of the reactions are present.

For other module types, such as structural complexes, perhaps we can keep the 75% cut-off that exists at the moment; however, we can revisit this issue in the future.
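This is a minimal base-R sketch of the reaction-based idea. It assumes each pathway module has already been parsed into a list of steps, where each step holds the alternative KOs that can carry out that reaction. Real KEGG module definitions also contain nested alternatives and complexes, and parsing them (or the Reaction Modules database) is not shown. The function names and placeholder KO IDs are hypothetical.

The example shows why the gene-based 75% filter is strict: a module with every step covered can still have fewer than 75% of its listed genes present.

```r
# Hypothetical sketch: step-based completeness for a pathway module.
# A module is a list of steps (roughly, reactions); each step is a vector of
# alternative KOs, and a step counts as present if any alternative is found.
step_completeness <- function(module_steps, present_kos) {
  step_present <- vapply(module_steps,
                         function(alternatives) any(alternatives %in% present_kos),
                         logical(1))
  mean(step_present)
}

# Gene-based completeness, as in the current filter: the fraction of all KOs
# listed anywhere in the module that are present.
gene_completeness <- function(module_steps, present_kos) {
  all_kos <- unique(unlist(module_steps))
  mean(all_kos %in% present_kos)
}

is_module_complete <- function(module_steps, present_kos, cutoff = 0.75) {
  step_completeness(module_steps, present_kos) >= cutoff
}

# Example with placeholder KO names: steps 2 and 4 have alternative KOs.
module_steps <- list("K1", c("K2a", "K2b", "K2c"), "K3", c("K4a", "K4b"))
present_kos <- c("K1", "K2b", "K3", "K4a")

step_completeness(module_steps, present_kos)  # 1: every step is covered
gene_completeness(module_steps, present_kos)  # 4/7, about 0.57: fails a 75% gene filter
```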

typos in Utility.R

Lines 256-269 of Utility.R are missing closing parentheses. The corrected calls are:

  random.genes.hexb <- hexbin(bkgd.individual.Zscores$zscores$`Random Genes`$PC,
    bkgd.individual.Zscores$zscores$`Random Genes`$NRED,
    ybnds = c(min(all_scores_y), max(all_scores_y)),
    xbnds = c(min(all_scores_x), max(all_scores_x)))

  random.annotated.genes.hexb <- hexbin(bkgd.individual.Zscores$zscores$`Random Annotated Genes`$PC,
    bkgd.individual.Zscores$zscores$`Random Annotated Genes`$NRED,
    ybnds = c(min(all_scores_y), max(all_scores_y)),
    xbnds = c(min(all_scores_x), max(all_scores_x)))

  random.identical.annotated.genes.hexb <- hexbin(bkgd.individual.Zscores$zscores$`Genes with the same annotation`$PC,
    bkgd.individual.Zscores$zscores$`Genes with the same annotation`$NRED,
    ybnds = c(min(all_scores_y), max(all_scores_y)),
    xbnds = c(min(all_scores_x), max(all_scores_x)))

TbasCO installs correctly with these fixes, but with the following notes:

Note: possible error in 'Model_Module(RNAseq.data, ': unused arguments (Yrange, bkgd.traits) 
Note: possible error in 'Model_Module(RNAseq.data, ': unused arguments (Yrange, bkgd.traits) 
Note: possible error in 'Model_Module(RNAseq.data, ': unused arguments (Yrange, bkgd.traits) 
Note: possible error in 'Model_Module(RNAseq.data, ': unused arguments (Yrange, bkgd.traits) 
Note: possible error in 'Model_Module(RNAseq.data, ': unused arguments (Yrange, bkgd.traits) 
Note: possible error in 'Model_Module(RNAseq.data, ': unused arguments (Yrange, bkgd.traits) 
Note: possible error in 'Model_Module(RNAseq.data, ': unused arguments (Yrange, bkgd.traits) 
Note: break used in wrong context: no loop is visible 
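The notes above mean two different things:

- "unused arguments (Yrange, bkgd.traits)": some calls pass Yrange and bkgd.traits to Model_Module, although its definition has no parameters with those names and no `...`. Those calls would fail if they were run.
- "break used in wrong context": this usually means a `break` sits outside a `for`, `while` or `repeat` loop, for example inside a function passed to `lapply()`.

A quick way to compare a call with a function definition is `formals()`. The stub below is hypothetical and only stands in for Model_Module, whose real signature may differ.

```r
# Hypothetical stand-in for Model_Module; the real signature may differ.
model_module_stub <- function(RNAseq.data, annotation.db, threads = 1) NULL

# Names passed at a call site that the function does not accept
# (exact names only; R's partial matching of argument names is ignored here).
unused_args <- function(fun, passed) {
  accepted <- names(formals(fun))
  if ("..." %in% accepted) return(character(0))  # `...` absorbs extra arguments
  setdiff(passed, accepted)
}

unused_args(model_module_stub, c("RNAseq.data", "Yrange", "bkgd.traits"))
# "Yrange" "bkgd.traits"
```

Either the call sites should drop these arguments, or Model_Module should be given matching parameters if it is meant to use them.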

Create Plotting Function to select specific genomes

Modify the Plot_Trait_Attribute_Expression function to accept a vector of genome names, so that only those genomes are plotted rather than every genome that has the trait attribute. For example, I modified the existing function to look only at the two Accumulibacter genomes by hard-coding their names; it would be nicer to pass them as an argument.

library(ggplot2)   # ggplot(), geom_line(), facet_wrap()
library(magrittr)  # %>% and %<>%

Plot_Accumulibacter_Attribute_Expression <- function(trait.attribute = "M00793_1",
                                            trait.attributes.pruned,
                                            RNAseq.data) {
  t.data <- trait.attributes.pruned[[trait.attribute]]
  annotations <- RNAseq.data$features$annotation.db$module.dict[[trait.attribute]]
  n.att <- length(t.data)
  n.annotations <- length(annotations)

  plots <- list()

  for (i in seq_along(t.data)) {

    plot.df <- matrix(ncol = 4, nrow = 0)
    trait.genomes <- c("3300026302-bin.3", "3300026286-bin.31")  #### specific genomes instead of all

    for (genome in trait.genomes) {
      for (annotation in annotations) {
        # Expression of this KO in this genome, restricted to the sample columns
        expr_values <- RNAseq.data$table[which(RNAseq.data$table$Annotation == annotation), ]
        expr_values <- expr_values[which(expr_values$Bin == genome), RNAseq.data$features$sample.columns]

        # Keep only the first gene if the genome has several with this KO
        if (nrow(expr_values) >= 2) {
          expr_values <- expr_values[1, ]
        } else if (nrow(expr_values) == 0) {
          print(paste(genome, "lacks", annotation))
          next
        }

        # Log-transform and shift so the lowest value is 0
        # (a zero count gives -Inf here, which makes the whole profile non-finite)
        if (!anyNA(expr_values)) {
          expr_values <- log2(expr_values)
          expr_values <- expr_values - min(expr_values)
        }

        for (timepoint in seq_along(expr_values)) {
          tp_expr <- expr_values[timepoint] %>% as.numeric
          row <- c(genome, annotation, paste0("TP", timepoint), tp_expr)
          print(row)
          plot.df <- rbind(plot.df, row)
        }
      }
    }

    plot.df <- data.frame(plot.df, stringsAsFactors = FALSE)
    colnames(plot.df) <- c("Genome", "KID", "TimePoint", "Expr")

    plot.df$Expr %<>% as.numeric
    # Keep time points in sampling order (TP1, TP2, ..., TP10), not alphabetical order
    plot.df$TimePoint <- factor(plot.df$TimePoint, levels = unique(plot.df$TimePoint))
    plots[[i]] <- ggplot(plot.df) +
      geom_line(aes(x = TimePoint, y = Expr, group = Genome, color = Genome)) +
      facet_wrap(~KID, nrow = 1)
  }
  gridExtra::grid.arrange(grobs = plots, nrow = 2)
}
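This sketch shows how the requested option could work. It assumes the data layout used in the function above: RNAseq.data$table has Annotation and Bin columns, and the sample columns are listed in RNAseq.data$features$sample.columns.

A hypothetical `genomes` argument defaults to NULL, which keeps every genome with the trait attribute; otherwise it restricts the plot to the named genomes. The helper names are illustrative, and the log2 normalization from the original function is left out for brevity.

```r
# Hypothetical helper: which genomes to plot. `genomes = NULL` keeps every
# genome that has the trait attribute; otherwise only the named ones are kept.
select_genomes <- function(trait.genomes, genomes = NULL) {
  if (is.null(genomes)) return(trait.genomes)
  missing_genomes <- setdiff(genomes, trait.genomes)
  if (length(missing_genomes) > 0) {
    warning("Genomes without this trait attribute: ",
            paste(missing_genomes, collapse = ", "))
  }
  intersect(genomes, trait.genomes)
}

# Hypothetical helper: long-format expression table for the selected genomes,
# using the first gene per genome and KO, as the function above does.
expression_long <- function(RNAseq.data, annotations, genomes) {
  tab <- RNAseq.data$table
  samples <- RNAseq.data$features$sample.columns
  rows <- list()
  for (genome in genomes) {
    for (annotation in annotations) {
      hit <- tab[which(tab$Annotation == annotation & tab$Bin == genome),
                 samples, drop = FALSE]
      if (nrow(hit) == 0) next
      expr <- as.numeric(unlist(hit[1, ], use.names = FALSE))
      tp <- paste0("TP", seq_along(expr))
      rows[[length(rows) + 1]] <- data.frame(
        Genome = genome, KID = annotation,
        TimePoint = factor(tp, levels = tp),  # keep sampling order in the plot
        Expr = expr, stringsAsFactors = FALSE)
    }
  }
  do.call(rbind, rows)
}

# The plotting step can then stay as in the original function, e.g.
# ggplot(plot.df) + geom_line(aes(x = TimePoint, y = Expr, group = Genome, color = Genome)) +
#   facet_wrap(~KID, nrow = 1)
```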

Expanded Options Don't Exist

For these commands:

trait.attributes        <- Identify_Trait_Attributes(RNAseq.data = RNAseq.data, 
                                                     pairwise.distances = pairwise.distances,
                                                     annotation.db = expanded_annotation.db,
                                                     threads = 2)
trait.attributes.pruned <- Prune_Trait_Attributes(trait.attributes, bkgd.traits, 
                                                  RNAseq.data,
                                                  p.threshold = 0.05,
                                                  pairwise.distances = pairwise.distances,
                                                  annotation.db = expanded_annotation.db,
                                                  trait_presence_absence = trait_pa_expanded)

The variables expanded_annotation.db and trait_pa_expanded do not exist, and I can't find where they might be created. I instead do this:

trait.attributes <- Identify_Trait_Attributes(RNAseq.data = RNAseq.data,
                                              pairwise.distances = pairwise.distances,
                                              annotation.db = RNAseq.data$features$annotation.db,
                                              threads = 2)

trait.attributes.pruned <- Prune_Trait_Attributes(trait.attributes,
                                                  bkgd.traits,
                                                  RNAseq.data,
                                                  p.threshold = 0.05,
                                                  pairwise.distances = pairwise.distances,
                                                  bkgd.individual.Zscores = bkgd.individual.Zscores,
                                                  annotation.db = RNAseq.data$features$annotation.db,
                                                  trait_presence_absence = RNAseq.data$features$trait_presence_absence)

This seems to create trait.attributes and trait.attributes.pruned successfully, but it may cause problems downstream.

Plot_Trait_Attribute_Expression Gives Empty Plots

After loading the Plot_Trait_Attribute_Expression function and running, for example, Plot_Trait_Attribute_Expression(trait.attribute='M00009_756', trait.attributes$trait.attribute, RNAseq.data), the function draws a panel for each KO in the module, but the panels are empty.
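One possible cause assumes Plot_Trait_Attribute_Expression normalizes expression the same way as the modified copy posted above: log2, then subtract the minimum. The chain would be:

1. If any sample of a gene has a count of 0, log2 gives -Inf.
2. The minimum then becomes -Inf.
3. Subtracting it turns every value of that profile into NaN or Inf, leaving nothing finite to plot.

It may also be worth checking that the second argument is the pruned trait-attribute list the function expects. The pseudocount of 1 below is an assumption, not TbasCO's method.

```r
# Normalization as written in the plotting function: log2, then subtract the minimum.
normalize_original <- function(x) {
  x <- log2(x)
  x - min(x)
}

# Hypothetical alternative: add a pseudocount before log2 so zero counts stay finite.
normalize_pseudocount <- function(x, pseudocount = 1) {
  x <- log2(x + pseudocount)
  x - min(x)
}

counts <- c(0, 4, 16)
normalize_original(counts)     # NaN Inf Inf: nothing finite left to draw
normalize_pseudocount(counts)  # finite values starting at 0
```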

Filter for module completeness

If a genome only expresses a very small fraction of the full module, it may be marked as 'not expressing' that module.

e.g. from CTR:

  module_completion_table          <- Calc_module_completion(module_attribute_table,
                                                             matrix_features,
                                                             modules_to_KO_list)

  average_attribute_completeness   <- Calc_Ave_Attribute_Completion(module_attribute_table,
                                                                    module_completion_table)
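This is a minimal sketch of such a filter, under two assumptions: `expressed` is a logical genome-by-KO matrix of expressed KOs, and `modules_to_KO` is a named list of KO IDs per module. It is a hypothetical stand-in, not the CTR Calc_module_completion implementation. KOs that are absent from the matrix count as not expressed.

```r
# Hypothetical sketch (not the CTR implementation). `expressed` is a logical
# genome x KO matrix; `modules_to_KO` is a named list of KO IDs per module.
module_completion <- function(expressed, modules_to_KO) {
  cols <- lapply(modules_to_KO, function(kos) {
    kos <- unique(kos)
    found <- intersect(kos, colnames(expressed))  # KOs missing from the matrix count as 0
    rowSums(expressed[, found, drop = FALSE]) / length(kos)
  })
  do.call(cbind, cols)  # genome x module matrix of completion fractions
}

# A genome is only marked as expressing a module if enough of it is expressed.
expresses_module <- function(completion, min_fraction = 0.75) {
  completion >= min_fraction
}

expressed <- matrix(c(TRUE, TRUE, FALSE, TRUE,
                      TRUE, FALSE, FALSE, FALSE),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(c("genome1", "genome2"), c("K1", "K2", "K3", "K4")))
modules_to_KO <- list(M1 = c("K1", "K2", "K3", "K4"), M2 = c("K1", "K5"))
completion <- module_completion(expressed, modules_to_KO)
expresses_module(completion)
```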

goFish function

A function that allows a user to identify gene(s)/KOs with a particular pattern of expression/ranks.
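One way such a function could work, sketched under these assumptions:

- Expression is a gene-by-time-point matrix.
- The user supplies a query pattern with one value per time point.
- Genes are ranked by Spearman correlation with the pattern, so only the ordering of values matters.

The name go_fish, its arguments and the thresholds are hypothetical.

```r
# Hypothetical sketch of goFish: rank genes by how well their expression
# profile follows a query pattern, using Spearman (rank) correlation.
go_fish <- function(expr, pattern, top = 10, min_rho = 0.8) {
  stopifnot(ncol(expr) == length(pattern))
  rho <- apply(expr, 1, function(profile) {
    # constant profiles have no ranks to compare; cor() returns NA with a warning
    suppressWarnings(cor(profile, pattern, method = "spearman"))
  })
  rho <- rho[!is.na(rho) & rho >= min_rho]
  head(sort(rho, decreasing = TRUE), top)
}

expr <- rbind(up    = c(1, 2, 3, 4),
              down  = c(4, 3, 2, 1),
              bumpy = c(1, 3, 2, 4),
              flat  = c(2, 2, 2, 2))
go_fish(expr, pattern = 1:4, min_rho = 0.5)  # up (1.0) and bumpy (0.8)
```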

(Feature Request) command line implementation outside of R

Are there any plans to develop a command line version outside of R? I've gone through your tutorial and have pretty much all the necessary files from my github.com/jolespin/veba pipeline. If this software can take generalized input (e.g., a table of genome IDs, protein IDs, and KEGG IDs) and counts tables, I can work on a walkthrough (potentially even including it in a module) for using your tool downstream of results processed by my tool or similar tools.

A command line option would also make it easier to run at scale on distributed systems.
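The base-R function below sketches the kind of generalized input mentioned above. It turns a long table with one row per protein into a genome-by-KO presence/absence matrix. The column names genome_id and kegg_id are assumptions, and how TbasCO would consume this matrix is not shown. A command line wrapper could then be a short script run with Rscript that reads file paths from `commandArgs(trailingOnly = TRUE)`.

```r
# Hypothetical sketch: generic long annotation table (one row per protein)
# -> logical genome x KO presence/absence matrix. Column names are assumptions.
annotations_to_pa <- function(annot, genome_col = "genome_id", ko_col = "kegg_id") {
  keep <- !is.na(annot[[ko_col]]) & annot[[ko_col]] != ""
  annot <- annot[keep, , drop = FALSE]
  counts <- table(annot[[genome_col]], annot[[ko_col]])  # proteins per genome and KO
  pa <- unclass(counts) > 0
  dimnames(pa) <- unname(dimnames(pa))
  pa
}

annot <- data.frame(genome_id  = c("genome1", "genome1", "genome2", "genome2"),
                    protein_id = c("p1", "p2", "p3", "p4"),
                    kegg_id    = c("K1", "K2", "K1", NA),
                    stringsAsFactors = FALSE)
annotations_to_pa(annot)

# In a wrapper run with Rscript, file paths could come from
#   args <- commandArgs(trailingOnly = TRUE)
# and the table from read.delim(args[1], stringsAsFactors = FALSE).
```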

Empty Trait Redundancy Dataframe

After loading the function Calc_TnA_redundancy and running TnA_redundancy <- Calc_TnA_redundancy(), I get the warning:

Warning message:
In rev(as.numeric(Most_redundant_order)) : NAs introduced by coercion

And TnA_redundancy looks like this:

      all_bins            sum_traits sum_attributes
 [1,] "3300009517-bin.1"  "0"        "0"           
 [2,] "3300009517-bin.12" "0"        "0"           
 [3,] "3300009517-bin.13" "0"        "0"           
 [4,] "3300009517-bin.3"  "0"        "0"           
 [5,] "3300009517-bin.30" "0"        "0"           
 [6,] "3300009517-bin.31" "0"        "0"           
 [7,] "3300009517-bin.42" "0"        "0"           
 [8,] "3300009517-bin.47" "0"        "0"           
 [9,] "3300009517-bin.6"  "0"        "0"           
[10,] "3300009517-bin.7"  "0"        "0"           
[11,] "3300026282-bin.4"  "0"        "0"           
[12,] "3300026283-bin.28" "0"        "0"           
[13,] "3300026284-bin.9"  "0"        "0"           
[14,] "3300026288-bin.32" "0"        "0"           
[15,] "3300026288-bin.43" "0"        "0"           
[16,] "3300026289-bin.24" "0"        "0"           
[17,] "3300026299-bin.22" "0"        "0"           
[18,] "3300026302-bin.10" "0"        "0"           
[19,] "3300026302-bin.20" "0"        "0"           
[20,] "3300026302-bin.32" "0"        "0"           
[21,] "3300026302-bin.46" "0"        "0"           
[22,] "3300026302-bin.47" "0"        "0"           
[23,] "3300026302-bin.62" "0"        "0"           
[24,] "3300026303-bin.42" "0"        "0"           
[25,] "3300026303-bin.46" "0"        "0"     

This leads to an empty plot when plotting traits/attributes.
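Two details in the report may be related, though the root cause is not confirmed:

- The printed table has quotes around every value. This is what `cbind()` produces when one column is character: the whole result becomes a character matrix.
- `as.numeric()` on strings that are not numbers, such as bin names, returns NA with the warning "NAs introduced by coercion".

The sketch below uses made-up values to show both effects. It then keeps the columns typed by using a data.frame instead.

```r
bins <- c("3300009517-bin.1", "3300009517-bin.12")
sum_traits <- c(3, 5)

# cbind() with a character column yields a character matrix: numbers become strings.
m <- cbind(all_bins = bins, sum_traits = sum_traits)
is.character(m[, "sum_traits"])                 # TRUE

# as.numeric() on bin names is what triggers "NAs introduced by coercion".
suppressWarnings(as.numeric(m[, "all_bins"]))   # NA NA

# A data.frame keeps each column's type, so sums stay numeric and sortable.
tna <- data.frame(all_bins = bins, sum_traits = sum_traits, stringsAsFactors = FALSE)
tna[order(tna$sum_traits, decreasing = TRUE), ]
```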

ClusterPrune p-value

Include a line to adjust for multiple testing.

See:
p.adjust(p, method = p.adjust.methods, n = length(p))

where n would be the total number of module attributes.
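This sketch applies the adjustment to made-up p-values. `n` is set to the total number of module attributes, which must be at least `length(p)`. The issue leaves the method open, so Bonferroni and Benjamini-Hochberg ("BH") are shown as examples. Passing `method = p.adjust.methods` literally, as in the documented usage line above, selects the first entry, "holm".

```r
# Made-up p-values for the module attributes that were tested.
p <- c(0.01, 0.02, 0.03)
n_attributes <- 10  # total number of module attributes; must be >= length(p)

p.adjust(p, method = "bonferroni", n = n_attributes)  # 0.1 0.2 0.3
p.adjust(p, method = "BH", n = n_attributes)          # 0.1 0.1 0.1

# With the default method (the first entry of p.adjust.methods) this is "holm".
p.adjust(p, n = n_attributes)
```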

Filtering

The filtering module should be modular.
Possibly:

  • Give a few options
  • Make it user-adaptable
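One way to make filtering modular, sketched under assumptions: each filter is a function that takes a table of module attributes and returns the rows that pass. Users choose which filters to apply and in what order. The column names completion and p.adj and the filter constructors are hypothetical.

```r
# Hypothetical filter constructors: each returns a function that takes a table
# of module attributes and returns the rows that pass.
completion_filter <- function(min_fraction = 0.75) {
  force(min_fraction)
  function(x) x[x$completion >= min_fraction, , drop = FALSE]
}

pvalue_filter <- function(alpha = 0.05) {
  force(alpha)
  function(x) x[x$p.adj <= alpha, , drop = FALSE]
}

# Apply the user's chosen filters in order; an empty list keeps everything.
apply_filters <- function(x, filters = list()) {
  Reduce(function(acc, f) f(acc), filters, x)
}

attrs <- data.frame(module = c("M1", "M2", "M3"),
                    completion = c(0.9, 0.5, 0.8),
                    p.adj = c(0.01, 0.01, 0.2),
                    stringsAsFactors = FALSE)
apply_filters(attrs, list(completion_filter(0.75), pvalue_filter(0.05)))
```

New criteria can be added as further constructors without changing apply_filters.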
