jorisvansteenbrugge / tbasco
License: GNU Affero General Public License v3.0
Some kind of enhancement where the parameters support and confidence are set automatically would be a nice addition.
As exemplified by the pyruvate oxidation pathways, which have many alternative sub-modules that can result in the same function, care needs to be taken when determining whether a module is complete. However, there is unlikely to be a single solution, as there are four different types of modules:
http://www.kegg.jp/kegg/module.html
Currently, a module must be at least 75% 'complete' to pass a hard filter, where completeness is the percentage of all genes in the module that are present. However, not all genes in a module are necessary for its function, so this filter is too strict.
For other module types, such as structural complexes, perhaps we can keep the 75% cut-off that exists at the moment, and revisit this issue in the future.
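To make the difference concrete, here is a minimal R sketch contrasting the current gene-fraction score with a step-based score. It assumes the module definition has already been parsed into a list of steps, each a vector of alternative KOs, any one of which fulfils the step; the function names and the example module are hypothetical, and parsing the KEGG definition syntax itself is not shown.

```r
# Hypothetical helpers contrasting two ways of scoring module completeness.
# A module is a list of steps; each step is a character vector of
# alternative KOs, any one of which is enough to fulfil that step.

# Current approach: fraction of all KOs listed in the module that are present.
gene_fraction_completeness <- function(module_steps, genome_kos) {
  all_kos <- unique(unlist(module_steps))
  mean(all_kos %in% genome_kos)
}

# Alternative: fraction of steps fulfilled by at least one alternative KO.
step_completeness <- function(module_steps, genome_kos) {
  fulfilled <- vapply(module_steps,
                      function(alternatives) any(alternatives %in% genome_kos),
                      logical(1))
  mean(fulfilled)
}

# Hypothetical module with three steps; step 2 has three alternative KOs.
example_module <- list(c("K00001"),
                       c("K00002", "K00003", "K00005"),
                       c("K00004"))
genome_kos <- c("K00001", "K00002", "K00004")

gene_fraction_completeness(example_module, genome_kos)  # 3 of 5 KOs present: 0.6
step_completeness(example_module, genome_kos)           # all 3 steps fulfilled: 1
```

Under the gene-fraction score this genome fails the 75% filter, even though every step of the module has a KO present.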
Lines 256-269 are missing closing parentheses. With them added, the calls read:
random.genes.hexb <- hexbin(bkgd.individual.Zscores$zscores$`Random Genes`$PC,
                            bkgd.individual.Zscores$zscores$`Random Genes`$NRED,
                            ybnds = c(min(all_scores_y), max(all_scores_y)),
                            xbnds = c(min(all_scores_x), max(all_scores_x)))
random.annotated.genes.hexb <- hexbin(bkgd.individual.Zscores$zscores$`Random Annotated Genes`$PC,
                                      bkgd.individual.Zscores$zscores$`Random Annotated Genes`$NRED,
                                      ybnds = c(min(all_scores_y), max(all_scores_y)),
                                      xbnds = c(min(all_scores_x), max(all_scores_x)))
random.identical.annotated.genes.hexb <- hexbin(bkgd.individual.Zscores$zscores$`Genes with the same annotation`$PC,
                                                bkgd.individual.Zscores$zscores$`Genes with the same annotation`$NRED,
                                                ybnds = c(min(all_scores_y), max(all_scores_y)),
                                                xbnds = c(min(all_scores_x), max(all_scores_x)))
TbasCO installs correctly with these fixes, but with the following warnings:
Note: possible error in 'Model_Module(RNAseq.data, ': unused arguments (Yrange, bkgd.traits)
Note: possible error in 'Model_Module(RNAseq.data, ': unused arguments (Yrange, bkgd.traits)
Note: possible error in 'Model_Module(RNAseq.data, ': unused arguments (Yrange, bkgd.traits)
Note: possible error in 'Model_Module(RNAseq.data, ': unused arguments (Yrange, bkgd.traits)
Note: possible error in 'Model_Module(RNAseq.data, ': unused arguments (Yrange, bkgd.traits)
Note: possible error in 'Model_Module(RNAseq.data, ': unused arguments (Yrange, bkgd.traits)
Note: possible error in 'Model_Module(RNAseq.data, ': unused arguments (Yrange, bkgd.traits)
Note: break used in wrong context: no loop is visible
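The 'unused arguments' notes mean that Model_Module is called with Yrange and bkgd.traits, which its current definition does not declare, so those calls would fail with an error when reached. A minimal R illustration using hypothetical stand-in functions (not TbasCO's code), showing one fix, declaring the arguments; removing them from the calls would be the other:

```r
# Stand-in for a function whose signature lacks arguments used at call sites.
model_module_old <- function(RNAseq.data) {
  length(RNAseq.data)
}

# Calling it with extra named arguments fails with an "unused arguments" error.
err <- tryCatch(model_module_old(1:3, Yrange = c(0, 1), bkgd.traits = NULL),
                error = function(e) conditionMessage(e))
print(err)

# One fix: declare the arguments in the definition so the calls are valid.
model_module_new <- function(RNAseq.data, Yrange = NULL, bkgd.traits = NULL) {
  length(RNAseq.data)
}
model_module_new(1:3, Yrange = c(0, 1), bkgd.traits = NULL)
```

The 'break used in wrong context' note is likely of a related kind: it is reported when a break statement is not lexically inside a for, while, or repeat loop in the function body, for example when it sits inside a function passed to lapply().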
distance?
Modify the Plot_Trait_Attribute_Expression function so that it accepts a vector of "genomes" and only looks at those, rather than at all genomes that have the trait attribute. For example, I modified the existing function to look at just the two Accumulibacter genomes by hard-coding their names; it would be nicer to simply pass them in.
library(ggplot2)   # ggplot(), geom_line(), facet_wrap()
library(magrittr)  # %>% and %<>%

Plot_Accumulibacter_Attribute_Expression <- function(trait.attribute = "M00793_1",
                                                     trait.attributes.pruned,
                                                     RNAseq.data) {
  t.data <- trait.attributes.pruned[[trait.attribute]]
  annotations <- RNAseq.data$features$annotation.db$module.dict[[trait.attribute]]
  n.att <- length(t.data)
  n.annotations <- length(annotations)
  plots <- list()
  for (i in seq_along(t.data)) {
    plot.df <- matrix(ncol = 4, nrow = 0)
    trait.genomes <- c("3300026302-bin.3", "3300026286-bin.31") #### specific genomes instead of all
    for (genome in trait.genomes) {
      for (annotation in annotations) {
        # Expression values of this annotation (KO) in this genome
        expr_values <- RNAseq.data$table[which(RNAseq.data$table$Annotation == annotation), ]
        expr_values <- expr_values[which(expr_values$Bin == genome), RNAseq.data$features$sample.columns]
        if (nrow(expr_values) >= 2) {
          expr_values <- expr_values[1, ]  # keep only the first copy of a multi-copy KO
        } else if (nrow(expr_values) == 0) {
          print(paste(genome, "lacks", annotation))
          next
        }
        # log2-transform and shift so that the lowest value becomes 0
        if (!anyNA(expr_values)) {
          expr_values <- log2(expr_values)
          expr_values <- expr_values - min(expr_values)
        }
        # One row per time point: genome, KO, time point label, expression
        for (timepoint in seq_along(expr_values)) {
          tp_expr <- expr_values[timepoint] %>% as.numeric
          row <- c(genome, annotation, paste0("TP", timepoint), tp_expr)
          print(row)
          plot.df <- rbind(plot.df, row)
        }
      }
    }
    plot.df <- data.frame(plot.df, stringsAsFactors = FALSE)
    colnames(plot.df) <- c("Genome", "KID", "TimePoint", "Expr")
    plot.df$Expr %<>% as.numeric
    plots[[i]] <- ggplot(plot.df) +
      geom_line(aes(x = TimePoint, y = Expr, group = Genome, color = Genome)) +
      facet_wrap(~KID, nrow = 1)
  }
  gridExtra::grid.arrange(grobs = plots, nrow = 2)
}
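A minimal sketch of the requested option, assuming a hypothetical genomes argument that defaults to NULL, meaning "use every genome that has the trait attribute". Only the genome-selection step is shown; the helper name and attribute.genomes (a placeholder for however the function obtains the genomes carrying the attribute) are assumptions, not part of TbasCO.

```r
# Hypothetical helper for an optional `genomes` argument.
# genomes = NULL keeps the current behaviour (all genomes with the attribute);
# otherwise only requested genomes that actually carry the attribute are kept.
select_trait_genomes <- function(attribute.genomes, genomes = NULL) {
  if (is.null(genomes)) {
    return(attribute.genomes)
  }
  absent <- setdiff(genomes, attribute.genomes)
  if (length(absent) > 0) {
    warning("Genomes without this trait attribute are skipped: ",
            paste(absent, collapse = ", "))
  }
  intersect(genomes, attribute.genomes)
}

# The hard-coded trait.genomes line would then become something like:
# trait.genomes <- select_trait_genomes(attribute.genomes, genomes)
all.genomes <- c("3300026302-bin.3", "3300026286-bin.31", "3300009517-bin.1")
select_trait_genomes(all.genomes, genomes = c("3300026302-bin.3", "3300026286-bin.31"))
```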
For these commands:
trait.attributes <- Identify_Trait_Attributes(RNAseq.data = RNAseq.data,
                                              pairwise.distances = pairwise.distances,
                                              annotation.db = expanded_annotation.db,
                                              threads = 2)
trait.attributes.pruned <- Prune_Trait_Attributes(trait.attributes, bkgd.traits,
                                                  RNAseq.data,
                                                  p.threshold = 0.05,
                                                  pairwise.distances = pairwise.distances,
                                                  annotation.db = expanded_annotation.db,
                                                  trait_presence_absence = trait_pa_expanded)
The variables expanded_annotation.db and trait_pa_expanded do not exist, and I can't find where they are created. Instead, I run:
trait.attributes <- Identify_Trait_Attributes(RNAseq.data = RNAseq.data,
                                              pairwise.distances = pairwise.distances,
                                              annotation.db = RNAseq.data$features$annotation.db,
                                              threads = 2)
trait.attributes.pruned <- Prune_Trait_Attributes(trait.attributes,
                                                  bkgd.traits,
                                                  RNAseq.data,
                                                  p.threshold = 0.05,
                                                  pairwise.distances = pairwise.distances,
                                                  bkgd.individual.Zscores = bkgd.individual.Zscores,
                                                  annotation.db = RNAseq.data$features$annotation.db,
                                                  trait_presence_absence = RNAseq.data$features$trait_presence_absence)
This seems to create the trait attributes and their pruned versions successfully, but it may cause problems downstream.
After loading the Plot_Trait_Attribute_Expression function successfully and running, for example, Plot_Trait_Attribute_Expression(trait.attribute='M00009_756', trait.attributes$trait.attribute, RNAseq.data), it draws a plot box for each KO in the module, but the plots are empty.
If a genome only expresses a very small fraction of the full module, it may be marked as 'not expressing' that module.
For example, from CTR:
module_completion_table <- Calc_module_completion(module_attribute_table,
                                                  matrix_features,
                                                  modules_to_KO_list)
average_attribute_completeness <- Calc_Ave_Attribute_Completion(module_attribute_table,
                                                                module_completion_table)
Probably by reusing the CTR version, since this is otherwise quite a hassle.
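For reference, a hypothetical sketch of what an average-completeness step could compute, assuming a completion table with genomes as rows and modules as columns (values between 0 and 1), and trait attributes given as a named list holding each attribute's module and genomes. This is not the CTR implementation of Calc_Ave_Attribute_Completion, whose internals are not shown here; the function name, data layout, and toy values are assumptions.

```r
# Hypothetical: mean module completion across the genomes of each attribute.
# module_completion: numeric matrix, rows = genomes, columns = modules (0 to 1).
# trait_attrs: named list; each element has $module and $genomes.
average_attribute_completion <- function(trait_attrs, module_completion) {
  vapply(trait_attrs, function(att) {
    mean(module_completion[att$genomes, att$module])
  }, numeric(1))
}

# Toy completion table (filled column by column).
module_completion <- matrix(c(1.0, 0.2, 0.8,
                              0.5, 1.0, 0.0),
                            nrow = 3,
                            dimnames = list(c("bin.1", "bin.2", "bin.3"),
                                            c("M00001", "M00009")))
trait_attrs <- list(
  M00001_1 = list(module = "M00001", genomes = c("bin.1", "bin.3")),
  M00009_1 = list(module = "M00009", genomes = c("bin.1", "bin.2"))
)
average_attribute_completion(trait_attrs, module_completion)  # 0.90 and 0.75
```

Genomes whose own completion for a module falls far below such an average could then be flagged, rather than being silently marked as 'not expressing' the module.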
A function that allows a user to identify gene(s)/KOs with a particular pattern of expression/ranks.
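One possible shape for such a function, sketched under the assumption that expression is available as a numeric matrix with genes (or KOs) as rows and time points as columns. It scores each gene by Spearman (rank-based) correlation with a user-supplied pattern and keeps those above a threshold; the function name, arguments, and default threshold are hypothetical.

```r
# Hypothetical: find genes whose expression profile follows a query pattern.
# expr: numeric matrix, rows = genes/KOs, columns = time points.
# pattern: numeric vector with one value per time point.
find_pattern_genes <- function(expr, pattern, min.cor = 0.8) {
  stopifnot(ncol(expr) == length(pattern))
  # Spearman correlation compares ranks, so only the shape of the profile matters.
  scores <- apply(expr, 1, function(profile) {
    cor(profile, pattern, method = "spearman")
  })
  # Constant profiles give NA correlations; drop them along with weak matches.
  hits <- scores[!is.na(scores) & scores >= min.cor]
  sort(hits, decreasing = TRUE)
}

expr <- rbind(geneA = c(1, 2, 3, 4),
              geneB = c(4, 3, 2, 1),
              geneC = c(2, 4, 6, 9))
find_pattern_genes(expr, pattern = c(0, 1, 2, 3))
```

Here geneA and geneC both rise monotonically and match the pattern, while geneB, which falls, is excluded.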
Are there any plans to develop a command-line version outside of R? I've gone through your tutorial and have pretty much all the necessary files from my github.com/jolespin/veba pipeline. If this software can take generalized input (e.g., a table of genome IDs, protein IDs, and KEGG IDs) and counts tables, I can work on a walkthrough (potentially even include it in a module) for using your tool downstream of the results processed by my tool or similar tools.
A command line option would also make it easier to run at scale on distributed systems.
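For the command-line use case, a thin Rscript wrapper is one option. A minimal sketch, assuming hypothetical --annotations and --counts flags and a tab-separated annotation table with hypothetical genome_id, protein_id, and kegg_id columns; none of these flags or column names exist in TbasCO today, and the wrapper would still call the R functions internally.

```r
# Hypothetical argument parser for an Rscript wrapper, e.g.
#   Rscript tbasco_cli.R --annotations=genes.tsv --counts=counts.tsv
parse_cli_args <- function(args = commandArgs(trailingOnly = TRUE)) {
  kv <- regmatches(args, regexec("^--([^=]+)=(.*)$", args))
  # Keep only arguments of the form --key=value (full match, key, value).
  kv <- kv[lengths(kv) == 3]
  setNames(lapply(kv, `[`, 3), vapply(kv, `[`, character(1), 2))
}

# Read a generalized annotation table and check the expected columns exist.
read_annotation_table <- function(path) {
  tab <- read.delim(path, stringsAsFactors = FALSE)
  required <- c("genome_id", "protein_id", "kegg_id")
  absent <- setdiff(required, colnames(tab))
  if (length(absent) > 0) {
    stop("Annotation table is missing columns: ", paste(absent, collapse = ", "))
  }
  tab
}
```

Keeping the parsing separate from the analysis means the same R functions could be driven from a script on a cluster node or from an interactive session.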
After loading the function Calc_TnA_redundancy and running TnA_redundancy <- Calc_TnA_redundancy(), I get the warning:
Warning message:
In rev(as.numeric(Most_redundant_order)) : NAs introduced by coercion
And TnA_redundancy looks like this:
      all_bins            sum_traits sum_attributes
 [1,] "3300009517-bin.1"  "0"        "0"
 [2,] "3300009517-bin.12" "0"        "0"
 [3,] "3300009517-bin.13" "0"        "0"
 [4,] "3300009517-bin.3"  "0"        "0"
 [5,] "3300009517-bin.30" "0"        "0"
 [6,] "3300009517-bin.31" "0"        "0"
 [7,] "3300009517-bin.42" "0"        "0"
 [8,] "3300009517-bin.47" "0"        "0"
 [9,] "3300009517-bin.6"  "0"        "0"
[10,] "3300009517-bin.7"  "0"        "0"
[11,] "3300026282-bin.4"  "0"        "0"
[12,] "3300026283-bin.28" "0"        "0"
[13,] "3300026284-bin.9"  "0"        "0"
[14,] "3300026288-bin.32" "0"        "0"
[15,] "3300026288-bin.43" "0"        "0"
[16,] "3300026289-bin.24" "0"        "0"
[17,] "3300026299-bin.22" "0"        "0"
[18,] "3300026302-bin.10" "0"        "0"
[19,] "3300026302-bin.20" "0"        "0"
[20,] "3300026302-bin.32" "0"        "0"
[21,] "3300026302-bin.46" "0"        "0"
[22,] "3300026302-bin.47" "0"        "0"
[23,] "3300026302-bin.62" "0"        "0"
[24,] "3300026303-bin.42" "0"        "0"
[25,] "3300026303-bin.46" "0"        "0"
This leads to an empty plot when plotting traits/attributes.
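The quoted "0" values show that TnA_redundancy is a character matrix, which is what cbind() produces when one of its inputs (presumably the bin names) is character. If Most_redundant_order likewise holds strings that are not numbers, as.numeric() turns them into NA with exactly this warning. This is inferred from the printed output rather than from the function's code. The sketch below uses made-up counts to illustrate the coercion and a data.frame alternative that keeps the counts numeric:

```r
# Made-up example values, for illustration only.
all_bins <- c("3300009517-bin.1", "3300009517-bin.12")
sum_traits <- c(3, 5)
sum_attributes <- c(4, 9)

# cbind() of a character vector with numeric vectors gives a character matrix.
as_matrix <- cbind(all_bins, sum_traits, sum_attributes)
print(typeof(as_matrix))  # "character"

# Coercing non-numeric strings, such as bin names, produces NA (and a warning).
suppressWarnings(as.numeric(all_bins))  # NA NA

# A data.frame keeps each column's type, so ordering by the counts works.
redundancy <- data.frame(all_bins, sum_traits, sum_attributes,
                         stringsAsFactors = FALSE)
redundancy[order(redundancy$sum_attributes, decreasing = TRUE), ]
```

This does not explain why every sum is 0, which may point to a separate problem with the inputs to Calc_TnA_redundancy.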
Include a line to adjust for multiple testing, see:
p.adjust(p, method = p.adjust.methods, n = length(p))
where n would be the total number of module attributes.
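For example, using Benjamini-Hochberg correction from base R's stats package (the choice of method is an assumption; method = "bonferroni" would be the more conservative alternative). Note that method = p.adjust.methods, as written in the call above, resolves to the first entry, "holm". The p-values and the total of 10 module attributes below are hypothetical:

```r
# Hypothetical p-values for the module attributes that were tested.
p <- c(0.01, 0.04, 0.03)

# Benjamini-Hochberg over the p-values at hand.
p.bh <- p.adjust(p, method = "BH")
print(p.bh)  # 0.03 0.04 0.04

# If only some p-values are available, n can be set to the total number of
# module attributes (p.adjust requires n >= length(p)); assume 10 here.
p.bh.total <- p.adjust(p, method = "BH", n = 10)

# Keep attributes that remain significant after correction.
significant <- p.bh <= 0.05
```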
Should the Z-scores be based on random genes with the same annotation, or on any random genes?
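The question only changes which background sample a score is standardized against. A minimal sketch with a hypothetical helper and toy numbers, where the background could be either the 'Random Genes' or the 'Genes with the same annotation' scores:

```r
# Z-score of an observed score relative to a background distribution.
z_score <- function(observed, background) {
  (observed - mean(background)) / sd(background)
}

# Toy backgrounds: random gene pairs vs. pairs sharing an annotation.
bkgd.random <- c(0.1, -0.2, 0.0, 0.3, -0.1)
bkgd.same.annotation <- c(0.4, 0.6, 0.5, 0.7, 0.3)

observed <- 0.8
z_score(observed, bkgd.random)           # roughly 4.1
z_score(observed, bkgd.same.annotation)  # roughly 1.9
```

The same observation looks far more extreme against random genes, so the choice of background directly shifts which trait attributes pass a Z-score cut-off.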
If we can incorporate a list of single-copy genes, we could use them to filter out incomplete genomes upstream.
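A hypothetical sketch of such an upstream filter, assuming a list of single-copy marker genes and a per-genome set of detected genes; the marker names, genome names, and the 0.9 threshold are placeholders, not a recommended marker set.

```r
# Hypothetical completeness estimate: fraction of single-copy markers detected.
marker_completeness <- function(genome.genes, markers) {
  vapply(genome.genes, function(genes) mean(markers %in% genes), numeric(1))
}

markers <- c("marker1", "marker2", "marker3", "marker4")
genome.genes <- list(
  bin.1 = c("marker1", "marker2", "marker3", "marker4", "other"),
  bin.2 = c("marker1", "other")
)

completeness <- marker_completeness(genome.genes, markers)
keep <- names(completeness)[completeness >= 0.9]
keep  # "bin.1"
```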
In the vignette, this line errors:
sbs.trait.attributes <- Traitattributes_To_Sbsmatrix(trait.attributes.pruned, RNAseq.data$features$bins)
Error in trait[[i]] : subscript out of bounds
Let the user decide whether to use ranks or correlation, and let them set a margin.
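A sketch of how the choice could be exposed, assuming a hypothetical method argument that switches between 1 minus the Pearson correlation and a rank-based Euclidean distance, plus a margin within which two profiles count as similar. The rank distance is scaled by its maximum (a fully reversed ranking) purely for illustration; it is not claimed to match the NRED score used elsewhere in TbasCO.

```r
# Hypothetical distance between two expression profiles.
profile_distance <- function(x, y, method = c("correlation", "rank")) {
  method <- match.arg(method)
  if (method == "correlation") {
    # 0 for perfectly correlated, 2 for perfectly anti-correlated profiles.
    return(1 - cor(x, y, method = "pearson"))
  }
  n <- length(x)
  # Euclidean distance between rank vectors, scaled to [0, 1] by the distance
  # between a ranking and its exact reverse, sqrt(n * (n^2 - 1) / 3).
  sqrt(sum((rank(x) - rank(y))^2)) / sqrt(n * (n^2 - 1) / 3)
}

# Two profiles count as similar when their distance is within the margin.
is_similar <- function(x, y, method = c("correlation", "rank"), margin = 0.2) {
  profile_distance(x, y, method) <= margin
}

x <- c(1, 2, 3, 4)
y <- c(2, 4, 6, 8)
is_similar(x, y, method = "correlation")  # TRUE
is_similar(x, rev(y), method = "rank")    # FALSE
```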
The filtering module should be modular.
Possibly: