Git Product home page Git Product logo

cola's Introduction

Anurag's github stats

Activities on my repos (R packages, 2013-04-18 ~ 2022-12-10): (code for generating this plot: https://jokergoo.github.io/spiralize_vignettes/examples.html#github-commits)

code
library(spiralize)
library(grid)

repos = c("GlobalOptions", "GetoptLong", "circlize", "bsub", "pkgndep", "ComplexHeatmap", "EnrichedHeatmap", 
    "HilbertCurve", "gtrellis", "cola", "simplifyEnrichment", "InteractiveComplexHeatmap", "spiralize", "rGREAT", "simona")

df_all = data.frame(commits = numeric(0), date = character(0), repo = character(0))
for(r in repos) {
    # go to each repo folder
    setwd(paste0("~/project/development/", r))
    df = read.table(pipe("git log --date=short --pretty=format:%ad | sort | uniq -c"))
    colnames(df) = c("commits", "date")
    df$repo = r

    df_all = rbind(df_all, df)
}

df_all$date = as.Date(df_all$date)

start = min(df_all$date)
end = max(df_all$date)

d = start + seq(1, end - start + 1) - 1
n = numeric(length(d))
nl = lapply(repos, function(x) numeric(length(d)))
names(nl) = repos

for(i in seq_len(nrow(df_all))) {
    ind = as.double(difftime(df_all[i, "date"], start), "days") + 1
    n[ind] = n[ind] + df_all[i, "commits"]

    nl[[ df_all[i, "repo"] ]][ind] = nl[[ df_all[i, "repo"] ]][ind] + df_all[i, "commits"]
}

calc_pt_size = function(x) {
    pt_size = x
    pt_size[pt_size > 20] = 20
    pt_size[pt_size < 2 & pt_size > 0] = 2
    pt_size
}
xlim = range(d)

pl = list()
pl[[1]] = grid.grabExpr({
    spiral_initialize_by_time(xlim, verbose = FALSE, normalize_year = TRUE)
    spiral_track()
    spiral_points(d, 0.5, pch = 16, size = unit(calc_pt_size(n), "pt"))
    grid.text("All packages", x = 0, y = 1, just = c("left", "top"), gp = gpar(fontsize = 14))

    for(t in c("2013-01-01", "2014-01-01", "2015-01-01", "2016-01-01", "2017-01-01",
               "2018-01-01", "2019-01-01", "2020-01-01", "2021-01-01", "2022-01-01", "2023-01-01")) {
        spiral_text(t, 0.5, gsub("-\\d+-\\d+$", "", as.character(t)), gp = gpar(fontsize = 8), facing = "inside")
    }
})

for(i in order(sapply(nl, sum), decreasing = TRUE)) {
    pl[[ names(nl)[i] ]] = grid.grabExpr({
        spiral_initialize_by_time(xlim, verbose = FALSE, normalize_year = TRUE)
        spiral_track()
        spiral_points(d, 0.5, pch = 16, size = unit(calc_pt_size(nl[[i]]), "pt"))
        grid.text(names(nl)[i], x = 0, y = 1, just = c("left", "top"), gp = gpar(fontsize = 14))

        for(t in c("2013-01-01", "2014-01-01", "2015-01-01", "2016-01-01", "2017-01-01",
                   "2018-01-01", "2019-01-01", "2020-01-01", "2021-01-01", "2022-01-01", "2023-01-01")) {
            spiral_text(t, 0.5, gsub("-\\d+-\\d+$", "", as.character(t)), gp = gpar(fontsize = 8), facing = "inside")
        }
    })
}

library(cowplot)
png("~/test.png", 300*4*1.5, 300*4*1.5, res = 72*1.5)
plot_grid(plotlist = pl, ncol = 4)
dev.off()

test

cola's People

Contributors

jokergoo avatar jwokaty avatar lshep avatar nturaga avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar

cola's Issues

reduce the duplicated generation of plots

For each plot, e.g. consensus heatmap or signature heatmap, each was generated for three times. Especially for signature heatmaps, this will increase the running time three times. Think about a ways to simplify this.

samples with small silhoutte scores

In hierarchical partitioning, in each iteration, should all the samples be used or the samples with silhouette scores larger than the cutoff?

random sample significant rows

When there are more than 2000 signature rows, should we select the top 2k rows or random sample 2k rows from all significant rows?

check PAC and concordance

            best_k   cophcor        PAC mean_silhouette concordance
ATC:NMF          2 0.9886968 0.06366852       0.9611175   0.9841618 **
MAD:kmeans       4 0.9959875 0.02129834       0.9598491   0.9797399 **
sd:NMF           4 0.9934111 0.03979490       0.9503534   0.9779191 **
cv:skmeans       2 0.9809859 0.13412099       0.9331213   0.9726301 **  <--
MAD:NMF          4 0.9903000 0.05275786       0.9348654   0.9721387 **
sd:kmeans        4 0.9919949 0.04824628       0.9337733   0.9665029 **
ATC:skmeans      4 0.9866620 0.07423391       0.9187754   0.9612428 **
MAD:pam          3 0.9842428 0.11316332       0.9065556   0.9576879 **
cv:NMF           2 0.9689004 0.19324018       0.8955448   0.9570809 ** <--
ATC:pam          2 0.9692288 0.19000105       0.8766683   0.9548844 ** <--
sd:pam           3 0.9788987 0.15456213       0.8897471   0.9515607 **

do not group signature genes

because for some cases, it is difficult to say it is a subgroup1 signatures, so we just make heatmap
for all differential genes.

get_signatures: split rows

Since rows show difference between subgroups, it should be easy to find a optimized k if do k-means on rows.

Assign TEMPLATE_DIR on load rather than at installation

This

TEMPLATE_DIR = system.file("extdata", package = "cola")

assigns TEMPLATE_DIR to the path when the package is being installed, rather than when it is loaded or attached. This will cause problems in a future R where 'staged install' is implemented. A better practice is along the lines of

TEMPLATE_DIR <- NULL

.onLoad <- function(...) {
    TEMPLATE_DIR <- system.file(package="cola", ...
}

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.