hugheylab / simphony
R package for simulating rhythmic data
Home Page: https://simphony.hugheylab.org
For consistency and readability, you could probably create new variables timePoints1 and timePoints2, and use those to calculate timeCourse1 and timeCourse2.
I think there are a couple of problems. It should be (double-check against the photo you took of my whiteboard):
stats::rnbinom(length(mu), mu = 2^mu, size = 1/sampleDispersion(mu))
You can verify that this gives the expected variance (mu + disp * mu^2, where mu here is the mean on the count scale). To be consistent with Polyester, the sampleDispersion function should change accordingly.
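Here is a quick empirical check of the variance claim, as a sketch. With `size = 1/disp`, R's negative binomial has variance `m + disp * m^2`, where `m` is the mean count (`2^mu` in the call above). The values of `m`, `disp` and the sample size are arbitrary choices for illustration.

```r
set.seed(1)
m = 10      # mean count (corresponds to 2^mu in the call above)
disp = 0.2  # dispersion; rnbinom's size parameter is 1 / disp
x = stats::rnbinom(2e5, mu = m, size = 1 / disp)

expectedVar = m + disp * m^2  # 10 + 0.2 * 100 = 30
c(empirical = var(x), expected = expectedVar)
```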
There should be a new argument to simphony called timeRange, which is a vector of two values. The first is the minimum time to be sampled, and the second is the maximum time to be sampled.
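Below is a minimal sketch of how `timeRange` might be turned into sampled time points. It assumes a separate `interval` argument that gives the spacing between samples. The helper name `makeTimePoints` is hypothetical and not part of simphony. The helper stops if the interval does not divide the range evenly, because otherwise the last time point would fall short of the maximum.

```r
# Hypothetical helper: evenly spaced time points from timeRange = c(min, max)
makeTimePoints = function(timeRange, interval) {
  stopifnot(length(timeRange) == 2, timeRange[1] <= timeRange[2], interval > 0)
  nSteps = (timeRange[2] - timeRange[1]) / interval
  if (abs(nSteps - round(nSteps)) > 1e-8) {
    stop('interval does not evenly divide timeRange.')
  }
  seq(timeRange[1], timeRange[2], by = interval)
}

makeTimePoints(c(0, 48), 4)  # 0, 4, 8, ..., 48
```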
If timepointsType is 'auto', the timeRange argument should not be used.
interval does not sum to the provided max.
Darwin: Jordan and I were thinking this would be a good way to familiarize yourself with the code.
We need it for expression values generated by both the Gaussian and negative binomial distributions.
A column for mean phase should be included in rhythmicGroups, defaulting to 0 if it is not included.
makefunc = function(x) {x; function(m) x}
To have lintr ignore any of these issues, add the appropriate lines of those shown below to lint_ignore.csv at the top-level of the repository:
filename,line_number,message,line
R/internals.R,63,Compound semicolons are discouraged. Replace them by a newline.,makefunc = function(x) {x; function(m) x}
Warning, lint_ignore.csv contains lines not found in the current code:
R/accessories.R line 58: Variable and function name style should be camelCase.
.N = time = mu = base = amp = rhyFunc = phase = period = NULL
R/accessories.R line 119: Variable and function name style should be camelCase.
abund = .N = mu = sd = dispFunc = NULL
R/internals.R line 7: Variable and function name style should be camelCase.
group = .N = phase = period = sd = NULL
R/internals.R line 132: Variable and function name style should be camelCase.
featureGroups = .N = cond = ..cond = feature = NULL
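The bare `x;` in `makefunc` forces evaluation of the lazily evaluated argument, so each closure captures the value `x` had when the closure was created. A lint-friendly equivalent, sketched below, calls `force()` on its own line. The loop is only a demonstration. Without `force()` (or the bare `x;`), all three closures would return 3, because each promise for `x` would only be evaluated after the loop had finished.

```r
makefunc = function(x) {
  force(x)  # evaluate x now, so the returned closure keeps its current value
  function(m) x
}

funcs = list()
for (i in 1:3) {
  funcs[[i]] = makefunc(i)
}
sapply(funcs, function(f) f(0))  # 1 2 3
```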
We should consider adding the following lines to the DESCRIPTION file. See the limorhyde or deltaccd repo for comparison and for the typical field order (e.g., Imports before Suggests). You can tweak the Depends line if you're still using R 3.4.something.
Depends: R (>= 3.5.0)
Roxygen: list(markdown = TRUE)
LazyData: true
URL: https://github.com/hugheylab/simphony
It's also important to put a minimum version of each imported and suggested package. I would set the minimum to whatever version you've been using to develop the package.
The trick is that a data.frame is also a list. I think we should instead check whether is.data.frame(exprGroupsList).
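The point is illustrated briefly below. The example assumes the goal is to accept either a single table or a list of tables. The helper name `asTableList` is hypothetical. Because a data.table also inherits from data.frame, the same check would cover data.table input.

```r
df = data.frame(amp = c(0, 1), frac = c(0.5, 0.5))
is.list(df)        # TRUE: a data.frame is also a list
is.data.frame(df)  # TRUE

# Hypothetical helper: wrap a single data.frame (or data.table) in a list
asTableList = function(x) {
  if (is.data.frame(x)) list(x) else x
}

length(asTableList(df))            # 1
length(asTableList(list(df, df)))  # 2
```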
Create a new function that filters out non-DR groups from a cross join of the provided dAmp, meanAmp, and dPhase vectors.
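A minimal sketch of such a function follows, using base R's `expand.grid` for the cross join. It assumes that a group is non-DR when both `dAmp` and `dPhase` are 0. The function name `getDrGroups` is hypothetical.

```r
# Hypothetical helper: all combinations of the inputs, keeping only DR groups
getDrGroups = function(dAmp, meanAmp, dPhase) {
  groups = expand.grid(dAmp = dAmp, meanAmp = meanAmp, dPhase = dPhase)
  # Assumption: a group is differentially rhythmic if amplitude or phase differs
  res = groups[groups$dAmp != 0 | groups$dPhase != 0, , drop = FALSE]
  rownames(res) = NULL
  res
}

getDrGroups(dAmp = c(0, 1), meanAmp = c(1, 2), dPhase = c(0, 3))  # 6 of 8 rows
```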
.N = time = mu = base = amp = rhyFunc = phase = period = NULL
for (i in 1:nrow(d)) {
abund = .N = mu = sd = dispFunc = NULL
for (ii in 1:length(cols)) {
group = .N = phase = period = sd = NULL
for (i in 1:nrow(featureGroups)) {
for (i in 1:nrow(featureGroups)) {
sm = foreach(cond = 1:nrow(times), .combine = rbind) %do% {
featureGroups = .N = cond = ..cond = feature = NULL
test_check("simphony")
expectedAbund = foreach(r = 1:nrow(simData$abundData), .combine = rbind) %do% {
for(timeNow in unique(simData$sampleMetadata$time)) {
for(groupNow in unique(simData$featureMetadata$group)) {
featureGroups = data.table::data.table(amp = function(t) 5 * 2^(-t/12),
base = function(t) 4 * 2^(-t/12))
featureGroups = data.table(amp = c(function(tt) 3, function(tt) 3 * 2^(-tt / 24)),
kable(simData$sampleMetadata[1:3,])
labs(x = expression('Rhythm amplitude '*(log[2]~counts)), y = 'P-value of rhythmicity')
This could be done more simply by nGroups = sapply(exprGroupsList, nrow).
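`sapply` works here. A slightly stricter variant, sketched below, uses `vapply`. It fixes the return type, and it still returns an integer vector when the list is empty, whereas `sapply` returns an empty list in that case. The example list is made up.

```r
exprGroupsList = list(data.frame(a = 1:3), data.frame(a = 1:5))
nGroups = vapply(exprGroupsList, nrow, integer(1))  # 3 5

sapply(list(), nrow)              # list()
vapply(list(), nrow, integer(1))  # integer(0)
```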
To prevent this behavior, we could make the first line:
exprGroups = data.table::copy(data.table::data.table(exprGroups))
foreach returns a list by default, so .combine = list can safely be omitted.
Simphony is a general tool for simulating rhythmic data, and simulating gene expression is only one use case. In addition, exprData does not mean "experiment" and is not related to the type of R object called an expression. We'd need to change the code and the documentation.
Suggested changes to variable names:
gene -> feature
expr -> meas (?)
exprGroups should be the main parameter determining the properties of simulated genes. Its columns should include:
- geneFrac: the fraction of all genes that fall into this group
- meanExpr: the mean expression value for genes in this group
- dExpr: the difference in expression between the two conditions for genes in this group
- meanAmp: the mean amplitude of the rhythmic component for genes in this group
- dAmp: the difference in amplitude of the rhythmic component between the two conditions for genes in this group
- meanPhase: the mean phase of the rhythmic component for genes in this group
- dPhase: the difference in phase of the rhythmic component between the two conditions for genes in this group
- rhyFunction: the function used to generate the rhythmic component of a gene
- period: the period of the rhythmic component for genes in this group
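Here is a sketch with made-up values that makes the proposed columns concrete (rhyFunction is omitted for simplicity). It assumes, and the list above does not state this, that each `d*` column is split symmetrically around its mean. Under that assumption, condition 1 gets `mean - d / 2` and condition 2 gets `mean + d / 2`. The helper `splitByCondition` is hypothetical.

```r
exprGroups = data.frame(
  geneFrac = c(0.8, 0.2),
  meanExpr = c(8, 8), dExpr = c(0, 1),
  meanAmp = c(0, 1), dAmp = c(0, 0.5),
  meanPhase = c(0, 6), dPhase = c(0, 2),
  period = c(24, 24))

# Sanity check: group fractions should cover all genes
stopifnot(isTRUE(all.equal(sum(exprGroups$geneFrac), 1)))

# Hypothetical helper: per-condition values from a mean column and a difference column
splitByCondition = function(meanVal, dVal) {
  cbind(cond1 = meanVal - dVal / 2, cond2 = meanVal + dVal / 2)
}

splitByCondition(exprGroups$meanAmp, exprGroups$dAmp)  # row 2: 0.75, 1.25
```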
To have lintr ignore any of these issues, add the appropriate lines of those shown below to lint_ignore.csv at the top-level of the repository:
filename,line_number,message,line
R/accessories.R,58,Variable and function name style should be camelCase.,.N = time = mu = base = amp = rhyFunc = phase = period = NULL
R/accessories.R,119,Variable and function name style should be camelCase.,abund = .N = mu = sd = dispFunc = NULL
R/internals.R,7,Variable and function name style should be camelCase.,group = .N = phase = period = sd = NULL
R/internals.R,132,Variable and function name style should be camelCase.,featureGroups = .N = cond = ..cond = feature = NULL
We talked about this; I'm just creating an issue so we don't forget.
I'm increasingly unconvinced that we've converted the empirically estimated amplitudes to numbers that are appropriate for negative binomial sampling. What got me thinking about it is how much smaller the prediction intervals are for negbinom than for gaussian, for a given amplitude (for typical values of base).
Since we don't even talk about empirical amplitude estimation in the paper, how do you guys feel about simplifying the example in the documentation to something like what we talked about before:
library(data.table)
library(simphony)
# Simulate data for 100 genes, half non-rhythmic and half rhythmic, with
# amplitudes for rhythmic genes sampled from a log-normal distribution.
nGenes = 100
rhyFrac = 0.5
nRhyGenes = round(rhyFrac * nGenes)
rhyAmps = exp(rnorm(nRhyGenes, mean = 0, sd = 0.25))
fracGenes = c(1 - rhyFrac, rep(rhyFrac / nRhyGenes, nRhyGenes))
exprGroups = data.table(amp = c(0, rhyAmps), fracGenes = fracGenes)
simData = simphony(exprGroups, nGenes = nGenes)
Line 91 in b69b7ae
Instead of using a fixed interval with a fixed number of replicates, the user could just specify the total number of samples, which would then be split across conditions.
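Here is a sketch of splitting a user-specified total across conditions as evenly as possible. When the total isn't divisible by the number of conditions, the first conditions each get one extra sample. The helper name `splitSamples` is hypothetical.

```r
# Hypothetical helper: number of samples per condition
splitSamples = function(nSamples, nConds) {
  nEach = rep(nSamples %/% nConds, nConds)
  extra = nSamples %% nConds
  # hand out the remainder one sample at a time to the first conditions
  nEach[seq_len(extra)] = nEach[seq_len(extra)] + 1
  nEach
}

splitSamples(26, 3)  # 9 9 8
```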
period should be added as a column to featureGroups, defaulting to the provided period argument to simphony.
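Below is a minimal sketch of filling in a missing `period` column from the function argument. It uses a plain data.frame, and the helper name is hypothetical. The same pattern would cover a mean phase column that defaults to 0.

```r
# Hypothetical helper: add the period column only if the user didn't supply one
addDefaultPeriod = function(featureGroups, period = 24) {
  if (!'period' %in% names(featureGroups)) {
    featureGroups$period = period  # recycled to every row
  }
  featureGroups
}

fg1 = addDefaultPeriod(data.frame(amp = c(1, 2)), period = 24)
fg2 = addDefaultPeriod(data.frame(amp = c(1, 2), period = c(12, 24)))
fg1$period  # 24 24
fg2$period  # 12 24 (user-supplied values are kept)
```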
Simulate differential rhythmicity in which amplitude and phase can vary as a function of a continuous condition.
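Here is a sketch of what this could look like. The hypothetical functions `ampFunc` and `phaseFunc` map a continuous condition value `x` to an amplitude and a phase, and those values then enter a sinusoid. The functional forms are arbitrary and chosen only for illustration.

```r
# Hypothetical: amplitude and phase as functions of a continuous condition x
ampFunc = function(x) 1 + 0.5 * x
phaseFunc = function(x) 2 * x  # phase shift in hours

rhythmicMean = function(time, x, base = 0, period = 24) {
  base + ampFunc(x) * sin(2 * pi * (time - phaseFunc(x)) / period)
}

rhythmicMean(time = 6, x = 0)   # 1: peak of the sine when the phase is 0
rhythmicMean(time = 10, x = 2)  # 2: amplitude 2, phase shifted by 4 h
```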
rhyIndex is not as clear as groupIndex.
For example, change it here:
https://github.com/hugheylab/SimulatedExpression/blob/f6c8ee08a5bbcbe6b8a2b1d8254a64fb61416ce9/R/simulation.R#L39-L47
Should we allow users to pass in just a single data.table object instead of a list of a single data.table?
Along with the simulated expression time course, plot the underlying sine curves for extra clarity.
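A base-graphics sketch of the idea follows. Simulated points are scattered around a known sine curve, which is then overlaid on them. The amplitude, noise level and sampling times are arbitrary.

```r
set.seed(2)
period = 24
times = seq(0, 48, by = 2)
trueMean = function(t) 5 + 2 * sin(2 * pi * t / period)  # underlying curve
obs = trueMean(times) + rnorm(length(times), sd = 0.5)   # simulated samples

plot(times, obs, xlab = 'Time (h)', ylab = 'Expression')
curve(trueMean, from = 0, to = 48, add = TRUE, col = 'blue')  # overlay sine curve
```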
To have lintr ignore any of these issues, add the appropriate lines of those shown below to lint_ignore.csv at the top-level of the repository:
filename,line_number,message,line
R/accessories.R,58,Variable and function name style should be camelCase.,.N = time = mu = base = amp = rhyFunc = phase = period = NULL
R/accessories.R,74,"Avoid 1:nrow(...) expressions, use seq_len.",for (i in 1:nrow(d)) {
R/accessories.R,119,Variable and function name style should be camelCase.,abund = .N = mu = sd = dispFunc = NULL
R/accessories.R,228,"Avoid 1:length(...) expressions, use seq_len.",for (ii in 1:length(cols)) {
R/internals.R,7,Variable and function name style should be camelCase.,group = .N = phase = period = sd = NULL
R/internals.R,18,"Avoid 1:nrow(...) expressions, use seq_len.",for (i in 1:nrow(featureGroups)) {
R/internals.R,50,"Avoid 1:nrow(...) expressions, use seq_len.",for (i in 1:nrow(featureGroups)) {
R/internals.R,103,"Avoid 1:nrow(...) expressions, use seq_len.","sm = foreach(cond = 1:nrow(times), .combine = rbind) %do% {"
R/internals.R,132,Variable and function name style should be camelCase.,featureGroups = .N = cond = ..cond = feature = NULL
tests/testthat.R,4,Only use single-quotes.,"test_check(""simphony"")"
tests/testthat/test-functional-simphony.R,15,"Avoid 1:nrow(...) expressions, use seq_len.","expectedAbund = foreach(r = 1:nrow(simData$abundData), .combine = rbind) %do% {"
tests/testthat/test-functional-simphony.R,47,"Place a space before left parenthesis, except in a function call.",for(timeNow in unique(simData$sampleMetadata$time)) {
tests/testthat/test-functional-simphony.R,50,"Place a space before left parenthesis, except in a function call.",for(groupNow in unique(simData$featureMetadata$group)) {
tests/testthat/test-functional-simphony.R,72,"Place a space before left parenthesis, except in a function call.","featureGroups = data.table::data.table(amp = function(t) 5 * 2^(-t/12),"
tests/testthat/test-functional-simphony.R,72,Put spaces around all infix operators.,"featureGroups = data.table::data.table(amp = function(t) 5 * 2^(-t/12),"
tests/testthat/test-functional-simphony.R,73,"Place a space before left parenthesis, except in a function call.",base = function(t) 4 * 2^(-t/12))
tests/testthat/test-functional-simphony.R,73,Put spaces around all infix operators.,base = function(t) 4 * 2^(-t/12))
vignettes/examples.Rmd,99,"Place a space before left parenthesis, except in a function call.","featureGroups = data.table(amp = c(function(tt) 3, function(tt) 3 * 2^(-tt / 24)),"
vignettes/introduction.Rmd,48,Commas should always have a space after.,"kable(simData$sampleMetadata[1:3,])"
vignettes/introduction.Rmd,112,Put spaces around all infix operators.,"labs(x = expression('Rhythm amplitude '*(log[2]~counts)), y = 'P-value of rhythmicity')"
vignettes/introduction.Rmd,112,"Place a space before left parenthesis, except in a function call.","labs(x = expression('Rhythm amplitude '*(log[2]~counts)), y = 'P-value of rhythmicity')"
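This is why lintr prefers `seq_len()`. When a table has zero rows, `1:nrow(d)` evaluates to `c(1, 0)`, so the loop body runs twice instead of not at all. A small demonstration:

```r
d = data.frame(x = numeric(0))  # a table with zero rows

1:nrow(d)         # 1 0  (a loop over this would run twice)
seq_len(nrow(d))  # integer(0)  (the loop body never runs)

nIter = 0
for (i in seq_len(nrow(d))) nIter = nIter + 1
nIter  # 0

cols = character(0)
seq_along(cols)  # integer(0), the safe counterpart of 1:length(cols)
```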
A decision should be made about whether this option belongs in the simulation package.
Rhythmic feature groups should include a coefficient for exponential dampening, defaulting to 0.
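Below is a sketch of how a dampening coefficient could enter the rhythmic component. It assumes the amplitude decays as `amp * exp(-dampCoef * t)`, so the default of 0 leaves the rhythm undamped. `dampCoef` is a hypothetical name.

```r
# Hypothetical: rhythmic component with exponential dampening of the amplitude
rhythm = function(t, amp = 1, phase = 0, period = 24, dampCoef = 0) {
  amp * exp(-dampCoef * t) * sin(2 * pi * (t - phase) / period)
}

t = c(6, 30)  # two consecutive peaks of a 24 h rhythm
rhythm(t)                   # 1 1: no dampening, same peak each cycle
rhythm(t, dampCoef = 0.05)  # second peak is exp(-0.05 * 24) times the first
```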
Users should be able to set the mean phase for rhythmic expression. This should be an option for both rhythmic groups and differentially rhythmic groups.
.N = time = mu = base = amp = rhyFunc = phase = period = NULL
d = data.table(featureMetadata)[rep(1:.N, each = length(times))]
abund = .N = mu = sd = dispFunc = NULL
group = .N = phase = period = sd = NULL
featureGroups[, group := 1:.N]
featureGroups = .N = cond = ..cond = feature = NULL
fmNow = featureGroups[rep(1:.N, times = nFeaturesPerGroup)]
To have lintr ignore any of these issues, add the appropriate lines of those shown below to lint_ignore.csv at the top-level of the repository:
filename,line_number,message,line
R/accessories.R,58,Variable and function name style should be camelCase.,.N = time = mu = base = amp = rhyFunc = phase = period = NULL
R/accessories.R,61,1:.N is likely to be wrong in the empty edge case. Use seq_len(.N) instead.,"d = data.table(featureMetadata)[rep(1:.N, each = length(times))]"
R/accessories.R,119,Variable and function name style should be camelCase.,abund = .N = mu = sd = dispFunc = NULL
R/internals.R,7,Variable and function name style should be camelCase.,group = .N = phase = period = sd = NULL
R/internals.R,15,1:.N is likely to be wrong in the empty edge case. Use seq_len(.N) instead.,"featureGroups[, group := 1:.N]"
R/internals.R,134,Variable and function name style should be camelCase.,featureGroups = .N = cond = ..cond = feature = NULL
R/internals.R,142,1:.N is likely to be wrong in the empty edge case. Use seq_len(.N) instead.,"fmNow = featureGroups[rep(1:.N, times = nFeaturesPerGroup)]"
For consistency and generality, maybe we should call it getSimulatedCounts. Thoughts?
Since we are adding expression level as a feature of each group, rhythmicGroups no longer makes sense as the name of the data.table object.
The Introduction vignette should be expanded to explain how the parameters to simphony control the simulation output.
Do we need another vignette to show simulations of asymmetric oscillation?
Currently, the number of DR genes must be divisible by the number of rows in rhythmicGroups. This should change so that any number of DR genes can be created, regardless of the number of groups.
It has been decided that the consequences of doing this (having a non-uniform number of genes per group) are acceptable.
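Here is a sketch of one way to allow any number of genes. It allocates `nGenes` across groups in proportion to a fraction per group using the largest-remainder method. When the division isn't exact, this gives non-uniform counts, and with equal fractions the counts differ by at most one. The helper name `allocateGenes` is hypothetical.

```r
# Hypothetical helper: integer gene counts per group that sum exactly to nGenes
allocateGenes = function(nGenes, geneFrac) {
  exact = nGenes * geneFrac / sum(geneFrac)
  counts = floor(exact)
  leftover = nGenes - sum(counts)
  # give the remaining genes to the groups with the largest fractional parts
  idx = order(exact - counts, decreasing = TRUE)[seq_len(leftover)]
  counts[idx] = counts[idx] + 1
  counts
}

allocateGenes(7, c(0.5, 0.3, 0.2))  # 4 2 1
```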
Add a first-class function argument to getRhythmicExpr that defaults to sin. It should be used to generate the rhythmic component of the gene expression.
@jakejh, do we want to let different rows in rhythmicGroups have their own rhythmic function? We could have it default to sin if a function is not supplied for a row.
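The sketch below covers both suggestions. The hypothetical `getRhythmicValue` takes the rhythmic function as an argument that defaults to `sin`. A per-row list of functions uses `NULL` to mean "fall back to `sin`". The names and the phase/period parametrization are assumptions for illustration.

```r
# Hypothetical: rhythmic component with a pluggable function that defaults to sin
getRhythmicValue = function(time, amp, phase = 0, period = 24, rhyFunc = sin) {
  if (is.null(rhyFunc)) rhyFunc = sin  # rows without a function fall back to sin
  amp * rhyFunc(2 * pi * (time - phase) / period)
}

# One function per row; NULL means "use the default"
rhyFuncs = list(NULL, cos, function(x) sign(sin(x)))  # sine, cosine, square wave
amps = c(1, 2, 3)

vals = vapply(seq_along(amps), function(i) {
  getRhythmicValue(time = 6, amp = amps[i], rhyFunc = rhyFuncs[[i]])
}, numeric(1))
vals
```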
getRhythmicExpr
getSimulatedExpr
I would suggest adding the following lines to simphony.R.
#' @importFrom data.table data.table
#' @importFrom foreach foreach
This would let us simplify data.table::data.table to data.table and foreach::foreach to foreach.
Shouldn't we allow a standard deviation of 0, to enable Gaussian simulations without noise?
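For what it's worth, R's `rnorm()` already accepts `sd = 0` and then returns the mean exactly. A validation check could therefore allow zero and reject only negative values. The check below is a hypothetical sketch, not simphony's current code.

```r
rnorm(3, mean = 5, sd = 0)  # 5 5 5: no noise

# Hypothetical validation: allow sd == 0, reject negative values
checkSd = function(sd) {
  if (any(sd < 0)) stop('sd must be non-negative.')
  invisible(TRUE)
}
checkSd(0)
```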
Line 29 in f28075b
To keep the naming convention consistent, rename the geneId columns to gene.
For example:
https://github.com/hugheylab/SimulatedExpression/blob/f6c8ee08a5bbcbe6b8a2b1d8254a64fb61416ce9/R/simulation.R#L39-L47