rapporter / rapport
rapport is an R package that facilitates the creation of reproducible statistical report templates
Home Page: https://rapporter.github.io/rapport
Users might create some custom templates not incorporated in rapport. The usage of those templates is quite thorny ATM: users have to specify the full path of the template at each run.
This issue (well: @aL3xa) suggests creating some functions to easily add custom directories holding templates to the tpl.find path.
Keywords for the process: tpl.example, assignInNamespace, methods for add/remove/reset
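A minimal sketch of what such add/remove/reset helpers could look like. All names here (`tpl_paths_add`, `tpl_paths_remove`, `tpl_paths_reset`, `tpl_find_sketch`) are hypothetical and not part of rapport; the registry is kept in a private environment rather than patched in with assignInNamespace, which is just one possible design.

```r
# Hypothetical registry of extra template directories (not rapport's API).
.tpl_paths <- new.env()
assign("dirs", character(0), envir = .tpl_paths)

tpl_paths_add <- function(dir) {
  dirs <- unique(c(get("dirs", envir = .tpl_paths), normalizePath(dir, mustWork = FALSE)))
  assign("dirs", dirs, envir = .tpl_paths)
  invisible(dirs)
}

tpl_paths_remove <- function(dir) {
  dirs <- setdiff(get("dirs", envir = .tpl_paths), normalizePath(dir, mustWork = FALSE))
  assign("dirs", dirs, envir = .tpl_paths)
  invisible(dirs)
}

tpl_paths_reset <- function() {
  assign("dirs", character(0), envir = .tpl_paths)
  invisible(character(0))
}

# Return the first "<name>.tpl" found in the registered directories.
tpl_find_sketch <- function(name) {
  candidates <- file.path(get("dirs", envir = .tpl_paths), paste0(name, ".tpl"))
  hit <- candidates[file.exists(candidates)]
  if (length(hit) == 0) stop("template not found: ", name)
  hit[1]
}
```

With something like this, a user would register a directory once per session and then refer to custom templates by name only.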
This one would require some CSS/JS tweaking, like adding skeleton and other JS plugins. It's here just for my reference.
As we have the following in custom.css, each paragraph with an inline LaTeX formula gets centered:
.math-container {
text-align: center;
}
I might be able to fix this, but you would not love that :)
Could you please tweak it a bit?
Link to google groups on http://rapport-package.info/#discuss is dead.
The rapport class has a lot of useful information in its structure, like metadata and inputs, besides the most important part, report.
Unfortunately, metadata and inputs are based only on the template and do not record the command that returned the rapport class, which would be really handy too.
The structure currently looks like this:
res <- list(
metadata = meta,
inputs = inputs,
report = report
)
I would suggest adding something like src or call to that list, holding the raw command, e.g.:
res <- list(
metadata = meta,
inputs = inputs,
call = match.call(),
report = report
)
I just did not want to modify your function, but if you do not have time and are not currently working on template.R, just reply here and I will take care of that modification. That would be important for exporting multiple examples to GH pages ATM, but I can easily imagine other situations where it would be handy too (e.g. saving a rapport class to disk and later loading it again to find out where the data was sourced from).
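A minimal sketch of the idea, assuming a toy constructor (`make_report_sketch` is hypothetical and only stands in for the real function in template.R). It shows that `match.call()` captures the command as a call object that can be stored, printed and re-evaluated later.

```r
# Hypothetical stand-in for the real constructor; only the `call` element matters here.
make_report_sketch <- function(template, data = NULL, ...) {
  res <- list(
    metadata = list(title = template),
    inputs   = list(...),
    call     = match.call(),  # the command as the user issued it
    report   = list()
  )
  class(res) <- "rapport_sketch"
  res
}

r <- make_report_sketch("correlations", vars = c("age", "it.edu"))
print(r$call)       # shows the stored command
r2 <- eval(r$call)  # re-runs the stored command
```

Storing the call rather than a deparsed string keeps it directly re-runnable, as long as the referenced objects still exist when it is evaluated.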
Just to have something to play with in the morning: Description shows NULL in HTML export.
IMO, the name tpl.header is too uptight - it should be renamed to tpl.info, 'cause that's exactly what you get by calling it. Likewise, tpl.header should only extract the header part and that's it...
It would be awesome if package versions could also be specified in the packages section of templates. With the help of https://github.com/christophergandrud/repmis multiple packages can be installed in the same R environment.
Maybe add this feature as optional, with repmis only in Suggests?
This one came to my mind while developing the t-test template. It would be nice to provide more control over input definition. For example, if I'm about to specify the y variable for the t.test function, I'd like it to be either factor or numeric. Right now, it's not possible to add multiple rules in the input definition field, but you can do that via the variable convention, which will match "any" variable type (not to be confused with the Any type).
So my idea was to provide multiple types in the input definition, like factor,numeric or smth. We should probably use ; or any other non-comma-or-pipe separator, as they have special meaning.
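A rough sketch of how a multi-type rule could be checked, assuming the `;` separator floated above. The function name `matches_types` and the set of recognised type names are hypothetical, not rapport's actual input parser.

```r
# Check a value against a ";"-separated list of allowed types (illustrative only).
matches_types <- function(x, types) {
  allowed <- trimws(strsplit(types, ";", fixed = TRUE)[[1]])
  checks <- list(
    factor    = is.factor,
    numeric   = is.numeric,
    character = is.character,
    logical   = is.logical
  )
  unknown <- setdiff(allowed, names(checks))
  if (length(unknown) > 0) stop("unknown type(s): ", paste(unknown, collapse = ", "))
  any(vapply(checks[allowed], function(f) f(x), logical(1)))
}

matches_types(factor(c("a", "b")), "factor;numeric")
matches_types(c(1.5, 2), "factor; numeric")
```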
Letting template writers specify multiple examples in a template (e.g. with different input settings for the same dataset) would be a cool feature.
It would allow users to check different settings of a template with ease.
Some parts of the code to be changed:
tpl.example (I would do that one)
Suggestion for format: comma separated list of rapport commands (in one line). You could easily write a regexp for that :)
Example: rapport('correlations', data=ius2009, vars=c('age', 'it.edu', 'it.leisure')), rapport('correlations', data=ius2009, vars=c('age', 'it.edu', 'it.leisure'), cor.plot=FALSE)
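As an alternative to a regexp (commas inside `c(...)` make naive splitting fragile), the line could be handed to R's own parser. This is only a sketch under that assumption; the shortened example string below is made up for illustration and nothing is evaluated.

```r
examples <- paste0(
  "rapport('correlations', data=ius2009, vars=c('age', 'it.edu')), ",
  "rapport('correlations', data=ius2009, vars=c('age', 'it.edu'), cor.plot=FALSE)"
)

# Wrap the line in list(...) so the parser splits the top-level commas for us.
parsed <- parse(text = paste0("list(", examples, ")"))[[1]]
calls  <- as.list(parsed)[-1]  # drop the leading `list` symbol

# Each element is an unevaluated call; deparse to get the commands back as text.
commands <- vapply(calls, function(cl) paste(deparse(cl), collapse = " "), character(1))
```

Each element of `calls` could later be passed to `eval()` once the dataset it refers to is available.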
Okay, the idea is to add an R file (as @daroczig suggested), in which various rapport settings will take place. The file should probably be named .rapport.R or something similar. This issue should summarise all user-customisable settings. Feel free to add/request new ones. At first, we'll support all settings available in the init.R file.
Our rapport docs page says that the limit of "number" etc. input fields deals with the number of passed elements. This is strange, and I must have forgotten something really obvious, so please remind me why we needed this kind of limit.
But even if this is OK, the result seems to be strange:
> tpl <- strsplit('<!--head
Title: test
Description: test
Author: test
x | number[1] | x | x
head-->
<%=x%>', '\\n')[[1]]
rapport(tpl,x=3)
+ + + + + + > Error in stopf("input \"%s\" has length of %d, and should be %s", name, :
input "x" has length of 3, and should be 1
So it seems that we actually check whether the single passed number falls inside the limit interval, which makes more sense anyway. So e.g. I could create a template which takes one number between e.g. 1 and 10 as a parameter, which works now (although the docs should be updated):
> tpl <- strsplit('<!--head
Title: test
Description: test
Author: test
x | number[1,10] | x | x
head-->
<%=x%>', '\\n')[[1]]
rapport(tpl,x=3)
+ + + + + + > _3_
But the current implementation does not allow users to specify limits below 1. E.g. I would want to ask users to pass a number between 0 and 100:
> tpl <- strsplit('<!--head
Title: test
Description: test
Author: test
x | number[0,100] | x | x
head-->
<%=x%>', '\\n')[[1]]
rapport(tpl,x=3)
+ + + + + + > Error in check.limit(gsub(re5, "\\3", x), "number") :
only positive integers should be provided as a limit
But I might even want to specify the limits as negative numbers.
Anyway, a quick fix would be really welcome. @aL3xa: pls think about the limit specification and pls also fix the "number" input type to let users pass any number there (e.g. 0.231234); the latter is a really high priority for rapporter.net.
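A sketch of a more permissive limit check, under the interpretation discussed above that `number[a,b]` bounds the value rather than the length. `parse_limit` and `value_in_limit` are hypothetical helpers, not rapport's check.limit; they accept zero, negative and fractional bounds.

```r
# Extract numeric bounds from a spec such as "number[0,100]" or "number[1]".
parse_limit <- function(spec) {
  inside <- sub("^[^[]*\\[(.*)\\]\\s*$", "\\1", spec)
  bounds <- suppressWarnings(as.numeric(strsplit(inside, ",", fixed = TRUE)[[1]]))
  if (anyNA(bounds) || !(length(bounds) %in% 1:2)) stop("malformed limit: ", spec)
  # A single number is treated as both lower and upper bound.
  c(min = bounds[1], max = bounds[length(bounds)])
}

# TRUE when every supplied number lies inside the closed interval.
value_in_limit <- function(x, spec) {
  lim <- parse_limit(spec)
  is.numeric(x) && all(x >= lim[["min"]] & x <= lim[["max"]])
}

value_in_limit(0.231234, "number[0,100]")
value_in_limit(-2.5, "number[-5,5.5]")
```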
So, you used global settings for rounding from getOption("digits"), which can be a bit too verbose for "ordinary" usage. We agreed that 4 is OK. But should we add several levels of digit rounding, or let the print methods decide about that one? If you define stat helpers to, say, calculate the mean and round the result to 2 digits, that would be OK, but sometimes you may want to get 3 digits (p-values, ANOVA tables, etc.)
Should we do something like:
or should we let print methods decide? IMO, we need to think about this one in more detail, but for now, for the sake of the KISS principle, let's stick with 4 as the default value.
rp.freq should contain NA and Valid rows, so users could get more insight on missing data in their dataset.
IMO, p() should be added as a hook to inline chunks when output has length > 1.
I've just pushed a live demo of multiple levels of nested rapports in aL3xa/rapport@c675b18.
There I made up a 3-level templates-in-templates structure as follows:
It runs fine thanks to the modifications done in rapport lately, but there is something to tweak :)
Just run an example from multivar-descriptive, e.g.:
rapport.html('multivar-descriptive', data=mtcars, vars=c('hp','wt'))
Sorry for not using ius2008, but here I need more than one numeric variable.
The problem in the output is that nested rapport classes retain the header level, which is not the best solution as it screws up the structure of the document IMHO.
Feature request: add an optional parameter to rapport to be able to "lag" header levels by a given value. E.g. that way we could call rapport in templates (as a nested template) with an added parameter, like:
<!--head
....
head-->
## Title
rapport('template', inputs, header.lag=1)
If you are ready with modifying template.R, I am open to implementing this, which should be quite easy; just ping me.
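One way the "lag" could work on the rendered text of a nested report is sketched below. `lag_headers` is a hypothetical helper, and it assumes the nested output is available as a character vector of Markdown lines with `#`-style headers, which may not match how rapport stores its report elements.

```r
# Push Markdown headers `lag` levels deeper; other lines are left untouched.
lag_headers <- function(lines, lag = 1L) {
  is_header <- grepl("^#{1,6}\\s", lines)
  lines[is_header] <- paste0(strrep("#", lag), lines[is_header])
  lines
}

nested <- c("## Title", "Some text", "### Details")
lagged <- lag_headers(nested, lag = 1)
```

Note that a plain line-based approach like this would also shift R comments starting with `# ` if code chunks were included verbatim, so a real implementation would have to skip those.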
Just committed (aL3xa/rapport@dcc13b2) a new feature of tpl.export: exporting multiple rapport classes at once.
Check it out: tpl.export(tpl.example('example', 'all'))
A strange bug occurred: an extra "Description" field is added to the bottom of the report. I'll investigate it; posting here so as not to forget.
@daroczig mentioned this earlier, dunno where, but here goes... officially: datasetRequired should be set within the rapport call, so there's no need to define it explicitly in the template. Just *apply through inputs and see if there's any that's not standalone. Job done.
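The "*apply through inputs" idea could look roughly like this. The input structure (a list of lists with a `standalone` flag) and the function name `dataset_required` are assumptions made for illustration only.

```r
# Hypothetical parsed inputs; `standalone` marks inputs that do not refer to a dataset column.
inputs <- list(
  list(name = "var",   standalone = FALSE),
  list(name = "alpha", standalone = TRUE)
)

# A dataset is needed as soon as one input is not standalone.
dataset_required <- function(inputs) {
  standalone <- vapply(inputs, function(i) isTRUE(i$standalone), logical(1))
  any(!standalone)
}

dataset_required(inputs)
```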
Put URLs to posts on SO in helper documentation. The \url{} convention should be used.
Image caption is just a paragraph with caption class in a div with figure class (besides the img tag) ATM.
@aL3xa: could you please tweak the CSS/JS part to center images and boost caption somehow? :)
Note to me: fix this!!!
* looking to see if a 'data/datalist' file should be added
* re-saving .R files as .rda
Error in rp.label(ius2008$gender) <- "Gender" :
could not find function "rp.label<-"
Execution halted
Error: Command failed (1)
It seems that some commit broke rapport so that it no longer works with multiple variables :(
See e.g. anova.tpl or correlations.tpl; all result in the following error:
Error in get(as.character(FUN), mode = "function", envir = envir) :
object 'normal' of mode 'function' was not found
in the assign part somewhere in lapply(inputs, function(x) { .... }.
It seems that the newly added rapport option is causing the problem (mode='normal'), sorry for that.
Similarly:
> rapport("anova", ius2008, resp = "leisure", fac = c("gender", "partner"), mode='performance')
Error in get(as.character(FUN), mode = "function", envir = envir) :
object 'performance' of mode 'function' was not found
I am too braindead to debug this now. Templates with vectors work fine.
Please consider moving packages from Depends to Imports and import them in your NAMESPACE file.
Dependencies mess up the user's namespace, make R slower and have a risk of masking other functions. You should only use 'depends' for packages that you really want to expose to the user. If you are loading them only because they are needed by functions in your package, it is better to use import and not pollute the user's namespace.
IMO, there's too much clutter in the inst/ folder. We should have a neat structure like in CI or Rails. I vote for an assets directory with css and js stuff, and definitely a separate directory for i18n templates. Think about the html folder as includes, etc.
If a table or data frame with row names is output right after a list, it may break pandoc-exported formats.
Comment:
The cause seems to be that the first element of the table header is missing, which somehow breaks pandoc. Adding some extra space there solves the problem somehow :S
Possible workaround until it is fixed:
<%
'<!--- end of the list, ha! -->'
%>
Minimal example:
<!--head
Title: TEST
Author: My n
Description: test
head-->
- x
- y
<%
table(mtcars$am)
%>
- x
- y
<%
as.data.frame(table(mtcars$am))
%>
At this point, it's not possible to supply optional inputs other than CSV or boolean types. The template author should specify whether or not an input is required, and the header parser should also be aware of the difference between values and variables. For instance, "age" can be just a string, and it can also refer to the eponymous column name of the supplied data.frame.
My suggestions on this issue:
string for string input, number for numeric input
* or !
[ and ] should remain as wildcard for length limits
|-delimited section in header. @daroczig if you have any better idea about this one, please do tell.
So, in a nutshell, string and number inputs can be specified like this:
str | string[1] | String input | this is a string input
num | number[1] | Number input | this is a number input
We can specify a mandatory field with * and put something like = after size limits to indicate a default value, like this:
str | *string[1] = foo | label | description
Or should we move the required stuff into the name section, rather than the limit section? @daroczig, please send your suggestions!
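To make the proposed syntax concrete, here is a small sketch of how such a line might be split into its parts. `parse_input_line` is a hypothetical helper written for this proposal only; it is not the header parser shipped with rapport.

```r
# Parse "name | [*]type[limit] [= default] | label | description" (proposal above).
parse_input_line <- function(line) {
  parts <- trimws(strsplit(line, "|", fixed = TRUE)[[1]])
  spec <- parts[2]
  required <- startsWith(spec, "*")   # leading * marks a mandatory input
  spec <- sub("^\\*", "", spec)
  pieces <- trimws(strsplit(spec, "=", fixed = TRUE)[[1]])
  list(
    name        = parts[1],
    type        = sub("\\[.*$", "", pieces[1]),
    limit       = sub("^[^[]*\\[(.*)\\]$", "\\1", pieces[1]),
    required    = required,
    default     = if (length(pieces) > 1) pieces[2] else NA_character_,
    label       = parts[3],
    description = parts[4]
  )
}

inp <- parse_input_line("str | *string[1] = foo | label | description")
str(inp)
```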
With the recent YAML inputs we opened up a Pandora's box. Instead of doing a slight redesign of input specifications, a need for a rewrite of input specifications emerged. Rationale: we're doing this to make inputs feel more native to R, hence to other users, hence make rapport cooler, etc.
While I was implementing integer inputs, it came to me that there's no need for yet another input type; we can pass something like integer: TRUE to the existing number input. A plea for the option input to accept multiple values was also a game changer. Then why not make multiple available for all standalone inputs? When it comes to string inputs - why not perform more checks in there, like nchar or regexp validation, etc.? That said, let's unlock this further.
As I like renaming stuff, this is a perfect opportunity to do it (again) 👍, like changing the mandatory attribute to required, or changing default to value. Backward compatibility is a must. Reason: they seem more intuitive. mandatory was my "invention" and it sucks.
Key changes:
type attribute and/or perform checks based on native R object attributes, like class (or storage.mode, and the likes)
nchar for strings or integer for numbers
We have them for a good reason. I guess that inputs should have one additional attribute like: standalone (I don't like it, doesn't sound native enough). Currently rapport accepts only data.frame objects, but when I think about #41, it makes me wonder if we can expand the data argument a bit. What if we could pass all kinds of objects in there, and match inputs more dynamically? Imagine this: you pass a list or other recursive object, and match its named attributes with the provided inputs.
string
Dilemma: how to perform nchar checks? Should we provide only one number, or a vector to perform a vectorised nchar check? What about limits? Should we drop 'em and replace them with length? In that case, I guess that length will take over the min and max attributes, as it feels more native.
- name: s
  class: character
  label: String input
  description: Bla bla string string
  required: TRUE
  value:
    - fee
    - fi
    - foo
    - fam
  nchar: 10
  length:
    min: 1
    max: 100
  regexp: "^.+$"
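A sketch of how the checks implied by such a specification could be run. The spec is written as a plain R list mirroring the YAML above (no YAML parsing involved), `check_string_input` is a hypothetical name, and reading `nchar` as a maximum number of characters per element is an assumption, since the dilemma above leaves that open.

```r
# Spec mirroring the YAML example, written directly as an R list.
spec <- list(
  name   = "s",
  class  = "character",
  nchar  = 10,
  length = list(min = 1, max = 100),
  regexp = "^.+$"
)

check_string_input <- function(x, spec) {
  ok_class  <- inherits(x, spec$class)
  ok_length <- length(x) >= spec$length$min && length(x) <= spec$length$max
  # Assumption: `nchar` is an upper bound applied to every element.
  ok_nchar  <- is.null(spec$nchar)  || all(nchar(x) <= spec$nchar)
  ok_regexp <- is.null(spec$regexp) || all(grepl(spec$regexp, x))
  ok_class && ok_length && ok_nchar && ok_regexp
}

check_string_input(c("fee", "fi", "foo", "fam"), spec)
```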
number
Of course, length can take min and max, but I'll omit it here. Notice the change from number to numeric.
Q: should we add class: integer or provide a separate argument to bypass the precision issue (in case you want round numbers larger than .Machine$integer.max + 1)? Or check via storage.mode, or dunno what.
- name: n
  class: numeric
  label: Number input
  description: Bla bla numeric numeric
  required: FALSE
  value: 10
  length: 1
boolean
Yes, length can be different, as well as value. Should we check anything other than that? I mean... what's specific to NA and useful at the same time, so that it's worth implementing?
- name: b
  class: logical
  label: Logical input
  description: Bla bla logical logical
  required: TRUE
  value: TRUE
  length: 1
option
Okay, how should we make this native? My guess is either to provide a custom object (new class), or to provide an additional argument that will allow matching like match.arg or something else.
I'll just brainstorm in here:
NA and should we do that at all?
names attribute: this should be handy in, say, populating select boxes in an HTML form (label: value)
I have just updated the roxygen comments to use the #' convention instead of the unique ##' in aL3xa/rapport@e8cc9b7 and also did some debugging of R CMD check yesterday.
I have found and fixed some not so neat solutions, but there are some others left:
print functions should not have any parameters, it seems, as those are not @exported as normal functions but are S3 methods. I have cleaned up print.rapport to not take arguments; this should be done in other functions too. We might create a new function instead of print and use the parameters there; print should stand for default settings. I did not change anything in @aL3xa's code there as I am not sure which of the other functions use those.
elem.eval.default and elem.eval.rp.block
rp.label<- which should be (and can be) fixed
extract.meta, which is a roxygen bug (?). This should be investigated too. If not manageable, then we will have to update the 1.0 release there before sending to CRAN.
Setting this just for the sake of consistency. YAML (not R) should be used in the header to define both metadata and inputs. The implementation is located in the yaml-header branch.
Current problems:
desc and description metadata field (not a YAML bug per se, but should be handled correctly)
Ideas & suggestions:
as.yaml ain't one)
Fixes to be done:
overflow: auto) body of the table, header should hold still =)
As you can see at L654 in template.R, there are some assign statements going on. I always found it tedious to type/assign variable length. Why not do some more things, like adding a precomputed input length, e.g. <input name>.len?
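A sketch of the `<input name>.len` idea, assuming inputs are assigned into an evaluation environment one by one. The environment and the example inputs are made up for illustration; this is not the code at the referenced line of template.R.

```r
# Hypothetical evaluation environment and already-resolved inputs.
env <- new.env()
inputs <- list(vars = c("hp", "wt"), x = 3)

for (nm in names(inputs)) {
  assign(nm, inputs[[nm]], envir = env)
  # Precomputed length, available to template chunks as e.g. `vars.len`.
  assign(paste0(nm, ".len"), length(inputs[[nm]]), envir = env)
}

eval(quote(vars.len), envir = env)
```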
It would be convenient to make significant correlations/differences/whatever jump out from the rest of the table. We need a simple wrapper for that, but it may be tricky to implement it in correlation tables "out of the box". This one just came to my mind, so no milestone will be set right now.
OK, dunno if this is a global rounding feature, but it's fugly when you see counts with .00 appended. As far as I can see, it's ascii that does the damage:
> ascii(rp.freq("gender", data = ius2008))
**gender** **N** **pct** **cumul.count** **cumul.pct**
------- ------------ -------- --------- ----------------- ---------------
1 male 410.00 60.92 410.00 60.92
2 female 263.00 39.08 673.00 100.00
Total 673.00 100.00 673.00 100.00
------- ------------ -------- --------- ----------------- ---------------
Note that this happens if you explicitly set integers:
> mtcars$am <- as.integer(mtcars$am)
> str(head(mtcars[, c("cyl", "am")]))
'data.frame': 6 obs. of 2 variables:
$ cyl: num 6 6 4 6 8 6
$ am : int 1 1 1 0 0 0
> ascii(head(mtcars[, c("cyl", "am")]))
**cyl** **am**
------------------- --------- --------
Mazda RX4 6.00 1.00
Mazda RX4 Wag 6.00 1.00
Datsun 710 4.00 1.00
Hornet 4 Drive 6.00 0.00
Hornet Sportabout 8.00 0.00
Valiant 6.00 0.00
------------------- --------- --------
We already provide some helper variables in the "rapport" environment for users to build on, like input.name and e.g. input.iname, but we do not add anything about the template.
So here goes: please add author, title and description fields, maybe with a tpl or template prefix, to the envir. I just dunno how we missed this so far.
The template for normality tests should contain the following tests:
from stats:
shapiro.test
ks.test
from nortest:
ad.test
lillie.test
cvm.test
pearson.test
sf.test
with the following graphs:
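For the tests listed above, a template chunk could collect the results in one table along these lines. This is only a sketch on simulated data: the nortest functions are used only if that package happens to be installed, and the layout of the results table is an assumption.

```r
set.seed(42)
x <- rnorm(100, mean = 50, sd = 10)  # simulated data, for illustration only

tests <- list(
  shapiro.test = shapiro.test(x),
  # Parameters are estimated from the sample here, so this p-value is only indicative.
  ks.test      = ks.test(x, "pnorm", mean = mean(x), sd = sd(x))
)

# Add the nortest tests when the package is available.
if (requireNamespace("nortest", quietly = TRUE)) {
  for (fn in c("ad.test", "lillie.test", "cvm.test", "pearson.test", "sf.test")) {
    tests[[fn]] <- getExportedValue("nortest", fn)(x)
  }
}

results <- data.frame(
  test      = names(tests),
  statistic = vapply(tests, function(t) unname(t$statistic), numeric(1)),
  p.value   = vapply(tests, function(t) t$p.value, numeric(1)),
  row.names = NULL
)
results
```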
Just for future reference: instead of using tpl.meta, tpl.inputs and tpl.info, the summary method should be extended.
I realised that we're using input labels and descriptions from the header only in the tpl.inputs and tpl.info functions, in order to show user-friendly info about template inputs. Why not save them too in rapport's evaluation environment?
The rp.label method currently operates only on atomic vectors, which sucks. IMO, we should extend it so that we don't get that pesky X[[1L]] stuff as a result of sapply loops.
t-test template should contain:
inputs:
outputs:
OK, you can guess it from the title... something like rp.densityplot should be added. It can be a simple wrapper for a lattice function, I don't mind... =)
Filing this issue not to forget a sly bug: a chunk which returns an R object in an if condition is not returned by rapport. Strange :)
Example code:
<%
if (TRUE) {
'DOES NOT WORK'
}
%>
<%
if (TRUE) 'WORKS'
%>
I hope it's not connected to the invisible return and the commit that changed that; will investigate tomorrow.
Not sure if this is avoidable:
> source("http://bioconductor.org/biocLite.R")
> biocLite("GenomicRanges")
> library("GenomicRanges")
> library("rapport")
> tpl.export(tpl.example('example', 'all'))
Error in envRefSetField(x, what, refObjectClass(x), selfEnv, value) :
"time" is not a field in class “Report”
> sessionInfo()
R version 2.14.1 (2011-12-22)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] rapport_0.31 reshape_0.8.4 plyr_1.7.1
[4] RColorBrewer_1.0-5 lattice_0.20-0 evaluate_0.4.1
[7] ascii_2.1 GenomicRanges_1.6.7 IRanges_1.12.6
loaded via a namespace (and not attached):
[1] grid_2.14.1 stringr_0.6
I run W7 64bit. I have pandoc installed (and in PATH), but when I run demo(rapport, ask = FALSE), it stops at step 6 (or there about) and says I don't have pandoc installed.
tpl.export(tpl.example('correlations', 'all'))
Error: Specified backend (pandoc) is not installed! Please see details in INSTALL file or rapport homepage (http://rapport-package.info).
Currently rapport() always assumes a dataframe, similar to e.g. ggplot2. It would be nice if there were also functionality for method-based reporting, similar to R's plot() and print() methods. There are a lot of advantages to leveraging R's class/method system. It will be easier for the user, and it allows package developers to create report templates for their custom objects based on the class of the object, in the same way as they might define summary(), print() and plot() methods for their objects.
I think it would not be too hard to introduce this. You would start with a generic method:
report <- function (data, template, package, ...) {
UseMethod("report")
}
and define some basic reports for standard methods:
report.numeric <- function(data, template = "default", package = "rapport", digits = 5, align = "right", ...){
#the default is to use default_numeric.tpl from the package 'rapport'
tpl <- system.file(paste(template, "numeric.tpl", sep="_"), package=package)
stopifnot(file.exists(tpl));
rapport(tpl, data=data, digits=digits, align=align, ...);
}
You would define one or more flexible standard templates for the standard R classes in e.g. report.data.frame, report.list, report.matrix, etc. This way the user can do:
report(cars);
report(cars, digits=10);
report(cars, template="descriptives", somecustomarg=TRUE);
Additionally, this allows package developers to include reporting templates in their packages. Hence Douglas Bates could define a function like:
report.lmer <- function(data, template = "multilevel", package="lme4", plotranef=TRUE, descriptives=TRUE, ...){
tpl <- system.file(template, package=package);
stopifnot(file.exists(tpl));
rapport(tpl, data=data, plotranef=plotranef, descriptives=descriptives, ...);
}
So the advantage of this is not only that a user can do:
fm1 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy);
print(fm1);
report(fm1);
and get something nice 'out of the box' as defined by Douglas Bates. If none of the attached packages defines a report.lmer, it will naturally fall back on report.list, which might try to do something standard. Also, you can easily check the required arguments for a report for a certain class:
args(report.lmer)
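The dispatch mechanism itself can be tried without any templates. In the sketch below the methods just return a short string instead of calling rapport(), and all method bodies are made up for illustration; only the S3 generic/method pattern is the point.

```r
# Generic plus a few toy methods; a real method would call rapport() on a template.
report <- function(data, ...) UseMethod("report")

report.default <- function(data, ...) {
  paste("generic report for class", class(data)[1])
}

report.numeric <- function(data, digits = 5, ...) {
  paste("numeric report, mean =", format(mean(data), digits = digits))
}

report.data.frame <- function(data, ...) {
  paste("data frame report:", nrow(data), "rows,", ncol(data), "columns")
}

report(cars)           # dispatches on data.frame
report(cars$speed)     # dispatches on numeric
report("plain text")   # falls back to the default method
args(report.numeric)   # lists the arguments a given method accepts
```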
The htest helper is in a pretty uptight state, as it's not working properly. This is only a reminder to myself.
U NO TOUCH IT OKAYS? =)
This is only for my reference. rp.desc fails if only one dependent variable is provided. This needs a fix. =)
OK, so far we've been throwing an error if an inline chunk evaluates an expression that yields a result with length > 1. What if I have a character vector and I'd like to get a pretty output?
x <- c("foo", "bar", "fee", "fi")
and I'd like to get this:
Elements of vector `x` are: "foo", "bar", "fee" and "fi".
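A minimal sketch of such a formatter. `p_sketch` and its argument names (`wrap`, `sep`, `copula`) are hypothetical and only illustrate the behaviour described here, not the eventual p() helper.

```r
# Collapse a vector into a readable enumeration: "a", "b" and "c".
p_sketch <- function(x, wrap = "\"", sep = ", ", copula = " and ") {
  n <- length(x)
  x <- paste0(wrap, x, wrap)
  if (n <= 1) return(paste(x, collapse = ""))
  # All but the last element are joined by `sep`; the last one by `copula`.
  paste0(paste(x[-n], collapse = sep), copula, x[n])
}

x <- c("foo", "bar", "fee", "fi")
cat("Elements of vector `x` are: ", p_sketch(x), ".\n", sep = "")
p_sketch(c("Gergely", "Alex"), wrap = "")
```

Keeping the separator, the copula and the wrapping quotes as arguments leaves room for localised output, such as a copula in another language.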
So this hook should:
, , and ) - e.g. c("Gergely", "Alex") should yield Gergely and Alex and you may want to add Hungarian copula
', ")
Add ANOVA template, with automatic recognition of One/Two-Way variant(s). In a nutshell, it should contain the following sections:
plot.lm stuff)
For the One-Way variant, a means plot should take place; for Two-Way ANOVA, an interaction plot. Maybe a helper for ANOVA should be added (rp.anova, of course), just to play well with the template conventions.
In order to make the transition from the old syntax to the (new) YAML one less painful, it'll be kewl to write a helper that takes a template with the old syntax and converts it... you get the rest. @daroczig I reckon you'll like the idea, just drop a few more lines to tell me what you think about it. Suggestions are welcome, and I'll also take no, you're crazy, don't f***in' do it!!! as an answer. =)
We were aware of the fact that nesting templates in each other might have a small performance issue, but in my last commit I could set up some 3-4 levels of sub-templates, which showed sizeable differences, although almost the same code was run:
> system.time(rapport.html("nortest", ius2008, var = "age"))
Trying to open /tmp/RtmpMXGtSg/R-report7b9eb363.html with xdg-open...
user system elapsed
0.497 0.020 0.514
> system.time(rapport.html("descriptives-univar-numeric", ius2008, var = "age"))
Trying to open /tmp/RtmpMXGtSg/R-report54c5ab21.html with xdg-open...
user system elapsed
1.326 0.050 1.382
> system.time(rapport.html("descriptives-univar", ius2008, var = "age"))
Trying to open /tmp/RtmpMXGtSg/R-report2dd50213.html with xdg-open...
user system elapsed
2.730 0.030 2.778
> system.time(rapport.html("descriptives-multivar.tpl", ius2008, vars = c("age")))
Trying to open /tmp/RtmpMXGtSg/R-report248d48fe.html with xdg-open...
user system elapsed
5.512 0.114 5.651
TODO: investigate which lapply loop is the devil one - which I can handle tomorrow.