boxuancui / dataexplorer
Automate Data Exploration and Treatment
Home Page: http://boxuancui.github.io/DataExplorer/
License: Other
Thanks for the library!
Would it be possible to add an option for a title to the plots?
I think all it would require is:
function(data, title = NULL) {
  [ggplot code] +
    ggtitle(title)
}
Ryan
Hey sir! Wondering if it's possible to set it up so the library installs rprojroot as part of the install process. I didn't have it installed, and as a result it threw an error for me.
Warning message from data.table:
Warning message:
In `[.data.table`(data, , `:=`(ind, NULL), with = FALSE) :
  with=FALSE together with := was deprecated in v1.9.4 released Oct 2014. Please wrap the LHS of := with parentheses; e.g., DT[,(myVar):=sum(b),by=a] to assign to column name(s) held in variable myVar. See ?':=' for other examples. As warned in 2014, this is now a warning.
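The fix that the warning message itself recommends can be sketched as follows. This is a minimal illustration, not the package's actual code: the table `DT` and the contents of `ind` are made up for the example, and only the parenthesized `:=` form comes from the warning text.

```r
library(data.table)

# Illustrative data; not taken from DataExplorer.
DT <- data.table(a = 1:3, b = letters[1:3], c = c(TRUE, FALSE, TRUE))
ind <- c("b", "c")  # column names held in a variable

# Deprecated form that triggers the warning:
#   DT[, ind := NULL, with = FALSE]
# Parenthesized LHS, as the warning message recommends:
DT[, (ind) := NULL]

names(DT)
```

Wrapping `ind` in parentheses tells data.table to evaluate the variable rather than treat `ind` as a literal column name, so `with = FALSE` is no longer needed.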
Add SetNaTo().
If the selected feature is discrete, draw a box plot for all categories. Otherwise, split the continuous feature into quartiles and treat it as discrete.
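The quartile split described above could look roughly like this in base R. It is only a sketch under assumptions: the `iris` column and the variable names are chosen for illustration, and nothing here reflects how the package implements it.

```r
# Continuous feature to be binned (illustrative choice).
x <- iris$Sepal.Length

# Quartile boundaries: minimum, 25%, 50%, 75%, maximum.
breaks <- quantile(x, probs = seq(0, 1, by = 0.25), na.rm = TRUE)

# Convert to a factor with one level per quartile bin;
# include.lowest = TRUE keeps the minimum value in the first bin.
x_quartile <- cut(x, breaks = breaks, include.lowest = TRUE)

table(x_quartile)
```

The resulting factor can then be used like any discrete feature, for example `boxplot(iris$Petal.Length ~ x_quartile)`. Note that `cut()` stops with an error when the quantile breaks are not unique, which can happen with heavily tied data.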
Do not use cat().
checking Rd cross-references ... WARNING
Missing link or links in documentation object 'BarDiscrete.Rd':
'data.table'
Missing link or links in documentation object 'CollapseCategory.Rd':
'data.table'
Missing link or links in documentation object 'CorrelationContinuous.Rd':
'data.table'
Missing link or links in documentation object 'CorrelationDiscrete.Rd':
'data.table'
Missing link or links in documentation object 'DensityContinuous.Rd':
'data.table'
Missing link or links in documentation object 'DropVar.Rd':
'data.table'
Missing link or links in documentation object 'GenerateReport.Rd':
'data.table'
Missing link or links in documentation object 'HistogramContinuous.Rd':
'data.table'
Missing link or links in documentation object 'PlotMissing.Rd':
'data.table'
Missing link or links in documentation object 'SetNaTo.Rd':
'data.table'
Missing link or links in documentation object 'SplitColType.Rd':
'data.table'
See section 'Cross-references' in the 'Writing R Extensions' manual.
https://www.r-project.org/nosvn/R.check/r-devel-windows-ix86+x86_64/DataExplorer-00check.html
When the quiet argument is not supplied, the function does not print the report location; instead, it generates an error.
num_all_missing is missing.
In addition, binwidth is causing an error.
Hi,
I've just tried DataExplorer and have followed the minimal example:
library(DataExplorer)
GenerateReport(iris)
It seems that the knitr part runs OK, but then it stops at the end with this error:
Error in file(con, "w") : cannot open the connection
In addition: Warning message:
In file(con, "w") : cannot open file 'report.knit.md': Permission denied
I have also tried the diamonds example with the same result.
Any hint on what may happen?
Thank you,
> sessionInfo()
R version 3.2.4 (2016-03-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Gentoo/Linux
locale:
[1] LC_CTYPE=ca_AD.UTF8 LC_NUMERIC=C LC_TIME=ca_AD.UTF8 LC_COLLATE=C LC_MONETARY=ca_AD.UTF8 LC_MESSAGES=ca_AD.UTF8 LC_PAPER=ca_AD.UTF8
[8] LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=ca_AD.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ggplot2_2.1.0 DataExplorer_0.2.4 vimcom_1.2-6 setwidth_1.0-4 colorout_1.0-3
loaded via a namespace (and not attached):
[1] Rcpp_0.12.3 knitr_1.12.3 magrittr_1.5 munsell_0.4.3 colorspace_1.2-6 stringr_1.0.0 plyr_1.8.3 tools_3.2.4 grid_3.2.4 data.table_1.9.6 gtable_0.2.0 htmltools_0.3.5
[13] yaml_2.1.13 digest_0.6.9 gridExtra_2.2.1 reshape2_1.4.1 formatR_1.3 evaluate_0.8.3 rmarkdown_0.9.5 labeling_0.3 stringi_1.0-1 scales_0.4.0 chron_2.3-47
>
Worked beautifully on the sample data sets. I tried on a personal data set and received the following error:
|...................................... | 59%
label: density_continuous
|.......................................... | 65%
ordinary text without R code
|.............................................. | 71%
label: correlation_continuous
Quitting from lines 51-52 (report.rmd)
Error: Aesthetics must be either length 1 or the same as the data (100): x, y, fill
All data are transformed into data.table for speed. Might want to detect the input format and output the same format.
CollapseCategory
SplitColType
PlotMissing maybe?
In the following line, observations are not rows.
There are **`r format(nrow(data), big.mark = ",")`** observations (rows)
In addition,
First of all: I like your package very much; it works perfectly for data.frames.
But I am looking for functionality to get quick information on more complex data, such as structured lists or multidimensional arrays: what are the names, where are the tables, etc.
Or maybe to report the contents of a *.RData file (data, functions, ...)?
Would it be possible to include this in your package?
The example looks very nice. I am trying to run it on my own data, but I am getting the following error:
label: correlation_continuous
Quitting from lines 51-52 (report.rmd)
Error in seq.default(from = best$lmin, to = best$lmax, by = best$lstep) :
'from' must be of length 1
I have installed the development version and am no longer getting errors. The report is generated with a warning:
Warning message: In writeLines(if (encoding == "") res else native_encode(res, to = encoding), : invalid char string in output conversion
And the report is unreadable.
Reported by Uros Godnov:
I've tried your package and when running GenerateReport(mydata) I get the following error:
Quitting from lines 58-59 (report.rmd)
Error in BarDiscrete(data) : No Discrete Features
CRAN check results: https://cran.r-project.org/web/checks/check_results_DataExplorer.html
Hi
Where is package "eda" that GenerateReport requires?
GenerateReport<-
function (input_data, output_file = "report.html", output_dir = getwd(),
...)
{
report_dir <- system.file("rmd_template/report.rmd", package = "eda")
Set percentage to 1 - current threshold
Something like ggplot2::guides(fill = guide_legend(nrow = 1))
To reproduce:
data <- data.frame("a" = as.factor(round(rnorm(500, 10, 5))), "b" = rexp(500, 1:500))
table(data$a)
CollapseCategory(data, "a", 0.2, update = TRUE) ## data is not updated
table(data$a)
Instead, use reshape2::dcast for all levels of a discrete feature.
Ignore and keep indicated categories when collapsing.
Reported by @Raelili:
The color scale for correlation heat map looks misleading sometimes. For example, a -0.8 correlation looks less correlated than a 0.1 correlation. The color scale should be fixed to either [-1, 1] or the maximum of both absolute values.
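The two options mentioned above can be sketched in base R as below. The choice of `mtcars` columns is illustrative, and taking the maximum over off-diagonal entries only is an assumption (the diagonal is always 1, so including it would make both options identical).

```r
# Example correlation matrix from a built-in data set.
cor_mat <- cor(mtcars[, c("mpg", "wt", "hp", "qsec")])

# Option 1: fix the color scale to the full theoretical range.
lims_fixed <- c(-1, 1)

# Option 2: symmetric limits from the largest absolute
# off-diagonal correlation (assumption: diagonal excluded).
off_diag <- cor_mat[upper.tri(cor_mat)]
max_abs <- max(abs(off_diag))
lims_sym <- c(-max_abs, max_abs)
```

Either pair of limits could then be passed to a diverging fill scale, for instance `ggplot2::scale_fill_gradient2(limits = lims_sym)`, so that correlations of equal magnitude and opposite sign get equally intense colors around the midpoint of 0.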
The feature will mostly be the response variable, so both discrete and continuous scales should be supported.
http://style.tidyverse.org/
To deprecate old functions, see https://stackoverflow.com/a/10145627/2158269.
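One common way to do this in base R is `.Deprecated()`, sketched below. The new name `plot_missing` and both function bodies are hypothetical stand-ins used only to show the pattern; they are not the package's implementation.

```r
# Hypothetical replacement function (stand-in body: NA count per column).
plot_missing <- function(data) {
  colSums(is.na(data))
}

# Old name kept as a thin wrapper that warns and forwards the call.
PlotMissing <- function(data) {
  .Deprecated("plot_missing")  # emits a deprecation warning
  plot_missing(data)
}

# Calling the old name still works but signals a warning.
res <- suppressWarnings(PlotMissing(airquality))
```

Keeping the old name as a forwarding wrapper lets existing user code keep running for a release or two while pointing users at the new name.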