rolkra / explore
R package that makes basic data exploration radically simple (interactive data exploration, reproducible data science)
Home Page: https://rolkra.github.io/explore/
License: Other
Thanks for creating such an amazing package!
I'm trying to use your package in a Shiny app, but the report function fails when a reactive object is passed to it.
Suppose I have a data frame generated from the reactive tabexp. This works fine:
report(tabexp(), output_dir = tempdir())
However, if I want to use one variable from this data frame as the target, with the variable name coming from another reactive, groups, like this
report(tabexp(), target = groups(), output_dir = tempdir())
it won't run.
The error is
Error in explore(., !!sym(var_name_target)) : variable 'groups()' not found
I tried building the call with paste and parsing it to get the correct target variable, but that returns another error.
comd <- paste("report(tabexp(), target = ",groups(),",output_dir = tempdir())")
parse(comd)
Warning in file(filename, "r") :
  cannot open file 'report(tabexp(),target = trt ,output_dir = tempdir())': No such file or directory
Error in file(filename, "r") : cannot open the connection
I'm wondering whether this is an issue you are aware of, or could you please point me to the right way to use this function in an interactive environment?
Thanks!
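A minimal sketch of why both attempts fail and two ways around it, using base R only. The function describe_target() is a hypothetical stand-in for a function that captures its target argument unevaluated (as the error message suggests report() does); it is not part of explore. The string "trt" plays the role of the value returned by groups().

```r
# Hypothetical stand-in for a function that captures `target` unevaluated
describe_target <- function(data, target) {
  target_name <- deparse(substitute(target))
  if (!target_name %in% names(data)) {
    stop("variable '", target_name, "' not found")
  }
  target_name
}

df <- data.frame(trt = c("a", "b"), age = c(30, 40))
chosen <- "trt"  # in a Shiny app this would be groups()

# Option 1: build the call as text, then parse it with `text =`.
# parse(comd) without `text =` treats the string as a file name,
# which explains the "cannot open file" error.
cmd <- paste0("describe_target(df, target = ", chosen, ")")
res1 <- eval(parse(text = cmd))

# Option 2: pass the variable name as a symbol via do.call(),
# so the function sees `trt` rather than `groups()`.
res2 <- do.call(describe_target, list(df, target = as.name(chosen)))

print(res1)
print(res2)
```

The same do.call() pattern should carry over to report(tabexp(), target = ..., output_dir = tempdir()) if report() captures target the way the stand-in does, but that is an assumption that has not been checked against the package source.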
Warning in get_nrow(names(iris), ncol = 2) : get_nrow() is deprecated.
Horoscope Barchart
in {explore} PKG:
Fun example!
But it actually shows
the expanding power of the {explore} PKG.
In very few lines of R code,
you:
- generated sample variables
and
- created a very information-rich plot
w/ multiple barcodes...
Suggestion:
Allow User
to add actual values of
Count / Frequency and (% Percentage),
i.e.: 182 (15%),
inside EACH red & blue bar ?.
(via an optional T/F parameter/argument= in call to explore).
Many applications for this...
+ Happy 2023 to you too, Roland!.
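The label format requested above, count followed by percentage, can be sketched in base R as follows. The counts are made-up illustration values, and how such labels would be wired into explore's plots is not covered here.

```r
# Made-up counts for the two bars of one category
counts <- c(182L, 1031L)

# Percentage of each bar relative to the category total, rounded to whole numbers
pct <- round(100 * counts / sum(counts))

# Labels in the form "182 (15%)"; "%%" prints a literal percent sign
bar_labels <- sprintf("%d (%d%%)", counts, pct)
print(bar_labels)
```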
Add nthread parameter for explain_xgboost()
to allow user to use more than 1 thread for training (to reduce training time of larger datasets)
explore shows the number of NA-values in the title of the plot
move into subtitle
Running the command explore
I get the following warning:
Warning:
fct_explicit_na() was deprecated in forcats 1.0.0.
ℹ Please use fct_na_value_to_level() instead.
ℹ The deprecated feature was likely used in the explore package.
  Please report the issue to the authors.
It appears that in the latest version of the explore package on CRAN (version 1.2.0), the following newly added function is not accessible:
This issue was identified based on the latest version of both RStudio (2023.12.1 Build 402) and R (ver. 4.3.2).
library(tidyverse)
library(explore)
library(palmerpenguins)
penguins %>% explore(sex)
Great package. FYI: to install this on Debian (would also work on Ubuntu), I needed some additional dependencies, and the full installation process looked like this:
sudo apt install unixodbc unixodbc-dev
install.packages("odbc")
install.packages("explore")
drop_var_with_na()
drop_var_non_numeric()
drop_obs_with_na()
I propose that a similar process model be included in your explore package related to three-dimensional modeling. So, the way this would work is very similar to how you essentially captured the essence of an entire R package (xgboost) with one function!
There is an R package called svgViewR that essentially allows a user to model multivariate data in three dimensions using html-based interactions. Using a concept called MDS or multidimensional scaling, a single value is generated after which it is colorized by gradient, then plotted.
While this algorithm is remarkable, it would be even more compelling if it were captured in a single model and resulting plot using ONE function. The algorithm in toto can be found on page 15 of the current version of svgViewR, an R package currently available on CRAN.
To facilitate this effort, I worked with a colleague of mine to replicate what is referred to as the pair distance, called pdist in the svgViewR documentation, converting it to a separate function. This function, called pairDist, can be found in the quickcode package, also stored on CRAN. Since including this function would create a dependency in your code, it's up to you whether you want to use this function or use the pair distance mathematics provided in svgViewR.
I am envisioning the output as a list object containing the following artifacts:
- A data frame containing the original variables used in the analysis, to which the following additional variables are appended:
  - pdist
  - colHex
  - color
  - cluster
- If possible, the HTML file as a ggplot object, which may or may not be feasible to include
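To produce these four proposed data elements, the following pseudo-code is provided:
library(DescTools)  # provides HexToCol()
library(xlsx)       # provides write.xlsx()
# points3d, pdist and col are assumed to exist already (from the svgViewR example)
points3d <- as.data.frame(points3d)
points3d$pdist <- pdist
points3d$colHex <- col
# nearest named color for each hex value, stored as a factor
points3d$color <- as.factor(HexToCol(points3d$colHex))
# integer code of each color level, used as a cluster id
points3d$cluster <- as.integer(points3d$color)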
Optionally, you could convert the 'points3d' data object to an Excel file, or alternatively, include an argument that would control for this:
write.xlsx(x = points3d, file = "filepath/points3d.xlsx", row.names = FALSE, sheetName = "points3D")
To make the proposal fully transparent, the following breakdown walks through the code on page 15 step by step, based on my study of the svgViewR package and its code:
Library Inclusion:
- Loads the svgViewR library, which is likely used for creating interactive 3D scatter plots in SVG (Scalable Vector Graphics) format.
Data Generation:
- Creates points3d with 300 rows and 3 columns.
SVG Initialization:
- Opens the output file, 'plot_static_points.html'.
Distance Calculation:
- Computes the distance from each point in points3d to the mean point of all points.
- The result is stored in pdist.
Color Mapping:
- Builds a color gradient with colorRampPalette.
- col_grad holds the gradient with 50 colors.
Color Assignment:
- Each point is assigned a color from the gradient according to its distance, stored in col.
SVG Plotting:
- The points are added to the plot with svg.points.
- Point colors are taken from col.
SVG Frame Initialization:
- A frame around the points is added with svg.frame.
SVG File Closing:
- The file is closed with svg.close().
In summary, this code generates a 3D scatter plot with 300 points, each having random coordinates. The color of each point is determined by its distance from the mean point, and the plot is saved in an SVG file named 'plot_static_points.html'. The use of the svgViewR library suggests that the resulting SVG file can be interactive, allowing users to manipulate and explore the 3D plot.
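The distance and color steps of the breakdown above can be sketched without svgViewR, using only base R. This is an illustration under assumptions, not the code from the svgViewR documentation: the random data, the red-to-blue gradient and the use of cut() to bin distances are choices made here, and the plotting calls are left out.

```r
set.seed(42)

# 300 random points in three dimensions
points3d <- matrix(rnorm(300 * 3), nrow = 300, ncol = 3)

# Euclidean distance of every point to the mean point of the cloud
center <- colMeans(points3d)
pdist <- sqrt(rowSums(sweep(points3d, 2, center)^2))

# Gradient with 50 colors (colors chosen for illustration)
col_grad <- colorRampPalette(c("red", "blue"))(50)

# Bin the distances into 50 intervals and pick the matching gradient color
bin <- cut(pdist, breaks = 50, labels = FALSE)
col <- col_grad[bin]

# Collect everything in one data frame, similar to the proposed output
result <- data.frame(points3d, pdist = pdist, colHex = col, cluster = bin)
print(head(result))
```

Here the bin index doubles as the cluster column, which is a simplification of the factor-based cluster in the pseudo-code.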
There is one more remarkable aspect to this function if you were to accept this idea - the html generated is a single self-contained file. This makes it incredibly easy to distribute!
I know there is a lot here, but Roland, I believe that a single function that can model any number of numeric variables three-dimensionally would be worth the effort to create and add to the explore package. This function would do for three-dimensional modeling what your explain_xgboost function did for feature engineering.
I can answer any questions you may have regarding this proposal.
Warmest regards,
Brice
https://github.com/rolkra/explore#manual-exploration
One line of example code has capital I rather than small i in iris
Iris %>% explore_all(target = is_versicolor)
should read
iris %>% explore_all(target = is_versicolor)
Hi Roland!
Your {explore} PKG (now v: 1.0.0),
is superb!.
Problem:
The [EXPLAIN] tab Tree
is not switching to a newly selected Var
(chosen in the left-side Menu)...
Steps to Reproduce Problem:
with
latest Rstudio v 576 - Ubuntu Linux 20.04 - {explore} PKG 1.0.0
library(explore)
explore(iris)
The [VARIABLE] tab opens OK
with "Sepal.Length", as default var.
I now choose from left menu:
- target: Species
- variable: Sepal.Width
[VARIABLE] tab displays plots OK!. Good.
But...
5) ...when I click on the [EXPLAIN] tab,
the decision Tree is still showing
the old vars:
"Petal.Length" and "Petal.Width",
not the new chosen var: Sepal.Width
with target: "Species".
This happens even
if I again select from the left-menu
in the [EXPLAIN] tab:
- target: Species
- variable: Sepal.Width
The displayed Tree is still the original one,
ie: No mention at all
of the new selected var: Sepal.Width
in the Tree in [EXPLAIN] tab.
Nothing changed in the Tree...
Help!
What am I missing?...
sfd99
latest Rstudio v 576 - Ubuntu Linux 20.04 - explore PKG 1.0.0
San Francisco
The argument 'factorise_target', which exists in the functions create_data_buy and create_data_random, does not work as described. Both functions, however, work as expected when this argument is not used.
Replicate error:
library(explore)
x <- create_data_random(
  obs = 200,
  vars = 8,
  target_name = "buy",
  target1_prob = 0.4,
  add_id = FALSE,
  seed = 123,
  factorise_target = TRUE
)
The function returns the following error:
Error in `$<-.data.frame`(`*tmp*`, "buy", value = integer(0)) :
  replacement has 0 rows, data has 200
explore() throws an error if variable is numeric and contains NAs
Hi Roland,
Thanks for the update
to the GREAT explore PKG.
Your work and dedication
is wonderful work
(as always...) ! .
Q:
Can the explore_all() function
allow more than 1 single target variable?.
ie:
instances COMBINING 2 target vars
at the same time...
EX:
penguins %>%
explore_all( target = species AND island )
The output report / plots
would show (for ex),
the density plot of variable:
body_mass_g ,
for penguins species: "Adelie"
also living in island: "Dream"...
I can see applications
of explore_all(),
where allowing MORE THAN 1 single var as the target=
would be VERY powerful
for analysis.
sfd99
San Francisco
explore PKG 1.1.0
Ubuntu Linux 20.04,
latest Rstudio and R.
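Until something like this is supported, one possible workaround is to combine the two variables into a single target column before exploring. The sketch below uses a tiny made-up data frame instead of palmerpenguins, and the column names species_island and is_adelie_dream are hypothetical.

```r
# Tiny made-up sample with the same column names as the penguins data
df <- data.frame(
  species = c("Adelie", "Adelie", "Gentoo", "Chinstrap"),
  island = c("Dream", "Torgersen", "Biscoe", "Dream"),
  body_mass_g = c(3700, 3800, 5000, 3750)
)

# Variant 1: one categorical target holding every species/island combination
df$species_island <- paste(df$species, df$island, sep = " / ")

# Variant 2: a binary target for one specific combination
df$is_adelie_dream <- as.integer(df$species == "Adelie" & df$island == "Dream")

print(df)

# With explore installed, the combined column could then be used as the target:
# df |> explore::explore_all(target = is_adelie_dream)
```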
add interactivity using plotly
add_var_random_01() creates a variable with values 0 and 1. Type is double, but integer would be sufficient.
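The type difference can be illustrated in base R. This sketch does not reproduce the internals of add_var_random_01(); it only shows how sampling from 0:1 instead of c(0, 1) yields an integer vector that needs less memory.

```r
set.seed(1)
n <- 1000

# Sampling from c(0, 1) gives a double vector
x_double <- sample(c(0, 1), n, replace = TRUE)

# Sampling from 0:1 gives an integer vector
x_integer <- sample(0:1, n, replace = TRUE)

print(typeof(x_double))
print(typeof(x_integer))

# Integers take 4 bytes per value, doubles 8
print(object.size(x_integer) < object.size(x_double))
```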
explore_tbl() does not work if R version < 4.1 because native pipe is used
When running explore::explore(attitude)
on a non-Windows device and clicking on report all, I get the following error:
Warning in normalizePath(path, winslash = winslash, mustWork = mustWork) :
path[1]="C:/R/template_report_variable.Rmd": No such file or directory
Warning: Error in abs_path: The file 'C:/R/template_report_variable.Rmd' does not exist.
3: runApp
2: print.shiny.appobj
1: <Anonymous>
There seem to be some paths hardcoded that are specific to Windows:
Lines 1255 to 1274 in 1f5d244
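A common way to avoid a hard-coded path is to look up files shipped with a package through system.file(). The sketch below demonstrates the mechanism with a file from the stats package, which every R installation has. The "extdata" subdirectory in the second lookup is an assumption about where explore keeps its templates and is not confirmed by this issue.

```r
# Locate a file inside an installed package, independent of the operating system
desc_path <- system.file("DESCRIPTION", package = "stats")
print(file.exists(desc_path))

# Hypothetical lookup of the report template ("extdata" is an assumed location).
# system.file() returns "" if the package or the file is not found.
template_path <- system.file("extdata", "template_report_variable.Rmd",
                             package = "explore")
if (!nzchar(template_path)) {
  message("Template not found: explore is not installed or the path differs.")
}

# Build output paths from parts so R handles the separator
out_file <- file.path(tempdir(), "report_variable.html")
print(basename(out_file))
```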
explore 1.0.0 uses native pipe |> in vignettes, therefore dependency R version >= 4.1.0 should be added (or go back to %>% in vignettes)
Hello,
I am using explore with data that contains Japanese text.
I hope the "ggplot + theme_light()" used by explore can accept a font selection.
When I use ggplot directly with such data, I write the following.
I tried "theme_set(theme_light(base_family = "sans"))", but it did not affect explore. If there is any way to apply this kind of setting to the graphs of explore, please advise me.
Thank you for your nice package!
best.
kazuo