rolkra / explore

R package that makes basic data exploration radically simple (interactive data exploration, reproducible data science)

Home Page: https://rolkra.github.io/explore/

License: Other

R 100.00%

data-exploration data-visualisation decision-trees eda r rmarkdown shiny tidy

explore's Issues

Target in the report function can't be used interactively

Thanks for creating such an amazing package!

I'm trying to use your package in a Shiny app, but the report function fails when a reactive object is passed in as the target.

Suppose I have a data frame generated from the reactive tabexp. This works just fine:
report(tabexp(), output_dir = tempdir())

However, if I want to use one variable from this data frame as the target, with its name supplied by another reactive, groups, like this:
report(tabexp(), target = groups(), output_dir = tempdir())

it won't run.

The error is
Error in explore(., !!sym(var_name_target)) : variable 'groups()' not found

I tried using paste to build the call with the correct target variable and then parsing it, but that returns another error.

comd <- paste("report(tabexp(), target = ",groups(),",output_dir = tempdir())")
parse(comd)
Warning in file(filename, "r") :
  cannot open file 'report(tabexp(),target = trt ,output_dir = tempdir())': No such file or directory
Error in file(filename, "r") : cannot open the connection

I'm wondering whether this is an issue you are aware of, or whether you could point me to the right way to use this function in an interactive environment.

Thanks!
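
A possible explanation for both errors, sketched with a stand-in function rather than report() itself. Here target_name() is a hypothetical helper that captures its argument unevaluated (an assumption about how report() treats target), and groups() is a plain function standing in for the Shiny reactive. The sketch shows why the expression groups() is taken literally, why parse() looked for a file, and two ways to hand over the variable name instead.

```r
# Hypothetical stand-in for a function that captures `target` unevaluated.
# This is NOT explore's code; it only mimics the reported behaviour.
target_name <- function(data, target) {
  name <- deparse(substitute(target))
  if (!name %in% names(data)) stop("variable '", name, "' not found")
  name
}

df <- data.frame(trt = c("a", "b"), value = 1:2)
groups <- function() "trt"   # plays the role of the reactive

# Passing the call directly fails: the expression itself is captured.
res <- tryCatch(target_name(df, target = groups()), error = conditionMessage)

# parse() reads from a file unless the code is supplied via `text =`.
cmd <- paste0("target_name(df, target = ", groups(), ")")
via_parse <- eval(parse(text = cmd))

# Building the call with a symbol avoids pasting strings altogether.
via_call <- do.call(target_name, list(quote(df), target = as.name(groups())))
```

Whether report() accepts a call built this way depends on its internals, so this would need to be tried against the package.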

CRAN Check Warning running example get_nrow()

Warning in get_nrow(names(iris), ncol = 2) : get_nrow() is deprecated.

Horoscope Barchart: suggestion to add count and (percentage) values inside each bar

The Horoscope Barchart in the {explore} package is a fun example! 😀

But it also shows the expanding power of the {explore} package. In very few lines of R code, you:
- generated sample variables
- created a very information-rich plot with multiple barcodes

Suggestion:

Allow the user to add the actual values of count/frequency and percentage, e.g. 182 (15%), inside each red and blue bar, via an optional TRUE/FALSE argument in the call to explore().

There are many applications for this.

Happy 2023 to you too, Roland!
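
To make the requested label format concrete, here is a minimal base R sketch that builds "count (percent%)" strings for a categorical variable. The data are made up, and how explore() would place such labels inside the bars is left open.

```r
# Hypothetical data: one categorical variable, as in a bar chart of counts.
sign <- rep(c("Aries", "Leo", "Virgo"), times = c(2, 3, 5))

counts <- table(sign)
pct <- 100 * as.numeric(counts) / sum(counts)

# One label per bar in the form "count (percent%)".
bar_labels <- sprintf("%d (%.0f%%)", as.integer(counts), pct)
names(bar_labels) <- names(counts)
```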

Add nthread parameter for explain_xgboost()

Add an nthread parameter to explain_xgboost()
to allow the user to use more than one thread for training (to reduce the training time on larger datasets)

explore(): move NA-info into subtitle

explore() shows the number of NA values in the title of the plot.
Move this information into the subtitle.

forcats 1.0.0 function deprecation

Running explore() gives the following warning:
Warning: fct_explicit_na() was deprecated in forcats 1.0.0.
ℹ Please use fct_na_value_to_level() instead.
ℹ The deprecated feature was likely used in the explore package.
  Please report the issue to the authors.

New Function Not Accessible

It appears that with the latest version of the explore package having been added to CRAN (version 1.2.0), the following function is not accessible, which was newly added in the latest version:

check_vec_low_variance

This issue was identified based on the latest version of both RStudio (2023.12.1 Build 402) and R (ver. 4.3.2).

explore_bar() doesn't show NAs correctly in title

library(tidyverse)
library(explore)
library(palmerpenguins)
penguins %>% explore(sex)

Install on GNU/Linux

Great package. FYI: to install this on Debian (would also work on Ubuntu), I needed some additional dependencies, and the full installation process looked like this:

# in a shell: system libraries needed by odbc
sudo apt install unixodbc unixodbc-dev
# then, in R
install.packages("odbc")
install.packages("explore")

add drop_var_() and drop_obs_() functions

drop_var_with_na()
drop_var_non_numeric()
drop_obs_with_na()
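
A rough idea of what these helpers could do, written in base R. The function bodies below are hypothetical sketches based only on the names above; they are not the package's implementation.

```r
# Hypothetical sketches of the requested helpers (base R only).

# Keep only columns without any NA.
drop_var_with_na <- function(data) {
  data[, colSums(is.na(data)) == 0, drop = FALSE]
}

# Keep only numeric columns.
drop_var_non_numeric <- function(data) {
  data[, vapply(data, is.numeric, logical(1)), drop = FALSE]
}

# Keep only rows without any NA.
drop_obs_with_na <- function(data) {
  data[stats::complete.cases(data), , drop = FALSE]
}

df <- data.frame(a = c(1, NA, 3), b = c("x", "y", "z"), c = c(4, 5, 6))
```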

Three dimensional modeling (from Brice)

I propose that a similar process model be included in your explore package related to three-dimensional modeling. So, the way this would work is very similar to how you essentially captured the essence of an entire R package (xgboost) with one function!

There is an R package called svgViewR that allows a user to model multivariate data in three dimensions using html-based interactions. Using a concept called MDS (multidimensional scaling), a single value is generated for each point, colored along a gradient, and then plotted.

While this algorithm is remarkable, it would be even more compelling if it were captured in a single model and resulting plot using ONE function. The algorithm in toto can be found on page 15 of the current version of svgViewR, an R package currently available on CRAN.

To facilitate this effort, I worked with a colleague of mine to replicate what is referred to as the pair distance, called pdist in the svgViewR documentation, converting it to a separate function. This function, called pairDist, can be found in the quickcode package, also stored on CRAN. Since including this function would create a dependency in your code, it's up to you whether you want to use this function or use the pair distance mathematics provided in svgViewR.

I envision the output as a list object containing the following artifacts:
- A data frame containing the original variables used in the analysis, to which the following additional variables are appended:
  - pdist
  - colHex
  - color
  - cluster
- If possible, the html file as a ggplot object (this may or may not be feasible to include)

To facilitate these four proposed data elements added to the output, the following pseudo-code is provided:
library(DescTools)  # for HexToCol()
library(xlsx)       # for write.xlsx()

# points3d, pdist and col are assumed to exist already (from the svgViewR example)
points3d = as.data.frame(points3d)
points3d$pdist = pdist
points3d$colHex = col
points3d$color = as.factor(HexToCol(points3d$colHex))  # nearest named color
points3d$cluster = as.integer(points3d$color)          # one cluster id per color

Optionally, you could convert the 'points3d' data object to an Excel file, or alternatively, include an argument that would control for this:
write.xlsx(x = points3d, file = "filepath/points3d.xlsx", row.names = FALSE, sheetName = "points3D")

To make the proposal fully transparent, the following breaks down the code on page 15 into bullet points. I have studied the svgViewR package and its code thoroughly; these are my remarks:

Library Inclusion:
- The code includes the svgViewR library, which is likely used for creating interactive 3D scatter plots in SVG (Scalable Vector Graphics) format.
Data Generation:
- Generates a matrix points3d with 300 rows and 3 columns.
- Each column is populated with random numbers generated from normal distributions with different standard deviations (3, 2, and 1).
SVG Initialization:
- Opens a new SVG file named 'plot_static_points.html' for writing.
Distance Calculation:
- Computes the Euclidean distance from each point in points3d to the mean point of all points.
- The distances are stored in the variable pdist.
Color Mapping:
- Defines a color gradient from red to blue using colorRampPalette.
- col_grad holds the gradient with 50 colors.
Color Assignment:
- Calculates colors for each point based on their distance using linear interpolation.
- The colors are assigned to the variable col.
SVG Plotting:
- Plots the 3D points in the SVG file using svg.points.
- The color of each point is determined by the previously calculated col.
SVG Frame Initialization:
- Initializes an SVG frame for the 3D points using svg.frame.
SVG File Closing:
- Closes the SVG file with svg.close().

In summary, this code generates a 3D scatter plot with 300 points, each having random coordinates. The color of each point is determined by its distance from the mean point, and the plot is saved in an SVG file named 'plot_static_points.html'. The use of the svgViewR library suggests that the resulting SVG file can be interactive, allowing users to manipulate and explore the 3D plot.
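
The data-side steps described above (everything except the svgViewR plotting calls) can be sketched in base R as follows. This is a reconstruction from the bullet points, not the code from the svgViewR manual; the seed and the exact interpolation formula are assumptions.

```r
set.seed(1)  # assumption: any seed will do
n <- 300

# 300 x 3 matrix, columns drawn with standard deviations 3, 2 and 1.
points3d <- cbind(rnorm(n, sd = 3), rnorm(n, sd = 2), rnorm(n, sd = 1))

# Euclidean distance from each point to the mean point.
center <- colMeans(points3d)
pdist <- sqrt(rowSums(sweep(points3d, 2, center)^2))

# Gradient of 50 colors from red to blue.
col_grad <- grDevices::colorRampPalette(c("red", "blue"))(50)

# Map each distance linearly onto an index 1..50 of the gradient.
scaled <- (pdist - min(pdist)) / (max(pdist) - min(pdist))
idx <- 1 + round((length(col_grad) - 1) * scaled)
col <- col_grad[idx]

# Data frame with the original coordinates plus the derived columns.
result <- data.frame(points3d, pdist = pdist, colHex = col)
```

A single wrapper function along the lines proposed would return result together with the plot.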

There is one more remarkable aspect to this function if you were to accept this idea - the html generated is a single self-contained file. This makes it incredibly easy to distribute!

I know there is a lot here, Roland, but I believe that a single function that can model any number of numeric variables three-dimensionally would be worth the effort to create and add to the explore package. This function would do for three-dimensional modeling what your explain_xgboost function did for feature engineering.

I can answer any questions you may have regarding this proposal.

Warmest regards,

Brice

Small typo in Manual Exploration example

https://github.com/rolkra/explore#manual-exploration

One line of example code has capital I rather than small i in iris

Iris %>% explore_all(target = is_versicolor)

should read

iris %>% explore_all(target = is_versicolor)

[EXPLAIN] tab Tree not switching to a newly selected Var...

Hi Roland!

Your {explore} package (now v1.0.0) is superb!

Problem:
The tree in the [EXPLAIN] tab does not switch to a newly selected variable (chosen in the left-side menu).

Steps to reproduce, with the latest RStudio v 576, Ubuntu Linux 20.04, {explore} 1.0.0:

1) library(explore)
2) explore(iris)
3) The [VARIABLE] tab opens OK, with "Sepal.Length" as the default variable.
4) I now choose from the left menu:
- target: Species
- variable: Sepal.Width
The [VARIABLE] tab displays the plots OK. Good.
5) But when I click on the [EXPLAIN] tab, the decision tree still shows the old variables, "Petal.Length" and "Petal.Width", not the newly chosen variable Sepal.Width with target "Species".

This happens even if I select the same values again from the left menu while in the [EXPLAIN] tab:
- target: Species
- variable: Sepal.Width

The displayed tree is still the original one, i.e. there is no mention at all of the newly selected variable Sepal.Width in the tree in the [EXPLAIN] tab. Nothing changed in the tree.

Help! What am I missing?

sfd99
latest RStudio v 576 - Ubuntu Linux 20.04 - explore PKG 1.0.0
San Francisco

Artificial Data Error

The argument factorise_target, which exists in the functions create_data_buy and create_data_random, does not work as described. Both functions, however, work as expected when this argument is not used.

Replicate error:
library(explore)
x = create_data_random(
  obs = 200,
  vars = 8,
  target_name = "buy",
  target1_prob = 0.4,
  add_id = FALSE,
  seed = 123,
  factorise_target = TRUE
)

The function returns the following error:
Error in `$<-.data.frame`(`*tmp*`, "buy", value = integer(0)) :
  replacement has 0 rows, data has 200
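
For context, base R produces this message whenever a zero-length vector is assigned to a column of a non-empty data frame. The sketch below reproduces the message on its own; whether this is what happens inside create_data_random() is an assumption, not something confirmed from the package source.

```r
df <- data.frame(x = 1:200)

# Assigning a zero-length vector as a column triggers the same error text.
msg <- tryCatch(
  {
    df$buy <- integer(0)
    "no error"
  },
  error = conditionMessage
)
```

A zero-length value like this typically comes from an expression that refers to a column that does not exist yet, so that would be one place to look.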

Error explore numeric variable with NA

explore() throws an error if variable is numeric and contains NAs

Can explore_all() function allow more than 1 single target= variable?

Hi Roland,

Thanks for the update to the GREAT explore package. Your work and dedication are wonderful (as always)!

Q: Can the explore_all() function allow more than one target variable, i.e. combine two target variables at the same time?

Example (pseudo-code):
penguins %>%
  explore_all( target = species AND island )

The output report / plots would show, for example, the density plot of the variable body_mass_g for penguins of species "Adelie" that also live on island "Dream".

I can see applications of explore_all() where allowing more than one variable as the target= would be VERY powerful for analysis.

sfd99
San Francisco
explore PKG 1.1.0
Ubuntu Linux 20.04,
latest RStudio and R.
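
One possible workaround with the current single-target interface is to derive one column from the two variables first. The sketch below uses a small made-up data frame (penguins_demo is hypothetical) and base R only; passing the derived column to explore_all() is shown as a comment and has not been run here.

```r
# Made-up stand-in for the penguins data.
penguins_demo <- data.frame(
  species = c("Adelie", "Adelie", "Gentoo", "Chinstrap"),
  island = c("Dream", "Torgersen", "Biscoe", "Dream"),
  body_mass_g = c(3700, 3800, 5000, 3650)
)

# Binary target: 1 for Adelie penguins living on Dream, 0 otherwise.
penguins_demo$is_adelie_dream <- as.integer(
  penguins_demo$species == "Adelie" & penguins_demo$island == "Dream"
)

# Alternative: one factor level per observed species/island combination.
penguins_demo$species_island <- interaction(
  penguins_demo$species, penguins_demo$island, drop = TRUE
)

# Possible next step (not run): explore_all(penguins_demo, target = is_adelie_dream)
```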

Add interactivity using plotly

add interactivity using plotly

add_var_random_01() creates double

add_var_random_01() creates a variable with values 0 and 1. Type is double, but integer would be sufficient.
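
The difference in storage type is easy to see in base R. A minimal sketch, assuming nothing about how add_var_random_01() is implemented internally:

```r
n <- 1000

as_double  <- sample(c(0, 1), n, replace = TRUE)  # numeric literals give double
as_integer <- sample(0:1, n, replace = TRUE)      # 0:1 is an integer vector
converted  <- as.integer(as_double)               # converting an existing column

size_double  <- as.numeric(object.size(as_double))
size_integer <- as.numeric(object.size(as_integer))
```

The integer vector holds the same 0/1 information in less memory than the double vector.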

Native Pipe in explore_tbl()

explore_tbl() does not work if R version < 4.1 because native pipe is used

Hardcoded Windows paths

When running explore::explore(attitude) on a non-Windows device and clicking on report all, I get the following error:

Warning in normalizePath(path, winslash = winslash, mustWork = mustWork) :
  path[1]="C:/R/template_report_variable.Rmd": No such file or directory
Warning: Error in abs_path: The file 'C:/R/template_report_variable.Rmd' does not exist.
  3: runApp
  2: print.shiny.appobj
  1: <Anonymous>

There seem to be some paths hardcoded that are specific to Windows:

explore/R/explore.R

Lines 1255 to 1274 in 1f5d244

if(input$target == "<no target>") {
  input_file <- ifelse(run_explore_package,
                       system.file("extdata", "template_report_variable.Rmd", package="explore"),
                       "C:/R/template_report_variable.Rmd")
  rmarkdown::render(input = input_file, output_file = output_file, output_dir = output_dir)

# report target with split
} else if(input$targetpct == FALSE) {
  input_file <- ifelse(run_explore_package,
                       system.file("extdata", "template_report_target_split.Rmd", package="explore"),
                       "C:/R/template_report_target_split.Rmd")
  rmarkdown::render(input = input_file, output_file = output_file, output_dir = output_dir)

# report target with percent
} else {
  input_file <- ifelse(run_explore_package,
                       system.file("extdata", "template_report_target_pct.Rmd", package="explore"),
                       "C:/R/template_report_target_pct.Rmd")
  rmarkdown::render(input = input_file, output_file = output_file, output_dir = output_dir)
}
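
One way to avoid the drive-specific fallback is to resolve the template through a small helper. The sketch below is only a suggestion: find_template() and its dev_dir argument are hypothetical names, not part of explore.

```r
# Hypothetical helper: resolve a report template without a hardcoded drive path.
# `dev_dir` is an assumed fallback directory for development use.
find_template <- function(name, package = "explore", dev_dir = getwd()) {
  # system.file() returns "" when the package or the file is not found.
  path <- system.file("extdata", name, package = package)
  if (!nzchar(path)) {
    # Build the fallback with file.path() so it works on any OS.
    path <- file.path(dev_dir, name)
  }
  if (!file.exists(path)) stop("Template not found: ", path)
  path
}
```

A call such as find_template("template_report_variable.Rmd") could then replace each ifelse() block, with an error raised early if neither location has the file.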

Dependency R version >= 4.1.0

explore 1.0.0 uses native pipe |> in vignettes, therefore dependency R version >= 4.1.0 should be added (or go back to %>% in vignettes)

Font selection for "ggplot + theme_light()"

Hello,
I am using explore with data that contains Japanese text.

I hope the "ggplot + theme_light()" used by explore can accept a font selection.

When I use ggplot with such data myself, I write the following:

theme_light(base_family="sans")

I tried "theme_set(theme_light(base_family = "sans"))", but it did not affect explore. If there is any way to apply this kind of setting to the graphs produced by explore, please advise.

Thank you for your nice package!

best.

kazuo
