coorsaa / shinymlr



shiny-mlr: Integration of the mlr package into shiny

License: Other

R 46.80% CSS 44.16% JavaScript 9.04%

r mlr shiny machine-learning data-analysis data-visualization shiny-apps r-package

shinymlr's Introduction

shinyMlr: Integration of the mlr package into shiny

With the help of this package, mlr can be accessed via a shiny interface.

This project started last year and now covers mlr's major functionality:

Data import
Data exploration and preprocessing
Creating regression or classification tasks
Making use of any mlr learner
Tuning of learner hyperparameters
Training and predicting a model
Benchmark experiments with different learners and measures
Many visualisations

Installing and starting shinyMlr

You can install the package from GitHub:

devtools::install_github("mlr-org/shinyMlr")

Starting shinyMlr:

library(shinyMlr)
runShinyMlr()

Problems with rJava on OSX Yosemite

If rJava fails to load, this link might be helpful!

shinymlr's People

philipppro konerukeerthi heoa ssyzyg drbinzhao grseb9s sixianghu guptam vineethvijaysv shankarchavan sarikayamehmet sebggruber guhjy gridl nagdevamruthnath actuarial-tools hieuqtran lucaz01 umeshrrao

shinymlr's Issues

For some datasets we don't get prediction plots

E.g., for OpenML dataset 21, when choosing the prediction plot, we cannot select the required features.

Bug in train panel when a learner parameter is changed and the model is trained again

Improve the tabPanel structure

I think we could put all visualisations into a navbarMenu.
Maybe also one navbar menu for train/predict.
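A minimal sketch of that layout, assuming shiny's navbarPage() and navbarMenu(); the tab titles and placeholder contents are hypothetical and do not correspond to the app's actual panels:

```r
library(shiny)

# Hypothetical layout: tab titles and contents are placeholders only
ui <- navbarPage(
  "shinyMlr",
  tabPanel("Data", "data import and preprocessing"),
  tabPanel("Task", "task creation"),
  # one drop-down menu for train/predict ...
  navbarMenu("Train/Predict",
    tabPanel("Train", "training output"),
    tabPanel("Predict", "prediction output")
  ),
  # ... and one that groups all visualisations
  navbarMenu("Visualisations",
    tabPanel("Prediction plot", "prediction plot"),
    tabPanel("ROC analysis", "ROC curves")
  )
)

# To try it out interactively:
# shinyApp(ui, server = function(input, output) {})
```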

Better benchmark panel

The benchmark panel needs improvement. To-dos:

LOO (leave-one-out) resampling is not working
Inputs should go into a sidebar
Measures are somewhat broken: they should only appear if applicable to the predict type
The whole panel needs better handling and checking of (missing) inputs (especially for plots)
Maybe a "stop" button to abort a (long) experiment?

Tuning panel needs improvement

We need exception/input handling when tuning fails.
Example where two things already fail:
iris task, randomForest (predicting responses), gridSearch, logloss, parallel tuning, tuning mtry and ntree.
If you don't set upper, the raw error message that the assertion on upper failed is shown.
If you do, you get an error because logloss needs predict.type "prob".
Also, some inputs don't seem to update after tuning. E.g., I tried to set parallel to "no" after I ran into the above error. When I hit tune again I still got the same error, so it seems we are still using parallel mode.
Update:
Yes, parallel mode needs to be switched off after use. I'm still running on 2 cores if I close the app and restart it in the same R session.
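One way to get both the error handling and the cleanup, sketched under assumptions: `runTuningSafely()` is a hypothetical wrapper. In the app, `tune_fun` would run mlr's tuning, and `cleanup` could be `parallelMap::parallelStop`, so parallel mode is switched off even when tuning fails.

```r
# Hypothetical wrapper: runs a tuning function, turns any error into a short
# message for the UI, and always runs a cleanup step afterwards.
runTuningSafely <- function(tune_fun, cleanup = function() invisible(NULL)) {
  # on.exit() guarantees the cleanup runs, whether tune_fun() succeeds or fails
  on.exit(cleanup(), add = TRUE)
  tryCatch(
    list(result = tune_fun(), error = NULL),
    # keep only the message so the UI can show it instead of a raw trace
    error = function(e) list(result = NULL, error = conditionMessage(e))
  )
}
```

The server could then display `res$error` (e.g. via validate()) instead of the raw assertion output.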

Online demo

There should be an online demo where at least developers can try this out.

Move dashboard sidebar to the top

I now agree with @PhilippPro that we should move the sidebar to the top instead. I see at least two reasons why this is better:

When we have data with many variables, we can fit more on the screen. Also, in the predict panel, for example, the pagination bar beneath the dataTables produces ugly line breaks, which would no longer happen with the sidebar at the top.
(Update: I quick-fixed the latter issue with pagingType = "simple"; I think we can live with that in that spot.)
Secondly, and maybe more importantly: on mobile devices the sidebar pushes the content of the body to the right. Especially on smaller screens the app can be a real pain to use. (Sure, we don't really support mobile use properly so far, but that day will come eventually, right?)

I haven't dug deeply into the topic so far, but after googling for 5 minutes it doesn't seem trivial to move the dashboardSidebar to the top. Maybe we need to find a smart way using CSS, or we need to switch back to navlistBar (which is hopefully easy to integrate with the rest of the shinydashboard layout).

Handling missing inputs and reactive values

We already started using req(), validate() and need() whenever we worked on a specific reactive value (which is the recommended way of dealing with missing inputs etc. in shiny). However, a lot of objects still contain if(is.null(obj)) return(NULL) statements, which should be removed. Also, our error handling has been pretty primitive so far: mostly we just put req() everywhere a reactive value was needed. We need to check whether a check is needed at all, and reconsider whether the user should be shown a message.

So to-dos here are:

write function reqAndAssign and "needy" helper functions to save typing
check all objects for input handling:
- Remove if(is.null(obj)) return(NULL) statements
- Insert req()/needy functions where needed and remove them where unnecessary
  (shiny is smart and we only need to call it the first time that reactive value is used.)
- (Maybe while we're on it customize style of error messages)
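A minimal sketch of what a "needy" helper could look like, assuming it simply wraps shiny's need() with a consistent message; `needy()` and the commented usage (`output$task_summary`, `data$data`) are hypothetical. This does not sketch reqAndAssign, whose intended behaviour isn't specified here.

```r
library(shiny)

# Hypothetical "needy" helper: returns a user-facing message when `x` is
# missing (NULL, empty, FALSE, ...), or NULL when `x` is usable.
needy <- function(x, what) {
  need(x, paste0("Please ", what, " first."))
}

# Inside a render function one would then write, e.g.:
# output$task_summary <- renderPrint({
#   validate(needy(data$data, "import a dataset"))
#   summary(data$data)
# })
```

validate() stops the render function quietly and shows the message in place of the output, which also gives one place to customize the style of error messages.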

implement tuning

Hyperparameter settings panels too big

I am really impressed by the functionality and appearance of the app. Even an automatic report system!

One small thing that bothered me when testing was that the hyperparameter panels are too big. You may have to scroll down a lot to find a specific hyperparameter.

Optimize current state of app

Here is a short list of the next development steps.

Side note: new features are BLOCKED until all of this is done:

All tabPanels need to be properly reviewed and possibly refactored:

learners: too complicated, introduce reactiveValues
train/predict: input handling is not great when data is missing or prediction failed
preprocessing: the undo button, and related to that, the handling and assignment of data$data, need checking
task: again data$data!
other tabs: imho ok but see below.

This is somewhat related to the second checkbox, but I want to stress it separately so we really do this properly: review everything for style guide and function usage:

variable names: Maybe we need to deviate from Bernd's style guide here, because we get problems with . when we use the variable in a JavaScript expression. Right now we use both _ and ., which is bad.
@Coorsaa let's agree on a solution for this on Monday.
reactiveValues and reactive-functions need to be used in correct context consistently
use stringi for string operations everywhere
checkmate should be removed. We only use it in two or three places, and those checks should be done with needy functions within shiny instead.

Find a better design for the UI and apply everywhere. Ideas for now:

Use flexdashboard to move the sidebar to the top. Our section-specific inputs that are currently in our makeSidebar function can then go in a sidebarPanel. Maybe we can even use the one from shinydashboard for that?!
Colors, button style, ... (I think the radio buttons are super ugly, dunno if that's only me though 😄)

explanatory texts for each step/panel of the app

switch plots from ggplot to ggvis?

ggvis has a similar structure to ggplot but produces interactive plots.

mini issue: benchmark: the "logging" line works a bit stupidly

At first it is positioned at the top, then below the bmr table.

In general I am not sure how to handle this.
Maybe display it instead of the table and then hide it?

File/Folder management

We have quite a few files in the root directory now, and I'm sometimes annoyed when it takes a couple of seconds to find a file 😄
Probably not many more files will be added in the near future, but I would vote for separating the files better by restructuring a bit. Maybe something like this:

helpers (ideally renaming helper files to tabName_helpers.R)
server
ui
new file app.R in the root directory

Here's a minimal example:
https://github.com/daattali/advanced-shiny/tree/master/split-code

@Coorsaa what do you reckon?

How to handle untyped learner params

Currently we have a textInput() for UntypedLearnerParams. However, they are passed down to makeLearner() as characters, so right now they only work if the param value is actually of type character.
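A possible direction, sketched under assumptions: convert the typed text with base R's `utils::type.convert()` before passing it to `makeLearner()`, so numbers and logicals arrive with their proper type. `parseUntypedParam()` is a hypothetical helper. It does not cover vectors or arbitrary R expressions.

```r
# Hypothetical helper: converts the text typed into an untyped-param
# textInput() to integer, double or logical where possible, and leaves
# everything else as a character string.
parseUntypedParam <- function(txt) {
  txt <- trimws(txt)
  if (!nzchar(txt)) return(NULL)  # empty box: leave the param unset
  utils::type.convert(txt, as.is = TRUE)
}
```

Returning NULL for an empty box lets the caller skip the param entirely, so the learner's default is kept.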

Error in Tuning when setting numeric factor levels for discrete parameters

When trying to tune discrete params that have numeric values as factor levels, we get an error, e.g.: Setting hyperpars failed: Error in setHyperPars2.Learner(learner, insert(par.vals, args)) : 1 is not feasible for parameter 'surrogatestyle'!
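The message suggests that the selected level reaches mlr as a character string rather than as the original numeric value, although that is an assumption. A hedged sketch of mapping the UI selection back before the tuning param set is built. `toDiscreteValue()` is hypothetical, and `values` is assumed to be the named list of allowed values that ParamHelpers keeps for a discrete param.

```r
# Hypothetical helper: maps the string chosen in the UI back to the original
# value of a discrete param, so numeric levels stay numeric.
toDiscreteValue <- function(selected, values) {
  labels <- names(values)
  # fall back to the values themselves if the list is unnamed
  if (is.null(labels)) labels <- vapply(values, as.character, character(1))
  idx <- match(selected, labels)
  if (is.na(idx)) stop("'", selected, "' is not a feasible value")
  values[[idx]]
}
```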

META: Will any of you guys present this work on the useR 2017?

Hi,
I think it would be really nice to see this work presented at the useR 2017. What are your thoughts?

set tuned param values as input values in learner section

Right now, we always have the param defaults in the input boxes as predefined values in the learner section. It would be nice if we could change them to the tuned ones (after tuning).

OpenML import broken

fails immediately with error:

Assertion on 'data.id' failed: Must have length 1.

I think the error was introduced by merging #42, since it worked fine 2 days ago. Looking at the diff, though, I'm not sure what exactly caused the bug, since getOMLDataSet wasn't touched. However, I'll try to fix this in master now since it's holding us up.
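Whatever the actual cause turns out to be, a defensive guard could ensure `getOMLDataSet()` is only ever called with a single id. A minimal sketch; `parseDataId()` is a hypothetical helper, and the input name in the comment is made up.

```r
# Hypothetical guard: turns a raw input value into a single positive integer
# id, or NULL if that is not possible (NULL, several ids, non-numeric text).
parseDataId <- function(x) {
  id <- suppressWarnings(as.integer(x))
  if (length(id) != 1L || is.na(id) || id < 1L) return(NULL)
  id
}

# In the server, e.g. (input$openml_id is a hypothetical input name):
# id <- parseDataId(input$openml_id)
# req(id)
# ds <- getOMLDataSet(data.id = id)
```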

benchmark: predict.type = "prob"

This option is implemented for "train, predict, performance". Make it available for benchmark also.

Scrolling down in "Data Summary"

I know that this is probably only a preliminary state, but when I scroll down in the "Data Summary" panel, I can only see half of each plot.

Better train & predict panel

Currently we have separate sub menus for train, predict and performance.
All of them look rather empty. Maybe combine them into one tab?
Also we need to review the predict tab specifically:
Do we really need to display the data that is uploaded for prediction?
Also it should be possible to extract the predictions afterwards
(corresponding issue #18)

import: get data from openml

Improve function sMakeTask

We currently specifically ask for class numeric / factor of the target to create the regression and classification task, respectively. We should also allow classes integer and character since mlr allows this out of the box, so it would be an easy fix for us.
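A minimal sketch of a more permissive check, assuming the result is then used to choose between mlr's `makeRegrTask()` and `makeClassifTask()`. `inferTaskType()` is a hypothetical helper, not the current sMakeTask code.

```r
# Hypothetical helper: picks the task type from the class of the target
# column, accepting integer and character targets as well.
inferTaskType <- function(data, target) {
  y <- data[[target]]
  if (is.factor(y) || is.character(y)) {
    "classif"
  } else if (is.numeric(y)) {  # TRUE for both double and integer columns
    "regr"
  } else {
    stop("Target '", target, "' has unsupported class ", class(y)[1])
  }
}
```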

mini issue: task: default id must be handled correctly

Currently we take the id from the filename of the CSV, but we can now also import from mlr and possibly other sources.
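One possible scheme, sketched under assumptions: derive the default id from the import source. `defaultTaskId()` and the `openml_<id>` naming are hypothetical choices, not something the app already does.

```r
# Hypothetical helper: derives a default task id from where the data came from.
defaultTaskId <- function(source, name = NULL) {
  if (is.null(name) || !nzchar(name)) return("task")  # generic fallback
  switch(source,
    csv = tools::file_path_sans_ext(basename(name)),   # "data/iris.csv" -> "iris"
    openml = paste0("openml_", name),                  # data id 21 -> "openml_21"
    name                                               # e.g. mlr example tasks
  )
}
```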

train and benchmark: make preprocessing (Wrapper) available

Tuning
Imputation
Variable Selection

maybe with nested resampling.

convert functions as.factor(), as.numeric()

We should add those for data preprocessing. I would suggest adding them directly as further preprocessing methods. Do you agree @florianfendt ?
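A minimal sketch of such a conversion step. `convertColumn()` is a hypothetical helper. Note the classic pitfall it avoids: `as.numeric()` on a factor returns the internal level codes, not the labels.

```r
# Hypothetical preprocessing helper for the proposed type conversions.
convertColumn <- function(data, col, to = c("factor", "numeric")) {
  to <- match.arg(to)
  x <- data[[col]]
  data[[col]] <- switch(to,
    factor = as.factor(x),
    # go through the labels first: as.numeric(factor) would give level codes
    numeric = if (is.factor(x)) as.numeric(as.character(x)) else as.numeric(x)
  )
  data
}
```

Non-numeric labels become NA (with a warning) when converted to numeric, so the panel may want to warn the user before applying it.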

cannot create another task after changing or preprocessing data

Showing Message when loading OpenML datasets

Downloading the OpenML datasets takes a long time, but it's unclear to the user what is happening, since the screen just stays blank.

benchmark: mini issue: default measures for task should be selected by default

Benchmark not working

I get an info message "you didn't create a learner yet" when trying to do a benchmark experiment.

predict: make it possible to extract predictions after prediction

E.g., in CSV format.
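A minimal sketch of the export step, assuming the predictions are available as a data.frame (for an mlr Prediction object, its `$data` element). `writePredictions()`, `pred()` and `output$download_pred` are hypothetical names; `downloadHandler()` is shiny's mechanism for serving a file to a downloadButton().

```r
# Hypothetical content function for a download button.
writePredictions <- function(pred_df, file) {
  utils::write.csv(pred_df, file, row.names = FALSE)
  invisible(file)
}

# Possible wiring in the server:
# output$download_pred <- downloadHandler(
#   filename = function() "predictions.csv",
#   content  = function(file) writePredictions(pred()$data, file)
# )
```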

Visualisations: ROC plots

Are not implemented yet.

benchmark broken

It seems that the learners are not passed to the benchmark function. I get Error: object 'ls.ids' not found

Time estimation in advance and progress bar

It would be cool to have a time estimate after choosing a learner, plus a progress bar, so you can see how much longer it will take to train/benchmark your model.

Improve interactive report

As of today we have a new panel for rendering interactive reports. However, there's still a lot to improve. As of now we mainly support plots and some small learner and data summaries. This issue collects ideas for what we can add or improve, as well as bug reports concerning this panel.

Error when predicting randomForest with se estimation

When training the randomForest on bh.task with se estimation, I get an error when predicting: Error: argument must be coercible to non-negative integer. This of course carries over to performance, visualizations and benchmarking as well, since prediction is not possible.

mini issue: benchmark plots: the rank plot seems pretty stupid

I guess we can simply remove it.

new tab: train some models on complete data

Currently, we benchmark some models and display the results,
but one should also be able to train the models on the data in order to analyse them.

Learner selection & construction

clickable OpenML data table

Idea: the user explores the OpenML datasets in the "explore OpenML" section, clicks on a dataset he likes in the data frame output and the dataset will automatically be imported.

better output for calculateConfusionMatrix

I think you want calculateROCMeasures; that's a new function.
@ja-thomas

Implement data preprocessing

Make a new tab where the user can see a quick summary of the data, like:
any NAs, constant features, etc.
Then make preprocessing available with help of impute, capLargeValues etc.
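A minimal sketch of such a quick summary in base R. `quickDataSummary()` is a hypothetical helper; it treats a column as constant when it has at most one distinct non-missing value.

```r
# Hypothetical quick-summary helper: one row per feature with its class,
# the number of missing values and whether the feature is constant.
quickDataSummary <- function(data) {
  data.frame(
    feature = names(data),
    class = unname(vapply(data, function(x) class(x)[1], character(1))),
    n_na = unname(vapply(data, function(x) sum(is.na(x)), integer(1))),
    constant = unname(vapply(data, function(x) length(unique(x[!is.na(x)])) <= 1,
                             logical(1))),
    stringsAsFactors = FALSE
  )
}
```

Features flagged here could then be passed to mlr's impute() or removeConstantFeatures() in the preprocessing tab.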

After setting probability to "yes", setting threshold to slow ends up in a loop

First I have to say that I really like all the new features, especially the new preprocessing section.

I discovered a small bug. After creating your classification task, you can go to the learner section and set your learner's probability option to "yes". If you immediately click on the next learner (you have to be quick enough), the shiny app keeps switching back and forth between the first and the second learner. I hope you can reproduce this small bug.

Visualisations: make it conditional on the task

Show each visualisation tool only if it is applicable (e.g. ROC analysis is only available for binary classification).
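A minimal sketch of the decision logic, assuming the task type and number of classes are known (in mlr, e.g. via `length(getTaskClassLevels(task))`). `availablePlots()` is hypothetical, and only the plots named in this document are listed. Its result could drive a renderUI() that offers only the applicable tabs.

```r
# Hypothetical helper: lists the visualisations that make sense for a task.
availablePlots <- function(task_type, n_classes = NA_integer_) {
  plots <- "prediction plot"
  # ROC analysis only applies to binary classification
  if (task_type == "classif" && isTRUE(n_classes == 2)) {
    plots <- c(plots, "ROC analysis")
  }
  plots
}
```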

mini issue: benchmark: repcv and holdout are missing

import: get data from arff files

Learner recommendation

We could order the learners by their performance and somehow give recommendations based on that.

Improve plotly plots

We need to revisit the plotly plots.
For some plots the reactive text is not nice.
For example, the summary plots show as.numeric(d[, feature]), which should be changed to the actual feature name.
Check all other plotly objects too when we do this!