
tblcheck's Introduction

tblcheck provides functions for grading tibbles, data frames, and vectors with gradethis.

tblcheck is designed for use with gradethis in learnr tutorials. We recommend that you first be comfortable grading learnr tutorials with gradethis before you begin using tblcheck. You can learn more with the gradethis package documentation and the learnr package documentation.

A screenshot of the tblcheck package in action. An exercise starts with the following instructions: 'Create a tibble with two columns. The first column should be called "fruit" and contain the value "tomato". The second column should be called "color" and contain the value "red".' The student has entered the following code: 'tibble(vegetable = "tomato", color = "red")'. The tblcheck package generates the following message: 'Your table should have a column named "fruit". Your table should not have a column named "vegetable". Please try again.'

Installation

tblcheck is still in development and not on CRAN yet. The development version of tblcheck can be installed from GitHub with:

# install.packages("remotes")
remotes::install_github("rstudio/tblcheck")

Usage

To use tblcheck in a learnr tutorial, start by loading tblcheck after learnr and gradethis in the setup chunk of your tutorial:

```{r setup}
library(learnr)
library(gradethis)
library(tblcheck)
library(tibble)
```

Then include one of the tblcheck functions in your gradethis code:

```{r exercise, exercise = TRUE}

```

```{r exercise-solution}
tibble(fruit = "tomato", color = "red")
```

```{r exercise-check}
grade_this_table()
```

grade_this_table() compares the .result generated by a student’s code to the .solution generated in the exercise-solution chunk.

  • grade_this_table() passes the student’s .result if it’s identical to the .solution. If a difference is detected, it produces a failing grade with a message describing the problem with the table.

  • grade_this_table() is built on gradethis functions and tbl_grade(), which can be used in exercise grading code to detect problems in tables.

  • tbl_grade(), in turn, is built on lower-level functions that can be used directly to compare specific aspects of two tables.

Learn more about the various tblcheck functions in vignette("tblcheck").

Code of Conduct

Please note that the tblcheck project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

tblcheck's People

Contributors

gadenbuie, github-actions[bot], nischalshrestha, rossellhayes, skaltman

tblcheck's Issues

make messages more informative, e.g. `vec_grade_vector()`

I love the swiss army knife approach of vec_grade_vector(), but the messages are not as customized to the student's code as they could be.

Here are some examples

library(tblcheck)
.result <- c("Austin", "Sacramento", "Denver", "Dayona Beach", "Concord")
.solution <- c("Austin", "Sacramento", "Denver", "Daytona Beach", "Concord")
vec_grade_vector()

[Screenshot]

I'd like something that mentions the student's code and points them to what went wrong, e.g.: I didn't expect your result to contain the value Dayona Beach.

.result <- c("Austin", "Sacramento", "Denver", "Daytona Beach", "Concord", "Boston")
.solution <- c("Austin", "Sacramento", "Denver", "Daytona Beach", "Concord")
vec_grade_vector()

[Screenshot]

I'd like to identify the offending value and suggest a remedy: It looks like you added a value that is not in the solution, "Boston". Try removing it.

Why is this a problem?

The messages feel like R error messages. Since they don't talk about the student's code, the student needs to figure out what about their code caused the message. This can be frustrating.

The goal is for grading messages to always sound like personalized feedback and advice from a friendly mentor who has read the student's code. Whenever possible, we should glue in pieces of the student's code or result to make it clear what we are referring to and to make the feedback feel personalized.
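As a rough illustration of the kind of message described above, here is a minimal base R sketch. The helper name `describe_extra_values()` is hypothetical and is not part of tblcheck; it only shows how the student's own values could be glued into the feedback.

```r
# Hypothetical helper (not a tblcheck function): name the values in the
# student's result that do not appear in the solution.
describe_extra_values <- function(result, solution) {
  extra <- setdiff(result, solution)
  if (length(extra) == 0) {
    return(NULL)
  }
  quoted <- paste0('"', extra, '"', collapse = ", ")
  noun <- if (length(extra) == 1) "the value" else "the values"
  paste0("I didn't expect your result to contain ", noun, " ", quoted, ".")
}

result <- c("Austin", "Sacramento", "Denver", "Dayona Beach", "Concord")
solution <- c("Austin", "Sacramento", "Denver", "Daytona Beach", "Concord")
message_text <- describe_extra_values(result, solution)
```

A real implementation would presumably also need to handle missing values, duplicated values, and long vectors.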

Refactor `max_diffs` argument to be handled by the grade/message functions only

Currently the max_diffs arg in grade_ functions goes to both tbl_grade() and the underlying checking functions.

We should restructure this so that the max_diffs logic is handled by the checking function only. Ideally, this would remove the need for separate problem classes when the max_diffs argument was included. Once the problem is created, we should take it at face value, other than in the generic case of tblcheck_problem.

If we need to think of an approach for comparing large actual and expected items, we could investigate using the output from waldo::compare() or using the diffobj package directly. Worst case, we could use options(max.print) for an upper limit.

`grade_this_table()` doesn't expose `check_column_order` argument of `tbl_grade()`

pkgload::load_all()
#> ℹ Loading tblcheck
library(dplyr, warn.conflicts = FALSE)

.solution <- .result <- 
  starwars %>%
  transmute(height = height / 2.54, mass = mass * 2.205)

grade_this_table(check_column_order = TRUE)
#> Error: `...` is not empty.
#> 
#> We detected these problematic arguments:
#> * `check_column_order`
#> 
#> These dots only exist to allow future extensions and should be empty.
#> Did you misspecify an argument?

Values problems should limit the significant digits of numbers printed in the problem message

pkgload::load_all()
#> ℹ Loading tblcheck

vec <- c(100, 200, 300, 400)
vec_check_values(vec * 1.23, vec / 1.23)
#> <tblcheck problem>
#> The first 3 values of your result should be `81.3008130081301`, `162.60162601626`, and `243.90243902439`, not `123`, `246`, and `369`.
#> $ type    : chr "values"
#> $ expected: num [1:4] 81.3 162.6 243.9 325.2
#> $ actual  : num [1:4] 123 246 369 492
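One possible approach, sketched in base R, is to round each value to a fixed number of significant digits before it is glued into the message. The helper name `format_for_message()` and the default of 3 digits are assumptions for illustration, not tblcheck's actual behavior.

```r
# Hypothetical helper (not a tblcheck function): round each value to a fixed
# number of significant digits, then convert it to text for the message.
format_for_message <- function(x, digits = 3) {
  vapply(x, function(value) format(signif(value, digits)), character(1))
}

vec <- c(100, 200, 300, 400)
short_values <- format_for_message(vec / 1.23)
```

Rounding to significant digits rather than decimal places keeps very small and very large values readable, at the cost of occasionally hiding the exact difference between two close numbers.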

Create interface to add messages for new classes

Add an interface for adding methods to hinted_class_message_list() and friendly_class_message_list() from outside tblcheck.

Possible solutions:

  • Check for additional message formats in options().
  • Convert message generators into R7 generics.
    • This would involve refactoring the class problem to hold the entire object and expected objects, not just their class and length.

`values` check is too strict with floating point errors

vec_*_values() is too strict for differences resulting from floating point errors. In this example, changing the order of multiplication leads a correct solution to be marked incorrect. The vectors are all.equal, but are not identical.

library(dplyr)
library(tblcheck)
library(Zseq)

cases <- Fibonacci(20)[-1]

.result <- (cases - lag(cases)) / lag(cases) * 100
.solution <- 100 * (cases - lag(cases)) / lag(cases)

vec_grade_values()
#> <gradethis_graded: [Incorrect]
#>   I didn't expect your result to include the values `66.7`, `61.8`, and
#>   `61.8`.
#> >

all.equal(.result, .solution)
#> [1] TRUE
identical(.result, .solution)
#> [1] FALSE

Created on 2021-12-14 by the reprex package (v2.0.1)
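The difference between the two comparisons can be reproduced with base R alone. The sketch below uses a hypothetical helper, `values_match()`, that accepts values equal up to the default tolerance of all.equal(); whether tblcheck should adopt that tolerance is an open design question, not something this example settles.

```r
# Hypothetical helper (not a tblcheck function): treat two numeric vectors as
# matching when they are equal up to all.equal()'s default tolerance.
values_match <- function(result, solution) {
  isTRUE(all.equal(result, solution))
}

result <- 0.1 + 0.2
solution <- 0.3

strict_match <- identical(result, solution)      # exact comparison
tolerant_match <- values_match(result, solution) # tolerance-based comparison
```

Wrapping all.equal() in isTRUE() matters because all.equal() returns a character description, not FALSE, when the values differ.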

Code hints not showing next to fail messages from `tbl_grade_table()`

I've noticed that fail messages returned by tblcheck::tbl_grade_table() don't return code hints even when fail.hint = TRUE inside of gradethis_setup().

For example, code hints are turned on for the exercise shown below, but the fail message does not include the expected code hint appended to the end:

[Screenshot]

Expected code hint to be included with the fail message:

[Screenshot]

Perhaps I'm missing something here, but since tbl_grade_table() returns a gradethis::fail() message, shouldn't it inherit the gradethis.fail.hint option specified in gradethis_setup()?

cc @rossellhayes @gadenbuie


Here is the code used to generate the example exercise above:

---
title: "tblcheck reprex"
output:
  learnr::tutorial:
    progressive: true
    allow_skip: true
    df_print: paged
    css: https://rsacdn.link/teachthis/css/style.css
runtime: shiny_prerendered
---

```{r setup, include=FALSE}
library(academyDatasets)
library(learnr)
library(gradethis)
library(tblcheck) # For table-based exercises. Remove if not needed

gradethis_setup(
  fail.hint = TRUE,      # Give a code feedback hint on all fail*()? Remove if not desired
  fail.encourage = TRUE, # Encourage on all fail*()
  pass.praise = TRUE,    # Praise on all pass*()
  # All hints start on a new line, except for the default fail() message
  maybe_code_feedback.before = "\n\n",
  fail = "Incorrect.{maybe_code_feedback(before = ' ')} {random_encouragement()}"
)

# Exercise Packages and Global Setup ----
library(dplyr)
```

Select the `height` column from `starwars`.

```{r tblcheck-reprex, exercise = TRUE}

```

```{r tblcheck-reprex-solution}
starwars %>% select(height)
```

```{r tblcheck-reprex-check}
grade_this({
  pass_if_equal()
  
  tbl_grade_table()
  
  pass()
})
```

I used the most recent available versions of gradethis (SHA1 e9b10ae3) and tblcheck (SHA1 b7b821f).

Handle database tables?

Maybe tblcheck should include some special handling for remote database tables. Here's a smaller example based on the "Introduction to dbplyr" vignette:

library(dplyr)
con <- DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")

copy_to(con, nycflights13::flights[1:100, ], "flights",
        temporary = FALSE, 
        indexes = list(
          c("year", "month", "day"), 
          "carrier", 
          "tailnum",
          "dest"
        )
)

Given a remote database table, flights_db, tbl_grade() errors because the expected object is not a data.frame.

flights_db <- tbl(con, "flights")

tblcheck::tbl_grade(flights_db, flights_db)
#> An error occurred in the grading code: Assertion on 'expected' failed: Must be of type 'data.frame', not 'tbl_SQLiteConnection/tbl_dbi/tbl_sql/tbl_lazy/tbl'.
#> <gradethis_graded: [Neutral]
#>   Uh-oh! We can't provide feedback at this time. Don't worry, it's not
#>   your fault! There's an issue behind-the-scenes with this exercise.
#> >

For now authors can collect(.result) and collect(.solution), but maybe tbl_grade() could handle this automatically. Alternatively, we could think about rendering the lazy tbl to SQL or doing something else to avoid having to collect the complete data...

flights <- collect(flights_db)
tblcheck::tbl_grade(flights, flights)

`tbl_grade_class()` calls anything with length > 1 a vector

tbl_grade_class() should describe things as objects unless they are truly vectors of length > 1.

library(tblcheck)

.result <- 1
.solution <- lm(mpg ~ wt, data = mtcars)
tbl_grade_class()
#> <gradethis_graded: [Incorrect]
#>   Your result should be a vector with class `lm`, but it is a number
#>   (class `numeric`).
#> >

Created on 2022-08-09 by the reprex package (v2.0.1)

`vec_check_length()` should still check dimensions

In #37, we added vec_check_length(), which simplifies tbl_check_dimensions() for vectors by only checking length. However, this means we get incorrect feedback in places where vectors do have dimension, like matrices.

library(tblcheck)

.result <- matrix(1:6, 2)
.solution <- matrix(1:6, 3)
vec_grade_vector()
#> <gradethis_graded: [Incorrect] Your result contains unexpected values.>

Created on 2021-09-14 by the reprex package (v2.0.1)

The result fails in vec_check_values(), giving a misleading message. This is not actually a values problem; it should be a dimensions problem.

To resolve this, vec_check_length() should be remerged with tbl_check_dimensions() and exist only as an alias rather than a streamlined implementation.
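To make the intended order of checks concrete, here is a base R sketch that compares dimensions first and falls back to length only when the dimensions agree. The function `check_shape()` and the problem fields it returns are hypothetical and only loosely modeled on the problem objects shown elsewhere in these issues.

```r
# Hypothetical sketch (not tblcheck code): report a dimensions problem before
# a length problem, so matrices with different shapes are caught early.
check_shape <- function(result, solution) {
  result_dim <- dim(result)
  solution_dim <- dim(solution)
  if (!identical(result_dim, solution_dim)) {
    return(list(type = "dimensions", expected = solution_dim, actual = result_dim))
  }
  if (length(result) != length(solution)) {
    return(list(type = "length", expected = length(solution), actual = length(result)))
  }
  NULL # no shape problem detected
}

problem <- check_shape(matrix(1:6, 2), matrix(1:6, 3))
```

For plain vectors dim() is NULL on both sides, so the function behaves like a simple length check in that case.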

Prepare tblcheck package for release

  • pkgdown site setup
  • Official pkgdown site URL: https://pkgs.rstudio.com/tblcheck
  • Getting started vignette
  • Add RStudio as CPH
  • Clean up DESCRIPTION as needed (add URLs)
    • Add URLs
    • Remove LazyData field
    • Add config/needs fields as needed
  • All S3 methods need #' @export
  • README
    • Fix exercise chunk syntax in README (exercise = TRUE)
    • Add library(tibble) with a comment so the exercise works as-is
  • Final issues prior to release
  • Bump to version 0.1.0
  • Add NEWS with "initial github release" entry
  • Update rcmdcheck and pkgdown workflows to use the latest (see gradethis, ggcheck)

`vec_*_class()` ignores differences between `integer` and `numeric`

Currently, vec_*_class() is set to ignore differences between integer and numeric (double) data.

.result   <- 1
.solution <- 1L
vec_check_class() # No output

Created on 2021-10-14 by the reprex package (v2.0.1)

This is intended behavior, because the difference between integer data and numeric data is usually inconsequential (provided the values are the same). However, this means exercise authors cannot use tblcheck in cases where they want to enforce a distinction between integer and numeric.

Possible paths forward:

  1. Leave the implementation as is.
    • Pro: In the majority of exercises, the difference will not matter.
    • Con: This means tblcheck will be insufficient for cases where the difference does matter.
  2. Enforce the distinction when .solution is integer, but allow either data type when .solution is numeric
    • Pro: The distinction is more likely to matter if the intended solution is an integer.
    • Con: Authors may accidentally create an integer .solution when the distinction doesn't matter, e.g. by using x:y.
  3. Add an argument to specify if the distinction should be enforced.
    • Pro: Maximally clear.
    • Con: Clunky, and opens the door to an ever-growing number of arguments.
  4. Enforce the distinction in all instances.
    • Pro: Easy to understand.
    • Con: Authors will have to implement a workaround every time they do not care about this distinction, or students will have to deal with a lot of unnecessary failing grades.

I lean towards option 2.
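For discussion, option 2 could look roughly like the base R sketch below. The function name `class_matches()` is hypothetical, and the fallback for non-numeric classes is an assumption added only to make the example complete.

```r
# Hypothetical sketch of option 2 (not tblcheck code): enforce integer when
# the solution is integer, accept integer or double when it is double.
class_matches <- function(result, solution) {
  if (is.integer(solution)) {
    return(is.integer(result))
  }
  if (is.double(solution)) {
    return(is.numeric(result))
  }
  # Assumed fallback for all other classes: require identical class vectors.
  identical(class(result), class(solution))
}

double_for_integer <- class_matches(1, 1L)  # solution is integer
integer_for_double <- class_matches(1L, 1)  # solution is double
```

The con listed for option 2 is visible here: a solution written as `1:3` is an integer vector, so a student who writes `c(1, 2, 3)` would fail the check.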

Add check for rownames

library(tblcheck)
library(waldo)

.solution <- data.frame(a = 1:2, b = 3:4)
.result <- `rownames<-`(.solution, c("x", "y"))

waldo::compare(.result, .solution)
#> `attr(old, 'row.names')` is a character vector ('x', 'y')
#> `attr(new, 'row.names')` is an integer vector (1, 2)
tbl_check(.result, .solution)
# No output

Created on 2022-07-06 by the reprex package (v2.0.1)
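A row names check could be sketched in base R as follows. The function `check_rownames()` and the shape of the returned problem are hypothetical; they only illustrate the comparison that the reprex above shows is currently missing.

```r
# Hypothetical sketch (not tblcheck code): flag a problem when the row names
# of the result differ from those of the solution.
check_rownames <- function(result, solution) {
  if (identical(rownames(result), rownames(solution))) {
    return(NULL)
  }
  list(
    type = "rownames",
    expected = rownames(solution),
    actual = rownames(result)
  )
}

solution <- data.frame(a = 1:2, b = 3:4)
result <- `rownames<-`(solution, c("x", "y"))
problem <- check_rownames(result, solution)
```

Note that rownames() always returns a character vector for data frames, so automatic row names such as 1 and 2 are compared as "1" and "2".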

Reduce the magic in `return_if_graded()`

  • Add an argument that allows a calling function to prepend a class to the captured problem
  • ... will be data that is added into the captured problem

tblcheck/R/utils.R

Lines 60 to 75 in b288f98

return_if_problem <- function(problem, ..., envir = parent.frame()) {
  if (inherits(problem, "tblcheck_problem")) {
    dots <- list(...)
    if (length(dots)) {
      problem_prefix <- paste0(names(dots)[[length(dots)]], "_")
      assert_internally(checkmate::assert_string(problem_prefix))
      problem$type <- gsub("^(.*_)?", problem_prefix, problem$type)
      dots <- dots[!names(dots) %in% names(problem)]
      problem <- as.problem(c(problem, dots))
    }
    rlang::return_from(envir, problem)
  }
}

Splitting `tblcheck` functions

tblcheck functions currently do two major things:

  1. Detect the difference between a user's result and the solution.
  2. Return a grade with a message explaining the difference.

We're planning to split those tasks into two families of functions. The check_* functions will return a problem object detailing the difference between a user's result and the solution. That object may be handled by the tutorial author in their grading code.

An as of yet unnamed second family of functions will internally call the check_* functions and format and throw a gradethis grade based on the detected problem. These functions will be what tutorial authors should use if they do not need to modify the default handling of the problems.

Use S3 classes for problems

Use S3 classes to track the problem type and use class dispatch to find the correct tbl_message() function, e.g. tbl_message.names() for a "names" problem.

We might also want to either remove the $type field or have it track the outermost problem.
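A minimal sketch of what class-based dispatch could look like, in base R. All names here (`new_problem()`, `problem_message()`, the `missing` field, and the `<type>_problem` class naming) are hypothetical stand-ins chosen for the example, not the package's actual interface.

```r
# Hypothetical constructor: the problem type becomes the first S3 class, with
# "tblcheck_problem" as the shared parent class.
new_problem <- function(type, ...) {
  structure(
    list(type = type, ...),
    class = c(paste0(type, "_problem"), "tblcheck_problem")
  )
}

# Hypothetical generic: dispatch picks the message method from the class.
problem_message <- function(problem, ...) {
  UseMethod("problem_message")
}

problem_message.names_problem <- function(problem, ...) {
  paste0("Your table should have a column named `", problem$missing, "`.")
}

# Fallback method for problem types without a dedicated message.
problem_message.tblcheck_problem <- function(problem, ...) {
  "Your result is not what I expected."
}

names_message <- problem_message(new_problem("names", missing = "fruit"))
fallback_message <- problem_message(new_problem("values"))
```

With this layout the $type field duplicates the first class, which is one reason to consider dropping it or repurposing it for the outermost problem.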

Remove `return_fail` and `return_if_graded()`

return_fail() is redundant because it is only called as the last step of tbl_grade().

return_if_graded() is used because assert_internally() returns a neutral grade in the check function, not the grade function. As an improvement, we could change assert_internally() to return a grading_error problem that gets turned into a neutral grade by tbl_grade().

Use `vec_` prefix for vector checks

tbl_check_vector() and tbl_check_levels() have misleading names, because these functions check vectors, not tables. Replacing the tbl_ prefix with vec_ may be more intuitive.

`tbl_grade_class()` can give unhelpful messages for grouped and rowwise tables

tbl_grade_class() suggests using group_by(), ungroup(), or rowwise() whenever .result or .solution is a grouped or rowwise tibble, even if the other isn't even a data frame.

library(dplyr)
library(tblcheck)

.result <- 1:2
.solution <- tibble(a = 1, b = 2) %>% group_by(b)

tbl_grade_class()
#> <gradethis_graded: [Incorrect]
#>   Your table isn't a grouped data frame, but I was expecting it to be
#>   grouped. Maybe you need to use `group_by()`?
#> >

Created on 2022-08-08 by the reprex package (v2.0.1)

If the student failed to even create a data frame, failing to use group_by() is not the most actionable feedback.

`tbl_check(check_groups = FALSE)` still finds problem if one table is grouped and the other is not

Currently, the check_groups argument of tbl_check() determines whether the groups of two tables should be compared with tbl_check_groups(). But if one table is grouped and the other is not, this problem is detected with tbl_check_class(), not tbl_check_groups(), so the problem is still detected even if check_groups = FALSE.

To solve this, tbl_check_class() should gain an argument specifying classes to be ignored, and tbl_check() should set this argument to include grouped_df if check_groups = FALSE.

library(dplyr)
library(tblcheck)

.result <- mtcars %>% group_by(am)
.solution <- mtcars
tbl_check(check_groups = FALSE)
#> <tblcheck problem>
#> Your table is a grouped data frame, but I wasn't expecting it to be grouped. Maybe you need to use `ungroup()`?
#> $ type           : chr "class"
#> $ expected       : chr "data.frame"
#> $ actual         : chr [1:4] "grouped_df" "tbl_df" "tbl" "data.frame"
#> $ expected_length: int 11
#> $ actual_length  : int 11
#> $ location       : chr "table"

Created on 2022-02-01 by the reprex package (v2.0.1)

Should `check_table()` also provide passing grades

I realized that the checking functions check_table(), check_column(), and check_names() all return only failing grades. I think this makes sense for check_names() and check_column().

But for the larger, encompassing check_table() function, it also makes sense that it would return a passing grade if all of the internal checks pass. Currently, we return an invisible NULL...

ex <- gradethis::mock_this_exercise(
  .user_code = tibble::tibble(a = letters[1:3], b = a, c = a),
  .solution = tibble::tibble(a = letters[1:3], b = a, c = a)
)

grade <- gradethis::grade_this(tblcheck::check_table())(ex)

grade
#> NULL

But this means that using grading code with just check_table() will return an evaluation error.

```{r demo-check}
grade_this({
  check_table()
})
```
solution <- "tibble::tibble(a = letters[1:3], b = a, c = a)"

grade <- gradethis::gradethis_exercise_checker(
  label = "demo",
  user_code = solution, 
  solution_code = solution,
  check_code = "gradethis::grade_this(tblcheck::check_table())",
  envir_result = new.env(),
  envir_prep = new.env(),
  last_value = eval(parse(text = solution))
)
#> `demo-check` chunk did not mark an answer as correct or incorrect.Consider adding a `pass()` or `fail()` at the end of your `demo-check` code

cat(gradethis:::conditionMessage.gradethis_graded(grade))
#> [Incorrect]
#>   <p>A problem occurred with your teacher’s grading code. Defaulting to
#>   <em>incorrect</em>.</p>

It's not that hard to call pass() after check_table() but I wonder if people will assume that's what check_table() is going to do.

```{r demo-check}
grade_this({
  check_table()
  pass() # required!
})
```

We could...

  1. Give check_table() a final argument that defaults to NULL or pass()

    ex <- gradethis::mock_this_exercise(
      .user_code = tibble::tibble(a = letters[1:3], b = a, c = a),
      .solution = tibble::tibble(a = letters[1:3], b = a, c = a)
    )
    
    grader <- gradethis::grade_this(
      tblcheck::check_table(final = pass('Good job!'))
    ) 
    
    grader(ex)
    #> <gradethis_graded: [Correct] Good job!>
    
  2. Have check_table() pass automatically... but then would we need to expose the arguments of pass()?

  3. Change the name of the check_ functions to make it clearer that they only throw failures?

    • check_for_table_problems(), check_for_column_problems()
    • fail_if_table_problem()
    • ??

I think I'm leaning toward the first option.

Add a `select` argument to `tbl_grade()` with tidyselect semantics

pkgload::load_all("~/work/tblcheck")
#> ℹ Loading tblcheck

library(dplyr)

foods <- tibble(
  food = c("lettuce", "tomato"),
  vegetable = c(TRUE, FALSE),
  color = c("green", "red"),
  count = c(5, 3)
)

The example exercise is to turn count into pct.

.solution <-
  foods %>%
  mutate(pct = count / sum(count))

.solution
#> # A tibble: 2 × 5
#>   food    vegetable color count   pct
#>   <chr>   <lgl>     <chr> <dbl> <dbl>
#> 1 lettuce TRUE      green     5 0.625
#> 2 tomato  FALSE     red       3 0.375

A student might create a temp column for total.

.result <- 
  foods %>% 
  mutate(
    total = sum(count),
    pct = count / total
  )

How to ignore extra columns?

tbl_grade()
#> <gradethis_graded: [Incorrect]
#>   Your table should not have a column named `total`.
#> >
tbl_grade(check_names = FALSE)
#> <gradethis_graded: [Incorrect]
#>   Your table should have 5 columns, but it has 6 columns.
#> >
tbl_grade(check_names = FALSE, check_dimensions = FALSE)

You have to turn off name and dimension checking, but this feels a little excessive, especially since we also lose row-count checking (even though it’s picked up by the values check).

tbl_grade(.result[1, ], check_names = FALSE, check_dimensions = FALSE)
#> <gradethis_graded: [Incorrect]
#>   The first 2 values of your `food` column should be `lettuce` and
#>   `tomato`.
#> >

Without the column problem, we’d expect to learn that our table doesn’t have as many rows as expected.

.result %>% 
  slice(1) %>% 
  select(-total) %>% 
  tbl_grade(check_names = FALSE)
#> <gradethis_graded: [Incorrect]
#>   Your table should have 2 rows, but it has 1 row.
#> >

But if tbl_grade() had a select option with tidyselect semantics…

.result %>% 
  select(any_of(names(.solution))) %>% 
  tbl_grade()
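For comparison, the same column filtering can be sketched without tidyselect. The helper `keep_solution_columns()` is hypothetical and uses only base R, with plain data frames standing in for the tibbles above.

```r
# Hypothetical helper (not a tblcheck function): drop any columns from the
# result that the solution does not have, keeping the result's column order.
keep_solution_columns <- function(result, solution) {
  result[intersect(names(result), names(solution))]
}

solution <- data.frame(count = c(5, 3), pct = c(0.625, 0.375))
result <- data.frame(count = c(5, 3), total = c(8, 8), pct = c(0.625, 0.375))

filtered <- keep_solution_columns(result, solution)
```

Because extra columns are dropped before grading, the name, dimension, and value checks can all stay switched on.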

Should `check_table` check class?

By default, check_table() currently checks the class of each column of the table, but not the class of the table itself.

library(tblcheck)

.result   <- data.frame(a = 1:5, b = 6:10)
.solution <- tibble::tibble(a = 1:5, b = 6:10)
check_table(.result, .solution) # Passes

Created on 2021-07-27 by the reprex package (v2.0.0)

Should check_table() require that the table has the expected class?

Provide complete `grade_this_table()` and `grade_this_vector()` functions

If we're going to remove the idea of inconsequential differences as in #76, I think we should go all the way and even provide a grade_this_table() or grade_this_vector() function.

Following up on the comment in #71, it would look something like

grade_this_table <- function(
  expr,
  correct = "Message when correct",
  incorrect = "Message when incorrect",
  ...,
  # all the arguments from tbl_grade_table() except object/expected
  max_diffs = 3,
  check_class = TRUE,
  check_names = TRUE,
  check_dimensions = TRUE,
  check_groups = TRUE,
  check_columns = TRUE,
  check_column_class = check_columns,
  check_column_values = check_columns
) {
  # expr lets users add extra checks or grading code
  
  # ... prep check env and things
  
  function(check_env) {
    grade_this({
      # pass immediately if they're *exactly* the same
      pass_if_equal(correct)
      
      # check for problems with the table
      tbl_grade_table(
        # fill in values from grade_this_table()
      )
      
      # evaluate extra checks in `expr` here
      
      # finally, fail()
      fail(incorrect)
    })
  }
}

Provide problem helper functions

problem_type(problem)
## "column_class"
is_problem(problem)
## TRUE
is_problem(problem, "column_class")
## TRUE

is_problem(problem, "names")
## FALSE
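The helpers proposed above could be implemented along these lines. This is only a sketch: the `_sketch` suffix marks the functions as hypothetical, and it assumes that a problem is a list with a `type` field and the class "tblcheck_problem".

```r
# Hypothetical implementations of the proposed helpers (not tblcheck code).
problem_type_sketch <- function(problem) {
  problem$type
}

is_problem_sketch <- function(x, type = NULL) {
  if (!inherits(x, "tblcheck_problem")) {
    return(FALSE)
  }
  if (is.null(type)) {
    return(TRUE)
  }
  identical(x$type, type)
}

# Assumed problem structure for the example.
problem <- structure(list(type = "column_class"), class = "tblcheck_problem")

any_problem <- is_problem_sketch(problem)
column_class_problem <- is_problem_sketch(problem, "column_class")
names_problem <- is_problem_sketch(problem, "names")
```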

`vec_check_values()` fails when column contains missing value

pkgload::load_all()
#> ℹ Loading tblcheck

solution <- result <- c(1, 2, NA)

vec_check_values(
  object = result,
  expected = solution
)
#> Error in if (!all(vctrs::vec_equal(object, expected))) {: missing value where TRUE/FALSE needed

The error comes from vctrs::vec_equal()

vctrs::vec_equal(result, solution)
#> [1] TRUE TRUE   NA

where we need to include na_equal = TRUE

vctrs::vec_equal(result, solution, na_equal = TRUE)
#> [1] TRUE TRUE TRUE
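For readers without vctrs at hand, the same idea can be sketched in base R. The helper `equal_with_na()` is hypothetical; it treats two missing values in the same position as equal, which is the behavior that `na_equal = TRUE` is used for above.

```r
# Hypothetical base R analogue (not tblcheck code): element-wise equality
# where two NAs in the same position count as equal.
equal_with_na <- function(x, y) {
  both_missing <- is.na(x) & is.na(y)
  same_value <- x == y
  same_value[is.na(same_value)] <- FALSE # a lone NA never matches a value
  same_value | both_missing
}

solution <- result <- c(1, 2, NA)
all_equal <- all(equal_with_na(result, solution))
```

Since the helper never returns NA, its result is safe to use inside an if() condition.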

Include actual and expected levels in message for levels problem

Your geographic_grouping column’s levels were not in the expected order. The first 3 levels of your geographic_grouping column should be North America, Oceania, and Europe.

Since levels aren't included in the default print method for tables, it's hard to know what order the levels are in.
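A message that shows both orders might be built roughly as follows. The helper `describe_level_order()` and its wording are hypothetical; the sketch only shows that levels() provides both the actual and the expected order.

```r
# Hypothetical helper (not a tblcheck function): show the first few levels of
# the student's factor next to the first few expected levels.
describe_level_order <- function(result, solution, max_levels = 3) {
  quote_levels <- function(x) {
    paste0("`", utils::head(levels(x), max_levels), "`", collapse = ", ")
  }
  paste0(
    "The first levels of your column are ", quote_levels(result),
    ", but I expected ", quote_levels(solution), "."
  )
}

regions <- c("Europe", "Oceania", "North America")
solution <- factor(regions, levels = c("North America", "Oceania", "Europe"))
result <- factor(regions, levels = c("Europe", "North America", "Oceania"))

feedback <- describe_level_order(result, solution)
```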

Rename all-in-one grading functions

While working on editing the tblcheck vignette, it occurred to me that the API might be a little bit cleaner with slightly different naming for the all-in-one functions.

My proposal is that we would rename

  • tbl_grade_table() to tbl_grade() for all-in-one table grading
    • Also tbl_check_table() to tbl_check()
  • vec_grade_vector() to vec_grade() for all-in-one vector grading
    • Also vec_check_vector() to vec_check()
  • tbl_grade() to ??? to return a grade from a problem object
    • tblcheck_grade() (my current favorite)
    • grade_problem()
    • problem_grade()
    • as_grade()
    • gradethis_problem()
    • ...

There are two major reasons why I think this change is warranted:

  1. This makes the tbl_ and vec_ function naming cleaner: it's easier to write tbl_grade, and all specific functions under tbl_grade are extensions of this prefix. The structure is then consistently [thing]_[action](_[subset])?

  2. Currently tbl_grade() gets in the way of autocomplete. In the autocomplete menu in the screenshot below, it would be much better for tbl_grade() to be the all-in-one function with additional, more specific checks below. Currently, tbl_grade() somewhat confusingly comes first.

    [Screenshot of the autocomplete menu]

Standardize problem types

Since there's overlap between the problem type and the problem classes, we should standardize the problem names. Ideally we'd be able to identify the check function that creates a problem by inspecting the class name.

For example, the vec_check_levels() function now returns problems with type

  • n_levels
  • levels
  • level_order_diffs
  • level_order

but if all problem types started with levels_ we'd be able to identify the underlying check function more easily.

  • levels_n
  • levels
  • levels_order_diffs
  • levels_order

We should be careful to ensure that problem types from tbl_check_ and vec_check_ functions don't overlap unless it's unavoidable.
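With a shared prefix, recovering the check family from a problem type becomes a one-line string operation. The helper `check_family()` below is hypothetical, and it assumes the family name itself contains no underscore.

```r
# Hypothetical helper (not a tblcheck function): take everything before the
# first underscore as the name of the check family.
check_family <- function(type) {
  sub("_.*$", "", type)
}

types <- c("levels_n", "levels", "levels_order_diffs", "levels_order")
families <- check_family(types)
```

The underscore assumption would not hold for a type such as column_class, so a real scheme would need either a different separator or a lookup table.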

`tbl_check_names()` does not give feedback for incorrect order

tbl_check_names() successfully detects a names_problem if the order of names is incorrect, but does not fill actual and expected, creating a failing grade with no feedback.

library(dplyr)
library(tblcheck)

.result <- starwars
.solution <- starwars %>% relocate(species, .before = 1)
tbl_grade_table()
#> <gradethis_graded: [Incorrect] >

Created on 2021-09-10 by the reprex package (v2.0.1)

General checking functions should only rely on specific checking functions

We should ensure that the three general check functions don't actually have any checking logic and instead simply coordinate calls to the specific checking functions.

In conjunction with #51 this will make it easier to navigate the package source: the problem type identifies the checking function where the message formatting methods can also be found.

The three general check functions are

  • tbl_check_table()
  • tbl_check_column()
  • vec_check_vector()

The biggest change will be to vec_check_vector(), which contains logic for checking the vector values. tbl_check_column() includes a small check for a missing column; even though it's a small check, it'd be worth moving to its own function. I think tbl_check_table() already follows this principle of separation.

Indicate that we're only showing a subset of incorrect values

In the following example, there are 5 unequal values in the column being checked.

pkgload::load_all()
#> ℹ Loading tblcheck

.result <- tibble::tibble(x = c(1:5, runif(5)))
.solution <- tibble::tibble(x = c(1:5, runif(5)))

But the values problem message only shows us 3 values and doesn't give any indication that there are more issues than these three.

tbl_grade_column("x")
#> <gradethis_graded: [Incorrect]
#>   I didn't expect your `x` column to include the values `0.2423`,
#>   `0.1420`, and `0.0605`.
#> >

We should either:

  1. Show the number of total errors, e.g.

    Your x column contains 5 unexpected values including 0.2423, 0.1420, and 0.0605.

  2. Give a hint that there are more errors, e.g.

    I didn't expect your x column to include the values 0.2423, 0.1420, and 0.0605 and 2 more.

  3. Both?

    Your x column contains 5 unexpected values including 0.2423, 0.1420, and 0.0605 and 2 more.
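The third variant could be assembled along these lines. The helper `describe_unexpected_values()` is hypothetical, its `max_diffs` argument only mirrors the name used elsewhere in these issues, and the exact wording is a placeholder.

```r
# Hypothetical helper (not a tblcheck function): report the total number of
# unexpected values, show the first few, and say how many were left out.
describe_unexpected_values <- function(values, column, max_diffs = 3) {
  shown <- utils::head(values, max_diffs)
  n_hidden <- length(values) - length(shown)
  shown_text <- paste0("`", shown, "`", collapse = ", ")
  message_text <- paste0(
    "Your `", column, "` column contains ", length(values),
    " unexpected values, including ", shown_text
  )
  if (n_hidden > 0) {
    message_text <- paste0(message_text, " and ", n_hidden, " more")
  }
  paste0(message_text, ".")
}

unexpected <- c(0.2423, 0.142, 0.0605, 0.5, 0.9)
feedback <- describe_unexpected_values(unexpected, "x")
```

When the number of unexpected values does not exceed `max_diffs`, the "and n more" suffix is dropped, so short lists read naturally.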

Make `pre_check` the first argument of `grade_this_table()`?

Should grade_this_table() take pre_check as its first argument (and post_check as its second)? I would imagine changing pre_check code is more common than the correct message, and it would enable slightly more compact code, e.g.

grade_this_table({
  tbl_grade_is_table(.result)
  .result <- .result[intersect(names(.result), names(.solution))]
})

What do you think @gadenbuie?
