rstudio / tblcheck

Checks for grading tabular data

Home Page: https://pkgs.rstudio.com/tblcheck

License: Other

R 99.95% CSS 0.05%

learnr gradethis autograding tables tibbles

tblcheck's Issues

Code hints not showing next to fail messages from `tbl_grade_table()`

I've noticed that fail messages returned by tblcheck::tbl_grade_table() don't return code hints even when fail.hint = TRUE inside of gradethis_setup().

For example, code hints are turned on for the exercise shown below, but the fail message does not include the expected code hint appended to the end:

Expected code hint to be included with the fail message:

Perhaps I'm missing something here, but since tbl_grade_table() returns a gradethis::fail() message, shouldn't it inherit the gradethis.fail.hint option specified in gradethis_setup()?

cc @rossellhayes @gadenbuie

Here is the code used to generate the example exercise above:

---
title: "tblcheck reprex"
output:
  learnr::tutorial:
    progressive: true
    allow_skip: true
    df_print: paged
    css: https://rsacdn.link/teachthis/css/style.css
runtime: shiny_prerendered
---

```{r setup, include=FALSE}
library(academyDatasets)
library(learnr)
library(gradethis)
library(tblcheck) # For table-based exercises. Remove if not needed

gradethis_setup(
  fail.hint = TRUE,      # Give a code feedback hint on all fail*()? Remove if not desired
  fail.encourage = TRUE, # Encourage on all fail*()
  pass.praise = TRUE,    # Praise on all pass*()
  # All hints start on a new line, except for the default fail() message
  maybe_code_feedback.before = "\n\n",
  fail = "Incorrect.{maybe_code_feedback(before = ' ')} {random_encouragement()}"
)

# Exercise Packages and Global Setup ----
library(dplyr)
```

Select the `height` column from `starwars`.

```{r tblcheck-reprex, exercise = TRUE}

```

```{r tblcheck-reprex-solution}
starwars %>% select(height)
```

```{r tblcheck-reprex-check}
grade_this({
  pass_if_equal()
  
  tbl_grade_table()
  
  pass()
})
```

I used the most recent available versions of gradethis (SHA1 e9b10ae3) and tblcheck (SHA1 b7b821f).

`vec_check_values()` fails when column contains missing value

pkgload::load_all()
#> ℹ Loading tblcheck

solution <- result <- c(1, 2, NA)

vec_check_values(
  object = result,
  expected = solution
)
#> Error in if (!all(vctrs::vec_equal(object, expected))) {: missing value where TRUE/FALSE needed

The error comes from vctrs::vec_equal()

vctrs::vec_equal(result, solution)
#> [1] TRUE TRUE   NA

where we need to include na_equal = TRUE

vctrs::vec_equal(result, solution, na_equal = TRUE)
#> [1] TRUE TRUE TRUE
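
The same failure mode can be reproduced without tblcheck or vctrs. The sketch below uses only base R; `equal_with_na()` is a hypothetical helper written for illustration (it is not part of either package) and only approximates the effect of `na_equal = TRUE` for simple atomic vectors.

```r
result <- solution <- c(1, 2, NA)

# Element-wise `==` propagates NA, so all() returns NA and `if (NA)` errors.
naive <- all(result == solution)
is.na(naive)

# Hypothetical helper: treat two missing values as equal to each other.
equal_with_na <- function(x, y) {
  both_na <- is.na(x) & is.na(y)
  same_value <- !is.na(x) & !is.na(y) & x == y
  both_na | same_value
}

all(equal_with_na(result, solution))
```

A value that is missing on only one side still counts as a difference under this helper.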

Indicate that we're only showing a subset of incorrect values

In the following example, there are 5 unequal values in the column being checked.

pkgload::load_all()
#> ℹ Loading tblcheck

.result <- tibble::tibble(x = c(1:5, runif(5)))
.solution <- tibble::tibble(x = c(1:5, runif(5)))

But the values problem message only shows us 3 values and doesn't give any indication that there are more issues than these three.

tbl_grade_column("x")
#> <gradethis_graded: [Incorrect]
#>   I didn't expect your `x` column to include the values `0.2423`,
#>   `0.1420`, and `0.0605`.
#> >

We should either:

1. Show the number of total errors, e.g.
   Your x column contains 5 unexpected values including 0.2423, 0.1420, and 0.0605.
2. Give a hint that there are more errors, e.g.
   I didn't expect your x column to include the values 0.2423, 0.1420, and 0.0605 and 2 more.
3. Both?
   Your x column contains 5 unexpected values including 0.2423, 0.1420, and 0.0605 and 2 more.
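
One way the combined wording could be produced is sketched below. `describe_unexpected()` and its arguments are hypothetical names used for illustration; this is not how tblcheck builds its messages.

```r
# Hypothetical message builder; not tblcheck's actual implementation.
describe_unexpected <- function(unexpected, column, max_diffs = 3) {
  n_total <- length(unexpected)
  shown <- paste0("`", head(unexpected, max_diffs), "`")
  n_more <- n_total - length(shown)

  message_text <- sprintf(
    "Your `%s` column contains %d unexpected %s, including %s",
    column, n_total,
    if (n_total == 1) "value" else "values",
    paste(shown, collapse = ", ")
  )

  # Only mention the remainder when values were actually left out.
  if (n_more > 0) {
    message_text <- paste0(message_text, ", and ", n_more, " more")
  }
  paste0(message_text, ".")
}

describe_unexpected(c("a", "b", "c", "d", "e"), column = "x")
```

Because the total is computed before truncation, the message stays accurate whatever `max_diffs` is set to.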

Add examples to documentation for all exported functions

Required for CRAN and generally good practice

Remove `return_fail` and `return_if_graded()`

return_fail() is redundant because it is only called as the last step of tbl_grade().

return_if_graded() is used because assert_internally() returns a neutral grade in the check function, not the grade function. As an improvement, we could change assert_internally() to return a grading_error problem that gets turned into a neutral grade by tbl_grade().

`tbl_check(check_groups = FALSE)` still finds problem if one table is grouped and the other is not

Currently, the check_groups argument of tbl_check() determines whether the groups of two tables should be compared with tbl_check_groups(). But if one table is grouped and the other is not, this problem is detected with tbl_check_class(), not tbl_check_groups(), so the problem is still detected even if check_groups = FALSE.

To solve this, tbl_check_class() should gain an argument specifying classes to be ignored, and tbl_check() should set this argument to include grouped_df if check_groups = FALSE.

library(dplyr)
library(tblcheck)

.result <- mtcars %>% group_by(am)
.solution <- mtcars
tbl_check(check_groups = FALSE)
#> <tblcheck problem>
#> Your table is a grouped data frame, but I wasn't expecting it to be grouped. Maybe you need to use `ungroup()`?
#> $ type           : chr "class"
#> $ expected       : chr "data.frame"
#> $ actual         : chr [1:4] "grouped_df" "tbl_df" "tbl" "data.frame"
#> $ expected_length: int 11
#> $ actual_length  : int 11
#> $ location       : chr "table"

Created on 2022-02-01 by the reprex package (v2.0.1)
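
The proposed "classes to be ignored" argument could work roughly as in the sketch below. `classes_match()` and `ignore_class` are hypothetical names, and the example compares a grouped tibble with an ungrouped tibble; how a tibble should compare with a plain data frame is a separate question that this sketch does not address.

```r
# Hypothetical helper: compare class vectors after dropping ignored classes.
classes_match <- function(actual, expected, ignore_class = character()) {
  identical(
    setdiff(actual, ignore_class),
    setdiff(expected, ignore_class)
  )
}

grouped_classes <- c("grouped_df", "tbl_df", "tbl", "data.frame")
tibble_classes <- c("tbl_df", "tbl", "data.frame")

classes_match(grouped_classes, tibble_classes)
classes_match(grouped_classes, tibble_classes, ignore_class = "grouped_df")
```

Under this scheme, tbl_check() would pass "grouped_df" as the ignored class whenever check_groups = FALSE.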

Reduce the magic in `return_if_graded()`

- Add an argument that allows a calling function to prepend a class to the captured problem
- `...` will be data that is added into the captured problem

tblcheck/R/utils.R

Lines 60 to 75 in b288f98

return_if_problem <- function(problem, ..., envir = parent.frame()) {
  if (inherits(problem, "tblcheck_problem")) {
    dots <- list(...)

    if (length(dots)) {
      problem_prefix <- paste0(names(dots)[[length(dots)]], "_")
      assert_internally(checkmate::assert_string(problem_prefix))
      problem$type <- gsub("^(.*_)?", problem_prefix, problem$type)

      dots <- dots[!names(dots) %in% names(problem)]
      problem <- as.problem(c(problem, dots))
    }

    rlang::return_from(envir, problem)
  }
}
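
Part of the "magic" is the renaming step, which can be tried in isolation. The sketch below reuses the same regular expression in a hypothetical wrapper, `retype()`, which is not a tblcheck function.

```r
# Same pattern as in return_if_problem(): an optional existing prefix
# (everything up to the last underscore) is replaced by the new prefix.
retype <- function(type, prefix) {
  gsub("^(.*_)?", paste0(prefix, "_"), type)
}

retype("class", "column")        # "column_class"
retype("table_class", "column")  # "column_class"
```

The prefix itself comes from the name of the last argument passed through `...`, which is easy to miss when reading a call site.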

Include actual and expected levels in message for levels problem

Your geographic_grouping column’s levels were not in the expected order. The first 3 levels of your geographic_grouping column should be North America, Oceania, and Europe.

Since levels aren't included in the default print method for tables, it's hard to know what order the levels are in.

Create interface to add messages for new classes

Add an interface for adding methods to hinted_class_message_list() and friendly_class_message_list() from outside tblcheck.

Possible solutions:

Check for additional message formats in options().
Convert message generators into R7 generics.
- This would involve refactoring the class problem to hold the entire object and expected objects, not just their class and length.

Vector `length` problem does not produce a message if there are no unique missing values

library(tblcheck)

.result   <- c(3, 4, 3)
.solution <- c(3, 4, 3, 4, 3)
vec_grade_length()
#> <gradethis_graded: [Incorrect] >

Created on 2021-12-03 by the reprex package (v2.0.1)

Refactor `max_diffs` argument to be handled by the grade/message functions only

Currently the max_diffs arg in grade_ functions goes to both tbl_grade() and the underlying checking functions.

We should restructure this so that the max_diffs logic is handled by the checking function only. Ideally, this would remove the need for separate problem classes when the max_diffs argument was included. Once the problem is created, we should take it at face value, other than in the generic case of tblcheck_problem.

If we need to think of an approach for comparing large actual and expected items, we could investigate using the output from waldo::compare() or using the diffobj package directly. Worst case, we could use options(max.print) for an upper limit.

`tbl_grade_class()` can give unhelpful messages for grouped and rowwise tables

tbl_grade_class() suggests using group_by(), ungroup(), or rowwise() whenever .result or .solution is a grouped or rowwise tibble, even if the other isn't even a data frame.

library(dplyr)
library(tblcheck)

.result <- 1:2
.solution <- tibble(a = 1, b = 2) %>% group_by(b)

tbl_grade_class()
#> <gradethis_graded: [Incorrect]
#>   Your table isn't a grouped data frame, but I was expecting it to be
#>   grouped. Maybe you need to use `group_by()`?
#> >

Created on 2022-08-08 by the reprex package (v2.0.1)

If the student failed to even create a data frame, failing to use group_by() is not the most actionable feedback.
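
A possible ordering of the checks is sketched below: report "not a data frame" first and only then comment on grouping. `class_feedback()` and its messages are hypothetical, and the grouped tibble is imitated with a hand-built class vector so that the sketch needs only base R.

```r
# Hypothetical ordering of checks; the messages are illustrative only.
class_feedback <- function(result, solution) {
  if (is.data.frame(solution) && !is.data.frame(result)) {
    return("Your result should be a table, but it isn't a data frame.")
  }
  if (inherits(solution, "grouped_df") && !inherits(result, "grouped_df")) {
    return("Your table isn't grouped. Maybe you need to use `group_by()`?")
  }
  NULL
}

# Stand-in for a grouped tibble, built by hand so the sketch needs no dplyr.
solution <- structure(
  data.frame(a = 1, b = 2),
  class = c("grouped_df", "tbl_df", "tbl", "data.frame")
)

class_feedback(1:2, solution)
```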

Align vignette with the authoring experience

Using knitr hooks and custom CSS to make the examples align with the R Markdown source that will be used by exercise authors.

tblcheck should return grades in addition to using gradethis-signalling

The check_*() functions should return grades and exit early even when not inside a gradethis context.

Values problems should limit the significant digits of numbers printed in the problem message

pkgload::load_all()
#> ℹ Loading tblcheck

vec <- c(100, 200, 300, 400)

vec_check_values(vec * 1.23, vec / 1.23)
#> <tblcheck problem>
#> The first 3 values of your result should be `81.3008130081301`, `162.60162601626`, and `243.90243902439`, not `123`, `246`, and `369`.
#> $ type    : chr "values"
#> $ expected: num [1:4] 81.3 162.6 243.9 325.2
#> $ actual  : num [1:4] 123 246 369 492

Make `tblcheck` messages translatable

Investigate using `cli` to provide friendly class names in `tbl_check_class()` messages

In r-lib/cli#463, cli introduced an interface for generating user-friendly object types. Could we use this in generating messages from tbl_check_class()?

`tbl_check_names()` does not give feedback for incorrect order

tbl_check_names() successfully detects a names_problem if the order of names is incorrect, but does not fill actual and expected, creating a failing grade with no feedback.

library(dplyr)
library(tblcheck)

.result <- starwars
.solution <- starwars %>% relocate(species, .before = 1)
tbl_grade_table()
#> <gradethis_graded: [Incorrect] >

Created on 2021-09-10 by the reprex package (v2.0.1)

Standardize problem types

Since there's overlap between the problem type and the problem classes, we should standardize the problem names. Ideally we'd be able to identify the check function that creates a problem by inspecting the class name.

For example, the vec_check_levels() function now returns problems with type

n_levels
levels
level_order_diffs
level_order

but if all problem types started with levels_ we'd be able to identify the underlying check function more easily.

levels_n
levels
levels_order_diffs
levels_order

We should be careful to ensure that problem types from tbl_check_ and vec_check_ functions don't overlap unless it's unavoidable.
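
With the proposed names, the originating check could be recovered from the type alone. The sketch below assumes the standardized scheme from this issue; `check_function_for()` is a hypothetical helper.

```r
# Assumes the proposed naming scheme, where every type starts with the
# name of the check that produced it.
check_function_for <- function(type, family = "vec") {
  check <- sub("_.*$", "", type)
  paste0(family, "_check_", check, "()")
}

types <- c("levels_n", "levels", "levels_order_diffs", "levels_order")
unique(check_function_for(types))
```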

Add check for rownames

library(tblcheck)
library(waldo)

.solution <- data.frame(a = 1:2, b = 3:4)
.result <- `rownames<-`(.solution, c("x", "y"))

waldo::compare(.result, .solution)
#> `attr(old, 'row.names')` is a character vector ('x', 'y')
#> `attr(new, 'row.names')` is an integer vector (1, 2)
tbl_check(.result, .solution)
# No output

Created on 2022-07-06 by the reprex package (v2.0.1)
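
A row name check could be as small as the base R sketch below. `check_rownames()` and the "rownames" problem type are hypothetical; the returned list only imitates the shape of the problem objects shown elsewhere in these issues.

```r
# Hypothetical check: returns NULL when row names match, otherwise a
# list describing the difference.
check_rownames <- function(object, expected) {
  if (identical(rownames(object), rownames(expected))) {
    return(NULL)
  }
  list(
    type = "rownames",
    expected = rownames(expected),
    actual = rownames(object)
  )
}

solution <- data.frame(a = 1:2, b = 3:4)
result <- `rownames<-`(solution, c("x", "y"))

str(check_rownames(result, solution))
```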

General checking functions should only rely on specific checking functions

We should ensure that the three general check functions don't actually have any checking logic and instead simply coordinate calls to the specific checking functions.

In conjunction with #51 this will make it easier to navigate the package source: the problem type identifies the checking function where the message formatting methods can also be found.

The three general check functions are

tbl_check_table()
tbl_check_column()
vec_check_vector()

The biggest change will be to vec_check_vector(), which contains logic for checking the vector values. tbl_check_column() includes a small check for a missing column; even though it's a small check, it'd be worth moving to its own function. I think tbl_check_table() already follows this principle of separation.

Splitting `tblcheck` functions

tblcheck functions currently do two major things:

Detect the difference between a user's result and the solution.
Return a grade with a message explaining the difference.

We're planning to split those tasks into two families of functions. The check_* functions will return a problem object detailing the difference between a user's result and the solution. That object may be handled by the tutorial author in their grading code.

An as-yet-unnamed second family of functions will internally call the check_* functions, then format and throw a gradethis grade based on the detected problem. These are the functions tutorial authors should use if they do not need to modify the default handling of the problems.

Use S3 classes for problems

Use S3 classes to track the problem type and use class dispatch to find the correct tbl_message() function, e.g. tbl_message.names() for a "names" problem.

We might also want to either remove the $type field or have it track the outermost problem.
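
The dispatch idea can be sketched in base R as follows. The constructor, the generic `problem_message()`, the `names_problem` class and the `missing` field are all hypothetical stand-ins for whatever names the package settles on.

```r
# Hypothetical constructor: the problem type doubles as an S3 class.
new_problem <- function(type, ...) {
  structure(
    list(type = type, ...),
    class = c(paste0(type, "_problem"), "tblcheck_problem")
  )
}

# Generic plus methods; dispatch picks the message for the problem type.
problem_message <- function(problem, ...) UseMethod("problem_message")

problem_message.names_problem <- function(problem, ...) {
  paste0("Your table should have a column named `", problem$missing, "`.")
}

# Fallback for problem types without a dedicated method.
problem_message.tblcheck_problem <- function(problem, ...) {
  "Your result isn't what I expected."
}

problem_message(new_problem("names", missing = "height"))
problem_message(new_problem("dimensions"))
```

Keeping a shared parent class gives every problem a fallback message, so a new problem type never produces an empty grade.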

Use `vec_` prefix for vector checks

tbl_check_vector() and tbl_check_levels() have misleading names, because these functions check vectors, not tables. Replacing the tbl_ prefix with vec_ may be more intuitive.

Use `expect_snapshot()` for message testing

`tbl_grade_class()` calls anything with length > 1 a vector

tbl_grade_class() should describe things as objects unless they are truly vectors of length > 1.

library(tblcheck)

.result <- 1
.solution <- lm(mpg ~ wt, data = mtcars)
tbl_grade_class()
#> <gradethis_graded: [Incorrect]
#>   Your result should be a vector with class `lm`, but it is a number
#>   (class `numeric`).
#> >

Created on 2022-08-09 by the reprex package (v2.0.1)
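
The rule described above could be expressed as in this sketch, where `describe_kind()` is a hypothetical helper and "vector" is reserved for atomic values with more than one element.

```r
# Hypothetical wording helper based on the rule described above.
describe_kind <- function(x) {
  if (is.atomic(x) && length(x) > 1) "vector" else "object"
}

describe_kind(c(1, 2, 3))
describe_kind(lm(mpg ~ wt, data = mtcars))
```

An `lm` fit is a list with many elements, so a length test alone would label it a vector; adding `is.atomic()` avoids that.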

Add `vec_` aliases for functions that can apply to both tables and vectors

Add a `select` argument to `tbl_grade()` with tidyselect semantics

pkgload::load_all("~/work/tblcheck")
#> ℹ Loading tblcheck

library(dplyr)

foods <- tibble(
  food = c("lettuce", "tomato"),
  vegetable = c(TRUE, FALSE),
  color = c("green", "red"),
  count = c(5, 3)
)

The example exercise is to turn count into pct.

.solution <-
  foods %>%
  mutate(pct = count / sum(count))

.solution
#> # A tibble: 2 × 5
#>   food    vegetable color count   pct
#>   <chr>   <lgl>     <chr> <dbl> <dbl>
#> 1 lettuce TRUE      green     5 0.625
#> 2 tomato  FALSE     red       3 0.375

A student might create a temp column for total.

.result <- 
  foods %>% 
  mutate(
    total = sum(count),
    pct = count / total
  )

How to ignore extra columns?

tbl_grade()
#> <gradethis_graded: [Incorrect]
#>   Your table should not have a column named `total`.
#> >
tbl_grade(check_names = FALSE)
#> <gradethis_graded: [Incorrect]
#>   Your table should have 5 columns, but it has 6 columns.
#> >
tbl_grade(check_names = FALSE, check_dimensions = FALSE)

You have to turn off name and dimension checking, but this feels a little excessive, especially since we also lose row-count checking (even though it’s picked up by the values check).

tbl_grade(.result[1, ], check_names = FALSE, check_dimensions = FALSE)
#> <gradethis_graded: [Incorrect]
#>   The first 2 values of your `food` column should be `lettuce` and
#>   `tomato`.
#> >

Without the column problem, we’d expect to learn that our table doesn’t have as many rows as expected.

.result %>% 
  slice(1) %>% 
  select(-total) %>% 
  tbl_grade(check_names = FALSE)
#> <gradethis_graded: [Incorrect]
#>   Your table should have 2 rows, but it has 1 row.
#> >

But if tbl_grade() had a select option with tidyselect semantics…

.result %>% 
  select(any_of(names(.solution))) %>% 
  tbl_grade()
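
Until such an argument exists, the same trimming can be done before grading. The sketch below is a base R analogue of the `select(any_of(...))` pipeline, using plain data frames and a reduced version of the foods table; it does not call tblcheck at all.

```r
foods <- data.frame(
  food = c("lettuce", "tomato"),
  count = c(5, 3)
)

solution <- transform(foods, pct = count / sum(count))

# A student's result with an extra helper column.
result <- transform(foods, total = sum(count))
result$pct <- result$count / result$total

# Keep only the columns that also appear in the solution, in result order.
trimmed <- result[intersect(names(result), names(solution))]

names(trimmed)
all.equal(trimmed, solution)
```

Columns missing from the result are simply absent after trimming, so a later names check would still report them.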

`vec_*_class()` ignores differences between `integer` and `numeric`

Currently, vec_*_class() is set to ignore differences between integer and numeric (double) data.

.result   <- 1
.solution <- 1L
vec_check_class() # No output

Created on 2021-10-14 by the reprex package (v2.0.1)

This is intended behavior, because the difference between integer data and numeric data is usually inconsequential (provided the values are the same). However, this means exercise authors cannot use tblcheck in cases where they want to enforce a distinction between integer and numeric.

Possible paths forward:

1. Leave the implementation as is.
   - Pro: In the majority of exercises, the difference will not matter.
   - Con: This means tblcheck will be insufficient for cases where the difference does matter.
2. Enforce the distinction when .solution is integer, but allow either data type when .solution is numeric.
   - Pro: The distinction is more likely to matter if the intended solution is an integer.
   - Con: Authors may accidentally create an integer .solution when the distinction doesn't matter, e.g. by using x:y.
3. Add an argument to specify if the distinction should be enforced.
   - Pro: Maximally clear.
   - Con: Clunky, and opens the door to an ever-growing number of arguments.
4. Enforce the distinction in all instances.
   - Pro: Easy to understand.
   - Con: Authors will have to implement a workaround every time they do not care about this distinction, or students will have to deal with a lot of unnecessary failing grades.

I lean towards option 2.
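
The rule of enforcing the distinction only when .solution is integer could look like the sketch below. `numeric_class_matches()` is a hypothetical helper and covers only the integer/double case, not class checking in general.

```r
# Hypothetical rule: be strict only when the solution is an integer.
numeric_class_matches <- function(result, solution) {
  if (is.integer(solution)) {
    is.integer(result)
  } else {
    is.numeric(result)
  }
}

numeric_class_matches(1L, 1)   # integer result, double solution
numeric_class_matches(1, 1L)   # double result, integer solution

# The drawback noted above: `x:y` silently creates an integer solution.
is.integer(1:5)
```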

Provide complete `grade_this_table()` and `grade_this_vector()` functions

If we're going to remove the idea of inconsequential differences as in #76, I think we should go all the way and even provide a grade_this_table() or grade_this_vector() function.

Following up with the comment #71 (comment) it would look something like

grade_this_table <- function(
  expr,
  correct = "Message when correct",
  incorrect = "Message when incorrect",
  ...,
  # all the arguments from tbl_grade_table() except object/expected
  max_diffs = 3,
  check_class = TRUE,
  check_names = TRUE,
  check_dimensions = TRUE,
  check_groups = TRUE,
  check_columns = TRUE,
  check_column_class = check_columns,
  check_column_values = check_columns
) {
  # expr lets users add extra checks or grading code
  
  # ... prep check env and things
  
  function(check_env) {
    grade_this({
      # pass immediately if they're *exactly* the same
      pass_if_equal(correct)
      
      # check for problems with the table
      tbl_grade_table(
        # fill in values from grade_this_table()
      )
      
      # evaluate extra checks in `expr` here
      
      # finally, fail()
      fail(incorrect)
    })
  }
}

Add pkgdown site

Using the same theme as gradethis

make messages more informative, e.g. `vec_grade_vector()`

I love the swiss army knife approach of vec_grade_vector(), but the messages are not as customized to the student's code as they could be.

Here are some examples

library(tblcheck)
.result <- c("Austin", "Sacramento", "Denver", "Dayona Beach", "Concord")
.solution <- c("Austin", "Sacramento", "Denver", "Daytona Beach", "Concord")
vec_grade_vector()

I'd like something that mentions the student's code and points them to what went wrong, e.g.: I didn't expect your result to contain the value Dayona Beach.

.result <- c("Austin", "Sacramento", "Denver", "Daytona Beach", "Concord", "Boston")
.solution <- c("Austin", "Sacramento", "Denver", "Daytona Beach", "Concord")
vec_grade_vector()

I'd like to identify the offending value and suggest a remedy: It looks like you added a value that is not in the solution, "Boston". Try removing it.

Why is this a problem?

The messages feel like R error messages. Since they don't talk about the student's code, the student needs to figure out what about their code caused the message. This can be frustrating.

The goal is to make the grading messages always sound like personalized feedback and advice coming from a friendly mentor who has read the student's code. Whenever possible, we should glue in pieces of the student code or result to make it clear what we are saying. And to make it feel personalized.

Finalize package documentation for pkgdown site

Provide problem helper functions

problem_type(problem)
## "column_class"

is_problem(problem)
## TRUE

is_problem(problem, "column_class")
## TRUE

is_problem(problem, "names")
## FALSE
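
A minimal sketch of how these helpers might be written, assuming a problem is a list with a `type` field and the "tblcheck_problem" class. The bodies are guesses made for illustration, not the package's implementation.

```r
# Hypothetical implementations of the proposed helpers.
problem_type <- function(problem) {
  problem$type
}

is_problem <- function(x, type = NULL) {
  inherits(x, "tblcheck_problem") &&
    (is.null(type) || identical(problem_type(x), type))
}

problem <- structure(
  list(type = "column_class"),
  class = "tblcheck_problem"
)

problem_type(problem)
is_problem(problem)
is_problem(problem, "column_class")
is_problem(problem, "names")
```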

`grade_this_table()` doesn't expose `check_column_order` argument of `tbl_grade()`

pkgload::load_all()
#> ℹ Loading tblcheck
library(dplyr, warn.conflicts = FALSE)

.solution <- .result <- 
  starwars %>%
  transmute(height = height / 2.54, mass = mass * 2.205)

grade_this_table(check_column_order = TRUE)
#> Error: `...` is not empty.
#> 
#> We detected these problematic arguments:
#> * `check_column_order`
#> 
#> These dots only exist to allow future extensions and should be empty.
#> Did you misspecify an argument?

`vec_check_length()` should still check dimensions

In #37, we added vec_check_length(), which simplifies tbl_check_dimensions() for vectors by only checking length. However, this means we get incorrect feedback in places where vectors do have dimension, like matrices.

library(tblcheck)

.result <- matrix(1:6, 2)
.solution <- matrix(1:6, 3)
vec_grade_vector()
#> <gradethis_graded: [Incorrect] Your result contains unexpected values.>

Created on 2021-09-14 by the reprex package (v2.0.1)

The result fails in vec_check_values(), giving a misleading message. This is not actually a values problem, it should be a dimensions problem.

To resolve this, vec_check_length() should be remerged with tbl_check_dimensions() and exist only as an alias rather than a streamlined implementation.
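
The distinction can be shown in base R: the two matrices have the same length but different `dim()`. `check_shape()` below is a hypothetical helper that returns the kind of problem found, or NULL when the shapes agree.

```r
# Hypothetical check: compare dim() when either object has dimensions,
# and fall back to length() for plain vectors.
check_shape <- function(object, expected) {
  if (!is.null(dim(object)) || !is.null(dim(expected))) {
    if (!identical(dim(object), dim(expected))) return("dimensions")
  } else if (length(object) != length(expected)) {
    return("length")
  }
  NULL
}

check_shape(matrix(1:6, 2), matrix(1:6, 3))
check_shape(1:3, 1:5)
check_shape(1:3, 1:3)
```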

Add problem for case where factor levels are reversed from expected order?

Prepare tblcheck package for release

`values` check is too strict with floating point errors

vec_*_values() is too strict about differences resulting from floating point errors. In this example, changing the order of multiplication leads a correct solution to be marked incorrect. The vectors are all.equal, but are not identical.

library(dplyr)
library(tblcheck)
library(Zseq)

cases <- Fibonacci(20)[-1]

.result <- (cases - lag(cases)) / lag(cases) * 100
.solution <- 100 * (cases - lag(cases)) / lag(cases)

vec_grade_values()
#> <gradethis_graded: [Incorrect]
#>   I didn't expect your result to include the values `66.7`, `61.8`, and
#>   `61.8`.
#> >

all.equal(.result, .solution)
#> [1] TRUE
identical(.result, .solution)
#> [1] FALSE

Created on 2021-12-14 by the reprex package (v2.0.1)
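
The same effect appears in a two-line base R example, shown below together with one possible tolerant comparison. `near_equal()` is a hypothetical helper that uses an absolute tolerance and treats paired NAs as equal; it is only a sketch of what a less strict values check might do.

```r
x <- 0.1 + 0.2
y <- 0.3

identical(x, y)          # exact comparison
isTRUE(all.equal(x, y))  # comparison with a small numeric tolerance

# Hypothetical element-wise variant for vectors that may contain NA.
near_equal <- function(x, y, tol = sqrt(.Machine$double.eps)) {
  same_na <- is.na(x) & is.na(y)
  close <- !is.na(x) & !is.na(y) & abs(x - y) < tol
  same_na | close
}

near_equal(c(NA, x), c(NA, y))
```

The NA handling matters here because `lag()` in the reprex introduces a leading missing value in both vectors.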

Remove unused function `expect_grade()`

Check factor levels in `tblcheck`

Should `check_table` check class?

By default, check_table() currently checks the class of each column of the table, but not the class of the table itself.

library(tblcheck)

.result   <- data.frame(a = 1:5, b = 6:10)
.solution <- tibble::tibble(a = 1:5, b = 6:10)
check_table(.result, .solution) # Passes

Created on 2021-07-27 by the reprex package (v2.0.0)

Should check_table() require that the table has the expected class?

Create `check_vector()` function (in tblcheck?) that has the same sort of checks that `check_column()` gives

Should `check_table()` also provide passing grades

I realized that the checking functions check_table(), check_column(), and check_names() all return only failing grades. I think this makes sense for check_names() and check_column().

But for the larger, encompassing check_table() function, it also makes sense that it would return a passing grade if all of the internal checks pass. Currently, we return an invisible NULL...

ex <- gradethis::mock_this_exercise(
  .user_code = tibble::tibble(a = letters[1:3], b = a, c = a),
  .solution = tibble::tibble(a = letters[1:3], b = a, c = a)
)

grade <- gradethis::grade_this(tblcheck::check_table())(ex)

grade
#> NULL

But this means that using grading code with just check_table() will return an evaluation error.

```{r demo-check}
grade_this({
  check_table()
})
```

solution <- "tibble::tibble(a = letters[1:3], b = a, c = a)"

grade <- gradethis::gradethis_exercise_checker(
  label = "demo",
  user_code = solution, 
  solution_code = solution,
  check_code = "gradethis::grade_this(tblcheck::check_table())",
  envir_result = new.env(),
  envir_prep = new.env(),
  last_value = eval(parse(text = solution))
)
#> `demo-check` chunk did not mark an answer as correct or incorrect.Consider adding a `pass()` or `fail()` at the end of your `demo-check` code

cat(gradethis:::conditionMessage.gradethis_graded(grade))
#> [Incorrect]
#>   <p>A problem occurred with your teacher’s grading code. Defaulting to
#>   <em>incorrect</em>.</p>

It's not that hard to call pass() after check_table() but I wonder if people will assume that's what check_table() is going to do.

```{r demo-check}
grade_this({
  check_table()
  pass() # required!
})
```

We could...

Give check_table() a final argument that defaults to NULL or pass()

ex <- gradethis::mock_this_exercise(
  .user_code = tibble::tibble(a = letters[1:3], b = a, c = a),
  .solution = tibble::tibble(a = letters[1:3], b = a, c = a)
)

grader <- gradethis::grade_this(
  tblcheck::check_table(final = pass('Good job!'))
) 

grader(ex)
#> <gradethis_graded: [Correct] Good job!>

Have check_table() pass automatically... but then would we need to expose the arguments of pass()?
Change the name of the check_ functions that make it clearer that they only throw failures?
- check_for_table_problems(), check_for_column_problems()
- fail_if_table_problem()
- ??

I think I'm leaning toward the first option.

`tbl_grade_*()` should pass dots down to `fail()`

The tbl_grade_*() functions should include ... that are passed to tbl_grade() so they reach gradethis::fail().

`vec_grade_vector()` deprecation message references incorrect function

vec_grade_vector() warns that vec_check_vector() is deprecated and suggests vec_check(). Simple forgotten update after copy paste.

tblcheck/R/deprecated.R

Line 123 in ce3bc6a

lifecycle::deprecate_warn("0.1.0", "vec_check_vector()", "vec_check()")

Rename all-in-one grading functions

While working on editing the tblcheck vignette, it occurred to me that the API might be a little bit cleaner with slightly different naming for the all-in-one functions.

My proposal is that we would rename

tbl_grade_table() to tbl_grade() for all-in-one table grading
- Also tbl_check_table() to tbl_check()
vec_grade_vector() to vec_grade() for all-in-one vector grading
- Also vec_check_vector() to vec_check()
tbl_grade() to ??? to return a grade from a problem object
- tblcheck_grade() (my current favorite)
- grade_problem()
- problem_grade()
- as_grade()
- gradethis_problem()
- ...

There are two major reasons why I think this change is warranted:

1. This makes the tbl_ and vec_ function naming cleaner: tbl_grade is easier to write, and all the specific functions under tbl_grade are extensions of this prefix. The structure is then consistently [thing]_[action](_[subset])?
2. Currently tbl_grade() gets in the way of autocomplete. In the autocomplete menu, it would be much better for tbl_grade() to be the all-in-one function, with the additional, more specific checks listed below it. Currently, tbl_grade() somewhat confusingly comes first.

Make `pre_check` the first argument of `grade_this_table()`?

Should grade_this_table() take pre_check as its first argument (and post_check as its second)? I would imagine changing pre_check code is more common than the correct message, and it would enable slightly more compact code, e.g.

grade_this_table({
  tbl_grade_is_table(.result)
  .result <- .result[intersect(names(.result), names(.solution))]
})

What do you think @gadenbuie?

Handle database tables?

Maybe tblcheck should include some special handling for remote database tables. Here's a smaller example based on the dbplyr article "Introduction to dbplyr":

library(dplyr)
con <- DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")

copy_to(con, nycflights13::flights[1:100, ], "flights",
        temporary = FALSE, 
        indexes = list(
          c("year", "month", "day"), 
          "carrier", 
          "tailnum",
          "dest"
        )
)

Given a remote database table, flights_db, tbl_grade() errors because the expected object is not a data.frame.

flights_db <- tbl(con, "flights")

tblcheck::tbl_grade(flights_db, flights_db)
#> An error occurred in the grading code: Assertion on 'expected' failed: Must be of type 'data.frame', not 'tbl_SQLiteConnection/tbl_dbi/tbl_sql/tbl_lazy/tbl'.
#> <gradethis_graded: [Neutral]
#>   Uh-oh! We can't provide feedback at this time. Don't worry, it's not
#>   your fault! There's an issue behind-the-scenes with this exercise.
#> >

For now authors can collect(.result) and collect(.solution), but maybe tbl_grade() could handle this automatically. Alternatively, we could think about rendering the lazy tbl to SQL or doing something else to avoid having to collect the complete data...

flights <- collect(flights_db)
tblcheck::tbl_grade(flights, flights)

Build on or be inspired by {diffdf}

https://gowerc.github.io/diffdf might provide solid inspiration or be usable for some low-level functions.
