Git Product home page Git Product logo

Comments (6)

teunbrand avatar teunbrand commented on May 24, 2024

Thanks for the report! While I agree that this is a poor error message, this error isn't really under ggplot2's control as it is thrown from {grid}. Unrelated to linetype, I'm not sure that doing (computationally 'expensive') checks on input data is the best idea as it can hamper some performance.

from ggplot2.

njspix avatar njspix commented on May 24, 2024

Thanks for clarifying. I understand better now why the error is thrown, seems like dash is getting passed as a hex string to grid, and of course s and h are not valid hex.

Related to performance, I'm not a maintainer of a huge codebase used by 100k's of people, so I don't understand everyone's use case. However I fail to see how input validation could impose a substantial performance cost. I'll venture that the time it took me to understand this error was several orders of magnitude greater than the runtime of e.g. linetype %in% c("dashed", "dot" ...) | str_detect(linetype, "^[0-9a-f]+$")...

If performance is a concern (e.g. for folks making 1000's of plots), why not include a validate_input argument that defaults to true?

from ggplot2.

teunbrand avatar teunbrand commented on May 24, 2024

Please don't take the following personally, I do really empathise with the issue of encountering bad error messages. However, I have some remarks about why I think ggplot2 shouldn't validate all data input.

I'll venture that the time it took me to understand this error was several orders of magnitude greater

Yes, but the more relevant comparison is whether the runtime of the checks in all the plots that you ever made that didn't throw warnings exceeds the time it took for you to understand this error.

However I fail to see how input validation could impose a substantial performance cost.

For the sake of argument, let's say ggplot implements a validity check for linetype. Next, people are going to ask ggplot to also do a validity check for colour. Then, shape, or maybe the interaction between shape and fill. So the ask to do input validation for linetype can easily escalate to do many input validation steps.

Aside from the escalation argument, it should be noted that {grid} already does validity checks (as evident by the errors you've encountered). I appreciate a good error message as much as the next person, but I don't think ggplot2 should reimplement hairy parts of grid to throw friendlier messages. It might be more appropriate to petition {grid} directly to improve the messages.

Now any one validation step might not be expensive in 99% of plots, but let me illustrate the following.
The function below checks whether every value in a vector is a valid colour that {grid} understands.

# perhaps there is a faster way, but this is what I got at the moment
is_color <- function(x) {
  grepl("^#(([[:xdigit:]]{2}){3,4}|([[:xdigit:]]){3,4})$", x) |
    x %in% grDevices::colours()
}

In some fields, it is not uncommon to plot giant scatterplots with millions of points. To check if a million values are valid colours takes ~100ms.

input <- sample(c(colours(), NA, "foobar"), 1e6, replace = TRUE)

bench::mark(is_color(input))
#> # A tibble: 1 × 6
#>   expression           min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>      <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 is_color(input)    116ms    116ms      8.62    22.9MB     43.1

However, the rest of the plumbing in making a plot (here represented by a small scatterplot) only takes ~35ms.
So relative to the time it takes to render a plot, the check is 'expensive'.

library(ggplot2)

p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point()

tmp <- tempfile(fileext = ".pdf")
pdf(tmp)

bench::mark(print(p))
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 1 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 print(p)     33.2ms   35.3ms      28.2    6.76MB     43.2

dev.off()
#> png 
#>   2

Created on 2024-04-25 with reprex v2.1.0

from ggplot2.

njspix avatar njspix commented on May 24, 2024

Ah, that makes perfect sense. Thank you for clarifying!

I don't know that the error message is particularly bad from grid's perspective. For all it knows, it's just being passed an invalid hex color. And trying to ask them to change the error message just for ggplot creates a sort of 'accidental api', which I doubt would be mutually beneficial.

It's a bigger ask, but perhaps we could consider whether the 'linetype' argument is a little too flexible? I understand the attraction of flexibility, but I really think the error message is pretty bad... :-)

from ggplot2.

teunbrand avatar teunbrand commented on May 24, 2024

I don't think limiting what can be considered a linetype is the right way forward, as it'll hurt backward compatibility. All valid input is documented in the vignette. Grid's error message could just be Invalid linetype: "dash" like they do for colours. It doesn't need to be tailored to ggplot2. Saying that the hex code is wrong (incorrectly) assumes that the user was trying to pass a linetype as a hex code, which might start off the user on the wrong path to solving the issue.

In any case, while I agree with you that the error message can be improved, I don't think it is ggplot2's place to do input validation for the reasons outlined in my previous message. I'll close the issue here and we can reopen if other maintainers disagree with me.

from ggplot2.

njspix avatar njspix commented on May 24, 2024

Great, thank you very much!

from ggplot2.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.