Histograms convert counts within bins into areas. However,

`stat_bin()` should have the area (instead of height) represent the count. (ggplot2 issue, open)

mattansb commented on September 21, 2024

`stat_bin()` should have the area (instead of height) represent the count.

from ggplot2.

Comments (3)

teunbrand commented on September 21, 2024

Hi, thanks for the suggestion!

Yes, using areas for histograms satisfies the proportional ink principle, but below are a few reasons I don't think we should do it.

- Users have come to expect counts by default. We have parted with defaults before, but I don't think we should depart from a very clear and simple metric (counts) in favour of more complicated metrics.
- Counts and the proposed metric are only the same when the width of the bars is 1. If you replace the `breaks` by `binwidth = 0.01`, you see several values reach 200 with the proposed metric, whereas the data only has 32 observations in total.
- `after_stat(sum(count) * density)` sums the counts over groups, which it shouldn't, as density is calculated within groups. The appropriate metric would be `after_stat(count / width)`. As this is available as a simple combination of already available computed variables, I don't think this merits a novel computed variable.
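
The last two points can be checked numerically. The following is a minimal base R sketch, not ggplot2's own code: it assumes `mtcars$mpg` as the data (the thread only says there are 32 observations) and uses `hist()` as a stand-in for the binning, so bin edges may differ from what `stat_bin()` would pick.

```r
# Assumption: mtcars$mpg (32 observations) stands in for the data in the thread.
x <- mtcars$mpg
breaks <- seq(floor(min(x)), ceiling(max(x)), by = 0.01)  # bin width 0.01

h <- hist(x, breaks = breaks, plot = FALSE)
width <- diff(h$breaks)

count_per_width <- h$counts / width           # analogue of after_stat(count / width)
scaled_density  <- sum(h$counts) * h$density  # analogue of after_stat(sum(count) * density)

max(h$counts)         # largest raw count in any bin
max(count_per_width)  # much larger than the number of observations
all.equal(count_per_width, scaled_density)  # the two agree when there is one group

# With two groups, density is computed within each group, so multiplying it
# by the count summed over all groups no longer gives count / width.
groups <- split(x, mtcars$am)
hg <- lapply(groups, hist, breaks = breaks, plot = FALSE)
summed_over_groups <- lapply(hg, function(g) length(x) * g$density)
per_group <- lapply(hg, function(g) g$counts / diff(g$breaks))
max(summed_over_groups[["0"]]) / max(per_group[["0"]])  # inflated by 32 / n_group
```

In the grouped case the inflation factor is the total sample size divided by the group size, which is why `count / width` is the safer expression.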

mattansb commented on September 21, 2024

Yes, changing a default is a pain... IMO it's worth it, but I don't have a huge community to serve ;)

At the very least, I think this should be written somewhere in the docs (as this is how histograms are commonly defined*). Additionally, an example with `after_stat(count / width)` can be added, with or without (or both) bins of unequal width.
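
Such an example might look like the sketch below. It is only an illustration, not the wording of any eventual PR: the dataset (`mtcars$mpg`), the break points, and the axis label are all assumptions chosen for the demonstration.

```r
library(ggplot2)

# Assumption: mtcars$mpg as example data; breaks are arbitrary and of unequal width.
breaks <- c(10, 15, 18, 20, 25, 35)

p <- ggplot(mtcars, aes(mpg)) +
  geom_histogram(
    aes(y = after_stat(count / width)),  # bar height = count per unit of x
    breaks = breaks,
    colour = "white"
  ) +
  labs(y = "count per unit of mpg")

# Check on the computed layer data: bar area (height * width) recovers the count.
d <- layer_data(p)
all.equal(d$y * (d$xmax - d$xmin), d$count)
```

Printing `p` draws the histogram; with this mapping a wide bin is no longer visually inflated relative to a narrow one holding the same number of observations.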

I'm willing to make (the world's smallest) PR if you'd like.

> If you replace the breaks by binwidth = 0.01, you see several values reach 200 with the proposed metric, whereas the data only has 32 observations in total.

I don't see this as an issue - in PDFs, densities can also exceed 1 - it's just stats being stats 🤷‍♂️
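
A quick illustration of that point, using a uniform distribution on [0, 0.5] as an arbitrary example (not something taken from the thread): the density is 2 everywhere on the support, yet the area under it is still 1.

```r
# Uniform density on [0, 0.5]: height is 1 / 0.5 = 2 on the whole support.
height <- dunif(0.25, min = 0, max = 0.5)

# The area under the curve is still 1, as for any probability density.
area <- integrate(dunif, lower = 0, upper = 0.5, min = 0, max = 0.5)$value

c(height = height, area = area)
```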

* I only came to notice this when I was teaching histograms and a student pointed out that my plot didn't match what I had just said.

teunbrand commented on September 21, 2024

> it's just stats being stats

Agreed, but it was meant to illustrate how it departed from counts even for equi-bins 🤓

Adding an example is a good idea, we'd welcome a PR for this.
