As I continue to refine and expand the Altair example gallery, the barley dataset has

http://ml.stat.purdue.edu/stat695t/writings/TrellisDesignControl.pdf

What is the origin of the barley dataset? (vega-datasets issue, closed)

palewire commented on September 18, 2024

What is the origin of the barley dataset?

from vega-datasets.

Comments (4)

domoritz commented on September 18, 2024

The barley dataset comes from Becker: https://mbostock.github.io/protovis/ex/barley.html

palewire commented on September 18, 2024

Forgive my ignorance, but why is it sourced to Becker? The version I see packaged for R sources Immer and Cleveland.

palewire commented on September 18, 2024

Ah, is it because Becker is cited for making the trellis chart from the barley data? A Becker paper says:

The barley experiment was run in the 1930s. The data first appeared
in a 1934 report published by the experimenters. Since then, the data have
been analyzed and re-analyzed. R. A. Fisher presented the data for five of
the sites in his classic book, The Design of Experiments. Publication in the
book made the data famous, and many others subsequently analyzed them,
usually to illustrate a new statistical method.

Then in the early 1990s, the data were visualized by Trellis Graphics.
The result was a big surprise. Through 60 years and many analyses, an
important happening in the data had gone undetected.

palewire commented on September 18, 2024

Here is another Becker paper I believe we can cite to write a solid source entry for the data, which he drew from elsewhere.

In the 1930s an experiment was run in the state of Minnesota. At six sites, ten
varieties of barley were grown in each of two years. The data collected for the experiment
are the yields for all combinations of site, variety, and year, so there are 6 × 10 × 2 = 120
observations. The experiment is of historical interest because it is one of the early field
trials that incorporated R. A. Fisher's ideas on randomization and the analysis of variance.
The agronomists published the data and an analysis of them in Immer, Hayes, and Powers
(1934). Fisher published the data in his classic book, The Design of Experiments (Fisher
1971), but he did not present an analysis. Fisher's publication gave the data a large
exposure, and many others tried their hands at analyzing them to illustrate new statistical
methods (Anscombe 1981, 1983; Daniel 1976). We will do the same here, using the
data to illustrate Trellis display. The visualization using Trellis reveals an important
happening in the data: there appears to be a major error, one that survived undetected
for six decades (Cleveland 1993).
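
As a quick sanity check on the observation count in that passage, here is a minimal sketch of the full factorial layout it describes. The site, variety, and year labels are hypothetical placeholders, since the quoted text gives only the counts and not the names.

```python
from itertools import product

# Placeholder labels (hypothetical): the quoted passage states only
# how many sites, varieties, and years there were, not their names.
sites = [f"site_{i}" for i in range(1, 7)]          # 6 sites
varieties = [f"variety_{i}" for i in range(1, 11)]  # 10 varieties
years = ["year_1", "year_2"]                        # 2 years

# One yield observation per (site, variety, year) combination.
design = list(product(sites, varieties, years))
n_observations = len(design)

print(n_observations)
```

If a copy of the packaged dataset has a different row count, that would be worth noting in the source entry.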
