vega / vega-datasets
Common repository for example datasets used by Vega-related projects
For the 2.0 release, let's clean up datasets we don't need anymore.
I suggest adding this dataset of top goal-scorers among footballers since 1980:
https://johnburnmurdoch.github.io/projects/goal-lines/all-comps/
The devDependency rollup was updated from 1.9.2 to 1.9.3. This version is covered by your current version range, and after updating it in your project, the build failed.
rollup is a devDependency of this project. It might not break your production code or affect downstream projects, but probably breaks your build or test tools, which may prevent deploying or publishing.
2019-04-10
The new version differs by 3 commits.
516a06d 1.9.3
a5526ea Update changelog
c3d73ff Handle out-of-order binding of identifiers to improve tree-shaking (#2803)
See the full diff
There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.
Your Greenkeeper Bot 🌴
The devDependency rollup was updated from 1.14.2 to 1.14.3. This version is covered by your current version range, and after updating it in your project, the build failed.
rollup is a devDependency of this project. It might not break your production code or affect downstream projects, but probably breaks your build or test tools, which may prevent deploying or publishing.
2019-06-06
The new version differs by 4 commits.
c68bd95 1.14.3
d79aa57 Update changelog
7179390 Use browser relative path algorithm for chunks (#2902)
b1df517 Add funding button
See the full diff
There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.
Your Greenkeeper Bot 🌴
I'm trying to run the example specs in the latest Vega release, but am getting errors about missing datasets, including flare-dependencies.json (from https://github.com/vega/vega/blob/master/spec/tree-radial-bundle.vg.json), normal-2.json, and flights-200k.json. Are these available somewhere?
(Apologies if this issue is in the wrong spot.)
A general demonstration is outlined in this Google Colab notebook: https://colab.research.google.com/drive/1oKhivD5T9Yi1gMl0_7dUwqVFqiNfD43k?usp=sharing
The 'flights-200k.arrow' file produces an error every time I try to read it with the pandas package; a workaround sketch follows.
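For reference, pandas cannot parse Arrow IPC data on its own, so one hedged workaround is to read the file with pyarrow and convert it to a DataFrame. This is only a sketch: whether flights-200k.arrow uses the IPC file framing or the stream framing is an assumption, hence the fallback.

import pyarrow as pa
import pyarrow.ipc

# Workaround sketch: read the Arrow file with pyarrow, then convert to pandas.
# The IPC framing (file vs. stream) of flights-200k.arrow is an assumption.
path = "flights-200k.arrow"
try:
    table = pa.ipc.open_file(path).read_all()    # Arrow IPC file format
except pa.ArrowInvalid:
    table = pa.ipc.open_stream(path).read_all()  # Arrow IPC stream format
df = table.to_pandas()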
Any license info on this repo?
The seattle-temps dataset claims that the temperature in Seattle never rose above 76 degrees F in 2010.
According to my own memory, the temperature was much hotter. Other more reliable sources agree; for example, Weather Underground reports that Seattle hit 96 degrees F in August 2010: https://www.wunderground.com/history/monthly/us/wa/seattle/KSEA/date/2010-8
Perhaps the dataset is mislabeled?
If you are already aware of this, ignore this.
When using data/movies.json, for example, the movie "Duel in the Sun" on line 11 has a Release Date of "Dec 31 2046", when the movie was actually released in 1946.
This mixup happens in numerous places.
I figured once the Seattle temperatures are updated (#127) we should also update the corresponding San Francisco temperatures. I've already downloaded the data from the San Francisco International Airport weather station.
Making a note of it here so we don't forget.
Several movies in the movies.json dataset (everything after 2011) are listed as having come out 100 years after they were actually released. It's a pretty quick fix, and I'm wondering whether a pull request would be welcome to either fix the data or create a new dataset with corrected values; a sketch of the fix appears below.
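As a concrete illustration (not an authoritative fix), here is a minimal pandas sketch that shifts the off-by-a-century dates back; the field name "Release Date" and the "Dec 31 2046" date format are assumptions that should be checked against the actual file.

import pandas as pd

# Sketch: move releases dated after ~2011 back by exactly one century.
# Field name "Release Date" and format "%b %d %Y" are assumptions.
movies = pd.read_json("movies.json")
dates = pd.to_datetime(movies["Release Date"], format="%b %d %Y", errors="coerce")
off_by_century = dates.dt.year > 2011
dates.loc[off_by_century] -= pd.DateOffset(years=100)
movies["Release Date"] = dates.dt.strftime("%b %d %Y")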
@domoritz sounds good! I'll do one better for you guys:
I woke up this morning thinking we could use a simple vega datasets preview js notebook :)
https://observablehq.com/@randomfractals/vega-datasets
I'll leave it up to you whether you want to add this Vega datasets preview Observable notebook to your editor or to this repo's README.md.
This notebook can serve as a supplemental tool for the online Vega editor and for examples that use these data sources, since it loads and scrolls data much faster than GitHub and the Vega editor do.
I might add something similar to https://github.com/RandomFractals/vscode-vega-viewer as a split panel in the Vega chart preview in the next major release.
Cheers! 🤗
Originally posted by @RandomFractals in #64 (comment)
🚨 You need to enable Continuous Integration on Greenkeeper branches of this repository. 🚨
To enable Greenkeeper, you need to make sure that a commit status is reported on all branches. This is required by Greenkeeper because it uses your CI build statuses to figure out when to notify you about breaking changes.
Since we didn’t receive a CI status on the greenkeeper/initial branch, it’s possible that you don’t have CI set up yet. We recommend using Travis CI, but Greenkeeper will work with every other CI service as well.
If you have already set up a CI for this repository, you might need to check how it’s configured. Make sure it is set to run on all new branches. If you don’t want it to run on absolutely every branch, you can whitelist branches starting with greenkeeper/.
Once you have installed and configured CI on this repository correctly, you’ll need to re-trigger Greenkeeper’s initial pull request. To do this, please click the 'fix repo' button on account.greenkeeper.io.
Gapminder data, from a Swedish non-profit, is a popular part of this repository and truly fascinating to explore using visualization tools in the vega ecosystem. While working on an Altair example, I discovered what looked like a simple issue in the gapminder.json
dataset, but as I looked into fixing it with a simple pull request, the right solution seemed a bit more complex, and I wanted to lay out my thoughts here for feedback.
The immediate issue I found is that it looks like life expectancy data between North and South Korea has been swapped. For 2005, this repository's dataset shows South Korea's life expectancy as 67.297 years and North Korea's as 78.623 years. This contradicts current Gapminder life expectancy data (v14), which reports approximately the reverse. This raises questions about other errors lurking in the dataset.
Resolving this issue is complicated by the absence of sourcing or versioning details for the gapminder data in SOURCES.md. The json file in this repository appears to be based on an older version of the dataset that I could not locate. For instance, Afghanistan's 1955 life expectancy is 30.332 years in the vega-datasets json, which aligns closely with Gapminder's v11 data (32.48 years), but differs from the current v14 (43.88 years).
Given what the vega-datasets README states about versioning, there seem to be a few options for a solution:
Patch release: If the Korea data swap is confirmed as a formatting error, it could potentially be addressed in a patch release (a sketch of such a fix follows this list). That said, I still haven't been able to locate an older version of a Gapminder file containing data that matches the vega-datasets json.
Minor release: Updating the dataset with current Gapminder figures without changing field names or file names could be done in a minor release. This could address the outdated data issue. But the data could be significantly different (as in the Afghanistan life expectancy data) and some country names may have changed.
Major release: If we need to change field names (e.g., updating regional classification field name "cluster" to align with current Gapminder terminology) or significantly alter file contents, a major release would be necessary.
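To make the patch-release option concrete, here is a minimal sketch of the Korea swap fix. The field names ("country", "year", "life_expect") and the exact country labels are assumptions based on my reading of gapminder.json and should be verified first.

import pandas as pd

# Sketch: swap the apparently transposed life-expectancy series between the
# two Koreas, aligning by year. Field and country names are assumptions.
df = pd.read_json("gapminder.json")
north = df["country"] == "North Korea"
south = df["country"] == "South Korea"
n = df.loc[north].set_index("year")["life_expect"]
s = df.loc[south].set_index("year")["life_expect"]
df.loc[north, "life_expect"] = s.reindex(n.index).to_numpy()
df.loc[south, "life_expect"] = n.reindex(s.index).to_numpy()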
Regardless of the chosen approach, I propose:
When trying to load the birdstrikes data, a 404 error is thrown.
vega_datasets: 0.8
Ubuntu 20.04 LTS
Python 3.8.3 (default, May 19 2020, 18:47:26)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.15.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: from vega_datasets import data
In [2]: bird = data.birdstrikes()
---------------------------------------------------------------------------
HTTPError Traceback (most recent call last)
<ipython-input-2-a4e393dbcf55> in <module>
----> 1 bird = data.birdstrikes()
~/miniconda3/envs/cookbook/lib/python3.8/site-packages/vega_datasets/core.py in __call__(self, use_local, **kwargs)
222 parsed data
223 """
--> 224 datasource = BytesIO(self.raw(use_local=use_local))
225
226 kwds = self._pd_read_kwds.copy()
~/miniconda3/envs/cookbook/lib/python3.8/site-packages/vega_datasets/core.py in raw(self, use_local)
201 return pkgutil.get_data("vega_datasets", self.pkg_filename)
202 else:
--> 203 return urlopen(self.url).read()
204
205 def __call__(self, use_local=True, **kwargs):
~/miniconda3/envs/cookbook/lib/python3.8/urllib/request.py in urlopen(url, data, timeout, cafile, capath, cadefault, context)
220 else:
221 opener = _opener
--> 222 return opener.open(url, data, timeout)
223
224 def install_opener(opener):
~/miniconda3/envs/cookbook/lib/python3.8/urllib/request.py in open(self, fullurl, data, timeout)
529 for processor in self.process_response.get(protocol, []):
530 meth = getattr(processor, meth_name)
--> 531 response = meth(req, response)
532
533 return response
~/miniconda3/envs/cookbook/lib/python3.8/urllib/request.py in http_response(self, request, response)
638 # request was successfully received, understood, and accepted.
639 if not (200 <= code < 300):
--> 640 response = self.parent.error(
641 'http', request, response, code, msg, hdrs)
642
~/miniconda3/envs/cookbook/lib/python3.8/urllib/request.py in error(self, proto, *args)
567 if http_err:
568 args = (dict, 'default', 'http_error_default') + orig_args
--> 569 return self._call_chain(*args)
570
571 # XXX probably also want an abstract factory that knows when it makes
~/miniconda3/envs/cookbook/lib/python3.8/urllib/request.py in _call_chain(self, chain, kind, meth_name, *args)
500 for handler in handlers:
501 func = getattr(handler, meth_name)
--> 502 result = func(*args)
503 if result is not None:
504 return result
~/miniconda3/envs/cookbook/lib/python3.8/urllib/request.py in http_error_default(self, req, fp, code, msg, hdrs)
647 class HTTPDefaultErrorHandler(BaseHandler):
648 def http_error_default(self, req, fp, code, msg, hdrs):
--> 649 raise HTTPError(req.full_url, code, msg, hdrs, fp)
650
651 class HTTPRedirectHandler(BaseHandler):
HTTPError: HTTP Error 404: Not Found
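Until the package's URL is fixed upstream, one hedged workaround is to bypass vega_datasets and read the file from a version-pinned CDN path. The "@1" version range and the data/birdstrikes.json path below are assumptions worth double-checking against the published package contents.

import pandas as pd

# Workaround sketch: read birdstrikes from a pinned vega-datasets release on
# jsdelivr instead of the package's hard-coded URL (path/version assumed).
url = "https://cdn.jsdelivr.net/npm/vega-datasets@1/data/birdstrikes.json"
bird = pd.read_json(url)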
Looks like 2.5.1 #390 includes the wrong version 2.5.0 in the published artifacts. Search for “version” here:
https://cdn.jsdelivr.net/npm/vega-datasets@2.5.1/build/vega-datasets.module.js
This results in broken URLs that point to 2.5.0, which is missing data. For example, this:
world = (await import('vega-datasets@2.5.1')).default['world-110m.json'].url
resolves to:
https://cdn.jsdelivr.net/npm/vega-datasets@2.5.0/data/world-110m.json
https://github.com/vega/vega/blob/master/docs/data/volcano.json is used by Vega examples, and we can only build examples in the editor if the datasets are in vega-datasets.
What's the source of this dataset? Is there a cleaner version that doesn't have the specific format that only works well for Vega?
Description
Hello, Vega team!
I hope you are doing well. I came across an issue while exploring datasets in the Vega dataset repository. Specifically, I found that 7 datasets in the following directory cannot be loaded using the pd.read_json(url) method:
https://github.com/vega/vega-datasets/tree/main/data
I would greatly appreciate it if you could take a look at this issue and provide a possible solution. If you need any additional information from me, please let me know.
Thank you for your time and attention!
Steps to Reproduce
Go to https://github.com/vega/vega-datasets/tree/main/data
Select any of the following 7 datasets:
https://raw.githubusercontent.com/vega/vega-datasets/main/data/annual-precip.json
https://raw.githubusercontent.com/vega/vega-datasets/main/data/earthquakes.json
https://raw.githubusercontent.com/vega/vega-datasets/main/data/londonBoroughs.json
https://raw.githubusercontent.com/vega/vega-datasets/main/data/londonTubeLines.json
https://raw.githubusercontent.com/vega/vega-datasets/main/data/miserables.json
https://raw.githubusercontent.com/vega/vega-datasets/main/data/us-10m.json
https://raw.githubusercontent.com/vega/vega-datasets/main/data/world-110m.json
Load the dataset using the pd.read_json(url) method.
Observe that the dataset cannot be loaded.
Expected behavior
The datasets in the aforementioned directory should be loadable via pd.read_json(url).
Actual behavior
7 of the datasets in the directory cannot be loaded via pd.read_json(url).
Additional Information
Operating System: Windows
Python version: 3.10.9
Pandas version: 1.5.2
List of errors received:
[['ValueError', 'All arrays must be of the same length'],
['ValueError', 'All arrays must be of the same length'],
['ValueError', 'All arrays must be of the same length'],
['ValueError', 'All arrays must be of the same length'],
['ValueError', 'All arrays must be of the same length'],
['ValueError',
'Mixing dicts with non-Series may lead to ambiguous ordering.'],
['ValueError',
'Mixing dicts with non-Series may lead to ambiguous ordering.']]
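For what it's worth, these seven files are nested JSON rather than flat record tables (us-10m.json and world-110m.json are TopoJSON, miserables.json is a node/link graph, and so on), so pd.read_json's tabular heuristics raise ValueError by design. A workaround sketch: load the raw JSON and flatten only the piece you need. The "links" key below applies to miserables.json and is just an example; other files need other keys.

import json
from urllib.request import urlopen

import pandas as pd

# Sketch: load the nested JSON directly, then normalize the relevant array.
# The "links" key is specific to miserables.json; adjust per file.
url = "https://raw.githubusercontent.com/vega/vega-datasets/main/data/miserables.json"
with urlopen(url) as f:
    raw = json.load(f)
links = pd.json_normalize(raw["links"])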
See discussion in vega/vega-lite#3185
I work in data journalism and I think it would be cool to include some simple but famous datasets from our profession in your examples. If I submitted some would you be open to considering them?
The devDependency rollup was updated from 1.18.0 to 1.19.0. This version is covered by your current version range, and after updating it in your project, the build failed.
rollup is a devDependency of this project. It might not break your production code or affect downstream projects, but probably breaks your build or test tools, which may prevent deploying or publishing.
The new version differs by 6 commits.
9af119d 1.19.0
b3f361c Update changelog
456f4d2 Avoid variable from empty module name be empty (#3026)
17eaa43 Use id of last module in chunk as name base for auto-generated chunks (#3025)
871bfa0 Switch to a code-splitting build and update dependencies (#3020)
2443783 Unified file emission api (#2999)
See the full diff
There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.
Your Greenkeeper Bot 🌴
In https://github.com/vega/vega-datasets/blob/gh-pages/data/weather.csv, the date field cannot be parsed automatically in Safari.
The CSV loader in data.ts uses the following call:
return d3.csvParseRows(await result.text(), (d3 as any).autoType)
However, this method does not parse the header row as field names; that is the intended, documented behavior of csvParseRows (see https://github.com/d3/d3-dsv#dsv_parseRows).
We need to update vega-datasets to return a properly parsed CSV that uses the header row to determine object field names, i.e., switch to d3.csvParse, which consumes the header row and yields one object per data row.
Oglala Lakota - 46102 in us_10m.json
I think the addition of an "Open High Low Close" dataset would be useful.
The VL example Candlestick Chart hard-codes data that comes from an earlier Protovis example found here.
The dataset contains the performance of the Chicago Board Options Exchange Volatility Index (VIX) in the summer of 2009.
I think including this dataset would be especially useful for people in finance.
Possible names: vix.json, vix-ohlc.json, or just ohlc.json?
Let me know what you think and I can put together a PR.
I noticed that #388 changed build/vega-datasets.min.js to be an IIFE:
vega-datasets/rollup.config.js
Lines 40 to 46 in b2c5de0
Whereas build/vega-datasets.js is still a UMD:
vega-datasets/rollup.config.js
Lines 34 to 39 in b2c5de0
The problem is that require("vega-datasets") on Observable will use your unpkg/jsdelivr entry point, which points to the IIFE, and thus errors:
Lines 8 to 9 in b2c5de0
You can see it breaking here:
https://observablehq.com/@vega/vega-lite-input-binding
If you want to drop UMD support, we could fix that notebook (and presumably others) by using ES import instead of require:
world = (await import('vega-datasets')).default['world-110m.json'].url
But if you are supporting IIFE, maybe it’s worth continuing to support UMD for backwards compatibility?
Right now, the generated URLs point to GitHub, which may change as we change the content of the files in master. Instead, we should generate URLs to jsdelivr and include the version number. This way we never change data without the user knowing.
As I continue to refine and expand the Altair example gallery, the barley dataset has become our standby for stacked bar charts.
It would be nice to fill out its SOURCES.md entry in the same way we did for the wheat dataset. Can someone here verify its origin?
The idea would be that someone can just load vega-datasets and access the datasets (probably with async loading) or their URLs directly.
Can we delete it?
I'm finding it useful for creating simple bar chart examples in Altair. I'm interested to learn more about where the data comes from.
It appears that some invalid data has crept into cars.json. Presumably this is to highlight Voyager functionality, but it has had the knock-on effect of producing erroneous output in the Vega examples:
Perhaps cars.json should contain the clean values, and a secondary duplicate data file could introduce the error? Note: I believe this error was also introduced into the Vega test cases, so we'll want to update those too.
Any plans to update us-10m to reflect the latest county FIPS codes?
It would be very helpful to have a machine-readable license file, e.g. a TSV of path / SPDX identifier pairs.
There are only 5 such files. I think we should remove these special characters.