vega / vega-datasets
Common repository for example datasets used by Vega-related projects
For the 2.0 release, let's clean up datasets we don't need anymore.
I suggest adding this dataset of top goal-scorers among footballers since 1980:
https://johnburnmurdoch.github.io/projects/goal-lines/all-comps/
The devDependency rollup was updated from 1.9.2 to 1.9.3. This version is covered by your current version range, and after updating it in your project, the build failed.
rollup is a devDependency of this project. It might not break your production code or affect downstream projects, but probably breaks your build or test tools, which may prevent deploying or publishing.
2019-04-10
The new version differs by 3 commits.
516a06d 1.9.3
a5526ea Update changelog
c3d73ff Handle out-of-order binding of identifiers to improve tree-shaking (#2803)
See the full diff
There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.
Your Greenkeeper Bot 🌴
The devDependency rollup was updated from 1.14.2 to 1.14.3. This version is covered by your current version range, and after updating it in your project, the build failed.
rollup is a devDependency of this project. It might not break your production code or affect downstream projects, but probably breaks your build or test tools, which may prevent deploying or publishing.
2019-06-06
The new version differs by 4 commits.
c68bd95 1.14.3
d79aa57 Update changelog
7179390 Use browser relative path algorithm for chunks (#2902)
b1df517 Add funding button
See the full diff
There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.
Your Greenkeeper Bot 🌴
I'm trying to run the example specs in the latest Vega release, but am getting errors about missing datasets, including flare-dependencies.json (from https://github.com/vega/vega/blob/master/spec/tree-radial-bundle.vg.json), normal-2.json, and flights-200k.json. Are these available somewhere?
(Apologies if this issue is in the wrong spot.)
A general demonstration is outlined in this Google Colab notebook: https://colab.research.google.com/drive/1oKhivD5T9Yi1gMl0_7dUwqVFqiNfD43k?usp=sharing
The 'flights-200k.arrow' file produces an error every time I try to read it with the pandas package; a workaround sketch follows.
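For reference, pandas cannot parse Arrow IPC data on its own, so one hedged workaround is to read the file with pyarrow and convert it to a DataFrame. This is only a sketch: whether flights-200k.arrow uses the IPC file framing or the stream framing is an assumption, hence the fallback.

import pyarrow as pa
import pyarrow.ipc

# Workaround sketch: read the Arrow file with pyarrow, then convert to pandas.
# The IPC framing (file vs. stream) of flights-200k.arrow is an assumption.
path = "flights-200k.arrow"
try:
    table = pa.ipc.open_file(path).read_all()    # Arrow IPC file format
except pa.ArrowInvalid:
    table = pa.ipc.open_stream(path).read_all()  # Arrow IPC stream format
df = table.to_pandas()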
Any license info on this repo?
The seattle-temps dataset claims that the temperature in Seattle never rose above 76 degrees F in 2010.
According to my own memory, the temperature was much hotter. Other more reliable sources agree; for example, Weather Underground reports that Seattle hit 96 degrees F in August 2010: https://www.wunderground.com/history/monthly/us/wa/seattle/KSEA/date/2010-8
Perhaps the dataset is mislabeled?
If you are already aware of this, ignore this.
When using data/movies.json, for example, the movie "Duel in the Sun" on line 11 has a Release Date of "Dec 31 2046", when the movie was actually released in 1946.
This mixup happens in numerous places.
I figured once the Seattle temperatures are updated (#127) we should also update the corresponding San Francisco temperatures. I've already downloaded the data from the San Francisco International Airport weather station.
Making a note of it here so we don't forget.
Several movies in the movies.json dataset (everything after 2011) are listed as having come out 100 years after they were actually released. It's a pretty quick fix, and I'm wondering whether a pull request would be welcome to either fix the data or create a new dataset with corrected values; a sketch of the fix appears below.
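As a concrete illustration (not an authoritative fix), here is a minimal pandas sketch that shifts the off-by-a-century dates back; the field name "Release Date" and the "Dec 31 2046" date format are assumptions that should be checked against the actual file.

import pandas as pd

# Sketch: move releases dated after ~2011 back by exactly one century.
# Field name "Release Date" and format "%b %d %Y" are assumptions.
movies = pd.read_json("movies.json")
dates = pd.to_datetime(movies["Release Date"], format="%b %d %Y", errors="coerce")
off_by_century = dates.dt.year > 2011
dates.loc[off_by_century] -= pd.DateOffset(years=100)
movies["Release Date"] = dates.dt.strftime("%b %d %Y")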
@domoritz sounds good! I'll do one better for you guys:
I woke up this morning thinking we could use a simple vega datasets preview js notebook :)
https://observablehq.com/@randomfractals/vega-datasets
I'll leave it up to you whether you want to add this Vega datasets preview Observable notebook to your editor or to this repo's README.md.
This notebook can serve as a supplemental tool for the online Vega editor and for examples that use these data sources, since it loads and scrolls data much faster than GitHub and the Vega editor do.
I might add something similar to https://github.com/RandomFractals/vscode-vega-viewer as a split panel in the Vega chart preview in the next major release.
Cheers! 🤗
Originally posted by @RandomFractals in #64 (comment)
🚨 You need to enable Continuous Integration on Greenkeeper branches of this repository. 🚨
To enable Greenkeeper, you need to make sure that a commit status is reported on all branches. This is required by Greenkeeper because it uses your CI build statuses to figure out when to notify you about breaking changes.
Since we didn’t receive a CI status on the greenkeeper/initial branch, it’s possible that you don’t have CI set up yet. We recommend using Travis CI, but Greenkeeper will work with every other CI service as well.
If you have already set up a CI for this repository, you might need to check how it’s configured. Make sure it is set to run on all new branches. If you don’t want it to run on absolutely every branch, you can whitelist branches starting with greenkeeper/.
Once you have installed and configured CI on this repository correctly, you’ll need to re-trigger Greenkeeper’s initial pull request. To do this, please click the 'fix repo' button on account.greenkeeper.io.
Gapminder data, from a Swedish non-profit, is a popular part of this repository and truly fascinating to explore using visualization tools in the vega ecosystem. While working on an Altair example, I discovered what looked like a simple issue in the gapminder.json
dataset, but as I looked into fixing it with a simple pull request, the right solution seemed a bit more complex, and I wanted to lay out my thoughts here for feedback.
The immediate issue I found is that it looks like life expectancy data between North and South Korea has been swapped. For 2005, this repository's dataset shows South Korea's life expectancy as 67.297 years and North Korea's as 78.623 years. This contradicts current Gapminder life expectancy data (v14), which reports approximately the reverse. This raises questions about other errors lurking in the dataset.
Resolving this issue is complicated by the absence of sourcing or versioning details for the gapminder data in SOURCES.md. The json file in this repository appears to be based on an older version of the dataset that I could not locate. For instance, Afghanistan's 1955 life expectancy is 30.332 years in the vega-datasets json, which aligns closely with Gapminder's v11 data (32.48 years), but differs from the current v14 (43.88 years).
Given what the vega-datasets README states about versioning, there seem to be a few options for a solution:
Patch release: If the Korea data swap is confirmed as a formatting error, it could potentially be addressed in a patch release (a sketch of such a fix follows this list). That said, I still haven't been able to locate an older version of a Gapminder file containing data that matches the vega-datasets json.
Minor release: Updating the dataset with current Gapminder figures without changing field names or file names could be done in a minor release. This could address the outdated data issue. But the data could be significantly different (as in the Afghanistan life expectancy data) and some country names may have changed.
Major release: If we need to change field names (e.g., updating regional classification field name "cluster" to align with current Gapminder terminology) or significantly alter file contents, a major release would be necessary.
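To make the patch-release option concrete, here is a minimal sketch of the Korea swap fix. The field names ("country", "year", "life_expect") and the exact country labels are assumptions based on my reading of gapminder.json and should be verified first.

import pandas as pd

# Sketch: swap the apparently transposed life-expectancy series between the
# two Koreas, aligning by year. Field and country names are assumptions.
df = pd.read_json("gapminder.json")
north = df["country"] == "North Korea"
south = df["country"] == "South Korea"
n = df.loc[north].set_index("year")["life_expect"]
s = df.loc[south].set_index("year")["life_expect"]
df.loc[north, "life_expect"] = s.reindex(n.index).to_numpy()
df.loc[south, "life_expect"] = n.reindex(s.index).to_numpy()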
Regardless of the chosen approach, I propose:
When trying to load the birdstrikes data, a 404 error is thrown.
vega_datasets: 0.8
Ubuntu 20.04 LTS
Python 3.8.3 (default, May 19 2020, 18:47:26)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.15.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: from vega_datasets import data
In [2]: bird = data.birdstrikes()
---------------------------------------------------------------------------
HTTPError Traceback (most recent call last)
<ipython-input-2-a4e393dbcf55> in <module>
----> 1 bird = data.birdstrikes()
~/miniconda3/envs/cookbook/lib/python3.8/site-packages/vega_datasets/core.py in __call__(self, use_local, **kwargs)
222 parsed data
223 """
--> 224 datasource = BytesIO(self.raw(use_local=use_local))
225
226 kwds = self._pd_read_kwds.copy()
~/miniconda3/envs/cookbook/lib/python3.8/site-packages/vega_datasets/core.py in raw(self, use_local)
201 return pkgutil.get_data("vega_datasets", self.pkg_filename)
202 else:
--> 203 return urlopen(self.url).read()
204
205 def __call__(self, use_local=True, **kwargs):
~/miniconda3/envs/cookbook/lib/python3.8/urllib/request.py in urlopen(url, data, timeout, cafile, capath, cadefault, context)
220 else:
221 opener = _opener
--> 222 return opener.open(url, data, timeout)
223
224 def install_opener(opener):
~/miniconda3/envs/cookbook/lib/python3.8/urllib/request.py in open(self, fullurl, data, timeout)
529 for processor in self.process_response.get(protocol, []):
530 meth = getattr(processor, meth_name)
--> 531 response = meth(req, response)
532
533 return response
~/miniconda3/envs/cookbook/lib/python3.8/urllib/request.py in http_response(self, request, response)
638 # request was successfully received, understood, and accepted.
639 if not (200 <= code < 300):
--> 640 response = self.parent.error(
641 'http', request, response, code, msg, hdrs)
642
~/miniconda3/envs/cookbook/lib/python3.8/urllib/request.py in error(self, proto, *args)
567 if http_err:
568 args = (dict, 'default', 'http_error_default') + orig_args
--> 569 return self._call_chain(*args)
570
571 # XXX probably also want an abstract factory that knows when it makes
~/miniconda3/envs/cookbook/lib/python3.8/urllib/request.py in _call_chain(self, chain, kind, meth_name, *args)
500 for handler in handlers:
501 func = getattr(handler, meth_name)
--> 502 result = func(*args)
503 if result is not None:
504 return result
~/miniconda3/envs/cookbook/lib/python3.8/urllib/request.py in http_error_default(self, req, fp, code, msg, hdrs)
647 class HTTPDefaultErrorHandler(BaseHandler):
648 def http_error_default(self, req, fp, code, msg, hdrs):
--> 649 raise HTTPError(req.full_url, code, msg, hdrs, fp)
650
651 class HTTPRedirectHandler(BaseHandler):
HTTPError: HTTP Error 404: Not Found
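Until the package's URL is fixed upstream, one hedged workaround is to bypass vega_datasets and read the file from a version-pinned CDN path. The "@1" version range and the data/birdstrikes.json path below are assumptions worth double-checking against the published package contents.

import pandas as pd

# Workaround sketch: read birdstrikes from a pinned vega-datasets release on
# jsdelivr instead of the package's hard-coded URL (path/version assumed).
url = "https://cdn.jsdelivr.net/npm/vega-datasets@1/data/birdstrikes.json"
bird = pd.read_json(url)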
Looks like 2.5.1 #390 includes the wrong version 2.5.0 in the published artifacts. Search for “version” here:
https://cdn.jsdelivr.net/npm/vega-datasets@2.5.1/build/vega-datasets.module.js
This results in broken URLs that point to 2.5.0, which is missing data. For example, this:
world = (await import('vega-datasets@2.5.1')).default['world-110m.json'].url
resolves to:
https://cdn.jsdelivr.net/npm/vega-datasets@2.5.0/data/world-110m.json
https://github.com/vega/vega/blob/master/docs/data/volcano.json is used by Vega examples, and we can only build examples in the editor if the datasets are in vega-datasets.
What's the source of this dataset? Is there a cleaner version that doesn't have the specific format that only works well for Vega?
Description
Hello, Vega team!
I hope you are doing well. I came across an issue while exploring datasets in the Vega dataset repository. Specifically, I found that 7 datasets in the following directory cannot be loaded using the pd.read_json(url) method:
https://github.com/vega/vega-datasets/tree/main/data
I would greatly appreciate it if you could take a look at this issue and provide a possible solution. If you need any additional information from me, please let me know.
Thank you for your time and attention!
Steps to Reproduce
Go to https://github.com/vega/vega-datasets/tree/main/data
Select any of the following 7 datasets:
https://raw.githubusercontent.com/vega/vega-datasets/main/data/annual-precip.json
https://raw.githubusercontent.com/vega/vega-datasets/main/data/earthquakes.json
https://raw.githubusercontent.com/vega/vega-datasets/main/data/londonBoroughs.json
https://raw.githubusercontent.com/vega/vega-datasets/main/data/londonTubeLines.json
https://raw.githubusercontent.com/vega/vega-datasets/main/data/miserables.json
https://raw.githubusercontent.com/vega/vega-datasets/main/data/us-10m.json
https://raw.githubusercontent.com/vega/vega-datasets/main/data/world-110m.json
Load the dataset using the pd.read_json(url) method.
Observe that the dataset cannot be loaded.
Expected behavior
The datasets in the aforementioned directory should be loadable via pd.read_json(url).
Actual behavior
7 of the datasets in the directory cannot be loaded via pd.read_json(url).
Additional Information
Operating System: Windows
Python version: 3.10.9
Pandas version: 1.5.2
List of errors received:
[['ValueError', 'All arrays must be of the same length'],
['ValueError', 'All arrays must be of the same length'],
['ValueError', 'All arrays must be of the same length'],
['ValueError', 'All arrays must be of the same length'],
['ValueError', 'All arrays must be of the same length'],
['ValueError',
'Mixing dicts with non-Series may lead to ambiguous ordering.'],
['ValueError',
'Mixing dicts with non-Series may lead to ambiguous ordering.']]
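For what it's worth, these seven files are nested JSON rather than flat record tables (us-10m.json and world-110m.json are TopoJSON, miserables.json is a node/link graph, and so on), so pd.read_json's tabular heuristics raise ValueError by design. A workaround sketch: load the raw JSON and flatten only the piece you need. The "links" key below applies to miserables.json and is just an example; other files need other keys.

import json
from urllib.request import urlopen

import pandas as pd

# Sketch: load the nested JSON directly, then normalize the relevant array.
# The "links" key is specific to miserables.json; adjust per file.
url = "https://raw.githubusercontent.com/vega/vega-datasets/main/data/miserables.json"
with urlopen(url) as f:
    raw = json.load(f)
links = pd.json_normalize(raw["links"])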
See discussion in vega/vega-lite#3185
I work in data journalism and I think it would be cool to include some simple but famous datasets from our profession in your examples. If I submitted some would you be open to considering them?
The devDependency rollup was updated from 1.18.0 to 1.19.0. This version is covered by your current version range, and after updating it in your project, the build failed.
rollup is a devDependency of this project. It might not break your production code or affect downstream projects, but probably breaks your build or test tools, which may prevent deploying or publishing.
The new version differs by 6 commits.
9af119d 1.19.0
b3f361c Update changelog
456f4d2 Avoid variable from empty module name be empty (#3026)
17eaa43 Use id of last module in chunk as name base for auto-generated chunks (#3025)
871bfa0 Switch to a code-splitting build and update dependencies (#3020)
2443783 Unified file emission api (#2999)
See the full diff
There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.
Your Greenkeeper Bot 🌴
In https://github.com/vega/vega-datasets/blob/gh-pages/data/weather.csv, the date field cannot be parsed automatically in Safari.
The CSV loader in data.ts uses the following call:
return d3.csvParseRows(await result.text(), (d3 as any).autoType)
However, this method does not parse the header row as field names; that is the intended, documented behavior of csvParseRows (see https://github.com/d3/d3-dsv#dsv_parseRows).
We need to update vega-datasets to return a properly parsed CSV that uses the header row to determine object field names, i.e., switch to d3.csvParse, which consumes the header row and yields one object per data row.
Oglala Lakota - 46102 in us_10m.json
I think the addition of an "Open High Low Close" dataset would be useful.
The VL example Candlestick Chart hard-codes data that comes from an earlier Protovis example found here.
The dataset contains the performance of the Chicago Board Options Exchange Volatility Index (VIX) in the summer of 2009.
I think including this dataset would be especially useful for people in finance.
Possible names: vix.json, vix-ohlc.json, or just ohlc.json?
Let me know what you think and I can put together a PR.
I noticed that #388 changed build/vega-datasets.min.js to be an IIFE:
vega-datasets/rollup.config.js
Lines 40 to 46 in b2c5de0
Whereas build/vega-datasets.js is still a UMD:
vega-datasets/rollup.config.js
Lines 34 to 39 in b2c5de0
The problem is that require("vega-datasets") on Observable will use your unpkg/jsdelivr entry point, which points to the IIFE, and thus errors:
Lines 8 to 9 in b2c5de0
You can see it breaking here:
https://observablehq.com/@vega/vega-lite-input-binding
If you want to drop UMD support, we could fix that notebook (and presumably others) by using ES import instead of require:
world = (await import('vega-datasets')).default['world-110m.json'].url
But if you are supporting IIFE, maybe it’s worth continuing to support UMD for backwards compatibility?
Right now, the generated URLs point to GitHub, which may change as we change the content of the files in master. Instead, we should generate URLs to jsdelivr and include the version number. This way we never change data without the user knowing.
As I continue to refine and expand the Altair example gallery, the barley dataset has become our standby for stacked bar charts.
It would be nice to fill out its SOURCES.md entry in the same way we did for the wheat dataset. Can someone here verify its origin?
The idea would be that someone can just load vega-datasets and access the datasets (probably with async loading) or their URLs directly.
Can we delete it?
I'm finding it useful for creating simple bar chart examples in Altair. I'm interested to learn more about where the data comes from.
It appears that some invalid data has crept into cars.json. Presumably this is to highlight Voyager functionality, but it has had the knock-on effect of producing erroneous output in the Vega examples:
Perhaps cars.json should contain the clean values, and a secondary duplicate data file could introduce the error? Note: I believe this error was also introduced into the Vega test cases, so we'll want to update those too.
Any plans to update us-10m to reflect the latest county FIPS codes?
It would be very helpful to have a machine-readable license file, e.g. a TSV of path / SPDX identifier pairs.
There are only 5 such files. I think we should remove these special characters.