
vega-datasets's Introduction

Vega Datasets


Collection of datasets used in Vega and Vega-Lite examples. This data lives at https://github.com/vega/vega-datasets and https://cdn.jsdelivr.net/npm/vega-datasets.

Common repository for example datasets used by Vega related projects. Keep changes to this repository minimal as other projects (Vega, Vega Editor, Vega-Lite, Polestar, Voyager) use this data in their tests and for examples.

The list of sources is in SOURCES.md.

To access the data in Observable, you can import vega-datasets. Try our example notebook. To access these datasets from Python, you can use the vega_datasets Python package. To access them from Julia, you can use the VegaDatasets.jl package.

Versioning

We use semantic versioning. However, since this package serves datasets, we have additional rules for how we version the data.

We do not change data in patch releases except to resolve formatting issues. Minor releases may change the data, but only in ways that do not change field names or file names; they may also add datasets. Major releases may change file names and file contents, and may remove or update files.

How to use it

HTTP

You can get the data directly via HTTP, served by GitHub or jsDelivr (a fast CDN), for example:

https://vega.github.io/vega-datasets/data/cars.json, or with a pinned version (recommended), such as https://cdn.jsdelivr.net/npm/vega-datasets@2/data/cars.json.

You can find a full listing of the available datasets at https://cdn.jsdelivr.net/npm/vega-datasets/data/.
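Pinning to a version keeps these URLs stable across releases. A minimal sketch of how such a URL is assembled (the datasetUrl helper is hypothetical, not part of this package):

```javascript
// Hypothetical helper (not part of vega-datasets) that assembles a
// version-pinned jsDelivr URL. Pinning only the major version ("@2") relies
// on the versioning rules above: field names and file names change only in
// major releases.
function datasetUrl(name, version = "2") {
  return `https://cdn.jsdelivr.net/npm/vega-datasets@${version}/data/${name}`;
}

console.log(datasetUrl("cars.json"));
// https://cdn.jsdelivr.net/npm/vega-datasets@2/data/cars.json
```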

NPM

Get the data on disk

npm i vega-datasets

Now you have all the datasets in a folder in node_modules/vega-datasets/data/.

Get the URLs or load the data via URL

npm i vega-datasets

Now you can import the package (for example, const data = require('vega-datasets')) and access the URL of any dataset with data[NAME].url. Calling data[NAME]() returns a promise that resolves to the parsed data fetched from that URL. We use d3-dsv to parse CSV files.

Here is a full example:

import data from 'vega-datasets';

const cars = await data['cars.json']();
// equivalent to
// const cars = await (await fetch(data['cars.json'].url)).json();

console.log(cars);

Development process

Install dependencies with yarn.

Release process

To make a release, run npm run release.

vega-datasets's People

Contributors

arvind, avatorl, chanwutk, davidanthoff, dependabot-preview[bot], dependabot[bot], domoritz, eitanlees, greenkeeper[bot], hydrosquall, ionathan, jakevdp, jheer, jwolondon, kanitw, lawlesst, light-and-salt, mcnuttandrew, mcorrell, p42-ai[bot], palewire, pbi-david, rileychang, visnup, willium, ydlamba, yhoonkim


vega-datasets's Issues

License?

Any license info on this repo?

An in-range update of rollup is breaking the build 🚨

The devDependency rollup was updated from 1.9.2 to 1.9.3.

🚨 View failing branch.

This version is covered by your current version range and after updating it in your project the build failed.

rollup is a devDependency of this project. It might not break your production code or affect downstream projects, but probably breaks your build or test tools, which may prevent deploying or publishing.

Status Details
  • continuous-integration/travis-ci/push: The Travis CI build could not complete due to an error (Details).
  • Travis CI - Branch: The build errored.

Release Notes for v1.9.3

2019-04-10

Bug Fixes

  • Simplify return expressions that are evaluated before the surrounding function is bound (#2803)

Pull Requests

  • #2803: Handle out-of-order binding of identifiers to improve tree-shaking (@lukastaegert)
Commits

The new version differs by 3 commits.

  • 516a06d 1.9.3
  • a5526ea Update changelog
  • c3d73ff Handle out-of-order binding of identifiers to improve tree-shaking (#2803)

See the full diff

FAQ and help

There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.


Your Greenkeeper Bot 🌴

build/vega-datasets.min.js is now an iife, breaking require

I noticed that #388 changed build/vega-datasets.min.js to be an IIFE:

{
  file: "build/vega-datasets.min.js",
  format: "iife",
  sourcemap: true,
  name: "vegaDatasets",
  plugins: [terser()],
},

Whereas build/vega-datasets.js is still a UMD:

{
  file: "build/vega-datasets.js",
  format: "umd",
  sourcemap: true,
  name: "vegaDatasets",
},

The problem is that require("vega-datasets") on Observable will use your unpkg/jsdelivr entry point which points to the IIFE, and thus errors:

"unpkg": "build/vega-datasets.min.js",
"jsdelivr": "build/vega-datasets.min.js",

You can see it breaking here:

https://observablehq.com/@vega/vega-lite-input-binding

If you want to drop UMD support, we could fix that notebook (and presumably others) by using ES import instead of require:

world = (await import('vega-datasets')).default['world-110m.json'].url

But if you are supporting IIFE, maybe it’s worth continuing to support UMD for backwards compatibility?
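One way to restore compatibility would be to emit the minified bundle as UMD as well, since UMD also loads fine from a plain script tag. This is a sketch of a possible fix, not necessarily the change that was made:

```javascript
// Hypothetical rollup output entry: switch the minified bundle back to UMD
// so require("vega-datasets") keeps working through the unpkg/jsdelivr
// entry points, while <script> consumers are unaffected.
{
  file: "build/vega-datasets.min.js",
  format: "umd",      // was "iife"
  sourcemap: true,
  name: "vegaDatasets",
  plugins: [terser()],
},
```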

Birdstrikes dataset missing

When trying to load the birdstrikes data, a 404 error is thrown.

vega_datasets: 0.8
Ubuntu 20.04 LTS

Python 3.8.3 (default, May 19 2020, 18:47:26) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.15.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: from vega_datasets import data                                                            

In [2]: bird = data.birdstrikes()                                                                 
---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
<ipython-input-2-a4e393dbcf55> in <module>
----> 1 bird = data.birdstrikes()

~/miniconda3/envs/cookbook/lib/python3.8/site-packages/vega_datasets/core.py in __call__(self, use_local, **kwargs)
    222             parsed data
    223         """
--> 224         datasource = BytesIO(self.raw(use_local=use_local))
    225 
    226         kwds = self._pd_read_kwds.copy()

~/miniconda3/envs/cookbook/lib/python3.8/site-packages/vega_datasets/core.py in raw(self, use_local)
    201             return pkgutil.get_data("vega_datasets", self.pkg_filename)
    202         else:
--> 203             return urlopen(self.url).read()
    204 
    205     def __call__(self, use_local=True, **kwargs):

~/miniconda3/envs/cookbook/lib/python3.8/urllib/request.py in urlopen(url, data, timeout, cafile, capath, cadefault, context)
    220     else:
    221         opener = _opener
--> 222     return opener.open(url, data, timeout)
    223 
    224 def install_opener(opener):

~/miniconda3/envs/cookbook/lib/python3.8/urllib/request.py in open(self, fullurl, data, timeout)
    529         for processor in self.process_response.get(protocol, []):
    530             meth = getattr(processor, meth_name)
--> 531             response = meth(req, response)
    532 
    533         return response

~/miniconda3/envs/cookbook/lib/python3.8/urllib/request.py in http_response(self, request, response)
    638         # request was successfully received, understood, and accepted.
    639         if not (200 <= code < 300):
--> 640             response = self.parent.error(
    641                 'http', request, response, code, msg, hdrs)
    642 

~/miniconda3/envs/cookbook/lib/python3.8/urllib/request.py in error(self, proto, *args)
    567         if http_err:
    568             args = (dict, 'default', 'http_error_default') + orig_args
--> 569             return self._call_chain(*args)
    570 
    571 # XXX probably also want an abstract factory that knows when it makes

~/miniconda3/envs/cookbook/lib/python3.8/urllib/request.py in _call_chain(self, chain, kind, meth_name, *args)
    500         for handler in handlers:
    501             func = getattr(handler, meth_name)
--> 502             result = func(*args)
    503             if result is not None:
    504                 return result

~/miniconda3/envs/cookbook/lib/python3.8/urllib/request.py in http_error_default(self, req, fp, code, msg, hdrs)
    647 class HTTPDefaultErrorHandler(BaseHandler):
    648     def http_error_default(self, req, fp, code, msg, hdrs):
--> 649         raise HTTPError(req.full_url, code, msg, hdrs, fp)
    650 
    651 class HTTPRedirectHandler(BaseHandler):

HTTPError: HTTP Error 404: Not Found

Movies release dates off by 100 years

There are several movies listed in the movies.json dataset (everything after 2011) that are listed as having come out 100 years after they were actually released. It's a pretty quick fix and I'm wondering if a pull request would be welcome to either fix the data or create a new dataset with corrected data.


An in-range update of rollup is breaking the build 🚨

The devDependency rollup was updated from 1.18.0 to 1.19.0.

🚨 View failing branch.

This version is covered by your current version range and after updating it in your project the build failed.

rollup is a devDependency of this project. It might not break your production code or affect downstream projects, but probably breaks your build or test tools, which may prevent deploying or publishing.

Status Details
  • continuous-integration/travis-ci/push: The Travis CI build failed (Details).
  • Travis CI - Branch: The build failed.

Commits

The new version differs by 6 commits.

  • 9af119d 1.19.0
  • b3f361c Update changelog
  • 456f4d2 Avoid variable from empty module name be empty (#3026)
  • 17eaa43 Use id of last module in chunk as name base for auto-generated chunks (#3025)
  • 871bfa0 Switch to a code-splitting build and update dependencies (#3020)
  • 2443783 Unified file emission api (#2999)

See the full diff

FAQ and help

There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.


Your Greenkeeper Bot 🌴

add vega-datasets JS notebook to docs ...

@domoritz sounds good! I'll do one better for you guys:

I woke up this morning thinking we could use a simple vega datasets preview js notebook :)

https://observablehq.com/@randomfractals/vega-datasets


I'll leave it up to you guys if you want to add this vega datasets preview utility Observable notebook to your editor or this datasets repo readme.md.

This notebook can be used as a supplemental tool for the online Vega editor and for examples that use these data sources, since it is much faster at loading and scrolling data than what GitHub and the Vega editor provide.

I might add something similar to the https://github.com/RandomFractals/vscode-vega-viewer as a split panel in vega chart preview in the next major release.

cc @kanitw @arvind & @jheer

Cheers! 🤗

Originally posted by @RandomFractals in #64 (comment)

Add OHLC Data

I think the addition of an "Open High Low Close" (OHLC) dataset would be useful.

The VL example Candlestick Chart hard-codes data that is found in an earlier Protovis example.

The dataset contains the performance of the Chicago Board Options Exchange Volatility Index (VIX) in the summer of 2009.

I think including this dataset would be especially useful for people in finance.

Possible names: vix.json, vix-ohlc.json or just ohlc.json?

Let me know what you think and I can put together a PR.

An in-range update of rollup is breaking the build 🚨

The devDependency rollup was updated from 1.14.2 to 1.14.3.

🚨 View failing branch.

This version is covered by your current version range and after updating it in your project the build failed.

rollup is a devDependency of this project. It might not break your production code or affect downstream projects, but probably breaks your build or test tools, which may prevent deploying or publishing.

Status Details
  • continuous-integration/travis-ci/push: The Travis CI build could not complete due to an error (Details).
  • Travis CI - Branch: The build errored.

Release Notes for v1.14.3

2019-06-06

Bug Fixes

  • Generate correct external imports when importing from a directory that would be above the root of the current working directory (#2902)

Pull Requests

Commits

The new version differs by 4 commits.

See the full diff

FAQ and help

There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.


Your Greenkeeper Bot 🌴

Provide a js module

The idea would be that someone can just load vega-datasets and access the datasets (probably with async loading) or their URLs directly.

cars.json contains invalid data

It appears that some invalid data has crept into cars.json. Presumably, this is to highlight Voyager functionality, but it's had the knock-on effect of producing erroneous output with the Vega examples:

Cars Parallel Coords

Perhaps cars.json should contain the clean values, and a secondary duplicate data file introduces the error? Note: I believe this error was also introduced to the Vega test cases, so we'll want to update that too.

What is the origin of the barley dataset?

As I continue to refine and expand the Altair example gallery, the barley dataset has become our standby for stacked bar charts.

It would be nice to fill out its sources entry in the same way we did the wheat dataset. Can someone here verify its origin?

Update sf-temps

I figured once the Seattle temperatures are updated (#127) we should also update the corresponding San Francisco temperatures. I've already downloaded the data from the San Francisco International Airport weather station.


Making a note of it here so we don't forget.

Clean up for 2.0

For the 2.0 release, let's clean up datasets we don't need anymore.

  • Remove graticule
  • Consolidate weather datasets
  • Update the census dataset. #171
  • Update the CO2 dataset

What is the source of wheat.json?

I'm finding it useful for creating simple bar chart examples in Altair. I'm interested to learn more about where the data comes from.

Action required: Greenkeeper could not be activated 🚨

🚨 You need to enable Continuous Integration on Greenkeeper branches of this repository. 🚨

To enable Greenkeeper, you need to make sure that a commit status is reported on all branches. This is required by Greenkeeper because it uses your CI build statuses to figure out when to notify you about breaking changes.

Since we didn’t receive a CI status on the greenkeeper/initial branch, it’s possible that you don’t have CI set up yet. We recommend using Travis CI, but Greenkeeper will work with every other CI service as well.

If you have already set up a CI for this repository, you might need to check how it’s configured. Make sure it is set to run on all new branches. If you don’t want it to run on absolutely every branch, you can whitelist branches starting with greenkeeper/.

Once you have installed and configured CI on this repository correctly, you’ll need to re-trigger Greenkeeper’s initial pull request. To do this, please click the 'fix repo' button on account.greenkeeper.io.

Are you open to including more datasets?

I work in data journalism and I think it would be cool to include some simple but famous datasets from our profession in your examples. If I submitted some would you be open to considering them?

Urls should point to stable source

Right now, the generated URLs point to GitHub, which may change as we change the content of the files in master. Instead, we should generate URLs to jsDelivr and include the version number. This way we never change data without the user knowing.

Can not load earthquakes dataset

I couldn't load the earthquakes dataset, then I tried to manually download the JSON file and read it with pandas; the same error occurred.


CSV parser treats header rows as data

The CSV loader in data.ts uses the following call:
return d3.csvParseRows(await result.text(), (d3 as any).autoType)

However, this method does not treat the header row as field names; that is the documented behavior of csvParseRows (see https://github.com/d3/d3-dsv#dsv_parseRows), which parses every row, including the first, as data.

We need to update vega-datasets to return a properly parsed CSV that uses the header row to determine object field names.
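The distinction can be illustrated without d3 at all. Below is a minimal hand-rolled sketch (ignoring quoting and type inference) of row-oriented versus header-aware parsing; the function names mirror d3-dsv's but the implementations are illustrative only:

```javascript
// Row-oriented parsing returns arrays, with the header row treated as data
// (this is what csvParseRows-style parsing does).
function parseRows(text) {
  return text.trim().split("\n").map((line) => line.split(","));
}

// Header-aware parsing uses the first row as field names and returns
// objects (this is what csvParse-style parsing does).
function parse(text) {
  const [header, ...rows] = parseRows(text);
  return rows.map((row) =>
    Object.fromEntries(header.map((field, i) => [field, row[i]]))
  );
}

const csv = "name,mpg\nchevy,18\nbuick,15";
console.log(parseRows(csv)); // [["name","mpg"],["chevy","18"],["buick","15"]]
console.log(parse(csv));     // [{name:"chevy",mpg:"18"},{name:"buick",mpg:"15"}]
```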

Published build artifacts have the wrong version (2.5.0 instead of 2.5.1)

Looks like 2.5.1 #390 includes the wrong version 2.5.0 in the published artifacts. Search for “version” here:

https://cdn.jsdelivr.net/npm/vega-datasets@2.5.1/build/vega-datasets.module.js

This results in broken URLs that point to 2.5.0, which is missing data. For example, this:

world = (await import('vega-datasets@2.5.1')).default['world-110m.json'].url

https://cdn.jsdelivr.net/npm/vega-datasets@2.5.0/data/world-110m.json

7 datasets that cannot be loaded

Description
Hello, Vega team!

I hope you are doing well. I came across an issue while exploring datasets in the Vega dataset repository. Specifically, I found that 7 datasets in the following directory cannot be loaded using the pd.read_json(url) method:

https://github.com/vega/vega-datasets/tree/main/data

I would greatly appreciate it if you could take a look at this issue and provide a possible solution. If you need any additional information from me, please let me know.

Thank you for your time and attention!

Steps to Reproduce
Go to https://github.com/vega/vega-datasets/tree/main/data
Select any of the following 7 datasets:
https://raw.githubusercontent.com/vega/vega-datasets/main/data/annual-precip.json
https://raw.githubusercontent.com/vega/vega-datasets/main/data/earthquakes.json
https://raw.githubusercontent.com/vega/vega-datasets/main/data/londonBoroughs.json
https://raw.githubusercontent.com/vega/vega-datasets/main/data/londonTubeLines.json
https://raw.githubusercontent.com/vega/vega-datasets/main/data/miserables.json
https://raw.githubusercontent.com/vega/vega-datasets/main/data/us-10m.json
https://raw.githubusercontent.com/vega/vega-datasets/main/data/world-110m.json

Load the dataset using pd.read_json(url) method.
Observe that the dataset cannot be loaded.

Expected behavior
The datasets in the aforementioned directory should be able to be loaded using pd.read_json(url) method.

Actual behavior
7 of the datasets in the directory cannot be loaded using pd.read_json(url) method.

Additional Information
Operating System: Windows
Python version: 3.10.9
Pandas version: 1.5.2

List of errors received:
[['ValueError', 'All arrays must be of the same length'],
 ['ValueError', 'All arrays must be of the same length'],
 ['ValueError', 'All arrays must be of the same length'],
 ['ValueError', 'All arrays must be of the same length'],
 ['ValueError', 'All arrays must be of the same length'],
 ['ValueError', 'Mixing dicts with non-Series may lead to ambiguous ordering.'],
 ['ValueError', 'Mixing dicts with non-Series may lead to ambiguous ordering.']]
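These failures are consistent with the shape of the files rather than with corrupted data: pd.read_json expects an array of records (or a dict of equal-length columns), while these seven files are nested structures, e.g. TopoJSON topologies (world-110m.json, us-10m.json) or node/link graphs (miserables.json). A sketch of the shape check involved (looksTabular and the sample values are hypothetical):

```javascript
// A tabular loader expects an array of record objects. TopoJSON files and
// graph files are single nested objects instead, so a tabular reader fails.
function looksTabular(json) {
  return Array.isArray(json) && json.every((row) => typeof row === "object");
}

// Illustrative shapes, not the real file contents:
const cars = [{ Name: "chevy", Miles_per_Gallon: 18 }];    // array of records
const world = { type: "Topology", objects: {}, arcs: [] }; // TopoJSON
const miserables = { nodes: [], links: [] };               // node/link graph

console.log(looksTabular(cars));       // true
console.log(looksTabular(world));      // false
console.log(looksTabular(miserables)); // false
```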
