vega-datasets's Issues

Clean up for 2.0

For the 2.0 release, let's clean up datasets we don't need anymore.

  • Remove graticule
  • Consolidate weather datasets
  • Update the census dataset (#171)
  • Update the CO2 dataset

An in-range update of rollup is breaking the build 🚨

The devDependency rollup was updated from 1.9.2 to 1.9.3.

🚨 View failing branch.

This version is covered by your current version range and after updating it in your project the build failed.

rollup is a devDependency of this project. It might not break your production code or affect downstream projects, but probably breaks your build or test tools, which may prevent deploying or publishing.

Status Details
  • continuous-integration/travis-ci/push: The Travis CI build could not complete due to an error (Details).
  • Travis CI - Branch: The build errored.

Release Notes for v1.9.3

2019-04-10

Bug Fixes

  • Simplify return expressions that are evaluated before the surrounding function is bound (#2803)

Pull Requests

  • #2803: Handle out-of-order binding of identifiers to improve tree-shaking (@lukastaegert)

Commits

The new version differs by 3 commits.

  • 516a06d 1.9.3
  • a5526ea Update changelog
  • c3d73ff Handle out-of-order binding of identifiers to improve tree-shaking (#2803)

See the full diff

FAQ and help

There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.


Your Greenkeeper Bot 🌴

An in-range update of rollup is breaking the build 🚨

The devDependency rollup was updated from 1.14.2 to 1.14.3.

🚨 View failing branch.

This version is covered by your current version range and after updating it in your project the build failed.

rollup is a devDependency of this project. It might not break your production code or affect downstream projects, but probably breaks your build or test tools, which may prevent deploying or publishing.

Status Details
  • continuous-integration/travis-ci/push: The Travis CI build could not complete due to an error (Details).
  • Travis CI - Branch: The build errored.

Release Notes for v1.14.3

2019-06-06

Bug Fixes

  • Generate correct external imports when importing from a directory that would be above the root of the current working directory (#2902)

Pull Requests

Commits

The new version differs by 4 commits.

See the full diff

FAQ and help

There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.


Your Greenkeeper Bot 🌴

License?

Any license info on this repo?

Update sf-temps

I figured that once the Seattle temperatures are updated (#127), we should also update the corresponding San Francisco temperatures. I've already downloaded the data from the San Francisco International Airport weather station.

(screenshot attached)

Making a note of it here so we don't forget.

Movies release dates off by 100 years

Several movies in the movies.json dataset (everything after 2011) are listed as having come out 100 years after they were actually released. It's a pretty quick fix, and I'm wondering whether a pull request would be welcome to either fix the data or create a new dataset with the corrected values.

(screenshot attached)
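
A rough sketch of the correction I have in mind (untested; the "Release Date" field name and the 2011 cutoff are assumptions based on the current file, and the serialized date format would need to match the original):

// Shift any parsed release year after 2011 back by a century.
const movies = await (
  await fetch('https://vega.github.io/vega-datasets/data/movies.json')
).json();

const fixed = movies.map((m) => {
  const d = new Date(m['Release Date']);
  if (d.getFullYear() > 2011) d.setFullYear(d.getFullYear() - 100);
  return { ...m, 'Release Date': d.toDateString() };
});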

add vega-datasets JS notebook to docs ...

@domoritz sounds good! I'll do one better for you guys:

I woke up this morning thinking we could use a simple vega datasets preview js notebook :)

https://observablehq.com/@randomfractals/vega-datasets

I'll leave it up to you guys whether you want to add this vega-datasets preview Observable notebook to your editor or to this repo's readme.md.

This notebook can serve as a supplemental tool for the online Vega editor and for examples that use these data sources, since it loads and scrolls data much faster than GitHub and the Vega editor do.

I might add something similar to the https://github.com/RandomFractals/vscode-vega-viewer as a split panel in vega chart preview in the next major release.

cc @kanitw @arvind & @jheer

Cheers! 🤗

Originally posted by @RandomFractals in #64 (comment)

Action required: Greenkeeper could not be activated 🚨

🚨 You need to enable Continuous Integration on Greenkeeper branches of this repository. 🚨

To enable Greenkeeper, you need to make sure that a commit status is reported on all branches. This is required by Greenkeeper because it uses your CI build statuses to figure out when to notify you about breaking changes.

Since we didn’t receive a CI status on the greenkeeper/initial branch, it’s possible that you don’t have CI set up yet. We recommend using Travis CI, but Greenkeeper will work with every other CI service as well.

If you have already set up a CI for this repository, you might need to check how it’s configured. Make sure it is set to run on all new branches. If you don’t want it to run on absolutely every branch, you can whitelist branches starting with greenkeeper/.

Once you have installed and configured CI on this repository correctly, you’ll need to re-trigger Greenkeeper’s initial pull request. To do this, please click the 'fix repo' button on account.greenkeeper.io.

Address data inconsistencies and absence of versioning or sourcing in gapminder data

Gapminder data, from a Swedish non-profit, is a popular part of this repository and truly fascinating to explore using visualization tools in the vega ecosystem. While working on an Altair example, I discovered what looked like a simple issue in the gapminder.json dataset, but as I looked into fixing it with a simple pull request, the right solution seemed a bit more complex, and I wanted to lay out my thoughts here for feedback.

The immediate issue I found is that it looks like life expectancy data between North and South Korea has been swapped. For 2005, this repository's dataset shows South Korea's life expectancy as 67.297 years and North Korea's as 78.623 years. This contradicts current Gapminder life expectancy data (v14), which reports approximately the reverse. This raises questions about other errors lurking in the dataset.
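
The swap is easy to reproduce. A quick check (Node 18+ sketch; the field names year, country, and life_expect are taken from the repository's json):

// Print the 2005 rows for both Koreas; life_expect shows the reversed values.
const gap = await (
  await fetch('https://vega.github.io/vega-datasets/data/gapminder.json')
).json();

console.log(gap.filter((d) => d.year === 2005 && /Korea/.test(d.country)));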

Resolving this issue is complicated by the absence of sourcing or versioning details for the gapminder data in SOURCES.md. The json file in this repository appears to be based on an older version of the dataset that I could not locate. For instance, Afghanistan's 1955 life expectancy is 30.332 years in the vega-datasets json, which aligns closely with Gapminder's v11 data (32.48 years), but differs from the current v14 (43.88 years).

Given what the vega-datasets README states about versioning, there seem to be a few options for a solution:

  1. Patch release: If the Korea data swap is confirmed as a formatting error, it could potentially be addressed in a patch release. That said, I still haven't been able to locate an older version of a Gapminder file containing data that matches the vega-datasets json.

  2. Minor release: Updating the dataset with current Gapminder figures without changing field names or file names could be done in a minor release. This could address the outdated data issue. But the data could be significantly different (as in the Afghanistan life expectancy data) and some country names may have changed.

  3. Major release: If we need to change field names (e.g., updating regional classification field name "cluster" to align with current Gapminder terminology) or significantly alter file contents, a major release would be necessary.

Regardless of the chosen approach, I propose:

  1. Considering whether to add a disclaimer to the repository about the intended use cases for the data (given that the repository can contain errors, may be out of date, isn't actively maintained, and exists mainly for demo purposes), and/or encouraging non-demonstration use cases to refer back to the original sources rather than rely on the vega-datasets repository.
  2. Considering how best to adhere to appropriate sourcing requirements for datasets, such as attribution. Gapminder's license page lists attribution requirements.
  3. Updating SOURCES.md with detailed sourcing information
  4. Considering how to handle the other gapminder file in vega-datasets, gapminder-health-income.csv, which I haven't looked at.

Birdstrikes dataset missing

When trying to load the birdstrikes data, a 404 error is thrown.

vega_datasets: 0.8
Ubuntu 20.04 LTS

Python 3.8.3 (default, May 19 2020, 18:47:26) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.15.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: from vega_datasets import data                                                            

In [2]: bird = data.birdstrikes()                                                                 
---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
<ipython-input-2-a4e393dbcf55> in <module>
----> 1 bird = data.birdstrikes()

~/miniconda3/envs/cookbook/lib/python3.8/site-packages/vega_datasets/core.py in __call__(self, use_local, **kwargs)
    222             parsed data
    223         """
--> 224         datasource = BytesIO(self.raw(use_local=use_local))
    225 
    226         kwds = self._pd_read_kwds.copy()

~/miniconda3/envs/cookbook/lib/python3.8/site-packages/vega_datasets/core.py in raw(self, use_local)
    201             return pkgutil.get_data("vega_datasets", self.pkg_filename)
    202         else:
--> 203             return urlopen(self.url).read()
    204 
    205     def __call__(self, use_local=True, **kwargs):

~/miniconda3/envs/cookbook/lib/python3.8/urllib/request.py in urlopen(url, data, timeout, cafile, capath, cadefault, context)
    220     else:
    221         opener = _opener
--> 222     return opener.open(url, data, timeout)
    223 
    224 def install_opener(opener):

~/miniconda3/envs/cookbook/lib/python3.8/urllib/request.py in open(self, fullurl, data, timeout)
    529         for processor in self.process_response.get(protocol, []):
    530             meth = getattr(processor, meth_name)
--> 531             response = meth(req, response)
    532 
    533         return response

~/miniconda3/envs/cookbook/lib/python3.8/urllib/request.py in http_response(self, request, response)
    638         # request was successfully received, understood, and accepted.
    639         if not (200 <= code < 300):
--> 640             response = self.parent.error(
    641                 'http', request, response, code, msg, hdrs)
    642 

~/miniconda3/envs/cookbook/lib/python3.8/urllib/request.py in error(self, proto, *args)
    567         if http_err:
    568             args = (dict, 'default', 'http_error_default') + orig_args
--> 569             return self._call_chain(*args)
    570 
    571 # XXX probably also want an abstract factory that knows when it makes

~/miniconda3/envs/cookbook/lib/python3.8/urllib/request.py in _call_chain(self, chain, kind, meth_name, *args)
    500         for handler in handlers:
    501             func = getattr(handler, meth_name)
--> 502             result = func(*args)
    503             if result is not None:
    504                 return result

~/miniconda3/envs/cookbook/lib/python3.8/urllib/request.py in http_error_default(self, req, fp, code, msg, hdrs)
    647 class HTTPDefaultErrorHandler(BaseHandler):
    648     def http_error_default(self, req, fp, code, msg, hdrs):
--> 649         raise HTTPError(req.full_url, code, msg, hdrs, fp)
    650 
    651 class HTTPRedirectHandler(BaseHandler):

HTTPError: HTTP Error 404: Not Found

Cannot load earthquakes dataset

I couldn't load the earthquakes dataset, so I tried manually downloading the JSON file and reading it with pandas; the same error occurred.

(screenshot attached)

Published build artifacts have the wrong version (2.5.0 instead of 2.5.1)

Looks like 2.5.1 #390 includes the wrong version 2.5.0 in the published artifacts. Search for “version” here:

https://cdn.jsdelivr.net/npm/vega-datasets@2.5.1/build/vega-datasets.module.js

This results in broken URLs that point to 2.5.0, which is missing data. For example, this:

world = (await import('vega-datasets@2.5.1')).default['world-110m.json'].url

yields this broken URL:

https://cdn.jsdelivr.net/npm/vega-datasets@2.5.0/data/world-110m.json

7 datasets that cannot be loaded

Description
Hello, Vega team!

I hope you are doing well. I came across an issue while exploring datasets in the Vega dataset repository. Specifically, I found that 7 datasets in the following directory cannot be loaded using the pd.read_json(url) method:

https://github.com/vega/vega-datasets/tree/main/data

I would greatly appreciate it if you could take a look at this issue and provide a possible solution. If you need any additional information from me, please let me know.

Thank you for your time and attention!

Steps to Reproduce

  1. Go to https://github.com/vega/vega-datasets/tree/main/data
  2. Select any of the following 7 datasets:
     • https://raw.githubusercontent.com/vega/vega-datasets/main/data/annual-precip.json
     • https://raw.githubusercontent.com/vega/vega-datasets/main/data/earthquakes.json
     • https://raw.githubusercontent.com/vega/vega-datasets/main/data/londonBoroughs.json
     • https://raw.githubusercontent.com/vega/vega-datasets/main/data/londonTubeLines.json
     • https://raw.githubusercontent.com/vega/vega-datasets/main/data/miserables.json
     • https://raw.githubusercontent.com/vega/vega-datasets/main/data/us-10m.json
     • https://raw.githubusercontent.com/vega/vega-datasets/main/data/world-110m.json
  3. Load the dataset using the pd.read_json(url) method.
  4. Observe that the dataset cannot be loaded.

Expected behavior
The datasets in the aforementioned directory should load with the pd.read_json(url) method.

Actual behavior
7 of the datasets in the directory cannot be loaded with the pd.read_json(url) method.

Additional Information
Operating System: Windows
Python version: 3.10.9
Pandas version: 1.5.2

List of errors received:

  • ValueError: All arrays must be of the same length (5 datasets)
  • ValueError: Mixing dicts with non-Series may lead to ambiguous ordering. (2 datasets)
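
For what it's worth, these seven files appear to be non-tabular JSON — several are TopoJSON or GeoJSON documents, and miserables.json is a node-link graph — so a record-oriented parser like pd.read_json rejects them even though they are valid JSON. A minimal check (Node 18+ sketch, using two of the URLs above):

// These files parse as generic JSON; they just aren't flat tables of records.
const base = 'https://raw.githubusercontent.com/vega/vega-datasets/main/data/';

const miserables = await (await fetch(base + 'miserables.json')).json();
console.log(Object.keys(miserables)); // expected: ['nodes', 'links'] — a graph

const world = await (await fetch(base + 'world-110m.json')).json();
console.log(world.type); // expected: 'Topology' — TopoJSON, not an array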

Are you open to including more datasets?

I work in data journalism, and I think it would be cool to include some simple but famous datasets from our profession in your examples. If I submitted some, would you be open to considering them?

An in-range update of rollup is breaking the build 🚨

The devDependency rollup was updated from 1.18.0 to 1.19.0.

🚨 View failing branch.

This version is covered by your current version range and after updating it in your project the build failed.

rollup is a devDependency of this project. It might not break your production code or affect downstream projects, but probably breaks your build or test tools, which may prevent deploying or publishing.

Status Details
  • continuous-integration/travis-ci/push: The Travis CI build failed (Details).
  • Travis CI - Branch: The build failed.

Commits

The new version differs by 6 commits.

  • 9af119d 1.19.0
  • b3f361c Update changelog
  • 456f4d2 Avoid variable from empty module name be empty (#3026)
  • 17eaa43 Use id of last module in chunk as name base for auto-generated chunks (#3025)
  • 871bfa0 Switch to a code-splitting build and update dependencies (#3020)
  • 2443783 Unified file emission api (#2999)

See the full diff

FAQ and help

There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.


Your Greenkeeper Bot 🌴

CSV parser treats header rows as data

The CSV loader in data.ts uses the following call:
return d3.csvParseRows(await result.text(), (d3 as any).autoType)

However, this method does not treat the header row as field names; that is the intended, documented behavior of csvParseRows (see https://github.com/d3/d3-dsv#dsv_parseRows).

We need to update vega-datasets to return a properly parsed CSV that uses the header row to determine object field names.
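
A likely one-line fix (a sketch, untested) is to switch to csvParse, which consumes the header row:

// csvParse (unlike csvParseRows) reads the first row as field names and
// returns objects keyed by column name.
return d3.csvParse(await result.text(), (d3 as any).autoType)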

Add OHLC Data

I think the addition of an "Open High Low Close" dataset would be useful.

The VL example Candlestick Chart hard-codes data from an earlier Protovis example, found here.

The dataset contains the performance of the Chicago Board Options Exchange Volatility Index (VIX) in the summer of 2009.

I think including this dataset would be especially useful for people in finance.

Possible names: vix.json, vix-ohlc.json or just ohlc.json?

Let me know what you think and I can put together a PR.

build/vega-datasets.min.js is now an iife, breaking require

I noticed that #388 changed build/vega-datasets.min.js to be an IIFE:

{
  file: "build/vega-datasets.min.js",
  format: "iife",
  sourcemap: true,
  name: "vegaDatasets",
  plugins: [terser()],
},

Whereas build/vega-datasets.js is still a UMD:

{
  file: "build/vega-datasets.js",
  format: "umd",
  sourcemap: true,
  name: "vegaDatasets",
},

The problem is that require("vega-datasets") on Observable will use your unpkg/jsdelivr entry point, which points to the IIFE and thus errors:

"unpkg": "build/vega-datasets.min.js",
"jsdelivr": "build/vega-datasets.min.js",

You can see it breaking here:

https://observablehq.com/@vega/vega-lite-input-binding

If you want to drop UMD support, we could fix that notebook (and presumably others) by using ES import instead of require:

world = (await import('vega-datasets')).default['world-110m.json'].url

But if you are supporting IIFE, maybe it’s worth continuing to support UMD for backwards compatibility?
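
If UMD is kept, one option (a sketch against the config shown above) is to minify the UMD output instead of emitting an IIFE:

// Emit the minified bundle as UMD so require() and <script> both keep working.
{
  file: "build/vega-datasets.min.js",
  format: "umd",
  sourcemap: true,
  name: "vegaDatasets",
  plugins: [terser()],
},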

Urls should point to stable source

Right now, the generated URLs point to GitHub, which may change as we change the content of the files in master. Instead, we should generate URLs to jsdelivr and include the version number. This way we never change data without the user knowing.
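
Something like this sketch (assuming the build can read the version from package.json; the exact URL template is up to us):

// Generate versioned jsdelivr URLs instead of links into the master branch.
import pkg from './package.json';

const url = (name) =>
  `https://cdn.jsdelivr.net/npm/vega-datasets@${pkg.version}/data/${name}`;

url('cars.json');
// -> https://cdn.jsdelivr.net/npm/vega-datasets@<version>/data/cars.json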

What is the origin of the barley dataset?

As I continue to refine and expand the Altair example gallery, the barley dataset has become our standby for stacked bar charts.

It would be nice to fill out its sources entry in the same way we did for the wheat dataset. Can someone here verify its origin?

Provide a js module

The idea would be that someone can just load vega-datasets and access the datasets (probably with async loading) or their URLs directly.
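
A sketch of what usage could look like (names and shape hypothetical):

// Hypothetical API: the module maps dataset filenames to stable URLs and
// async loaders.
import data from 'vega-datasets';

console.log(data['cars.json'].url);     // stable URL for the raw file
const cars = await data['cars.json'](); // fetches and parses the dataset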

What is the source of wheat.json?

I'm finding it useful for creating simple bar chart examples in Altair. I'm interested to learn more about where the data comes from.

cars.json contains invalid data

It appears that some invalid data has crept into cars.json. Presumably, this is to highlight Voyager functionality, but it's had the knock-on effect of producing erroneous output with the Vega examples:

(screenshot: Cars Parallel Coordinates example)

Perhaps cars.json should contain the clean values, and a secondary, duplicate data file could introduce the error? Note: I believe this error was also introduced into the Vega test cases, so we'll want to update those too.
