Comments (4)
Seems like this has fallen by the wayside, but should it ever come back to development: it would be cool if there were some elementary statistics for each of the datasets, like how many rows of data, the names of the columns, the types of those columns, etc. Basically the same collection of things that Kaggle lists for lots of the datasets on there.
from vega-datasets.
It would be great to include the license of the source files as well.
"it would be cool if there were some elementary statistics for each of the datasets"
I think there are at least 2 components that this issue could be split up into:
- Convert the SOURCES.md file into something machine readable, like a JSON file, or a folder of YAML files. We could adopt a process similar to what "awesome public datasets" ( https://github.com/awesomedata/awesome-public-datasets ) or "campusdata" did in the past: https://github.com/CampusData/campusdata.github.io/blob/master/_data/rankings.yml .
- Add metadata about each sample file in the repo. Perhaps we might keep a script around that programmatically generates this info, and stores it. This way you can do things like query for a dataset with at least 1 datetime column, or a dataset with at least 3 quantitative columns and over 3000 rows.
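As a rough illustration of the second component, here is a minimal sketch of such a script using only the Python standard library. The names `infer_type`, `profile_csv`, and `find_datasets` are hypothetical, as are the three type labels; the type inference is deliberately naive (only ISO 8601 strings count as datetimes), so treat it as a starting point rather than what the repo would actually ship.

```python
import csv
import io
from datetime import datetime


def _is_number(text):
    try:
        float(text)
        return True
    except ValueError:
        return False


def _is_datetime(text):
    # Assumption: only ISO 8601 strings are recognised in this sketch.
    try:
        datetime.fromisoformat(text)
        return True
    except ValueError:
        return False


def infer_type(values):
    """Classify a column as 'quantitative', 'datetime', or 'nominal'."""
    non_empty = [v for v in values if v.strip() != ""]
    if not non_empty:
        return "nominal"
    if all(_is_number(v) for v in non_empty):
        return "quantitative"
    if all(_is_datetime(v) for v in non_empty):
        return "datetime"
    return "nominal"


def profile_csv(name, text):
    """Return row count, column names, and inferred column types for CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    columns = reader.fieldnames or []
    return {
        "name": name,
        "rows": len(rows),
        # Short rows yield None for missing cells, so fall back to "".
        "columns": {c: infer_type([r[c] or "" for r in rows]) for c in columns},
    }


def find_datasets(profiles, min_rows=0, min_columns=None):
    """Names of profiles with enough rows and enough columns of each type."""
    min_columns = min_columns or {}
    matches = []
    for profile in profiles:
        types = list(profile["columns"].values())
        enough_columns = all(types.count(t) >= n for t, n in min_columns.items())
        if profile["rows"] >= min_rows and enough_columns:
            matches.append(profile["name"])
    return matches


# Made-up sample data, not one of the repo's files.
sample = "date,temp,city\n2020-01-01,3.5,Oslo\n2020-01-02,4.1,Bergen\n"
profiles = [profile_csv("sample.csv", sample)]
```

With profiles like these stored as JSON next to the data, the two queries from the list above would look like `find_datasets(profiles, min_columns={"datetime": 1})` and `find_datasets(profiles, min_rows=3001, min_columns={"quantitative": 3})`.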
In the meantime, there are at least 2 peer projects that can fulfill some of the data exploration use cases for the single file data requests:
- https://github.com/pandas-profiling/pandas-profiling
- https://github.com/githubocto/flat-viewer (Take any of the file URLs, and add "flat" in front, like https://flatgithub.com/vega/vega-datasets/blob/next/data/birdstrikes.csv?filename=data%2Fairports.csv&sha=05fcb7c07b1d76206856e75129fc1e79dc61735c )
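The flat-viewer URL rewrite can be scripted. This is a small sketch that only swaps the host from github.com to flatgithub.com, as in the example link; the function name `to_flat_viewer_url` is made up here, and whether the viewer renders a given file still depends on the service itself.

```python
from urllib.parse import urlsplit, urlunsplit


def to_flat_viewer_url(github_url):
    """Rewrite a github.com file URL to the matching flatgithub.com URL."""
    parts = urlsplit(github_url)
    if parts.netloc != "github.com":
        raise ValueError(f"expected a github.com URL, got host {parts.netloc!r}")
    # Only the host changes; path, query, and fragment are kept as-is.
    return urlunsplit(parts._replace(netloc="flatgithub.com"))


flat_url = to_flat_viewer_url(
    "https://github.com/vega/vega-datasets/blob/next/data/birdstrikes.csv"
)
```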
world-110m.json looks like it could be from https://www.jsdelivr.com/package/npm/world-atlas?version=1.1.4&path=world (https://github.com/topojson/world-atlas).