Comments (11)
FIXED! We added new datasets from OpenML at datahub.io/machine-learning and wrote a blog post specifically about the ARFF format: https://datahub.io/blog/attribute-relation-file-format-arff
from datahub-qa.
You can for example use any data set from OpenML and extract the relevant info to create the data package.
Example in R:
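library("OpenML")
# Download the data set from OpenML by name
eyedat <- getOMLDataSet(data.name = "eeg-eye-state")
# Inspect the structure of the returned object (data plus description)
str(eyedat)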
The same can be done in Python, Java, ... (whatever you prefer, see the OpenML website).
from datahub-qa.
@HeidiSeibold great. Can you recommend a top-5 list of "getting started" datasets for us to data package?
from datahub-qa.
There is a collection called OpenML 100 with interesting (classification) data sets. It was collected as a quick way to do benchmarking with OpenML.
Maybe the first 5 of those 😉
from datahub-qa.
@HeidiSeibold thanks - if you have any tips on which ones you'd especially recommend let us know.
@Mikanebu please take a look through these datasets, see how they work, and check whether there are any that you think would be good.
From my own quick browse I found:
- https://www.openml.org/d/458, which is itself taken from this original source of 98 datasets: http://people.stern.nyu.edu/jsimonof/AnalCatData/ - http://people.stern.nyu.edu/jsimonof/AnalCatData/Data/ (however, I'm not sure how we'd find metadata for each of these)
from datahub-qa.
@rufuspollock ok, I will take a look
from datahub-qa.
Also found https://archive.ics.uci.edu/ml/datasets.html
from datahub-qa.
@HeidiSeibold we've started data packaging some machine learning datasets here http://datahub.io/machine-learning
Any feedback or suggestions for next steps would be very welcome 😄
from datahub-qa.
Cool, thanks 🎉 . So there is at least one data set that is both in your collection as well as on OpenML:
I guess a first step now would be to check:
- which fields in the datapackage correspond to which fields in the ARFF file / XML file.
- if there is any information in the ARFF file / XML file that we cannot get from the data package.
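A rough way to start on the first check is to compare the attribute names declared in an ARFF header with the field names in a data package descriptor. The sketch below is only an illustration: the function names `arff_attribute_names` and `compare_fields` are hypothetical, it looks only at names (not types or other metadata), and it assumes the descriptor has a single tabular resource with a `schema.fields` list.

```python
import re

# Matches "@attribute <name> ..." where <name> may be quoted or unquoted.
ATTRIBUTE_RE = re.compile(r'''@attribute\s+('[^']*'|"[^"]*"|\S+)''', re.IGNORECASE)


def arff_attribute_names(arff_text):
    """Return the attribute names declared in an ARFF header, in order."""
    names = []
    for line in arff_text.splitlines():
        line = line.strip()
        if line.lower().startswith("@data"):
            break  # the header ends where the data section begins
        match = ATTRIBUTE_RE.match(line)
        if match:
            names.append(match.group(1).strip("'\""))
    return names


def compare_fields(arff_text, descriptor):
    """Report names that appear on only one side (first resource only)."""
    arff_names = arff_attribute_names(arff_text)
    fields = descriptor["resources"][0]["schema"]["fields"]
    package_names = [field["name"] for field in fields]
    return {
        "only_in_arff": [n for n in arff_names if n not in package_names],
        "only_in_package": [n for n in package_names if n not in arff_names],
    }
```

An empty result on both sides only shows that the names line up; anything else in the ARFF or XML file (for example comments or nominal value lists) would still need a manual look.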
Is the R package for data packages stable yet? That would make it very easy because then we could just convert between the file formats in R, I think.
from datahub-qa.
Awesome, thanks @Branko-Dj! Do you have code for the transformation from ARFF to data package that we could share?
from datahub-qa.
@HeidiSeibold Currently we would do it by transforming the ARFF file to CSV using pre-existing scripts (https://github.com/haloboy777/arfftocsv for example) and then creating a data package for the CSV file.
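As a minimal sketch of that two-step idea (not the linked script, and not the code actually used for datahub.io), the snippet below converts a simple ARFF text to CSV and builds a matching `datapackage.json` descriptor. The function names are hypothetical. It assumes a dense ARFF file with unquoted attribute names that contain no spaces, and the type mapping (numeric/real to `number`, integer to `integer`, nominal to `string` with an `enum` constraint, everything else to `string`) is an assumption rather than an agreed convention.

```python
import csv
import io


def arff_type_to_field(name, arff_type):
    """Map one ARFF attribute declaration to a Table Schema field (assumed mapping)."""
    t = arff_type.strip()
    if t.startswith("{") and t.endswith("}"):
        # Nominal attribute: keep the allowed values as an enum constraint.
        values = [v.strip().strip("'\"") for v in t[1:-1].split(",")]
        return {"name": name, "type": "string", "constraints": {"enum": values}}
    if t.lower() == "integer":
        return {"name": name, "type": "integer"}
    if t.lower() in ("numeric", "real"):
        return {"name": name, "type": "number"}
    return {"name": name, "type": "string"}  # fallback for string, date, ...


def arff_to_data_package(arff_text, name):
    """Return (csv_text, descriptor) for a simple, dense ARFF document."""
    fields, rows, in_data = [], [], False
    for raw in arff_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and ARFF comments
        if in_data:
            rows.append(next(csv.reader([line], quotechar="'")))
        elif line.lower().startswith("@attribute"):
            _, attr_name, attr_type = line.split(None, 2)
            fields.append(arff_type_to_field(attr_name, attr_type))
        elif line.lower().startswith("@data"):
            in_data = True

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([field["name"] for field in fields])  # header row
    writer.writerows(rows)

    descriptor = {
        "name": name,
        "resources": [{
            "name": name,
            "path": name + ".csv",
            "format": "csv",
            "schema": {"fields": fields},
        }],
    }
    return out.getvalue(), descriptor
```

Writing the two return values to `<name>.csv` and `datapackage.json` (the latter with `json.dump`) would give a directory that can be treated as a data package. Sparse ARFF rows, quoted names and date formats are not handled here.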
from datahub-qa.