My own collection of links to various datasets
Data sets for machine learning in Python (GitHub repository)
ML-friendly Public Datasets (article on Kaggle)
Public Data (List of Data Marketplaces):
UC Irvine Machine Learning Repository:
Opendata by Socrata
Data from Publications:
KDnuggets datasets:
Network Datasets:
Links to open data shared in the comments on ResearchGate.net:
SNAP - Stanford Network Analysis Project:
- http://snap.stanford.edu/data/
- http://snap.stanford.edu/data/links.html
- http://snap.stanford.edu/data/other.html
The daily news cycle [9 files × ~2.0 GB]:
S5 - A Labeled Anomaly Detection Dataset [~16M]
GeoLife GPS Trajectories [298.66 MB]
T-Drive trajectory data sample [~140 MB]
Public Data Sets on AWS:
Wikipedia Revision History [314 million rows]:
Data for Data Scientists:
Linked Open Data by the NY Times [RDF]:
WikiData
DBpedia
Global Database of Events, Language, and Tone (GDELT)
PSLC DataShop in Pittsburgh
Labeled Faces in the Wild [images of faces]
Mobile Data Challenge (MDC) Dataset:
Machine Learning Datasets Repository:
Open data hub in Russian:
US Open Data Action Plan and Datasets:
70+ websites to get large data repositories for free:
http://www.hpi.uni-potsdam.de/naumann/projekte/repeatability/datasets.html
StatLib---Datasets Archive
100+ Interesting Data Sets for Statistics
Big Data: 20 Free Big Data Sources Everyone Should Know
Find and Use Data
CKAN, the world’s leading open-source data portal platform
InfoChimps
Repositories of datasets
Quantnet - A Database-Driven Online Repository of Scientific Information
A community-curated database of well-known people, places, and things
Answer on Quora "Where can I find large datasets open to the public?":
Enron Email Dataset (Famous Public E-mails Dataset)
GENESIS-Online Database (Statistisches Bundesamt, the German Federal Statistical Office)
Microsoft
The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents
Awesome Public Datasets (GitHub)
LETOR: Learning to Rank for Information Retrieval
Miscellaneous
Energy
- Solar Measurement Grid (Oahu, Hawaii) by NREL
- UK Domestic Appliance-Level Electricity (UK-DALE) dataset
  - UK-DALE metadata on GitHub: https://github.com/JackKelly/UK-DALE_metadata
- Open PV Dataset (Tracking the Sun, Berkeley)
- Energy efficiency Data Set from UCI
- PLAID: the Plug Load Appliance Identification Dataset
Water