datopian / datahub-qa
:package: Bugs, issues and suggestions for datahub.io
Home Page: https://datahub.io/
Hello,
I am just testing / exploring what can be achieved with the datahub.io CLI tool, and I have stumbled upon the following:
With the following minimal data package:
{
  "resources": [
    {
      "name": "test-resource",
      "path": [ "myfile1.csv", "myfile2.csv" ]
    }
  ]
}
and the following folder structure:
./
├── datapackage.json
├── myfile1.csv
└── myfile2.csv
The CLI tool provides the following:
$ data info datapackage.json
> Error! path_.replace is not a function
I get the same error even if `path` is an array with a single item (but not if a single string `path` is provided). This suggests the problem is with the array.
To be fair, I am very new to the frictionless spec, but I think that the above resource descriptor is valid... But the datahub.io tool doesn't seem to like it.
I think it is fine if datahub.io only handles a single (specifically, non-array) `url-or-path`, but it would be useful if this were documented somewhere (or otherwise handled gracefully).
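One way the array case could be handled gracefully is to normalize up front. A minimal sketch in Python (the CLI itself is JavaScript; `normalize_path` is a hypothetical helper, not part of the actual codebase):

```python
def normalize_path(path):
    """Coerce a resource `path` (string or list of strings) to a list.

    The Frictionless spec allows both forms, so code that calls string
    methods such as `.replace` on `path` should normalize first.
    """
    if isinstance(path, str):
        return [path]
    return list(path)
```

With this, `normalize_path("myfile1.csv")` and `normalize_path(["myfile1.csv", "myfile2.csv"])` can be processed by the same downstream code.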
At the moment, neither `data push file.csv` nor `data init file.csv` works when the CSV file is semicolon-separated (as is common in France, for example).
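Detecting the delimiter before parsing would address this. A sketch of the approach using Python's standard `csv.Sniffer` (an illustration only, not what the CLI actually does):

```python
import csv
import io

def read_rows(text, delimiters=",;\t"):
    # Guess the delimiter from a sample of the file, then parse with
    # the detected dialect so semicolon-separated files work too.
    dialect = csv.Sniffer().sniff(text[:1024], delimiters=delimiters)
    return list(csv.reader(io.StringIO(text), dialect))
```

For a French-style file, `read_rows("a;b;c\n1;2;3\n")` yields the same row structure as a comma-separated one.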
At the moment, we support scheduling in the following way: `every 90s`, `every 5m`, `every 2d`… The number is always an integer, the selector is one of `s/m/h/d/w` (second through week), and you can't schedule for less than 60 seconds.
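The rule above can be sketched as a small parser (a hypothetical Python helper for illustration; the real implementation lives elsewhere):

```python
import re

# Seconds per unit for the s/m/h/d/w selectors.
PERIODS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}

def parse_schedule(text):
    # "every <int><s|m|h|d|w>" -> interval in seconds, minimum 60s.
    match = re.fullmatch(r"every (\d+)([smhdw])", text)
    if not match:
        raise ValueError(f"invalid schedule: {text!r}")
    seconds = int(match.group(1)) * PERIODS[match.group(2)]
    if seconds < 60:
        raise ValueError("cannot schedule for less than 60 seconds")
    return seconds
```

For example, `parse_schedule("every 90s")` returns 90, while `every 30s` is rejected.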
From @zelima :
I see we do not support monthly and annual schedules. Also, some datasets need an annual update: if I want a dataset updated every first of January, I currently have to push it on that exact date. Maybe we should support a "starting date" at some point; it would be useful for that case. E.g., datasets that are updated monthly need to be updated exactly on the first of each month.
As discussed on gitter, the version of datapackage.json included in zip files downloaded from datahub.io lacks basic metadata like title, license, and description.
See a diff here between https://pkgstore.datahub.io/core/registry/6/datapackage.json (left hand side) and the version included in the zip file downloaded (right hand side): http://www.mergely.com/Xo9fTlfZ/
(Sorry for editing your initial message)
The issue is fixed on the assembler side: datopian/assembler#81
But now we need to redeploy all core datasets to put the metadata in place.
http://datahub.io/search and http://datahub.io/blog show me the first few datasets / blog entries, but there is no way to go to the next page to see more datasets / older blog posts.
The current config file has a data structure designed around a single account on datahub. I have just started interacting with the system as a user and I already want two accounts (a pseudo org account, and my own account).
Many CLI config files support this elegantly. I suggest taking a good look at the `gcloud` and `aws` CLIs, both for the user experience of running commands as different users and for the config file itself.
See https://datahub.io/joelgombin/ville_vitry_subventions_2017: the title of the page, which was passed by the CLI tool, has an encoding issue, whereas the preview of the CSV resource is fine.
I have remote data somewhere (a URL) and I want to create a `flow.yaml` so that it scrapes data from that URL and performs some processing steps (e.g., removes some rows). Although I know this functionality is provided by DataHub, I cannot find any guidelines. It would be very useful to have a tutorial or a simple set of steps describing what I should do.
Support inline resource data on data packages.
My question is how common a use case this is ...
It may be a little bit of a pain i suspect because i'm not sure how data package pipelines handles this ...
/cc @pwalsh
The datasets list on my profile does not link to the actual dataset(s).
The only place I can navigate to a dataset is from my event stream in the left-hand column.
Pipeline passed successfully, but there is no preview table for this dataset http://datahub.io/JohnSnowLabs/nys-mathematics-exam.
In the console log: `Uncaught (in promise) TypeError: Cannot read property 'unique' of undefined`
The `geoip2-ipv4` core dataset has not passed the pipeline.
File "/usr/local/lib/python3.6/codecs.py", line 321, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8e in position 4681: invalid start byte
On datahub: http://datahub.io/core/geoip2-ipv4
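One defensive pattern for pipelines hitting bytes like the `0x8e` above is an encoding fallback chain. A sketch (the candidate encodings here are an assumption, not what the pipeline is actually configured with):

```python
def decode_lenient(data: bytes) -> str:
    # Try UTF-8 first, then the Windows encoding that defines 0x8e,
    # and finally replace undecodable bytes rather than crashing.
    for encoding in ("utf-8", "cp1252"):
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            continue
    return data.decode("utf-8", errors="replace")
```

The byte `0x8e` from the traceback is invalid UTF-8 but decodes as `Ž` in cp1252, so the pipeline would keep going instead of dying.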
I can't simply navigate from anywhere to my profile page. Clicking my avatar does nothing (goes to https://datahub.io/#).
From https://datahub.io/core/s-and-p-500-companies please see https://datahub.io/core/s-and-p-500-companies/r/constituents.json
The JSON link returns CSV instead of JSON (see below)
Thanks
Symbol,Name,Sector
MMM,3M Company,Industrials
ABT,Abbott Laboratories,Health Care
ABBV,AbbVie,Health Care
ACN,Accenture plc,Information Technology
ATVI,Activision Blizzard,Information Technology
AYI,Acuity Brands Inc,Industrials
ADBE,Adobe Systems Inc,Information Technology
AAP,Advance Auto Parts,Consumer Discretionary
AES,AES Corp,Utilities
AET,Aetna Inc,Health Care
AMG,Affiliated Managers Group Inc,Financials
$ data info ./selected-crimes-local-authorities-2012-2015/
# sources/selected-crimes-local-authorities-2012-2015-*
Collection of data about Israeli Police events by local authorities and collection of selected crimes.
Data source: ... see more below
# RESOURCES
┌───────────────────────────┬────────┐
│ Name │ Format │
├───────────────────────────┼────────┤
│ selected_crimes_2012_2015 │ csv │
└───────────────────────────┴────────┘
$ ls -sh selected-crimes-local-authorities-2012-2015/data/
total 78M
78M selected_crimes_2012_2015.csv
E.g. if the CSV file title is `ville_vitry_Subventions_2017_comma_csv`, it gives an error: https://datahub.io/joelgombin/test/pipelines
This EPIC contains all the issues related to the `data push` command.
I can't log out of datahub.io - not in the browser, and not via a CLI command.
I cannot use data from the DataHub by following the instructions. Go to any dataset page on datahub.io and try to use the dataset according to the instructions, e.g. http://datahub.io/core/cofog#python
Relates to #26
As I can't log out of datahub.io, I tried to force it by revoking access to my GitHub account.
After revoking access, I am still logged in, and worse, I can still push data from the CLI.
There are no browser cookies that I can clear to forcibly flush my session either.
I am not sure if there is a potential security issue here, somewhere, or, if it is just breaking an implicit contract of trust to let me sign out.
There are a few links from the recline js demo page that link to the explorer.datahub.io pages.
I'm a first time user of the current CLI. After login, I want to just push something.
mkdir test
cd test
touch test.txt
data push
> Error! ENOENT: no such file or directory, open '/Users/pwalsh/test/datapackage.json'
I'm confused, because the CLI tells me the following about `data push`:
[path] Push data at `path` to the DataHub
I could be a user who does not know what `datapackage.json` is.
As someone involved in the development of our Frictionless Data software, I know we have an infer method for Data Packages, so I wonder why we are not using it.
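For illustration, the core of such an infer step is small. A toy sketch in Python (the real Frictionless libraries do much more, e.g. sampling many rows and handling dates and booleans):

```python
import csv
import io

def infer_fields(csv_text):
    # Read the header, then guess each column's type from the first
    # data row: integer, number, or string.
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    first_row = next(reader, [])

    def guess(value):
        for cast, type_name in ((int, "integer"), (float, "number")):
            try:
                cast(value)
                return type_name
            except ValueError:
                pass
        return "string"

    return [{"name": name, "type": guess(value)}
            for name, value in zip(header, first_row)]
```

Running an infer step like this on push would let the CLI synthesize a starter `datapackage.json` instead of erroring out when none exists.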
May I suggest we use `~/.config` and namespace under there: `~/.config/datahub.io/config.json`. This is a convention, and many CLIs I use follow it (Digital Ocean, Heroku). In general I'm -1 on `~/.datahub.json`.
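The convention is cheap to implement. A sketch of the lookup in Python for brevity (`config_path` is a hypothetical helper; it also honours `XDG_CONFIG_HOME`, as XDG-following tools do):

```python
import os

def config_path():
    # ~/.config/datahub.io/config.json by default, overridable via
    # the XDG_CONFIG_HOME environment variable.
    base = os.environ.get("XDG_CONFIG_HOME") or os.path.expanduser("~/.config")
    return os.path.join(base, "datahub.io", "config.json")
```

Namespacing under a directory also leaves room for multiple profiles later (one file per account), which the flat `~/.datahub.json` layout does not.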
As a user reading the docs, I need the ability to give feedback about what I'm reading immediately, without needing to leave the page, so I can send my ideas & proposals quickly and stay focused on my tasks.
This example may help: https://djbook.ru/rel1.9/intro/tutorial01.html — the user can leave a comment just by clicking in the margin to the left of the text.
I wonder if there is a UTF-8 issue in CSV file output.
See the Åland Islands item on:
https://datahub.io/core/country-list#data
It is fine on that page, but here it looks weird:
https://pkgstore.datahub.io/core/country-list:data_csv/data/data_csv.csv
My guess is that the actual file is correct, but that we may not be setting UTF-8 as the encoding when serving the file from pkgstore. This may be a non-issue, btw (I'm not sure anyone cares about looking at the version on pkgstore).
Currently, `data get` acquires the HTML page for a dataset.
Expected behaviour: `data get` should get the datapackage.zip of the dataset.
Steps to reproduce this error:
Alis-MBP-3:~ alinaqvi$ data get http://datahub.io/core/co2-ppm
Time elapsed: 0.64 s
Dataset/file is saved in "co2-ppm."
Alis-MBP-3:~ alinaqvi$ data --version
0.6.3
Alis-MBP-3:~ alinaqvi$ file co2-ppm.
co2-ppm.: HTML document text, UTF-8 Unicode text, with very long lines
Alis-MBP-3:~ alinaqvi$ mv co2-ppm. co2-ppm.html
Alis-MBP-3:~ alinaqvi$ open co2-ppm.html
which shows:
https://www.dropbox.com/s/abu72lhip7yyvln/Screenshot%202018-01-16%2017.22.28.png?dl=0
So `data get` only obtains the HTML page of the dataset.
As an inexperienced user of the DataHub, I want to know how to handle and/or prepare datasets, so I can make my data clean and clear.
As Rufus, I want to push data packages to e.g. GitHub and have a webhook that auto-triggers an import to the DataHub, so that my DataHub dataset is up to date (example: https://github.com/datasets/registry).
What I want is a webhook and GitHub support for that...
I have this on disk:
test/test.txt
test/datapackage.json
test.txt is an empty file.
datapackage.json is:
{
  "name": "stuff",
  "resources": [
    { "name": "stuff-resource", "data": [1,2,3] }
  ]
}
This is a valid Data Package. It passes `data validate`.
I run `data push` and get:
pwalsh:test pwalsh$ data push
> Error! [object Promise]
I have source data on disk with my Data Package. I refer to it with `sources`.
As an inexperienced user of the DataHub, I want to know how to handle and/or prepare datasets, so I can make my data clean and clear.
Table:
Feature name | supported by automation pipelines
When trying to fetch https://datahub.io/core/country-codes/r/country-codes.json
I get this error:
Cannot GET /core/country-codes/r/data/json/data/country-codes.json
I think this message started popping up today.
Many potential users come from a machine learning context and may be interested in sample machine learning datasets so let's get some up on the DataHub.
See also openml/OpenML#482
Currently datahub.io supports two flavours of view spec: a simple spec, and Vega (v2.x).
The simple spec is suggested to cover the 80/20 usecase, but with real-world data, I've found it not to be as useful for most cases. The data almost always needs some form of aggregation, transformation or graph specific tweaks like human readable labels in place of data keys.
On the other hand, Vega is very powerful, but the spec is harder to write by hand. There are few tools available to help iterate while developing a spec document and resulting visualisation; iterating by making changes and pushing to datahub.io isn't efficient.
Middle ground support for Vega-lite would be very desirable to help bridge the gap between the too-simple 'simple' spec, and the much more powerful, but complex Vega spec.
The problem of working with Vega is further compounded by datahub.io using v2.x, rather than the more recent v3. Online documentation and tools are centered around the newer version. For example, there is an online editor provided by Vega, that will take Vega-Lite, and 'compile' it to Vega, but this transpilation isn't supported by the Vega 2 compatible version of the editor (https://vega.github.io/vega-editor/).
Files in the pkg store are supposed to have CORS support turned on so that cross origin http requests work from javascript. However, it looks like this is not working atm which is breaking this site for example: http://rufuspollock.github.io/imf-weo/
I have an invalid `datapackage.json`. I try to use `data push`.
I get the message `> Error! Unexpected end of JSON input`.
I happen to know as a developer that this error is raised from the method that validates the descriptor. Even without fixing the messages that get thrown by our use of JSON Schema validators on our descriptors, the user experience could be greatly improved by showing the user the context of this error.
Example:
> Running Data Package Validation Step on datapackage.json
> Error! Unexpected end of JSON input
Then I would at least know where the error comes from.
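The suggestion amounts to announcing the step before parsing. A sketch in Python for brevity (the CLI is JavaScript; `load_descriptor` is a hypothetical helper):

```python
import json

def load_descriptor(path):
    # Announce the validation step first, so any JSON error that
    # follows is anchored to the file and step that caused it.
    print(f"> Running Data Package Validation Step on {path}")
    try:
        with open(path) as handle:
            return json.load(handle)
    except json.JSONDecodeError as error:
        raise SystemExit(f"> Error! {path} is not valid JSON: {error}")
```

Even without improving the underlying JSON Schema messages, the user now knows the error came from parsing their descriptor, not from some network or push step.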
At the moment, for "simple" view spec we support line and vertical bar charts. Although these two types are popular and cover most of use-cases, it would be useful to have horizontal bar charts. E.g., consider this example: https://datahub.io/core/gini-index. In the second graph, we have too many countries in x axis so it is not possible to show names for all of them as we have limited width. In such situations, horizontal bar charts could be useful as we're not limited in height.
Currently, if you try to push an Excel file and specify a non-existent sheet, it will not fail and will push data from the first sheet of the Excel file.
Another problem is that even if you select an existing sheet, it pushes the first one, which means we're not able to push any sheet other than the first.
Try to push an Excel file with two sheets named `sheet1` and `sheet2`. Each of the following commands will push the data from the first sheet:
data push myexcel.xlsx --sheets=3
data push myexcel.xlsx --sheets=2
data push myexcel.xlsx --sheets=1,2
data push myexcel.xlsx --sheets=all
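A sketch of how the `--sheets` value could be resolved and validated against the workbook's sheet names (in Python; `parse_sheets` is a hypothetical helper, not the CLI's actual code):

```python
def parse_sheets(option, sheet_names):
    # "all" -> every sheet; otherwise a comma-separated list of
    # 1-based indices. Unknown indices fail loudly instead of
    # silently falling back to the first sheet.
    if option == "all":
        return list(sheet_names)
    indices = [int(part) for part in option.split(",")]
    for index in indices:
        if not 1 <= index <= len(sheet_names):
            raise ValueError(
                f"sheet {index} does not exist "
                f"(workbook has {len(sheet_names)} sheets)")
    return [sheet_names[index - 1] for index in indices]
```

With this, `--sheets=3` on a two-sheet workbook would be an error, and `--sheets=2` would actually select `sheet2`.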
data-cli
The following datasets can't be downloaded:
Airports
https://datahub.io/dataset/global-airports-in-rdf
http://rv1460.1blu.de/datasets/global-airports/global-airports.ttl
SIDER
https://datahub.io/de/dataset/fu-berlin-sider
http://wifo5-03.informatik.uni-mannheim.de/sider/sider_dump.nt.bz2
US security
https://datahub.io/de/dataset/sec-rdfabout
Added a new dataset using:
data push https://raw.githubusercontent.com/okfn/licenses/master/licenses.csv --schedule="every 1d"
It returned: 🙌 your data is published!
Went to the link https://datahub.io/Stephen-Gates/licenses-black-rattlesnake-15 and got: Sorry, this dataset was not found.
This EPIC contains all the issues related to the `data get` command:
`data get` only acquires the HTML page for the dataset itself (#43)

npm install datahub-cli
data --version
data info
> Error! ENOENT: no such file or directory, open '/Users/pwalsh/datapackage.json'
When following the instructions at https://datahub.io/docs/getting-started/installing-data I consciously decided to run `data info` instead of `data info https://datahub.io/core/finance-vix`.
As a user of many CLI tools, I expect to be able to run a command without arguments and get some type of context-driven help, or, at least an error for the missing argument.
The error I do receive is related to a missing configuration step, I suspect, and this is confusing to me, especially because the instructions on this page make no mention of configuration.
I then went to http://datahub.io/docs/features/data-cli, which is linked from the above page, and still do not have any idea how to configure the CLI. `data help` works, but has none of the info I am looking for.
I then ran `data push`, which tells me to log in. This was successful (according to the CLI messaging), but then I ran `data info` again and got the same original error.
Using Ubuntu 16.04.3 LTS (xenial)
Anuar Ustayev @anuveyatsu 08:56
@ppKrauss @rufuspollock this is explained here http://datahub.io/docs/getting-started/installing-data#installing-binaries the problem is with xdg-open library on Linux
Peter @ppKrauss 09:00
Suggestion: change page http://datahub.io/docs/getting-started/installing-data#installing-binaries to link http://datahub.io/docs/getting-started/installing-data#installing-binaries
Hi @anuveyatsu, I did the cp /usr/bin/xdg-open /usr/local/bin/xdg-open; perhaps I need to reboot. For now it has no effect; the login stops at the prompt "? Login with...
❯ Github"
Anuar Ustayev @anuveyatsu 09:04
@ppKrauss so after hitting enter, it doesn’t open your default browser?
Peter @ppKrauss 09:04
Thanks @rufuspollock , I will report at there
@anuveyatsu, after waiting and pressing ENTER ... "> Opening browser and waiting for you to authenticate online"
Error! spawn /home/user/Downloads/working/DATAHUB/xdg-open ENOENT
I tried to run the R code on datahub.io/JohnSnowLabs/community-emergency-response-teams and received a warning and an error.
library("jsonlite")
json_file <- "http://datahub.io/JohnSnowLabs/community-emergency-response-teams/datapackage.json"
json_data <- fromJSON(paste(readLines(json_file), collapse = ""))
#> Warning in readLines(json_file): incomplete final line found on
#> 'http://datahub.io/JohnSnowLabs/community-emergency-response-teams/
#> datapackage.json'
path_to_file = json_data$resources[[1]]$path
#> Error in json_data$resources[[1]]$path: $ operator is invalid for atomic vectors
Hi, in GitHub we have an organisation with some data packages; I would like to publish data under the name of the organisation, not my username.
https://datahub.io/organisation_name/dataset
Thank you
Added by @AcckiyGerman
Users can use their GitHub organisation name to publish data under it.
E.g. @Mikanebu is a member of https://github.com/datopian so it would be great for him to be able to publish data on http://datahub.io/datapian
Using GitHub OAuth scopes (https://developer.github.com/apps/building-oauth-apps/scopes-for-oauth-apps/) we could probably read the list of organisations where the user is a member, and use it when pushing data (or when creating the datahub user?)
Hi Open Data friends,
The Datahub API has been broken for at least 12 days because of a bug in the way HTTP redirects are performed. I posted this issue on the OKFN forum 12 days ago (link), but that issue was closed and I was asked to open a new issue here. So here we go...
The Datahub API uses query parameters to retrieve information, but these parameters are currently being lost because the server drops them in redirects. Here is a particular example; notice that the original request URI contains `?id=270a`, but the redirect URI no longer does:
$ curl -vL "http://datahub.io/api/action/organization_show?id=270a"
> GET /api/action/organization_show?id=270a HTTP/1.1
> Host: datahub.io
> User-Agent: curl/7.53.1
> Accept: */*
< HTTP/1.1 302 Found
< Date: Sun, 03 Sep 2017 06:25:14 GMT
< Content-Type: text/plain; charset=utf-8
< Content-Length: 73
< Connection: keep-alive
< Set-Cookie: __cfduid=d50eba1741b2be0cdef67ac675b9849e11504419913; expires=Mon, 03-Sep-18 06:25:13 GMT; path=/; domain=.datahub.io; HttpOnly
< X-Powered-By: Express
< Location: https://old.datahub.io/api/action/organization_show
< Vary: Accept
< set-cookie: connect.sid=s%3AadA_LdIs0_XUTekr2yRHpLSNhFwsAQLJ.zddMrtw53pGjwb3WzUks6%2F0WrsHlTOzxPjUA5m20vfs; Path=/; Expires=Sun, 03 Sep 2017 06:26:14 GMT; HttpOnly
< Server: cloudflare-nginx
< CF-RAY: 3986a16da09e2b9a-AMS
> GET /api/action/organization_show HTTP/2
> Host: old.datahub.io
> User-Agent: curl/7.53.1
> Accept: */*
< HTTP/2 409
< date: Sun, 03 Sep 2017 06:25:14 GMT
< content-type: application/json;charset=utf-8
< content-length: 160
< set-cookie: __cfduid=d27b5560cea8ffe0a0e91e8e93553f2f51504419914; expires=Mon, 03-Sep-18 06:25:14 GMT; path=/; domain=.datahub.io; HttpOnly
< cache-control: no-cache
< pragma: no-cache
< server: cloudflare-nginx
< cf-ray: 3986a17058780c2f-AMS
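For comparison, building a `Location` header that preserves the query string is a one-liner in most stacks. A Python sketch (illustrative only; the actual server is an Express app):

```python
from urllib.parse import urlsplit, urlunsplit

def redirect_location(request_uri, new_host):
    # Keep the original path *and* query string when redirecting to
    # another host; dropping the query is exactly the bug shown above.
    parts = urlsplit(request_uri)
    return urlunsplit(("https", new_host, parts.path, parts.query, ""))
```

Applied to the request above, the redirect would go to `https://old.datahub.io/api/action/organization_show?id=270a` instead of losing `?id=270a`.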
RFC / Feature Idea: Indication of the time period covered by a dataset
Thinking about adding metadata about which time period time-series datasets cover. What do people think, and any suggestions...?