ipazc / dhub
CLI for mldatahub, for Python 3
License: GNU General Public License v3.0
When an option is passed to the filtered iteration, it may yield fewer elements because fewer elements match the given options. When this is the case, the iterator keeps requesting pages until the page count is reached, even when no remaining elements match the options set.
The iterator should stop as soon as the backend returns no elements at all.
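The stop condition described above can be sketched as follows. This is a minimal illustration, not the actual dhub implementation; `fetch_page` is a hypothetical callable standing in for one backend page request:

```python
def filtered_iter(fetch_page, page_count):
    """Yield filtered elements page by page, stopping early on an empty page.

    fetch_page is a hypothetical callable: page index -> list of elements.
    """
    for page in range(page_count):
        elements = fetch_page(page)
        if not elements:
            # The backend returned nothing: stop here instead of polling
            # the remaining pages for elements that will never arrive.
            break
        yield from elements
```

With this in place, the iterator performs at most one wasted request (the first empty page) regardless of how large the nominal page count is.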
Saving a dataset to a folder does not take advantage of the iterator's smart cache.
Given, for example, a dataset with the URL prefix "foo/bar", the dataset should also be retrievable by
providing only the dataset prefix "bar":
>>> dataset = datasets["bar"]
By default, the prefix should be completed automatically.
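One possible lookup strategy is sketched below. `DatasetIndex` is a hypothetical class, not part of the dhub API; it resolves a bare prefix like "bar" to the full key "foo/bar" when the match is unambiguous:

```python
class DatasetIndex:
    """Hypothetical lookup resolving "bar" to "foo/bar" when unambiguous."""

    def __init__(self, datasets):
        self._datasets = datasets  # maps full url prefix -> dataset object

    def __getitem__(self, key):
        if key in self._datasets:
            return self._datasets[key]
        # Fall back to matching the last path segment of each full prefix.
        matches = [k for k in self._datasets if k.split("/")[-1] == key]
        if len(matches) == 1:
            return self._datasets[matches[0]]
        raise KeyError(key)
```

Raising `KeyError` on zero or multiple matches keeps the shorthand safe: an ambiguous prefix never silently picks the wrong dataset.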
The exception requests.exceptions.ChunkedEncodingError (or similar ones raised while requesting) should be caught, so that the request can be transparently retried.
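A transparent retry can be implemented as a decorator, sketched below with plain stdlib code. The exception types are passed in as a parameter, so in practice one would pass `(requests.exceptions.ChunkedEncodingError,)`; the names here are illustrative, not the actual dhub internals:

```python
import time


def retry_on(exceptions, attempts=3, delay=0.0):
    """Hypothetical retry wrapper: rerun the call transparently on
    transient errors, re-raising only after the last attempt fails."""
    def decorator(func):
        def wrapper(*args, **kwargs):
            for attempt in range(attempts):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == attempts - 1:
                        raise
                    time.sleep(delay)
        return wrapper
    return decorator
```

The caller never sees the transient failures; only a request that fails on every attempt propagates the exception.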
The smart updater does not merge multiple updates of the same element. Under certain circumstances, this might lead to a race condition.
Modifications to an element are not reflected in the dataset immediately.
Example:
>>> element = dataset[0]
>>> print(element.get_title())
"title1"
>>> element.set_title("title2")
>>> print(element.get_title())
"title2"
>>> print(dataset[0].get_title())
"title1"
The metadata should be optional; in its absence, it should be taken from the forked dataset.
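The fallback can be as simple as defaulting the parameter to `None` and copying from the source. `fork_dataset` and the dict layout below are hypothetical stand-ins for the real fork operation:

```python
def fork_dataset(source, metadata=None):
    """Sketch: metadata is optional; inherit the forked dataset's
    metadata when it is not provided."""
    if metadata is None:
        metadata = dict(source["metadata"])  # copy, don't alias
    return {"elements": list(source["elements"]), "metadata": metadata}
```

Copying rather than aliasing keeps later edits to the fork's metadata from mutating the original dataset.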
When iterating a dataset with filter_iter(cache_content=True), there is an unexpected delay every page_size requests. It seems that once the whole page is processed and the limit is reached, the next page is requested, blocking the application until that page is retrieved. The next page should be requested before the limit is reached, to speed up the computation.
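The prefetch described above can be sketched with a single background worker: while the consumer processes page n, page n+1 is already being fetched. `fetch_page` and `paged_iter` are hypothetical names, not the dhub API:

```python
from concurrent.futures import ThreadPoolExecutor


def paged_iter(fetch_page, num_pages):
    """Sketch: fetch page n+1 in the background while page n is consumed,
    so the consumer never blocks on a cold request between pages."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(fetch_page, 0)
        for page in range(num_pages):
            elements = future.result()  # ready (or nearly so) by now
            if page + 1 < num_pages:
                # Kick off the next request before yielding this page.
                future = pool.submit(fetch_page, page + 1)
            yield from elements
```

If processing a page takes longer than fetching one, the inter-page delay disappears entirely; otherwise it shrinks to the difference between the two.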
Even though it points to our backend by default, the backend URL should be a configurable parameter in a config file.
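A minimal version using stdlib `configparser` is sketched below. The section name `dhub`, the option `backend_url`, and the default URL are all assumptions for illustration:

```python
import configparser

# Hypothetical default; the real backend address is project-specific.
DEFAULT_BACKEND = "http://localhost:8000"


def load_backend_url(path):
    """Read the backend URL from an INI-style config file, falling back
    to the default when the file or the option is missing."""
    parser = configparser.ConfigParser()
    parser.read(path)  # silently ignores a missing file
    return parser.get("dhub", "backend_url", fallback=DEFAULT_BACKEND)
```

A config file would then just contain a `[dhub]` section with `backend_url = http://myhost:9000`.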
The sync() method is not syncing correctly. After a huge append of dataset elements, the following code:
>>> print("Syncing...")
>>> dataset.sync()
>>> print("Synced.")
gives this result almost instantly:
Syncing...
Synced.
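One way to make sync() actually block is to push appends through a work queue and have sync() join it. This is a minimal sketch with a hypothetical `SyncQueue` class, where appending to a list stands in for the real upload:

```python
import queue
import threading


class SyncQueue:
    """Sketch: appends are processed by a background worker, and sync()
    blocks until every pending element has actually been flushed."""

    def __init__(self):
        self._pending = queue.Queue()
        self.flushed = []
        worker = threading.Thread(target=self._flush_loop, daemon=True)
        worker.start()

    def append(self, element):
        self._pending.put(element)

    def _flush_loop(self):
        while True:
            element = self._pending.get()
            self.flushed.append(element)  # stand-in for the real upload
            self._pending.task_done()

    def sync(self):
        # Returns only once every queued element has been processed,
        # instead of returning immediately like the buggy behaviour above.
        self._pending.join()
```

With this structure, "Synced." can only be printed after the backlog has drained.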
For example, the following code will break during execution:
>>> dataset[200:]
requests.exceptions.ConnectionError: HTTPConnectionPool(host='', port=): Max retries exceeded with url: /datasets/ipazc-adience/fold1/elements?page=60&_tok=
(Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f2bd3875c18>: Failed to establish a new connection: [Errno 110] Connection timed out',))
After this error, the client should automatically retry.
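For timeouts like the one above, the retry is usually paired with exponential backoff so a struggling backend is not hammered. A sketch follows; `fetch` is a hypothetical zero-argument callable performing one page request, and the built-in `ConnectionError` stands in for `requests.exceptions.ConnectionError`:

```python
import time


def fetch_with_backoff(fetch, retries=5, base_delay=1.0):
    """Sketch: retry a failed request, doubling the wait each attempt."""
    for attempt in range(retries):
        try:
            return fetch()
        except ConnectionError:
            if attempt == retries - 1:
                raise  # give up only after the final attempt
            time.sleep(base_delay * 2 ** attempt)
```

The delays grow as 1s, 2s, 4s, ... with the defaults, and the original exception still surfaces if the backend stays unreachable.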
This retrieval will cause a page size exceeded error:
>>> dataset[300:1000]
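The client could split a large slice into backend-sized chunks transparently. A sketch, assuming a hypothetical `fetch_range(start, stop)` callable and a backend limit of 100 elements per request:

```python
def fetch_slice(fetch_range, start, stop, max_page_size=100):
    """Sketch: serve dataset[start:stop] by issuing several requests,
    each no larger than the backend's page-size limit."""
    elements = []
    for chunk_start in range(start, stop, max_page_size):
        chunk_stop = min(chunk_start + max_page_size, stop)
        elements.extend(fetch_range(chunk_start, chunk_stop))
    return elements
```

`dataset[300:1000]` then becomes seven 100-element requests instead of one oversized request that the backend rejects.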
When iterating over the elements' headers of a dataset rather than the data, the data is downloaded too, because of the caching system running in the background. There should be an option to disable this behaviour.
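The opt-out could be a single flag on the iterator. This sketch uses hypothetical `fetch_headers`/`fetch_data` callables and a `cache_data` flag, none of which are the real dhub names:

```python
def iter_elements(fetch_headers, fetch_data, cache_data=True):
    """Sketch: yield element headers; download content eagerly only
    when cache_data is True (the current always-on behaviour)."""
    for header in fetch_headers():
        if cache_data:
            # Eager background download, as the caching system does today.
            header["content"] = fetch_data(header["id"])
        yield header
```

With `cache_data=False`, a metadata-only scan of a large dataset no longer pays the bandwidth cost of fetching every element's binary content.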