scraperwiki / dumptruck Goto Github PK
View Code? Open in Web Editor NEWPainlessly move data in and out of a SQLite database.
Home Page: http://sensiblecode.io
License: BSD 2-Clause "Simplified" License
Painlessly move data in and out of a SQLite database.
Home Page: http://sensiblecode.io
License: BSD 2-Clause "Simplified" License
The schema is something like this
CREATE TABLE debt (
[City] TEXT NOT NULL,
);
And here's the problem.
In [7]: dt.execute("select City from debt where City = ''")
Out[7]:
[OrderedDict([(u'City', None)]),
OrderedDict([(u'City', None)]),
OrderedDict([(u'City', None)])]
from dumptruck import DumpTruck
dt = DumpTruck(dbname="test.db")
dt.get_var("test")
sqlite3.OperationalError: no such table: _dumptruckvars
I want to be able to save sets as a variable in dumptruck, like I can lists and dicts.
======================================================================
ERROR: test_list (__main__.TestSaveGetPickle)
----------------------------------------------------------------------
Traceback (most recent call last):
File "./tests.py", line 257, in test_list
self.savegetvar(Pickle({30: None}))
File "./tests.py", line 247, in savegetvar
h.save_var(u'weird', var)
File "/home/paul/scraperwiki/repo/dumptruck/dumptruck/dumptruck.py", line 344, in save_var
self.execute(u'INSERT INTO %s (`value`) VALUES (?)' % tmp, [value], commit = False)
File "/home/paul/scraperwiki/repo/dumptruck/dumptruck/dumptruck.py", line 138, in execute
raise self.sqlite3.InterfaceError(unicode(msg) + '\nTry converting types or pickling.')
InterfaceError: Error binding parameter 0 - probably unsupported type.
Try converting types or pickling.
======================================================================
ERROR: test_zeroes (__main__.TestSaveLong1)
----------------------------------------------------------------------
Traceback (most recent call last):
File "./tests.py", line 468, in test_zeroes
self.save_and_select(100000000000000000000000000000000)
File "./tests.py", line 423, in save_and_select
dt.insert({'foo': d})
File "/home/paul/scraperwiki/repo/dumptruck/dumptruck/dumptruck.py", line 284, in insert
self.execute(sql, values, commit=False)
File "/home/paul/scraperwiki/repo/dumptruck/dumptruck/dumptruck.py", line 136, in execute
self.cursor.execute(sql, *args)
OverflowError: long too big to convert
======================================================================
ERROR: test_select (__main__.TestSelect)
----------------------------------------------------------------------
Traceback (most recent call last):
File "./tests.py", line 337, in test_select
shutil.copy(u'fixtures/landbank_branches.sqlite', u'.')
File "/usr/lib/python2.7/shutil.py", line 117, in copy
copyfile(src, dst)
File "/usr/lib/python2.7/shutil.py", line 82, in copyfile
with open(src, 'rb') as fsrc:
IOError: [Errno 2] No such file or directory: u'fixtures/landbank_branches.sqlite'
======================================================================
ERROR: test_show_tables (__main__.TestShowTables)
----------------------------------------------------------------------
Traceback (most recent call last):
File "./tests.py", line 346, in test_show_tables
shutil.copy('fixtures/landbank_branches.sqlite','/tmp/test.db')
File "/usr/lib/python2.7/shutil.py", line 117, in copy
copyfile(src, dst)
File "/usr/lib/python2.7/shutil.py", line 82, in copyfile
with open(src, 'rb') as fsrc:
IOError: [Errno 2] No such file or directory: 'fixtures/landbank_branches.sqlite'
======================================================================
FAIL: test_adapt_list (__main__.TestAdaptersAndConverters)
----------------------------------------------------------------------
Traceback (most recent call last):
File "./tests.py", line 98, in test_adapt_list
self.assertListEqual(dt.dump('foo'), [OrderedDict([('bar', [3,5])])])
AssertionError: Lists differ: [OrderedDict([(u'bar', u'[3,5]... != [OrderedDict([('bar', [3, 5])]...
First differing element 0:
OrderedDict([(u'bar', u'[3,5]')])
OrderedDict([('bar', [3, 5])])
- [OrderedDict([(u'bar', u'[3,5]')])]
? - -- -
+ [OrderedDict([('bar', [3, 5])])]
? +
======================================================================
FAIL: test_few_converters (__main__.TestNoAdaptersAndConverters)
----------------------------------------------------------------------
Traceback (most recent call last):
File "./tests.py", line 114, in test_few_converters
self.assertLess(len(dt.sqlite3.converters), 5)
AssertionError: 7 not less than 5
======================================================================
FAIL: test_lambdas (__main__.TestNoAdaptersAndConverters)
----------------------------------------------------------------------
Traceback (most recent call last):
File "./tests.py", line 119, in test_lambdas
self.assertEqual(f.__name__, '<lambda>')
AssertionError: 'convert_jsonset' != '<lambda>'
======================================================================
FAIL: test_special_type_list (__main__.TestNoAdaptersAndConverters)
Adapters and converters should not be enabled.
----------------------------------------------------------------------
Traceback (most recent call last):
File "./tests.py", line 128, in test_special_type_list
self.assertListEqual(observedData, [{u"week": u"[12,3,4]"}])
AssertionError: Lists differ: [OrderedDict([(u'week', [12, 3... != [{u'week': u'[12,3,4]'}]
First differing element 0:
OrderedDict([(u'week', [12, 3, 4])])
{u'week': u'[12,3,4]'}
- [OrderedDict([(u'week', [12, 3, 4])])]
+ [{u'week': u'[12,3,4]'}]
----------------------------------------------------------------------
Ran 80 tests in 16.240s
FAILED (failures=4, errors=4)
If I use UTF-8 characters as a key:
https://gist.github.com/2489849
Then I get this error:
Traceback (most recent call last):
File "./test.py", line 7, in
dt.insert({"name": "Super Digger", "payload": 10, "่ฑๅฝ": "yes"})
File "/Library/Python/2.7/site-packages/dumptruck/dumptruck.py", line 199, in insert
self.create_table(data, table_name)
File "/Library/Python/2.7/site-packages/dumptruck/dumptruck.py", line 186, in create_table
);""" % (quote(table_name), quote(k), get_column_type(startdata[k])), commit = False)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe8 in position 47: ordinal not in range(128)
It should probably just work. Failing that, it should give a more useful error.
Likewise, other unicode everywhere.
Nose supports parallel testing, so it would make the tests run faster.
SQL isn't as case-sensitive as Python, and this disconnect isn't currently connected very much. For example, this doesn't work too nicely.
dt.insert({'FoO': 'baz'}, 'bar')
dt.execute('select fOo from BaR limit 1')[0]['foo']
Maybe execute could return a list of case-insensitive dictionaries instead of normal dictionaries.
Discovered by @IanHopkinson over here: scraperwiki/datatables-view-tool#48
Given a database created with the following SQL commands:
CREATE TABLE data("hello[1]", "hello[^]", "goodbye");
INSERT INTO data VALUES ("first", "london", "berlin");
INSERT INTO data VALUES ("second", null, null);
Querying the database directly with the scraperwiki-python
library:
import scraperwiki
print scraperwiki.sql.select('* from data')
Returns the incorrect column names:
[OrderedDict([
(u'hello', u'london'),
(u'goodbye', u'berlin')
]), OrderedDict([
(u'hello', None),
(u'goodbye', None)
])]
File "./main.py", line 381, in <module>
main()
File "./main.py", line 70, in main
b.push(row)
File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
self.gen.next()
File "/home/sm/Dev/itn_ocado/src/dshelpers/dshelpers.py", line 42, in batch_processor
processor.flush()
File "/home/sm/Dev/itn_ocado/src/dshelpers/dshelpers.py", line 65, in flush
self.callback(self.queue)
File "./main.py", line 327, in save_rows
data=rows)
File "/home/sm/Dev/itn_ocado/local/lib/python2.7/site-packages/scraperwiki/sqlite.py", line 33, in save
dt.create_table(data, table_name = table_name, error_if_exists = False)
File "/home/sm/Dev/itn_ocado/local/lib/python2.7/site-packages/dumptruck/dumptruck.py", line 232, in create_table
self.__check_and_add_columns(table_name, row)
File "/home/sm/Dev/itn_ocado/local/lib/python2.7/site-packages/dumptruck/dumptruck.py", line 179, in __check_and_add_columns
column_type = get_column_type(value)
File "/home/sm/Dev/itn_ocado/local/lib/python2.7/site-packages/dumptruck/dumptruck.py", line 64, in get_column_type
return u'pickle text' if isinstance(obj, Pickle) else PYTHON_SQLITE_TYPE_MAP[type(obj)]
KeyError: <class 'decimal.Decimal'>
Changed to string; worked fine.
h = DumpTruck(dbname = '/tmp/test.db')
h.create_table({'foo': 0, 'bar': 1, 'baz': 2, 'jam': None}, 'zombies')
The column 'jam' does not get created, which is silly. This is used by scraperwiki-python (scraperwiki.sqlite.save) which is bizarre behaviour.
The responsible file is convert.py:
# Delete nones
for key, value in row.items():
if value == None:
del(row[key])
Because this convert function is used by other dumptruck calls, I'm not currently sure of the impact of removing this section.
When I am concerned about column order, I want to be able to specify the column order by passing a collection.OrderedDict
to DumpTruck.create_table or DumpTruck.insert.
I can do this with a CREATE TABLE
statement, but I'd like to be able to do it in Python.
print dt.execute('select * from twitter limit 1')
Traceback (most recent call last):
File "", line 1, in
File "/usr/local/lib/python2.7/dist-packages/dumptruck/dumptruck.py", line 112, in execute
self.cursor.execute(sql, *args)
File "/usr/local/lib/python2.7/dist-packages/dumptruck/adapters_and_converters.py", line 93, in convert_datetime
hours, minutes, seconds = map(int, timepart_full[0].split(':'))
ValueError: invalid literal for int() with base 10: '42+00'
This is a column with a value like '2012-09-22 07:49:42+00:00'
People whose religion is Python, need dicti to be removed from DumpTruck because dicti is not Pythonic and is thus against their religion.
The SQL case-insensitivity can instead be handled on insert by checking that there aren't duplicates when case is ignored.
Scraperwiki classic lets you say:
data['link'] = root.xpath("//a/@href")[0]
and then save data directly to the database.
This doesn't work in dumptruck / scraperwiki_local because it's not a string, it's a lxml.etree._ElementStringResult
Traceback (most recent call last): File "insolv.py", line 52, in <module> last = doindex(letter=letter, page = i) File "insolv.py", line 44, in doindex scraperwiki.sqlite.save(table_name = 'list', data = builder, unique_keys=['link']) File "/usr/local/lib/python2.7/dist-packages/scraperwiki/sqlite.py", line 27, in save dt.create_table(data, table_name = table_name, error_if_exists = False) File "/usr/local/lib/python2.7/dist-packages/dumptruck/dumptruck.py", line 195, in create_table );''' % (if_not_exists, quote(table_name), quote(k), get_column_type(startdata[k])) File "/usr/local/lib/python2.7/dist-packages/dumptruck/dumptruck.py", line 51, in get_column_type return u'pickle text' if isinstance(obj, Pickle) else PYTHON_SQLITE_TYPE_MAP[type(obj)] KeyError: <class 'lxml.etree._ElementStringResult'>
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.