Git Product home page Git Product logo

dumptruck's People

Contributors

aidanhs avatar drj11 avatar frabcus avatar matthewhughes avatar morty avatar stevenmaude avatar tlevine avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

dumptruck's Issues

vars_table isn't created by default

from dumptruck import DumpTruck

dt = DumpTruck(dbname="test.db")
dt.get_var("test")

sqlite3.OperationalError: no such table: _dumptruckvars

Save sets in dumptruck

I want to be able to save sets as a variable in dumptruck, like I can lists and dicts.

Tests are failing

======================================================================
ERROR: test_list (__main__.TestSaveGetPickle)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./tests.py", line 257, in test_list
    self.savegetvar(Pickle({30: None}))
  File "./tests.py", line 247, in savegetvar
    h.save_var(u'weird', var)
  File "/home/paul/scraperwiki/repo/dumptruck/dumptruck/dumptruck.py", line 344, in save_var
    self.execute(u'INSERT INTO %s (`value`) VALUES (?)' % tmp, [value], commit = False)
  File "/home/paul/scraperwiki/repo/dumptruck/dumptruck/dumptruck.py", line 138, in execute
    raise self.sqlite3.InterfaceError(unicode(msg) + '\nTry converting types or pickling.')
InterfaceError: Error binding parameter 0 - probably unsupported type.
Try converting types or pickling.

======================================================================
ERROR: test_zeroes (__main__.TestSaveLong1)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./tests.py", line 468, in test_zeroes
    self.save_and_select(100000000000000000000000000000000)
  File "./tests.py", line 423, in save_and_select
    dt.insert({'foo': d})
  File "/home/paul/scraperwiki/repo/dumptruck/dumptruck/dumptruck.py", line 284, in insert
    self.execute(sql, values, commit=False)
  File "/home/paul/scraperwiki/repo/dumptruck/dumptruck/dumptruck.py", line 136, in execute
    self.cursor.execute(sql, *args)
OverflowError: long too big to convert

======================================================================
ERROR: test_select (__main__.TestSelect)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./tests.py", line 337, in test_select
    shutil.copy(u'fixtures/landbank_branches.sqlite', u'.')
  File "/usr/lib/python2.7/shutil.py", line 117, in copy
    copyfile(src, dst)
  File "/usr/lib/python2.7/shutil.py", line 82, in copyfile
    with open(src, 'rb') as fsrc:
IOError: [Errno 2] No such file or directory: u'fixtures/landbank_branches.sqlite'

======================================================================
ERROR: test_show_tables (__main__.TestShowTables)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./tests.py", line 346, in test_show_tables
    shutil.copy('fixtures/landbank_branches.sqlite','/tmp/test.db')
  File "/usr/lib/python2.7/shutil.py", line 117, in copy
    copyfile(src, dst)
  File "/usr/lib/python2.7/shutil.py", line 82, in copyfile
    with open(src, 'rb') as fsrc:
IOError: [Errno 2] No such file or directory: 'fixtures/landbank_branches.sqlite'

======================================================================
FAIL: test_adapt_list (__main__.TestAdaptersAndConverters)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./tests.py", line 98, in test_adapt_list
    self.assertListEqual(dt.dump('foo'), [OrderedDict([('bar', [3,5])])])
AssertionError: Lists differ: [OrderedDict([(u'bar', u'[3,5]... != [OrderedDict([('bar', [3, 5])]...

First differing element 0:
OrderedDict([(u'bar', u'[3,5]')])
OrderedDict([('bar', [3, 5])])

- [OrderedDict([(u'bar', u'[3,5]')])]
?                -       --     -

+ [OrderedDict([('bar', [3, 5])])]
?                          +


======================================================================
FAIL: test_few_converters (__main__.TestNoAdaptersAndConverters)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./tests.py", line 114, in test_few_converters
    self.assertLess(len(dt.sqlite3.converters), 5)
AssertionError: 7 not less than 5

======================================================================
FAIL: test_lambdas (__main__.TestNoAdaptersAndConverters)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./tests.py", line 119, in test_lambdas
    self.assertEqual(f.__name__, '<lambda>')
AssertionError: 'convert_jsonset' != '<lambda>'

======================================================================
FAIL: test_special_type_list (__main__.TestNoAdaptersAndConverters)
Adapters and converters should not be enabled.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./tests.py", line 128, in test_special_type_list
    self.assertListEqual(observedData, [{u"week": u"[12,3,4]"}])
AssertionError: Lists differ: [OrderedDict([(u'week', [12, 3... != [{u'week': u'[12,3,4]'}]

First differing element 0:
OrderedDict([(u'week', [12, 3, 4])])
{u'week': u'[12,3,4]'}

- [OrderedDict([(u'week', [12, 3, 4])])]
+ [{u'week': u'[12,3,4]'}]

----------------------------------------------------------------------
Ran 80 tests in 16.240s

FAILED (failures=4, errors=4)

What's the policy on Unicode keys?

If I use UTF-8 characters as a key:

https://gist.github.com/2489849

Then I get this error:

Traceback (most recent call last):
File "./test.py", line 7, in
dt.insert({"name": "Super Digger", "payload": 10, "่‹ฑๅ›ฝ": "yes"})
File "/Library/Python/2.7/site-packages/dumptruck/dumptruck.py", line 199, in insert
self.create_table(data, table_name)
File "/Library/Python/2.7/site-packages/dumptruck/dumptruck.py", line 186, in create_table
);""" % (quote(table_name), quote(k), get_column_type(startdata[k])), commit = False)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe8 in position 47: ordinal not in range(128)

It should probably just work. Failing that, it should give a more useful error.

Likewise, other unicode everywhere.

Does not support column names with square brackets

Discovered by @IanHopkinson over here: scraperwiki/datatables-view-tool#48

Given a database created with the following SQL commands:

CREATE TABLE data("hello[1]", "hello[^]", "goodbye");
INSERT INTO data VALUES ("first", "london", "berlin");
INSERT INTO data VALUES ("second", null, null);

Querying the database directly with the scraperwiki-python library:

import scraperwiki
print scraperwiki.sql.select('* from data')

Returns the incorrect column names:

[OrderedDict([
  (u'hello', u'london'), 
  (u'goodbye', u'berlin')
]), OrderedDict([
  (u'hello', None),
  (u'goodbye', None)
])]

Balks at values that are decimal.Decimal

  File "./main.py", line 381, in <module>                                   
    main()                                                                  
  File "./main.py", line 70, in main                                        
    b.push(row)                                                             
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__             
    self.gen.next()                                                         
  File "/home/sm/Dev/itn_ocado/src/dshelpers/dshelpers.py", line 42, in batch_processor
    processor.flush()                                                       
  File "/home/sm/Dev/itn_ocado/src/dshelpers/dshelpers.py", line 65, in flush
    self.callback(self.queue)                                               
  File "./main.py", line 327, in save_rows                                  
    data=rows)                                                              
  File "/home/sm/Dev/itn_ocado/local/lib/python2.7/site-packages/scraperwiki/sqlite.py", line 33, in save
    dt.create_table(data, table_name = table_name, error_if_exists = False) 
  File "/home/sm/Dev/itn_ocado/local/lib/python2.7/site-packages/dumptruck/dumptruck.py", line 232, in create_table
    self.__check_and_add_columns(table_name, row)                           
  File "/home/sm/Dev/itn_ocado/local/lib/python2.7/site-packages/dumptruck/dumptruck.py", line 179, in __check_and_add_columns
    column_type = get_column_type(value)                                    
  File "/home/sm/Dev/itn_ocado/local/lib/python2.7/site-packages/dumptruck/dumptruck.py", line 64, in get_column_type
    return u'pickle text' if isinstance(obj, Pickle) else PYTHON_SQLITE_TYPE_MAP[type(obj)]
KeyError: <class 'decimal.Decimal'> 

Changed to string; worked fine.

create_table doesn't create columns with a null value

h = DumpTruck(dbname = '/tmp/test.db')
h.create_table({'foo': 0, 'bar': 1, 'baz': 2, 'jam': None}, 'zombies')

The column 'jam' does not get created, which is silly. This is used by scraperwiki-python (scraperwiki.sqlite.save) which is bizarre behaviour.

The responsible file is convert.py:

# Delete nones
for key, value in row.items():
  if value == None:
    del(row[key])

Because this convert function is used by other dumptruck calls, I'm not currently sure of the impact of removing this section.

Specifying column order without SQL when inserting

When I am concerned about column order, I want to be able to specify the column order by passing a collection.OrderedDict to DumpTruck.create_table or DumpTruck.insert.

I can do this with a CREATE TABLE statement, but I'd like to be able to do it in Python.

Doesn't work with timezones in datetimes

print dt.execute('select * from twitter limit 1')
Traceback (most recent call last):
File "", line 1, in
File "/usr/local/lib/python2.7/dist-packages/dumptruck/dumptruck.py", line 112, in execute
self.cursor.execute(sql, *args)
File "/usr/local/lib/python2.7/dist-packages/dumptruck/adapters_and_converters.py", line 93, in convert_datetime
hours, minutes, seconds = map(int, timepart_full[0].split(':'))
ValueError: invalid literal for int() with base 10: '42+00'

This is a column with a value like '2012-09-22 07:49:42+00:00'

DumpTruck blasphemously uses dicti

People whose religion is Python, need dicti to be removed from DumpTruck because dicti is not Pythonic and is thus against their religion.

The SQL case-insensitivity can instead be handled on insert by checking that there aren't duplicates when case is ignored.

Can't pass lxml.html strings to DB - related to scraperwiki_local

Scraperwiki classic lets you say:

data['link'] = root.xpath("//a/@href")[0]

and then save data directly to the database.

This doesn't work in dumptruck / scraperwiki_local because it's not a string, it's a lxml.etree._ElementStringResult

Traceback (most recent call last): File "insolv.py", line 52, in <module> last = doindex(letter=letter, page = i) File "insolv.py", line 44, in doindex scraperwiki.sqlite.save(table_name = 'list', data = builder, unique_keys=['link']) File "/usr/local/lib/python2.7/dist-packages/scraperwiki/sqlite.py", line 27, in save dt.create_table(data, table_name = table_name, error_if_exists = False) File "/usr/local/lib/python2.7/dist-packages/dumptruck/dumptruck.py", line 195, in create_table );''' % (if_not_exists, quote(table_name), quote(k), get_column_type(startdata[k])) File "/usr/local/lib/python2.7/dist-packages/dumptruck/dumptruck.py", line 51, in get_column_type return u'pickle text' if isinstance(obj, Pickle) else PYTHON_SQLITE_TYPE_MAP[type(obj)] KeyError: <class 'lxml.etree._ElementStringResult'>

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.