sqlitedict's Introduction

sqlitedict -- persistent dict, backed by SQLite

A lightweight wrapper around Python's sqlite3 database, with a simple, Pythonic dict-like interface and support for multi-thread access.

Usage

Write

>>> from sqlitedict import SqliteDict
>>> db = SqliteDict("example.sqlite")
>>>
>>> db["1"] = {"name": "first item"}
>>> db["2"] = {"name": "second item"}
>>> db["3"] = {"name": "yet another item"}
>>>
>>> # Commit to save the objects.
>>> db.commit()
>>>
>>> db["4"] = {"name": "yet another item"}
>>> # Oops, forgot to commit here, that object will never be saved.
>>> # Always remember to commit, or enable autocommit with SqliteDict("example.sqlite", autocommit=True)
>>> # Autocommit is off by default for performance.
>>>
>>> db.close()

Read

>>> from sqlitedict import SqliteDict
>>> db = SqliteDict("example.sqlite")
>>>
>>> print("There are %d items in the database" % len(db))
There are 3 items in the database
>>>
>>> # Standard dict interface: items(), values(), keys(), etc.
>>> for key, item in db.items():
...     print("%s=%s" % (key, item))
1={'name': 'first item'}
2={'name': 'second item'}
3={'name': 'yet another item'}
>>>
>>> db.close()

Efficiency

By default, sqlitedict's exception handling favors verbosity over efficiency. It extracts and outputs the outer exception stack to the error logs. If you favor efficiency, then initialize the DB with outer_stack=False.

>>> from sqlitedict import SqliteDict
>>> db = SqliteDict("example.sqlite", outer_stack=False)  # True is the default
>>> db[1]
{'name': 'first item'}

Context Manager

>>> from sqlitedict import SqliteDict
>>>
>>> # The database is automatically closed when leaving the with section.
>>> # Uncommitted objects are not saved on close. REMEMBER TO COMMIT!
>>>
>>> with SqliteDict("example.sqlite") as db:
...     print("There are %d items in the database" % len(db))
There are 3 items in the database

Tables

A database file can store multiple tables. A default table is used when no table name is specified.

Note: writes are serialized; using multiple tables does not improve performance.

>>> from sqlitedict import SqliteDict
>>>
>>> products = SqliteDict("example.sqlite", tablename="product", autocommit=True)
>>> manufacturers = SqliteDict("example.sqlite", tablename="manufacturer", autocommit=True)
>>>
>>> products["1"] = {"name": "first item",  "manufacturer_id": "1"}
>>> products["2"] = {"name": "second item", "manufacturer_id": "1"}
>>>
>>> manufacturers["1"] = {"manufacturer_name": "afactory", "location": "US"}
>>> manufacturers["2"] = {"manufacturer_name": "anotherfactory", "location": "UK"}
>>>
>>> tables = products.get_tablenames('example.sqlite')
>>> print(tables)
['unnamed', 'product', 'manufacturer']
>>>
>>> products.close()
>>> manufacturers.close()

In case you're wondering, the unnamed table comes from the previous examples, where we did not specify a table name.

Serialization

Keys are strings. Values can be any serializable object.

By default, pickle is used internally to (de)serialize the values.

It's possible to use a custom (de)serializer, notably for JSON and for compression.

>>> # Use JSON instead of pickle
>>> import json
>>> with SqliteDict("example.sqlite", encode=json.dumps, decode=json.loads) as mydict:
...     pass
>>>
>>> # Apply zlib compression after pickling
>>> import zlib, pickle, sqlite3
>>>
>>> def my_encode(obj):
...     return sqlite3.Binary(zlib.compress(pickle.dumps(obj, pickle.HIGHEST_PROTOCOL)))
>>>
>>> def my_decode(obj):
...     return pickle.loads(zlib.decompress(bytes(obj)))
>>>
>>> with SqliteDict("example.sqlite", encode=my_encode, decode=my_decode) as mydict:
...     pass

It's also possible to use a custom (de)serializer for keys to allow non-string keys.

>>> # Use key encoding instead of default string keys only
>>> from sqlitedict import encode_key, decode_key
>>> with SqliteDict("example.sqlite", encode_key=encode_key, decode_key=decode_key) as mydict:
...     pass

More

Functions are well documented; see the docstrings directly in sqlitedict.py, or call help(sqlitedict).

Beware: because of Python semantics, sqlitedict cannot know when a mutable SqliteDict-backed entry is modified in RAM. You'll need to explicitly assign the mutated object back to the SqliteDict:

>>> from sqlitedict import SqliteDict
>>> db = SqliteDict("example.sqlite")
>>> db["colors"] = {"red": (255, 0, 0)}
>>> db.commit()
>>>
>>> colors = db["colors"]
>>> colors["blue"] = (0, 0, 255) # sqlite DB not updated here!
>>> db["colors"] = colors  # now updated
>>>
>>> db.commit() # remember to commit (or set autocommit)
>>> db.close()

Features

  • Values can be any picklable object (pickle is used internally, with the highest protocol).

  • Support for multiple tables (=dicts) living in the same database file.

  • Support for access from multiple threads to the same connection (needed by e.g. Pyro). Vanilla sqlite3 gives you ProgrammingError: SQLite objects created in a thread can only be used in that same thread.

    Concurrent requests are still serialized internally, so this "multithreaded support" doesn't give you any performance benefits. It is a work-around for sqlite limitations in Python.

  • Support for custom serialization or compression:

# use JSON instead of pickle
>>> import json
>>> mydict = SqliteDict('./my_db.sqlite', encode=json.dumps, decode=json.loads)

# apply zlib compression after pickling
>>> import zlib, pickle, sqlite3
>>> def my_encode(obj):
...     return sqlite3.Binary(zlib.compress(pickle.dumps(obj, pickle.HIGHEST_PROTOCOL)))
>>> def my_decode(obj):
...     return pickle.loads(zlib.decompress(bytes(obj)))
>>> mydict = SqliteDict('./my_db.sqlite', encode=my_encode, decode=my_decode)
  • sqlite is efficient and can work effectively with large (multi-gigabyte) databases, not limited by memory.
  • sqlitedict is mostly a thin wrapper around sqlite.
  • items(), keys() and values() iterate one by one; the rows are loaded in a worker thread and queued in memory.
  • len() calls sqlite to count rows, which scans the whole table.
  • For better performance, write objects in batches and commit() once, as in the sketch below.
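
A minimal sketch of that batch pattern (the tablename here is made up for illustration):

>>> from sqlitedict import SqliteDict
>>> with SqliteDict("example.sqlite", tablename="batch") as batch_db:
...     for i in range(1000):
...         batch_db[str(i)] = {"value": i}  # buffered, nothing persisted yet
...     batch_db.commit()  # a single commit persists the whole batch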

Installation

The module has no dependencies beyond Python itself. The minimum supported Python version is 3.7; it is continuously tested on Python 3.7, 3.8, 3.9, and 3.10 on Travis.

Install or upgrade with:

pip install -U sqlitedict

or from the source tar.gz:

python setup.py install

Contributions

Testing

Install:

$ pip install pytest coverage pytest-coverage

To perform all tests:

$ mkdir -p tests/db
$ pytest tests
$ python -m doctest README.rst

To perform all tests with coverage:

$ pytest tests --cov=sqlitedict

Comments, bug reports

sqlitedict resides on GitHub. You can file issues or pull requests there.

License

sqlitedict is open source software released under the Apache 2.0 license. Copyright (c) 2011-now Radim Řehůřek and contributors.

Housekeeping

Clean up the test database to keep each doctest run idempotent:

>>> import os
>>> if __name__ == '__main__':
...     os.unlink('example.sqlite')

sqlitedict's People

Contributors

and34, biostu24, bit, cologler, da4089, digenis, ecederstrand, edwardbetts, endlisnis, ergoithz, hholst80, jquast, jtatum, menshikh-iv, morotti, mpenkov, nrhinehart, ownport, piskvorky, plague006, pombredanne, rdyro, shoyu63, thechief389, tirkarthi, tmylk, toddrme2178, transfluxus, vcalv, vrescobar


sqlitedict's Issues

Multi thread example

I trigger 10 worker threads that fill the userDb, without any locking mechanism:

def worker(name, value):
    userDb[name] = value

def sync():
    userDb.commit()

userDb = SqliteDict('./usernames.sqlite', encode=json.dumps, decode=json.loads, autocommit=False)

I also trigger a sync thread every 5 minutes. Is it safe to do so?
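
For reference, a self-contained sketch of the setup described above (thread scaffolding added for illustration; sqlitedict serializes the writes internally):

import json
import threading

from sqlitedict import SqliteDict

userDb = SqliteDict('./usernames.sqlite', encode=json.dumps, decode=json.loads, autocommit=False)

def worker(name, value):
    userDb[name] = value  # each write goes through sqlitedict's internal request queue

def sync():
    userDb.commit()  # flush everything queued so far to disk

threads = [threading.Thread(target=worker, args=('user%d' % i, i)) for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

sync()  # in the real program this runs on a 5-minute timer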

sqlitedict freezes when there is Error in one of the threads

When I try to access sqlite via sqlitedict with some unsupported type, the whole system freezes, and it's even impossible to break it with Ctrl+C.

It happens with the following stack trace:

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/threading.py", line 813, in __bootstrap_inner
    self.run()
  File "/home/hadoop/.vew/iab/lib/python2.7/site-packages/sqlitedict.py", line 270, in run
    cursor.execute(req, arg)
InterfaceError: Error binding parameter 0 - probably unsupported type.  

It seems that an Error is thrown in one thread, but the other threads keep working and probably hang in the system.

Key ambiguity

When using integer keys, they are converted to strings. Therefore:

mysqlitedict['1'] = 'ONE'
mysqlitedict[1] = 1

both refer to the same entry.
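
A minimal sketch demonstrating the collision (file name made up):

from sqlitedict import SqliteDict

mysqlitedict = SqliteDict('keys.sqlite', autocommit=True)
mysqlitedict['1'] = 'ONE'
mysqlitedict[1] = 1          # lands on the same row: the key column is TEXT, so 1 is stored as '1'

print(len(mysqlitedict))     # 1 -- both writes hit one entry
print(mysqlitedict['1'])     # 1 -- the later write won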

SqliteDict memory issue

Building LSI model from SqliteDict corpus loads the whole SqliteDict into memory.

How to simulate:

from sqlitedict import SqliteDict
from gensim.models import TfidfModel, LsiModel
from gensim.models.lsimodel import Projection
from gensim.utils import grouper
from gensim import matutils

s = SqliteDict('documents.sqlite')  # contains namedtuples with field `tokens` containing unicode token tuple
tfidf = TfidfModel.load('tfidf_model.pkl')  # trained on a dictionary built from `s`


class SegmentCorpus(object):
    def __init__(self, storage, built_dictionary):
        self.storage = storage
        self.inner_dictionary = built_dictionary

    def __iter__(self):
        for segment in self.storage.itervalues():
            yield self.inner_dictionary.doc2bow(segment.tokens)


corpus_tfidf = tfidf[SegmentCorpus(s, tfidf.id2word)]
for i in grouper(corpus_tfidf, 5000):
    nnz = sum(len(doc) for doc in i)
    job = matutils.corpus2csc(i, num_docs=len(i), num_nnz=nnz, num_terms=len(tfidf.id2word))
    Projection(len(tfidf.id2word), 100, job, extra_dims=100, power_iters=1)

Calling Projection increases RAM usage to the size of the SqliteDict db (in my case 22GB).

Interrupted iteration over a SqliteDict instance

I have a file created by sqlitedict in Python 2.7; the file size is 11.37GB (number of keys: 761951).
Recently I opened it with the read-only flag in Python 3.6. Something interesting:

Scenario A: if I iterate over all 761951 keys (for key, value in dict.items()), the memory consumption stays very low (see the attached memory graph).

Scenario B: if the iteration is broken prematurely, say at counter=100, and dict.close() is then called, the memory consumption rises sharply (see the attached memory graph).

Scenario C: if the iteration is broken prematurely, say at counter=100, but dict.close() is not called, the memory consumption keeps growing for over 10 seconds after execution has left the iteration, and ends up as high as in Scenario B.

(Scenarios A, B and C all run in a thread via twisted's reactor.callInThread.)

This suggests that the iteration keeps running even after the break.

nested keys ?

When trying to use a nested dict, it won't let me do db['first']['second']. What is the proper solution?
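
Nested assignment isn't supported directly; one workaround is the read-modify-write pattern from the README's "More" section. A sketch (file name made up):

from sqlitedict import SqliteDict

db = SqliteDict('nested.sqlite', autocommit=True)
db['first'] = {}

inner = db['first']        # deserializes a fresh copy of the outer dict
inner['second'] = 'value'  # mutates only the in-RAM copy
db['first'] = inner        # reassign so the change is written back

print(db['first'])         # {'second': 'value'}
db.close()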

Remove downloads

PyPI is broken (has been broken for months) and no longer records package download numbers.

Remove the download badge; it looks silly.

docs: add section on motivation

This isn't a bug, just a question that maybe could be addressed in the README.

Python ships with https://docs.python.org/3/library/shelve.html, a persistent, dictionary-like object backed by a file on disk. How does sqlitedict improve things? Perhaps shelve's storage engine is old and less efficient (larger files? slower access?) than using sqlite3?

https://dataset.readthedocs.org/en/latest/ goes a step further, enabling (but not requiring) SQL-like queries on a dict-like structure, backed by any SQLAlchemy db, with exports to JSON and CSV. The goal is to keep the simplicity of data handling that CSVs provide, without the downsides of CSVs (at least, that's how I understand it!).

Any thoughts on how sqlitedict fits into the picture would be well received.

To create value as non-blob type

Hi,
I notice that all the values created and entered into the sqlite file are of type BLOB.
MAKE_TABLE = 'CREATE TABLE IF NOT EXISTS "%s" (key TEXT PRIMARY KEY, value BLOB)' % self.tablename
When I need to use a GUI tool like https://github.com/sqlitebrowser/sqlitebrowser to check and adjust the table content, the value cannot be shown appropriately.
If possible, how can I customize the value type?
Thanks.

Question: Issues with json encode? Also, is it multiprocess safe?

Sorry, not a bug, just a question. I am very interested in this tool as a data store for a tiny web app. I do not want to just use JSON files, since I want it to be thread-safe (even though it won't really ever be an issue). Is this safe for multiple processes to write to? I know multiple threads work.

Anyway, is there a downside to using json.dumps and json.loads for the encoder and decoder? Is it just a bit slower?

Thanks!

Wrapper object using a destructor appears to hang on closing the sqlitedict

I have found that wrapping sqlitedict in a class with a destructor doesn't seem to work.

Here's just a snippet of my code:

from sqlitedict import SqliteDict

class Store():

    def __init__(self, path):
        self._map = SqliteDict(path, autocommit=False)

    def __del__(self):
        self._map.close()

store = Store('tmp/ext')

Run this with python ./script.py, and it just hangs.

Instead I have to use close(force=True) for it to actually close properly.

setdefault seems not to be working

    assert list() == d.setdefault('tkey', [])
    assert list(d) == ['tkey']
    assert len(d['tkey']) == 0
    d.setdefault('tkey', []).append(1)
    print list(d)
    print d['tkey']
    assert d['tkey'] == [1]
['tkey']
[]
Traceback (most recent call last):
  File "sqlitedict.py", line 368, in 
    assert d['tkey'] == [1]
AssertionError

setdefault doesn't set the list object in storage. I had to implement it manually, like:

if key in storage:
    oldVal = storage[key]
    oldVal.append(val)
    storage[key] = oldVal
else:
    storage[key] = val

License file

I am trying to package sqlitedict for openSUSE. However, there is no license file that I can find, either on GitHub or in the tarball. In order for people to know how they can use your code, it is really important to have a license file. Would it be possible to add one? Thank you.

Find a way to test with sqlite latest version in Travis

The current sqlite version is 3.16.

Some Linux distributions ship this version, and we can't reproduce the issues reported on them because Travis is outdated.

We use the Travis Ubuntu Trusty env, which has sqlite version 3.8.
Travis Ubuntu Precise has 3.6. It also has an option to install sqlite 3.7.15 from a white-listed ppa, which is not maintained.

Suggested solution: create our own ppa with sqlite, white-list it in Travis, and maintain it.

splitting by key into multiple sqlite files

To improve concurrent access, it would help to split the sqlite database into multiple files, with the key hash determining which file stores the full key-value mapping for that key. Concurrent writes would then be fine as long as the keys land in different sqlite database files. You'd initialize sqlitedict not with a file but with a directory, and it would create the sqlite db files under that directory.
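
This isn't part of sqlitedict; purely as illustration, a sketch of what such key-hash sharding could look like (all names hypothetical):

import hashlib
import os

from sqlitedict import SqliteDict

class ShardedDict:
    """Route each key to one of N sqlite files by key hash (sketch only)."""

    def __init__(self, directory, shards=8):
        os.makedirs(directory, exist_ok=True)
        self._shards = [
            SqliteDict(os.path.join(directory, 'shard%d.sqlite' % i), autocommit=True)
            for i in range(shards)
        ]

    def _pick(self, key):
        digest = hashlib.sha1(key.encode('utf-8')).hexdigest()
        return self._shards[int(digest, 16) % len(self._shards)]

    def __setitem__(self, key, value):
        self._pick(key)[key] = value

    def __getitem__(self, key):
        return self._pick(key)[key]

    def close(self):
        for shard in self._shards:
            shard.close()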

Storing unicode keys returns bytestrings that must be decoded as utf8.

I know sqlite is utf-8 internally, that's fine.

But I end up with boilerplate code that says:

  • for dictionaries of keys of str, decode as utf-8 -- on retrieval, only.

This seems wrong. Shouldn't sqlitedict do this for us?

>>> mydict = sqlitedict.SqliteDict('/tmp/some_.db', autocommit=True)
>>> hamsterface = u'🐹'
>>> print hamsterface, repr(hamsterface), type(hamsterface)
🐹 u'\U0001f439' <type 'unicode'>
>>> mydict[hamsterface] = {hamsterface: hamsterface}
>>> print mydict.keys()[0], repr(mydict.keys()[0]), type(mydict.keys()[0])
🐹 '\xf0\x9f\x90\xb9' <type 'str'>
>>> ## Here, it is raw bytes, not unicode.  It still "prints" because my terminal happens to also be utf-8
>>> mydict.values()[0]
{u'\U0001f439': u'\U0001f439'}
>>> ## Yet, the value remains unicode. The only way to get the same unicode dict that we tried to represent back, is to decode it manually.
>>> decoded = { mydict.keys()[0].decode('utf-8'): mydict.values()[0] }
>>> decoded
{u'\U0001f439': {u'\U0001f439': u'\U0001f439'}}

If you agree that .keys() and .items() and so on should automatically decode str-type as utf-8 into unicode-type, then I will be happy to submit a PR.

in-memory use

A sqlitedict db is created and saved to disk, e.g.

with SqliteDict('mydb', flag='c') as db:
    ...

db.close()

How can you now use 'mydb' in-memory, via sqlite's ':memory:' option, with sqlitedict?
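
I'm not sure of the intended route, but here is one sketch: assuming the filename is handed to sqlite3.connect() unchanged (so ':memory:' creates a fresh, empty in-memory database), you can open an in-memory dict and copy the on-disk contents into it:

from sqlitedict import SqliteDict

disk = SqliteDict('mydb', flag='r')  # read-only view of the on-disk db
mem = SqliteDict(':memory:')         # assumption: passed straight through to sqlite3
mem.update(disk)                     # copy every key/value from disk into memory
mem.commit()
disk.close()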

There is a "static" error in the SqliteDict destructor.

There are 2 lines with "... self.conn.conn ..." in SqliteDict.__del__.
They should be replaced by "... self.conn ...", because self.conn.conn does not exist.

Thank you for your work.

Best wishes.

Stéphane.

program without close does not stop

Python 3.5.1
Windows 10
sqlitedict 1.4.0

t.py:

from sqlitedict import SqliteDict
db = SqliteDict('test.db', autocommit=True)
#db.close()
print('end')
    def __del__(self):
        # like close(), but assume globals are gone by now (do not log!)
        self.close(do_log=False)  # <- never returns

My guess: the worker thread is a daemon, so by the time __del__ runs, the thread has already silently stopped and never processes the close event in the queue.

ENH: make multithreading support optional

In the special case that I just open the database, read some item and close it again, the multi-threading support causes significant overhead. I would therefore suggest to make it optional.

How could this be achieved in a way that requires only small code changes?

Deadlock on exception when opening connection

When SqliteDict is initialized, a connection to SQLite is opened (in here). This connection might fail.

If an exception happens, the connection does not get opened. However, the initialization still tries to create the table. In the commit of that operation, the execution deadlocks, because the code keeps waiting infinitely on the req Queue.

SqliteDict doesn't seem to accept frozensets as keys.

Consider the following script:

#!/usr/bin/env python
from sqlitedict import SqliteDict
d = SqliteDict("test.sqlite")
d[frozenset(["c", "d"])] = "ab"

An exception is raised when executing it:

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 552, in __bootstrap_inner
    self.run()
  File "<my_home>/.local/lib/python2.7/site-packages/sqlitedict.py", line 296, in run
    cursor.execute(req, arg)
InterfaceError: Error binding parameter 0 - probably unsupported type.

Updating a list: performance

What is the recommended flow for storing a list of values (like sql tables) under a dict key?
Will appending to the list recommit the whole list, or only the last value?

setdefault does not work as for dict

When I use sqlitedict, I see that setdefault doesn't work the way I'm used to from a standard dict.

My not working code looks as follows:

with SqliteDict(d, autocommit=True) as c:
    c.setdefault(k, []).append("d")

My working code looks as:

if k in c:
    l = c[k]
    l.append("d")
    c[k] = l
else:
    c[k] = ["d"]

NOTE:
Even this did not work for me:

if k in c:
    l = c[k]
    l.append("d")
else:
    c[k] = ["d"]

I can see that there is only the inherited setdefault, and there are no tests for it. Should setdefault work? If not, is it wanted as a new feature?

it works awesome

This is not an issue, sorry, but didn't know where to write this. I wanted to say that this module works really awesome. Totally beats builtin solutions like gdbm, in speed and size handling. Thanks!

Using context manager fails if used more than once

>>> foo = SqliteDict('foo.sqlite')
>>> with foo:
...    foo[1] = 2
... 
>>> with foo:
...    foo[1] = 2
... 
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/lib/python3.4/site-packages/sqlitedict.py", line 233, in __setitem__
    self.conn.execute(ADD_ITEM, (key, encode(value)))
AttributeError: 'NoneType' object has no attribute 'execute'

This is because __exit__ calls close(), which deletes self.conn.

It seems __enter__ should check whether self.conn exists and create it otherwise. I could just go through __init__ once more, but __init__ contains unnecessary actions like CREATE TABLE ...

I can create a patch if this issue is accepted.

__bool__ is broken

Hi,
I just found out that the __bool__ method is broken: the __bool__ descriptor implemented on SqliteDict is not working. This descriptor was first introduced in Python 3.0; in Python 2.x the same functionality is named __nonzero__.

See this code in Python 2.7:

>>> class A(object):
...     def __bool__(self):
...         return False
...     
... 
>>> a=A()
>>> bool(a)
True
>>> not a
False
>>> A.__nonzero__ = A.__bool__
>>> not a
True
>>> not bool(a)
True

Usually, for Python 2/3 compatibility, a simple renaming is enough:

SqliteDict.__nonzero__ = SqliteDict.__bool__

I tried this in my fork, but then I saw that the boolean functionality is actually broken (test cases); they were just comparing the raw class instance (not None, therefore True).

Feature Request: Store nested dictionaries as new tables

This is both a feature request and to get an idea if people would be interested. Essentially, if you have a sub-dictionary that is not at the top level, the entire thing also has to be encoded.

I propose that, when a user explicitly says to, a sub-dictionary is made into a new table as well. The idea is still very rough, but essentially: modify the main class so it can have a parent instance (and therefore share the connection), then add a function that creates a new SqliteDict child instance. The table name could be something like tablename + uuid4(). This new object will behave just like its own SqliteDict. The only difference is that when you __setitem__ (or update) a child instance, it will instead store a pointer to that table; when you __getitem__, it will recreate the dict; and when you __delitem__, it will drop the table.

I think autocommit should still work and carry through.

A final idea along the same vein (though certainly harder to do) is to make a list-like object that behaves the same way: use a sub-SqliteDict under the hood but allow list-style storage. This may be more far-fetched.

Any ideas?

Maximum number of keys and multiple dicts in a database?

Hi, I was wondering (a) what is the maximum number of keys that can be used in sqlitedict? and (b) how do I use multiple dicts in the same database (that's a feature you listed in the readme, I guess), and what is the maximum number of such dicts that can be present in the database?

Typically, is it possible to have like 20 billion keys whose values are lists of integers in a single dict? Also can I have hundreds of such dicts in the same db file?

Thanks,
Ayaan

Saving tuples as keys using sqlitedict

This issue is more a new feature idea.

It would be nice to be able to use Python tuples as keys, as you can in standard Python dicts. My idea is to do something like serialisation from tuple to string (for example via user_defined_character.join(tuple), or via some user-defined conversion function).

For querying, the same transformation function would be used; for key reconstruction, there might be something like key.split(user_defined_character), or some other user-defined conversion function.
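
The README's Serialization section already exposes encode_key/decode_key hooks, so a sketch of this idea is possible today (the separator is arbitrary and assumed never to appear inside a key part):

from sqlitedict import SqliteDict

SEP = '\x1f'  # ASCII "unit separator"; assumed absent from key parts

def tuple_to_key(t):
    return SEP.join(t)          # ('a', 'b') -> 'a\x1fb'

def key_to_tuple(s):
    return tuple(s.split(SEP))  # 'a\x1fb' -> ('a', 'b')

with SqliteDict('tuples.sqlite', encode_key=tuple_to_key, decode_key=key_to_tuple) as d:
    d[('a', 'b')] = 'value'
    print(d[('a', 'b')])        # value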

should we have a .copy() method?

My use case: I need an in-memory copy of the contents of the whole database for comparison, but rather than my_copy = dict(sdb.items()), I would like to write just my_copy = sdb.copy().

sound ok?

autocommit does not commit if program exits

I ran into a problem in one of my programs where it was not committing some changes because my script exited shortly after the updates.

Here is a example script that reliably reproduces the effect (for me):

import json
import sqlitedict
import time

class Dict(sqlitedict.SqliteDict):
    def __init__(self, name):
        sqlitedict.SqliteDict.__init__(
            self,
            f'{name}.sqlite', autocommit=True,
            encode=json.dumps, decode=json.loads )

d = Dict('d')
now = int(time.time())

for i in range(100):
    d[i] = now

If I follow that up with a select, I only see 18 entries (sometimes I see zero entries):

$ sqlite3 d.sqlite 'SELECT * from unnamed'
0|1553346705
1|1553346705
2|1553346705
3|1553346705
4|1553346705
5|1553346705
6|1553346705
7|1553346705
8|1553346705
9|1553346705
10|1553346705
11|1553346705
12|1553346705
13|1553346705
14|1553346705
15|1553346705
16|1553346705
17|1553346705
18|1553346705
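
A sketch of one workaround: even with autocommit on, force a final blocking commit (and a clean close) before the interpreter exits, so nothing is left sitting in the write queue:

import json
import sqlitedict

d = sqlitedict.SqliteDict('d.sqlite', autocommit=True,
                          encode=json.dumps, decode=json.loads)
for i in range(100):
    d[i] = 0

d.commit()  # blocking commit: returns only once the data is persisted
d.close()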

Fix license

An Apache license is used by the project, but it is not detected by GitHub, which views it as a custom license. If the license file could be changed so that GitHub recognizes it as Apache, that would make it easier to figure out licensing issues.

Stuck at commit?

Sorry, I can't figure out a reproducible example for this. But in a real-world project, whenever about 100 keys had been added, reopening the SqliteDict would get stuck at the commit below.

        logger.info("opening Sqlite table %r in %s" % (tablename, filename))
        MAKE_TABLE = 'CREATE TABLE IF NOT EXISTS "%s" (key TEXT PRIMARY KEY, value BLOB)' % self.tablename
        logger.info(MAKE_TABLE)
        self.conn = self._new_conn()
        logger.info("execute")
        self.conn.execute(MAKE_TABLE)
        logger.info("commit")
        self.conn.commit()

I tried to reproduce this in an isolated example, but the problem doesn't happen there; I attached it below anyway. My values contain nested dictionaries with about 30 fields, about 180K on disk. I switched to shelve because I have no idea what the cause could be. JFYI.

import sqlitedict


def ss():
    return sqlitedict.SqliteDict("cache.sqlite", autocommit=True)

for i in range(3000000):
    key = f"key-{i}"
    print(key)
    with ss() as sd:
        sd[key] = dict(foo=1, nested=dict(bar=1))

sets as value in an SqliteDict not updated

Is the following behaviour expected?

>>> from sqlitedict import SqliteDict
>>> d = SqliteDict("test.sqlite", autocommit=True)
>>> d["a"] = set([])
>>> d["a"].add(1)
>>> d["a"]
set([])
>>> d.commit()
>>> d["a"]
set([])
>>> d["a"].add(1)
>>> d.commit()
>>> d["a"]
set([])
>>> 
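
The same read-modify-write workaround from the README applies to sets; a sketch:

>>> s = d["a"]
>>> s.add(1)
>>> d["a"] = s   # reassign so sqlitedict serializes the updated set
>>> d["a"]
set([1])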

Disk read/write rises and the application gets stuck

When the system is busy, with multiple threads writing to the sqlitedict, I sometimes get the following behaviour (see the attached I/O graph).

I don't think it is my application, since it only happens when many threads do lots of disk reads/writes through sqlitedict.

Any suggestions? I thought of scheduling "sync" writes to disk instead of committing on every operation.

Pull request proposal: Fixing the python 2->3 UnicodeError

Hello,

The Python 2 -> Python 3 break in pickling has caused many a developer much headache. While trying to get my SqliteDict from Py2 to Py3 the other day, I encountered the UnicodeError caused by this issue. After a little research, I discovered that adding an encoding parameter to loads and setting it to 'bytes' solves the issue. That's a simple change on line 105 of sqlitedict.py.

I'm raising this issue for general awareness, and not to jump the PR out of nowhere. :) Since I encountered the issue after the latest release, I'm fairly certain it hasn't been fixed yet.

I present the simple tests I performed in Py3 to check that the change does not break anything. Everything prints as expected below. (Sorry about the formatting; I can't seem to get code formatting to work.)

Test 1:

import pickle

a = pickle.dumps("This is a test to check byte encoding does not break Py3")
b = pickle.loads(a, encoding='bytes')
print(b)

Test 2:

pickle.dump("This is another test", open('temp.txt', 'wb'))
print(pickle.load(open('temp.txt', 'rb'), encoding='bytes'))

Update: I don't have Python 2.7 access at the moment, but Ideone seems to give a runtime error for Test 1. In that case, a parameter to the SqliteDict constructor, something like legacy=True, might have to be added? Will confirm when I get home.

Nested dictionaries not supported

Is there any plan to support the persistence of nested dictionaries?

E.g.
mydict = SqliteDict('./my_db.sqlite', autocommit=True)
newdict = dict()
newdict['key-1'] = 'value-1'
mydict['newdict'] = newdict

  • Until here, everything is persisted. But if I modify the inner dictionary now, the change is not persisted:

newdict1 = mydict['newdict']
newdict1['key-2'] = 'value-2'

But this new key ('key-2') is not reflected in the database.

Feature Proposal: Multiple simultaneous dictionaries with one db file

Hello,

I use sqlitedict quite extensively in my research code, and overall find it a wonderful piece of software. Recently however, I discovered an issue that I had to create an interesting workaround for, and was thinking that this might be useful to others as well, if made into an official feature.

Use case:
I often use sqlitedict to store results from experiments (time series or just simple attributes). My structure might look like this:

A single dbfile with:

  • Attribute table for each experiment, with attribute keys underneath
  • Timeseries table for each experiment/timeseries attribute, with timestamp keys inside.
  • Metadata table to list experiments, and some mappings to attributes.

This works great with a small number of experiments, but when this grows to thousands, it is hard to work with the data because it is necessary to instantiate a sqlitedict for each dbfile/table combo.

Now, for my proposal:

I realized that it is possible to just switch out the .tablename attribute of an already open sqlitedict and have it work just fine (the connection is already open, etc.; you only need to change the table used in the queries). This cut a recent processing run from ~5000s to 250s.

It would be cool to have an alternate interface that works like this:

db = sqlitedict.SqliteDictDb(dbfilename)
dictionary_foo = db.get_dict("foo")
dictionary_bar = db.get_dict("bar")

The dictionary interfaces themselves could be a lightweight wrapper that maintains a reference to the db object. These could share the same thread, connection, etc. and allow for much faster access to a variety of tables.

If this seems reasonable, I am more than happy to implement it and submit a pull request. If not, that's ok too; I'll just keep it in my personal library. I just figured I would ask in case there is interest, and in case there is any preference for how it is done.

Thank you!
-Bill Katsak

Question regarding commit()

I don't understand this part in the documentation:

     |  commit(self, blocking=True)
     |      Persist all data to disk.
     |      
     |      When `blocking` is False, the commit command is queued, but the data is
     |      not guaranteed persisted (default implication when autocommit=True).

Does this mean blocking is False or True when using autocommit? I.e., can I be sure my data is persisted to disk?

Maybe it's just me, but as I read the doc here its meaning is ambiguous, especially since the usage example's comments say the data is persisted to disk with autocommit=True.
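
For what it's worth, the quoted docstring implies that an explicit call with the default blocking=True does guarantee persistence; a one-line sketch:

db.commit(blocking=True)  # per the docstring, returns only after the data is persisted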

Fix exit errors: TypeError("'NoneType' object is not callable",)

When a Python process exits that has some sqlitedict objects still in memory, the final pre-exit garbage collection can cause some unsightly errors to be printed:

Exception TypeError: TypeError("'NoneType' object is not callable",) in <bound method SqliteDict.__del__ of SqliteDict(/data/db.sqlite)> ignored
Exception TypeError: TypeError("'NoneType' object is not callable",) in ignored
Exception TypeError: TypeError("'NoneType' object is not callable",) in ignored
Exception TypeError: TypeError("'NoneType' object is not callable",) in ignored
Exception TypeError: TypeError("'NoneType' object is not callable",) in ignored
Exception TypeError: TypeError("'NoneType' object is not callable",) in ignored
Exception TypeError: TypeError("'NoneType' object is not callable",) in ignored
Exception TypeError: TypeError("'NoneType' object is not callable",) in ignored

Investigate why this happens (there are already checks in place inside __del__) and fix it.

Desired behaviour: if the sqlite connection is still open/active, close it; if not, or if any error happens (object already partially self-destructed at exit, or any other error; we have little control over what we can do at this point), just ignore it and proceed.

autocommit = off/on.

When autocommit is on, will it retrieve the data from disk or from memory?
If I want to retrieve the last 1000 values from memory and the rest from disk, should I commit once every 1000 new values?

In general, what is the memory consumption of the dict in RAM?

Fails in multiprocessing even with 'r' mode

As one would expect of a normal dictionary, it cannot (easily) be used with multiprocessing for writing values. However, a normal dictionary can be used for reading.

sqlitedict fails in multiprocessing even with flag='r'. This is almost certainly due to the threaded connection, but that connection is not needed for flag='r', since an sqlite file can be read concurrently.

It's not too hard to fix, but it would mean a major change in code flow.
