Comments (10)
This is a bug in APSW.
The open flags are SQLITE_OPEN_CREATE | SQLITE_OPEN_DELETEONCLOSE | SQLITE_OPEN_EXCLUSIVE | SQLITE_OPEN_READWRITE | SQLITE_OPEN_TRANSIENT_DB
The SQLite doc does say NULL is allowed. I vaguely recall in the past that a zero length string was passed.
This exact combination only occurs for OP_OpenAutoindex
/ OP_OpenEphemeral
in the source
The reason you are hitting this is because as you add larger offsets to your LIMIT
, more data has to be kept in memory to be sorted until memory limits are hit and disk is used instead.
Once I have a reproducer I'll let you know of a workaround.
from apsw.
I was unable to find a workaround such as increasing the cache size. I am about to commit the fix. It will be part of the next release which is soon.
from apsw.
You are measuring one thing:
select * from hello order by a limit ?, ?
To do the order by
SQLite has to do a sort. The sort time is proportional to the number and size of items being sorted as you have found.
Turn on logging and SQLite will tell you if it is creating an automatic index.
Use EXPLAIN QUERY PLAN to see how SQLite executes your query. There is good documentation on the principles behind query execution.
You can do this to get a shell in the right place that knows how to format the query plan:
apsw.shell.Shell(db=conn).process_sql("explain query plan select * from hello order by a limit ?, ?", (i*500, (i*500+500)))
With your code above I get this query plan:
id parent detail
9 0 SCAN hello
24 0 USE TEMP B-TREE FOR ORDER BY
SCAN
means it is going through the whole table start to finish. The TEMP B-TREE
is SQLite having to sort everything on demand.
I did this:
create index hello_a on hello(a)
Then the query plan becomes:
id parent detail
10 0 SCAN hello USING INDEX hello_a
It is instant because SQLite just has to seek to the right location in the index to produce the results and doesn't need to do a sort.
You also don't need that separate cursor and it isn't doing anything for you other than providing a potential source of bugs in the future when you use it while using it. All the cur.execute
can and should be replaced with conn.execute
.
from apsw.
APSW doesn't change SQLite's behaviour when it comes to executing SQL.
TypeError: filename should be a string
That is the error you are getting. You should print out what the value of filename actually is, as well as the flags. augmented tracebacks will also contain the same information. My guess is SQLite is spilling from memory to disk.
If you want further help you'll need to include the APSW version, as well as enough for me to reproduce the problem.
from apsw.
APSW doesn't change SQLite's behaviour when it comes to executing SQL.
TypeError: filename should be a string
That is the error you are getting. You should print out what the value of filename actually is, as well as the flags. augmented tracebacks will also contain the same information. My guess is SQLite is spilling from memory to disk.
If you want further help you'll need to include the APSW version, as well as enough for me to reproduce the problem.
Sorry, here is what I got :
File "c:\Users\abdelouahab\Desktop\stocks\application\app.py", line 1008, in get
cur.execute('''
contenu = 'Année Passée'
exc = TypeError('filename should be a string')
fin = 500
nombre = (8099,)
now = datetime.datetime(2024, 1, 5, 18, 16, 43, 611, tzinfo=zoneinfo.ZoneInfo(key='Africa/Algiers'))
pagination = 16
self = <__main__.Recette object at 0x000001EB7FED2390>
start = 6000
url = '/admin/recette?q=l'
what = 'l&skip'
File "c:\Users\abdelouahab\Desktop\stocks\application\app.py", line 57, in xOpen
return ObfuscatedVFSFile(self.base_vfs, name, flags)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
flags = [1054, 0]
name = None
self = <__main__.ObfuscatedVFS object at 0x000001EB7F8932A0>
File "c:\Users\abdelouahab\Desktop\stocks\application\app.py", line 63, in __init__
apsw.VFSFile.__init__(self, inheritfromvfsname, filename, flags)
filename = None
flags = [1054, 0]
inheritfromvfsname = ''
self = <__main__.ObfuscatedVFSFile object at 0x000001EB104A7F70>
TypeError: filename should be a string
Windows 10 64bits,
APSW 3.44.2.0,
Python Python 3.11.7
Seems it loses the filename argument after a certain cursor limit ?
from apsw.
This is a bug in APSW.
The open flags are
SQLITE_OPEN_CREATE | SQLITE_OPEN_DELETEONCLOSE | SQLITE_OPEN_EXCLUSIVE | SQLITE_OPEN_READWRITE | SQLITE_OPEN_TRANSIENT_DB
The SQLite doc does say NULL is allowed. I vaguely recall in the past that a zero length string was passed.
This exact combination only occurs for
OP_OpenAutoindex
/OP_OpenEphemeral
in the sourceThe reason you are hitting this is because as you add larger offsets to your
LIMIT
, more data has to be kept in memory to be sorted until memory limits are hit and disk is used instead.Once I have a reproducer I'll let you know of a workaround.
Thank you very much ^^
from apsw.
Reproduced it with the code :
import apsw
import uuid
import time
import sys
from apsw.ext import print_augmented_traceback
def obfuscate(data):
if not data: return data
return bytes([x ^ 0xcc for x in data])
class ObfuscatedVFS(apsw.VFS):
def __init__(self, vfsname="obfuscated", basevfs=""):
self.vfs_name = vfsname
self.base_vfs = basevfs
apsw.VFS.__init__(self, self.vfs_name, self.base_vfs)
def xOpen(self, name, flags: int):
return ObfuscatedVFSFile(self.base_vfs, name, flags)
class ObfuscatedVFSFile(apsw.VFSFile):
def __init__(self, inheritfromvfsname, filename, flags):
apsw.VFSFile.__init__(self, inheritfromvfsname, filename, flags)
def xRead(self, amount, offset):
return obfuscate(super().xRead(amount, offset))
def xWrite(self, data, offset):
super().xWrite(obfuscate(data), offset)
obfuvfs = ObfuscatedVFS()
open_flags = apsw.SQLITE_OPEN_READWRITE | apsw.SQLITE_OPEN_CREATE
open_flags |= apsw.SQLITE_OPEN_URI
conn = apsw.Connection("my.db", vfs=obfuvfs.vfs_name)
cur = conn.cursor()
cur.execute('''
CREATE TABLE IF NOT EXISTS "hello" (
"a" CLOB NOT NULL,
"b" CLOB NOT NULL,
"c" CLOB NOT NULL,
"d" CLOB NOT NULL,
"e" CLOB NOT NULL
)''')
with conn:
for i in range(100000):
cur.execute('''
insert into hello (`a`, `b`, `c`, `d`, `e`) VALUES (?, ?, ?, ?, ?)
''', (str(uuid.uuid4()), str(uuid.uuid4()), str(uuid.uuid4()), str(uuid.uuid4()), str(uuid.uuid4())))
for i in range(9000):
try:
print("from {0} to {1}".format( i*500, i*500+500) )
print("reached {} characters".format((i+1)*500*36*5))
cur.execute("select * from hello order by a limit ?, ?", (i*500, (i*500+500)))
print(cur.fetchall())
time.sleep(1)
except Exception as exc:
print_augmented_traceback(*sys.exc_info())
break
It stops at :
from 4000 to 4500
reached 810000 characters
Even if I have 32 GB of ram on 64bits OS ?
Traceback (most recent call last):
File "c:\Users\abdelouahab\Desktop\New folder\db_tester.py", line 64, in <module>
cur.execute("select * from hello order by a limit ?, ?", (i*500, (i*500+500)))
ObfuscatedVFS = <class '__main__.ObfuscatedVFS'>
ObfuscatedVFSFile = <class '__main__.ObfuscatedVFSFile'>
__annotations__ = {}
__builtins__ = <module 'builtins' (built-in)>
__cached__ = None
__doc__ = None
__file__ = 'c:\\Users\\abdelouahab\\Desktop\\New folder\\db_tester.py'
__loader__ = <_frozen_importlib_external.SourceFileLoader object at 0x00000232AD8DBBF0>
__name__ = '__main__'
__package__ = None
__spec__ = None
apsw = <module 'apsw' from 'C:\\Users\\abdelouahab\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\apsw\\__init__.cp312-win_amd64.pyd'>
conn = <apsw.Connection object at 0x00000232ADBB5F30>
cur = <apsw.Cursor object at 0x00000232AE028040>
exc = TypeError('filename should be a string')
i = 8
obfuscate = <function obfuscate at 0x00000232AD8CA3E0>
obfuvfs = <__main__.ObfuscatedVFS object at 0x00000232ADBE74D0>
open_flags = 70
print_augmented_traceback = <function print_augmented_traceback at 0x00000232AE0142C0>
sys = <module 'sys' (built-in)>
time = <module 'time' (built-in)>
uuid = <module 'uuid' from 'C:\\Users\\abdelouahab\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\uuid.py'>
File "c:\Users\abdelouahab\Desktop\New folder\db_tester.py", line 19, in xOpen
return ObfuscatedVFSFile(self.base_vfs, name, flags)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
flags = [1054, 0]
name = None
self = <__main__.ObfuscatedVFS object at 0x00000232ADBE74D0>
File "c:\Users\abdelouahab\Desktop\New folder\db_tester.py", line 25, in __init__
apsw.VFSFile.__init__(self, inheritfromvfsname, filename, flags)
filename = None
flags = [1054, 0]
inheritfromvfsname = ''
self = <__main__.ObfuscatedVFSFile object at 0x00000232ADB83E30>
TypeError: filename should be a string
Edit : tried it on Linux (PopOS 64bits) same error at the same iteration
from apsw.
It worked now, performance is a hit (takes more time after that limitation, but at least it works ! )
Here is the result on Linux
from apsw.
I measured performances with this :
for i in range(90):
try:
t1 = time.time()
print("from {0} to {1}".format( i*500, i*500+500) )
cur.execute("select * from hello order by a limit ?, ?", (i*500, (i*500+500)))
# print(cur.fetchall())
print("reached {} characters".format((i+1)*500*36*5))
perfor.append(time.time() - t1)
print("performance is {}".format(perfor))
time.sleep(1)
except Exception as exc:
print_augmented_traceback(*sys.exc_info())
break
And it seems it's getting bigger and bigger !
from apsw.
You are measuring one thing:
select * from hello order by a limit ?, ?
To do the
order by
SQLite has to do a sort. The sort time is proportional to the number and size of items being sorted as you have found.Turn on logging and SQLite will tell you if it is creating an automatic index.
Use EXPLAIN QUERY PLAN to see how SQLite executes your query. There is good documentation on the principles behind query execution.
You can do this to get a shell in the right place that knows how to format the query plan:
apsw.shell.Shell(db=conn).process_sql("explain query plan select * from hello order by a limit ?, ?", (i*500, (i*500+500)))
With your code above I get this query plan:
id parent detail 9 0 SCAN hello 24 0 USE TEMP B-TREE FOR ORDER BY
SCAN
means it is going through the whole table start to finish. TheTEMP B-TREE
is SQLite having to sort everything on demand.I did this:
create index hello_a on hello(a)
Then the query plan becomes:
id parent detail 10 0 SCAN hello USING INDEX hello_a
It is instant because SQLite just has to seek to the right location in the index to produce the results and doesn't need to do a sort.
You also don't need that separate cursor and it isn't doing anything for you other than providing a potential source of bugs in the future when you use it while using it. All the
cur.execute
can and should be replaced withconn.execute
.
Wow ! this just amazing !
Indexing the ordering reduced not only the execute time, but even the bug disapeared without the patch, the patch will improve even better !
Thank you so much !
from apsw.
Related Issues (20)
- Unknown apsw attribute 'SQLiteValues' HOT 3
- Cursor.executemany should be annotated as accepting Iterable[Bindings] instead of Sequence[Bindings] HOT 1
- Implement Unicode TR-29 and TR-14 HOT 1
- error: static declaration of ‘PyLong_AsInt’ in Python 3.13+ HOT 3
- Switch to cosign for signing
- Incorrect values in `colUsed` HOT 2
- Update to Unicode 16.0 when released September 2024 HOT 1
- connection.set_row_trace - excepts; Connection.setrowtrace works HOT 4
- `connection.setrowtrace` stops working when adapters are registered HOT 5
- Datetimes into sqlite via APSW HOT 1
- Revise logging levels
- query execution row and time limits HOT 1
- Support subtypes in functions
- Don't cache pragmas
- con.pragma can't accept blobs HOT 1
- Update Connection.pragma with schema parameter
- Implement SQLITE_FCNTL_VFSNAME
- VFS xRandomness errors need to be unraisable HOT 1
- apsw.ext.query_info should find bound parameters
- Comprehensive dump module
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from apsw.