Git Product home page Git Product logo

Comments (10)

rogerbinns avatar rogerbinns commented on July 19, 2024 1

This is a bug in APSW.

The open flags are SQLITE_OPEN_CREATE | SQLITE_OPEN_DELETEONCLOSE | SQLITE_OPEN_EXCLUSIVE | SQLITE_OPEN_READWRITE | SQLITE_OPEN_TRANSIENT_DB

The SQLite doc does say NULL is allowed. I vaguely recall in the past that a zero length string was passed.

This exact combination only occurs for OP_OpenAutoindex / OP_OpenEphemeral in the source

The reason you are hitting this is because as you add larger offsets to your LIMIT , more data has to be kept in memory to be sorted until memory limits are hit and disk is used instead.

Once I have a reproducer I'll let you know of a workaround.

from apsw.

rogerbinns avatar rogerbinns commented on July 19, 2024 1

I was unable to find a workaround such as increasing the cache size. I am about to commit the fix. It will be part of the next release which is soon.

from apsw.

rogerbinns avatar rogerbinns commented on July 19, 2024 1

You are measuring one thing:

select * from hello order by a limit ?, ?

To do the order by SQLite has to do a sort. The sort time is proportional to the number and size of items being sorted as you have found.

Turn on logging and SQLite will tell you if it is creating an automatic index.

Use EXPLAIN QUERY PLAN to see how SQLite executes your query. There is good documentation on the principles behind query execution.

You can do this to get a shell in the right place that knows how to format the query plan:

apsw.shell.Shell(db=conn).process_sql("explain query plan select * from hello order by a limit ?, ?", (i*500, (i*500+500)))

With your code above I get this query plan:

  id  parent  detail
   9       0  SCAN hello
  24       0  USE TEMP B-TREE FOR ORDER BY

SCAN means it is going through the whole table start to finish. The TEMP B-TREE is SQLite having to sort everything on demand.

I did this:

create index hello_a on hello(a)

Then the query plan becomes:

  id  parent  detail
  10       0  SCAN hello USING INDEX hello_a

It is instant because SQLite just has to seek to the right location in the index to produce the results and doesn't need to do a sort.

You also don't need that separate cursor and it isn't doing anything for you other than providing a potential source of bugs in the future when you use it while using it. All the cur.execute can and should be replaced with conn.execute.

from apsw.

rogerbinns avatar rogerbinns commented on July 19, 2024

APSW doesn't change SQLite's behaviour when it comes to executing SQL.

TypeError: filename should be a string

That is the error you are getting. You should print out what the value of filename actually is, as well as the flags. augmented tracebacks will also contain the same information. My guess is SQLite is spilling from memory to disk.

If you want further help you'll need to include the APSW version, as well as enough for me to reproduce the problem.

from apsw.

abdelouahabb avatar abdelouahabb commented on July 19, 2024

APSW doesn't change SQLite's behaviour when it comes to executing SQL.

TypeError: filename should be a string

That is the error you are getting. You should print out what the value of filename actually is, as well as the flags. augmented tracebacks will also contain the same information. My guess is SQLite is spilling from memory to disk.

If you want further help you'll need to include the APSW version, as well as enough for me to reproduce the problem.

Sorry, here is what I got :

File "c:\Users\abdelouahab\Desktop\stocks\application\app.py", line 1008, in get
   cur.execute('''
   contenu = 'Année Passée'
   exc = TypeError('filename should be a string')
   fin = 500
   nombre = (8099,)
   now = datetime.datetime(2024, 1, 5, 18, 16, 43, 611, tzinfo=zoneinfo.ZoneInfo(key='Africa/Algiers'))
   pagination = 16
   self = <__main__.Recette object at 0x000001EB7FED2390>
   start = 6000
   url = '/admin/recette?q=l'
   what = 'l&skip'

 File "c:\Users\abdelouahab\Desktop\stocks\application\app.py", line 57, in xOpen
   return ObfuscatedVFSFile(self.base_vfs, name, flags)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   flags = [1054, 0]
   name = None
   self = <__main__.ObfuscatedVFS object at 0x000001EB7F8932A0>

 File "c:\Users\abdelouahab\Desktop\stocks\application\app.py", line 63, in __init__
   apsw.VFSFile.__init__(self, inheritfromvfsname, filename, flags)
   filename = None
   flags = [1054, 0]
   inheritfromvfsname = ''
   self = <__main__.ObfuscatedVFSFile object at 0x000001EB104A7F70>

TypeError: filename should be a string


Windows 10 64bits,
APSW 3.44.2.0,
Python Python 3.11.7

Seems it loses the filename argument after a certain cursor limit ?

from apsw.

abdelouahabb avatar abdelouahabb commented on July 19, 2024

This is a bug in APSW.

The open flags are SQLITE_OPEN_CREATE | SQLITE_OPEN_DELETEONCLOSE | SQLITE_OPEN_EXCLUSIVE | SQLITE_OPEN_READWRITE | SQLITE_OPEN_TRANSIENT_DB

The SQLite doc does say NULL is allowed. I vaguely recall in the past that a zero length string was passed.

This exact combination only occurs for OP_OpenAutoindex / OP_OpenEphemeral in the source

The reason you are hitting this is because as you add larger offsets to your LIMIT , more data has to be kept in memory to be sorted until memory limits are hit and disk is used instead.

Once I have a reproducer I'll let you know of a workaround.

Thank you very much ^^

from apsw.

abdelouahabb avatar abdelouahabb commented on July 19, 2024

Reproduced it with the code :

import apsw
import uuid
import time
import sys
from apsw.ext import print_augmented_traceback

def obfuscate(data):
    if not data: return data
    return bytes([x ^ 0xcc for x in data])

class ObfuscatedVFS(apsw.VFS):

    def __init__(self, vfsname="obfuscated", basevfs=""):
        self.vfs_name = vfsname
        self.base_vfs = basevfs
        apsw.VFS.__init__(self, self.vfs_name, self.base_vfs)

    def xOpen(self, name, flags: int):
        return ObfuscatedVFSFile(self.base_vfs, name, flags)


class ObfuscatedVFSFile(apsw.VFSFile):

    def __init__(self, inheritfromvfsname, filename, flags):
        apsw.VFSFile.__init__(self, inheritfromvfsname, filename, flags)

    def xRead(self, amount, offset):
        return obfuscate(super().xRead(amount, offset))

    def xWrite(self, data, offset):
        super().xWrite(obfuscate(data), offset)

obfuvfs = ObfuscatedVFS()

open_flags = apsw.SQLITE_OPEN_READWRITE | apsw.SQLITE_OPEN_CREATE
open_flags |= apsw.SQLITE_OPEN_URI


conn = apsw.Connection("my.db", vfs=obfuvfs.vfs_name)


cur = conn.cursor()

cur.execute('''
            CREATE TABLE IF NOT EXISTS "hello" (
                "a"	CLOB NOT NULL,
                "b"	CLOB NOT NULL,
                "c"	CLOB NOT NULL,
                "d"	CLOB NOT NULL,
                "e" CLOB NOT NULL
            )''')


with conn:
    for i in range(100000):
        cur.execute('''
        insert into hello (`a`, `b`, `c`, `d`, `e`) VALUES (?, ?, ?, ?, ?)
        ''', (str(uuid.uuid4()), str(uuid.uuid4()), str(uuid.uuid4()), str(uuid.uuid4()), str(uuid.uuid4())))


for i in range(9000):
    try:
        print("from {0} to {1}".format( i*500, i*500+500) )
        print("reached {} characters".format((i+1)*500*36*5))
        cur.execute("select * from hello order by a limit ?, ?", (i*500, (i*500+500)))
        print(cur.fetchall())
        time.sleep(1)
    except Exception as exc:
        print_augmented_traceback(*sys.exc_info())
        break

It stops at :

from 4000 to 4500
reached 810000 characters

Even if I have 32 GB of ram on 64bits OS ?

Traceback (most recent call last):

  File "c:\Users\abdelouahab\Desktop\New folder\db_tester.py", line 64, in <module>
    cur.execute("select * from hello order by a limit ?, ?", (i*500, (i*500+500)))
    ObfuscatedVFS = <class '__main__.ObfuscatedVFS'>
    ObfuscatedVFSFile = <class '__main__.ObfuscatedVFSFile'>
    __annotations__ = {}
    __builtins__ = <module 'builtins' (built-in)>
    __cached__ = None
    __doc__ = None
    __file__ = 'c:\\Users\\abdelouahab\\Desktop\\New folder\\db_tester.py'
    __loader__ = <_frozen_importlib_external.SourceFileLoader object at 0x00000232AD8DBBF0>
    __name__ = '__main__'
    __package__ = None
    __spec__ = None
    apsw = <module 'apsw' from 'C:\\Users\\abdelouahab\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\apsw\\__init__.cp312-win_amd64.pyd'>
    conn = <apsw.Connection object at 0x00000232ADBB5F30>
    cur = <apsw.Cursor object at 0x00000232AE028040>
    exc = TypeError('filename should be a string')
    i = 8
    obfuscate = <function obfuscate at 0x00000232AD8CA3E0>
    obfuvfs = <__main__.ObfuscatedVFS object at 0x00000232ADBE74D0>
    open_flags = 70
    print_augmented_traceback = <function print_augmented_traceback at 0x00000232AE0142C0>
    sys = <module 'sys' (built-in)>
    time = <module 'time' (built-in)>
    uuid = <module 'uuid' from 'C:\\Users\\abdelouahab\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\uuid.py'>

  File "c:\Users\abdelouahab\Desktop\New folder\db_tester.py", line 19, in xOpen
    return ObfuscatedVFSFile(self.base_vfs, name, flags)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    flags = [1054, 0]
    name = None
    self = <__main__.ObfuscatedVFS object at 0x00000232ADBE74D0>

  File "c:\Users\abdelouahab\Desktop\New folder\db_tester.py", line 25, in __init__
    apsw.VFSFile.__init__(self, inheritfromvfsname, filename, flags)
    filename = None
    flags = [1054, 0]
    inheritfromvfsname = ''
    self = <__main__.ObfuscatedVFSFile object at 0x00000232ADB83E30>

TypeError: filename should be a string

Edit : tried it on Linux (PopOS 64bits) same error at the same iteration

from apsw.

abdelouahabb avatar abdelouahabb commented on July 19, 2024

It worked now, performance is a hit (takes more time after that limitation, but at least it works ! )
Here is the result on Linux

2024-01-06_16-21

from apsw.

abdelouahabb avatar abdelouahabb commented on July 19, 2024

I measured performances with this :

for i in range(90):
    try:
        t1 = time.time()
        print("from {0} to {1}".format( i*500, i*500+500) )
        cur.execute("select * from hello order by a limit ?, ?", (i*500, (i*500+500)))
        # print(cur.fetchall())
        print("reached {} characters".format((i+1)*500*36*5))
        perfor.append(time.time() - t1)
        print("performance is {}".format(perfor))
        time.sleep(1)
    except Exception as exc:
        print_augmented_traceback(*sys.exc_info())
        break

And it seems it's getting bigger and bigger !

2024-01-06_16-33

from apsw.

abdelouahabb avatar abdelouahabb commented on July 19, 2024

You are measuring one thing:

select * from hello order by a limit ?, ?

To do the order by SQLite has to do a sort. The sort time is proportional to the number and size of items being sorted as you have found.

Turn on logging and SQLite will tell you if it is creating an automatic index.

Use EXPLAIN QUERY PLAN to see how SQLite executes your query. There is good documentation on the principles behind query execution.

You can do this to get a shell in the right place that knows how to format the query plan:

apsw.shell.Shell(db=conn).process_sql("explain query plan select * from hello order by a limit ?, ?", (i*500, (i*500+500)))

With your code above I get this query plan:

  id  parent  detail
   9       0  SCAN hello
  24       0  USE TEMP B-TREE FOR ORDER BY

SCAN means it is going through the whole table start to finish. The TEMP B-TREE is SQLite having to sort everything on demand.

I did this:

create index hello_a on hello(a)

Then the query plan becomes:

  id  parent  detail
  10       0  SCAN hello USING INDEX hello_a

It is instant because SQLite just has to seek to the right location in the index to produce the results and doesn't need to do a sort.

You also don't need that separate cursor and it isn't doing anything for you other than providing a potential source of bugs in the future when you use it while using it. All the cur.execute can and should be replaced with conn.execute.

Wow ! this just amazing !
Indexing the ordering reduced not only the execute time, but even the bug disapeared without the patch, the patch will improve even better !
Thank you so much !

from apsw.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.