Git Product home page Git Product logo

Comments (12)

sitingren avatar sitingren commented on July 27, 2024

You can delete rows one by one by one with Cursor.executemany API and prepared statement:

# delete uid in (2,4,6)
cursor.executemany("DELETE FROM MyTable WHERE uid=?", [[2],[4],[6]], use_prepared_statements=True)
while True:  # Fetch the result set for each DELETE statement
    rows = cursor.fetchall()
    print("DELETED:", rows[0][0])
    if not cursor.nextset():
        break
# DELETED: 1
# DELETED: 1
# DELETED: 1

or you can delete rows batch by batch with Cursor.execute API and without prepared statement:

# delete uid in (2,4,6)
cursor.execute("DELETE FROM MyTable WHERE uid IN %s", [(2,4,6)], use_prepared_statements=False)
# query text to execute: DELETE FROM MyTable WHERE uid IN (2,4,6)
print("DELETED:", cur.fetchone()[0])  # DELETED: 3

from vertica-python.

nicolaerosia avatar nicolaerosia commented on July 27, 2024

@sitingren thank you!

the first option is not really an option as it will create lots of modified files.

The second option will interpolate the string locally and send it which seems a not very efficient size wise, an int can be encoded as 4 bytes, while the string takes more time, and probably more time to parse it on Vertica side.

This is the only option?

from vertica-python.

sitingren avatar sitingren commented on July 27, 2024

the first option is not really an option as it will create lots of modified files.

What does "create lots of modified files" mean? I'm not sure your context of the big integer list (~200k).

If you don't want to interpolate the string locally and send it to the server as a whole (i.e. internally send a single query message), you can use prepared statements (split into multiple messages to send). With prepared statements, you cannot get rid of splitting your list elements into batches.

cursor.executemany("DELETE FROM MyTable WHERE uid=?", [[2],[4],[6]], use_prepared_statements=True)
# Internally send "DELETE FROM MyTable WHERE uid=?" in a message
# Internally send [2] in a message
# Internally send [4] in a message
# Internally send [6] in a message

Another prepared statements option. The parameter is a string formatted as an array literal:

cursor.execute("DELETE FROM MyTable WHERE uid = ANY(?)", ["[2,4,6]"], use_prepared_statements=True)
# Internally send "DELETE FROM MyTable WHERE uid = ANY(?)" in one message
# Internally send "[2,4,6]" in another message
print("DELETED:", cur.fetchone()[0])  
# DELETED: 3

cursor.executemany("DELETE FROM MyTable WHERE uid = ANY(?)", [["[7,8,9]"],["[1,3]"]], use_prepared_statements=True)
# Internally send "DELETE FROM MyTable WHERE uid = ANY(?)" in a message
# Internally send "[7,8,9]" in a message
# Internally send "[1,3]" in a message

Those are all options I can think of.

from vertica-python.

nicolaerosia avatar nicolaerosia commented on July 27, 2024

I'm thinking about the ROS containers that get created.
If I have 200k uid rows that I want to delete, it feels inefficient to serialize/send 200k DELETE commands.

I'm going to benchmark the option with ANY, I was not aware of it.

Thank you!

from vertica-python.

nicolaerosia avatar nicolaerosia commented on July 27, 2024

@sitingren is there a way to avoid the string conversion? I have the uids as int in a Python list

from vertica-python.

nicolaerosia avatar nicolaerosia commented on July 27, 2024

Later edit: added 3 variants I tried

@sitingren I think there is a bug handling big strings binding, can you please try below sample:

    uids = range(200000)
    uids_str = f'{",".join(str(x) for x in uids)}'

    cur.execute("CREATE TABLE IF NOT EXISTS MYTABLE(uid int NOT NULL)")

    cur.executemany(
        "DELETE FROM MYTABLE WHERE uid IN (?)",
        [[uids_str]],
        use_prepared_statements=True,
    )

    # suggested by sitingren - I think the brackets should not be here ( [ ] )
    # does not work - if you check SELECT * FROM V_MONITOR.QUERY_REQUESTS, you won't see the query!
    uids_str = f'[{",".join(str(x) for x in uids)}]'
    cur.executemany(
        "DELETE FROM MYTABLE WHERE uid = ANY(?)",
        [[uids_str]],
        use_prepared_statements=True,
    )

    # suggested by sitingren - removed the [ ]
    # does not work - I cannot see the query...
    uids_str = f'{",".join(str(x) for x in uids)}'
    cur.executemany(
        "DELETE FROM MYTABLE WHERE uid = ANY(?)",
        [[uids_str]],
        use_prepared_statements=True,
    )

    cur.execute("COMMIT")
    cur.fetchall()
    dt.stop()

I get the following error which do not include all integers in the list

Traceback (most recent call last):
  File "<string>", line 5, in <module>
  File "/lib64/python3.11/site-packages/vertica_python/vertica/cursor.py", line 271, in executemany
    self._execute_prepared_statement(seq_of_parameters)
  File "/lib64/python3.11/site-packages/vertica_python/vertica/cursor.py", line 911, in _execute_prepared_statement
    self.connection.read_expected_message(messages.BindComplete)
  File "/lib64/python3.11/site-packages/vertica_python/vertica/connection.py", line 713, in read_expected_message
    raise errors.DatabaseError(message.error_message())
vertica_python.errors.DatabaseError: Severity: ERROR, Message: Invalid input syntax for integer: "0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191,192,193,194,195,196,197,198,199,200,201,202,203,204,205,206,207,208,209,210,211,212,213,214,215,216,217,218,219,220,221,222,223,224,225,226,227,228,229,230,231,232,233,234,235,236,237,238,239,240,241,242,243,244,245,246,247,248,249,250,251,252,253,254,255,256,257,258,259,260,261,262,263,264,265,266,267,268,269,270,271,272,273,274,275,276,277,278,279,280,281,282,283,284,285,286,287,288,289,290,291,292,293,294,295,296,297,298,299,300,301,302,303,304,305,306,307,308,309,310,311,312,313,314,315,316,317,318,319,320,321,322,323,324,325,326,327,328,329,330,331,332,333,334,335,336,337,338,339,340,341,342,343,344,345,346,347,348,349,350,351,352,353,354,355,356,357,358,359,360,361,362,363,364,365,366,367,368,369,370,371,372,373,374,375,376,377,378,379,380,381,382,383,384,385,386,387,388,389,390,391,392,393,394,395,396,397,398,399,400,401,402,403,404,405,406,407,408,409,410,411,412,413,414,415,416,417,418,419,420,421,422,423,424,425,426,427,428,429,430,431,432,433,434,435,436,437,438,439,440,441,442,443,444,445,446,447,448,449,450,451,452,453,454,455,456,457,458,459,460,461,462,463,464,465,466,467,468,469,470,471,472,473,474,475,476,477,478,479,480,481,482,483,484,485,486,487,488,489,490,491,492,493,494,495,496,497,498,499,500,501,502,503,504,505,506,507,508,509,510,511,512,513,514,515,516,517,518,519,520,521,522,523,524,525,526,527,528,529,53, Sqlstate: 22V02, Routine: scanint8, File: int8.c, Line: 124, Error Code: 3681

from vertica-python.

nicolaerosia avatar nicolaerosia commented on July 27, 2024

@sitingren I have even a simpler example

with 100k it works, with 200k it does not

    conn = vertica_python.connect(**conn_info)
    cur = conn.cursor()
    uids = list(range(100000))
    uids_str = f'{",".join(str(x) for x in uids)}'
    cur.execute(f"DELETE FROM MYTABLE WHERE uid IN ({uids_str})")
    cur.execute("COMMIT")

200k does not work - I don't see the query in Vertica and I don't see the API throwing an error

    conn = vertica_python.connect(**conn_info)
    cur = conn.cursor()
    uids = list(range(200000))
    uids_str = f'{",".join(str(x) for x in uids)}'
    cur.execute(f"DELETE FROM MYTABLE WHERE uid IN ({uids_str})")
    cur.execute("COMMIT")

from vertica-python.

alippai avatar alippai commented on July 27, 2024

You can insert into a temporary table using a single prepared statement and then do WHERE uid in (select uid from temp_table)

from vertica-python.

nicolaerosia avatar nicolaerosia commented on July 27, 2024

@sitingren are you aware of any limitation w.r.t. to query size?

from vertica-python.

sitingren avatar sitingren commented on July 27, 2024

For the first sample

The error you see is from execution of "DELETE FROM MYTABLE WHERE uid IN (?)".
For execution of "DELETE FROM MYTABLE WHERE uid = ANY(?)", the error is

vertica_python.errors.DatabaseError: Severity: ERROR, 
Message: string_to_array parse error: Output array is not big enough: 
Type limit is 65000 bytes, but value needs 1600000 bytes, 
Sqlstate: 22023, Routine: StringToArray, 
File: /data/jenkins/workspace/RE-ReleaseBuilds/RE-Knuckleboom_2/server/vertica/EE/Functions/Arrays.cpp, 
Line: 41, Error Code: 9329

ANY will consider the list as an array and it's limited by DefaultArrayBinarySize

The maximum binary size, in bytes, for an unbounded collection, if a maximum size is not specified at creation time.
Default: 65000

For the second sample

For f"DELETE FROM MYTABLE WHERE uid IN ({uids_str})" in your second sample, you should call Cursor.fetchall() to reveal the error. (See the second topic in Q&A)

From Vertica, very large queries can fail if MaxParsedQuerySizeMB is reached. Check your setting:

select * from configuration_parameters where parameter_name='MaxParsedQuerySizeMB';

temporary table

The best solution might be to create a temp table with all the values in their in list (you can COPY or INSERT), then do DELETE FROM MYTABLE WHERE uid in (select uid from temp_table).

from vertica-python.

nicolaerosia avatar nicolaerosia commented on July 27, 2024

@sitingren thank you very much, I was not aware of these all parameters! I'll check them out!

I think it makes sense to avoid the creation of the temporary table, but we need to benchmark it

from vertica-python.

sitingren avatar sitingren commented on July 27, 2024

Reopen if you have more questions.

from vertica-python.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.