Comments (12)
You can delete rows one by one by one with Cursor.executemany API and prepared statement:
# delete uid in (2,4,6)
cursor.executemany("DELETE FROM MyTable WHERE uid=?", [[2],[4],[6]], use_prepared_statements=True)
while True: # Fetch the result set for each DELETE statement
rows = cursor.fetchall()
print("DELETED:", rows[0][0])
if not cursor.nextset():
break
# DELETED: 1
# DELETED: 1
# DELETED: 1
or you can delete rows batch by batch with Cursor.execute API and without prepared statement:
# delete uid in (2,4,6)
cursor.execute("DELETE FROM MyTable WHERE uid IN %s", [(2,4,6)], use_prepared_statements=False)
# query text to execute: DELETE FROM MyTable WHERE uid IN (2,4,6)
print("DELETED:", cur.fetchone()[0]) # DELETED: 3
from vertica-python.
@sitingren thank you!
the first option is not really an option as it will create lots of modified files.
The second option will interpolate the string locally and send it which seems a not very efficient size wise, an int can be encoded as 4 bytes, while the string takes more time, and probably more time to parse it on Vertica side.
This is the only option?
from vertica-python.
the first option is not really an option as it will create lots of modified files.
What does "create lots of modified files" mean? I'm not sure your context of the big integer list (~200k).
If you don't want to interpolate the string locally and send it to the server as a whole (i.e. internally send a single query message), you can use prepared statements (split into multiple messages to send). With prepared statements, you cannot get rid of splitting your list elements into batches.
cursor.executemany("DELETE FROM MyTable WHERE uid=?", [[2],[4],[6]], use_prepared_statements=True)
# Internally send "DELETE FROM MyTable WHERE uid=?" in a message
# Internally send [2] in a message
# Internally send [4] in a message
# Internally send [6] in a message
Another prepared statements option. The parameter is a string formatted as an array literal:
cursor.execute("DELETE FROM MyTable WHERE uid = ANY(?)", ["[2,4,6]"], use_prepared_statements=True)
# Internally send "DELETE FROM MyTable WHERE uid = ANY(?)" in one message
# Internally send "[2,4,6]" in another message
print("DELETED:", cur.fetchone()[0])
# DELETED: 3
cursor.executemany("DELETE FROM MyTable WHERE uid = ANY(?)", [["[7,8,9]"],["[1,3]"]], use_prepared_statements=True)
# Internally send "DELETE FROM MyTable WHERE uid = ANY(?)" in a message
# Internally send "[7,8,9]" in a message
# Internally send "[1,3]" in a message
Those are all options I can think of.
from vertica-python.
I'm thinking about the ROS containers that get created.
If I have 200k uid rows that I want to delete, it feels inefficient to serialize/send 200k DELETE commands.
I'm going to benchmark the option with ANY, I was not aware of it.
Thank you!
from vertica-python.
@sitingren is there a way to avoid the string conversion? I have the uids as int in a Python list
from vertica-python.
Later edit: added 3 variants I tried
@sitingren I think there is a bug handling big strings binding, can you please try below sample:
uids = range(200000)
uids_str = f'{",".join(str(x) for x in uids)}'
cur.execute("CREATE TABLE IF NOT EXISTS MYTABLE(uid int NOT NULL)")
cur.executemany(
"DELETE FROM MYTABLE WHERE uid IN (?)",
[[uids_str]],
use_prepared_statements=True,
)
# suggested by sitingren - I think the brackets should not be here ( [ ] )
# does not work - if you check SELECT * FROM V_MONITOR.QUERY_REQUESTS, you won't see the query!
uids_str = f'[{",".join(str(x) for x in uids)}]'
cur.executemany(
"DELETE FROM MYTABLE WHERE uid = ANY(?)",
[[uids_str]],
use_prepared_statements=True,
)
# suggested by sitingren - removed the [ ]
# does not work - I cannot see the query...
uids_str = f'{",".join(str(x) for x in uids)}'
cur.executemany(
"DELETE FROM MYTABLE WHERE uid = ANY(?)",
[[uids_str]],
use_prepared_statements=True,
)
cur.execute("COMMIT")
cur.fetchall()
dt.stop()
I get the following error which do not include all integers in the list
Traceback (most recent call last):
File "<string>", line 5, in <module>
File "/lib64/python3.11/site-packages/vertica_python/vertica/cursor.py", line 271, in executemany
self._execute_prepared_statement(seq_of_parameters)
File "/lib64/python3.11/site-packages/vertica_python/vertica/cursor.py", line 911, in _execute_prepared_statement
self.connection.read_expected_message(messages.BindComplete)
File "/lib64/python3.11/site-packages/vertica_python/vertica/connection.py", line 713, in read_expected_message
raise errors.DatabaseError(message.error_message())
vertica_python.errors.DatabaseError: Severity: ERROR, Message: Invalid input syntax for integer: "0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191,192,193,194,195,196,197,198,199,200,201,202,203,204,205,206,207,208,209,210,211,212,213,214,215,216,217,218,219,220,221,222,223,224,225,226,227,228,229,230,231,232,233,234,235,236,237,238,239,240,241,242,243,244,245,246,247,248,249,250,251,252,253,254,255,256,257,258,259,260,261,262,263,264,265,266,267,268,269,270,271,272,273,274,275,276,277,278,279,280,281,282,283,284,285,286,287,288,289,290,291,292,293,294,295,296,297,298,299,300,301,302,303,304,305,306,307,308,309,310,311,312,313,314,315,316,317,318,319,320,321,322,323,324,325,326,327,328,329,330,331,332,333,334,335,336,337,338,339,340,341,342,343,344,345,346,347,348,349,350,351,352,353,354,355,356,357,358,359,360,361,362,363,364,365,366,367,368,369,370,371,372,373,374,375,376,377,378,379,380,381,382,383,384,385,386,387,388,389,390,391,392,393,394,395,396,397,398,399,400,401,402,403,404,405,406,407,408,409,410,411,412,413,414,415,416,417,418,419,420,421,422,423,424,425,426,427,428,429,430,431,432,433,434,435,436,437,438,439,440,441,442,443,444,445,446,447,448,449,450,451,452,453,454,455,456,457,458,459,460,461,462,463,464,465,466,467,468,469,470,471,472,473,474,475,476,477,478,479,480,481,482,483,484,485,486,487,488,489,490,491,492,493,494,495,496,497,498,499,500,501,502,503,504,505,506,507,508,509,510,511,512,513,514,515,516,517,518,519,520,521,522,523,524,525,526,527,528,529,53, Sqlstate: 22V02, Routine: scanint8, File: int8.c, Line: 124, Error Code: 3681
from vertica-python.
@sitingren I have even a simpler example
with 100k it works, with 200k it does not
conn = vertica_python.connect(**conn_info)
cur = conn.cursor()
uids = list(range(100000))
uids_str = f'{",".join(str(x) for x in uids)}'
cur.execute(f"DELETE FROM MYTABLE WHERE uid IN ({uids_str})")
cur.execute("COMMIT")
200k does not work - I don't see the query in Vertica and I don't see the API throwing an error
conn = vertica_python.connect(**conn_info)
cur = conn.cursor()
uids = list(range(200000))
uids_str = f'{",".join(str(x) for x in uids)}'
cur.execute(f"DELETE FROM MYTABLE WHERE uid IN ({uids_str})")
cur.execute("COMMIT")
from vertica-python.
You can insert into a temporary table using a single prepared statement and then do WHERE uid in (select uid from temp_table)
from vertica-python.
@sitingren are you aware of any limitation w.r.t. to query size?
from vertica-python.
For the first sample
The error you see is from execution of "DELETE FROM MYTABLE WHERE uid IN (?)".
For execution of "DELETE FROM MYTABLE WHERE uid = ANY(?)", the error is
vertica_python.errors.DatabaseError: Severity: ERROR,
Message: string_to_array parse error: Output array is not big enough:
Type limit is 65000 bytes, but value needs 1600000 bytes,
Sqlstate: 22023, Routine: StringToArray,
File: /data/jenkins/workspace/RE-ReleaseBuilds/RE-Knuckleboom_2/server/vertica/EE/Functions/Arrays.cpp,
Line: 41, Error Code: 9329
ANY will consider the list as an array and it's limited by DefaultArrayBinarySize
The maximum binary size, in bytes, for an unbounded collection, if a maximum size is not specified at creation time.
Default: 65000
For the second sample
For f"DELETE FROM MYTABLE WHERE uid IN ({uids_str})" in your second sample, you should call Cursor.fetchall()
to reveal the error. (See the second topic in Q&A)
From Vertica, very large queries can fail if MaxParsedQuerySizeMB is reached. Check your setting:
select * from configuration_parameters where parameter_name='MaxParsedQuerySizeMB';
temporary table
The best solution might be to create a temp table with all the values in their in list (you can COPY or INSERT), then do DELETE FROM MYTABLE WHERE uid in (select uid from temp_table)
.
from vertica-python.
@sitingren thank you very much, I was not aware of these all parameters! I'll check them out!
I think it makes sense to avoid the creation of the temporary table, but we need to benchmark it
from vertica-python.
Reopen if you have more questions.
from vertica-python.
Related Issues (20)
- Connection Loggers Leaking Memory
- Inserting complex data types HOT 3
- 'utf-8' codec can't decode byte HOT 10
- Support client_os_hostname
- Errors in Invalid SQL should be warnings HOT 4
- Unable to connect to vertica in fips mode HOT 3
- SSL dh key too small error HOT 18
- doesn't support behind proxy HOT 4
- Client removes columns from output if they are not aliased HOT 4
- database doesn't return protocol version
- BUG 1.3.3 version - copy from local broken HOT 5
- calling cursor.nextset() after cursor.fetchone() can cause query hang HOT 1
- Python version support HOT 6
- Unexpected message: RowDescription HOT 7
- Make default connection label meaningful HOT 3
- Connector crashes when querying a json map or json array values HOT 1
- Query returns something other than what query asked for - possible error message not parsed? HOT 1
- Return transaction_id/statement_id result from PROFILE HOT 3
- String values are concatenated somewhere HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from vertica-python.