Comments (13)
You can control the buffer size yourself when working with blobs. Through Blob.setBytes/getBytes you provide your own buffer. The same goes for InputStream/OutputStream: whatever size buffer you pass to read/write gets sent straight to the server. In both cases loread/lowrite are used, which I am pretty sure are the same functions used by psql (lo_import/lo_export is a special case in the libpq client that ends up using loread/lowrite).
Try sending really large buffers, see how that affects performance and report back.
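One easy way to experiment is a generic copy loop with a caller-chosen buffer size; this is my own sketch, not driver code, assuming only the standard JDBC Blob API. When the destination is a Blob's OutputStream, each write() of one buffer becomes one lowrite call:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class BufferedCopy {
    // Copies in chunks of bufSize; with a JDBC Blob's OutputStream each
    // write(buf, 0, n) call turns into one lowrite round trip.
    public static long copy(InputStream in, OutputStream out, int bufSize)
            throws IOException {
        byte[] buf = new byte[bufSize];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[10 * 1024];            // stand-in payload
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // In real use: OutputStream out = blob.setBinaryStream(1);
        long copied = copy(new ByteArrayInputStream(data), sink, 1024 * 1024);
        System.out.println(copied);                   // prints 10240
    }
}
```

Swapping the in-memory streams for a FileInputStream and blob.setBinaryStream(1) lets you vary bufSize and time the upload directly.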
from pgjdbc-ng.
I'm finding that it takes much longer to write data to a Blob than it takes to upload the same data to a web form using POST, and I wonder if there is something obvious I can do to optimize.
I will test different sizes of buffers. Thanks
from pgjdbc-ng.
For reference: copying a 410MB file went from 76,000 ms to 51,000 ms by using a 1024*1024-byte byte[] instead of the default (1024*4). I'm using org.apache.commons.io.IOUtils.copyLarge to copy the stream, so it has nothing to do with the byte-buffer settings in the PGBlob-related classes.
Still, 51 s is quite long for writing 410MB I think, but maybe it's out of the driver writers' hands to fix?
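The round-trip arithmetic behind that difference, as a sketch (class and method names are mine): each buffer written is one lowrite call, so the 4KB default issues 256x more server round trips than a 1MB buffer for the same file.

```java
public class BlobChunks {
    // One lowrite() round trip is issued per buffer written, so a
    // payload of totalBytes costs ceil(totalBytes / bufSize) round trips.
    public static long roundTrips(long totalBytes, int bufSize) {
        return (totalBytes + bufSize - 1) / bufSize;
    }

    public static void main(String[] args) {
        long size = 410L * 1024 * 1024;                      // the 410MB file
        System.out.println(roundTrips(size, 4 * 1024));      // prints 104960
        System.out.println(roundTrips(size, 1024 * 1024));   // prints 410
    }
}
```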
from pgjdbc-ng.
Pardon me butting in. I'm not fully up to speed on the pgjdbc-ng codebase, but it looks to me like each write() on the Blob calls connection.executeForFirstResultValue(), which calls executeForFirstResult(), which calls executeForFirstResultBatch(), which calls BasicContext.queryBatch(), which ultimately prepares the "select lowrite($1,$2)" call, binds the parameters, and then executes.
Does it really need to be prepared on every write? I would imagine this is causing a large amount of overhead that could be avoided if the already-created PrepareCommand object were reused on every write (of the same blob), simply binding new parameters before executing. Maybe cache PrepareCommand objects keyed on the queryTxt in BasicContext.queryResults(). I remember contributing a similar optimization to the jTDS driver for MS SQL about 9 years ago, where they were continually preparing statements to get the metadata for binding. Caching these objects on a short-term basis provided huge speed-ups (like several hundred percent).
from pgjdbc-ng.
@brettwooldridge This is good insight, and these queries can definitely be cached. Although, he tested using a 1MB buffer, which means there are only 410 prepare calls. Preparing a simple query takes milliseconds, so this shouldn't add more than a second or two at most (unless there is something very weird with these queries).
@andreak To save me from debugging a non-issue... can you try it with a 50 or 100MB buffer? Obviously only for testing purposes, but this should narrow down the problem area.
from pgjdbc-ng.
Hm, with a 100MB buffer the time spent goes up to 73 s, and back down to 51 s when setting it to 1MB.
from pgjdbc-ng.
I did some tests on my own...
Using localhost for both client and server, my psql times were very comparable with my driver times. I then ran a "select lo_import('...')" query, which runs the import entirely inside the server (the file must be accessible by the server). I get nearly the exact same times as psql and the driver.
I encourage you to try this yourself and confirm it, but it seems this is an issue with how fast Postgres updates blobs.
Note: I forgot to mention... I was using a 1MB buffer for the driver test. Also, my driver times were a bit slower at first (only by a couple of seconds); I switched it to using prepared queries to get the times to match psql.
from pgjdbc-ng.
As a hack, I put caching into the PrepareCommand class to avoid the overhead of the additional prepare calls, and the difference was basically nil.
(Not that this invalidates the idea of a cache in PrepareCommand in general; some extremely complex SQL can actually take minutes for a database to prepare. But it does rule out the overhead for simple cases like "select lowrite($1,$2)".)
from pgjdbc-ng.
In 6e71e22, I changed LargeObject to use prepared queries. This was facilitated by a core change that allows any code to prepare a "utility" query and use it later.
As @brettwooldridge mentioned, I don't believe this will have any real net effect on loading large objects, but it was low-hanging fruit so I added it.
from pgjdbc-ng.
So... I did more testing...
I am now positive the problem you are experiencing is with Postgres itself. Blobs get split up and stored in separate rows of the pg_largeobject table, using a block size of 2kB (read about it here: http://www.postgresql.org/docs/9.3/static/catalog-pg-largeobject.html). This results in 230,400 rows being inserted into the database for a 450MB blob; this is the issue!
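The row count follows directly from the 2kB block size; a minimal sketch of the arithmetic (names are mine):

```java
public class LoBlocks {
    static final int LOBLKSIZE = 2 * 1024;   // pg_largeobject block size (2kB)

    // Number of pg_largeobject rows needed to store a blob of the given size.
    public static long rowsFor(long blobBytes) {
        return (blobBytes + LOBLKSIZE - 1) / LOBLKSIZE;
    }

    public static void main(String[] args) {
        System.out.println(rowsFor(450L * 1024 * 1024));  // prints 230400
    }
}
```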
I then tried inserting a 450MB file into a single "bytea" field of a row. It took less than half the time: 22 seconds, down from around 47 seconds.
It looks like a daunting task, but you could try to increase the LOBLKSIZE of your Postgres installation. Even then it might not speed things up that much; Postgres's scheme optimizes for random read/write at the expense of initial storage speed.
TOAST fields can hold up to ~1GB per value, I believe, but you have to read/write them as one atomic item. The driver streams outgoing data, so updating from a file is not a memory concern. Reading the data back will be an issue, though: the way the server sends data requires it to be in memory all at once.
from pgjdbc-ng.
Wow, surprising and somewhat disappointing. More disappointing is that LOBLKSIZE can only be changed by recompiling PostgreSQL, which makes deploying a product with a simple 'yum' dependency on it impossible. Equally disappointing is that LOBLKSIZE is not independent but is calculated from BLCKSIZE, which affects all tables in the system. So if you want a target LOBLKSIZE of 16KB, you're going to be looking at a block size of 64KB for all tables.
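That coupling in sketch form, assuming the relationship stated above (in the PostgreSQL source, LOBLKSIZE is defined as BLCKSZ/4):

```java
public class LoBlkSize {
    // LOBLKSIZE = BLCKSZ / 4, so hitting a target large-object block size
    // requires a table block size four times as large.
    public static int requiredBlcksz(int targetLoblksize) {
        return targetLoblksize * 4;
    }

    public static void main(String[] args) {
        // A target LOBLKSIZE of 16KB forces a 64KB block size for all tables.
        System.out.println(requiredBlcksz(16 * 1024) / 1024);  // prints 64
    }
}
```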
from pgjdbc-ng.
I agree somewhat, but really, for Blobs of this size, save a URL and stick the file on the file system, S3, or someplace similar. Better yet... get creative and use an FDW to store it in one of those places.
Either way... I am closing this for now, as I think PGBlob functions as fast as it can at the moment.
from pgjdbc-ng.
Nice analysis, thanks. I need my Blobs to participate in transactions and to stream both in and out (so no bytea), so storing in external files is not an option. Hopefully PG will improve in this area over time.
from pgjdbc-ng.