
Comments (13)

vl85 commented on July 17, 2024

Connections and cursors are closed automatically:

CPython implementation detail: CPython currently uses a reference-counting scheme with (optional) delayed detection of cyclically linked garbage, which collects most objects as soon as they become unreachable, but is not guaranteed to collect garbage containing circular references. See the documentation of the gc module for information on controlling the collection of cyclic garbage. Other implementations act differently and CPython may change. Do not depend on immediate finalization of objects when they become unreachable (so you should always close files explicitly).
See https://docs.python.org/3/reference/datamodel.html#index-2
So RAII is not available as an option.
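To make the quoted caveat concrete, here is a small CPython sketch (names are purely illustrative) showing that objects in a reference cycle are not reclaimed by reference counting alone, which is why relying on `__del__` for cleanup is unsafe in the general case:

```python
import gc

gc.disable()  # make the demonstration deterministic
collected = []

class Node:
    def __del__(self):
        collected.append("node")

def make_cycle():
    a, b = Node(), Node()
    a.other, b.other = b, a  # reference cycle: refcounts never reach zero

make_cycle()
# Reference counting alone cannot reclaim the cycle, so finalization
# has not happened yet, even though both nodes are unreachable.
assert collected == []
gc.collect()  # the cyclic collector finds and finalizes both nodes
assert collected == ["node", "node"]
```

A connection kept alive by such a cycle (for example, via an exception traceback or a cached attribute) would stay open indefinitely.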

I would rather see connections/cursors closed by the context manager, with commits invoked explicitly, than connections/cursors left open and commits made implicitly. The main expectation of a context manager is resource management, and the current design fails to manage the connection/cursor. I've read that one might view it as a transaction context rather than a connection/cursor context, but that doesn't make sense: one can look at a cursor as a transaction, but not at a connection. It also doesn't make sense to commit the transaction on error.
For SELECT queries it doesn't do anything useful.

If there is an issue with backward compatibility, then separate methods could be used for that purpose, such as sconnect and scursor, meaning "safe" or "secure".
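The proposed `sconnect` (the name is this comment's suggestion, not pyodbc API) could be a thin wrapper along these lines; the connection factory is passed in so the sketch runs without pyodbc:

```python
from contextlib import contextmanager

# Hypothetical sketch of the proposed "sconnect": same calling idea as
# connect(), but leaving the context closes the connection.
@contextmanager
def sconnect(connect, *args, **kwargs):
    cnxn = connect(*args, **kwargs)
    try:
        yield cnxn
    finally:
        cnxn.close()
```

With pyodbc it would read `with sconnect(pyodbc.connect, 'mydsn') as cnxn: ...`.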

from pyodbc.

scholer commented on July 17, 2024

I'd also like to voice in on the unexpected behavior of pyodbc's context manager not closing connections.

import pandas as pd, pyodbc
with pyodbc.connect(connection_string) as con:
    test_df = pd.read_sql_query(query, con)

con.closed  # False. pyodbc connection is not closed when leaving the context.

@mkleehammer I understand why you were opposed to adding a context manager in the first place, and I agree that adding this context manager certainly adds confusion.

You mentioned that in your typical work you just let the connection go out of scope, relying on the invocation of the connection's destructor to close it. I want to add that if the connection object is created as part of a long-running job, it will not go out of scope and the connection will remain open. An example would be a connection created within a Jupyter notebook; notebooks are very popular among data scientists, who also use SQL a lot.

Example: A data scientist writing this in his notebook might assume he is safe to use this in his notebook to retrieve data from his SQL database:

import pandas as pd, pyodbc
with pyodbc.connect(connection_string) as con:
    test_df = pd.read_sql_query(query, con)

# (the data scientist continues working with the test_df results and forgets `con`, assuming the context manager will naturally close it, just like context managers typically do). 
# (the notebook keeps running and the connection is kept open until the server closes it. 😯)
# (the database administrators get super angry/annoyed with our data scientist because he makes a lot of long-lived connections and never closes them. 😡)

Honestly, the expected behavior for a context manager is that the context manager will take care of closing connections and handles, leaving the system in a "safe" state when the context is closed. Otherwise, what is the point of a context manager?

@mkleehammer would you ever be open to adding another context manager to pyodbc that actually closes the connection when leaving the context? I know it is fairly trivial for each user to implement, but it adds one more custom function or class that we data-science users have to import (or copy/paste) into our notebooks whenever we want to use an SQL connection. And again, many users of pyodbc might not be aware that context-managed connections are not actually closed when leaving the context.

Example of user-implemented context manager:

from contextlib import contextmanager

@contextmanager
def connect(connection_string):
    con = pyodbc.connect(connection_string)
    try:
        yield con
    finally:
        con.close()

# Example usage:
with connect(connection_string) as con:
    test_df = pd.read_sql_query(query, con)
    
con.closed  # True. Connection is closed when leaving the context.

For the record, SQL-Alchemy does close context-managed connections when leaving the context:

import sqlalchemy, pandas as pd

engine = sqlalchemy.create_engine(connection_string)
with engine.connect() as con:
    test_df = pd.read_sql_query(query, con)

con.closed  # True. SQLAlchemy closes context-managed connections when leaving the context.

However, adding SQLAlchemy to a project's dependencies, just to get a "sane" connection context manager seems a bit bloated.

With all that said, I want to just say a quick thank you for your work on pyodbc. 🙏 I've been using it a lot and it works great! (context-manager quirks aside) 😄


mkleehammer commented on July 17, 2024

I think the behavior needs to stay as it is. As you pointed out, it would be consistent with other database libraries. It would also be consistent with early copies of PEP 343. Note the example does exactly what everyone is doing.

I want to reiterate that I feel they are completely unnecessary. I use pyodbc in multiple 24x7 servers in the financial industry and have never used a context manager with it, or with anything else like files. I gave examples of how you could write code without a context manager which is much clearer. (And that is what I meant about people getting confused - not the action of the context manager itself, but the fact that they are completely unnecessary in CPython today.)

For example, this code from your post would never be written that way:

with open('/tmp/myfile', 'w') as f:
    f.write('hello\n')

You would just do this:

open('/tmp/myfile', 'w').write('hello\n')

You'll see code like this all the time. (I know your example was shortened and there are usually more lines written, but stick with me.) Objects in Python are deleted automatically and they all "do the right thing" when they are deleted. Files close. Sockets close. Database connections roll back and close.

In a longer function with more lines, it would look like this:

def f():
    fd = open('/tmp/myfile', 'w')
    fd.write('1\n')
    fd.write('2\n')

The file will be closed as soon as the function is exited, either normally at the bottom or if an exception occurs.
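This refcounting behavior can be observed directly; here is a small sketch using the standard-library `weakref.finalize` (the timing shown is CPython-specific, as the thread notes):

```python
import os
import tempfile
import weakref

events = []

def write_file(path):
    fd = open(path, "w")
    # finalize() fires when fd is reclaimed; under CPython's reference
    # counting that happens as soon as write_file() returns.
    weakref.finalize(fd, events.append, "reclaimed")
    fd.write("1\n")
    fd.write("2\n")

path = os.path.join(tempfile.mkdtemp(), "myfile")
write_file(path)
assert events == ["reclaimed"]       # fd was reclaimed at function exit
with open(path) as check:
    assert check.read() == "1\n2\n"  # and its buffer was flushed on close
```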

The only time it could be useful is in a loop where you catch exceptions. In that case you can eliminate a finally clause:

def f():
    while x > y:
        try:
            data = wait_for_data()
            with open('filename', 'w') as fd:
                fd.write(data + '\n')
        except Exception:
            logger.error('Error occurred', exc_info=True)

In this case fd does get closed before the next iteration calls wait_for_data.

In my servers connections are primarily used in two patterns:

In "task loops", a connection is created and used by each task in the queue. When an error occurs, the error is logged and the connection is closed as part of the cleanup. A new one is allocated and the process continues. This type of code is too complex for a with since there is a lot of code. There aren't just a couple of lines to indent.

The second type is a function allocates a connection, uses it, then commits, just like the examples from my previous post. These are very clean and easy. Connections and cursors are closed automatically:

def somefunction():
    cursor = pyodbc.connect('mydsn').cursor()
    cursor.execute("delete something")
    cursor.commit()

I recommend giving this a try in a couple of places and see if it works for you. It is one of the things I greatly prefer about Python over Javascript and Java. Both of those languages can use context managers. (C++ and C# support object types on the stack that run destructors, so a simple "wrapper" object would be just as easy.)


keitherskine commented on July 17, 2024

I think I'm finally starting to understand what you mean about not needing context managers. As a database guy dealing with terabytes of data, and single database transactions that involve updating multiple tables with multiple SQL statements, which in turn add gigabytes to the database transaction log, database transactions are kinda important to me. The data integrity of my database depends on it. Yet database transactions are not explicitly part of PEP 249. Sure, they are implicitly there, with pyodbc.connect() and cnxn.commit()/close(), but there is no explicit PEP 249 equivalent of:

BEGIN TRANSACTION

  UPDATE TABLE T1 SET ...
  UPDATE TABLE T2 SET ...
  UPDATE TABLE T3 SET ...

END TRANSACTION

My assumption was that context managers would help manage this kind of behavior. Apparently this isn't the case.

I was also concerned that cursors would maintain locks on tables or rows unless they were explicitly closed, and I hadn't fully grasped that cursors would definitely be closed (and the locks released) when they went out of scope. Again, locking of table rows is a big deal for me.

Thank you for taking the time to give comprehensive answers to my questions, Michael. It's definitely appreciated.


snargleplax commented on July 17, 2024

I'll add my voice here, after dealing with the same issue this week. I was surprised and misled by this behavior, as were multiple other veteran engineers on my team. The revelation by @vl85 about immediate GC finalization not being reliable appears to unrecoverably dismantle the justification provided up to that point for rejecting this suggestion.

I expect context managers to work like RAII. The hazard of reasonable assumptions creating a production incident due to misleading behavior is a far greater concern than whatever benefit is perceived to accrue from the "convenience" of using it to autocommit without autocommit. Too much sugar rots your teeth. I find design precedent from other Python ODBC managers unpersuasive, and would apply the same objection there.


mkleehammer commented on July 17, 2024

I was actually opposed to adding the context manager at all because it just confuses things. If I understand correctly, the functionality you are looking for can be accomplished without it, like this:

def somefunction():
    cnxn = pyodbc.connect('mydsn')
    cursor = cnxn.cursor()
    cursor.execute("delete something")
    cnxn.commit()

If everything is successful, the code reaches the bottom and commits. If an exception occurs, both cursor and cnxn go out of scope and are immediately deleted. Each will close when they are deleted and closing the connection will roll back the transaction.

CPython uses a reference-counting scheme plus a cycle-detecting garbage collector, so items that go out of scope and aren't referenced elsewhere are immediately deleted, unlike garbage collection in Java, JavaScript, or Lisp. There is no way this is going to change any time soon, and an enormous amount of code takes advantage of it. (It is also what keeps Python programs from using so much memory, though it is slower than tracing garbage collection.) I was careful to design the objects so cursors refer to connections but not vice versa, so there are no cycles.

With autocommit turned on, each statement is committed as it occurs, so there is no reason to use a context manager.

Fun fact: the cursor has a reference to the connection and a commit method specifically so you can use a cursor without needing to manage the connection object also:

def somefunction():
    cursor = pyodbc.connect('mydsn').cursor()
    cursor.execute("delete something")
    cursor.commit()


keitherskine commented on July 17, 2024

I certainly agree that context managers can cause confusion because the functionality is deliberately hidden away behind the syntactic sugar. Hence, the onus should be on making the context manager functionality as simple and intuitive as possible. Right now that doesn't appear to be the case, especially to me. From what I can make out, it seems the current functionality is simply a way of getting around setting the autocommit flag to True. In effect, use a context manager and all your SQL statements are going to be committed, regardless of whether you set the autocommit flag or not. Although this is convenient for programmers in many cases, it is not at all obvious.

As I mentioned in my original comment, who would intuitively think that:

with pyodbc.connect('mydsn') as cnxn:
    do_stuff

actually means:

cnxn = pyodbc.connect('mydsn')
do_stuff
if not cnxn.autocommit:
    cnxn.commit()

Not me. Even when it's spelled out, it seems odd, and very different from the way context managers work with files, etc.

At the very least, it might be worth putting this in the docs.


ramprax commented on July 17, 2024

Hi,

I know this is an old thread. Just wanted to say that the detailed answer was very helpful for me.
Thanks a lot.

I was worried that the connection may get closed on calling __exit__(). This would create problems while implementing connection pooling.

For implementing my connection pool, I wrote a class PooledConnection as a wrapper around pyodbc connection. The PooledConnection's __enter__() & __exit__() methods call the __enter__() & __exit__() methods of the underlying pyodbc connection.

Now, I do want/need all the things that pyodbc connection's __exit__() method does, but definitely do not want it to close the connection.
In PooledConnection.__exit__(), after calling the pyodbc connection's __exit__(), I just mark the connection as free and "return" it to the pool.

So, basically, I count on __exit__() not closing the connection.
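A minimal sketch of the wrapper described above (the class shape, the pool object, and its `release()` method are this sketch's assumptions; only the delegation to the underlying connection's `__enter__`/`__exit__` comes from the comment):

```python
# Hypothetical PooledConnection sketch -- not pyodbc API.
class PooledConnection:
    def __init__(self, pool, raw_connection):
        self._pool = pool
        self._raw = raw_connection

    def __enter__(self):
        self._raw.__enter__()  # delegate to the underlying connection
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        # pyodbc's __exit__ commits on success (or rolls back on error)
        # but does not close -- exactly what this pooling scheme relies on.
        result = self._raw.__exit__(exc_type, exc_val, exc_tb)
        self._pool.release(self._raw)  # mark the connection as free again
        return result

    def __getattr__(self, name):
        # forward cursor(), commit(), etc. to the real connection
        return getattr(self._raw, name)
```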

Thanks,
Ram


jxu commented on July 17, 2024

Example: A data scientist writing this in his notebook might assume he is safe to use this in his notebook to retrieve data from his SQL database:

import pandas as pd, pyodbc
with pyodbc.connect(connection_string) as con:
    test_df = pd.read_sql_query(query, con)

# (the data scientist continues working with the test_df results and forgets `con`, assuming the context manager will naturally close it, just like context managers typically do). 
# (the notebook keeps running and the connection is kept open until the server closes it. 😯)
# (the database administrators get super angry/annoyed with our data scientist because he makes a lot of long-lived connections and never closes them. 😡)

I ran into almost this exact issue, with my explicit con.close() not being called properly in a notebook cell and the angry DB admins.


mkleehammer commented on July 17, 2024

I am open to both adding a new context manager or even changing the behavior of the old one if possible without breaking everything.

I will say, this is the first time I've seen a very good example of when one could be useful - Jupyter Notebooks. In the past, the only way one could be open forever is if it is a global variable or you have a function you literally never leave. In those cases, I don't think it should surprise someone if it is still connected. In a sense, Jupyter notebooks are nothing but a script where everything is a global variable. Therefore decisions that make sense in a larger program might not make sense.

That said, I would not characterize GC finalization as being unreliable. I've purposely ensured the internals don't create cycles the GC system can see for this reason. Unless you create a chain by adding attributes to a connection that somehow point back to the same connection, it should be safe if you are sticking with CPython.

Now, what options are there?

  • Break compatibility in a later version, but provide a flag when connecting to restore the old (current) behavior?
  • Create two new functions safe_connect and safe_cursor or sconnect and scursor? These would have to set a flag on the objects to know how to behave, so we would also have a flag you could set, I guess.
  • Provide a separate object in the pyodbc namespace that is a context manager? I don't like this as it doesn't match expectations.

Anything else? Any votes?
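The first two options could share one mechanism. A toy model of the flag idea (entirely hypothetical, not pyodbc code) showing how the current commit-on-success behavior and a new close-on-exit behavior could coexist:

```python
# Toy model only -- none of this is pyodbc API.
class Connection:
    def __init__(self, close_on_exit=False):
        self.close_on_exit = close_on_exit
        self.committed = False
        self.closed = False

    def commit(self):
        self.committed = True

    def close(self):
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        if exc_type is None:
            self.commit()   # today's behavior: commit on clean exit
        if self.close_on_exit:
            self.close()    # opt-in new behavior
        return False        # never swallow exceptions
```

A hypothetical `safe_connect`/`sconnect` would then just be `connect` with the flag preset.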


keitherskine commented on July 17, 2024

The basic question seems to be whether the context manager on the Connection class should either manage the connection (i.e. close it on exit) or manage a database transaction (i.e. commit on exit). Some people think the former, some people think the latter. Full disclosure, I'm very much in the former camp, which I think is more Pythonic. The current implementation of the context manager does the latter.

Ref:
https://realpython.com/python-with-statement/#managing-resources-in-python
https://book.pythontips.com/en/latest/context_managers.html#context-managers
https://docs.python.org/3/library/contextlib.html#contextlib.contextmanager

I agree it's a bit of a pain to add a custom context manager just to manage a pyodbc connection, but I wanted to remind folks that there is a built-in utility, contextlib.closing, that does the job of closing the connection on exit. For example:

from contextlib import closing
import pyodbc
with closing(pyodbc.connect('mydsn', autocommit=False)) as cnxn:
    crsr = cnxn.cursor()
    crsr.execute("UPDATE T1 SET ...")
    cnxn.commit()
cnxn.closed  # True


jxu commented on July 17, 2024

The basic question seems to be whether the context manager on the Connection class should either manage the connection (i.e. close it on exit) or manage a database transaction (i.e. commit on exit).

Well, it is called pyodbc.connect and not, say, pyodbc.transact.


keitherskine commented on July 17, 2024

Just a thought, but one possibility is to add a context manager to pyodbc specifically to encapsulate a database transaction. Something equivalent to:

from contextlib import contextmanager

@contextmanager
def transaction(*args, **kwargs) -> Cursor:
    if kwargs.get('autocommit', False) is not False:
        raise ValueError('Cannot set autocommit in the transaction context manager')

    cnxn = connect(*args, **kwargs)
    crsr = cnxn.cursor()
    try:
        yield crsr
        cnxn.commit()
    finally:
        crsr.close()
        cnxn.close()  # includes a rollback under the hood

The parameters for this context manager would be the same as for the connect() function. It would be used as follows:

with pyodbc.transaction('mydsn') as crsr:
    crsr.execute("UPDATE T1 SET ...")
    crsr.execute("UPDATE T2 SET ...")

This would at least make it clear the context is a database transaction.

