pycassa's Introduction

pycassa

pycassa is a Thrift-based Python client library for Apache Cassandra.

pycassa does not support CQL or Cassandra's native protocol, which replace the Thrift interface that pycassa is based on. If you are starting a new project, it is highly recommended that you use the newer DataStax Python driver instead of pycassa.

pycassa is open source under the MIT license.

Documentation

Documentation can be found here:

http://pycassa.github.com/pycassa/

It includes installation instructions, a tutorial, API documentation, and a change log.

Getting Help

IRC:

Mailing List:

Installation

If pip is available, you can install the latest pycassa release with:

pip install pycassa

If you want to install from a source checkout, make sure you have Thrift installed, and run setup.py as a superuser:

pip install thrift
python setup.py install

Basic Usage

To get a connection pool, pass a Keyspace and an optional list of servers:

>>> import pycassa
>>> pool = pycassa.ConnectionPool('Keyspace1') # Defaults to connecting to the server at 'localhost:9160'
>>>
>>> # or, we can specify our servers:
>>> pool = pycassa.ConnectionPool('Keyspace1', server_list=['192.168.2.10'])

To use the standard interface, create a ColumnFamily instance.

>>> pool = pycassa.ConnectionPool('Keyspace1')
>>> cf = pycassa.ColumnFamily(pool, 'Standard1')
>>> cf.insert('foo', {'column1': 'val1'})
>>> cf.get('foo')
{'column1': 'val1'}

insert() will also update existing columns:

>>> cf.insert('foo', {'column1': 'val2'})
>>> cf.get('foo')
{'column1': 'val2'}

You may insert multiple columns at once:

>>> cf.insert('bar', {'column1': 'val3', 'column2': 'val4'})
>>> cf.multiget(['foo', 'bar'])
{'foo': {'column1': 'val2'}, 'bar': {'column1': 'val3', 'column2': 'val4'}}
>>> cf.get_count('bar')
2

get_range() returns an iterable. You can use list() to convert it to a list:

>>> list(cf.get_range())
[('bar', {'column1': 'val3', 'column2': 'val4'}), ('foo', {'column1': 'val2'})]
>>> list(cf.get_range(row_count=1))
[('bar', {'column1': 'val3', 'column2': 'val4'})]

You can remove entire keys or just a certain column:

>>> cf.remove('bar', columns=['column1'])
>>> cf.get('bar')
{'column2': 'val4'}
>>> cf.remove('bar')
>>> cf.get('bar')
Traceback (most recent call last):
...
pycassa.NotFoundException: NotFoundException()

See the tutorial for more details.

pycassa's Issues

Few questions

  1. Is there any ORM using pycassa? I would like to use cassandra with django and pycassa seems to give enough low-level interface.
  2. New feature of Cassandra 0.7 is secondary indexes created by Cassandra itself (without having to support them manually). Does pycassa support it?
    Thanks a lot for answers.

import pycassa

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "pycassa/__init__.py", line 4, in <module>
from pycassa.columnfamily import *
File "pycassa/columnfamily.py", line 3, in <module>
pkg_resources.require('Thrift')
File "/System/Library/Frameworks/Python.framework/Versions/2.6/Extras/lib/python/pkg_resources.py", line 620, in require
needed = self.resolve(parse_requirements(requirements))
File "/System/Library/Frameworks/Python.framework/Versions/2.6/Extras/lib/python/pkg_resources.py", line 518, in resolve
raise DistributionNotFound(req) # XXX put more info here
pkg_resources.DistributionNotFound: Thrift

create_index doesn't seem to work

Hello,

in pycassa 1.0.3, I'm using the create_index method of system_manager. It doesn't seem to have any effect, even though other methods of the same class do. I check with cassandra-cli, or with describe_column_family, and the index is not created. However, I can create it in cassandra-cli, and it is duly displayed in both describe and cli. TIA.

ColumnFamily.get() NotFoundException is ambiguous

A NotFoundException is raised when:

  1. A key is requested that does not exist
  2. The column slice requested is empty
  3. column_count is 0
  4. All of the columns passed in the 'columns' argument do not exist.

This makes the exception cause ambiguous. get() should only raise a NotFoundException in cases where Cassandra itself raises a NotFoundException. Namely, cases 1 and 4.

Strange error when creating a pool

I spent time looking into that but I just can't get through, please help:

Python 2.6.6 (r266:84292, Apr 29 2011, 11:49:08)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-44)] on linux2
Type "help", "copyright", "credits" or "license" for more information.

>>> import pycassa
>>> import thrift
>>> hosts = ['cassandra01:9160','cassandra02:9160','cassandra03:9160']
>>> pool = pycassa.ConnectionPool('PANDA', hosts, max_retries=10, timeout=10.0, pool_timeout=10)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/test/lib/python2.6/site-packages/pycassa-1.0.8-py2.6.egg/pycassa/pool.py", line 622, in __init__
    self._q.put(self._create_connection(), False)
  File "/home/test/lib/python2.6/site-packages/pycassa-1.0.8-py2.6.egg/pycassa/pool.py", line 118, in _create_connection
    wrapper = self._get_new_wrapper(server)
  File "/home/test/lib/python2.6/site-packages/pycassa-1.0.8-py2.6.egg/pycassa/pool.py", line 652, in _get_new_wrapper
    credentials=self.credentials)
  File "/home/test/lib/python2.6/site-packages/pycassa-1.0.8-py2.6.egg/pycassa/pool.py", line 313, in __init__
    super(ConnectionWrapper, self).__init__(*args, **kwargs)
  File "/home/test/lib/python2.6/site-packages/pycassa-1.0.8-py2.6.egg/pycassa/connection.py", line 50, in __init__
    self.set_keyspace(keyspace)
  File "/home/test/lib/python2.6/site-packages/pycassa-1.0.8-py2.6.egg/pycassa/connection.py", line 58, in set_keyspace
    if not self.keyspace or keyspace != self.keyspace:
AttributeError: 'ConnectionWrapper' object has no attribute 'keyspace'

TypeError: get() takes exactly 4 arguments (1 given)

This error crops up using both the git pull, and the v1.0.0 release of pycassa with cassandra 0.7 rc1

here is a traceback:

Traceback (most recent call last):
  File "/home/tomfarvour/NetBeansProjects/PythonCassandraTest/src/pythoncassandratest.py", line 137, in <module>
    main()
  File "/home/tomfarvour/NetBeansProjects/PythonCassandraTest/src/pythoncassandratest.py", line 128, in main
    col_fam = create_column_family(pycassa, connection, ksName, cfName)
  File "/home/tomfarvour/NetBeansProjects/PythonCassandraTest/src/pythoncassandratest.py", line 47, in create_column_family
    return pyc_o.ColumnFamily(pyc_conn, cfName)
  File "/usr/home/tomfarvour/NetBeansProjects/PythonCassandraTest/src/pycassa/columnfamily.py", line 115, in __init__
    self.client = self.pool.get()
  File "/usr/home/tomfarvour/NetBeansProjects/PythonCassandraTest/src/pycassa/pool.py", line 402, in new_f
    result = getattr(super(ConnectionWrapper, self), f.__name__)(*args, **kwargs)
TypeError: get() takes exactly 4 arguments (1 given)

Thrift errors leak

TypeError: expected string or Unicode object, int found

no apparent way to figure out which column is causing the problem

alter_column for super column families broken?

It looks like alter_column for super column families is broken in 1.0.6 (but not 1.0.4). I create a super column family with a time UUID comparator and a UTF8 subcomparator. When I call alter_column('keyspace', 'column_family', 'foo_column', pycassa.UTF8_TYPE), pycassa complains that 'foo_column' is not valid for a time UUID.

Will try to provide more detail and sample code tonight.

OverflowError: mktime argument out of range

This exception (see the title) is raised when I try to save a date as a pycassa date object (pycassa.DateTime()). The date comes from a date-of-birth value, e.g. 1956-05-03. How can I avoid this issue?
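A minimal sketch of one way around the overflow, assuming the error comes from time.mktime() rejecting dates before 1970 on the reporter's platform (the title suggests this, but the report does not confirm it). The helper names to_epoch_millis and from_epoch_millis are hypothetical and not part of pycassa. Whether the resulting integer can be stored as-is depends on the column's validator, which is also an assumption here.

```python
import calendar
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def to_epoch_millis(dt):
    """Convert a naive UTC datetime to milliseconds since the Unix epoch.

    calendar.timegm() is plain arithmetic, so it accepts dates before 1970
    and returns a negative number instead of raising OverflowError.
    """
    return calendar.timegm(dt.timetuple()) * 1000 + dt.microsecond // 1000


def from_epoch_millis(millis):
    """Inverse of to_epoch_millis(); timedelta handles negative offsets."""
    return EPOCH + timedelta(milliseconds=millis)


birth = datetime(1956, 5, 3)
millis = to_epoch_millis(birth)
print(millis)                     # negative, because the date precedes the epoch
print(from_epoch_millis(millis))  # round-trips to 1956-05-03 00:00:00
```

Storing the value as a signed integer keeps the conversion independent of the platform's C library, at the cost of converting back by hand when reading.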

error in test in python 2.5.4 ('with' statement will be reserved in 2.6)

...........:1: Warning: 'with' will become a reserved keyword in Python 2.6
E..........S..S........................S...................
ERROR: test_contextmgr (tests.test_batch_mutation.TestMutator)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/me/Downloads/pycassa/tests/test_batch_mutation.py", line 98, in test_contextmgr
    assert cf.get('3') == ROWS['3']"""
  File "", line 1
    with cf.batch(queue_size=2) as b:
          ^
SyntaxError: invalid syntax

----------------------------------------------------------------------
Ran 70 tests in 45.069s

FAILED (SKIP=3, errors=1)

"from __future__ import with_statement" should fix it

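The traceback above points at line 1 of an unnamed file, which suggests the test body is executed from a string. Under that assumption, the import has to be the first statement of the string itself, not only of the test module. A small sketch, using threading.Lock as a stand-in for cf.batch():

```python
import threading

# Source executed from a string, as the failing test appears to do.
# The __future__ import must be the first statement of that string.
SOURCE = """from __future__ import with_statement
with lock:
    entered.append(True)
"""

namespace = {'lock': threading.Lock(), 'entered': []}
exec(compile(SOURCE, '<with_statement_demo>', 'exec'), namespace)
print(namespace['entered'])
```

On Python 2.6 and later the import is accepted and has no effect, so the same string works on every version.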
Python 2.5 Multithreading Issues

#
# Test for verifying that pycassa works consistently when reads and writes are done in QUORUM consistency level
#
# NOTE: this was made, because test setup with one node with trunk cassandra and pycassa 1.0.6 
#       failed when python-2.5 interpreter was used. When interpreter was changed to 2.6 everything 
#       started to work as expected. So basically pycassa 1.0.6 has bug with multithreading when it
#       is used with python-2.5 (or my python 2.5 installation is buggy) 
#
# @author Mikael Lepisto
#

import pycassa
import unittest
import datetime
import uuid
from threading import Thread
from multiprocessing import Pool

import logging
log = pycassa.PycassaLogger()
log.set_logger_name('pycassa_library')
log.set_logger_level('warn')
log.get_logger().addHandler(logging.StreamHandler())

cf_name = "ThreadingTest"

DISABLE_MULTIPROCESS_TEST = False
PROCESS_COUNT = 10
MULTIPROCESS_THREAD_COUNT = 5
SINGLE_PROCESS_THREAD_COUNT = 50
RUN_TIME = 5

keyspace = "test_keyspace"
servers = ['localhost']

class TestThread(Thread):

    def __init__(self):
        super(TestThread,self).__init__()
        sys = pycassa.system_manager.SystemManager('localhost')
        connection_pool = pycassa.pool.ConnectionPool(keyspace, server_list=servers, 
                                                      credentials=None, timeout=0.5, use_threadlocal=True, 
                                                      pool_size=5, max_overflow=0, prefill=True, pool_timeout=30, 
                                                      recycle=10000, max_retries=5, listeners=[], logging_name=None, 
                                                      framed_transport=True)

        cf = None
        try:
            cf = pycassa.columnfamily.ColumnFamily(connection_pool, cf_name, autopack_names=False, autopack_values=False)
        except pycassa.cassandra.ttypes.NotFoundException:
            sys.create_column_family(keyspace, cf_name, comparator_type=pycassa.system_manager.UTF8_TYPE)
            cf = pycassa.columnfamily.ColumnFamily(connection_pool, cf_name, autopack_names=False, autopack_values=False)
        self.cf = cf
        self.write_success = 0
        self.write_fail = 0
        self.read_found = 0
        self.read_not_found = 0

    def run(self):
        until = datetime.datetime.now() + datetime.timedelta(seconds=RUN_TIME)
        while datetime.datetime.now() < until:
            new_key = str(uuid.uuid1())
            try:
                self.cf.insert(new_key, {'test_value' : '1'}, 
                               write_consistency_level=pycassa.cassandra.ttypes.ConsistencyLevel.QUORUM)
                self.write_success += 1
            except Exception, e:
                self.write_fail += 1
                raise e

            written_value = None
            try:
                written_value = self.cf.get(new_key, 
                                            read_consistency_level=pycassa.cassandra.ttypes.ConsistencyLevel.QUORUM)
                self.read_found += 1
            except pycassa.cassandra.ttypes.NotFoundException,e:
                self.read_not_found += 1

            if written_value:
                self.cf.remove(new_key, 
                               write_consistency_level=pycassa.cassandra.ttypes.ConsistencyLevel.QUORUM)

def run_multithread(args):
    pid = args['pid']
    thread_count = args['thread_count']

    # start threads
    threads = []
    for i in range(thread_count):
        test = TestThread()
        test.start()
        threads.append(test)

    # wait to complete and collect results
    write_success = 0
    write_fail = 0
    read_found = 0
    read_not_found = 0

    for test in threads:
        test.join()
        write_success += test.write_success
        write_fail += test.write_fail
        read_found += test.read_found
        read_not_found += test.read_not_found

    ret_val = "-------- %s threads: %i ----------\n" % (pid, thread_count)
    ret_val += "\n".join(["%s : %s" % (key,value) for key,value in [('write_success', write_success),
                                                                    ('read_found', read_found),
                                                                    ('write_fail', write_fail),
                                                                    ('read_not_found', read_not_found)]])
    failed = False
    if write_success != read_found:
        ret_val += "\nWriteError: Some results were not found after they were written to db."
        failed = True

    return pid,ret_val,failed


class ThreadingTest(unittest.TestCase):

    def test_with_multiple_threads(self):        
        pid,result,failed = run_multithread(dict(pid="SingleProcess", 
                                                 thread_count=SINGLE_PROCESS_THREAD_COUNT))
        print result
        assert not failed, "There were errors during threading."

    def test_with_multiple_process(self):

        if DISABLE_MULTIPROCESS_TEST:
            return

        p = Pool(PROCESS_COUNT)
        results = p.map(run_multithread, 
                        [dict(thread_count=MULTIPROCESS_THREAD_COUNT, 
                              pid="Process-%i" % i) for i in range(PROCESS_COUNT)])

        failed_processes = 0
        for pid,result,failed in results:
            print result
            if failed: failed_processes += 1

        assert failed_processes == 0,"There was failures in %i processes" % failed_processes

Strange scaling when adding nodes

Hello, I am running pycassa 1.04 against a cluster of three machines. The read latency reported by cfstats becomes much worse when I grow the cluster from one node to three. Is it possible that the read consistency level was set to ALL? I didn't set it, but could there be some side effect I'm not aware of?

Thanks,
Maxim

Mandatory keyspace per connection breaks system commands

Not a big issue, but if we'd like to add system commands to pycassa, the current model of hardwiring a session to a particular keyspace will cause some issues.

For example, when connecting to a freshly installed Cassandra 0.7, there is no keyspace initially.

cassandra.ttypes issue

I am trying to setup Twissandra. Everything seems to work until I try to create the keyspace for Cassandra. I run 'python manage.py sync_cassandra' and it returns the following error.

ImportError: No module named cassandra.ttypes

Any ideas?

Pycassa expects integers to be stored in a different format from cassandra-cli

If you set a value through the cassandra-cli for a column with IntegerType as the validation class like this:

set CFName['key']['value']=1;

Then reading it back through pycassa results in:

[...application stack...]
File "/Library/Python/2.6/site-packages/pycassa-1.0.4-py2.6.egg/pycassa/columnfamily.py", line 496, in get_indexed_slices
key_slice.columns, include_timestamp))
File "/Library/Python/2.6/site-packages/pycassa-1.0.4-py2.6.egg/pycassa/columnfamily.py", line 179, in _convert_ColumnOrSuperColumns_to_dict_class
    ret[self._unpack_name(col.name)] = self._convert_Column_to_base(col, include_timestamp)
File "/Library/Python/2.6/site-packages/pycassa-1.0.4-py2.6.egg/pycassa/columnfamily.py", line 160, in _convert_Column_to_base
    value = self._unpack_value(column.value, column.name)
File "/Library/Python/2.6/site-packages/pycassa-1.0.4-py2.6.egg/pycassa/columnfamily.py", line 293, in _unpack_value
    return self._unpack(value, self._get_data_type_for_col(col_name))
File "/Library/Python/2.6/site-packages/pycassa-1.0.4-py2.6.egg/pycassa/columnfamily.py", line 341, in _unpack
    return _int_packer.unpack(b)[0]
error: unpack requires a string argument of length 4

The problem is that pycassa is expecting '\u0000\u0000\u0000\u0000' and is getting '\u0001' instead.
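To illustrate the mismatch: as far as I know, Cassandra's IntegerType is a variable-length, big-endian, two's-complement integer, so a small value can legitimately arrive as a single byte. The sketch below (Python 3, hypothetical helper names, not pycassa's actual marshalling code) contrasts a fixed 4-byte unpack with a variable-length decode.

```python
import struct


def unpack_varint(raw):
    """Decode a variable-length, big-endian, two's-complement integer."""
    return int.from_bytes(raw, byteorder='big', signed=True)


def pack_varint(value):
    """Encode an integer in enough bytes to hold it together with its sign bit."""
    length = (value.bit_length() // 8) + 1
    return value.to_bytes(length, byteorder='big', signed=True)


cli_value = b'\x01'  # the single byte described in the report

try:
    struct.unpack('>i', cli_value)  # fixed-width decoding needs exactly 4 bytes
except struct.error as exc:
    print('fixed-width unpack failed:', exc)

print(unpack_varint(cli_value))            # decodes the 1-byte form
print(unpack_varint(b'\x00\x00\x00\x01'))  # and the 4-byte form alike
```

A variable-length decoder accepts both representations, which is why it tolerates values written by either client.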

Cassandra version is 0.7.3 and pycassa 1.0.4

column_reversed order incorrect!

CfDef(keyspace, cf_name, column_type='Standard', comparator_type='LongType')

client= connect(keyspace)
cf= ColumnFamily(client, cf_name)
for i in range(1,100):
    cf.insert(key=keyname, columns={int(time.time()*1e6): str(i)})

result= cf.get(key=keyname, column_count=3, column_reversed=True, column_start='')

result is not reversing the order

Cassandra 0.7, pycassa 0.4.0

super column get_indexed_slices bug?

Having problems with the following:
expr = create_index_expression('lastName', 'Smith')
clause = create_index_clause([expr])
list(columnFamily.get_indexed_slices(clause, super_column='userData'))
[]

Works fine when not a super column family...

Currently running:
pycassa 1.0.4
python 2.4.3
cassandra 0.7
thrift 0.5.0

syntax error in 1.0.7?

ERROR: Failure: SyntaxError (invalid syntax (pool.py, line 418))
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/nose-1.0.0-py2.5.egg/nose/loader.py", line 390, in loadTestsFromName
    addr.filename, addr.module)
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/nose-1.0.0-py2.5.egg/nose/importer.py", line 39, in importFromPath
    return self.importFromDir(dir_path, fqname)
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/nose-1.0.0-py2.5.egg/nose/importer.py", line 86, in importFromDir
    mod = load_module(part_fqname, fh, filename, desc)
  File "/Users/me/Downloads/pycassa/tests/__init__.py", line 1, in 
    from pycassa.system_manager import *
  File "/Users/me/Downloads/pycassa/pycassa/__init__.py", line 5, in 
    from pycassa.pool import *
  File "/Users/me/Downloads/pycassa/pycassa/pool.py", line 418
    return new_f(self, *args, reset=True, **kwargs)
                                  ^
SyntaxError: invalid syntax

----------------------------------------------------------------------

import error pycassa module

I'm getting this error when I try to import pycassa from the Python interpreter:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "pycassa/__init__.py", line 4, in <module>
from pycassa.columnfamily import *
File "pycassa/columnfamily.py", line 1, in <module>
import pkg_resources
ImportError: No module named pkg_resources

This happens on both Red Hat Linux and Windows XP.

P.S. pycassa version: pycassa-0.2.0

Make connection pool more robust w.r.t connection problems

I'm running pycassa from master, commit 7821aa4...

The connection pool does not handle network failure as well as it could. Specifically, pulling a node from under the connection pool in a way that results in Broken Pipe or TTransportException (due to TSocket problems) is not handled.

How to reproduce

One way to reproduce this problem is to set up a small cluster that requires QUORUM for writes, point a pool at this cluster, then shut down enough Cassandra instances to cause a write failure. There may be an easier way to do this.

Problem in detail

I've seen two exceptions, one for TTransportException and one for broken pipe. Here they are (there are some variations depending on which requests are performed after the nodes were taken down):

(Note that the line numbers may be slightly off due to debug printing when investigating this problem)

   File "/var/lib/python-support/python2.5/project/storage.py", line 130, in write_single
     cf = pycassa.ColumnFamily(self.client, self.COLUMN_FAMILY_NAME)
   File "/var/lib/python-support/python2.5/pycassa/columnfamily.py", line 116, in __init__
     col_fam = self.client.get_keyspace_description(use_dict_for_col_metadata=True)[self.column_family]
   File "/var/lib/python-support/python2.5/pycassa/pool.py", line 484, in get_keyspace_description
     ks_def = self.describe_keyspace(keyspace)
   File "/var/lib/python-support/python2.5/pycassa/cassandra/Cassandra.py", line 1075, in describe_keyspace
     return self.recv_describe_keyspace()
   File "/var/lib/python-support/python2.5/pycassa/cassandra/Cassandra.py", line 1086, in recv_describe_keyspace
     (fname, mtype, rseqid) = self._iprot.readMessageBegin()
   File "/usr/lib/python2.5/site-packages/thrift/protocol/TBinaryProtocol.py", line 126, in readMessageBegin
     sz = self.readI32()
   File "/usr/lib/python2.5/site-packages/thrift/protocol/TBinaryProtocol.py", line 203, in readI32
     buff = self.trans.readAll(4)
   File "/usr/lib/python2.5/site-packages/thrift/transport/TTransport.py", line 58, in readAll
     chunk = self.read(sz-have)
   File "/usr/lib/python2.5/site-packages/thrift/transport/TTransport.py", line 272, in read
     self.readFrame()
   File "/usr/lib/python2.5/site-packages/thrift/transport/TTransport.py", line 276, in readFrame
     buff = self.__trans.readAll(4)
   File "/usr/lib/python2.5/site-packages/thrift/transport/TTransport.py", line 58, in readAll
     chunk = self.read(sz-have)
   File "/usr/lib/python2.5/site-packages/thrift/transport/TSocket.py", line 108, in read
     raise TTransportException(type=TTransportException.END_OF_FILE, message='TSocket read 0 bytes')
TTransportException: TSocket read 0 bytes

And here's broken pipe:

   File "/var/lib/python-support/python2.5/project/storage.py", line 130, in write_single
     cf = pycassa.ColumnFamily(self.client, self.COLUMN_FAMILY_NAME)
   File "/var/lib/python-support/python2.5/pycassa/columnfamily.py", line 116, in __init__
     col_fam = self.client.get_keyspace_description(use_dict_for_col_metadata=True)[self.column_family]
   File "/var/lib/python-support/python2.5/pycassa/pool.py", line 484, in get_keyspace_description
      ks_def = self.describe_keyspace(keyspace)
   File "/var/lib/python-support/python2.5/pycassa/cassandra/Cassandra.py", line 1074, in describe_keyspace
      self.send_describe_keyspace(keyspace)
   File "/var/lib/python-support/python2.5/pycassa/cassandra/Cassandra.py", line 1083, in send_describe_keyspace
     self._oprot.trans.flush()
   File "/usr/lib/python2.5/site-packages/thrift/transport/TTransport.py", line 293, in flush
     self.__trans.write(buf)
   File "/usr/lib/python2.5/site-packages/thrift/transport/TSocket.py", line 117, in write
     plus = self.handle.send(buff)
   File "/usr/lib/python2.5/site-packages/gevent/socket.py", line 458, in send
     return sock.send(data, flags)
error: (32, 'Broken pipe')

I've looked at the pool and it seems to me there are two versions of this problem:

  • Requests in general have the ConnectionWrapper::retry decorator, which closes a broken connection and replaces it. This is the desired behaviour for a broken pipe. However, the decorator only handles TimedOutException and UnavailableException, when we may also want to handle, for example, Thrift.TException, socket.error and IOError.
  • ConnectionWrapper::get_keyspace_description in particular has no retry logic, so it fails if there is a connection problem. On a broken pipe, the socket is returned to the pool and may cause arbitrary errors when the connection is reused later.
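The second point, a broken socket going back into the pool, can be illustrated in isolation. This is a toy sketch in Python 3, not pycassa's pool: FakeConnection, checked_out and the factory argument are all hypothetical names, and a plain queue stands in for the pool.

```python
import queue
import socket
from contextlib import contextmanager


class FakeConnection(object):
    """Hypothetical stand-in for a pooled Thrift connection."""

    def __init__(self, name):
        self.name = name
        self.closed = False

    def close(self):
        self.closed = True


@contextmanager
def checked_out(pool, make_connection):
    """Lend a connection; replace it instead of returning it if I/O fails."""
    conn = pool.get()
    try:
        yield conn
    except (socket.error, IOError):
        conn.close()                 # never hand a broken socket back
        pool.put(make_connection())  # keep the pool at its original size
        raise
    else:
        pool.put(conn)


pool = queue.Queue()
pool.put(FakeConnection('a'))

try:
    with checked_out(pool, lambda: FakeConnection('b')) as conn:
        raise socket.error(32, 'Broken pipe')  # simulate the failure above
except socket.error:
    pass

replacement = pool.get()
print(replacement.name, conn.closed)
```

The error still propagates to the caller, so retry policy stays a separate decision from pool hygiene.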

A workaround

Here is a small patch that works around the problem. I'm not advocating that this is a proper fix, rather it's meant to highlight the problem experienced. What the workaround does is make retry handle socket errors and arbitrary thrift exceptions. Then there's an ugly hack so we can use _retry on get_keyspace_description.

diff --git a/pycassa/pool.py b/pycassa/pool.py
index f84f649..e36da1d 100644
--- a/pycassa/pool.py
+++ b/pycassa/pool.py
@@ -11,6 +11,7 @@ import weakref, time, threading, random

 import connection
 import queue as pool_queue
+import socket
 from logging.pool_logger import PoolLogger
 from util import as_interface
 from cassandra.ttypes import TimedOutException, UnavailableException
@@ -399,10 +400,14 @@ class ConnectionWrapper(connection.Connection):
         def new_f(self, *args, **kwargs):
             self.operation_count += 1
             try:
-                result = getattr(super(ConnectionWrapper, self), f.__name__)(*args, **kwargs)
+                try:
+                    result = getattr(super(ConnectionWrapper, self), f.__name__)(*args, **kwargs)
+                except AttributeError:
+                    # Hack for get_keyspace_description
+                    result = f(self, *args, **kwargs)
                 self._retry_count = 0 # reset the count after a success
                 return result
-            except (TimedOutException, UnavailableException), exc:
+            except (TimedOutException, UnavailableException, Thrift.TException, socket.error, IOError), exc:
                 self._pool._notify_on_failure(exc, server=self.server,
                                               connection=self)

@@ -424,6 +429,7 @@ class ConnectionWrapper(connection.Connection):
                     self._pool._replace_wrapper()
                 self._replace(self._pool.get())
                 return new_f(self, *args, **kwargs)
+
         new_f.__name__ = f.__name__
         return new_f

@@ -466,7 +472,7 @@ class ConnectionWrapper(connection.Connection):
     def truncate(self, *args, **kwargs):
         pass

-
+    @_retry
     def get_keyspace_description(self, keyspace=None, use_dict_for_col_metadata=False):
         """
         Describes the given keyspace.

Cheers,
Björn

sudo python setup.py install

I get the following error when running the command in the title for pycassa:

byte-compiling /usr/lib/python2.4/site-packages/pycassa/connection.py to connection.pyc
File "/usr/lib/python2.4/site-packages/pycassa/connection.py", line 125
self._logins = logins if logins is not None else {}
^

SyntaxError: invalid syntax

Please help!

connection.py __getattr__ retry smashes the stack

In connection.py,

  def __getattr__(self, attr):
    def _client_call(*args, **kwargs):

On UnavailableException, TimedOutException, _client_call will recursively be retried.

This will eventually smash the stack, and you will get a RuntimeError.

A suggestion is to either raise and let the application handle retries, or have some recursion depth limit, perhaps with a sleep between retries.
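The second suggestion could look roughly like the sketch below: a loop with a retry cap and a short sleep, so the stack depth stays constant however many attempts are made. TransientError is a hypothetical stand-in for UnavailableException and TimedOutException, and call_with_retries is not an existing pycassa function.

```python
import time


class TransientError(Exception):
    """Hypothetical stand-in for UnavailableException / TimedOutException."""


def call_with_retries(func, max_retries=3, delay=0.01):
    """Call func(); retry in a loop (not recursively) on transient errors."""
    attempt = 0
    while True:
        try:
            return func()
        except TransientError:
            attempt += 1
            if attempt > max_retries:
                raise  # give up and let the application decide
            time.sleep(delay * attempt)  # linear backoff between attempts


calls = []


def flaky():
    """Fail twice, then succeed, to exercise the retry path."""
    calls.append(1)
    if len(calls) < 3:
        raise TransientError()
    return 'ok'


result = call_with_retries(flaky)
print(result, len(calls))
```

Once the cap is reached the last exception is re-raised unchanged, which also covers the first suggestion of letting the application handle it.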

ColumnFamilyMap.remove with SuperColumn Error

It seems the keyword argument should be super_column instead of column:
return self.column_family.remove(instance.key, super_column=instance.super_column)
The keyword argument would also be better renamed to columns (with an s) instead of column, to be
consistent with ColumnFamily.

if comparator = 'BytesType' and subcomparator = 'LongType'

create column family CFTEST with column_type = 'Super' and comparator = 'BytesType' and subcomparator = 'LongType' and memtable_throughput = 32;

CFTEST.insert('KEY', {'lowercase': {1: 1, 2: 2, 3: 3}})
then

CFTEST.get('KEY', super_column='lowercase', column_start=1 )
=> A str or unicode column name was expected

CFTEST.get('KEY', super_column='lowercase', column_start='1' )
=> InvalidRequestException(why='Expected 8 or 0 byte long (16)')
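The server message says it expects 8 or 0 bytes, which matches an 8-byte big-endian long. As an illustration only, this is what such a value looks like on the wire. Whether a given pycassa version accepts pre-packed bytes for column_start on a super column family is an assumption I have not verified, and pack_long is a hypothetical helper.

```python
import struct


def pack_long(value):
    """Pack an integer as an 8-byte, big-endian signed long."""
    return struct.pack('>q', value)


start = pack_long(1)
print(len(start), repr(start))
```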

There is no NE index clause in pycassa

Hello,
sometimes we need to query a large table for error codes resulting from production jobs. Often, I want to select non-zero values (and corresponding errors) only. However, there is no 'NE' (non-equal) clause to be created. I understand that this may be due to the nature of secondary index, but then again, maybe it's just an oversight. If you look into that, this will be much appreciated. Thanks!

I can emulate it using existing clauses, but it would be cleaner natively.
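For comparison, a different workaround from combining index clauses is to filter on the client. The sketch below assumes rows arrive as (key, columns) pairs, the shape get_range() yields, and uses made-up column names. It reads every row, so it is likely to be expensive on a large table.

```python
def rows_where_not_equal(rows, column, value):
    """Yield (key, columns) pairs whose column exists and differs from value."""
    for key, columns in rows:
        if column in columns and columns[column] != value:
            yield key, columns


# Sample data in the (key, columns) shape; the names are illustrative only.
rows = [
    ('job1', {'error_code': 0}),
    ('job2', {'error_code': 17, 'error': 'timeout'}),
    ('job3', {'error_code': 3, 'error': 'disk full'}),
]

failed = list(rows_where_not_equal(rows, 'error_code', 0))
print([key for key, _ in failed])
```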

Maxim

Can we have "describe cf" back in System Manager?

Greetings,

I really miss the describe column functionality that was refactored out of SM some time ago. I now run two scripts with different python path to do stuff -- I don't want to load pycassa shell just for that. What was the reason for the removal? One would think that the manager already does have quite similar functionality.

Thank you,
Maxim

Cassandra Up and Running but NoServerAvailable Exception with Pycassa

pycassa 0.3.0
cassandra 0.6.3

/usr/src# apache-cassandra-0.6.3/bin/cassandra-cli
Welcome to cassandra CLI.

Type 'help' or '?' for help. Type 'quit' or 'exit' to quit.
cassandra> connect localhost/9160
Connected to: "Test Cluster" on localhost/9160
cassandra>

import pycassa
client = pycassa.connect('Keyspace1', timeout=3.5)
cf = pycassa.ColumnFamily(client, 'Standard1')
cf.insert('foo', {'column1': 'val1'})
/usr/lib/python2.5/site-packages/pycassa/columnfamily.pyc in insert(self, key, columns, write_consistency_level)
    330                 cols.append(Mutation(column_or_supercolumn=ColumnOrSuperColumn(column=column)))
    331         self.client.batch_mutate({key: {self.column_family: cols}},
--> 332                                  self._wcl(write_consistency_level))
    333         return clock.timestamp
    334

/usr/lib/python2.5/site-packages/pycassa/connection.pyc in client_call(*args, **kwargs)
    129         def client_call(*args, **kwargs):
    130             if self._client is None:
--> 131                 self._find_server()
    132             try:
    133                 return getattr(self._client, attr)(*args, **kwargs)

/usr/lib/python2.5/site-packages/pycassa/connection.pyc in _find_server(self)
    167                 continue
    168         self._client = None
--> 169         raise NoServerAvailable()
    170
    171 class ThreadLocalConnection(object):

NoServerAvailable:

Inserting/Retrieving Values for Columns with LongType Comparators Doesn't Also Work as Expected

Following the blog post here: http://www.datastax.com/dev/blog/whats-new-cassandra-07-secondary-indexes

I created a column family through the CLI:

[default@unknown] create keyspace demo;  
ad80bf08-41fa-11e0-b4ef-e700f669bcfc
[default@unknown] use demo;
Authenticated to keyspace: demo
[default@demo] create column family users with comparator=UTF8Type                       
... and column_metadata=[{column_name: full_name, validation_class: UTF8Type},
... {column_name: birth_date, validation_class: LongType, index_type: KEYS}]; 
be542659-41fa-11e0-b4ef-e700f669bcfc
[default@demo] set users[bsanderson][full_name] = 'Brandon Sanderson';
Value inserted.
[default@demo] set users[bsanderson][birth_date] = 1975;
Value inserted.
[default@demo] get users['bsanderson'];
=> (column=birth_date, value=, timestamp=1298760619049000)
=> (column=full_name, value=Brandon Sanderson, timestamp=1298760612881000)
Returned 2 results.
[default@demo] 

Next, I did the following with pycassa:

import pycassa
pool = pycassa.connect('demo', ['localhost:9160'])
cf = pycassa.ColumnFamily(pool, 'users')
cf.insert('odoe', {'full_name': 'John Doe'})
cf.insert('odoe', {'birth_date': 1999})
res = cf.get('odoe')

Running the above gives:

Traceback (most recent call last):
  File "bug.py", line 9, in <module>
    res = cf.get('odoe')
  File "/home/posulliv/repos/pycassa/pycassa/columnfamily.py", line 344, in get
    return self._convert_ColumnOrSuperColumns_to_dict_class(list_col_or_super, include_timestamp)
  File "/home/posulliv/repos/pycassa/pycassa/columnfamily.py", line 149, in _convert_ColumnOrSuperColumns_to_dict_class
    ret[self._unpack_name(col.name)] = self._convert_Column_to_base(col, include_timestamp)
  File "/home/posulliv/repos/pycassa/pycassa/columnfamily.py", line 130, in _convert_Column_to_base
    value = self._unpack_value(column.value, column.name)
  File "/home/posulliv/repos/pycassa/pycassa/columnfamily.py", line 264, in _unpack_value
    return util.unpack(value, self._get_data_type_for_col(col_name))
  File "/home/posulliv/repos/pycassa/pycassa/util.py", line 176, in unpack
    return _long_packer.unpack(byte_array)[0]
struct.error: unpack requires a string argument of length 8

And if I go back to the CLI:

[default@demo] get users['odoe'];
=> (column=birth_date, value=, timestamp=1298760813075848)
=> (column=full_name, value=John Doe, timestamp=1298760813074461)
Returned 2 results.
[default@demo]

However, if I change the insert in python to be:

cf.insert('odoe', {'full_name': 'John Doe', 'birth_date': 1983})

It works fine:

OrderedDict([(u'birth_date', 1983), (u'full_name', u'John Doe')])

And also through the CLI:

[default@demo] get users['jdoe'];
=> (column=birth_date, value=1983, timestamp=1298760786582058)
=> (column=full_name, value=John Doe, timestamp=1298760786582058)
Returned 2 results.
[default@demo]

Untitled

create_column_family() seems to have broken and misleading parameter descriptions:
like: "or float key_cache_size"
or:
key_cache_save_in_seconds instead of row_cache_save_period_in_seconds
and:
row_cache_save_in_seconds instead of key_cache_save_period_in_seconds

Column Family error

When I execute the following command:
cf = pycassa.ColumnFamily(client, 'Standard2')
I get this error message - http://pastie.org/1107607. The keyspace I have connected to definitely exists. I checked this through cassandra-cli: http://pastie.org/1107610.
I have also tried
client.describe_version()
and the same error occurred.
