clickhouse-driver's Introduction

ClickHouse Python Driver

ClickHouse Python Driver with native (TCP) interface support.

An asynchronous wrapper is available here: https://github.com/mymarilyn/aioch

Features

  • External data for query processing.
  • Query settings.
  • Compression support.
  • TLS support.
  • Types support:
    • Float32/64
    • [U]Int8/16/32/64/128/256
    • Date/Date32/DateTime('timezone')/DateTime64('timezone')
    • String/FixedString(N)
    • Enum8/16
    • Array(T)
    • Nullable(T)
    • Bool
    • UUID
    • Decimal
    • IPv4/IPv6
    • LowCardinality(T)
    • SimpleAggregateFunction(F, T)
    • Tuple(T1, T2, ...)
    • Nested
    • Map(key, value)
  • Query progress information.
  • Block by block results streaming.
  • Reading query profile info.
  • Receiving server logs.
  • Multiple hosts support.
  • Python DB API 2.0 specification support.
  • Optional NumPy arrays support.

Documentation

Documentation is available at https://clickhouse-driver.readthedocs.io.

Usage

There are two ways to communicate with the server:

  • using pure Client;
  • using DB API.

Pure Client example:

>>> from clickhouse_driver import Client
>>>
>>> client = Client('localhost')
>>>
>>> client.execute('SHOW TABLES')
[('test',)]
>>> client.execute('DROP TABLE IF EXISTS test')
[]
>>> client.execute('CREATE TABLE test (x Int32) ENGINE = Memory')
[]
>>> client.execute(
...     'INSERT INTO test (x) VALUES',
...     [{'x': 100}]
... )
1
>>> client.execute('INSERT INTO test (x) VALUES', [[200]])
1
>>> client.execute(
...     'INSERT INTO test (x) '
...     'SELECT * FROM system.numbers LIMIT %(limit)s',
...     {'limit': 3}
... )
[]
>>> client.execute('SELECT sum(x) FROM test')
[(303,)]

DB API example:

>>> from clickhouse_driver import connect
>>>
>>> conn = connect('clickhouse://localhost')
>>> cursor = conn.cursor()
>>>
>>> cursor.execute('SHOW TABLES')
>>> cursor.fetchall()
[('test',)]
>>> cursor.execute('DROP TABLE IF EXISTS test')
>>> cursor.fetchall()
[]
>>> cursor.execute('CREATE TABLE test (x Int32) ENGINE = Memory')
>>> cursor.fetchall()
[]
>>> cursor.executemany(
...     'INSERT INTO test (x) VALUES',
...     [{'x': 100}]
... )
>>> cursor.rowcount
1
>>> cursor.executemany('INSERT INTO test (x) VALUES', [[200]])
>>> cursor.rowcount
1
>>> cursor.execute(
...     'INSERT INTO test (x) '
...     'SELECT * FROM system.numbers LIMIT %(limit)s',
...     {'limit': 3}
... )
>>> cursor.rowcount
0
>>> cursor.execute('SELECT sum(x) FROM test')
>>> cursor.fetchall()
[(303,)]

License

ClickHouse Python Driver is distributed under the MIT license.

clickhouse-driver's People

Contributors

0x4ec7, achilleash, adamleko, alex-hofsteede, asottile-sentry, azat, b1naryth1ef, berdoc, cclauss, dependabot[bot], dourvaris, ghazi-git, igorbb, insomnes, joelynch, kszucs, mohammad7t, nnseva, odidev, pulina, risicle, segv, shotinleg, smagellan, vanzi, vitalik-svt, vivienm, wawaka, xzkostyan, ym

clickhouse-driver's Issues

AttributeError when receiving ServerPacketTypes.PROFILE_INFO

After updating to 0.0.16 the following error appears:

/opt/conda/envs/ibis-dev-3.6/lib/python3.6/site-packages/clickhouse_driver/client.py:222: in process_ordinary_query
    columnar=columnar)
/opt/conda/envs/ibis-dev-3.6/lib/python3.6/site-packages/clickhouse_driver/client.py:57: in receive_result
    return result.get_result()
/opt/conda/envs/ibis-dev-3.6/lib/python3.6/site-packages/clickhouse_driver/result.py:39: in get_result
    for packet in self.packet_generator:
/opt/conda/envs/ibis-dev-3.6/lib/python3.6/site-packages/clickhouse_driver/client.py:69: in packet_generator
    packet = self.receive_packet()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <clickhouse_driver.client.Client object at 0x7ff6b95f0cc0>

    def receive_packet(self):
        packet = self.connection.receive_packet()
    
        if packet.type == ServerPacketTypes.EXCEPTION:
            raise packet.exception
    
        elif packet.type == ServerPacketTypes.PROGRESS:
            return packet
    
        elif packet.type == ServerPacketTypes.END_OF_STREAM:
            return False
    
        elif packet.type == ServerPacketTypes.DATA:
            return packet
    
        elif packet.type == ServerPacketTypes.TOTALS:
            return packet
    
        elif packet.type == ServerPacketTypes.EXTREMES:
            return packet
    
        elif packet.type == ServerPacketTypes.PROFILE_INFO:
>           self.last_query.store_profile(packet)
E           AttributeError: 'NoneType' object has no attribute 'store_profile'

see https://circleci.com/gh/cpcloud/ibis/14423

Insert Buffer?

The documentation recommends inserting chunks of at least 1000 rows at once. Is there an easy way to add some sort of buffer that gathers inserts and thus improves performance? Of course, one could use a simple list, append to it, and insert once it reaches a specific size, but things get more complicated when someone uses multiprocessing etc. See the sketch below.
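
A minimal sketch of such a buffer (the BufferedInserter class is illustrative, not part of the driver; it assumes the test table from the README, and with multiprocessing you would need one buffer per process):

from clickhouse_driver import Client

class BufferedInserter:
    # Collect rows and flush them to ClickHouse in batches.
    def __init__(self, client, insert_query, batch_size=1000):
        self.client = client
        self.insert_query = insert_query
        self.batch_size = batch_size
        self.buffer = []

    def add(self, row):
        self.buffer.append(row)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.buffer:
            self.client.execute(self.insert_query, self.buffer)
            self.buffer = []

client = Client('localhost')
inserter = BufferedInserter(client, 'INSERT INTO test (x) VALUES')
for i in range(2500):
    inserter.add({'x': i})
inserter.flush()  # flush the remaining tail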

Request: Client configuration file support

Request that the driver be able to load settings from a client configuration file, similar to clickhouse-client, e.g.

~/.clickhouse-client/config.xml

<config>
    <user>default</user>
    <password>my_password</password>
</config>

This will remove the need to store credentials in code.
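
Until such support exists, a workaround sketch that reads the same XML file and passes the credentials to the driver (client_from_config is a hypothetical helper, not part of the driver):

import os
import xml.etree.ElementTree as ET

from clickhouse_driver import Client

def client_from_config(host, path='~/.clickhouse-client/config.xml'):
    # Read <user> and <password> from the XML config instead of hard-coding them.
    root = ET.parse(os.path.expanduser(path)).getroot()
    user = root.findtext('user', default='default')
    password = root.findtext('password', default='')
    return Client(host, user=user, password=password)

client = client_from_config('localhost')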

Timeout error

Hello!

I'm getting a timeout error when trying to fetch results from clickhouse using this package on windows.
Any ideas how I can fix it?

from clickhouse_driver.client import Client
print('Import Client - Ok', Client)

client = Client('***-clickhouse.***', 8123, database='test')
print('Get instance of Client - Ok', client)

result = client.execute('select 1')
print('Get results - Ok', result)

Output:

Import Client - Ok <class 'clickhouse_driver.client.Client'>
Get instance of Client - Ok <clickhouse_driver.client.Client object at 0x0000000004B95C88>
---------------------------------------------------------------------------
timeout                                   Traceback (most recent call last)
C:\Users\username\AppData\Local\Continuum\Anaconda3\lib\site-packages\clickhouse_driver\connection.py in connect(self)
    126             self.send_hello()
--> 127             self.receive_hello()
    128 

C:\Users\username\AppData\Local\Continuum\Anaconda3\lib\site-packages\clickhouse_driver\connection.py in receive_hello(self)
    175     def receive_hello(self):
--> 176         packet_type = read_varint(self.fin)
    177 

C:\Users\username\AppData\Local\Continuum\Anaconda3\lib\site-packages\clickhouse_driver\reader.py in read_varint(f)
     45     while True:
---> 46         i = _read_one(f)
     47         result |= (i & 0x7f) << shift

C:\Users\username\AppData\Local\Continuum\Anaconda3\lib\site-packages\clickhouse_driver\reader.py in _read_one(f)
     30 def _read_one(f):
---> 31     c = f.read(1)
     32     if c == '':

C:\Users\username\AppData\Local\Continuum\Anaconda3\lib\socket.py in readinto(self, b)
    585             try:
--> 586                 return self._sock.recv_into(b)
    587             except timeout:

timeout: timed out

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
<ipython-input-7-73d5ad8d95e2> in <module>()
      5 print('Get instance of Client - Ok', client)
      6 
----> 7 result = client.execute('select 1')
      8 print('Get results - Ok', result)

C:\Users\username\AppData\Local\Continuum\Anaconda3\lib\site-packages\clickhouse_driver\client.py in execute(self, query, params, with_column_types, external_tables, query_id, settings)
     58     def execute(self, query, params=None, with_column_types=False,
     59                 external_tables=None, query_id=None, settings=None):
---> 60         self.connection.force_connect()
     61 
     62         try:

C:\Users\username\AppData\Local\Continuum\Anaconda3\lib\site-packages\clickhouse_driver\connection.py in force_connect(self)
     97 
     98         if not self.connected:
---> 99             self.connect()
    100 
    101         elif not self.ping():

C:\Users\username\AppData\Local\Continuum\Anaconda3\lib\site-packages\clickhouse_driver\connection.py in connect(self)
    133             self.disconnect()
    134             raise errors.SocketTimeoutError(
--> 135                 '{} ({})'.format(e.message, self.get_description())
    136             )
    137 

AttributeError: 'timeout' object has no attribute 'message'

P.S. I verified these connection credentials via the JDBC driver.

SELECT INTO OUTFILE

Does the current version of the driver ignore SELECT ... INTO OUTFILE? There is no difference between the queries
select * from log LIMIT 10000 INTO OUTFILE '/var/tmp/test123.csv' FORMAT TabSeparated
and
select * from log LIMIT 10000

clickhouse-driver version 0.0.16

Column names

Hi,

I can execute:

client.execute('SELECT * FROM test3')

or

client.execute('SELECT a, b, c FROM test3')

and I will get a list of tuples (one tuple per row), but without column names.

Is it somehow possible to get all column names (a named tuple, or something similar) along with the data itself, in order to simplify further data manipulation?

For example, after getting the result set from client.execute('SELECT * FROM test3'), I have to manually find out all the column names and then provide that information to the pandas DataFrame.

Regards.
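
For reference, the driver's execute() accepts with_column_types=True, which returns the column names and types alongside the rows; a sketch of feeding them into pandas:

import pandas as pd
from clickhouse_driver import Client

client = Client('localhost')
rows, columns = client.execute('SELECT * FROM test3', with_column_types=True)
# columns is a list of (name, type) tuples, e.g. [('a', 'Int32'), ...]
df = pd.DataFrame(rows, columns=[name for name, _ in columns])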

Error writing string while using Superset and ClickHouse driver (text encoding not set to UTF-8)

Hi,

Something new happens when adding a ClickHouse DB to Superset, at the point where Superset tries to get table metadata.

Traceback (most recent call last):
  File "/home/matthieu/python-venv/super-modular/lib/python3.5/site-packages/superset/views/core.py", line 1764, in testconn
    return json_success(json.dumps(engine.table_names(), indent=4))
  File "/home/matthieu/python-venv/super-modular/lib/python3.5/site-packages/sqlalchemy/engine/base.py", line 2139, in table_names
    return self.dialect.get_table_names(conn, schema)
  File "<string>", line 2, in get_table_names
  File "/home/matthieu/python-venv/super-modular/lib/python3.5/site-packages/sqlalchemy/engine/reflection.py", line 42, in cache
    return fn(self, con, *args, **kw)
  File "/home/matthieu/python-venv/super-modular/lib/python3.5/site-packages/clickhouse_sqlalchemy/drivers/base.py", line 482, in get_table_names
    return [row.name for row in self._execute(connection, 'SHOW TABLES')]
  File "/home/matthieu/python-venv/super-modular/lib/python3.5/site-packages/clickhouse_sqlalchemy/drivers/native/base.py", line 34, in _execute
    return connection.execute(sql)
  File "/home/matthieu/python-venv/super-modular/lib/python3.5/site-packages/sqlalchemy/engine/base.py", line 942, in execute
    return self._execute_text(object, multiparams, params)
  File "/home/matthieu/python-venv/super-modular/lib/python3.5/site-packages/sqlalchemy/engine/base.py", line 1104, in _execute_text
    statement, parameters
  File "/home/matthieu/python-venv/super-modular/lib/python3.5/site-packages/sqlalchemy/engine/base.py", line 1200, in _execute_context
    context)
  File "/home/matthieu/python-venv/super-modular/lib/python3.5/site-packages/sqlalchemy/engine/base.py", line 1416, in _handle_dbapi_exception
    util.reraise(*exc_info)
  File "/home/matthieu/python-venv/super-modular/lib/python3.5/site-packages/sqlalchemy/util/compat.py", line 249, in reraise
    raise value
  File "/home/matthieu/python-venv/super-modular/lib/python3.5/site-packages/sqlalchemy/engine/base.py", line 1193, in _execute_context
    context)
  File "/home/matthieu/python-venv/super-modular/lib/python3.5/site-packages/clickhouse_sqlalchemy/drivers/base.py", line 492, in do_execute
    cursor.execute(statement, parameters, context=context)
  File "/home/matthieu/python-venv/super-modular/lib/python3.5/site-packages/clickhouse_sqlalchemy/drivers/native/connector.py", line 125, in execute
    external_tables=external_tables, settings=settings
  File "/home/matthieu/python-venv/super-modular/lib/python3.5/site-packages/clickhouse_driver/client.py", line 102, in execute
    self.connection.force_connect()
  File "/home/matthieu/python-venv/super-modular/lib/python3.5/site-packages/clickhouse_driver/connection.py", line 124, in force_connect
    self.connect()
  File "/home/matthieu/python-venv/super-modular/lib/python3.5/site-packages/clickhouse_driver/connection.py", line 189, in connect
    self.send_hello()
  File "/home/matthieu/python-venv/super-modular/lib/python3.5/site-packages/clickhouse_driver/connection.py", line 260, in send_hello
    write_binary_str(self.password, self.fout)
  File "/home/matthieu/python-venv/super-modular/lib/python3.5/site-packages/clickhouse_driver/writer.py", line 18, in write_binary_str
    text = text.encode('utf-8')
AttributeError: 'NoneType' object has no attribute 'encode'
2018-09-25 21:09:57,639:INFO:werkzeug:127.0.0.1 - - [25/Sep/2018 21:09:57] "POST /superset/testconn HTTP/1.1" 500 -

I suspect that something has changed in a new version of the driver and that it now checks that there are no non-UTF-8 bytes in the string before storing it?

How do you suggest I fix this? Is it something to be set on the string sent by Superset, or is it something in the ClickHouse driver?

Thanks Matthieu.

Inserting data into a Date column raises an error

Exception:
unsupported operand type(s) for -: 'str' and 'datetime.date'

When I try inserting into a DateTime column instead, it succeeds.

How can I fix this problem? Thanks.

Tuple support

I can't return unique combinations of two columns (e.g. src and dst).
Executing a query with SELECT DISTINCT(field1, field2) causes the following error when returning data from ClickHouse:

Traceback (most recent call last):
  File "/home/netbug/.local/lib/python3.6/site-packages/clickhouse_driver/columns/service.py", line 56, in get_column_by_spec
    cls = column_by_type[spec]
KeyError: 'Tuple(String, String)'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./cronapp.py", line 71, in <module>
    run(period=period)
  File "./cronapp.py", line 52, in run
    data = get_conn_list(period=period)
  File "./cronapp.py", line 21, in get_conn_list
    return client.execute(query)
  File "/home/netbug/.local/lib/python3.6/site-packages/clickhouse_driver/client.py", line 176, in execute
    types_check=types_check, columnar=columnar
  File "/home/netbug/.local/lib/python3.6/site-packages/clickhouse_driver/client.py", line 227, in process_ordinary_query
    columnar=columnar)
  File "/home/netbug/.local/lib/python3.6/site-packages/clickhouse_driver/client.py", line 75, in receive_result
    self.receive_no_progress_result(result, columnar)
  File "/home/netbug/.local/lib/python3.6/site-packages/clickhouse_driver/client.py", line 102, in receive_no_progress_result
    packet = self.receive_packet()
  File "/home/netbug/.local/lib/python3.6/site-packages/clickhouse_driver/client.py", line 133, in receive_packet
    packet = self.connection.receive_packet()
  File "/home/netbug/.local/lib/python3.6/site-packages/clickhouse_driver/connection.py", line 331, in receive_packet
    packet.block = self.receive_data()
  File "/home/netbug/.local/lib/python3.6/site-packages/clickhouse_driver/connection.py", line 394, in receive_data
    block = self.block_in.read()
  File "/home/netbug/.local/lib/python3.6/site-packages/clickhouse_driver/streams/native.py", line 77, in read
    column = read_column(column_type, n_rows, self.fin)
  File "/home/netbug/.local/lib/python3.6/site-packages/clickhouse_driver/columns/service.py", line 64, in read_column
    column = get_column_by_spec(column_spec)
  File "/home/netbug/.local/lib/python3.6/site-packages/clickhouse_driver/columns/service.py", line 60, in get_column_by_spec
    raise errors.UnknownTypeError('Unknown type {}'.format(e.args[0]))
clickhouse_driver.errors.UnknownTypeError: Code: 50. Unknown type Tuple(String, String)

When using a domain instead of an IP address, the DNS resolution doesn't change

I use a domain instead of an IP address in order to do load balancing and failover.

But when I change the domain mapping, the script still keeps connecting to the original IP address.

For example, I use the domain 'xxx.test.com', which maps to ip1 and ip2, and start the script; the data is written to ip1 and ip2 round-robin.

After changing the domain mapping to ip2 and ip3 (that is, replacing ip1 with ip3), the script still continues to write to ip1.

Conda forge feedstock

Hi!

I'm working on clickhouse backend for ibis.
Ibis is installable from both pip and conda. I'd like to use clickhouse-driver, but currently clickhouse-driver and clickhouse-cityhash don't have conda packages.

I've already created the recipes, but conda-forge packages require maintainers. Would you please create the feedstocks?

Here are the recipes: https://github.com/kszucs/clickhouse-conda

Set a query ID

ClickHouse allows setting an ID on queries; you can see these IDs with:

select query_id from system.processes;

This is useful when you want to cancel a query from an external process, as explained here: https://stackoverflow.com/questions/40546983/how-to-kill-a-process-query-in-clickhouse

I propose that the connection.send_query() method accept a string argument to name the query. It could be used in the client.process_*_query() functions and execute() as an optional argument:

import uuid
c = Client('server')
query_id = str(uuid.uuid4())
c.execute("select something", query_id=query_id)

Multithreaded client

I've had some issues using clickhouse_driver in a multithreaded asyncio environment. The connection does not seem to be thread-safe. I'm not sure this is the best approach to solving the issues I've encountered, but here is a pooled connection implementation I am using:

import multiprocessing
import asyncio
import logging
from functools import partial
from concurrent.futures import ThreadPoolExecutor

from clickhouse_driver import Client
import threading

class ThreadSafeClient:
    def __init__(self, host, compression):
        self._host = host
        self._compression = compression
        self._loop = asyncio.get_event_loop()
        self._pool = {}

    async def get_conn(self):
        tid = threading.get_ident()
        logging.info(f"{tid} - get connection")

        if tid not in self._pool:
            self._pool[tid] = Client(self._host, compression=self._compression, client_name=f"mlcoin-api-{tid}")
            logging.info(f"{tid} - new connection created")

        return self._pool[tid]

    async def execute(self, statement, values=None, types_check=True):
        def execute_in_pool(statement, values, types_check):
            tid = threading.get_ident()

            if tid not in self._pool:
                self._pool[tid] = Client(self._host, compression=self._compression, client_name=f"mlcoin-api-{tid}")
                logging.info(f"{tid} - new connection created")

            conn = self._pool[tid]
            return conn.execute(statement, values, types_check=types_check)

        execute_stmt = partial(execute_in_pool, types_check=types_check)
        result = await self._loop.run_in_executor(
            None,
            execute_stmt,
            statement,
            values)

        return result

enhancement - ability to omit values when using JSONEachRow

clickhouse-driver==0.0.10
ClickHouse server version 1.1.54362

The ClickHouse JSONEachRow format supports omitting values when inserting records; it would be great if clickhouse-driver also supported this behavior: http://clickhouse-docs.readthedocs.io/en/latest/formats/jsoneachrow.html

It is also possible to omit values in which case the default value of the column is inserted.

However, clickhouse-driver instead throws an exception:

record = {'x': 1}
client.execute('DROP TABLE IF EXISTS test')
client.execute('CREATE TABLE test (x Int32, y Int32) ENGINE=Memory')
client.execute('INSERT INTO test FORMAT JSONEachRow', [record])
print(client.execute('SELECT * FROM test'))
Traceback (most recent call last):
  File "test.py", line 89, in <module>
    client.execute('INSERT INTO test FORMAT JSONEachRow', [record])
  File "/anaconda3/envs/myenv/lib/python2.7/site-packages/clickhouse_driver/client.py", line 169, in execute
    types_check=types_check
  File "/anaconda3/envs/myenv/lib/python2.7/site-packages/clickhouse_driver/client.py", line 239, in process_insert_query
    self.send_data(sample_block, data, types_check=types_check)
  File "/anaconda3/envs/myenv/lib/python2.7/site-packages/clickhouse_driver/client.py", line 261, in send_data
    types_check=types_check)
  File "/anaconda3/envs/myenv/lib/python2.7/site-packages/clickhouse_driver/block.py", line 51, in __init__
    self.dicts_to_rows(data)
  File "/anaconda3/envs/myenv/lib/python2.7/site-packages/clickhouse_driver/block.py", line 70, in dicts_to_rows
    self.data[i] = [row[name] for name in column_names]
KeyError: u'y'

Working example using clickhouse-client:

$ clickhouse-client -n --query='DROP TABLE IF EXISTS test; CREATE TABLE test (x Int32, y Int32) ENGINE=Memory; INSERT INTO test FORMAT JSONEachRow' <<< '{"x": 1}'
$ clickhouse-client -E --query='SELECT * FROM test'
Row 1:
──────
x: 1
y: 0

I was able to get past the KeyError in the Traceback above by changing

self.data[i] = [row[name] for name in column_names]
to:

self.data[i] = [row[name] for name in column_names if name in row]

I get to the point of sending the data to the server but get the following error, which I unfortunately don't have the time to dig into right now to come up with a possible PR/solution:

  File "/anaconda3/envs/myenv/lib/python2.7/site-packages/clickhouse_driver/streams/native.py", line 37, in write
    raise ValueError('Different rows length')
ValueError: Different rows length

This would be a great feature so that JSON objects don't always have to be fully specified in the client code!
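
Until then, a client-side workaround sketch (Python 3; the defaults dict mirrors the column defaults the server would fill in, assuming the test table above):

from clickhouse_driver import Client

client = Client('localhost')

defaults = {'x': 0, 'y': 0}  # column defaults, as ClickHouse would apply them

records = [{'x': 1}]  # 'y' is omitted, as in the JSONEachRow example
rows = [{**defaults, **record} for record in records]

client.execute('INSERT INTO test (x, y) VALUES', rows)
print(client.execute('SELECT * FROM test'))  # [(1, 0)]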

Iterator support

Is there any way for the result of a SELECT query to be a row iterator instead of loading the whole query result into memory? Thank you!
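
Newer driver versions expose this as Client.execute_iter; a sketch (assuming a release that includes it):

from clickhouse_driver import Client

client = Client('localhost')

# Rows are streamed block by block instead of materialising the whole result.
settings = {'max_block_size': 100000}
for row in client.execute_iter('SELECT * FROM test', settings=settings):
    print(row)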

execute wait long time

[root@lyc tmp]# python
Python 2.6.6 (r266:84292, Aug 18 2016, 15:13:37)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-17)] on linux2
Type "help", "copyright", "credits" or "license" for more information.

>>> from clickhouse_driver import Client
>>> client = Client('192.168.133.2', 18123, 'client_report', 'admin', '************', insert_block_size=1)
>>> client.execute('SHOW TABLES')
^CTraceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.6/site-packages/clickhouse_driver/client.py", line 159, in execute
    self.connection.force_connect()
  File "/usr/lib/python2.6/site-packages/clickhouse_driver/connection.py", line 122, in force_connect
    self.connect()
  File "/usr/lib/python2.6/site-packages/clickhouse_driver/connection.py", line 188, in connect
    self.receive_hello()
  File "/usr/lib/python2.6/site-packages/clickhouse_driver/connection.py", line 263, in receive_hello
    packet_type = read_varint(self.fin)
  File "/usr/lib/python2.6/site-packages/clickhouse_driver/reader.py", line 29, in read_varint
    i = _read_one(f)
  File "/usr/lib/python2.6/site-packages/clickhouse_driver/reader.py", line 14, in _read_one
    c = f.read(1)
  File "/usr/lib64/python2.6/socket.py", line 383, in read
    data = self._sock.recv(left)
KeyboardInterrupt

Error if enum key is blank

How to reproduce

Create table with enum field and insert a few rows

DROP TABLE IF EXISTS events;
CREATE TABLE events (
    time DateTime('UTC'),
    project Enum8('' = 0, 'first' = 1),
    text String
) ENGINE = MergeTree() ORDER BY (time) PARTITION BY toYYYYMM(time);

INSERT INTO events VALUES(now(), 'first', 'descr1');
INSERT INTO events VALUES(now(), '', 'descr2');

Try to select with python driver:

import clickhouse_driver
client = clickhouse_driver.Client(host='127.0.0.1')
print(client.execute("SELECT * FROM events"))
Traceback (most recent call last):
  File "/tmp/driver-test.py", line 4, in <module>
    print(client.execute("SELECT * FROM events"))
  File "/Users/name/pyenv/lib/python2.7/site-packages/clickhouse_driver/client.py", line 189, in execute
    types_check=types_check, columnar=columnar
  File "/Users/name/pyenv/lib/python2.7/site-packages/clickhouse_driver/client.py", line 246, in process_ordinary_query
    columnar=columnar)
  File "/Users/name/pyenv/lib/python2.7/site-packages/clickhouse_driver/client.py", line 82, in receive_result
    self.receive_no_progress_result(result, columnar)
  File "/Users/name/pyenv/lib/python2.7/site-packages/clickhouse_driver/client.py", line 109, in receive_no_progress_result
    packet = self.receive_packet()
  File "/Users/name/pyenv/lib/python2.7/site-packages/clickhouse_driver/client.py", line 140, in receive_packet
    packet = self.connection.receive_packet()
  File "/Users/name/pyenv/lib/python2.7/site-packages/clickhouse_driver/connection.py", line 334, in receive_packet
    packet.block = self.receive_data()
  File "/Users/name/pyenv/lib/python2.7/site-packages/clickhouse_driver/connection.py", line 393, in receive_data
    block = self.block_in.read()
  File "/Users/name/pyenv/lib/python2.7/site-packages/clickhouse_driver/streams/native.py", line 80, in read
    self.fin)
  File "/Users/name/pyenv/lib/python2.7/site-packages/clickhouse_driver/columns/service.py", line 67, in read_column
    column = get_column_by_spec(column_spec, column_options=column_options)
  File "/Users/name/pyenv/lib/python2.7/site-packages/clickhouse_driver/columns/service.py", line 45, in get_column_by_spec
    return create_enum_column(spec, column_options)
  File "/Users/name/pyenv/lib/python2.7/site-packages/clickhouse_driver/columns/enumcolumn.py", line 67, in create_enum_column
    return cls(Enum(cls.ch_type, d), **column_options)
  File "/Users/name/pyenv/lib/python2.7/site-packages/enum/__init__.py", line 349, in __call__
    return cls._create_(value, names, module=module, type=type, start=start)
  File "/Users/name/pyenv/lib/python2.7/site-packages/enum/__init__.py", line 462, in _create_
    classdict[member_name] = member_value
  File "/Users/name/pyenv/lib/python2.7/site-packages/enum/__init__.py", line 125, in __setitem__
    if _is_sunder(key):
  File "/Users/name/pyenv/lib/python2.7/site-packages/enum/__init__.py", line 80, in _is_sunder
    return (name[0] == name[-1] == '_' and
IndexError: string index out of range

Extended support of IPv4 and IPv6 column types

Hi,

I'm afraid I come with a rather specific request here. I opened this issue last week on the Yandex/ClickHouse repository: ClickHouse/ClickHouse#2605. It was about support for specific column types to store IPv4 and IPv6 data.

I didn't get any sort of positive answer from them... at least short term.

I was wondering if it could make sense to develop additional types in your code such that columns named IPv4_... or IPv6_... benefit from specific behaviour. The idea would be to apply a conversion function in your code and expose those columns as an INET type (similar to PostgreSQL): https://www.postgresql.org/docs/9.1/static/datatype-net-types.html.

With this feature, the type could be handled by upper layers like your SQLAlchemy driver and related UIs and frontends.

Does it sound like something doable/acceptable?

Thanks Matthieu.

Big INSERT ends in timeout

Hi,

I have an issue with a big INSERT query (> 2000 columns). It does run with the native clickhouse-client; after some config changes it completes in 0.5 seconds.

Now I tried to run the same query with clickhouse-driver, and it times out.

Before this, I had the same script working with pymysql.

Any idea how to fix this? (See the timeout sketch after the traceback below.)

Traceback (most recent call last):
  File "Click_v01.py", line 154, in <module>
    client.execute(sql)
  File "/usr/lib/python3.4/site-packages/clickhouse_driver/client.py", line 73, in execute
    query_id=query_id, settings=settings
  File "/usr/lib/python3.4/site-packages/clickhouse_driver/client.py", line 88, in process_ordinary_query
    return self.receive_result(with_column_types=with_column_types)
  File "/usr/lib/python3.4/site-packages/clickhouse_driver/client.py", line 19, in receive_result
    block = self.receive_block()
  File "/usr/lib/python3.4/site-packages/clickhouse_driver/client.py", line 38, in receive_block
    packet = self.connection.receive_packet()
  File "/usr/lib/python3.4/site-packages/clickhouse_driver/connection.py", line 234, in receive_packet
    packet.type = packet_type = read_varint(self.fin)
  File "/usr/lib/python3.4/site-packages/clickhouse_driver/reader.py", line 46, in read_varint
    i = _read_one(f)
  File "/usr/lib/python3.4/site-packages/clickhouse_driver/reader.py", line 31, in _read_one
    c = f.read(1)
  File "/usr/lib64/python3.4/socket.py", line 378, in readinto
    return self._sock.recv_into(b)
socket.timeout: timed out
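
If the query legitimately needs longer than the default socket timeout, the timeout can be raised on the client; a sketch (send_receive_timeout is in seconds):

from clickhouse_driver import Client

# Raise the send/receive timeout for very wide or long-running queries.
client = Client('localhost', connect_timeout=10, send_receive_timeout=600)
client.execute('SELECT 1')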

Asyncio client

First of all, great to have a native driver!

I'd like to use clickhouse-driver from asyncio. How should I implement an asyncio connection? Would it be enough to provide asynchronous versions of Connection and Client?
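
The aioch wrapper linked at the top of this page takes roughly this approach; a usage sketch, assuming its API mirrors the synchronous Client:

import asyncio

from aioch import Client

async def main():
    client = Client('localhost')
    result = await client.execute('SELECT 1')
    print(result)

asyncio.get_event_loop().run_until_complete(main())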

Unknown type Tuple(Float64, Float64)

Can't execute a query whose result contains a column of type Array(Tuple(Float64, Float64)).

DROP TABLE IF EXISTS test_polygon;

CREATE TABLE test_polygon
(
    id Int64,
    polygon Nested(
        lat Float64,
        lon Float64
    )
)
ENGINE = Log();

INSERT INTO test_polygon VALUES
(1111, [0., 8., 7., 0.], [0., 7., 8., 0.]),
(2222, [5., 5., 4., 3., 5.], [10., 12., 11., 10., 10.]);
SELECT
    id,
    arrayMap((x, y) -> (x, y), polygon.lat, polygon.lon) AS polygon
FROM test_polygon

┌───id─┬─polygon──────────────────────────────┐
│ 1111 │ [(0,0),(8,7),(7,8),(0,0)]            │
│ 2222 │ [(5,10),(5,12),(4,11),(3,10),(5,10)] │
└──────┴──────────────────────────────────────┘

2 rows in set. Elapsed: 0.003 sec.
from clickhouse_driver import Client 

client = Client('localhost')
print(client.execute('SELECT id, arrayMap((x, y)-> tuple(x, y), polygon_lat, polygon_lon) AS polygon FROM test_db.test_polygon'))
KeyError                                  Traceback (most recent call last)
~/.pyenv/versions/jupyter3.6.4/lib/python3.6/site-packages/clickhouse_driver/columns/service.py in get_column_by_spec(spec, column_options)
     57         try:
---> 58             cls = column_by_type[spec]
     59             return cls(**column_options)

KeyError: 'Tuple(Float64, Float64)'

During handling of the above exception, another exception occurred:

UnknownTypeError                          Traceback (most recent call last)
<ipython-input-51-1fc000cc1814> in <module>()
----> 1 print(client.execute('SELECT id, arrayMap((x, y)-> tuple(x, y), polygon_lat, polygon_lon) AS polygon FROM test_db.test_polygon'))

~/.pyenv/versions/jupyter3.6.4/lib/python3.6/site-packages/clickhouse_driver/client.py in execute(self, query, params, with_column_types, external_tables, query_id, settings, types_check, columnar)
    187                     external_tables=external_tables,
    188                     query_id=query_id, settings=query_settings,
--> 189                     types_check=types_check, columnar=columnar
    190                 )
    191 

~/.pyenv/versions/jupyter3.6.4/lib/python3.6/site-packages/clickhouse_driver/client.py in process_ordinary_query(self, query, params, with_column_types, external_tables, query_id, settings, types_check, columnar)
    244                                              types_check=types_check)
    245         return self.receive_result(with_column_types=with_column_types,
--> 246                                    columnar=columnar)
    247 
    248     def process_insert_query(self, query_without_data, data,

~/.pyenv/versions/jupyter3.6.4/lib/python3.6/site-packages/clickhouse_driver/client.py in receive_result(self, with_column_types, progress, columnar)
     80 
     81         else:
---> 82             self.receive_no_progress_result(result, columnar)
     83             return result.get_result()
     84 

~/.pyenv/versions/jupyter3.6.4/lib/python3.6/site-packages/clickhouse_driver/client.py in receive_no_progress_result(self, result, columnar)
    107     def receive_no_progress_result(self, result, columnar=False):
    108         while True:
--> 109             packet = self.receive_packet()
    110             if not packet:
    111                 break

~/.pyenv/versions/jupyter3.6.4/lib/python3.6/site-packages/clickhouse_driver/client.py in receive_packet(self)
    138 
    139     def receive_packet(self):
--> 140         packet = self.connection.receive_packet()
    141 
    142         if packet.type == ServerPacketTypes.EXCEPTION:

~/.pyenv/versions/jupyter3.6.4/lib/python3.6/site-packages/clickhouse_driver/connection.py in receive_packet(self)
    332 
    333         if packet_type == ServerPacketTypes.DATA:
--> 334             packet.block = self.receive_data()
    335 
    336         elif packet_type == ServerPacketTypes.EXCEPTION:

~/.pyenv/versions/jupyter3.6.4/lib/python3.6/site-packages/clickhouse_driver/connection.py in receive_data(self)
    391             read_binary_str(self.fin)
    392 
--> 393         block = self.block_in.read()
    394         self.block_in.reset()
    395         return block

~/.pyenv/versions/jupyter3.6.4/lib/python3.6/site-packages/clickhouse_driver/streams/native.py in read(self)
     78             if n_rows:
     79                 column = read_column(self.context, column_type, n_rows,
---> 80                                      self.fin)
     81                 data.append(column)
     82 

~/.pyenv/versions/jupyter3.6.4/lib/python3.6/site-packages/clickhouse_driver/columns/service.py in read_column(context, column_spec, n_items, buf)
     65 def read_column(context, column_spec, n_items, buf):
     66     column_options = {'context': context}
---> 67     column = get_column_by_spec(column_spec, column_options=column_options)
     68     return column.read_data(n_items, buf)
     69 

~/.pyenv/versions/jupyter3.6.4/lib/python3.6/site-packages/clickhouse_driver/columns/service.py in get_column_by_spec(spec, column_options)
     49 
     50     elif spec.startswith('Array'):
---> 51         return create_array_column(spec, create_column_with_options)
     52 
     53     elif spec.startswith('Nullable'):

~/.pyenv/versions/jupyter3.6.4/lib/python3.6/site-packages/clickhouse_driver/columns/arraycolumn.py in create_array_column(spec, column_by_spec_getter)
    186 def create_array_column(spec, column_by_spec_getter):
    187     inner = spec[6:-1]
--> 188     return ArrayColumn(column_by_spec_getter(inner))

~/.pyenv/versions/jupyter3.6.4/lib/python3.6/site-packages/clickhouse_driver/columns/service.py in create_column_with_options(x)
     37 
     38     def create_column_with_options(x):
---> 39         return get_column_by_spec(x, column_options)
     40 
     41     if spec.startswith('FixedString'):

~/.pyenv/versions/jupyter3.6.4/lib/python3.6/site-packages/clickhouse_driver/columns/service.py in get_column_by_spec(spec, column_options)
     60 
     61         except KeyError as e:
---> 62             raise errors.UnknownTypeError('Unknown type {}'.format(e.args[0]))
     63 
     64 

UnknownTypeError: Code: 50. Unknown type Tuple(Float64, Float64)

Support Interval Types

Recent ClickHouse supports intervals/timedeltas, but they have special types.

Query:

SELECT INTERVAL `int_col` DAY AS `tmp`
FROM ibis_testing.`functional_alltypes`
LIMIT 10000

Traceback:

../../.pyenv/versions/miniconda3-latest/envs/ml3/lib/python3.6/site-packages/clickhouse_driver/client.py:176: in execute
    types_check=types_check, columnar=columnar
../../.pyenv/versions/miniconda3-latest/envs/ml3/lib/python3.6/site-packages/clickhouse_driver/client.py:227: in process_ordinary_query
    columnar=columnar)
../../.pyenv/versions/miniconda3-latest/envs/ml3/lib/python3.6/site-packages/clickhouse_driver/client.py:75: in receive_result
    self.receive_no_progress_result(result, columnar)
../../.pyenv/versions/miniconda3-latest/envs/ml3/lib/python3.6/site-packages/clickhouse_driver/client.py:102: in receive_no_progress_result
    packet = self.receive_packet()
../../.pyenv/versions/miniconda3-latest/envs/ml3/lib/python3.6/site-packages/clickhouse_driver/client.py:133: in receive_packet
    packet = self.connection.receive_packet()
../../.pyenv/versions/miniconda3-latest/envs/ml3/lib/python3.6/site-packages/clickhouse_driver/connection.py:323: in receive_packet
    packet.block = self.receive_data()
../../.pyenv/versions/miniconda3-latest/envs/ml3/lib/python3.6/site-packages/clickhouse_driver/connection.py:386: in receive_data
    block = self.block_in.read()
../../.pyenv/versions/miniconda3-latest/envs/ml3/lib/python3.6/site-packages/clickhouse_driver/streams/native.py:77: in read
    column = read_column(column_type, n_rows, self.fin)
../../.pyenv/versions/miniconda3-latest/envs/ml3/lib/python3.6/site-packages/clickhouse_driver/columns/service.py:55: in read_column
    column = get_column_by_spec(column_spec)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

spec = 'IntervalDay', column_options = {}

    def get_column_by_spec(spec, column_options=None):
        column_options = column_options or {}

        def create_column_with_options(x):
            return get_column_by_spec(x, column_options)

        if spec.startswith('FixedString'):
            length = int(spec[12:-1])
            return FixedString(length)

        elif spec.startswith('Enum'):
            return create_enum_column(spec)

        elif spec.startswith('Array'):
            return create_array_column(spec, create_column_with_options)

        elif spec.startswith('Nullable'):
            return create_nullable_column(spec, create_column_with_options)

        else:
            try:
                cls = column_by_type[spec]
                return cls(**column_options)

            except KeyError as e:
>               raise errors.UnknownTypeError('Unknown type {}'.format(e.args[0]))
E               clickhouse_driver.errors.UnknownTypeError: Code: 50. Unknown type IntervalDay

../../.pyenv/versions/miniconda3-latest/envs/ml3/lib/python3.6/site-packages/clickhouse_driver/columns/service.py:51: UnknownTypeError
------------------------------------------------- Captured log call --------------------------------------------------
connection.py              174 INFO     Connecting. Database: ibis_testing. User: default
connection.py              275 INFO     Connected to ClickHouse server version 1.1.54310
connection.py              439 INFO     Query: SELECT INTERVAL `int_col` DAY AS `tmp`
FROM ibis_testing.`functional_alltypes`
LIMIT 10000
connection.py              414 DEBUG    Block send time: 8.8e-05

Extremely slow on large select, http protocol almost 10 times faster

It seems that selecting large datasets using the native client is extremely slow. Here is my benchmark https://gist.github.com/dmitriyshashkin/6a4849bdcf882ba340cdfbc1990da401

Initially, I've encountered this behavior on my own dataset, but I was able to reproduce it using the dataset and the structure described here https://clickhouse.yandex/docs/en/getting_started/example_datasets/ontime/

To simplify things a little bit I've used the data for just one month: http://transtats.bts.gov/PREZIP/On_Time_On_Time_Performance_2017_12.zip

As you can see, the fastest way to get the data is using the HTTP protocol with requests and pandas. The problem gets worse as the number of rows grows; on my own dataset with 5M rows I waited for 1 hour before I had to interrupt the process. The bottleneck is not ClickHouse itself: the "top" command shows that all the work is done by Python at 100% CPU utilization, while ClickHouse is almost idle.

Get progress info

It seems the Progress packets are received and managed, but there is no way to get the info from the Client or Connection objects. Here is an API proposal with fetch* methods, which are common in database APIs.

# quick way
c = Client('server')
c.execute("select * from table")
result = c.fetchall()

# get progress (and trash the results for an easy example)
c = Client('server')
c.execute("select * from verybigtable")
while c.fetchsome():
    print(c.progress)
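
Newer driver versions expose progress via Client.execute_with_progress; a sketch of that pattern (assuming a release that includes it):

from clickhouse_driver import Client

client = Client('localhost')

progress = client.execute_with_progress('SELECT * FROM verybigtable')
for num_rows, total_rows in progress:
    # Progress packets arrive while the query is running.
    print(num_rows, total_rows)

rows = progress.get_result()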

AttributeError: 'Client' object has no attribute 'execute_iter'

Hi,

looks like there is no execute_iter method in version '0.0.10'.

from clickhouse_driver import Client

client = Client(host='localhost', port=2102, database='shard01')

settings = {'max_block_size': 100000}

rows_gen = client.execute_iter('select * from query.TCC_S1', settings=settings)


---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input> in <module>()
      2 client = Client(host='localhost', port=2102, database='shard01')
      3 settings = {'max_block_size': 100000}
----> 4 rows_gen = client.execute_iter('select * from query.TCC_S1', settings=settings)

AttributeError: 'Client' object has no attribute 'execute_iter'

TooLargeStringSize: Code: 131.

I often get the error TooLargeStringSize: Code: 131 when inserting data into a table. How can I prevent it? I already tried inserting really small batches.

Better support for writing bytes

When writing a FixedString column, the data is first encoded to bytes using UTF-8 in Python. This can cause a problem when the data cannot round-trip through UTF-8, for instance ciphertext bytes generated by HMAC-SHA256. I would suggest adding a type check before encoding in https://github.com/mymarilyn/clickhouse-driver/blob/master/src/writer.py#L18 and https://github.com/mymarilyn/clickhouse-driver/blob/master/src/writer.py#L24: if the value is already bytes, just don't encode it. For example, hmac.new(b"done", msg=b"yes", digestmod="sha256").digest().decode('utf-8') throws "'utf-8' codec can't decode byte 0xcb in position 3".
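
A sketch of the suggested check (standalone, not the driver's actual writer code; write_varint mimics how the driver prefixes strings with their length):

from io import BytesIO

def write_varint(number, buf):
    # Minimal LEB128 varint writer for the length prefix.
    while True:
        towrite = number & 0x7f
        number >>= 7
        if number:
            buf.write(bytes((towrite | 0x80,)))
        else:
            buf.write(bytes((towrite,)))
            break

def write_binary_str(text, buf):
    # Suggested fix: only encode when given a str; pass raw bytes through.
    if not isinstance(text, bytes):
        text = text.encode('utf-8')
    write_varint(len(text), buf)
    buf.write(text)

buf = BytesIO()
write_binary_str(b'\xcb\x01\x02', buf)  # raw bytes survive unmodified
write_binary_str('plain text', buf)     # str is still encoded as UTF-8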

Not working under CentOS 7 after installing from pip

Hi,
I'm experiencing some strange issues with this module, while there are no issues with any other modules:

Installed from pip without issues:

# pip install clickhouse-driver[lz4]

Collecting clickhouse-driver[lz4]
  Using cached clickhouse-driver-0.0.6.tar.gz
Requirement already satisfied (use --upgrade to upgrade): six in /usr/lib/python2.7/site-packages (from clickhouse-driver[lz4])
Requirement already satisfied (use --upgrade to upgrade): enum34 in /usr/lib/python2.7/site-packages (from clickhouse-driver[lz4])
Requirement already satisfied (use --upgrade to upgrade): lz4 in /usr/lib64/python2.7/site-packages (from clickhouse-driver[lz4])
Collecting clickhouse-cityhash==1.0.2 (from clickhouse-driver[lz4])
  Using cached clickhouse-cityhash-1.0.2.tar.gz
Installing collected packages: clickhouse-cityhash, clickhouse-driver
  Running setup.py install for clickhouse-cityhash ... done
  Running setup.py install for clickhouse-driver ... done
Successfully installed clickhouse-cityhash-1.0.2 clickhouse-driver-0.0.6
# pip show clickhouse_driver

Name: clickhouse-driver
Version: 0.0.6
Summary: Python driver with native interface for ClickHouse
Home-page: https://github.com/mymarilyn/clickhouse-driver
Author: Konstantin Lebedev
Author-email: [email protected]
License: MIT
Location: /usr/lib/python2.7/site-packages
Requires: six, enum34
# ll /usr/lib/python2.7/site-packages/clickhouse_driver
total 164
-rw-r--r-- 1 root root  2155 Jun 30 18:36 block.py
-rw-r--r-- 1 root root  3109 Sep 28 17:42 block.pyc
-rw-r--r-- 1 root root   693 May 22 18:29 blockstreamprofileinfo.py
-rw-r--r-- 1 root root  1247 Sep 28 17:42 blockstreamprofileinfo.pyc
-rw-r--r-- 1 root root  2165 May 22 18:29 clientinfo.py
-rw-r--r-- 1 root root  2856 Sep 28 17:42 clientinfo.pyc
-rw-r--r-- 1 root root  7666 Sep 18 19:58 client.py
-rw-r--r-- 1 root root  7783 Sep 28 17:42 client.pyc
drwxr-xr-x 2 root root  4096 Sep 28 17:42 columns
drwxr-xr-x 2 root root  4096 Sep 28 17:42 compression
-rw-r--r-- 1 root root 11992 Sep 16 19:59 connection.py
-rw-r--r-- 1 root root 12551 Sep 28 17:42 connection.pyc
-rw-r--r-- 1 root root   596 Sep 16 20:24 defines.py
-rw-r--r-- 1 root root   885 Sep 28 17:42 defines.pyc
-rw-r--r-- 1 root root  2899 Sep 16 20:24 errors.py
-rw-r--r-- 1 root root  6561 Sep 28 17:42 errors.pyc
-rw-r--r-- 1 root root    68 Sep 19 18:00 __init__.py
-rw-r--r-- 1 root root   443 Sep 28 17:42 __init__.pyc
-rw-r--r-- 1 root root   522 May 22 18:29 progress.py
-rw-r--r-- 1 root root  1113 Sep 28 17:42 progress.pyc
-rw-r--r-- 1 root root  2002 Aug 31 18:09 protocol.py
-rw-r--r-- 1 root root  2645 Sep 28 17:42 protocol.pyc
-rw-r--r-- 1 root root   186 May 22 18:29 queryprocessingstage.py
-rw-r--r-- 1 root root   590 Sep 28 17:42 queryprocessingstage.pyc
-rw-r--r-- 1 root root  1686 Sep  4 20:49 reader.py
-rw-r--r-- 1 root root  3676 Sep 28 17:42 reader.pyc
-rw-r--r-- 1 root root   717 May 22 18:29 readhelpers.py
-rw-r--r-- 1 root root   945 Sep 28 17:42 readhelpers.pyc
drwxr-xr-x 2 root root  4096 Sep 28 17:42 settings
drwxr-xr-x 2 root root  4096 Sep 28 17:42 streams
drwxr-xr-x 2 root root  4096 Sep 28 17:42 util
-rw-r--r-- 1 root root  1770 Aug 31 18:09 writer.py
-rw-r--r-- 1 root root  3882 Sep 28 17:42 writer.pyc

Example code test.py

#! /usr/bin/env python

from clickhouse_driver.client import Client

ch_db = Client(host='clickhouse-host', database='test_db',
               compression=True)
query = 'select * from test_table'
result = ch_db.execute(query)

print result

Output:

Traceback (most recent call last):
  File "./test.py", line 3, in <module>
    from clickhouse_driver.client import Client
ImportError: No module named client

Insert NULL values for Nullable types

Hi, I have created a table with Nullable columns. How do I pass NULL values via bulk insert?

It seems there is no analogue value for NULL; None is not working atm.

Update: actually None is working, and it inserts NULL values :)

Set query settings

ClickHouse queries accept lots of settings, according to https://github.com/yandex/ClickHouse/blob/master/dbms/src/Interpreters/Settings.h#L30

If a setting is set to a value different from the default one, then it should be sent to the server.
It could be done in connection.send_query() before write_binary_str('', self.fout) # end of query settings

Example of use:

c = Client('server')
settings = Settings()
settings.replace_running_query = True
c.execute('SELECT 1', settings, query_id=a_query_id)
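
For reference, the released driver takes per-query settings as a plain dict; non-default values are sent to the server with the query:

from clickhouse_driver import Client

client = Client('localhost')
client.execute('SELECT 1', settings={'max_block_size': 100000})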

input_format_skip_unknown_fields setting seems to have no effect

According to the ClickHouse documentation, an exception should be raised if input_format_skip_unknown_fields is set to false:

https://clickhouse.yandex/docs/en/operations/settings/settings/#input_format_skip_unknown_fields

I can reproduce this behavior as expected with clickhouse-client, but clickhouse-driver seems to ignore this setting. In the example below, "z" is not part of the schema.

clickhouse-driver==0.0.10
ClickHouse server version 1.1.54362

for skip in (True, False):
    print('Skip Unknown:', skip)
    record = {'x': 1, 'y': 2, 'z': 3}
    client.execute('DROP TABLE IF EXISTS test')
    client.execute('CREATE TABLE test (x Int32, y Int32) ENGINE=Memory')
    client.execute('INSERT INTO test FORMAT JSONEachRow', [record], settings={'input_format_skip_unknown_fields': skip})
    print(client.execute('SELECT * FROM test'))

Output (no exception raised in either case):

('Skip Unknown:', True)
[(1, 2)]
('Skip Unknown:', False)
[(1, 2)]

clickhouse-client - skip unknown fields

$ clickhouse-client -n --query='DROP TABLE IF EXISTS test; CREATE TABLE test (x Int32, y Int32) ENGINE=Memory; set input_format_skip_unknown_fields=1; INSERT INTO test FORMAT JSONEachRow' <<< '{"x": 1, "y": 2, "z": 3}'
$ clickhouse-client -E --query='SELECT * FROM test'
Row 1:
──────
x: 1
y: 2

clickhouse-client - don't skip unknown fields

$ clickhouse-client -n --query='DROP TABLE IF EXISTS test; CREATE TABLE test (x Int32, y Int32) ENGINE=Memory; set input_format_skip_unknown_fields=0; INSERT INTO test FORMAT JSONEachRow' <<< '{"x": 1, "y": 2, "z": 3}'
Error on processing query: INSERT INTO test FORMAT JSONEachRow
Code: 117, e.displayText() = DB::Exception: Unknown field found while parsing JSONEachRow format: z, e.what() = DB::Exception, Stack trace:

0. clickhouse-client(StackTrace::StackTrace()+0x16) [0x4d32516]
1. clickhouse-client(DB::Exception::Exception(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int)+0x1f) [0x26ae89f]
2. clickhouse-client(DB::JSONEachRowRowInputStream::read(std::vector<COWPtr<DB::IColumn>::mutable_ptr<DB::IColumn>, std::allocator<COWPtr<DB::IColumn>::mutable_ptr<DB::IColumn> > >&)+0x1401) [0x4a16ee1]
3. clickhouse-client(DB::BlockInputStreamFromRowInputStream::readImpl()+0x88) [0x4c37388]
4. clickhouse-client(DB::IProfilingBlockInputStream::read()+0x237) [0x408f377]
5. clickhouse-client(DB::AsynchronousBlockInputStream::calculate(MemoryTracker*)+0x47) [0x26c1327]
6. clickhouse-client(ThreadPool::worker()+0x167) [0x4facd67]
7. clickhouse-client() [0x53f6f1f]
8. /lib64/libpthread.so.0(+0x7e25) [0x7f35d8f0fe25]
9. /lib64/libc.so.6(clone+0x6d) [0x7f35d82c534d]

Memory Overflow

I'm running ClickHouse in production, using the asyncio wrapper, but sometimes I get an issue when inserting into the database.

Job processing failed
job: <Job coro=<<coroutine object user_events_producer at 0x7f39382c1fc0>>>
Traceback (most recent call last):
  File "./app.py", line 189, in user_events_producer
    [e.data for e in events_queue]
  File "/usr/local/lib/python3.6/site-packages/aioch/client.py", line 92, in execute
    **kwargs)
  File "/usr/local/lib/python3.6/concurrent/futures/thread.py", line 56, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.6/site-packages/clickhouse_driver/client.py", line 159, in execute
    self.connection.force_connect()
  File "/usr/local/lib/python3.6/site-packages/clickhouse_driver/connection.py", line 124, in force_connect
    elif not self.ping():
  File "/usr/local/lib/python3.6/site-packages/clickhouse_driver/connection.py", line 303, in ping
    packet_type = read_varint(self.fin)
  File "/usr/local/lib/python3.6/site-packages/clickhouse_driver/reader.py", line 29, in read_varint
    i = _read_one(f)
  File "/usr/local/lib/python3.6/site-packages/clickhouse_driver/reader.py", line 14, in _read_one
    c = f.read(1)
info ValueError: PyMemoryView_FromBuffer(): info->buf must not be NULL
Job processing failed
job: <Job coro=<<coroutine object user_events_producer at 0x7f3939175308>>>
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/clickhouse_driver/client.py", line 169, in execute
    types_check=types_check
  File "/usr/local/lib/python3.6/site-packages/clickhouse_driver/client.py", line 237, in process_insert_query
    sample_block = self.receive_sample_block()
  File "/usr/local/lib/python3.6/site-packages/clickhouse_driver/client.py", line 251, in receive_sample_block
    raise packet.exception
clickhouse_driver.errors.ServerException: Code: 101.
DB::NetException. DB::NetException: Unexpected packet Query received from client. Stack trace:
ingestion-server
0. /usr/bin/clickhouse-server(StackTrace::StackTrace()+0x15) [0x84eb065]
1. /usr/bin/clickhouse-server(DB::Exception::Exception(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int)+0x21) [0x2ccbe01]
2. /usr/bin/clickhouse-server(DB::TCPHandler::receivePacket()+0x22e) [0x2cdb7de]
3. /usr/bin/clickhouse-server(DB::TCPHandler::readData(DB::Settings const&)+0x1c2) [0x2cdbb42]
4. /usr/bin/clickhouse-server(DB::TCPHandler::processInsertQuery(DB::Settings const&)+0x205) [0x2cdbee5]
5. /usr/bin/clickhouse-server(DB::TCPHandler::runImpl()+0x476) [0x2cdc566]
6. /usr/bin/clickhouse-server(DB::TCPHandler::run()+0x2a) [0x2cdd3ba]
7. /usr/bin/clickhouse-server(Poco::Net::TCPServerConnection::start()+0xe) [0x86d53de]
8. /usr/bin/clickhouse-server(Poco::Net::TCPServerDispatcher::run()+0x169) [0x86d57b9]
9. /usr/bin/clickhouse-server(Poco::PooledThread::run()+0x76) [0x8776976]
10. /usr/bin/clickhouse-server(Poco::ThreadImpl::runnableEntry(void*)+0x37) [0x8772b87]
11. /usr/bin/clickhouse-server() [0x8dd397e]
12. /lib/x86_64-linux-gnu/libpthread.so.0(+0x76da) [0x7f650da8d6da]
13. /lib/x86_64-linux-gnu/libc.so.6(clone+0x3e) [0x7f650d21488e]
ingestion-server

I don't know if the issue is the server or the client.

Dates are lower by 1

date.fromtimestamp is supposed to take a local timestamp not a UTC one. I am in a timezone that is behind UTC, so all dates are 1 day behind.

>>> datetime.date.fromtimestamp(0)
datetime.date(1969, 12, 31)

To fix this you need to use datetime.datetime.utcfromtimestamp().date() instead.

Current:

return date.fromtimestamp(value * self.offset)

Proposed:

return datetime.datetime.utcfromtimestamp(value * self.offset).date()

Wrong DateTime insert

Hi!

After inserting datetime.datetime(2018, 1, 19, 10) through this driver, I see the value '2018-01-19 13:00:00' in the table. The timezone on both my computer and the ClickHouse server is Moscow.

What must I do to see '2018-01-19 10:00:00' after the insert?

Accessing rows_before_limit property through API

The BlockStreamProfileInfo.rows_before_limit property is useful to get the rows count for pagination without running an extra query, but there does not seem to be any way to access it through the API (i.e. the client silently ignores the PROFILE_INFO packet).

As a workaround I hacked together a small change in Client.receive_packet where it saves the last packet.profile_info in the Client instance and we can use it like this:

client = Client()
rows = client.execute('SELECT ...')
total_count = client.profile_info.rows_before_limit

So it's not pretty, but it seems to work fine. Anyway, I think it would make sense to have this in the main API without additional hacks, but I'm not sure where the best place to put it is... Do you have any thoughts on that? Or are there perhaps additional caveats that caused this to be left out of the API scope?

DBAPI Support

Thanks for the hard work on this great project.

Does this driver already implement the DB API? If not, is there a plan to?

Return column type spec

Currently the client returns type names.

It would be great to get the type spec instead, including the nullable flag and/or the inner type spec.

How to Insert Date/DateTime from python list or tuple object?

I have list data like this:

[['2018-10-10', 600]]

table schema:

CREATE TABLE example (
    TestDate Date,
    TestValue Int32
) ENGINE = TinyLog

When I use clickhouse-driver to insert data into the table, a data type error occurs:

Code: 53. Type mismatch in VALUES section. Expected Date got <class 'str'>: 2018-10-10 for column "TestDate".

How do I solve it? Thanks. (A conversion sketch follows.)
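
The driver expects Python date/datetime objects for Date columns, so converting the strings first should work; a sketch against the table above:

from datetime import datetime

from clickhouse_driver import Client

client = Client('localhost')

data = [['2018-10-10', 600]]
# Convert each date string into a datetime.date before inserting.
rows = [(datetime.strptime(d, '%Y-%m-%d').date(), v) for d, v in data]

client.execute('INSERT INTO example (TestDate, TestValue) VALUES', rows)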

Nothing type

Recent ClickHouse has a new type, Nothing, which is currently unhandled by clickhouse-driver.

https://github.com/yandex/ClickHouse/blob/630d731ee35b55cfe2f3debfe92c1c21bbe3f50b/dbms/tests/queries/0_stateless/00420_null_in_scalar_subqueries.sql

Selecting Null raises:

/tmp/workspace/miniconda/envs/ibis_2.7/lib/python2.7/site-packages/clickhouse_driver/columns/nullablecolumn.py:5: in create_nullable_column
    nested = column_by_spec_getter(inner)
/tmp/workspace/miniconda/envs/ibis_2.7/lib/python2.7/site-packages/clickhouse_driver/columns/service.py:30: in create_column_with_options
    return get_column_by_spec(x, column_options)
/tmp/workspace/miniconda/envs/ibis_2.7/lib/python2.7/site-packages/clickhouse_driver/columns/service.py:51: in get_column_by_spec
    raise errors.UnknownTypeError('Unknown type {}'.format(e.args[0]))
E   UnknownTypeError: Code: 50. Unknown type Nothing
