
Comments (10)

xzkostyan commented on May 10, 2024

Hi, @kszucs!

Integration with different data sources is a good idea. But it should be in wrapper packages, not in the core package. There are many sources to integrate with. Interfaces for communication can change really fast. Unfortunately I'm not familiar with all data sources and I can't stay in touch with these changes.

You can write your own wrapper and publish it on PyPI. Feel free to ask if you need any information about integration. I'll try to help.

Note: the latest release exposes the driver version:

from clickhouse_driver import VERSION

It might help with flawless integration.
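
One way a wrapper package might use this is to gate features on the driver version. A minimal sketch, assuming VERSION is a tuple of integers such as (0, 0, 8); the helper name and the minimum version used here are hypothetical, chosen only for illustration:

```python
def driver_is_at_least(version, minimum):
    """Return True if a version tuple is >= the required minimum tuple."""
    # Tuples compare element by element, so (0, 0, 8) >= (0, 0, 7) holds.
    return tuple(version) >= tuple(minimum)


# In a real wrapper the tuple would come from:
#     from clickhouse_driver import VERSION
# A literal stands in here so the sketch runs without the driver installed.
installed = (0, 0, 8)
print(driver_is_at_least(installed, (0, 0, 7)))
```

A wrapper could call such a check once at import time and raise a clear error if the installed driver is too old.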

from clickhouse-driver.

kszucs commented on May 10, 2024

Hi @xzkostyan

Actually, I'm trying to implement a columnar version of QueryResult, but there are a couple of inconsistencies in the Block implementation. AFAIK, when receiving, block.data stores column-wise data, whereas when sending it contains row-wise data.

I'm kinda blocked because the tests are running really slowly (I don't know why), so instead I'll share my findings: master...kszucs:columnar_block


xzkostyan commented on May 10, 2024

Hi, @kszucs.

Yup, you are right. There are some inconsistencies in how Block.data is stored:

  • When you issue an INSERT query, the data to insert is stored in the block row-wise.
  • On a SELECT statement, the data received from CH is stored in Block.data column-wise.

That's why the get_rows method is used to transpose the received column-wise data to row-wise. This behavior should be split into different block types later.
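
The transposition described above can be sketched in plain Python. This is only an illustration of converting between the two layouts with zip, not the driver's actual get_rows implementation; the function names are hypothetical:

```python
def columns_to_rows(columns):
    """Transpose column-wise data (one list per column) into row tuples."""
    return list(zip(*columns))


def rows_to_columns(rows):
    """Transpose row-wise data (one tuple per row) into per-column lists."""
    return [list(column) for column in zip(*rows)]


# Column-wise layout, as received on SELECT: two columns, three values each.
columns = [[1, 2, 3], ['a', 'b', 'c']]
rows = columns_to_rows(columns)
print(rows)  # [(1, 'a'), (2, 'b'), (3, 'c')]

# Going back yields the original column-wise layout.
print(rows_to_columns(rows) == columns)  # True
```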

You can check this branch: https://github.com/mymarilyn/clickhouse-driver/tree/feature-deferred-rows-length-validation. There are some speed optimizations on SELECT.

If you want to do some research on performance, you can use the following profiling snippets (run them in IPython, since %prun is an IPython magic):

# Profile a SELECT.
from clickhouse_driver import Client
c = Client('localhost')
%prun c.execute('SELECT * FROM large_table')

# Profile an INSERT; N is the number of rows to insert and must be defined first.
from clickhouse_driver import Client
c = Client('localhost')
%prun c.execute('INSERT INTO test (a, b, c) VALUES', [(x, x, x) for x in range(N)])


xzkostyan commented on May 10, 2024

If you only need to implement a columnar version of QueryResult, you can implement a get_columns method that picks the raw block.data. After that, you can iteratively .extend() the result with this data.

That's it.
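
A rough sketch of that idea, assuming each received block exposes its data as a list of columns. The class and method names below are hypothetical and do not reflect the driver's real QueryResult:

```python
class ColumnarResult:
    """Accumulates column-wise data from successive blocks (illustrative only)."""

    def __init__(self):
        self.columns = []

    def store(self, block_columns):
        # block_columns stands in for the raw column-wise block.data.
        if not self.columns:
            # First block: copy each column so later extends do not
            # mutate the caller's lists.
            self.columns = [list(column) for column in block_columns]
            return
        # Later blocks: append each column's values to the matching column.
        for accumulated, column in zip(self.columns, block_columns):
            accumulated.extend(column)


result = ColumnarResult()
result.store([[1, 2], ['a', 'b']])
result.store([[3], ['c']])
print(result.columns)  # [[1, 2, 3], ['a', 'b', 'c']]
```

Extending per column avoids transposing to rows and back, which is the point of keeping the result columnar.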


kszucs commented on May 10, 2024

I've created a PR according to your comment.


kszucs commented on May 10, 2024

@xzkostyan would you mind drafting a new release? I'd like to use the columnar result extending fix here.


xzkostyan commented on May 10, 2024

Sure! I'll make a new release on Saturday or Sunday.


kszucs commented on May 10, 2024

Great! Thanks Kostya!


xzkostyan commented on May 10, 2024

Hi, @kszucs!

Version 0.0.8 is released.


kszucs commented on May 10, 2024

Eventually pandas interop will be released in ibis, so I'm closing this ticket now. Thanks!

