
cqlkit's Introduction

CQLKIT

cqlkit is a CLI tool for exporting Cassandra query results to CSV and JSON. Cassandra is not good at ad-hoc queries; cqlkit lets you export query results as semi-structured (JSON) or structured (CSV) data, and there are many tools out there for querying or processing these formats.

Here are some simple examples.

Export JSON for the system columns in a Cassandra cluster.

cql2json -q "select peer, data_center, host_id, preferred_ip, rack, release_version from system.peers"

Export CSV for the system columns in a Cassandra cluster.

cql2csv -q "select peer, data_center, host_id, preferred_ip, rack, release_version from system.peers"

Requirement

  • Java 8

Installation

General

  1. Download from the release page.
  2. Unzip the package.
  3. Add $CQLKIT_PATH/bin to the PATH environment variable.
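
The steps above can be sketched in shell; the archive and install location here are hypothetical, so adjust them to the actual release file:

```shell
# Hypothetical paths: adjust the archive name and target directory to your release.
# unzip cqlkit-2.0.2.zip -d "$HOME/opt"      # step 2 (commented: needs the download)
export CQLKIT_PATH="$HOME/opt/cqlkit"        # step 3: point at the unzipped directory
export PATH="$CQLKIT_PATH/bin:$PATH"         # prepend its bin/ to PATH
```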

Mac

Install cqlkit via Homebrew.

brew update
brew install cqlkit

Upgrade cqlkit

brew update
brew upgrade cqlkit

Docker

Run cqlkit via Docker.

 docker run --rm -it tenmax/cqlkit

Usage

CQL2CSV

usage: cql2csv [-c contactpoint] [-r cassandraPort] [-q query] [FILE]
File       The file to use as CQL query. If both FILE and QUERY are
           omitted, query will be read from STDIN.

 -c <arg>                            The contact point. To use multiple
                                     contact points, separate them with
                                     ','.
    --connect-timeout <arg>          Connection timeout in seconds;
                                     default: 5
    --consistency <LEVEL>            The consistency level. The level
                                     should be 'any', 'one', 'two',
                                     'three', 'quorum', 'all',
                                     'local_quorum', 'each_quorum',
                                     'serial' or 'local_serial'.
    --cqlshrc <arg>                  Use an alternative cqlshrc file
                                     location, path.
    --date-format <arg>              Use a custom date format. Default is
                                     "yyyy-MM-dd'T'HH:mm:ss.SSSZ"
    --fetchSize <SIZE>               The fetch size. Default is 5000
 -h,--help                           Show the help and exit
 -H,--no-header-row                  Do not output column names.
 -k <arg>                            The keyspace to use.
 -l,--linenumbers                    Insert a column of line numbers at
                                     the front of the output. Useful when
                                     piping to grep or as a simple primary
                                     key.
 -p <arg>                            The password to authenticate.
 -r <arg>                            The port to connect to Cassandra, defaults to 9042.
 -P,--parallel <arg>                 The level of parallelism to run the
                                     task. Default is sequential.
 -q,--query <CQL>                    The CQL query to execute. If
                                     specified, it overrides FILE and
                                     STDIN.
    --query-partition-keys <TABLE>   Query the partition key(s) for a
                                     column family.
    --query-ranges <CQL>             The CQL query will be split by the
                                     token ranges. A WHERE clause is not
                                     allowed in the CQL query.
    --request-timeout <arg>          Request timeout in seconds; default:
                                     12
 -u <arg>                            The user to authenticate.
 -v,--version                        Print the version
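
As the usage text notes, the query can also come from FILE or STDIN instead of -q. A quick sketch (the contact point and file names are placeholders; the cql2csv calls are shown commented since they need a live cluster):

```shell
# Write the CQL into a file, then feed it to cql2csv either as FILE or via STDIN.
printf 'SELECT peer, rack FROM system.peers;\n' > peers.cql
# cql2csv -c 127.0.0.1 peers.cql > peers.csv      # query read from FILE
# cql2csv -c 127.0.0.1 < peers.cql > peers.csv    # query read from STDIN
```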

CQL2JSON

usage: cql2json [-c contactpoint] [-r cassandraPort] [-q query] [FILE]
File       The file to use as CQL query. If both FILE and QUERY are
           omitted, query will be read from STDIN.

 -c <arg>                            The contact point. To use multiple
                                     contact points, separate them with
                                     ','.
    --connect-timeout <arg>          Connection timeout in seconds;
                                     default: 5
    --consistency <LEVEL>            The consistency level. The level
                                     should be 'any', 'one', 'two',
                                     'three', 'quorum', 'all',
                                     'local_quorum', 'each_quorum',
                                     'serial' or 'local_serial'.
    --cqlshrc <arg>                  Use an alternative cqlshrc file
                                     location, path.
    --date-format <arg>              Use a custom date format. Default is
                                     "yyyy-MM-dd'T'HH:mm:ss.SSSZ"
    --fetchSize <SIZE>               The fetch size. Default is 5000
 -h,--help                           Show the help and exit
 -j,--json-columns <arg>             The columns that contain JSON
                                     strings. The content will be emitted
                                     as JSON objects instead of plain
                                     text. Columns are separated by
                                     commas.
 -k <arg>                            The keyspace to use.
 -l,--linenumbers                    Insert a column of line numbers at
                                     the front of the output. Useful when
                                     piping to grep or as a simple primary
                                     key.
 -p <arg>                            The password to authenticate.
 -r <arg>                            The port to connect to Cassandra, defaults to 9042.
 -P,--parallel <arg>                 The level of parallelism to run the
                                     task. Default is sequential.
 -q,--query <CQL>                    The CQL query to execute. If
                                     specified, it overrides FILE and
                                     STDIN.
    --query-partition-keys <TABLE>   Query the partition key(s) for a
                                     column family.
    --query-ranges <CQL>             The CQL query will be split by the
                                     token ranges. A WHERE clause is not
                                     allowed in the CQL query.
    --request-timeout <arg>          Request timeout in seconds; default:
                                     12
 -u <arg>                            The user to authenticate.
 -v,--version                        Print the version

cqlsh

Setup the cqlshrc

Although you can use -c and -k to specify the contact point and keyspace when connecting to a Cassandra cluster, preparing a cqlshrc file is recommended to simplify your commands. cqlshrc is the configuration file used by cqlsh; cqlkit leverages it to connect to your cluster. Here are the setup steps.

  1. Create the cqlshrc file at ~/.cassandra/cqlshrc

  2. Here is an example of the format.

    [authentication]
    keyspace = system
    
    [connection]
    hostname = 192.168.59.103
    port = 9042
    
    ; vim: set ft=dosini :
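
The --cqlshrc flag listed in the usage above points cqlkit at an alternative cqlshrc location, which is handy for testing. A minimal sketch (the hostname is a placeholder; the cql2csv call is commented since it needs a live cluster):

```shell
# Write a minimal cqlshrc to a temporary location.
printf '[connection]\nhostname = 192.168.59.103\nport = 9042\n' > /tmp/cqlshrc
# cql2csv --cqlshrc /tmp/cqlshrc -q "select peer from system.peers"
```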

Import data from a CSV file

$ cql2csv -q "select text_col from ks.tbl" > example.csv

$ ./cqlsh localhost
cqlsh> COPY ks.tbl FROM 'example.csv' WITH ESCAPE='"' AND HEADER=TRUE

Recommended 3rd Party Tools

  • csvkit - A toolkit to handle CSV files. There are many useful CLI tools included.

  • q - Another CSV tool which focuses on query on CSV files.

  • json2csv - Convert JSON format to CSV format

  • jq - a lightweight and flexible command-line JSON processor.
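
Since cql2json emits one JSON object per line, it pairs naturally with jq. A sketch, assuming jq is installed (the cql2json call is commented because it needs a live cluster; its sample output is simulated with printf):

```shell
# cql2json -q "select peer from system.peers" | jq -r '.peer'
# The pipeline extracts one field per row; simulated here with sample records:
printf '{"peer":"10.0.0.1"}\n{"peer":"10.0.0.2"}\n' | jq -r '.peer'
```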

cqlkit's People

Contributors

balajivenki, cloudtu, lordsuricato, lorenzo, phstudy, pkgajulapalli, popcornylu


cqlkit's Issues

Don't print the full stack trace on error output

As a command-line tool, it's not necessary to dump the application stack trace if an error occurs.
Just output the error message and set the appropriate exit code (the user shouldn't even know that the application was written in Java). If you need to debug the program itself, then accept a --debug option.

add multi contact points support

If you have multiple contact points, you can specify them in either of the two ways below.

  1. On the command line:
cql2json -c contactPoint1,contactPoint2 --consistency quorum -k myKeyspace -q "select * from myTable limit 1"
  2. In cqlshrc:
[authentication]
keyspace = system

[connection]
hostname = contactPoint1,contactPoint2
port = 9042

; vim: set ft=dosini :

Use ',' to separate multiple contact points.

cql2csv doesn't work, but cql2json does

When I run:

cql2csv -c cas-01.company.net -q "select * from <key_space>.<table_name>" > table.csv

I only get the header row.

If I run cql2json with the same arguments, I get the entire table as expected.

Consistency Level

Is there a way to set the consistency level on the connection for this tool? I need to be able to set QUORUM or ALL to get full export. Without this, I get different row counts on each run.

No License?

Hello,

This repo does not appear to come with a license?

Could you advise what license this is released under if any?

Regards,
Tim

Build script

I downloaded the project and imported it into IntelliJ IDEA. When I compile with the Gradle build and try to execute it, it says "the main class has not been found or loaded". What am I doing wrong? Is there another way to compile it? I ask because I need to implement another "custom" data type.

Thank you very much

cql2json does not wrap rows in container object

While cql2json outputs JSON for individual rows, it doesn't wrap all rows in a container object, making the command much less useful for piping the result to additional utilities that expect JSON input.

> cql2json -c localhost -k items -q "select id from items"
{"id":"anv79n"}
{"id":"hcp5yn"}

> cql2json -c localhost -k items -q "select id from items" | json_pp
garbage after JSON object, at character offset 17 (before ""id":"hcp5yn"}\n") at /usr/bin/json_pp5.18 line 45.
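
A workaround (not part of cqlkit itself, and assuming jq is installed) is to slurp the line-delimited records into a single JSON array with jq -s:

```shell
# cql2json -c localhost -k items -q "select id from items" | jq -s '.'
# Simulated here with the sample rows from above:
printf '{"id":"anv79n"}\n{"id":"hcp5yn"}\n' | jq -s '.'
```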

Running command without args should display help

It's not very user friendly that running the program without args immediately throws an exception.
If no query is specified by any of the 3 methods, the program should display the help before even trying to connect to a cluster.

> cql2json
Exception in thread "main" java.lang.IllegalArgumentException: Cannot build a cluster without contact points
    at com.datastax.driver.core.Cluster.checkNotEmpty(Cluster.java:118)
    at com.datastax.driver.core.Cluster.<init>(Cluster.java:110)
    at com.datastax.driver.core.Cluster.buildFrom(Cluster.java:179)
    at com.datastax.driver.core.Cluster$Builder.build(Cluster.java:1190)
    at io.tenmax.cqlkit.SessionFactory.<init>(SessionFactory.java:60)
    at io.tenmax.cqlkit.SessionFactory.newInstance(SessionFactory.java:91)
    at io.tenmax.cqlkit.AbstractMapper.run(AbstractMapper.java:215)
    at io.tenmax.cqlkit.AbstractMapper.start(AbstractMapper.java:117)
    at io.tenmax.cqlkit.CQL2JSON.main(CQL2JSON.java:107)

Not able to connect to Cassandra, throwing exception com.datastax.driver.core.exceptions.InvalidQueryException: unconfigured table schema_keyspaces

Both cql2json and cql2csv are throwing this exception and I'm not able to get results.
./cql2json -q 'select * from films_property'
Exception in thread "main" com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /30.0.3.217:9042 (com.datastax.driver.core.exceptions.InvalidQueryException: unconfigured table schema_keyspaces))
at com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:227)
at com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:82)
at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:1307)
at com.datastax.driver.core.Cluster.init(Cluster.java:159)
at com.datastax.driver.core.Cluster.connect(Cluster.java:249)
at io.tenmax.cqlkit.SessionFactory.<init>(SessionFactory.java:62)
at io.tenmax.cqlkit.SessionFactory.newInstance(SessionFactory.java:91)
at io.tenmax.cqlkit.AbstractMapper.run(AbstractMapper.java:215)
at io.tenmax.cqlkit.AbstractMapper.start(AbstractMapper.java:117)
at io.tenmax.cqlkit.CQL2JSON.main(CQL2JSON.java:107)

Different value formats between cqlsh and cqlkit

First of all thank you for creating this handy tool!

Having the following table:
user | order | created | deleted | score | progress
-----------------------+------------------------+---------------------------------+---------+-------+-----------
mM0CJ0N2QBGOUife8Jv3Tg | 0GrxPSpj9tmcqEghLmtufg | 2020-04-02 09:03:16.933000+0000 | null | 0 | null

Where:

  • user and order are text columns
  • created and deleted are timestamp columns
  • score and progress are double columns

the exported CSV data for that row using cqlkit will look like this:
mM0CJ0N2QBGOUife8Jv3Tg,0GrxPSpj9tmcqEghLmtufg,2020-04-02 11:03:16.933+0200,,0.0,NULL

but using the COPY TO command in cqlsh the same row will look like this:
mM0CJ0N2QBGOUife8Jv3Tg,0GrxPSpj9tmcqEghLmtufg,2020-04-02 09:03:16.933+0000,,0,

As you can see there are differences in the outputs of cqlsh and cqlkit.

A double value looks like:
cqlsh : 0
cqlkit: 0.0

A null double:
cqlsh : ''
cqlkit: NULL

A timestamp:
cqlsh: 2020-04-02 09:03:16.933+0000
cqlkit: 2020-04-02 11:03:16.933+0200

It would be really handy to have an option in cqlkit to export the data in the same text format as cqlsh does. This becomes especially useful when you need to import data exported by cqlkit using the COPY FROM command in cqlsh.
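
In the meantime, the --date-format option can at least approximate cqlsh's timestamp text. A sketch (the format string and the UTC trick are assumptions, not a verified match for every type):

```shell
# Format string is a guess: it mirrors the default "yyyy-MM-dd'T'HH:mm:ss.SSSZ"
# but uses the space separator cqlsh prints; run under TZ=UTC so offsets are +0000.
FMT="yyyy-MM-dd HH:mm:ss.SSSZ"
# TZ=UTC cql2csv --date-format "$FMT" -q "select created from ks.tbl" > out.csv
```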

unconfigured table

C:>cql2json -q "select * from system.schema_columns"
Error: unconfigured table schema_columns

cql2csv error mismatched input

When running a basic "SELECT * ... LIMIT 100" query, I'm getting:

com.datastax.driver.core.exceptions.SyntaxError: line 0:-1 mismatched input '<EOF>' expecting K_FROM
        at com.datastax.driver.core.exceptions.SyntaxError.copy(SyntaxError.java:58)
        at com.datastax.driver.core.exceptions.SyntaxError.copy(SyntaxError.java:24)
        at com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:37)
        at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:245)
        at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:68)
        at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:43)
        at io.tenmax.cqlkit.AbstractMapper.run(AbstractMapper.java:306)
        at io.tenmax.cqlkit.AbstractMapper.start(AbstractMapper.java:136)
        at io.tenmax.cqlkit.CQL2CSV.main(CQL2CSV.java:107)
Caused by: com.datastax.driver.core.exceptions.SyntaxError: line 0:-1 mismatched input '<EOF>' expecting K_FROM
        at com.datastax.driver.core.Responses$Error.asException(Responses.java:143)
        at com.datastax.driver.core.DefaultResultSetFuture.onSet(DefaultResultSetFuture.java:179)
        at com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:198)
        at com.datastax.driver.core.RequestHandler.access$2600(RequestHandler.java:50)
        at com.datastax.driver.core.RequestHandler$SpeculativeExecution.setFinalResult(RequestHandler.java:852)
        at com.datastax.driver.core.RequestHandler$SpeculativeExecution.onSet(RequestHandler.java:686)
        at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:1089)
        at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:1012)
        at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
        at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:287)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
        at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
        at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:312)
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:286)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
        at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1294)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:911)
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
        at java.lang.Thread.run(Thread.java:750)
