
cqlkit's Introduction

CQLKIT

cqlkit is a CLI tool for exporting Cassandra query results to CSV and JSON. Cassandra is not good at ad-hoc queries; cqlkit lets you export query results as semi-structured (JSON) or structured (CSV) data, and there are many tools out there for querying or processing these formats.

Here are some simple examples.

Export JSON for the system columns in a Cassandra cluster.

cql2json -q "select peer, data_center, host_id, preferred_ip, rack, release_version from system.peers"

Export CSV for the system columns in a Cassandra cluster.

cql2csv -q "select peer, data_center, host_id, preferred_ip, rack, release_version from system.peers"

Requirement

  • Java 8

Installation

General

  1. Download from the release page.
  2. Unzip the package.
  3. Add $CQLKIT_PATH/bin to the PATH environment variable.
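
The steps above can be sketched in shell; the archive and install location here are hypothetical, so adjust them to the actual release file:

```shell
# Hypothetical paths: adjust the archive name and target directory to your release.
# unzip cqlkit-2.0.2.zip -d "$HOME/opt"      # step 2 (commented: needs the download)
export CQLKIT_PATH="$HOME/opt/cqlkit"        # step 3: point at the unzipped directory
export PATH="$CQLKIT_PATH/bin:$PATH"         # prepend its bin/ to PATH
```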

Mac

Install cqlkit via Homebrew.

brew update
brew install cqlkit

Upgrade cqlkit

brew update
brew upgrade cqlkit

Docker

Run cqlkit via Docker.

 docker run --rm -it tenmax/cqlkit

Usage

CQL2CSV

usage: cql2csv [-c contactpoint] [-r cassandraPort] [-q query] [FILE]
File       The file to use as CQL query. If both FILE and QUERY are
           omitted, query will be read from STDIN.

 -c <arg>                            The contact point. To use multiple
                                     contact points, separate them with
                                     ','.
    --connect-timeout <arg>          Connection timeout in seconds;
                                     default: 5
    --consistency <LEVEL>            The consistency level. The level
                                     should be 'any', 'one', 'two',
                                     'three', 'quorum', 'all',
                                     'local_quorum', 'each_quorum',
                                     'serial' or 'local_serial'.
    --cqlshrc <arg>                  Use an alternative cqlshrc file
                                     location, path.
    --date-format <arg>              Use a custom date format. Default is
                                     "yyyy-MM-dd'T'HH:mm:ss.SSSZ"
    --fetchSize <SIZE>               The fetch size. Default is 5000
 -h,--help                           Show the help and exit
 -H,--no-header-row                  Do not output column names.
 -k <arg>                            The keyspace to use.
 -l,--linenumbers                    Insert a column of line numbers at
                                     the front of the output. Useful when
                                     piping to grep or as a simple primary
                                     key.
 -p <arg>                            The password to authenticate.
 -r <arg>                            The port to connect to Cassandra, defaults to 9042.
 -P,--parallel <arg>                 The level of parallelism to run the
                                     task. Default is sequential.
 -q,--query <CQL>                    The CQL query to execute. If
                                     specified, it overrides FILE and
                                     STDIN.
    --query-partition-keys <TABLE>   Query the partition key(s) for a
                                     column family.
    --query-ranges <CQL>             The CQL query will be split by the
                                     token ranges. A WHERE clause is not
                                     allowed in the CQL query.
    --request-timeout <arg>          Request timeout in seconds; default:
                                     12
 -u <arg>                            The user to authenticate.
 -v,--version                        Print the version
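
As the usage text notes, the query can also come from FILE or STDIN instead of -q. A quick sketch (the contact point and file names are placeholders; the cql2csv calls are shown commented since they need a live cluster):

```shell
# Write the CQL into a file, then feed it to cql2csv either as FILE or via STDIN.
printf 'SELECT peer, rack FROM system.peers;\n' > peers.cql
# cql2csv -c 127.0.0.1 peers.cql > peers.csv      # query read from FILE
# cql2csv -c 127.0.0.1 < peers.cql > peers.csv    # query read from STDIN
```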

CQL2JSON

usage: cql2json [-c contactpoint] [-r cassandraPort] [-q query] [FILE]
File       The file to use as CQL query. If both FILE and QUERY are
           omitted, query will be read from STDIN.

 -c <arg>                            The contact point. To use multiple
                                     contact points, separate them with
                                     ','.
    --connect-timeout <arg>          Connection timeout in seconds;
                                     default: 5
    --consistency <LEVEL>            The consistency level. The level
                                     should be 'any', 'one', 'two',
                                     'three', 'quorum', 'all',
                                     'local_quorum', 'each_quorum',
                                     'serial' or 'local_serial'.
    --cqlshrc <arg>                  Use an alternative cqlshrc file
                                     location, path.
    --date-format <arg>              Use a custom date format. Default is
                                     "yyyy-MM-dd'T'HH:mm:ss.SSSZ"
    --fetchSize <SIZE>               The fetch size. Default is 5000
 -h,--help                           Show the help and exit
 -j,--json-columns <arg>             The columns that contain JSON
                                     strings. The content will be emitted
                                     as JSON objects instead of plain
                                     text. Columns are separated by
                                     commas.
 -k <arg>                            The keyspace to use.
 -l,--linenumbers                    Insert a column of line numbers at
                                     the front of the output. Useful when
                                     piping to grep or as a simple primary
                                     key.
 -p <arg>                            The password to authenticate.
 -r <arg>                            The port to connect to Cassandra, defaults to 9042.
 -P,--parallel <arg>                 The level of parallelism to run the
                                     task. Default is sequential.
 -q,--query <CQL>                    The CQL query to execute. If
                                     specified, it overrides FILE and
                                     STDIN.
    --query-partition-keys <TABLE>   Query the partition key(s) for a
                                     column family.
    --query-ranges <CQL>             The CQL query will be split by the
                                     token ranges. A WHERE clause is not
                                     allowed in the CQL query.
    --request-timeout <arg>          Request timeout in seconds; default:
                                     12
 -u <arg>                            The user to authenticate.
 -v,--version                        Print the version

cqlsh

Setup the cqlshrc

Although you can use -c and -k to specify the contact point and keyspace when connecting to a Cassandra cluster, preparing a cqlshrc file is recommended to simplify your commands. cqlshrc is the configuration file used by cqlsh; cqlkit leverages it to connect to your cluster. Here are the setup steps.

  1. Create the cqlshrc file at ~/.cassandra/cqlshrc

  2. Here is an example of the format.

    [authentication]
    keyspace = system
    
    [connection]
    hostname = 192.168.59.103
    port = 9042
    
    ; vim: set ft=dosini :
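
The --cqlshrc flag listed in the usage above points cqlkit at an alternative cqlshrc location, which is handy for testing. A minimal sketch (the hostname is a placeholder; the cql2csv call is commented since it needs a live cluster):

```shell
# Write a minimal cqlshrc to a temporary location.
printf '[connection]\nhostname = 192.168.59.103\nport = 9042\n' > /tmp/cqlshrc
# cql2csv --cqlshrc /tmp/cqlshrc -q "select peer from system.peers"
```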

Import data from a CSV file

$ cql2csv -q "select text_col from ks.tbl" > example.csv

$ ./cqlsh localhost
cqlsh> COPY ks.tbl FROM 'example.csv' WITH ESCAPE='"' AND HEADER=TRUE

Recommended 3rd Party Tools

  • csvkit - A toolkit to handle CSV files. There are many useful CLI tools included.

  • q - Another CSV tool which focuses on query on CSV files.

  • json2csv - Convert JSON format to CSV format

  • jq - a lightweight and flexible command-line JSON processor.
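
Since cql2json emits one JSON object per line, it pairs naturally with jq. A sketch, assuming jq is installed (the cql2json call is commented because it needs a live cluster; its sample output is simulated with printf):

```shell
# cql2json -q "select peer from system.peers" | jq -r '.peer'
# The pipeline extracts one field per row; simulated here with sample records:
printf '{"peer":"10.0.0.1"}\n{"peer":"10.0.0.2"}\n' | jq -r '.peer'
```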

cqlkit's People

Contributors

balajivenki, cloudtu, lordsuricato, lorenzo, phstudy, pkgajulapalli, popcornylu


cqlkit's Issues

Don't print the full stack trace on error output

As a command-line tool, it's not necessary to dump the application stack trace if an error occurs.
Just output the error message and set the appropriate exit code (the user shouldn't even know that the application was written in Java). If you need to debug the program itself, then accept a --debug option.

add multi contact points support

If you have multiple contact points, you can specify them in either of the two ways below.

  1. On the command line:
cql2json -c contactPoint1,contactPoint2 --consistency quorum -k myKeyspace -q "select * from myTable limit 1"
  2. In cqlshrc:
[authentication]
keyspace = system

[connection]
hostname = contactPoint1,contactPoint2
port = 9042

; vim: set ft=dosini :

Use ',' to separate multiple contact points.

cql2csv doesn't work, but cql2json does

When I run:

cql2csv -c cas-01.company.net -q "select * from <key_space>.<table_name>" > table.csv

I only get the header row.

If I run cql2json with the same arguments, I get the entire table as expected.

Consistency Level

Is there a way to set the consistency level on the connection for this tool? I need to be able to set QUORUM or ALL to get full export. Without this, I get different row counts on each run.

No License?

Hello,

This repo does not appear to come with a license?

Could you advise what license this is released under if any?

Regards,
Tim

Build script

I downloaded the project and imported it into IntelliJ IDEA. When I compile with the Gradle build and try to execute it, it says "the main class has not been found or loaded". What am I doing wrong? Is there another way to compile it? I ask because I need to implement another "custom" data type.

Thank you very much

cql2json does not wrap rows in container object

While cql2json outputs JSON for individual rows, it doesn't wrap all rows in a container object, making the command much less useful for piping the result to additional utilities that expect JSON input.

> cql2json -c localhost -k items -q "select id from items"
{"id":"anv79n"}
{"id":"hcp5yn"}

> cql2json -c localhost -k items -q "select id from items" | json_pp
garbage after JSON object, at character offset 17 (before ""id":"hcp5yn"}\n") at /usr/bin/json_pp5.18 line 45.
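
A workaround (not part of cqlkit itself, and assuming jq is installed) is to slurp the line-delimited records into a single JSON array with jq -s:

```shell
# cql2json -c localhost -k items -q "select id from items" | jq -s '.'
# Simulated here with the sample rows from above:
printf '{"id":"anv79n"}\n{"id":"hcp5yn"}\n' | jq -s '.'
```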

Running command without args should display help

It's not very user friendly that running the program without args immediately throws an exception.
If no query is specified by any of the 3 methods, the program should display the help before even trying to connect to a cluster.

> cql2json
Exception in thread "main" java.lang.IllegalArgumentException: Cannot build a cluster without contact points
    at com.datastax.driver.core.Cluster.checkNotEmpty(Cluster.java:118)
    at com.datastax.driver.core.Cluster.<init>(Cluster.java:110)
    at com.datastax.driver.core.Cluster.buildFrom(Cluster.java:179)
    at com.datastax.driver.core.Cluster$Builder.build(Cluster.java:1190)
    at io.tenmax.cqlkit.SessionFactory.<init>(SessionFactory.java:60)
    at io.tenmax.cqlkit.SessionFactory.newInstance(SessionFactory.java:91)
    at io.tenmax.cqlkit.AbstractMapper.run(AbstractMapper.java:215)
    at io.tenmax.cqlkit.AbstractMapper.start(AbstractMapper.java:117)
    at io.tenmax.cqlkit.CQL2JSON.main(CQL2JSON.java:107)

Not able to connect to Cassandra, throwing exception com.datastax.driver.core.exceptions.InvalidQueryException: unconfigured table schema_keyspaces

Both cql2json and cql2csv are throwing this exception and I'm not able to get results.
./cql2json -q 'select * from films_property'
Exception in thread "main" com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /30.0.3.217:9042 (com.datastax.driver.core.exceptions.InvalidQueryException: unconfigured table schema_keyspaces))
at com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:227)
at com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:82)
at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:1307)
at com.datastax.driver.core.Cluster.init(Cluster.java:159)
at com.datastax.driver.core.Cluster.connect(Cluster.java:249)
at io.tenmax.cqlkit.SessionFactory.<init>(SessionFactory.java:62)
at io.tenmax.cqlkit.SessionFactory.newInstance(SessionFactory.java:91)
at io.tenmax.cqlkit.AbstractMapper.run(AbstractMapper.java:215)
at io.tenmax.cqlkit.AbstractMapper.start(AbstractMapper.java:117)
at io.tenmax.cqlkit.CQL2JSON.main(CQL2JSON.java:107)

Different value formats between cqlsh and cqlkit

First of all thank you for creating this handy tool!

Having the following table:
user | order | created | deleted | score | progress
-----------------------+------------------------+---------------------------------+---------+-------+-----------
mM0CJ0N2QBGOUife8Jv3Tg | 0GrxPSpj9tmcqEghLmtufg | 2020-04-02 09:03:16.933000+0000 | null | 0 | null

Where:

  • user and order are text columns
  • created and deleted are timestamp columns
  • score and progress are double columns

the exported CSV data for that row using cqlkit will look like this:
mM0CJ0N2QBGOUife8Jv3Tg,0GrxPSpj9tmcqEghLmtufg,2020-04-02 11:03:16.933+0200,,0.0,NULL

but using the COPY TO command in cqlsh the same row will look like this:
mM0CJ0N2QBGOUife8Jv3Tg,0GrxPSpj9tmcqEghLmtufg,2020-04-02 09:03:16.933+0000,,0,

As you can see there are differences in the outputs of cqlsh and cqlkit.

A double value looks like:
cqlsh : 0
cqlkit: 0.0

A null double:
cqlsh : ''
cqlkit: NULL

A timestamp:
cqlsh: 2020-04-02 09:03:16.933+0000
cqlkit: 2020-04-02 11:03:16.933+0200

It would be really handy to have an option in cqlkit to export the data in the same text format as cqlsh does. This becomes especially useful when you need to import data exported by cqlkit using the COPY FROM command in cqlsh.
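
In the meantime, the --date-format option can at least approximate cqlsh's timestamp text. A sketch (the format string and the UTC trick are assumptions, not a verified match for every type):

```shell
# Format string is a guess: it mirrors the default "yyyy-MM-dd'T'HH:mm:ss.SSSZ"
# but uses the space separator cqlsh prints; run under TZ=UTC so offsets are +0000.
FMT="yyyy-MM-dd HH:mm:ss.SSSZ"
# TZ=UTC cql2csv --date-format "$FMT" -q "select created from ks.tbl" > out.csv
```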

unconfigured table

C:>cql2json -q "select * from system.schema_columns"
Error: unconfigured table schema_columns

cql2csv error mismatched input

When running a basic "SELECT * ... LIMIT 100" query, I'm getting:

com.datastax.driver.core.exceptions.SyntaxError: line 0:-1 mismatched input '<EOF>' expecting K_FROM
        at com.datastax.driver.core.exceptions.SyntaxError.copy(SyntaxError.java:58)
        at com.datastax.driver.core.exceptions.SyntaxError.copy(SyntaxError.java:24)
        at com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:37)
        at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:245)
        at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:68)
        at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:43)
        at io.tenmax.cqlkit.AbstractMapper.run(AbstractMapper.java:306)
        at io.tenmax.cqlkit.AbstractMapper.start(AbstractMapper.java:136)
        at io.tenmax.cqlkit.CQL2CSV.main(CQL2CSV.java:107)
Caused by: com.datastax.driver.core.exceptions.SyntaxError: line 0:-1 mismatched input '<EOF>' expecting K_FROM
        at com.datastax.driver.core.Responses$Error.asException(Responses.java:143)
        at com.datastax.driver.core.DefaultResultSetFuture.onSet(DefaultResultSetFuture.java:179)
        at com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:198)
        at com.datastax.driver.core.RequestHandler.access$2600(RequestHandler.java:50)
        at com.datastax.driver.core.RequestHandler$SpeculativeExecution.setFinalResult(RequestHandler.java:852)
        at com.datastax.driver.core.RequestHandler$SpeculativeExecution.onSet(RequestHandler.java:686)
        at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:1089)
        at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:1012)
        at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
        at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:287)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
        at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
        at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:312)
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:286)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
        at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1294)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:911)
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
        at java.lang.Thread.run(Thread.java:750)
