spark-cassandra-csv's Introduction

Spark CSV Loader for Cassandra

An example tool that uses Spark to load a CSV file into Cassandra. Pull requests and issues are welcome!

Spark CSV Loader 1.0
Usage: sparkcsvexample [options] filename keyspace table mapping [master] [cassandraIp]

  filename
        Filename to read, csv, ex.(file:///temp/file.csv). If no locator uri it provided will look in Hadoop DefaultFS (CFS on DSE)
  keyspace
        Keyspace to save to
  table
        Table to save to
  mapping
        A file containing the names of the Cassandra columns that the csv columns should map to, comma-delimited
  master
        Spark Address of Master Node, Default runs `dsetool sparkmaster` to find master
  cassandraIp
        Ip Address of Cassandra Server, Default uses Spark Master IP address
  -m <value> | --maxcores <value>
        Number of cores to use by this application
  -x <value> | --executormemory <value>
        Amount of memory for each executor (JVM Style Strings)
  -v | --verify
        Run verification checks after inserting data
  --help
        CLI Help

This tool is designed to work with standalone Apache Spark and Cassandra clusters as well as DataStax Cassandra/Spark clusters.

Requirements

(DSE > 4.5.2 or Apache C* > 2.0.5 ) and Spark > 0.9.1

Building the project

To build, go to the home directory of the project and run:

./sbt/sbt assembly

This will produce a fat-jar at target/scala-2.10/spark-csv-assembly-1.0.jar, which needs to be included in any running Spark job. It contains the references to the anonymous functions that Spark will use when running.

Creating the Example Keyspace and Table

This application assumes that the target keyspace and table already exist. To create the table used in the examples below, run the following commands in cqlsh.

CREATE KEYSPACE ks WITH replication = {
  'class': 'SimpleStrategy',
  'replication_factor': '1'
};

USE ks;

CREATE TABLE tab (
  key int,
  data1 int,
  data2 int,
  data3 int,
  PRIMARY KEY ((key))
);
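For the table above, a matching input file and mapping file might look like the following minimal sketch. The file names sampleCsv and sampleMapping and the data values are hypothetical, chosen only for illustration. The mapping file is assumed to be a single comma-delimited line naming the Cassandra column for each CSV column, as described in the usage text.

```shell
# Hypothetical sample input: three rows, one value per table column.
printf '%s\n' '1,10,20,30' '2,11,21,31' '3,12,22,32' > sampleCsv

# Hypothetical mapping file: CSV columns 1-4 map to key, data1, data2, data3.
printf '%s\n' 'key,data1,data2,data3' > sampleMapping
```

These files could then be passed as the filename and mapping arguments, in place of exampleCsv and exampleMapping in the run commands.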

Running with DataStax Enterprise

When running on a DataStax Enterprise cluster with Spark enabled, the app can be run with the included run.sh script. This includes the fat-jar referenced above on the classpath for the dse spark-class call and runs the application. Running this way will pick up your spark-env.sh file and place the logs in your predefined locations.

##example
./run.sh -m 4 file://`pwd`/exampleCsv ks tab exampleMapping
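Because each CSV column maps to one name in the mapping file, a column-count mismatch is an easy mistake to catch before submitting a job. The following is a rough pre-flight sketch, not part of the tool. The check_columns function is hypothetical, and it assumes a single-line mapping file and CSV fields that contain no quoted commas.

```shell
# Hypothetical helper: verify every CSV row has as many fields as the mapping.
# Assumes the mapping is one comma-delimited line and no field contains a comma.
check_columns() {
  csv="$1"
  mapping="$2"
  # Number of column names on the first line of the mapping file.
  expected=$(head -n 1 "$mapping" | awk -F',' '{ print NF }')
  # Report each row whose field count differs; exit non-zero if any did.
  awk -F',' -v n="$expected" '
    NF != n { printf "line %d has %d fields, expected %d\n", NR, NF, n; bad = 1 }
    END { exit bad }
  ' "$csv"
}
```

For example, `check_columns exampleCsv exampleMapping` returns a non-zero status and lists the offending lines if any row has the wrong number of fields.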

Running with Apache Cassandra

We can run directly from sbt using

#Note that here we need to specify the spark master uri and cassandra ip, otherwise
#the program will try to use DataStax Enterprise to pick up these values
./sbt/sbt "run -m 4 file://`pwd`/exampleCsv ks tab exampleMapping spark://127.0.0.1:7077 127.0.0.1"    

Contributors

russellspitzer
