Cassie

Cassie is a small, lightweight Cassandra client built on Finagle, with all that Finagle provides plus column name/value encoding and decoding.

It is heavily used in production at Twitter, so it should be considered stable, yet it is incomplete in that it doesn't support the full feature set of Cassandra and will continue to evolve.

Requirements

  • Java SE 6
  • Scala 2.8
  • Cassandra 0.8 or later
  • sbt 0.7

Note that Cassie is usable from Java. It's not super easy, but we're working to make it easier.

Let's Get This Party Started

In your simple-build-tool project file, add Cassie as a dependency:

val twttr = "Twitter's Repository" at "http://maven.twttr.com/"
val cassie = "com.twitter" % "cassie" % "0.19.0"

Finagle

Before going further, you should probably learn about Finagle and its paradigm for asynchronous computing: https://github.com/twitter/finagle.

Connecting To Your Cassandra Cluster

First, create a Cluster object, passing in a list of seed hosts. By default, when creating a connection to a Keyspace, the given hosts will be queried for a full list of nodes in the cluster. If you don't want to report stats, use NullStatsReceiver.

val cluster = new Cluster("host1,host2", new OstrichStatsReceiver)

Then create a Keyspace instance which will use Finagle to maintain per-node connection pools and do retries:

val keyspace = cluster.keyspace("MyCassieApp").connect()
// see KeyspaceBuilder for more options here. Try the defaults first.

(If you have some nodes with dramatically different latency, e.g., in another data center, or if you have a huge cluster, you can disable keyspace mapping via "mapHostsEvery(0.minutes)", in which case clients will connect directly to the seed hosts passed to "new Cluster".)

A Quick Note On Timestamps

Cassandra uses client-generated timestamps to determine the order in which writes and deletes should be processed. Cassie previously came with several different clock implementations. Now all Cassie users use the MicrosecondEpochClock and timestamps should be mostly hidden from users.
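To make the idea of a client-generated microsecond timestamp concrete, here is a minimal sketch of a clock that never hands out the same value twice. The name SketchMicrosecondClock and its logic are assumptions for illustration only; this is not Cassie's MicrosecondEpochClock source.

```scala
import java.util.concurrent.atomic.AtomicLong

// Hypothetical sketch, not Cassie's MicrosecondEpochClock implementation.
object SketchMicrosecondClock {
  private val last = new AtomicLong(0L)

  // Microseconds since the Unix epoch, derived from the millisecond wall clock.
  private def nowMicros(): Long = System.currentTimeMillis() * 1000L

  // Returns a timestamp strictly greater than any value previously returned
  // by this object, even if the wall clock has not advanced or went backwards.
  def timestamp(): Long = {
    var result = 0L
    var done = false
    while (!done) {
      val prev = last.get()
      val candidate = math.max(nowMicros(), prev + 1)
      if (last.compareAndSet(prev, candidate)) {
        result = candidate
        done = true
      }
    }
    result
  }
}
```

Two writes issued back to back from the same process would then carry distinct, increasing timestamps, which is what lets Cassandra order them.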

A Longer Note, This Time On Column Names And Values

Cassandra stores the name and value of a column as an array of bytes. To convert these bytes to and from useful Scala types, Cassie uses Codec parameters for the given type.

For example, take adding a column to a column family of UTF-8 strings:

val strings = keyspace.columnFamily[Utf8Codec, Utf8Codec, Utf8Codec]
strings.insert("newstring", Column("colname", "colvalue"))

The insert method here requires a String and Column[String, String] because the type parameters of the columnFamily call were all Codec[String]. The conversion between Strings and ByteArrays will be seamless. Cassie has codecs for a number of data types already:

  • Utf8Codec: character sequence encoded with UTF-8
  • IntCodec: 32-bit integer stored as a 4-byte sequence
  • LongCodec: 64-bit integer stored as an 8-byte sequence
  • LexicalUUIDCodec: a UUID stored as a 16-byte sequence
  • ThriftCodec: a Thrift struct stored as a variable-length sequence of bytes
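The following is a rough sketch of what a codec does, assuming a simple encode/decode pair over java.nio.ByteBuffer. The names SketchCodec, SketchUtf8Codec and SketchLongCodec are hypothetical stand-ins; Cassie's real Codec classes may have a different shape.

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets

// Hypothetical stand-ins for illustration; Cassie's real Codec classes may differ.
trait SketchCodec[A] {
  def encode(value: A): ByteBuffer
  def decode(bytes: ByteBuffer): A
}

object SketchUtf8Codec extends SketchCodec[String] {
  def encode(value: String): ByteBuffer =
    ByteBuffer.wrap(value.getBytes(StandardCharsets.UTF_8))

  def decode(bytes: ByteBuffer): String = {
    val copy = bytes.duplicate() // leave the caller's buffer position untouched
    val raw = new Array[Byte](copy.remaining())
    copy.get(raw)
    new String(raw, StandardCharsets.UTF_8)
  }
}

object SketchLongCodec extends SketchCodec[Long] {
  // 64-bit integer stored as an 8-byte big-endian sequence.
  def encode(value: Long): ByteBuffer = {
    val buf = ByteBuffer.allocate(8)
    buf.putLong(value)
    buf.flip()
    buf
  }

  def decode(bytes: ByteBuffer): Long = bytes.duplicate().getLong()
}
```

Round-tripping a value through encode and then decode should give back the original, which is the property that makes the conversion invisible to callers.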

Accessing Column Families

Once you've got a Keyspace instance, you can load your column families:

val people  = keyspace.columnFamily[Utf8Codec, Utf8Codec, Utf8Codec]("People")
val numbers = keyspace.columnFamily[Utf8Codec, Utf8Codec, IntCodec]("People",
                defaultReadConsistency = ReadConsistency.One,
                defaultWriteConsistency = WriteConsistency.Any)

By default, ColumnFamily instances have a default ReadConsistency and WriteConsistency of Quorum, meaning reads and writes will only be considered successful if a quorum of the replicas for that key respond successfully. You can change this default or simply pass a different consistency level to specific read and write operations.
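As a small worked example of what "a quorum of the replicas" means, the sketch below computes the quorum size for a given replication factor. QuorumSketch is a hypothetical helper written for this explanation and is not part of Cassie's API.

```scala
// Hypothetical helper for illustration; not part of Cassie's API.
object QuorumSketch {
  // A quorum is a strict majority of the replicas: floor(rf / 2) + 1.
  def quorum(replicationFactor: Int): Int = {
    require(replicationFactor > 0, "replication factor must be positive")
    replicationFactor / 2 + 1
  }

  // A read is guaranteed to touch at least one replica that acknowledged
  // the latest write when readReplicas + writeReplicas > replicationFactor.
  def overlaps(readReplicas: Int, writeReplicas: Int, replicationFactor: Int): Boolean =
    readReplicas + writeReplicas > replicationFactor
}
```

With a replication factor of 3, a quorum is 2 replicas, so quorum reads and quorum writes overlap (2 + 2 > 3), whereas reading from one replica and writing to one replica does not (1 + 1 is not greater than 3).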

Reading Data From Cassandra

Now that you've got your ColumnFamily, you can read some data from Cassandra:

people.getColumn("codahale", "name")

getColumn returns a Future[Option[Column[Name, Value]]], where Name and Value are the type parameters of the ColumnFamily. If the row or column doesn't exist, None is returned. Explaining Futures is out of scope for this README; go to the Finagle docs to learn more. But in essence you can do this:

people.getColumn("codahale", "name") map {
  case Some(col) => // we have data (col is a Column[String, String])
  case None      => // there was no column
} handle {
  case e => // there was an exception, do something about it
}

This whole block returns a Future which will be satisfied when the Thrift RPC is done and the callbacks have run.

Anyway, continuing: you can also get a set of columns:

people.getColumns("codahale", Set("name", "motto"))

This returns a Future[java.util.Map[Name, Column[Name, Value]]], where each column is mapped by its name.

If you want to get all columns of a row, that's cool too:

people.getRow("codahale")

Cassie also supports multiget for columns and sets of columns:

people.multigetColumn(Set("codahale", "darlingnikles"), "name")
people.multigetColumns(Set("codahale", "darlingnikles"), Set("name", "motto"))

multigetColumn returns a Future[Map[Key, Map[Name, Column[Name, Value]]]] which maps row keys to column names to columns.

Asynchronous Iteration Through Rows and Columns

NOTE: This is new/experimental and likely to change in the future.

Cassie provides functionality for iterating through the rows of a column family and the columns in a row. This works with both the random partitioner and the order-preserving partitioner, though iterating through rows with the random partitioner has undefined order.

You can iterate over every column of every row:

val finished = cf.rowsIteratee(100).foreach { case (key, columns) =>
  println(key) // this function is executed asynchronously for each row
  println(columns)
}
finished() // finished is a Future[Unit]; wait on it to know when the iteration is done

This gets 100 rows at a time and calls the above partial function on each one.

Writing Data To Cassandra

Inserting columns is pretty easy:

people.insert("codahale", Column("name", "Coda"))
people.insert("codahale", Column("motto", "Moar lean."))

You can insert a value with a specific timestamp:

people.insert("darlingnikles", Column("name", "Niki").timestamp(200L))
people.insert("darlingnikles", Column("motto", "Told ya.").timestamp(201L))

Batch operations are also possible:

people.batch() { cf =>
  cf.insert("puddle", Column("name", "Puddle"))
  cf.insert("puddle", Column("motto", "Food!"))
}.execute()

(See BatchMutationBuilder for a better idea of which operations are available.)

Deleting Data From Cassandra

First, it's important to understand exactly how deletes work in a distributed system like Cassandra.

Once you've read that, then feel free to remove a column:

people.removeColumn("puddle", "name")

Or a set of columns:

people.removeColumns("puddle", Set("name", "motto"))

Or even a row:

people.removeRow("puddle")

Generating Unique IDs

If you're going to be storing data in Cassandra and don't have a naturally unique piece of data to use as a key, you've probably looked into UUIDs. The only problem with UUIDs is that they're mental, requiring access to MAC addresses or Gregorian calendars or POSIX ids. In general, people want UUIDs which are:

  • Unique across a large set of workers without requiring coordination.
  • Partially ordered by time.

Cassie's LexicalUUIDs meet these criteria. They're 128 bits long. The most significant 64 bits are a timestamp value (from Cassie's strictly-increasing Clock implementation). The least significant 64 bits are a worker ID, with the default value being a hash of the machine's hostname.

When sorted using Cassandra's LexicalUUIDType, LexicalUUIDs will be partially ordered by time -- that is, UUIDs generated in order on a single process will be totally ordered by time; UUIDs generated simultaneously (i.e., within the same clock tick, given clock skew) will not have a deterministic order; UUIDs generated in order between single processes (i.e., in different clock ticks, given clock skew) will be totally ordered by time.
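The layout and ordering described above can be sketched as follows. SketchLexicalUUID is a hypothetical class written for this explanation, assuming a big-endian 16-byte encoding and a signed comparison of the two 64-bit halves; it is not Cassie's LexicalUUID implementation.

```scala
import java.nio.ByteBuffer

// Hypothetical sketch of the 128-bit layout described above; not Cassie's LexicalUUID class.
case class SketchLexicalUUID(timestamp: Long, workerId: Long) {
  // 16 bytes: timestamp in the most significant 8 bytes, worker ID in the least significant 8.
  def toBytes: Array[Byte] = {
    val buf = ByteBuffer.allocate(16)
    buf.putLong(timestamp)
    buf.putLong(workerId)
    buf.array()
  }
}

object SketchLexicalUUID {
  // Order by timestamp first; the worker ID only breaks ties.
  implicit val ordering: Ordering[SketchLexicalUUID] =
    Ordering.by((u: SketchLexicalUUID) => (u.timestamp, u.workerId))

  def fromBytes(bytes: Array[Byte]): SketchLexicalUUID = {
    require(bytes.length == 16, "expected exactly 16 bytes")
    val buf = ByteBuffer.wrap(bytes)
    SketchLexicalUUID(buf.getLong(), buf.getLong())
  }
}
```

Because the timestamp occupies the high-order bits, sorting a collection of these values groups them by time, and two IDs minted in the same tick on different workers are ordered only by their worker IDs, which carries no temporal meaning.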

See Lamport. Time, clocks, and the ordering of events in a distributed system. Communications of the ACM (1978) vol. 21 (7) pp. 565 and Mattern. Virtual time and global states of distributed systems. Parallel and Distributed Algorithms (1989) pp. 215–226 for a more thorough discussion.

Things What Ain't Done Yet

  • Authentication
  • Meta data (e.g., describe_*)

Thanks

Many thanks to (pre-Twitter fork):

  • Cliff Moon
  • James Golick
  • Robert J. Macomber

License

Copyright (c) 2010 Coda Hale
Copyright (c) 2011-2012 Twitter, Inc.

Published under the Apache 2.0 License; see LICENSE.

cassie's Issues

SBT build fails

Hi there,

I am trying to include the library in a simple project and am getting these dependency resolution exceptions:

sbt.ResolveException: download failed: javax.jms#jms;1.1!jms.jar
download failed: com.sun.jdmk#jmxtools;1.2.1!jmxtools.jar
download failed: com.sun.jmx#jmxri;1.2.1!jmxri.jar

Following is my build.sbt file:

name := "scala-cassandra-spike"

organization := "ardlema"

version := "0.0.1"

scalaVersion := "2.10.3"

val twitterRepo = "Twitter's Repository" at "http://maven.twttr.com/"
val cassie = "com.twitter" % "cassie" % "0.19.0"

resolvers ++= Seq(twitterRepo)

libraryDependencies ++= Seq(
"org.scalacheck" %% "scalacheck" % "1.10.0" % "test" withSources() withJavadoc(),
cassie
)

initialCommands := "import ardlema.scalacassandraspike._"

Thank you in advance

Readme document issues

val twttr = "Twitter's Repository" at "http://maven.twttr.com/"
val cassie = "com.twitter" % "cassie" % "0.19.0"

then later:

val cluster = new Cluster("host1,host2", OstrichStatsReceiver)

This doesn't work because:

  • 0.19.0 and its dependencies do not contain an "OstrichStatsReceiver" class (I had to use 0.20.0)
  • the correct syntax would be
val cluster = new Cluster("host1,host2", new OstrichStatsReceiver)

(the "new" keyword is missing in the example)

And since we're here, I see that the current source version is 0.25.0-SNAPSHOT. But the Twitter repository at http://maven.twttr.com/com/twitter/cassie/ only contains versions up to 0.20.0, and according to the other open issues, the sources do not compile outside the Twitter environment.

Not good.

Allow people to compile this library...

This repo isn't usable outside of Twitter.

[~/Code/cassie]$ mvn
[INFO] Scanning for projects...
[ERROR] The build could not read 4 projects -> [Help 1]
[ERROR]   
[ERROR]   The project com.twitter:cassie-core:0.25.1-SNAPSHOT (/Users/leepa/Code/cassie/cassie-core/pom.xml) has 1 error
[ERROR]     Non-resolvable parent POM: Failure to find com.twitter:scala-parent-292:pom:0.0.4 in http://repo1.maven.org/maven2 was cached in the local repository, resolution will not be reattempted until the update interval of central has elapsed or updates are forced and 'parent.relativePath' points at wrong local POM @ line 8, column 11 -> [Help 2]
[ERROR]   
[ERROR]   The project com.twitter:cassie-hadoop:0.25.1-SNAPSHOT (/Users/leepa/Code/cassie/cassie-hadoop/pom.xml) has 1 error
[ERROR]     Non-resolvable parent POM: Failure to find com.twitter:scala-parent-292:pom:0.0.4 in http://repo1.maven.org/maven2 was cached in the local repository, resolution will not be reattempted until the update interval of central has elapsed or updates are forced and 'parent.relativePath' points at wrong local POM @ line 8, column 11 -> [Help 2]
[ERROR]   
[ERROR]   The project com.twitter:cassie-serversets:0.25.1-SNAPSHOT (/Users/leepa/Code/cassie/cassie-serversets/pom.xml) has 1 error
[ERROR]     Non-resolvable parent POM: Failure to find com.twitter:scala-parent-292:pom:0.0.4 in http://repo1.maven.org/maven2 was cached in the local repository, resolution will not be reattempted until the update interval of central has elapsed or updates are forced and 'parent.relativePath' points at wrong local POM @ line 8, column 11 -> [Help 2]
[ERROR]   
[ERROR]   The project com.twitter:cassie-stress:0.25.1-SNAPSHOT (/Users/leepa/Code/cassie/cassie-stress/pom.xml) has 1 error
[ERROR]     Non-resolvable parent POM: Failure to find com.twitter:scala-parent-292:pom:0.0.4 in http://repo1.maven.org/maven2 was cached in the local repository, resolution will not be reattempted until the update interval of central has elapsed or updates are forced and 'parent.relativePath' points at wrong local POM @ line 8, column 11 -> [Help 2]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/ProjectBuildingException
[ERROR] [Help 2] http://cwiki.apache.org/confluence/display/MAVEN/UnresolvableModelException

all threads are not stopped on Keyspace.close()

val c = new Cluster(host(0), port).mapHostsEvery(0.seconds)
val keyspace = c.keyspace(repo).connect()
keyspace.close()

Running this on "com.twitter" % "cassie-core" % "0.25.2" results in 100% CPU utilisation by some background threads after close() is called.

Non-resolvable parent POM

The parent POMs are not resolvable.

[INFO] Scanning for projects...
Downloading: http://nexus.xwiki.org/nexus/content/groups/public/com/twitter/scala-parent-292/0.0.2/scala-parent-292-0.0.2.pom
Downloading: http://repo1.maven.org/maven2/com/twitter/scala-parent-292/0.0.2/scala-parent-292-0.0.2.pom
[ERROR] The build could not read 4 projects -> [Help 1]
[ERROR]   
[ERROR]   The project com.twitter:cassie-core:0.22.1-SNAPSHOT (/Users/tcurdt/Downloads/cassie/cassie-core/pom.xml) has 1 error
[ERROR]     Non-resolvable parent POM: Could not find artifact com.twitter:scala-parent-292:pom:0.0.2 in xwiki-releases (http://nexus.xwiki.org/nexus/content/groups/public) and 'parent.relativePath' points at wrong local POM @ line 8, column 11 -> [Help 2]
[ERROR]   

SBT build fails

I have a CompositeCodec I'd like to contribute, but the SBT build fails with the following output:

[info] == cassie-core / compile ==
[info] Source analysis: 39 new/modified, 44 indirectly invalidated, 0 removed.
[info] Compiling main sources...
[error] Note: Some input files use unchecked or unsafe operations.
[error] Note: Recompile with -Xlint:unchecked for details.
[error] /Users/rstrickland/workspace/cassie/cassie-core/src/main/scala/com/twitter/cassie/BaseColumnFamily.scala:48: value setTerminalId is not a member of object com.twitter.finagle.tracing.Trace
[error] Trace.setTerminalId(Trace.nextId)
[error] ^
[error] /Users/rstrickland/workspace/cassie/cassie-core/src/main/scala/com/twitter/cassie/BatchMutationBuilder.scala:62: value Void is not a member of object com.twitter.util.Future
[error] Future.Void
[error] ^
[error] /Users/rstrickland/workspace/cassie/cassie-core/src/main/scala/com/twitter/cassie/Cluster.scala:56: not found: value collectionAsScalaIterable
[error] this(collectionAsScalaIterable(seedHosts).toSet, 9160, NullStatsReceiver)
[error] ^
[error] /Users/rstrickland/workspace/cassie/cassie-core/src/main/scala/com/twitter/cassie/ClusterRemapper.scala:58: not found: value collectionAsScalaIterable
[error] collectionAsScalaIterable(h.endpoints).map {
[error] ^
[error] /Users/rstrickland/workspace/cassie/cassie-core/src/main/scala/com/twitter/cassie/ColumnFamily.scala:184: not found: value collectionAsScalaIterable
[error] for (rowEntry <- collectionAsScalaIterable(rows.entrySet))
[error] ^
[error] /Users/rstrickland/workspace/cassie/cassie-core/src/main/scala/com/twitter/cassie/ColumnFamily.scala:231: not found: value collectionAsScalaIterable
[error] for (rowEntry <- collectionAsScalaIterable(result.entrySet)) {
[error] ^
[error] /Users/rstrickland/workspace/cassie/cassie-core/src/main/scala/com/twitter/cassie/ColumnFamily.scala:284: not found: value collectionAsScalaIterable
[error] for (key <- collectionAsScalaIterable(result.keySet)) {
[error] ^
[error] /Users/rstrickland/workspace/cassie/cassie-core/src/main/scala/com/twitter/cassie/CounterBatchMutationBuilder.scala:49: value Void is not a member of object com.twitter.util.Future
[error] Future.Void
[error] ^
[error] /Users/rstrickland/workspace/cassie/cassie-core/src/main/scala/com/twitter/cassie/CounterColumnFamily.scala:166: not found: value collectionAsScalaIterable
[error] for (rowEntry <- collectionAsScalaIterable(rows.entrySet))
[error] ^
[error] /Users/rstrickland/workspace/cassie/cassie-core/src/main/scala/com/twitter/cassie/CounterColumnFamily.scala:198: not found: value collectionAsScalaIterable
[error] for (rowEntry <- collectionAsScalaIterable(result.entrySet)) {
[error] ^
[error] /Users/rstrickland/workspace/cassie/cassie-core/src/main/scala/com/twitter/cassie/Keyspace.scala:76: value Void is not a member of object com.twitter.util.Future
[error] if (batches.size == 0) return Future.Void
[error] ^
[error] /Users/rstrickland/workspace/cassie/cassie-core/src/main/scala/com/twitter/cassie/SuperCounterBatchMutationBuilder.scala:33: value Void is not a member of object com.twitter.util.Future
[error] Future.Void
[error] ^
[error] /Users/rstrickland/workspace/cassie/cassie-core/src/main/scala/com/twitter/cassie/SuperCounterColumnFamily.scala:81: not found: value collectionAsScalaIterable
[error] for (rowEntry <- collectionAsScalaIterable(result.entrySet)) {
[error] ^
[error] /Users/rstrickland/workspace/cassie/cassie-core/src/main/scala/com/twitter/cassie/codecs/Codec.scala:18: collectionAsScalaIterable is not a member of scala.collection.JavaConversions
[error] import scala.collection.JavaConversions.collectionAsScalaIterable
[error] ^
[error] /Users/rstrickland/workspace/cassie/cassie-core/src/main/scala/com/twitter/cassie/codecs/Codec.scala:41: not found: value collectionAsScalaIterable
[error] for (value <- collectionAsScalaIterable(values))
[error] ^
[error] /Users/rstrickland/workspace/cassie/cassie-core/src/main/scala/com/twitter/cassie/codecs/LegacyUtf8Codec.scala:24: overloaded method constructor deprecated with alternatives:
[error] ()deprecated <and>
[error] (message: String)deprecated
[error] cannot be applied to (java.lang.String, java.lang.String)
[error] @deprecated("""Use the new Utf8Codec if you can. You may need to use this for backwards
[error] ^
[error] /Users/rstrickland/workspace/cassie/cassie-core/src/main/scala/com/twitter/cassie/codecs/LegacyUtf8Codec.scala:28: overloaded method constructor deprecated with alternatives:
[error] ()deprecated <and>
[error] (message: String)deprecated
[error] cannot be applied to (java.lang.String, java.lang.String)
[error] @deprecated("""Use the new Utf8Codec if you can. You may need to use this for backwards
[error] ^
[error] /Users/rstrickland/workspace/cassie/cassie-core/src/main/scala/com/twitter/cassie/codecs/LegacyUtf8Codec.scala:32: overloaded method constructor deprecated with alternatives:
[error] ()deprecated <and>
[error] (message: String)deprecated
[error] cannot be applied to (java.lang.String, java.lang.String)
[error] @deprecated("""Use the new Utf8Codec if you can. You may need to use this for backwards
[error] ^
[error] 18 errors found
[info] == cassie-core / compile ==
[info]
[info] == cassie-stress / check-deps-exist ==
[info] == cassie-stress / check-deps-exist ==
[info]
[info] == cassie-serversets / check-deps-exist ==
[info] == cassie-serversets / check-deps-exist ==
[info]
[info] == cassie-hadoop / check-deps-exist ==
[info] == cassie-hadoop / check-deps-exist ==
[error] Error running compile: Compilation failed
[info]
[info] Total time: 35 s, completed Oct 11, 2012 9:44:55 AM
[info]
[info] Total session time: 41 s, completed Oct 11, 2012 9:44:55 AM
[error] Error during build.

I noticed the sbt dependencies are different than the ones in the pom, so I tried updating the sbt ones to match the pom. This gets rid of the Twitter-related errors, but not the Scala ones. Changing to Scala 2.9 gets rid of those, but then the unit tests fail. I also tried the Maven build, but it complains of a missing plugin that seems to be internal to Twitter.

Am I missing something?

Robbie
