Hiveless

Hiveless is a Scala library for working with Spark and Hive using a more expressive typed API. It adds typed HiveUDFs and implements Spatial Hive UDFs. It consists of the following modules:

  • hiveless-core with the typed Hive UDFs API and the initial base set of codecs
  • hiveless-jts with the TWKB JTS encoding support
  • hiveless-spatial with Hive GIS UDFs (depends on GeoMesa)
  • hiveless-spatial-index with extra Hive GIS UDFs that can be used for GIS indexing (depends on GeoMesa and GeoTrellis)
    • There is also a forked release, CartoDB/analytics-toolbox-databricks, which at this point is a complete copy of hiveless-spatial and hiveless-spatial-index. It may gain extended GIS functionality in the future.

Quick Start

To use Hiveless in your project, add the following to your build.sbt file as needed:

resolvers ++= Seq(
  // for snapshot artifacts only
  "oss-sonatype" at "https://oss.sonatype.org/content/repositories/snapshots"
)

libraryDependencies ++= List(
  "com.azavea" %% "hiveless-core"          % "<latest version>",
  "com.azavea" %% "hiveless-spatial"       % "<latest version>",
  "com.azavea" %% "hiveless-spatial-index" % "<latest version>"
)

Hiveless Spatial supported GIS functions

CREATE OR REPLACE FUNCTION st_geometryFromText as 'com.azavea.hiveless.spatial.ST_GeomFromWKT';
CREATE OR REPLACE FUNCTION st_intersects as 'com.azavea.hiveless.spatial.ST_Intersects';
CREATE OR REPLACE FUNCTION st_simplify as 'com.azavea.hiveless.spatial.ST_Simplify';
 -- ...and more

The full list of supported functions can be found here.
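When many functions need to be registered, the statements above can be generated from Scala instead of being typed by hand. The following is a minimal sketch, not part of Hiveless: the `HivelessFunctions` object and its `registerAll` helper are hypothetical names, and only the three function-to-class mappings shown above are included. The helper takes a `String => Unit` callback so that it does not depend on Spark directly.

```scala
// Hypothetical helper (not part of Hiveless): builds the CREATE FUNCTION
// statements shown above and hands each one to a caller-supplied runner.
object HivelessFunctions {
  // Function name -> implementing class, copied from the statements above.
  val functions: Seq[(String, String)] = Seq(
    "st_geometryFromText" -> "com.azavea.hiveless.spatial.ST_GeomFromWKT",
    "st_intersects"       -> "com.azavea.hiveless.spatial.ST_Intersects",
    "st_simplify"         -> "com.azavea.hiveless.spatial.ST_Simplify"
  )

  // Renders a single registration statement.
  def createFunctionSql(name: String, className: String): String =
    s"CREATE OR REPLACE FUNCTION $name AS '$className'"

  // Runs every registration statement through the supplied callback.
  def registerAll(runSql: String => Unit): Unit =
    functions.foreach { case (name, className) =>
      runSql(createFunctionSql(name, className))
    }
}
```

Assuming a `SparkSession` named `spark` with the Hiveless jars on the classpath, this could be invoked as `HivelessFunctions.registerAll(sql => { spark.sql(sql); () })`.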

Spatial Query Optimizations

Optimizations are supported for two predicates: ST_Intersects and ST_Contains. When the optimizations are enabled, Spark can push these predicates down where possible.

To enable optimizations:

import org.apache.spark.sql.SparkSession
import com.azavea.hiveless.spark.sql.rules.SpatialFilterPushdownRules

val spark: SparkSession = ???
// register the pushdown rules on the session's SQLContext
SpatialFilterPushdownRules.registerOptimizations(spark.sqlContext)

The optimizations can also be enabled through the Spark configuration via the optimizations injector:

import org.apache.spark.SparkConf
import com.azavea.hiveless.spark.sql.SpatialFilterPushdownOptimizations

val conf: SparkConf = ???
// register the injector class as a Spark SQL extension
conf.set("spark.sql.extensions", classOf[SpatialFilterPushdownOptimizations].getName)
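Setting `spark.sql.extensions` directly overwrites any extension that is already configured. Because Spark reads this setting as a comma-separated list of class names, the Hiveless injector can be appended instead. The sketch below is one way to do that; `SpatialExtensionsConf` and `withHiveless` are hypothetical names, not part of Hiveless, and the class name is written as a string (taken from the import above) so the helper has no compile-time dependency.

```scala
// Hypothetical helper (not part of Hiveless): merges the Hiveless injector
// into an existing spark.sql.extensions value instead of overwriting it.
object SpatialExtensionsConf {
  val Key = "spark.sql.extensions"

  // Fully qualified class name, taken from the import shown above.
  val HivelessExtension =
    "com.azavea.hiveless.spark.sql.SpatialFilterPushdownOptimizations"

  // Appends the Hiveless injector to the comma-separated list, trimming
  // whitespace, dropping empty entries, and avoiding duplicates.
  def withHiveless(existing: Option[String]): String = {
    val current = existing.toSeq
      .flatMap(_.split(",").toSeq)
      .map(_.trim)
      .filter(_.nonEmpty)
    (current :+ HivelessExtension).distinct.mkString(",")
  }
}
```

With a `SparkConf` named `conf`, this could be applied as `conf.set(SpatialExtensionsConf.Key, SpatialExtensionsConf.withHiveless(conf.getOption(SpatialExtensionsConf.Key)))`.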

License

Code is provided under the Apache 2.0 license, available at http://opensource.org/licenses/Apache-2.0 as well as in the LICENSE file. This is the same license that Spark uses.

hiveless's Issues

Drop GeoMesa dependency

We don't really use GeoMesa in this project at this point, so it may be convenient to drop the GeoMesa dependency in order to establish Scala 2.13 cross builds.

Cover projects with tests

We don't really have time to invest in this at the moment, but everything should be covered with tests.