SparkGIS

SparkGIS API documentation: http://bmidb.cs.stonybrook.edu/sparkgisdoc/

Setup and Installation

Follow these steps to compile and run SparkGIS from source.

Setting up dependencies

  • Depending on the operating system and the available privileges, run the appropriate script in the deploy/ directory. For instance, on Ubuntu Server with root privileges, run the following commands:
cd deploy
sh ubuntu_setup_dependencies.sh
  • Similarly for RHEL with root permissions
cd deploy
sh ubuntu_setup_dependencies.sh
  • Without root permissions, the following system packages must already be available. Once they are, the script shown after the list sets up the required spatial libraries from source:
    • gcc
    • g++
    • cmake
cd deploy
sh setup_spatial_libs_from_source.sh
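
Before running the from-source script, it can help to confirm that the three packages listed above are actually on the PATH. The following is a minimal sketch; the helper name `missing_tools` is an illustration and is not part of SparkGIS.

```shell
# Print the name of every command from the argument list that is not on PATH.
# missing_tools is a hypothetical helper, not something shipped in deploy/.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# gcc, g++ and cmake are the packages required for a non-root setup.
missing=$(missing_tools gcc g++ cmake)
if [ -n "$missing" ]; then
  echo "Missing build tools:" $missing
else
  echo "All build tools found; run: cd deploy && sh setup_spatial_libs_from_source.sh"
fi
```

If anything is reported missing, it has to be installed (or made available by an administrator) before the spatial libraries can be built.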

Compiling from source

  • Once the dependencies are set up, run the following script, which compiles the native libraries as well as the Java code (using Maven):
sh compile.sh

Linking with Apache Spark

  • The lib/ directory must be available to the Spark worker nodes so that they can process spatial queries. To do this, set the following Spark properties in $SPARK_HOME/conf/spark-defaults.conf, on the SparkConf object, or on the command line when submitting the job:
    • spark.driver.extraLibraryPath
    • spark.executor.extraLibraryPath
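
For the command-line route, the two properties can be passed to spark-submit as `--conf` flags. The sketch below only builds those flags; `spark_library_flags` and the `SPARKGIS_HOME` variable are assumptions introduced for illustration, and the main class and jar in the usage comment are placeholders.

```shell
# Emit the two --conf flags that point Spark at the native library directory.
# spark_library_flags is a hypothetical helper, not part of SparkGIS.
spark_library_flags() {
  lib_dir=$1
  printf '%s %s\n' \
    "--conf spark.driver.extraLibraryPath=$lib_dir" \
    "--conf spark.executor.extraLibraryPath=$lib_dir"
}

# Assumption: SPARKGIS_HOME names the root of the SparkGIS checkout.
SPARKGIS_HOME="${SPARKGIS_HOME:-$PWD}"
spark_library_flags "$SPARKGIS_HOME/lib"

# Hypothetical usage; <main-class> and <path-to-jar> are placeholders:
#   spark-submit $(spark_library_flags "$SPARKGIS_HOME/lib") --class <main-class> <path-to-jar>
```

The unquoted command substitution in the usage line relies on word splitting, so this form assumes the library path contains no spaces.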

Running sample job (Single node)

The project comes with sample data for testing SparkGIS. After completing the setup and installation steps above, the following commands run a sample spatial join query on the sample dataset and generate a per-tile heatmap. This assumes that HDFS, Spark and SparkGIS are already set up and running on your system.

Prepare input datasets

cd deploy
hdfs dfs -mkdir -p /sparkgis/sample_data/algo-v1
hdfs dfs -mkdir /sparkgis/sample_data/algo-v2
hdfs dfs -put sample_pia_data/Algo1-TCGA-02-0007-01Z-00-DX1 /sparkgis/sample_data/algo-v1/TCGA-02-0007-01Z-00-DX1
hdfs dfs -put sample_pia_data/Algo2-TCGA-02-0007-01Z-00-DX1 /sparkgis/sample_data/algo-v2/TCGA-02-0007-01Z-00-DX1
cd ..

Modify SparkGIS properties

  • Update the following variables in conf/sparkgis.properties
    • hdfs-algo-data=/sparkgis/sample_data/
    • hdfs-hm-results=/sparkgis/sample_results/
  • Update heatmap script in scripts/generate_heatmap.sh
    • algos=' --algos "algo-v1,algo-v2"'
    • caseIDs=' --caseids "TCGA-02-0007-01Z-00-DX1"'
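
The property updates can also be scripted instead of edited by hand. This is a rough sketch that assumes the properties file uses plain `key=value` lines, as the entries above suggest; `set_property` and `PROPS` are hypothetical names, not part of SparkGIS.

```shell
# Set key=value in a properties file, replacing an existing entry or
# appending a new one. Assumes values contain neither '|' nor '&'.
set_property() {
  file=$1; key=$2; value=$3
  if grep -q "^${key}=" "$file"; then
    sed "s|^${key}=.*|${key}=${value}|" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}

# Usage, where PROPS is the path of the SparkGIS properties file under conf/:
#   set_property "$PROPS" hdfs-algo-data /sparkgis/sample_data/
#   set_property "$PROPS" hdfs-hm-results /sparkgis/sample_results/
```

The heatmap script variables are shell assignments with nested quoting, so editing scripts/generate_heatmap.sh by hand is probably the safer option there.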

Execute SparkGIS

cd scripts
sh generate_heatmap.sh

This should generate heatmap results in the hdfs://127.0.0.1:54310/sparkgis/sample_results directory.
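
One way to confirm that the job produced output is to list the results directory with the standard `hdfs dfs` commands. The wrapper below is a sketch; `list_heatmap_results` is a hypothetical helper, and the layout of the files inside the directory is not documented here.

```shell
# List a SparkGIS result directory in HDFS, or explain why it cannot be listed.
# list_heatmap_results is a hypothetical helper, not part of SparkGIS.
list_heatmap_results() {
  results_dir=$1
  if ! command -v hdfs >/dev/null 2>&1; then
    echo "hdfs command not found on PATH" >&2
    return 1
  fi
  if hdfs dfs -test -d "$results_dir"; then
    hdfs dfs -ls "$results_dir"
  else
    echo "no results directory at $results_dir" >&2
    return 1
  fi
}

# Usage, with the path configured as hdfs-hm-results:
#   list_heatmap_results /sparkgis/sample_results
```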

Directory Structure

  • conf/
    • Contains customizable properties for SparkGIS
  • deploy/
    • Contains deployment scripts for Ubuntu-server and CentOS
    • Other OSes coming soon ...
  • deps/
    • Contains compiled spatial libraries i.e. geos and libspatialindex in case root privileges are not available.
    • The environment variables SPGIS_INC_PATH and SPGIS_LIB_PATH should point to deps/include and deps/lib if the spatial libraries are compiled here. Otherwise, if the spatial libraries are set up with default settings, SPGIS_INC_PATH and SPGIS_LIB_PATH should point to /usr/include and /usr/lib respectively.
  • lib/
    • Contains compiled shared library for native code
    • spark.driver.extraLibraryPath and spark.executor.extraLibraryPath should point to this folder. (This can be done in $SPARK_HOME/conf/spark-defaults.conf, at runtime through SparkConf, or on the command line when submitting the job.)
  • scripts/
    • Shell scripts for various jobs
  • src/
    • Source code for SparkGIS
    • src/main/java/sparkgis/* contains all Java source code
    • src/main/java/jni/* contains the JNI interface and native C/C++ code
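
The SPGIS_INC_PATH and SPGIS_LIB_PATH settings described under deps/ can be sketched as a small shell helper. It assumes that the presence of deps/include and deps/lib means the spatial libraries were built locally; `export_spgis_paths` is a hypothetical name, not part of SparkGIS.

```shell
# Export SPGIS_INC_PATH and SPGIS_LIB_PATH: prefer a local deps/ build under
# the given project root, otherwise fall back to the system default locations.
export_spgis_paths() {
  root=$1
  if [ -d "$root/deps/include" ] && [ -d "$root/deps/lib" ]; then
    SPGIS_INC_PATH=$root/deps/include
    SPGIS_LIB_PATH=$root/deps/lib
  else
    SPGIS_INC_PATH=/usr/include
    SPGIS_LIB_PATH=/usr/lib
  fi
  export SPGIS_INC_PATH SPGIS_LIB_PATH
}

# Run from the project root before compiling.
export_spgis_paths "$PWD"
echo "SPGIS_INC_PATH=$SPGIS_INC_PATH SPGIS_LIB_PATH=$SPGIS_LIB_PATH"
```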

References:

Furqan Baig, Hoang Vo, Tahsin Kurc, Joel Saltz and Fusheng Wang: SparkGIS: Resource Aware Efficient In-Memory Spatial Query Processing. In Proceedings of SIGSPATIAL 2017. November 7 - 10, 2017, Redondo Beach, California, USA.
