
xd-demo with Pivotal HD Retail data

Contributors

cfossguy, malston-pivotal

Demo User Story
We want to ingest real-time orders from our POS system directly into HDFS as pipe-delimited records, via HTTP POST. Sample posts look like:

    curl -d "{\"orderid\":\"123\",\"storeid\":\"456\",\"customerid\":\"789\",\"orderamount\":\"5000.01\"}" http://localhost:8000    (good post)
    curl -d "{\"orderid\":\"BAD_DATA\",\"storeid\":\"456\",\"customerid\":\"789\",\"orderamount\":\"5000.01\"}" http://localhost:8000    (bad post)

The desired end state in HDFS is a pipe-delimited record (Order ID | Store ID | Customer ID | Order Amount) that can be queried with HAWQ and an in-memory store:

    123|456|789|5000.01

We are going to re-use some integration work that was done in the past, so we need to transform and filter the POS data before ingesting it into Hadoop. The HTTP stream will accept JSON-formatted key/value pairs of order data. Some orders contain bad data, and those records must be filtered out before persisting to HDFS. After landing the data in Hadoop, we would like to run SQL analytics on the orders to see whether they match known fraudulent orders from the past. Hive is not an option because it provides neither fast enough response times nor full ANSI SQL compliance.

We also want to run a logistic regression model on all orders to feed our real-time fraud detection applications, which aim to catch criminals before they leave the store. The logistic regression model needs to be re-trained periodically via a scheduled process, the in-memory fraud data store needs to be flushed on a configurable interval, and HDFS files need to be archived via a scheduled process.
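
To make the filter and transform steps concrete, here is a hypothetical Spring XD stream definition of the kind this demo sets up (the real definitions live in xd/cmd/create-all.cmds; the module options and SpEL expressions below are illustrative assumptions, not the project's actual stream):

    stream create --name order_ingest --definition
      "http --port=8000 |
       filter --expression=#jsonPath(payload,'$.orderid').matches('[0-9]+') |
       transform --expression=#jsonPath(payload,'$.orderid')+'|'+#jsonPath(payload,'$.storeid')+'|'+#jsonPath(payload,'$.customerid')+'|'+#jsonPath(payload,'$.orderamount') |
       hdfs" --deploy

(The definition is a single line in the XD shell; it is wrapped here for readability.)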

To get this running with Pivotal HD:

  1. Start a Pivotal HD instance. Optionally, run the "pivotal-samples" data labs to populate the retail_demo DB with HAWQ tables/data. The "pivotal-samples" GitHub project is located at:

    https://github.com/PivotalHD/pivotal-samples

  2. Download and install the latest Spring XD binary. The project is located at:

    http://projects.spring.io/spring-xd/

  3. Update your Spring XD Hadoop config ($SPRING_XD/xd/config/hadoop.properties) to reflect your HDFS address:

    fs.default.name=hdfs://my-hadoop:8020

  4. Open config.py and add entries for each property. This is very important to ensure connectivity to Pivotal HD and SQLFire (a hypothetical sketch of these entries follows this list).
  5. In a terminal window, run the install script. It will scp the Python demo scripts to the Pivotal HD and SQLFire VMs and copy the Spring XD scripts, lib JARs, modules, and sink config:
    ./install.py
  6. Run the three Spring XD runtimes (Redis, admin, container) in separate terminal windows:
    sudo sysctl -w net.inet.tcp.msl=1000
    $SPRING_XD/redis/bin/redis-server
    $SPRING_XD/xd/bin/xd-admin --hadoopDistro phd1
    $SPRING_XD/xd/bin/xd-container --hadoopDistro phd1
  7. Run the Spring XD Shell in a terminal window:
    $SPRING_XD/shell/bin/spring-xd-shell --hadoopDistro phd1
  8. In Spring XD Shell - create the Hadoop ingest stream, Pivotal HD analytics tap, and SQLFire sink:
    script --file ../../xd/cmd/create-all.cmds
  9. [PIVOTALHD TERMINAL] Open an ssh session to your Pivotal HD VM and run this script. You must do this before starting the data stream.
    ./demo.py setup_hdfs
  10. In a terminal window, run send_data.py to start a data stream simulation:
    ./send_data.py
  11. [SQLFIRE TERMINAL] Verify that SQLFire is getting only a small subset of orders:
    ./demo.py query
  12. In Spring XD Shell - re-run the batch jobs (this should delete the SQLFire data, populate the HAWQ tables, and re-run the analytic training model):
    script --file ../../xd/cmd/deploy-batch.cmds
  13. In Spring XD Shell - reset the richgauge taps to 0:
    script --file ../../xd/cmd/reset-taps.cmds
  14. [PIVOTALHD TERMINAL] Run a PXF and HAWQ query:
    ./demo.py query_hawq
  15. Install DB Visualizer (http://www.dbvis.com/) and run queries through a JDBC client GUI. You will need to add a new "Cache" driver JAR for SQLFire, and you will need to modify '/data/1/hawq_master/gpseg-1/pg_hba.conf' in your Pivotal HD VM to allow remote connections (see the pg_hba.conf sketch after this list).
  16. [PIVOTALHD TERMINAL] Restart Pivotal HD via the stop/start scripts:
    /home/gpadmin/stop_all.sh;
    /home/gpadmin/start_all.sh;
  17. In Spring XD Shell - remove all streams/taps from Spring XD (does not delete any data):
    script --file ../../xd/cmd/destroy-all.cmds
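
Step 4 above references config.py. Its property names are not documented in this README, so the following is only a hypothetical sketch of the kind of entries it needs; every name and address below is an assumption:

    # config.py - hypothetical sketch; real property names may differ
    pivotal_hd_host = "192.168.56.101"  # address of the Pivotal HD VM (assumption)
    sqlfire_host = "192.168.56.102"     # address of the SQLFire VM (assumption)
    sqlfire_port = 1527                 # SQLFire's default client port
    hdfs_user = "gpadmin"               # admin user on the Pivotal HD VM (assumption)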
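
For the remote-connection change in step 15, HAWQ uses PostgreSQL-style host-based authentication, so a common demo-only approach is to append a permissive host rule to pg_hba.conf and then restart HAWQ; the rule below is an assumption - tighten the CIDR and auth method for anything beyond a demo:

    # /data/1/hawq_master/gpseg-1/pg_hba.conf (demo-only, assumed rule)
    host    all    gpadmin    0.0.0.0/0    trust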

xd-demo-client

  1. Update app.properties (src/main/webapps/WEB-INF/classes) to reflect the IP addresses of your SQLFire environment (a hypothetical sketch follows this list)
  2. Open a terminal and build the war via maven
    mvn install
  3. Copy the WAR file to a working tc Server or Tomcat server
  4. The application will be available at: http://localhost:8080/xd-demo-client/resources/index.html
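
The property names inside app.properties are not shown in this README; a minimal hypothetical sketch, assuming the client reaches SQLFire over its standard JDBC URL scheme (the keys and values below are assumptions):

    # src/main/webapps/WEB-INF/classes/app.properties - hypothetical keys
    sqlfire.url=jdbc:sqlfire://my-sqlfire-host:1527
    sqlfire.user=app
    sqlfire.password=app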
