
pipelineai / pipeline

Stars: 4.2K · Watchers: 347 · Forks: 972 · Size: 92.58 MB

PipelineAI

Home Page: https://generativeaionaws.com

License: Apache License 2.0

Shell 0.31% HTML 0.12% Scala 0.15% JavaScript 0.75% Python 1.32% Java 5.69% CSS 0.19% Groovy 0.01% Clojure 0.08% Dockerfile 0.05% Jsonnet 86.35% Jupyter Notebook 0.08% Ruby 0.01% Makefile 0.07% Go 4.27% TypeScript 0.49% PowerShell 0.01% Pug 0.02% SCSS 0.01% Jinja 0.02%
machine-learning artificial-intelligence tensorflow kubernetes cassandra spark kafka airflow docker redis

pipeline's People

Contributors

cfregly, echiu3


pipeline's Issues

Download PEM file - minor typo

The instructions are missing the -P flag that specifies the download location:

wget http://advancedspark.com/keys/pipeline-training-gce.pem -P ~/.ssh

Build Issue - Unable to locate package linux-tools-3.10.0-229.14.1.el7.x86_64

Steps to Reproduce

  1. Provision a RHEL 7 instance on Amazon.
  2. Check out the latest code from the Git repo.
  3. Build. After installing Oracle Java, the build fails at:

apt-get install -y linux-tools-common linux-tools-generic linux-tools-uname -r \

uname -r on the instance I am using returns:

3.10.0-229.14.1.el7.x86_64

See the build error below:

Reading state information...
E: Unable to locate package linux-tools-3.10.0-229.14.1.el7.x86_64
E: Couldn't find any package by regex 'linux-tools-3.10.0-229.14.1.el7.x86_64'
The command '/bin/sh -c apt-get update && apt-get install -y software-properties-common && add-apt-repository ppa:webupd8team/java && apt-get update && echo oracle-java8-installer shared/accepted-oracle-license-v1-1 select true | /usr/bin/debconf-set-selections && apt-get install -y oracle-java8-installer && apt-get install -y oracle-java8-set-default && apt-get install -y curl && apt-get install -y wget && apt-get install -y vim && apt-get install -y linux-tools-common linux-tools-generic linux-tools-uname -r && apt-get install -y nodejs && apt-get install -y npm && mkdir -p ~/.vim/{ftdetect,indent,syntax} && for d in ftdetect indent syntax ; do curl -o ~/.vim/$d/scala.vim \ https://raw.githubusercontent.com/derekwyatt/vim-scala/master/syntax/scala.vim; done && cd ~ && apt-get install -y git && apt-get install -y openssh-server && apt-get install -y default-jdk && apt-get install -y apache2 && apt-get install -y cmake && git clone --depth=1 https://github.com/jrudolph/perf-map-agent && cd perf-map-agent && cmake . && make && cd ~ && git clone --depth=1 https://github.com/brendangregg/FlameGraph && wget https://dl.bintray.com/sbt/native-packages/sbt/${SBT_VERSION}/sbt-${SBT_VERSION}.tgz && tar xvzf sbt-${SBT_VERSION}.tgz && rm sbt-${SBT_VERSION}.tgz && ln -s /root/sbt/bin/sbt /usr/local/bin && cd ~ && git clone https://github.com/fluxcapacitor/pipeline.git && sbt clean clean-files' returned a non-zero code: 100
[ec2-user@ip-172-31-26-253 pipeline]$ whoami
ec2-user
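The failing Dockerfile line interpolates the host kernel release into the package name (the markdown above appears to have lost the backticks around uname -r; the error output confirms the interpolation happened). On a RHEL host the resulting package name is a RHEL kernel string that no Ubuntu apt repository carries. A minimal sketch of the situation and one possible workaround — the linux-tools-generic fallback is our suggestion, not something the project ships:

```shell
# The Dockerfile builds the package name from the host kernel release:
kernel_pkg="linux-tools-$(uname -r)"
echo "$kernel_pkg"   # e.g. linux-tools-3.10.0-229.14.1.el7.x86_64 on this RHEL host

# No Ubuntu repository carries a RHEL kernel's linux-tools package, so a
# guarded install that falls back to the generic package keeps the build going:
# apt-get install -y "$kernel_pkg" || apt-get install -y linux-tools-generic
```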

List minimum specs for creating docker containers

Even though the docker-machine and docker commands make it clear that a minimum of 8 GB of memory is required for the image, the docs should state that up front. (It would have told me I needed to run on my 16 GB MacBook Pro rather than my day-to-day 8 GB MacBook Air.)

Read Spark ML-generated Parquet model data from Java (Spark/NetflixOSS Serving) and C++ (TensorFlow Serving)

Here is a sample ALS recommendation (matrix-factorization) model generated by Spark 1.6.1:

https://github.com/fluxcapacitor/pipeline/blob/master/datasets/serving/recommendations/spark-1.6.1/als.tar.gz

Here are the three subdirectories generated by the Spark code detailed below:

drwxr-xr-x 2 root root 4096 May 15 06:47 itemFactors/
drwxr-xr-x 2 root root 4096 May 15 06:47 metadata/
drwxr-xr-x 2 root root 4096 May 15 06:47 userFactors/

Here is the relevant Spark 1.6.1 code that generated this model: https://github.com/apache/spark/blob/branch-1.6/mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala#L242

https://github.com/apache/spark/blob/6a6010f0015542dc2753b2cb12fdd1204db63ea6/mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala#L263

We'll have to dig around the code a bit, but the key is the DefaultParamsWriter code from that second link.

By the way, here is the Spark 2.0.0 version: https://github.com/apache/spark/blob/branch-2.0/mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala

We should verify that the 2.0.0 format is indeed similar.

Synchronize local dev and production deployment environments

The problem is that data science teams train and develop in an environment that's very different from production.

Use a similar Docker image for development, but with a file watcher to enable rapid iteration on model creation, deployment, and testing.

This also has the benefit of making production issues reproducible and debuggable, since the dev environment is the same (except perhaps for the size of the dataset).

Error pulling image stderr: write /root/zeppelin-0.6.0-spark

9d502da0bc8e: Error pulling image (latest) from docker.io/fluxcapacitor/pipeline, ApplyLayer exit status 1 stdout: stderr: write /root/zeppelin-0.6.0-spark-1.5.1-hadoop-2.6.0-fluxcapacitor/interpreter/spark/dep/zeppelin-spark-dependencies-0.6.0-incubating-SNAPSHOT.jar: read-only file system

I will try without Zeppelin.

Port assignment in pipeline-pyspark.sh for 8754

Working through the instruction wiki, using only the code provided, when I try running pipeline-pyspark.sh or pyspark.sh I get the following:

[W 15:23:11.420 NotebookApp] server_extensions is deprecated, use nbserver_extensions
/usr/local/lib/python2.7/dist-packages/widgetsnbextension/__init__.py:30: UserWarning: To use the jupyter-js-widgets nbextension, you'll need to update
    the Jupyter notebook to version 4.2 or later.
  the Jupyter notebook to version 4.2 or later.""")
[W 15:23:11.476 NotebookApp] WARNING: The notebook server is listening on all IP addresses and not using encryption. This is not recommended.
[W 15:23:11.476 NotebookApp] WARNING: The notebook server is listening on all IP addresses and not using authentication. This is highly insecure and not recommended.
[I 15:23:11.477 NotebookApp] The port 8754 is already in use, trying another port.
[C 15:23:11.477 NotebookApp] ERROR: the notebook server could not be started because no available port could be found.

Running

lsof -Pnl +M -i4

shows that Jupyter is already listening on 8754:

jupyter-n 2541        0    3u  IPv4  30881      0t0  TCP *:8754 (LISTEN)

Calling pyspark again still produces the port error.

Am I missing something that I should have done?
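Since the pipeline scripts hard-code Jupyter's port, a second notebook server cannot bind 8754 while the first one holds it. One workaround sketch — the PORT/ALT_PORT names are ours, but --port is a standard Jupyter Notebook flag — is to stop the existing server (the PID from the lsof output above) or launch on an adjacent port:

```shell
# A notebook server already owns 8754, so either kill that PID or point the
# new server at a free port:
PORT=8754
ALT_PORT=$((PORT + 1))
# lsof -Pn -i :${PORT}    # shows the PID currently bound to 8754
echo "jupyter notebook --port=${ALT_PORT}"   # prints: jupyter notebook --port=8755
```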

how do I start nifi?

Thanks for this great demo box. How do I start NiFi?

Just connecting to the port doesn't work...

Improve Fallback and Timeout Support for Ensemble-based Predictions

related to #154

Fall back in the following order:

  1. The statically-generated version of the most recent live model (S3, or local disk burned in at Docker image creation time?)

  2. If that static version is not available, fall back to the statically-generated version of a previous live model (S3, or local disk burned in at Docker image creation time)

  3. Fall back to a completely non-personalized model as a last resort (local disk burned in at Docker image creation time)
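The fallback chain above can be sketched as a probe over model locations, newest first. The pick_model function and the path names below are hypothetical illustrations, not part of the codebase:

```shell
# Walk the fallback chain newest-first and return the first model that exists:
#   1) latest static snapshot, 2) previous static snapshot,
#   3) the non-personalized default burned into the image.
pick_model() {
  base="$1"
  for m in live-latest live-previous default; do
    if [ -e "$base/$m" ]; then
      echo "$base/$m"
      return 0
    fi
  done
  return 1   # nothing available: caller must fail the request
}
```

A timeout on the live-model fetch would sit in front of this chain; the chain itself only covers the static artifacts available at image-build time.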

Run the Docker Container using the Loaded Image

When the command below is run, I get the following error: "docker: Error response from daemon: client is newer than server (client API version: 1.22, server API version: 1.20)."

Appreciate your assistance.

docker run -i --privileged --name pipeline -h docker -m 8g -p 80:80 -p 36042:6042 -p 39160:9160 -p 39042:9042 -p 39200:9200 -p 37077:7077 -p 38080:38080 -p 38081:38081 -p 36060:6060 -p 36061:6061 -p 36062:6062 -p 36063:6063 -p 36064:6064 -p 36065:6065 -p 32181:2181 -p 38090:8090 -p 30000:10000 -p 30070:50070 -p 30090:50090 -p 39092:9092 -p 36066:6066 -p 39000:9000 -p 39999:19999 -p 36081:6081 -p 35601:5601 -p 37979:7979 -p 38989:8989 -p 34040:4040 -p 34041:4041 -p 34042:4042 -p 34043:4043 -p 34044:4044 -p 34045:4045 -p 34046:4046 -p 34047:4047 -p 34048:4048 -p 34049:4049 -p 34050:4050 -p 34051:4051 -p 34052:4052 -p 34053:4053 -p 34054:4054 -p 34055:4055 -p 34056:4056 -p 34057:4057 -p 34058:4058 -p 34059:4059 -p 34060:4060 -p 36379:6379 -p 38888:8888 -p 34321:54321 -p 38099:8099 -p 38754:8754 -p 37379:7379 -p 36969:6969 -p 36970:6970 -p 36971:6971 -p 36972:6972 -p 36973:6973 -p 36974:6974 -p 36975:6975 -p 36976:6976 -p 36977:6977 -p 36978:6978 -p 36979:6979 -p 36980:6980 -p 35050:5050 -p 35060:5060 -p 37060:7060 fluxcapacitor/pipeline bash
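The daemon on this host speaks API 1.20 while the installed client wants 1.22, so the client refuses to talk to it. Upgrading the Docker daemon is the clean fix; as a stopgap, newer Docker clients honor the DOCKER_API_VERSION environment variable to pin themselves to the daemon's older API. A sketch, using the version from the error text:

```shell
# Pin the client to the daemon's advertised API version, then retry docker run:
export DOCKER_API_VERSION=1.20
# docker version   # should now negotiate successfully against the 1.20 daemon
```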

Tachyon startup and initial format are failing due to log4j:ERROR Could not instantiate class [tachyon.Log4jFileAppender]

Per @BrentDorsey

Tachyon raised three errors when I tried to start it manually:

log4j:ERROR Could not instantiate class [tachyon.Log4jFileAppender]. 
log4j:WARN No such property [deletionPercentage] 
Storage format error

To get the Tachyon Web UI working I propose the following changes:

Setup the Environment: bug on path

On the Setup the Environment page, you say we should launch the following command:

root@docker$ ~/pipeline/bin/initial/RUNME_ONCE.sh

Instead, the correct command is:

root@docker$ ~/pipeline/bin/RUNME_ONCE.sh

By the way, thank you for your example.
