awslabs / aws-orbit-workbench

A Data Platform built for AWS, powered by Kubernetes.

Home Page: https://awslabs.github.io/aws-orbit-workbench/

License: Apache License 2.0

Python 83.69% Shell 5.03% Dockerfile 0.76% JavaScript 0.07% HTML 0.56% CSS 0.22% Jupyter Notebook 0.63% TypeScript 7.66% Smarty 1.20% Mustache 0.16%
kubernetes eks jupyter jupyterhub data-analysis datalake mach orbit-workbench gpu eks-cluster

aws-orbit-workbench's Introduction


AWS Orbit Workbench is currently archived and is accessible in read-only mode.

Orbit Workbench is an open framework for building team-based, secured data environments. It is built on Kubernetes using Amazon Elastic Kubernetes Service (EKS), and provides a command line tool for rapid deployment as well as a Python SDK, Jupyter plugins, and more to accelerate data analysis and ML through integration with AWS analytics services such as Amazon Redshift, Amazon Athena, Amazon EMR, and Amazon SageMaker.

Orbit Workbench deploys secured team spaces that are mapped to Kubernetes namespaces and extend into AWS cloud resources. Each team is a secured zone where only members of the team can access allowed data and share data and code freely within the team. Orbit automatically creates file storage for each team using Amazon EFS, a security group and an IAM role per team, as well as each team's own JupyterHub and Jupyter Server. Orbit Workbench users can also launch Python code or Jupyter notebooks as Kubernetes containers or as Amazon Fargate containers. Orbit Workbench provides a CLI tool for users to build their own custom images and use them to deploy containers or customize their Jupyter environment.

Orbit easily supports GPU-based algorithms: it pre-configures EKS to allow GPU workloads and provides examples of how to build images that support GPU acceleration.

If you are looking to build your own data & ML platform for your company on AWS, give Orbit Workbench a chance to accelerate your business outcomes using AWS services.

Contributors are welcome!

Please see our Home for installation and usage guides.

Feature List**

  • Collaborative Team Spaces

    • Isolated Team spaces with pre-provisioned access to data sources
    • Team isolation enforced by EKS Namespace as well as AWS constructs such as security groups and roles
    • Shared file system between team users via EFS
    • A scratch space on S3, shared only within your team, with a defined expiration time
    • Jupyter plugin to support users with Kubernetes (Jobs, Pods, Volumes, and more) and AWS resources (Athena, Redshift, Glue Catalog, S3, and more)
  • Compute

    • Build your own Docker images using the Orbit CLI via remote AWS CodeBuild, published to an ECR repository
    • Support for GPU node pools
    • Support for Docker images with GPU drivers for use with PyTorch, TensorFlow, and others
    • Shared node pools for all teams with storage isolation
    • Auto-Scaling EKS Node pools (coming soon)
  • Security

    • JupyterHub integration with SSO providers via Amazon Cognito
    • Ability to map an SSO group to a team to control authentication
  • Deployment

    • Deployment of all AWS and EKS resources via a simple declarative manifest
    • Ability to add and remove teams dynamically
    • Support for Kubernetes Administrative Teams
  • AWS Analytics Services Integrations (see the sketch after this list)

    • Amazon Redshift
    • Amazon SageMaker API calls and Kubernetes Operator
    • Amazon EMR on EKS Kubernetes Operator
    • Amazon Athena
    • AWS Glue DataBrew
    • AWS Lake Formation
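
To make these integrations concrete, here is a minimal sketch of querying Amazon Athena from a team notebook with boto3; the region and output bucket are illustrative placeholders, not values defined by this project:

import time

import boto3

# Region and output bucket below are placeholders for illustration only.
athena = boto3.client("athena", region_name="us-east-1")

resp = athena.start_query_execution(
    QueryString="SELECT 1",
    ResultConfiguration={"OutputLocation": "s3://my-team-scratch-bucket/athena-results/"},
)
query_id = resp["QueryExecutionId"]

# Poll until the query reaches a terminal state.
while True:
    q = athena.get_query_execution(QueryExecutionId=query_id)
    state = q["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)
print(query_id, state)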

Create an AWS Orbit Workbench trial environment

Feel free to create a full AWS Orbit Workbench environment in its own VPC.
You can always clone or fork this repo and install via the CLI, but if you are just investigating the Workbench, we have provided a standard deployment.

Please follow these steps.

1. Create the AWS Orbit Workbench

Deploy  Region Name               Region
🚀      US East (N. Virginia)     us-east-1
🚀      US East (Ohio)            us-east-2
🚀      US West (N. California)   us-west-1
🚀      US West (Oregon)          us-west-2
🚀      EU (London)               eu-west-2

This reference deployment can only be deployed to Regions denoted above.

The CloudFormation template has all the necessary parameters, but you may change them as needed (a scripted alternative follows the list below):

  • CloudFormation Parameters

    • Version: The version of Orbit Workbench (corresponds to the versions of aws-orbit on PyPI)
    • K8AdminRole: An existing role in your account that has admin access to the EKS cluster
  • The CloudFormation stack will create two (2) AWS CodePipelines:

    • Orbit_Deploy_trial - which will start automatically and create your workbench
    • Orbit_Destroy_trial - which will also start automatically and will destroy your workbench
      • this pipeline has a Manual Approval stage that prevents it from moving forward with the destroy process
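
If you prefer to script the stack creation instead of using the console, a sketch with boto3 follows; the TemplateURL, version, and role name are placeholders you must replace with real values:

import boto3

cfn = boto3.client("cloudformation", region_name="us-east-1")

cfn.create_stack(
    StackName="trial",
    # Hypothetical URL -- substitute the actual trial template location.
    TemplateURL="https://example-bucket.s3.amazonaws.com/orbit-trial.yaml",
    Parameters=[
        {"ParameterKey": "Version", "ParameterValue": "1.0.0"},      # aws-orbit version on PyPI
        {"ParameterKey": "K8AdminRole", "ParameterValue": "Admin"},  # existing EKS-admin role
    ],
    Capabilities=["CAPABILITY_NAMED_IAM"],
)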

Once your pipelines are created, the Orbit_Destroy_trial pipeline will wait for you to approve the next stage (which we don't want to do yet).

Go to the Orbit_Destroy_trial pipeline, click Stop Execution, then Stop and Abandon. Abandoning the execution prevents the job from timing out and stopping at a later time.
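
The same stop-and-abandon action can be scripted; a sketch with boto3, assuming default credentials with CodePipeline permissions:

import boto3

cp = boto3.client("codepipeline")

# Find the in-progress execution of the destroy pipeline and abandon it.
execs = cp.list_pipeline_executions(pipelineName="Orbit_Destroy_trial")
for summary in execs["pipelineExecutionSummaries"]:
    if summary["status"] == "InProgress":
        cp.stop_pipeline_execution(
            pipelineName="Orbit_Destroy_trial",
            pipelineExecutionId=summary["pipelineExecutionId"],
            abandon=True,
            reason="Keep the trial workbench running",
        )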

The Orbit_Deploy_trial pipeline takes approximately 70-90 minutes to complete.

2. Get your access URL

When the Orbit_Deploy_trial pipeline completes, go to the EC2 page --> Load Balancing --> Load Balancers and look for the ALB we have created. It has a naming pattern of xxxxxxxx-istiosystem-istio-xxxx. Get the DNS name of the ALB.

The AWS Orbit Workbench homepage will be located at:

https://xxxxxxxx-istiosystem-istio-xxxx-1234567890.{region}.elb.amazonaws.com/orbit/login
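
If you would rather look the DNS name up programmatically, here is a sketch with boto3 that filters on the naming pattern above:

import boto3

elbv2 = boto3.client("elbv2")

# Print the login URL of any ALB matching the istio-system naming pattern.
for lb in elbv2.describe_load_balancers()["LoadBalancers"]:
    if "istiosystem" in lb["LoadBalancerName"]:
        print(f'https://{lb["DNSName"]}/orbit/login')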

You can browse to that URL. We are using self-signed certificates, so your browser may complain, but it is safe to Accept and Continue to the site.

The default username and password are:

Username: orbit
Password: OrbitPwd1!

You will be prompted to change the password.

Cleaning up the example resources

To remove all workbench resources, do the following:

  1. Go to the Orbit_Destroy_trial pipeline and click 'Release Change'
    • When the CLI_ApproveDestroy stage is active, click Review and then Approve so the pipeline will continue
  2. Wait until Orbit_Destroy_trial completes
  3. Delete the CloudFormation stack trial (a scripted version follows)
    • If the stack fails to delete due to objects left in the S3 bucket, it is OK to empty the bucket and delete the stack again
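
A minimal sketch of step 3 and its bucket cleanup with boto3; the bucket name is a placeholder for whichever bucket blocks the delete:

import boto3

# Empty the offending bucket (placeholder name), then retry the stack delete.
bucket = boto3.resource("s3").Bucket("my-orbit-trial-bucket")
bucket.object_versions.delete()  # also removes versioned objects
boto3.client("cloudformation").delete_stack(StackName="trial")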

Contributing

Contributing Guidelines: ./CONTRIBUTING.md

License

This project is licensed under the Apache-2.0 License.

**: For a detailed feature list by release, please see the release page in the wiki tab.

aws-orbit-workbench's People

Contributors

abaror, bdesert, benalta, chamcca, dependabot[bot], dgraeber, igorborgest, nickcorbett, rb201, srinivasreddych, stthoom


aws-orbit-workbench's Issues

ISSUE-70 Inconsistency in Deploy Mode

Repository Modes are inconsistent.
cli/datamaker_cli/docker.py:95 refers to a mode as "source",
while
cli/datamaker_cli/commands/deploy.py:49 refers to a mode as "code".

These need to be aligned.

Error: No such command 'env'.

bash-4.2$ orbit init
[ Info ] Env Manifest generated into conf folder  
[ Tip ] Recommended next step: orbit deploy foundation -f default-foundation.yaml
[ Tip ] Then, fill up the manifest file (default-env-manifest.yaml) and run: orbit env -f default-env-manifest.yaml
                                                  
Initializing |████████████████████████████| 100%
bash-4.2$ orbit env -f default-env-manifest.yaml
Usage: orbit [OPTIONS] COMMAND [ARGS]...
Try 'orbit --help' for help.

Error: No such command 'env'.

Same thing for ECS?

Hi,
Are there any plans to implement the same solution for deployment on ECS? EKS/K8s may be popular, but ECS is AWS's own container orchestrator, so why is it not treated with at least as much priority as EKS?
I read about the Jupyter plugin, so there may already be something there for K8s, but what about AWS providing an ECS plugin?
Thanks,

Update profiles with an input file

Currently one can only append or delete one profile at a time, which is inconvenient when modifying multiple profiles.
An option to overwrite the current profiles with an input file containing a list of profiles would make modification easier.

Primary Manifest too large for Parameter Store

The primary manifest stored in Parameter Store contains the manifests for all teams as well as the original raw manifest. When there are more than two teams, this results in a JSON document larger than the 8192 characters supported by advanced Parameter Store values.
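
One way to surface this limit early, a sketch assuming the CLI holds the assembled manifest as a dict (check_manifest_size is a hypothetical helper, not existing code):

import json

ADVANCED_PARAM_LIMIT = 8192  # max characters for an SSM advanced-tier value

def check_manifest_size(manifest: dict) -> str:
    """Fail fast before put_parameter rejects an oversized manifest."""
    body = json.dumps(manifest)
    if len(body) > ADVANCED_PARAM_LIMIT:
        raise ValueError(
            f"Primary manifest is {len(body)} characters; "
            "store per-team manifests as separate parameters instead."
        )
    return body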

ISSUE-125 Broken SDK renaming

Describe the Issue:

The rename script renamed the package to orbit_sdk instead of aws_orbit_sdk.

To Reproduce:

N/A

Additional Context:

Open controller.py.
The line reads from orbit_sdk.common import ...
It is supposed to be from aws_orbit_sdk.common import ...

ISSUE-104 hardcoded toolkit bucket

Describe the Issue:

Example 1 in Lake Creator has a hardcoded environment name in the toolkit bucket, so Example 1 fails to execute.

To Reproduce:

  1. Run samples/notebooks/A-LakeCreator/Example-1-Build-Lake.ipynb up to the dm_s3_bucket...
  2. It fails on the SSM key.

Additional Context:

The environment name should be taken from the workspace, and the SSM key should be parameterized:

import json
import boto3
ssm = boto3.client("ssm")
dm_s3_bucket = json.loads(ssm.get_parameter(
    Name=f'/datamaker/{workspace.get("env_name")}/manifest'
)['Parameter']['Value'])['toolkit-s3-bucket']

CDK Upgrades

Want to bring light to a larger question on how version bumps should be orchestrated with https://github.com/aws/aws-cdk

Currently, the Orbit CLI uses 1.67, and the CDK is moving quickly, with its latest release hitting 1.95 last week.

It's not obvious whether to version bump for the latest L2 CDK constructs, even though they are very helpful for future development. The CDK team makes it clear these minor versions will contain breaking changes.

I'm very new to CDK, so I'm still learning best practices, but would it be helpful to have test coverage validating the various CDK constructs? From my research, I see plenty of resources around testing CDK with TypeScript, but not a lot with Python. 🤔

Genesis for this was the unsupported UserPoolClient construct in CDK 1.67 #295 (comment)

Tensorboard connection timeout

TensorBoard is not able to connect. We assume there is some port we need to open in order to make TensorBoard work.

[Screenshot: TensorBoard connection timeout, 2021-04-19 10:49 AM]

ISSUE-57 ECS Containers cannot start from notebook due to the role

Specify a Type of Issue:

BUG

Describe the Issue:

controller.run_notebooks isn't working due to the missing service-linked role for ECS.

To Reproduce:

Run in a notebook:

from datamaker_sdk import controller

def run_file():
    notebooks = []
    notebook = {
        "notebookName": "Test-Container.ipynb",
        "sourcePath": "notebooks/input",
        "targetPath": "notebooks/output",
        "params": {
            # "bucketName": bucket_name,
        },
    }
    notebooks.append(notebook)

    notebooksToRun = {
        "compute": {
            "container": {
                "p_concurrent": "10"
            }
        },
        "tasks": notebooks,
    }
    containers = controller.run_notebooks(notebooksToRun)
    print(containers)
    controller.wait_for_tasks_to_complete(containers, 60, 10, False)

run_file()

Additional Context:

A workaround is available. Run in a terminal:

aws iam create-service-linked-role --aws-service-name ecs.amazonaws.com
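
The same workaround in Python, using the equivalent boto3 call:

import boto3

# Create the ECS service-linked role that the notebook controller needs.
boto3.client("iam").create_service_linked_role(AWSServiceName="ecs.amazonaws.com")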

[Feature] Ability to monitor usage

The administrator should be able to view the jobs each user launches and the resources each user is using.
It would be even better if there were stats or graphs recording all of the usage.

Orbit destroy won't remove pypi upstream

Currently, destroying the Orbit env and foundation won't remove the PyPI upstream, which can be annoying and confusing for the user.
Solution: restore ~/.config/pip/pip.conf while destroying the foundation.
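
A sketch of that solution, assuming the deploy step had saved a backup alongside the file (the pip.conf.bak name is hypothetical):

from pathlib import Path

pip_conf = Path.home() / ".config" / "pip" / "pip.conf"
backup = pip_conf.parent / "pip.conf.bak"  # hypothetical backup written at deploy time

if backup.exists():
    backup.replace(pip_conf)   # restore the pre-deploy config
elif pip_conf.exists():
    pip_conf.unlink()          # otherwise just drop the upstream override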

Keep installed pip packages even when notebook is terminated

Issue:
Currently, installed pip packages are gone once the notebook is terminated. This is an issue because the notebook is terminated when it's not active, and users then have to reinstall everything again.

Tried Solution:
We tried changing the default pip installation path to a folder on EFS by setting environment variables in the Dockerfile, so that packages would not be removed. However, it looks like the folder on EFS gets recreated every time the notebook boots up. Below is a snippet of the code we used:

# Customize pip installation location
RUN mkdir -p /home/jovyan/private/site_packages
ENV PIP_TARGET=/home/jovyan/private/site_packages
ENV PYTHONPATH=$PYTHONPATH:/home/jovyan/private/site_packages

Question:
Isn't everything in private permanent and preserved across launches? Or is there some special setting in JupyterHub that prevents us from doing this? Also, is there a better solution to address this pip package issue?

ISSUE-81 Containers API is broken

Describe the Issue:

When running notebooks in containers using the controller API, they won't run.

To Reproduce:

Create a configuration to execute in a container.
Run the controller.run_notebooks(notebooksToRun) API.

Additional Context:

The controller API and notebook_runner seem to be broken.

ISSUE-103 Template for Issues isn't working

Describe the Issue:

Issue template isn't working.

To Reproduce:

  1. Click on Create new Issue - the body is empty

Additional Context:

Need to introduce a config.yml file with default issue template(s).

ISSUE-51 Add sample notebooks - Lake Creator

Specify a Type of Issue:

FEATURE

Describe the Issue:

Need to add example notebooks that show how the APIs are used. Use containers to run nested notebooks.

To Reproduce:

N/A

Additional Context:

N/A

[Feature] User should be able to reattach and delete their ebs

  1. Currently, an EBS volume is attached to a server based on its name. It would be more useful if users could choose which EBS volume they want to attach.
  2. Users should be able to delete their EBS volumes when they don't need them anymore, or some garbage-collection logic for EBS is needed. Otherwise, one can keep starting servers with different names and end up with a bunch of EBS volumes.

ISSUE-55 missing tools for notebooks

Specify a Type of Issue:

FEATURE

Describe the Issue:

missing tools for notebooks:

  • zip and unzip

To Reproduce:

  1. open landing page
  2. create a server
  3. open terminal
  4. type in command line: zip or unzip

Additional Context:

Workaround:
apt-get install zip unzip

if no sudo access, then:

  • mkdir -p $HOME/pkgs && cd $HOME/pkgs
  • apt-get download zip unzip
  • for f in *zip*.deb; do dpkg -x "$f" "$HOME/pkgs"; done
  • export PATH=$PATH:$HOME/pkgs/usr/bin

Note, this will not make those packages available in notebooks.

[FEATURE] Ray HPO Tune Integration

We've managed to run an example Ray cluster and HPO on Kubernetes (a minimal Tune sketch follows the steps). Here are the steps:

  1. Connect local kubectl to the eks cluster
  2. Pip install ray locally: pip install ray
  3. Clone ray repo: git clone https://github.com/ray-project/ray.git
  4. Launch up a ray cluster by: ray up ray/python/ray/autoscaler/kubernetes/example-full.yaml
  5. Check the cluster got launched by: kubectl -n ray get pods
  6. To create a sample job: kubectl create -f ray/doc/kubernetes/job-example.yaml
    • You can modify the script that the YAML file downloads in order to run a different script
  7. To check result: kubectl -n ray logs <launched job pod>
  8. To tear down the cluster: ray down ray/python/ray/autoscaler/kubernetes/example-full.yaml
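
Once the cluster is up, a minimal Tune sweep can serve as a smoke test; this sketch uses the Ray 1.x-era API (tune.report/tune.run) current when this issue was filed:

from ray import tune

def objective(config):
    # Toy objective: minimize x^2 over the grid below.
    tune.report(score=config["x"] ** 2)

analysis = tune.run(objective, config={"x": tune.grid_search([1, 2, 3])})
print(analysis.get_best_config(metric="score", mode="min"))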

Create run demo page on wiki

Have a page that starts from deployment of the sample manifest with lake-creator and lake-user, then downloads the demo data, runs the lake-creator notebook, and then runs the user regression. Document it step by step.

Fix logout on kubeflow

When user needs to terminate his session, we need to call this url:
/logout?response_type=code&client_id=&redirect_uri=&state=STATE&scope=openid+profile+aws.cognito.signin.user.admin

We need to create small service with html page offering the link to click on to terminate session and redirect them into a new login screen.

the small web server will use something like this form the redirect users into a new login.

<html>
<head>
    <title>This website has moved</title>
    <meta http-equiv="refresh" content="1;url=<cognito_ep>/logout?response_type=code&client_id=<clientid>&redirect_uri=<ingressalb>&state=STATE&scope=openid+profile+aws.cognito.signin.user.admin">
    <meta name="robot" content="noindex,follow">
</head>
<body>
Your session has been terminated, you will be redirected to a new login page now.
</body>
</html>
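
A sketch of the small web server itself, using only the Python standard library and assuming the HTML above was saved as logout.html:

from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = open("logout.html", "rb").read()  # the redirect page above

class LogoutHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Serve the redirect page for every request.
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(PAGE)

HTTPServer(("", 8080), LogoutHandler).serve_forever()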

ISSUE-58 Incorrect Output Dir in Container

Specify a Type of Issue:

BUG

Describe the Issue:

When a task is defined as follows:

      "notebookName": "Test-Container.ipynb",
      "sourcePath": "private/notebooks/input",
      "targetPath": "private/notebooks/output",

the input path resolves to:
/home/jovyan/private/notebooks/input/Test-Container.ipynb
but the output path resolves to:
/home/jovyan/private/outputs/private/notebooks/output/Test-Container/e1@20201119-15:06.ipynb
instead of:
/home/jovyan/private/notebooks/output/Test-Container/e1@20201119-15:06.ipynb

To Reproduce:

Run notebooks using the controller API as defined above.

Additional Context:

Recommend removing the hard-coded output prefix and using it only as a default when no explicit targetPath is provided (sketched below).
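
A minimal sketch of that recommendation; resolve_target and its parameters are hypothetical names for illustration:

from pathlib import Path
from typing import Optional

DEFAULT_OUTPUT_ROOT = Path("/home/jovyan/private/outputs")

def resolve_target(home: Path, target_path: Optional[str], notebook: str, run_id: str) -> Path:
    # Honor an explicit targetPath; fall back to the hard-coded root only by default.
    base = home / target_path if target_path else DEFAULT_OUTPUT_ROOT
    return base / Path(notebook).stem / f"{run_id}.ipynb"

# resolve_target(Path("/home/jovyan"), "private/notebooks/output",
#                "Test-Container.ipynb", "e1@20201119-15:06")
# -> /home/jovyan/private/notebooks/output/Test-Container/e1@20201119-15:06.ipynb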
