awslabs / aws-orbit-workbench

A Data Platform built for AWS, powered by Kubernetes.

Home Page: https://awslabs.github.io/aws-orbit-workbench/

License: Apache License 2.0

Python 83.69% Shell 5.03% Dockerfile 0.76% JavaScript 0.07% HTML 0.56% CSS 0.22% Jupyter Notebook 0.63% TypeScript 7.66% Smarty 1.20% Mustache 0.16%
kubernetes eks jupyter jupyterhub data-analysis datalake mach orbit-workbench gpu eks-cluster

aws-orbit-workbench's Issues

Primary Manifest too large for Parameter Store

The primary manifest stored in Parameter Store contains the manifests for all teams as well as the original raw manifest. With more than two teams, the resulting JSON document exceeds the 8192 characters supported by Advanced Parameter Store values.
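
One possible direction, sketched here only as an illustration and not as what the Orbit CLI does, is to split the serialized manifest across several numbered parameters and reassemble it on read. The names `split_manifest`, `join_manifest` and the `part-NNN` suffix are hypothetical; the sketch works on plain strings so the actual Parameter Store calls are left out.

```python
import json
from typing import Dict

MAX_CHARS = 8192  # value limit cited in this issue


def split_manifest(manifest: dict, prefix: str, max_chars: int = MAX_CHARS) -> Dict[str, str]:
    """Serialize the manifest and spread it over numbered parameter names."""
    # json.dumps escapes non-ASCII by default, so characters and bytes match.
    text = json.dumps(manifest, separators=(",", ":"))
    chunks = [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
    # Zero-padded suffixes (hypothetical naming) keep the parts sortable.
    return {f"{prefix}/part-{n:03d}": chunk for n, chunk in enumerate(chunks)}


def join_manifest(parts: Dict[str, str]) -> dict:
    """Reassemble the chunks in name order and parse the JSON document."""
    return json.loads("".join(parts[name] for name in sorted(parts)))
```

Storing one parameter per team instead of one combined document would be another way to stay under the limit.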

Keep installed pip packages even when notebook is terminated

Issue:
Currently, installed pip packages are lost when the notebook is terminated. This is a problem because idle notebooks are terminated automatically, so users have to reinstall everything each time.

Tried Solution:
We tried changing the default pip installation path to a folder in EFS by setting environment variables in the Dockerfile, so that packages would not be removed. However, the folder in EFS appears to be recreated every time the notebook boots. Below is the snippet we used:

# Customize pip installation location
RUN mkdir -p /home/jovyan/private/site_packages
ENV PIP_TARGET=/home/jovyan/private/site_packages
ENV PYTHONPATH=$PYTHONPATH:/home/jovyan/private/site_packages

Question:
Isn't everything in private persistent across launches? Or is there a JupyterHub setting that prevents this? Also, is there a better way to address this pip package issue?
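
One variant that could be tried, assuming the private folder is a volume that only becomes available at runtime (the issue does not confirm this), is to create the package directory and register it when the kernel starts instead of at image build time. The function name `enable_persistent_packages` is hypothetical; `site.addsitedir` is the standard-library call that appends a directory to `sys.path`.

```python
import os
import site
import sys


def enable_persistent_packages(path: str) -> str:
    """Create the directory if needed and make it importable in this process."""
    # Created at runtime, so it lands on whatever is mounted at that location.
    os.makedirs(path, exist_ok=True)
    # Appends the directory to sys.path (and processes any .pth files in it).
    site.addsitedir(path)
    return path


if __name__ == "__main__":
    # Path taken from the Dockerfile snippet in this issue.
    enable_persistent_packages("/home/jovyan/private/site_packages")
    print(sys.path[-1])
```

Packages would still need to be installed into that directory, for example with PIP_TARGET set as in the snippet above.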

[Feature] Ability to monitor usage

The administrator should be able to view the jobs each user launches and the resources each user is using.
It would be even better to have stats or graphs that record all usage.

ISSUE-57 ECS Containers cannot start from notebook due to the role

Specify a Type of Issue:

BUG

Describe the Issue:

controller.run_notebooks isn't working because the service-linked role for ECS is missing.

To Reproduce:

Run in a notebook:

from datamaker_sdk import controller
def run_file():
    notebooks = []
    notebook = {
      "notebookName": "Test-Container.ipynb",
      "sourcePath": "notetbooks/input",
      "targetPath": "notetbooks/output",
      "params": {
        #"bucketName": bucket_name,
      }        
    }
    notebooks.append(notebook)

    notebooksToRun = {
      "compute": {
          "container" : {
              "p_concurrent": "10"
          }
      },

      "tasks":  notebooks  
    }
    # notebooks
    containers = controller.run_notebooks(notebooksToRun)
    print (containers)
    controller.wait_for_tasks_to_complete(containers, 60,10, False)

run_file()

Additional Context:

workaround available:
run in a terminal:
aws iam create-service-linked-role --aws-service-name ecs.amazonaws.com

[Feature] User should be able to reattach and delete their ebs

  1. Currently, an EBS volume is attached to a server based on the server's name. It would be more useful if users could choose which EBS volume to attach.
  2. Users should be able to delete their EBS volumes when they no longer need them, or some garbage-collection logic for EBS volumes is needed. Otherwise, one can keep starting servers with different names and end up with a bunch of EBS volumes.

ISSUE-58 Incorrect Output Dir in Container

Specify a Type of Issue:

BUG

Describe the Issue:

When a task is defined as follows:

      "notebookName": "Test-Container.ipynb",
      "sourcePath": "private/notebooks/input",
      "targetPath": "private/notebooks/output",

the input path resolves to:
/home/jovyan/private/notebooks/input/Test-Container.ipynb
but the output path resolves to:
/home/jovyan/private/outputs/private/notebooks/output/Test-Container/e1@20201119-15:06.ipynb
instead of:
/home/jovyan/private/notebooks/output/Test-Container/e1@20201119-15:06.ipynb

To Reproduce:

Run notebooks using the controller API as defined above.

Additional Context:

Recommend removing the hard-coded output part and using it only as the default when no explicit targetPath is provided.
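
A minimal sketch of the recommended behaviour, assuming relative paths are resolved against /home/jovyan as the input path above suggests. The function name `resolve_output_dir` and the constants are hypothetical and are not taken from the SDK.

```python
from pathlib import PurePosixPath
from typing import Optional

HOME = PurePosixPath("/home/jovyan")
# Used only when the task does not provide a targetPath.
DEFAULT_OUTPUT = HOME / "private" / "outputs"


def resolve_output_dir(target_path: Optional[str], notebook_name: str) -> PurePosixPath:
    """Return the directory that should hold the executed notebook."""
    # An explicit targetPath replaces the default instead of being appended to it.
    base = HOME / target_path if target_path else DEFAULT_OUTPUT
    # One sub-folder per notebook, named after the file without its extension.
    return base / PurePosixPath(notebook_name).stem


if __name__ == "__main__":
    print(resolve_output_dir("private/notebooks/output", "Test-Container.ipynb"))
    print(resolve_output_dir(None, "Test-Container.ipynb"))
```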

Error: No such command 'env'.

bash-4.2$ orbit init
[ Info ] Env Manifest generated into conf folder  
[ Tip ] Recommended next step: orbit deploy foundation -f default-foundation.yaml
[ Tip ] Then, fill up the manifest file (default-env-manifest.yaml) and run: orbit env -f default-env-manifest.yaml
                                                  
Initializing |█████████████████████████████| 100% 
bash-4.2$ orbit env -f default-env-manifest.yaml
Usage: orbit [OPTIONS] COMMAND [ARGS]...
Try 'orbit --help' for help.

Error: No such command 'env'.

ISSUE-70 Inconsistency in Deploy Mode

Repository Modes are inconsistent.
cli/datamaker_cli/docker.py:95 refers to a mode as "source",
while
cli/datamaker_cli/commands/deploy.py:49 refers to a mode as "code".

These need to be aligned.

ISSUE-81 Containers API is broken

Describe the Issue:

When running notebooks in containers using the controller API, they do not run.

To Reproduce:

Create a configuration to execute in a container.
Run the controller.run_notebooks(notebooksToRun) API.

Additional Context:

The controller API and notebook_runner seem to be broken.

ISSUE-103 Template for Issues isn't working

Describe the Issue:

Issue template isn't working.

To Reproduce:

  1. Click on Create new Issue - the body is empty

Additional Context:

Need to introduce a config.yml file with default issue template(s).

[FEATURE] Ray HPO Tune Integration

We've managed to run an example Ray cluster and HPO on Kubernetes. Here are the steps:

  1. Connect local kubectl to the EKS cluster
  2. Install Ray locally: pip install ray
  3. Clone the Ray repo: git clone https://github.com/ray-project/ray.git
  4. Launch a Ray cluster: ray up ray/python/ray/autoscaler/kubernetes/example-full.yaml
  5. Check that the cluster launched: kubectl -n ray get pods
  6. Create a sample job: kubectl create -f ray/doc/kubernetes/job-example.yaml
    • You can modify the script that the YAML file downloads in order to run a different script
  7. Check the result: kubectl -n ray logs <launched job pod>
  8. Tear down the cluster: ray down ray/python/ray/autoscaler/kubernetes/example-full.yaml
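
The launch part of the steps above (2 through 6) can be collected into a small driver script. This is only a convenience sketch: the function names are hypothetical, it assumes kubectl is already connected to the cluster, and it defaults to a dry run that prints the commands without executing them.

```python
import shlex
import subprocess
from typing import List

# Paths inside the cloned ray repository, as used in the steps above.
CLUSTER_CONFIG = "ray/python/ray/autoscaler/kubernetes/example-full.yaml"
JOB_SPEC = "ray/doc/kubernetes/job-example.yaml"


def ray_demo_commands(cluster_config: str = CLUSTER_CONFIG,
                      job_spec: str = JOB_SPEC) -> List[List[str]]:
    """Return steps 2-6 as argument lists, in order."""
    return [
        ["pip", "install", "ray"],
        ["git", "clone", "https://github.com/ray-project/ray.git"],
        ["ray", "up", cluster_config],
        ["kubectl", "-n", "ray", "get", "pods"],
        ["kubectl", "create", "-f", job_spec],
    ]


def run_all(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print("$", shlex.join(argv))
        if not dry_run:
            # check=True stops at the first failing step.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_all(ray_demo_commands())
```

Checking the logs and tearing down the cluster are left as manual steps, since the job pod name is only known after the job has been created.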

Create run demo page on wiki

Have a page that starts from deploying the sample/manifest with lake-creator and lake-user, then downloads the demo data, runs the lake-creator notebook, and finally runs the user regression. Document it step by step.

Orbit destroy won't remove pypi upstream

Currently, destroying the orbit env and foundation does not remove the PyPI upstream, which is annoying and can confuse the user.
Solution: restore ~/.config/pip/pip.conf while destroying the foundation.
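
A rough sketch of what backing up and restoring the file could look like, assuming the CLI takes a backup before it first modifies pip.conf. The function names and the `.orbit-backup` suffix are hypothetical.

```python
from pathlib import Path
import shutil

BACKUP_SUFFIX = ".orbit-backup"  # hypothetical naming


def backup_pip_conf(conf: Path) -> Path:
    """Copy pip.conf aside once, before the upstream is written into it."""
    backup = conf.with_name(conf.name + BACKUP_SUFFIX)
    # Keep the first backup so repeated deploys do not overwrite the original.
    if conf.exists() and not backup.exists():
        shutil.copy2(conf, backup)
    return backup


def restore_pip_conf(conf: Path, remove_if_no_backup: bool = False) -> None:
    """Put the original pip.conf back when destroying the foundation."""
    backup = conf.with_name(conf.name + BACKUP_SUFFIX)
    if backup.exists():
        # Path.replace overwrites the current file with the saved copy.
        backup.replace(conf)
    elif remove_if_no_backup and conf.exists():
        # Only safe if the file was created by the deploy in the first place.
        conf.unlink()


if __name__ == "__main__":
    restore_pip_conf(Path.home() / ".config" / "pip" / "pip.conf")
```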

ISSUE-51 Add sample notebooks - Lake Creator

Specify a Type of Issue:

FEATURE

Describe the Issue:

Need to add example notebooks that would show how APIs are being used. Use containers to run nested notebooks

To Reproduce:

N/A

Additional Context:

N/A

Fix logout on kubeflow

When a user needs to terminate their session, we need to call this URL:
/logout?response_type=code&client_id=&redirect_uri=&state=STATE&scope=openid+profile+aws.cognito.signin.user.admin

We need to create a small service with an HTML page that offers a link to click to terminate the session and redirects the user to a new login screen.

The small web server would use something like this to redirect users to a new login:

<html>
<head>
    <title>This website has moved</title>
    <meta http-equiv="refresh" content="1;url=<cognito_ep>/logout?response_type=code&client_id=<clientid>&redirect_uri=<ingressalb>&state=STATE&scope=openid+profile+aws.cognito.signin.user.admin">
    <meta name="robots" content="noindex,follow">
</head>
<body>
Your session has been terminated, you will be redirected to a new login page now.
</body>
</html>
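
For the small service, the logout URL and the refresh tag could be built along these lines. This is a sketch with hypothetical function names; the endpoint, client ID and redirect URI are placeholders for the values left blank above.

```python
from html import escape
from urllib.parse import urlencode


def cognito_logout_url(cognito_endpoint: str, client_id: str,
                       redirect_uri: str, state: str = "STATE") -> str:
    """Build the logout URL with the query parameters listed in this issue."""
    query = urlencode({
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "state": state,
        # urlencode turns the spaces into '+', matching the URL above.
        "scope": "openid profile aws.cognito.signin.user.admin",
    })
    return f"{cognito_endpoint.rstrip('/')}/logout?{query}"


def redirect_meta_tag(url: str, delay_seconds: int = 1) -> str:
    """Render the refresh tag, escaping '&' so the attribute is valid HTML."""
    return (f'<meta http-equiv="refresh" '
            f'content="{delay_seconds};url={escape(url, quote=True)}">')


if __name__ == "__main__":
    url = cognito_logout_url("https://auth.example.com", "my-client-id",
                             "https://alb.example.com/")
    print(redirect_meta_tag(url))
```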

CDK Upgrades

I want to raise a larger question about how version bumps should be orchestrated with https://github.com/aws/aws-cdk

Currently, the Orbit CLI uses 1.67, and the CDK is moving quickly, with its latest release hitting 1.95 last week.

It's not obvious whether to version bump for the latest L2 CDK constructs, even though they are very helpful for future development. The CDK team makes it clear these minor versions will contain breaking changes.

I'm very new to CDK, so I'm still learning best practices, but would it be helpful to have test coverage that validates the various CDK constructs? From my research, I see plenty of resources on testing CDK with TypeScript, but not many with Python. 🤔

The genesis for this was the unsupported UserPoolClient construct in CDK 1.67: #295 (comment)

ISSUE-104 hardcoded toolkit bucket

Describe the Issue:

Example 1 in Lake Creator has a hardcoded environment name in the toolkit bucket, which causes Example 1 to fail.

To Reproduce:

  1. Run samples/notebooks/A-LakeCreator/Example-1-Build-Lake.ipynb to the dm_s3_bucket...
  2. It fails on the SSM key.

Additional Context:

The environment name should be taken from the workspace, and the SSM key should be parameterized.

dm_s3_bucket = json.loads(ssm.get_parameter(
    Name=f'/datamaker/{workspace.get("env_name")}/manifest'
)['Parameter']['Value'])['toolkit-s3-bucket']

Tensorboard connection timeout

TensorBoard is not able to connect. We assume some port needs to be opened for TensorBoard to work.

ISSUE-55 missing tools for notebooks

Specify a Type of Issue:

FEATURE

Describe the Issue:

missing tools for notebooks:

  • zip and unzip

To Reproduce:

  1. open landing page
  2. create a server
  3. open terminal
  4. type in command line: zip or unzip

Additional Context:

Workaround:
apt-get install zip unzip

If there is no sudo access, then:

  • mkdir -p $HOME/pkgs && cd $HOME/pkgs
  • apt-get download zip unzip
  • for f in *zip*.deb; do dpkg -x "$f" "$HOME/pkgs"; done
  • export PATH=$PATH:$HOME/pkgs/usr/bin

Note, this will not make those packages available in notebooks.
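
Inside a notebook, Python's standard-library zipfile module can cover the same need without installing anything. A minimal sketch, with hypothetical helper names:

```python
import zipfile
from pathlib import Path


def zip_dir(src_dir, archive) -> None:
    """Compress every file under src_dir into archive (kept outside src_dir)."""
    src = Path(src_dir)
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src.rglob("*")):
            if path.is_file():
                # Store paths relative to src_dir so the archive is relocatable.
                zf.write(path, path.relative_to(src).as_posix())


def unzip(archive, dest_dir) -> None:
    """Extract the whole archive into dest_dir, creating it if needed."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest_dir)
```

For example, `zip_dir("data", "data.zip")` followed by `unzip("data.zip", "restored")` round-trips a folder from a notebook cell.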

ISSUE-125 Broken SDK renaming

Describe the Issue:

The rename script renamed the package to orbit_sdk instead of aws_orbit_sdk.

To Reproduce:

N/A

Additional Context:

Open controller.py.
The line shows: from orbit_sdk.common import ...
It is supposed to be: from aws_orbit_sdk.common import ...
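
To find other files affected by the rename, a quick scan for import lines that still use the short package name could help. This is a sketch, not part of the repository; `find_stale_imports` is a hypothetical helper that works on source text.

```python
import re
from typing import List, Tuple

# Matches "from orbit_sdk..." or "import orbit_sdk...", but not aws_orbit_sdk.
STALE_IMPORT = re.compile(r"^\s*(?:from|import)\s+orbit_sdk\b")


def find_stale_imports(source: str) -> List[Tuple[int, str]]:
    """Return (line number, stripped line) for each import of the wrong name."""
    return [
        (number, line.strip())
        for number, line in enumerate(source.splitlines(), start=1)
        if STALE_IMPORT.match(line)
    ]
```

Running it over the text of each .py file and printing the hits gives a list of lines to fix.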

Same thing for ECS?

Hi,
Are there any plans to implement the same solution for deploying with ECS too? EKS / K8s might be popular, but ECS is your very own Docker container orchestrator, so why is it not treated with at least as much priority as EKS?
I read about the Jupyter plugin, so there might already be something for K8s, but what about AWS providing an ECS plugin?
Thanks,

Update profiles with an input file

Currently, one can only append or delete one profile at a time, which is inconvenient when modifying multiple profiles.
An option to overwrite the current profiles with an input file containing a list of profiles would make modification easier.
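
As a sketch of the requested option, the input file could be a JSON list that is validated before it replaces the current profiles. The file format and the function name `load_profiles` are assumptions; the fields inside each profile are left unchecked because they are not specified here.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def load_profiles(path) -> List[Dict[str, Any]]:
    """Read a JSON file and check that it holds a list of profile objects."""
    data = json.loads(Path(path).read_text())
    if not isinstance(data, list) or not all(isinstance(p, dict) for p in data):
        raise ValueError("expected a JSON list of profile objects")
    # The caller would overwrite the stored profiles with this list.
    return data
```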
