ray-project / ray-educational-materials

This is a suite of hands-on training materials that shows how to scale CV, NLP, and time-series forecasting workloads with Ray.

License: Apache License 2.0

Jupyter Notebook 96.43% Python 3.57%
deep-learning distributed-machine-learning ray-distributed ray-tune ray ray-train ray-data ray-serve generative-ai llm

ray-educational-materials's Introduction

Ray Educational Materials

© 2022, Anyscale Inc. All Rights Reserved


Welcome to a collection of educational materials focused on Ray, a distributed compute framework for scaling your Python and machine learning workloads from a laptop to a cluster.

Recommended Learning Path

  • Overview of Ray - An overview of Ray and the entire Ray ecosystem.
  • Introduction to Ray AI Runtime - An overview of the Ray AI Runtime.
  • Ray Core: Remote Functions as Tasks - Learn how arbitrary functions can be executed asynchronously on separate Python workers.
  • Ray Core: Remote Objects - Learn about objects that can be stored anywhere in a Ray cluster.
  • Ray Core: Remote Classes as Actors, part 1 - Work with stateful actors.
  • Ray Core: Remote Classes as Actors, part 2 - Learn the "Tree of Actors" pattern.
  • Ray Core: Ray API best practices - Learn Ray patterns, anti-patterns, and best practices.
  • Scaling batch inference - Learn about scaling batch inference in computer vision with Ray.
  • Optional: Batch inference with Ray Datasets - Bonus content for scaling batch inference using Ray Datasets.
  • Scaling model training - Learn about scaling model training in computer vision with Ray.
  • Ray observability part 1 - Introduces the Ray State API and Ray Dashboard UI as tools for observing the Ray cluster and applications.
  • LLM model fine-tuning and batch inference - Fine-tuning a Hugging Face Transformer (FLAN-T5) on the Alpaca dataset. Also includes distributed hyperparameter tuning and batch inference.
  • Multilingual chat with Ray Serve - Serving a Hugging Face LLM chat model with Ray Serve. Integrates multiple models and services within Ray Serve (language detection and translation) to implement multilingual chat.
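
For orientation, here is a minimal sketch of the Ray Core concepts covered by the modules above (tasks, objects, and actors). It is illustrative only; the notebooks use their own examples and datasets.

import ray

ray.init()  # start Ray locally; on a cluster this would connect to it

# Task: an arbitrary function executed asynchronously on a separate worker.
@ray.remote
def square(x):
    return x * x

# Tasks return ObjectRefs (futures); ray.get resolves them to values.
futures = [square.remote(i) for i in range(4)]
print(ray.get(futures))  # [0, 1, 4, 9]

# Actor: a stateful worker process created from a class.
@ray.remote
class Counter:
    def __init__(self):
        self.value = 0

    def increment(self):
        self.value += 1
        return self.value

counter = Counter.remote()
print(ray.get(counter.increment.remote()))  # 1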

Connect with the Ray community

You can learn and get more involved with the Ray community of developers and researchers:

  • Ray documentation

  • Official Ray site: Browse the ecosystem and use this site as a hub to get the information you need to get going and building with Ray.

  • Join the community on Slack: Find friends to discuss your new learnings in our Slack space.

  • Use the discussion board: Ask questions, follow topics, and view announcements on this community forum.

  • Join a meetup group: Tune in to meetups to listen to compelling talks, get to know other users, and meet the team behind Ray.

  • Open an issue: Ray is constantly evolving to improve the developer experience. Submit feature requests, bug reports, and get help via GitHub issues.

  • Become a Ray contributor: We welcome community contributions to improve our documentation and the Ray framework.

ray-educational-materials's People

Contributors

ammirato, dependabot[bot], dmatrix, emmyscode, haochunchang, jcoffi, kamil-kaczmarek, markintoshz


ray-educational-materials's Issues

[Suggestion]: Improve Readability of Ray Serve Use Case Image

Please share your suggestion here

The collection of diagrams for the Ray Serve use case under the section "Multi-model composition for model serving" is illegible and cluttered. Replace this image with a more readable diagram whenever one becomes available.

[Bug]: Failing to read AWS S3 file(s)

Notebook with bug

https://github.com/ray-project/ray-educational-materials/blob/main/Introductory_modules/Introduction_to_Ray_AI_Runtime.ipynb

What happened?

Failed to execute the following Python code:

# Read Parquet file to Ray Dataset.
dataset = ray.data.read_parquet(
    "s3://anyscale-training-data/intro-to-ray-air/nyc_taxi_2021.parquet"
)
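
A likely cause is missing or misconfigured AWS credentials when reading from this bucket. One possible workaround (an assumption, not a confirmed fix for this notebook) is to read through an explicitly anonymous S3 filesystem:

import pyarrow.fs
import ray

# Assumes the bucket allows unsigned access; passing an anonymous
# filesystem bypasses the local AWS credential lookup.
dataset = ray.data.read_parquet(
    "s3://anyscale-training-data/intro-to-ray-air/nyc_taxi_2021.parquet",
    filesystem=pyarrow.fs.S3FileSystem(anonymous=True),
)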




Environment info

Python version: 3.11.3
Ray version: 2.5.0

Issue Severity

High: It blocks me from completing my task.

[Suggestion]: Ray use cases section should split simple scaling vs advanced use cases

Please share your suggestion here

Currently the list of use cases in https://github.com/ray-project/ray-educational-materials/blob/main/Introductory_modules/Overview_of_Ray.ipynb contains the following:

  • Exoshuffle
  • Building a custom feature engineering library
  • Alpa
  • RLlib / FIFA
  • Multi-model serving
  • RL training / Riot
  • ML platform / Shopify
  • ML platform / Spotify

This is skewed toward advanced use cases, which I don't think accurately reflects the entire target audience of Ray. I think it would be productive to break this down into two categories:

  • Scaling simple ML workloads
    • Batch inference on CPUs and GPUs (Core / Data)
    • Parallel training of many small models / Distributed training of large models (Core / Train)
    • Managing parallel experiments and hyperparameter tuning (Tune)
    • Serving model pipelines or multiple models (Serve)
    • Reinforcement Learning (RLlib)
    • ML platform use cases (Shopify, Spotify)
  • Implementing advanced ML workloads
  • Alpa
    • Exoshuffle
    • Custom feature eng library
    • RL training / Riot / FIFA

[Suggestion]: add "Part 3" to the Overview of Ray

Please share your suggestion here

Add Part 3, which will consist of small coding exercises:

Work with the Object store (a sketch follows below)

  • add an object with ray.put()
  • print the returned object reference
  • use ray.get() to access the value of the object
  • Mention that tasks and actors return futures that are references as well.

Compute pi digits
Use this docs example to show a highly_parallel computational job: computing pi digits.
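
A minimal sketch of what the Object store exercise could look like (the variable names are illustrative, not taken from any existing notebook):

import ray

ray.init()

# Put a Python object into the distributed object store.
numbers = list(range(10))
numbers_ref = ray.put(numbers)

# The returned value is an ObjectRef, i.e. a future pointing at the object.
print(numbers_ref)

# ray.get resolves the reference back into the value.
print(ray.get(numbers_ref))

# Tasks (and actor methods) also return ObjectRefs; ray.get works the same way.
@ray.remote
def total(values):
    return sum(values)

print(ray.get(total.remote(numbers_ref)))  # 45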

[Suggestion]: Reorganize this repository under consistent directories centered around workflows.

Please share your suggestion here

As the number of different notebooks grows, it becomes more and more difficult to surface what users are interested in. Right now, the directories are named around either the relevant library (e.g. "Ray Core") or the type of data (e.g. "Computer_vision_workloads").

At the very least, these conventions should be consistent and, ideally, centered around workflows that developers would relate to. In addition, the README should be improved to better describe this repository and to direct attention and traffic to the relevant modules more quickly.

[Suggestion]: add descriptions on how many Actors are needed given my cluster

Please share your suggestion here

Help Ray users understand how they can estimate the number of Actors and the compute needed to achieve performant batch prediction. Mention the following (a sketch follows this list):

  • actor defaults (1 CPU) and how to change them
  • how to assign a GPU to actors
  • the total number of actors as a function of the number of CPUs or GPUs in the cluster
  • for large clusters, mention the good practice of limiting the number of CPUs made available on the head node (docs).
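
A hedged sketch of how such a description could be illustrated; the resource numbers here are assumptions for the example, not recommendations:

import ray

ray.init()

# By default an actor requests 1 CPU; override with num_cpus / num_gpus.
@ray.remote
class CpuPredictor:
    def predict(self, batch):
        return batch

@ray.remote(num_gpus=1)  # pin one GPU to each actor
class GpuPredictor:
    def predict(self, batch):
        return batch

# Size the actor pool from what the cluster actually has.
# On larger clusters, subtract CPUs reserved on the head node (see docs).
resources = ray.cluster_resources()
num_cpu_actors = int(resources.get("CPU", 0))  # one 1-CPU actor per CPU
num_gpu_actors = int(resources.get("GPU", 0))  # one 1-GPU actor per GPU

cpu_actors = [CpuPredictor.remote() for _ in range(num_cpu_actors)]
gpu_actors = [GpuPredictor.remote() for _ in range(num_gpu_actors)]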

What is the meaning of these sentences in "Part 5: Distributed batch inference with Ray Core API"?

When using Ray, you can pass objects as arguments to remote functions. Ray will automatically store these objects in the local object store (on the worker node where the function is running) using the ray.put() function. This makes the objects available to all local tasks. However, if the objects are large, this can be inefficient as the objects will need to be copied every time they are passed to a remote function.

To improve performance, you can explicitly store both the model and feature extractor in the object store by using ray.put(). This avoids the need to create multiple copies of the objects.


I am confused about the statements regarding ray.put():

  1. "However, if the objects are large, this can be inefficient as the objects will need to be copied every time they are passed to a remote function "
  2. "To improve performance, you can explicitly store both the model and feature extractor in the object store by using ray.put(). This avoids the need to create multiple copies of the objects."

Which sentence should I follow?
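
For context, here is a minimal sketch of the pattern the second quoted sentence describes: calling ray.put() once and passing the resulting ObjectRef to many tasks, so the large object is stored once instead of being copied on every call. The model stand-in and task names are illustrative, not the notebook's actual code.

import ray

ray.init()

large_model = {"weights": list(range(1_000_000))}  # stand-in for a big model

# Store the object in the object store once; tasks receive only the ObjectRef.
model_ref = ray.put(large_model)

@ray.remote
def predict(model, item):
    return len(model["weights"]) + item

# Passing model_ref (not large_model) avoids re-copying the model per task call.
results = ray.get([predict.remote(model_ref, i) for i in range(8)])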

Ray Website "Try It Out" Quick Start with Ray AIR Colab Error on Import

Notebook with bug

https://colab.research.google.com/github/ray-project/ray-educational-materials/blob/main/Introductory_modules/Quickstart_with_Ray_AIR_Colab.ipynb

What happened?

Description
Running the "try it out" Colab on the website fails with an import error:
AttributeError: 'NoneType' object has no attribute 'replace'
Using the latest version of xgboost-ray (0.1.18) fixes the problem.


Environment info

ray==2.3.0 xgboost_ray==0.1.15

Issue Severity

Low: Minor problem.

[Bug]: ray.air checkpoints has moved to ray.train checkpoints

Notebook with bug

Computer_vision_workloads/Semantic_segmentation/Scaling_batch_inference.ipynb

What happened?

Imports as well as other dependencies need to be fixed for checkpoint-related changes.

#from ray.air import Checkpoint
from ray.train import Checkpoint

Further, Checkpoint.from_dict() does not work:

AttributeError: The new ray.train.Checkpoint class does not support from_dict(). Instead, only directories are supported.
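
One possible way to adapt the notebook (a sketch, assuming the checkpoint contents can be written to a temporary directory first; not a confirmed fix for this notebook):

import os
import tempfile
import torch
from ray.train import Checkpoint

model = torch.nn.Linear(4, 2)  # stand-in for the segmentation model

# Instead of Checkpoint.from_dict({...}), write the state to a directory
# and build the checkpoint from that directory, which is what the new
# ray.train.Checkpoint supports.
checkpoint_dir = tempfile.mkdtemp()
torch.save({"model_state_dict": model.state_dict()},
           os.path.join(checkpoint_dir, "model.pt"))
checkpoint = Checkpoint.from_directory(checkpoint_dir)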

Environment info

Ray 2.10.0
Python 3.10.13
Ubuntu

Issue Severity

None

[Suggestion]: batch prediction module: merge Actors and ActorPool sections

Please share your suggestion here

Merge Actors and ActorPool approaches into one.

As ActorPool is a utility, it can be presented as a convenience wrapper that is easy to work with. It provides load balancing and Actor management so that Ray users do not need to implement these themselves (as presented in the Actors section).
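
A minimal sketch of the ActorPool convenience wrapper this suggestion refers to (the actor class and method names are illustrative):

import ray
from ray.util import ActorPool

ray.init()

@ray.remote
class Doubler:
    def double(self, x):
        return 2 * x

# ActorPool load-balances submitted work across the actors it wraps.
pool = ActorPool([Doubler.remote(), Doubler.remote()])
results = list(pool.map(lambda actor, item: actor.double.remote(item), range(8)))
print(results)  # [0, 2, 4, 6, 8, 10, 12, 14]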

[Suggestion]: It's better to test the examples in the educational materials

Please share your suggestion here

predictions_dataset = predictor.predict(data=dataset, batch_size=1)

If I run this on a GPU server, this line raises a RayTaskError. It seems the returned segmentation_maps_postprocessed has to be moved to a CPU NumPy array and `num_gpus_per_worker=1` has to be set. It took me a long time to realize the example has that issue. For a newbie, even a minor issue may lead to confusion.
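
For context, a hedged sketch of the kind of fix being described: moving GPU tensors to CPU NumPy before returning them from the prediction step. This assumes PyTorch and uses illustrative names, not the notebook's actual code.

import numpy as np
import torch

def postprocess(segmentation_maps: torch.Tensor) -> np.ndarray:
    # Ray serializes task and actor results; a CUDA tensor can fail to
    # deserialize in a process without GPU access, so move it to CPU NumPy.
    return segmentation_maps.detach().cpu().numpy()

# In addition, the GPU has to be requested explicitly for the prediction
# workers, e.g. num_gpus_per_worker=1 in the notebook's predict() call.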

Thanks

[suggestion] batch inference module - merge sections to better present Ray AIR

Please share your suggestion here

Merge Datasets and BatchPredictor approaches into one: "Distributed batch inference with Ray AIR".

The Datasets approach is more basic; BatchPredictor is more specialized, easy to use, and feature-rich, as it also:

  • supports various predictors (TorchPredictor, HFPredictor)
  • handles framework-native batch conversions
  • gives options to resume operations from an AIR checkpoint to prediction, select / keep columns, etc.

Note in this section that BatchPredictor calls dataset.map_batches() under the hood. From that perspective they are similar.
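
A minimal sketch of the underlying dataset.map_batches() pattern that both approaches build on (the batch function here is illustrative, not a real predictor):

import ray

ray.init()

ds = ray.data.from_items([{"value": i} for i in range(1000)])

# map_batches applies a function to batches of records in parallel;
# BatchPredictor builds its inference loop on this same primitive.
def add_one(batch):  # with batch_format="pandas", batch is a DataFrame
    batch["value"] = batch["value"] + 1
    return batch

predictions = ds.map_batches(add_one, batch_format="pandas", batch_size=128)
print(predictions.take(3))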

[Suggestion]: incorporate feedback from "Overview of Ray" dry run

Please share your suggestion here

Here is a list of small changes to make based on feedback from the "Overview of Ray" dry run:

  • include an object store visualization under the section "Put data in the object store"
  • change the naming of the training and testing set components to be more readable
  • redirect use case links to YouTube videos rather than our site
  • lower the number of models to be trained
  • start with n_estimators at 8 and then increment by 8 to achieve a more satisfying convergence (a sketch follows this list)
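
A hedged sketch of what that last item could look like; scikit-learn is used here purely for illustration, and the actual notebook may use a different model and dataset:

import ray
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

ray.init()
X, y = make_regression(n_samples=500, n_features=10, random_state=0)

@ray.remote
def train_and_score(X, y, n_estimators):
    model = RandomForestRegressor(n_estimators=n_estimators, random_state=0)
    model.fit(X, y)
    return model.score(X, y)

# Start at 8 estimators and increment by 8 for each subsequent model.
sizes = list(range(8, 8 * 6 + 1, 8))  # 8, 16, 24, 32, 40, 48
scores = ray.get([train_and_score.remote(X, y, n) for n in sizes])
print(dict(zip(sizes, scores)))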

[Bug]:

Notebook with bug

LLM_finetuning_and_batch_inference.ipynb

What happened?

I get the following error while running this cell:
trainer = HuggingFaceTrainer(
    trainer_init_per_worker=trainer_init_per_worker,
    scaling_config=ScalingConfig(num_workers=num_workers, use_gpu=use_gpu),
    datasets={
        "train": train_dataset,
        "evaluation": validation_dataset,
    },
    run_config=RunConfig(
        checkpoint_config=CheckpointConfig(
            num_to_keep=1,
            checkpoint_score_attribute="eval_loss",
            checkpoint_score_order="min",
        ),
    ),
    preprocessor=batch_preprocessor,
)

(error screenshot attached to the issue)

Environment info

Ray 2.8, Python 3.9

Issue Severity

High: It blocks me from completing my task.

[Bug]: Halt due to resources are not available

Example 3: How to use Ray distributed tasks for image transformation and computation

What happened?

When I run "run_distribued", I get the following errors:

In my case I set the batch to 100, but even when I set it to 35, the errors were raised too.

I am new to Ray and cannot figure out what is going on. What resources are unavailable, and why does the system halt?

(error screenshot attached to the issue)
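
For diagnosing this kind of halt, a hedged sketch of how to compare what the tasks request with what the cluster reports; this is general Ray usage, not taken from the notebook:

import ray

ray.init()

# A task or actor that requests more CPUs/GPUs/memory than any node can ever
# provide will wait indefinitely with a "resources not available" warning.
print(ray.cluster_resources())    # total resources registered with Ray
print(ray.available_resources())  # resources not currently in use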

Environment info

System: CentOS 7
CPUs: 128
Ray: 2.3
Python: 3.9

Issue Severity

None
