project-codeflare / codeflare
Simplifying the definition and execution, scaling and deployment of pipelines on the cloud.
Home Page: https://codeflare.dev
License: Apache License 2.0
Describe the bug
Sample pipeline Jupyter notebook errors out due to an undefined variable.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Jupyter notebook on binder should run without exception
Additional context
error while executing cell:
pipeline_output = rt.execute_pipeline(pipeline, ExecutionType.FIT, pipeline_input)
node_0_output = pipeline_output.get_xyrefs(node_0)
In [74]:
outputs[0]
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-74-a45df8d4a457> in <module>
----> 1 outputs[0]
NameError: name 'outputs' is not defined
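The failing cell references outputs, but the preceding cell binds node_0_output; a likely fix, assuming that was the intended variable:

```python
node_0_output = pipeline_output.get_xyrefs(node_0)
node_0_output[0]  # instead of the undefined `outputs[0]`
```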
As a CF dev, the current code has become quite complex to manage in two files. This is not good coding practice and needs major refactoring to accommodate ongoing development.
For uniformity of input and output across pipeline stages, we need a list of future references to Xy objects. Ray remote calls always return a future reference, and we return a holder class with references to X and y, which are compute tasks that can be in flight. A list of references allows for the parallelization of operations. Uniformity of data exchange is thus enabled by choosing a list of future references as the way to exchange data.
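A minimal sketch of such holder classes, assuming names consistent with the accessors (get_Xref(), get_yref()) that appear in issues further down this page; the exact shapes are illustrative, not the committed API:

```python
import ray

class Xy:
    """Holder for materialized X and y."""
    def __init__(self, X, y):
        self.X = X
        self.y = y

class XYRef:
    """Holder for Ray futures (ObjectRefs) to X and y that may still be in flight."""
    def __init__(self, Xref, yref):
        self.__Xref = Xref
        self.__yref = yref

    def get_Xref(self):
        return self.__Xref

    def get_yref(self):
        return self.__yref

# Stages exchange a *list* of XYRef: downstream nodes fan out over the list,
# and Ray schedules the underlying tasks in parallel without materializing data.
```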
AND node semantics computes a full cross product. In grid search CV, an AND node like FeatureUnion will require features to be joined on a given input object. For example, consider performing two-fold cross validation on the following pipeline: (PCA (n_components = 5, 10) || Nystroem || SelectKBest) && FeatureUnion. On two-fold CV, we get four objects from the PCA node (2x2) and two objects each from Nystroem and SelectKBest. A regular AND node will compute a 4x2x2 cross product. A lineage AND will compute 4 cross products: (pca_5, Nystroem, SelectKBest) on the two input objects and (pca_10, Nystroem, SelectKBest) on the same two input objects.
Lineage AND solution: select items in the AND node cross product that share the same input object lineage.
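A minimal sketch of the lineage-AND selection, assuming each upstream output is tagged with the id of the input object (here, the CV fold) it descends from; the tagging mechanism itself is an assumption:

```python
from itertools import product

def lineage_and(outputs_per_node):
    """Keep only cross-product combinations whose members share one input lineage."""
    selected = []
    for combo in product(*outputs_per_node):
        lineages = {lineage for lineage, _ in combo}
        if len(lineages) == 1:  # all members descend from the same input object
            selected.append(tuple(obj for _, obj in combo))
    return selected

# Two-fold CV example from above: PCA yields 4 objects (2 params x 2 folds),
# Nystroem and SelectKBest yield 2 each. A regular AND computes 4x2x2 = 16 combos;
# lineage AND keeps only the 4 whose members come from the same fold.
pca = [(0, "pca5_f0"), (1, "pca5_f1"), (0, "pca10_f0"), (1, "pca10_f1")]
nystroem = [(0, "nys_f0"), (1, "nys_f1")]
kbest = [(0, "kb_f0"), (1, "kb_f1")]
print(lineage_and([pca, nystroem, kbest]))
```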
As a CF pipelines developer, using node as a key, as opposed to node_name, causes a lot of overhead. Switching to node_name is an intrusive change, but it will help keep all the core data structures clean!
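A minimal sketch of keying by node_name instead of by node object, assuming a hypothetical get_node_name() accessor; pre_graph mirrors the structure seen in Datamodel.get_pre_edges further down this page:

```python
# Core structures keyed by node_name (str) rather than by the node object itself.
pre_graph = {}   # node_name -> list of upstream node_names
node_map = {}    # node_name -> node object, for lookups when needed

def add_edge(from_node, to_node):
    # get_node_name() is a hypothetical accessor for the node's string name
    f, t = from_node.get_node_name(), to_node.get_node_name()
    node_map[f] = from_node
    node_map[t] = to_node
    pre_graph.setdefault(t, []).append(f)
```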
Describe the bug
Installed RHODS, created a data science project, and deployed a Jupyter notebook with the CodeFlare image.
All working fine. Created a cluster; it deployed Ray head and worker cluster nodes, and all seem to be running fine.
It exposes a route for the Ray dashboard, which is not accessible at all.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
The route should take me to the Ray dashboard.
Desktop (please complete the following information):
OS: Mac
Browser: both Chrome and Safari
Version: macOS Monterey 12.6.1
Additional context
Note: I have changed my browser's security settings for the route address to allow insecure content, but it still doesn't work for me.
As a pipelines user, I would like to pick a specific pipeline, store it, and reload it for scoring purposes.
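A minimal sketch of one possible flow, reusing rt.select_pipeline and get_xyrefs (both seen in issues below) plus plain pickle; pipeline_output and node_of_interest stand in for a fitted pipeline output and a chosen node, and picklability of the selected pipeline is an assumption:

```python
import pickle
import codeflare.pipelines.Runtime as rt

# Pick the fitted pipeline of interest out of a pipeline / grid-search output...
chosen = rt.select_pipeline(pipeline_output,
                            pipeline_output.get_xyrefs(node_of_interest)[0])

# ...store it...
with open("chosen_pipeline.pkl", "wb") as fh:
    pickle.dump(chosen, fh)  # assumes the selected pipeline object is picklable

# ...and reload it later for scoring.
with open("chosen_pipeline.pkl", "rb") as fh:
    reloaded = pickle.load(fh)
```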
Describe the bug
Error using getting started docs.
Regarding: https://codeflare.readthedocs.io/en/latest/getting_started/starting.html#codeflare-on-openshift-container-platform-ocp
Section:
Alternatively, you can also build locally with:
git clone https://github.com/project-codeflare/codeflare.git
pip3 install --upgrade pip
pip3 install .
pip3 install -r requirements.txt
SEE PROBLEM BELOW:
$ pip3 install --upgrade pip
Requirement already satisfied: pip in /usr/local/lib/python3.8/site-packages (20.3.3)
Collecting pip
Downloading pip-21.1.2-py3-none-any.whl (1.5 MB)
|████████████████████████████████| 1.5 MB 4.1 MB/s
Installing collected packages: pip
Attempting uninstall: pip
Found existing installation: pip 20.3.3
Uninstalling pip-20.3.3:
Successfully uninstalled pip-20.3.3
Successfully installed pip-21.1.2
$
$ pip3 install .
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/pip/_vendor/pkg_resources/init.py", line 583, in _build_master
ws.require(requires)
File "/usr/local/lib/python3.8/site-packages/pip/_vendor/pkg_resources/init.py", line 900, in require
needed = self.resolve(parse_requirements(requirements))
File "/usr/local/lib/python3.8/site-packages/pip/_vendor/pkg_resources/init.py", line 791, in resolve
raise VersionConflict(dist, req).with_context(dependent_req)
pip._vendor.pkg_resources.VersionConflict: (pip 21.1.2 (/usr/local/lib/python3.8/site-packages), Requirement.parse('pip==20.3.3'))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/opt/[email protected]/bin/pip3", line 33, in
sys.exit(load_entry_point('pip==20.3.3', 'console_scripts', 'pip3')())
File "/usr/local/opt/[email protected]/bin/pip3", line 25, in importlib_load_entry_point
return next(matches).load()
File "/usr/local/Cellar/[email protected]/3.8.7/Frameworks/Python.framework/Versions/3.8/lib/python3.8/importlib/metadata.py", line 77, in load
module = import_module(match.group('module'))
File "/usr/local/Cellar/[email protected]/3.8.7/Frameworks/Python.framework/Versions/3.8/lib/python3.8/importlib/init.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "", line 1014, in _gcd_import
File "", line 991, in _find_and_load
File "", line 975, in _find_and_load_unlocked
File "", line 671, in _load_unlocked
File "", line 783, in exec_module
File "", line 219, in _call_with_frames_removed
File "/usr/local/lib/python3.8/site-packages/pip/_internal/cli/main.py", line 9, in
from pip._internal.cli.autocompletion import autocomplete
File "/usr/local/lib/python3.8/site-packages/pip/_internal/cli/autocompletion.py", line 10, in
from pip._internal.cli.main_parser import create_main_parser
File "/usr/local/lib/python3.8/site-packages/pip/_internal/cli/main_parser.py", line 8, in
from pip._internal.cli import cmdoptions
File "/usr/local/lib/python3.8/site-packages/pip/_internal/cli/cmdoptions.py", line 23, in
from pip._internal.cli.parser import ConfigOptionParser
File "/usr/local/lib/python3.8/site-packages/pip/_internal/cli/parser.py", line 12, in
from pip._internal.configuration import Configuration, ConfigurationError
File "/usr/local/lib/python3.8/site-packages/pip/_internal/configuration.py", line 21, in
from pip._internal.exceptions import (
File "/usr/local/lib/python3.8/site-packages/pip/_internal/exceptions.py", line 7, in
from pip._vendor.pkg_resources import Distribution
File "/usr/local/lib/python3.8/site-packages/pip/_vendor/pkg_resources/init.py", line 3252, in
def _initialize_master_working_set():
File "/usr/local/lib/python3.8/site-packages/pip/_vendor/pkg_resources/init.py", line 3235, in _call_aside
f(*args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/pip/_vendor/pkg_resources/init.py", line 3264, in _initialize_master_working_set
working_set = WorkingSet._build_master()
File "/usr/local/lib/python3.8/site-packages/pip/_vendor/pkg_resources/init.py", line 585, in _build_master
return cls._build_from_requirements(requires)
File "/usr/local/lib/python3.8/site-packages/pip/_vendor/pkg_resources/init.py", line 598, in _build_from_requirements
dists = ws.resolve(reqs, Environment())
File "/usr/local/lib/python3.8/site-packages/pip/_vendor/pkg_resources/init.py", line 786, in resolve
raise DistributionNotFound(req, requirers)
pip._vendor.pkg_resources.DistributionNotFound: The 'pip==20.3.3' distribution was not found and is required by the application
$
To Reproduce
Steps to reproduce the behavior:
Expected behavior
pip3 install . from the instructions runs without failure.
Describe the bug
The 'sklearn' PyPI package is deprecated, use 'scikit-learn' rather than 'sklearn' for pip commands.
To Reproduce
Steps to reproduce the behavior:
Collecting sklearn>=0.0 (from codeflare==0.1.2.dev0)
Using cached sklearn-0.0.post9.tar.gz (3.6 kB)
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [18 lines of output]
The 'sklearn' PyPI package is deprecated, use 'scikit-learn'
rather than 'sklearn' for pip commands.
Here is how to fix this error in the main use cases:
- use 'pip install scikit-learn' rather than 'pip install sklearn'
- replace 'sklearn' by 'scikit-learn' in your pip requirements files
(requirements.txt, setup.py, setup.cfg, Pipfile, etc ...)
- if the 'sklearn' package is used by one of your dependencies,
it would be great if you take some time to track which package uses
'sklearn' instead of 'scikit-learn' and report it to their issue tracker
- as a last resort, set the environment variable
SKLEARN_ALLOW_DEPRECATED_SKLEARN_PACKAGE_INSTALL=True to avoid this error
More information is available at
https://github.com/scikit-learn/sklearn-pypi-package
If the previous advice does not cover your use case, feel free to report it at
https://github.com/scikit-learn/sklearn-pypi-package/issues/new
Expected behavior
pip install shouldn't fail; only scikit-learn should be used in requirements.txt and setup.py.
For more information see: https://github.com/scikit-learn/sklearn-pypi-package
Graphviz missing from Binder service
To Reproduce
Steps to reproduce the behavior:
Additional context
The error below is caused by execution of this cell:
non_param_graph = cf_utils.pipeline_to_graph(pipeline)
non_param_graph
ExecutableNotFound: failed to execute ['dot', '-Kdot', '-Tsvg'], make sure the Graphviz executables are on your systems' PATH
Describe the bug
Cannot bring up Ray cluster as defined in the OCP tutorial
To Reproduce
Steps to reproduce the behavior:
pip3 install --upgrade codeflare
oc create namespace codeflare
ray up ray/python/ray/autoscaler/kubernetes/example-full.yaml
fails:
$ ray up ray/python/ray/autoscaler/kubernetes/example-full.yaml
Provided cluster configuration file (ray/python/ray/autoscaler/kubernetes/example-full.yaml) does not exist
Expected behavior
Bring up Ray cluster on OCP
Desktop (please complete the following information):
Additional context
OCP Cluster running on IBM Cloud.
$ oc cluster-info
Kubernetes master is running at https://c100-e.jp-tok.containers.cloud.ibm.com:31129
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
CodeFlare commit hash commit a2b290a115b0cc1317270cef6059d5281215842e
The current implementation does not accept user-specified scoring metric(s). For example:
cross_val_score(pipeline, X_test, y_test, scoring="neg_mean_squared_error", cv=10)
The sklearn model evaluation metrics are listed here:
https://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter
One common use case, documented below, is supported by CodeFlare Pipelines:
https://scikit-learn.org/stable/auto_examples/model_selection/plot_underfitting_overfitting.html#sphx-glr-auto-examples-model-selection-plot-underfitting-overfitting-py
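For reference, the requested call as plain sklearn, runnable end to end (the pipeline here is an ordinary sklearn Pipeline; the CF runtime would need an equivalent hook to pass scoring through):

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
pipeline = make_pipeline(StandardScaler(), Ridge())

# Any sklearn scoring string should be accepted and forwarded by the runtime.
scores = cross_val_score(pipeline, X, y, scoring="neg_mean_squared_error", cv=10)
print(scores.mean())
```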
As a CF pipelines user, I would like to understand the memory consumption when pipelines are executed. Given that pipelines accept nparrays, will Ray's zero-copy sharing help?
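For intuition, a small sketch of Ray's zero-copy behavior for numpy arrays: ray.get returns read-only views backed by the shared object store, so readers on the same node do not duplicate the data.

```python
import numpy as np
import ray

ray.init()

arr = np.zeros((1000, 1000))
ref = ray.put(arr)           # serialized once into the plasma object store
view = ray.get(ref)          # numpy arrays come back as zero-copy views
print(view.flags.writeable)  # False: the memory is shared, not copied

# Workers on the same node that ray.get(ref) share this memory rather than
# duplicating it, which bounds memory consumption for large nparrays.
ray.shutdown()
```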
As a user of CodeFlare pipeline, how do I know the lineage of an object -- what objects and what nodes generated this particular object?
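A sketch of answering that question with the hooks that appear in Runtime.select_pipeline later on this page (get_curr_node_state_ref(), get_prev_xyrefs()); treating them as a public lineage API is an assumption:

```python
import ray

def print_lineage(xyref, depth=0):
    # The node whose execution produced this object
    node = ray.get(xyref.get_curr_node_state_ref())
    print("  " * depth + str(node))
    # Recurse into the objects that fed that node
    for prev in xyref.get_prev_xyrefs():
        print_lineage(prev, depth + 1)
```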
As a CF pipelines user, I would like the ability to select the best or k-best pipelines from a parameter grid search output.
As a Codeflare user, I want to use Ray and Spark alternately to execute my end-to-end ML jobs. Some steps might be executed more efficiently using Ray, while others might run more efficiently using Spark. The plasma store in Ray seems to provide an efficient way to share ObjectRef between Ray and Spark. Currently, the RayDP project supports going from Spark to Ray in a limited way, by running Spark as a Ray actor. However, ObjectRef cannot be shared easily in both directions, Spark-to-Ray and Ray-to-Spark.
A Pandas dataframe created by remote tasks in local Ray plasma stores can be passed via ObjectRef to the Spark driver, to create a Spark dataframe containing a list of ObjectRefs. A .groupby() could then follow partition semantics and write these partitions to the plasma store, instead of using hashPartition(), allowing ObjectRefs to be shared between Ray workers and Spark executors.
[Reference] I have opened an issue on the RayDP repo: oap-project/raydp#164
As a CF pipelines user, I would like support for nested pipelines, where a node of a pipeline can itself be a pipeline.
As a CF pipelines user, I would like to see an ADR capturing the design of grid search CV.
Describe the bug
After running a PREDICT, y_pred cannot be obtained via get_yref(); instead it can be obtained via get_Xref(). Semantically, this seems weird.
To Reproduce
Steps to reproduce the behavior:
y_pred = ray.get(predict_clf_output[0].get_Xref())
The output would match the original sklearn pipeline at the top.
Expected behavior
The predicted output should be obtained from calling get_yref().
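That is, a user would expect the following to work:

```python
y_pred = ray.get(predict_clf_output[0].get_yref())
```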
Describe the bug
Possibly a corner case?
ray-pipeline/codeflare/pipelines/Datamodel.py in get_pre_edges(self, node)
640 """
641 pre_edges = []
--> 642 pre_nodes = self.pre_graph[node]
643 # Empty pre
644 if not pre_nodes:
KeyError: <codeflare.pipelines.Datamodel.EstimatorNode object at 0x7fa2d8920f10>
To Reproduce
## imports and a toy dataset (any X, y reproduces this)
import codeflare.pipelines.Datamodel as dm
import codeflare.pipelines.Runtime as rt
from codeflare.pipelines.Runtime import ExecutionType
from sklearn.datasets import load_iris
from sklearn.pipeline import FeatureUnion
from sklearn.preprocessing import MinMaxScaler, StandardScaler, MaxAbsScaler, RobustScaler

X, y = load_iris(return_X_y=True)

## initialize codeflare pipeline by first creating the nodes
pipeline = dm.Pipeline()
node_a = dm.EstimatorNode('a', MinMaxScaler())
node_b = dm.EstimatorNode('b', StandardScaler())
node_c = dm.EstimatorNode('c', MaxAbsScaler())
node_d = dm.EstimatorNode('d', RobustScaler())
node_e = dm.AndNode('e', FeatureUnion())
node_f = dm.AndNode('f', FeatureUnion())
node_g = dm.AndNode('g', FeatureUnion())
## codeflare nodes are then connected by edges
pipeline.add_edge(node_a, node_e)
pipeline.add_edge(node_b, node_e)
pipeline.add_edge(node_c, node_f)
## node_d does not have a downstream node
# pipeline.add_edge(node_d, node_f)
pipeline.add_edge(node_e, node_g)
pipeline.add_edge(node_f, node_g)
pipeline_input = dm.PipelineInput()
xy = dm.Xy(X,y)
pipeline_input.add_xy_arg(node_a, xy)
pipeline_input.add_xy_arg(node_b, xy)
pipeline_input.add_xy_arg(node_c, xy)
pipeline_input.add_xy_arg(node_d, xy)
## execute the codeflare pipeline
pipeline_output = rt.execute_pipeline(pipeline, ExecutionType.FIT, pipeline_input)
Expected behavior
The pipeline should either execute (ignoring the dangling node_d) or fail with a clear error message, rather than raising a raw KeyError.
Describe the bug
The lale library install does not install all dependencies; the options tried were:
!pip install lale[full]
!pip install 'liac-arff>=2.4.0'
and
!pip install lale
!pip install 'liac-arff>=2.4.0'
To Reproduce
Steps to reproduce the behavior:
Expected behavior
All cells in the examples should run without errors.
Additional context
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-6-44a629fbc523> in <module>
----> 1 (X_train, y_train), (X_test, y_test) = fetch("jungle_chess_2pcs_raw_endgame_complete", "classification")
~/anaconda3/lib/python3.8/site-packages/lale/datasets/openml/openml_datasets.py in fetch(dataset_name, task_type, verbose, preprocess, test_size, astype)
663 from lale.datasets.data_schemas import liac_arff_to_schema
664
--> 665 schema_orig = liac_arff_to_schema(dataDictionary)
666 target_col = experiments_dict[dataset_name]["target"]
667 y: Optional[Any] = None
~/anaconda3/lib/python3.8/site-packages/lale/datasets/data_schemas.py in liac_arff_to_schema(larff)
310
311 def liac_arff_to_schema(larff) -> JSON_TYPE:
--> 312 assert is_liac_arff(
313 larff
314 ), """Your Python environment might contain an 'arff' package different from 'liac-arff'. You can install it with
AssertionError: Your Python environment might contain an 'arff' package different from 'liac-arff'. You can install it with
pip install 'liac-arff>=2.4.0'
or with
pip install 'lale[full]'
As a CFP user, I would like to split a dataset (e.g., an np array or pandas dataframe) into smaller objects that can then be fed into other nodes/pipelines. This is especially useful when we have compute-intensive tasks and would like to parallelize them easily.
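A minimal sketch of the splitting itself; feeding each chunk into its own node (e.g., wrapping each as a dm.Xy) is the assumed follow-on step:

```python
import numpy as np

def split_xy(X, y, k):
    """Split X, y into k row-aligned chunks for parallel processing."""
    return list(zip(np.array_split(X, k), np.array_split(y, k)))

X = np.random.rand(1000, 10)
y = np.random.randint(0, 2, 1000)
chunks = split_xy(X, y, 4)  # each chunk could become a dm.Xy fed to its own node
```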
Node construct
Describe the bug
Jupyter notebook kernel dies
To Reproduce
Steps to reproduce the behavior:
Expected behavior
The jupyter notebook should run without error
Describe the bug
I'm trying to run the example notebooks (in codeflare/notebooks), and came across this error. The error persisted through attempts to restart my kernel, restart the entire machine, and re-clone the repo. Any help, or an explanation of the root cause, is much appreciated!
To Reproduce
Steps to reproduce the behavior:
notebooks/plot_nca_classification.ipynb
knn_pipeline = rt.select_pipeline(pipeline_fitted, pipeline_fitted.get_xyrefs(node_knn)[0])
RaySystemError: System error: buffer source array is read-only
Full stack trace:
RaySystemError: System error: buffer source array is read-only
traceback: Traceback (most recent call last):
File "/home/kastan/.pyenv/versions/3.8.6/lib/python3.8/site-packages/ray/serialization.py", line 268, in deserialize_objects
obj = self._deserialize_object(data, metadata, object_ref)
File "/home/kastan/.pyenv/versions/3.8.6/lib/python3.8/site-packages/ray/serialization.py", line 191, in _deserialize_object
return self._deserialize_msgpack_data(data, metadata_fields)
File "/home/kastan/.pyenv/versions/3.8.6/lib/python3.8/site-packages/ray/serialization.py", line 169, in _deserialize_msgpack_data
python_objects = self._deserialize_pickle5_data(pickle5_data)
File "/home/kastan/.pyenv/versions/3.8.6/lib/python3.8/site-packages/ray/serialization.py", line 157, in _deserialize_pickle5_data
obj = pickle.loads(in_band, buffers=buffers)
File "sklearn/neighbors/_dist_metrics.pyx", line 223, in sklearn.neighbors._dist_metrics.DistanceMetric.__setstate__
File "stringsource", line 658, in View.MemoryView.memoryview_cwrapper
File "stringsource", line 349, in View.MemoryView.memoryview.__cinit__
ValueError: buffer source array is read-only
---------------------------------------------------------------------------
RaySystemError Traceback (most recent call last)
/tmp/ipykernel_1251/3313313255.py in <module>
9 test_input.add_xy_arg(node_scalar, dm.Xy(X_test, y_test))
10
---> 11 knn_pipeline = rt.select_pipeline(pipeline_fitted, pipeline_fitted.get_xyrefs(node_knn)[0])
12 knn_score = ray.get(rt.execute_pipeline(knn_pipeline, ExecutionType.SCORE, test_input)
13 .get_xyrefs(node_knn)[0].get_yref())
~/.pyenv/versions/3.8.6/lib/python3.8/site-packages/codeflare/pipelines/Runtime.py in select_pipeline(pipeline_output, chosen_xyref)
381 curr_xyref = xyref_queue.get()
382 curr_node_state_ptr = curr_xyref.get_curr_node_state_ref()
--> 383 curr_node = ray.get(curr_node_state_ptr)
384 prev_xyrefs = curr_xyref.get_prev_xyrefs()
385
~/.pyenv/versions/3.8.6/lib/python3.8/site-packages/ray/_private/client_mode_hook.py in wrapper(*args, **kwargs)
87 if func.__name__ != "init" or is_client_mode_enabled_by_default:
88 return getattr(ray, func.__name__)(*args, **kwargs)
---> 89 return func(*args, **kwargs)
90
91 return wrapper
~/.pyenv/versions/3.8.6/lib/python3.8/site-packages/ray/worker.py in get(object_refs, timeout)
1621 raise value.as_instanceof_cause()
1622 else:
-> 1623 raise value
1624
1625 if is_individual_id:
Expected behavior
Expected is selecting the pipeline and evaluating its score via a 'SCORE' pipeline.
Thank you for any help! I am a University of Illinois at Urbana-Champaign grad student trying to make the most of your work!
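For context, this error pattern is consistent with Ray's zero-copy deserialization handing sklearn a read-only numpy buffer during unpickling; a minimal sketch of the effect and the usual copy workaround (whether and where CF's runtime should apply the copy is an assumption):

```python
import numpy as np
import ray

ray.init()

ref = ray.put(np.arange(5))
arr = ray.get(ref)
print(arr.flags.writeable)  # False: zero-copy view into the object store

arr_writable = arr.copy()   # the usual workaround: copy before any mutation
arr_writable[0] = 42
ray.shutdown()
```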
Currently, lineage uses SimpleQueue to realize pipelines, but this is available only in Python >= 3.8. This reduces adoption; moving to Queue will give us broader Python version coverage.
Task: replace SimpleQueue with Queue.
Open question: Queue vs SimpleQueue?
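A minimal sketch of the swap; queue.Queue exposes the same put()/get()/empty() surface used in Runtime.select_pipeline above, and chosen_xyref stands in for the pipeline output being walked:

```python
from queue import Queue  # instead of: from queue import SimpleQueue

xyref_queue = Queue()
xyref_queue.put(chosen_xyref)      # seed with the chosen output, as today
while not xyref_queue.empty():
    curr_xyref = xyref_queue.get()
    # ... walk lineage exactly as Runtime.select_pipeline does ...
```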