Comments (8)
Thanks for reporting, @ngohoanganh96! We will need to have a closer look.
from zenml.
Hey @ngohoanganh96, thanks for the issue. Please help me with the following so I can understand your problem better:
- What is the output when you run `zenml status`?
- Can you try disconnecting from the server (`zenml disconnect`) and then connecting again?
- What code, specifically, have you deleted and why? If you have it in a public repo, that'd be super helpful too, or you can just post relevant snippets.
Dear @wjayesh, sorry for the late reply. My answers to each of your questions are below.
- What is the output when you run `zenml status`?
-----ZenML Server Status-----
Connected to a ZenML server: 'http://localhost:8238'
The active user is: 'default'
The active workspace is: 'default' (global)
The active stack is: 'default' (global)
Using configuration from: 'C:\Users\anhnth16\AppData\Roaming\zenml'
Local store files are located at: 'C:\Users\anhnth16\AppData\Roaming\zenml\local_stores'
The status of the local dashboard:
ZenML server 'local'
┌────────────────┬─────────────────────────────────┐
│ URL │ │
├────────────────┼─────────────────────────────────┤
│ STATUS │ ⏸ │
├────────────────┼─────────────────────────────────┤
│ STATUS_MESSAGE │ Docker container is not present │
├────────────────┼─────────────────────────────────┤
│ CONNECTED │ │
└────────────────┴─────────────────────────────────┘
- Can you try disconnecting from the server (`zenml disconnect`) and then connecting again?
Yes, I tried disconnecting from the server and reconnecting. It connected fine, but the same error still occurred when I ran the code.
- What code, specifically, have you deleted and why? If you have it in a public repo, that'd be super helpful too, or you can just post relevant snippets.
log_dir = os.path.join(context.get_output_artifact_uri(), "logs")
tensorboard_callback = tf.keras.callbacks.TensorBoard(
    log_dir=log_dir, histogram_freq=1
)
I deleted this part in trainers.py, since it could not log to the path in log_dir.
P.S.:
Environment: I ran both Kubeflow and ZenML in Docker Desktop.
Debug test: I created a pod in Kubernetes to connect to the ZenML server served at localhost port 8238, but there seems to be no connection between them. I think the error occurs because no connection can be established between Kubernetes and the ZenML server in Docker Desktop.
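One way to narrow this down is a plain TCP check run from inside the pod. The sketch below is only an illustration: `can_connect` is a hypothetical helper, not part of ZenML, and it only tests whether a host and port accept connections. Keep in mind that inside a pod `localhost` refers to the pod itself, so a check against `localhost:8238` can fail even while the server is reachable from the host machine. On Docker Desktop the host is usually reachable as `host.docker.internal`, but whether that name resolves in a given cluster is an assumption to verify.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened.

    Hypothetical helper for debugging only; it does not speak HTTP and
    does not know anything about ZenML.
    """
    try:
        # create_connection resolves the host name and opens a TCP socket;
        # the context manager closes the socket again right away.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False
```

For example, comparing `can_connect("localhost", 8238)` on the host with `can_connect("host.docker.internal", 8238)` inside the pod would show from which side the server is actually visible.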
Hey @ngohoanganh96, I think the connectivity problem comes from the fact that you are trying to run ZenML from Docker. Both ZenML and Kubeflow will be deployed in isolated environments within Docker, so the two running containers cannot reach each other unless there is some extra configuration to allow connectivity between them.
The easiest way to fix this is to use a local version of ZenML rather than Docker. When you do, the local recipe will mount the path where the ZenML files are stored into the Kubernetes container, making them directly accessible.
Dear @safoinme,
The first time I set up the environment, it ran normally, so I assumed Kubernetes (running the Kubeflow orchestrator) was able to connect to the ZenML server. Only after I restarted my computer (or restarted Docker Desktop) and ran it again did the error above appear. Something happened after the computer (or Docker Desktop) restart that makes the Kubernetes pod unable to connect to the ZenML server.
I also created a Docker Desktop container and connected it to the ZenML server. It ran totally fine.
The error only appears when I try to connect to ZenML from the Kubernetes pod (which I assume ZenML does when initializing a step's pod while running a pipeline).
If you have any information about this error, I would very much appreciate it.
I will also try with local ZenML and give you feedback.
Thank you.
Update after trying with local ZenML: I disconnected from the ZenML server and ran Kubernetes (running the Kubeflow orchestrator) with local ZenML, but it shows the error below. I do not understand why the deployment could not be found with the created ID, since the deployment ID is generated automatically.
ERROR:
time="2023-08-22T01:27:38.421Z" level=info msg="capturing logs" argo=true
Creating default workspace 'default' ...
Creating default user 'default' ...
Creating default stack for user 'default' in workspace default...
The current global active stack is no longer available. Resetting the active stack to default.
The current repo active workspace is no longer available. Resetting the active workspace to 'default'.
The current repo active stack is no longer available. Resetting the active stack to default.
The current global active stack is no longer available. Resetting the active stack to default.
Reloading configuration file /app/.zen/config.yaml
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /usr/local/lib/python3.10/runpy.py:196 in _run_module_as_main │
│ │
│ 193 │ main_globals = sys.modules["__main__"].__dict__ │
│ 194 │ if alter_argv: │
│ 195 │ │ sys.argv[0] = mod_spec.origin │
│ ❱ 196 │ return _run_code(code, main_globals, None, │
│ 197 │ │ │ │ │ "__main__", mod_spec) │
│ 198 │
│ 199 def run_module(mod_name, init_globals=None, │
│ │
│ /usr/local/lib/python3.10/runpy.py:86 in _run_code │
│ │
│ 83 │ │ │ │ │ loader = loader, │
│ 84 │ │ │ │ │ package = pkg_name, │
│ 85 │ │ │ │ │ spec = mod_spec) │
│ ❱ 86 │ exec(code, run_globals) │
│ 87 │ return run_globals │
│ 88 │
│ 89 def _run_module_code(code, init_globals=None, │
│ │
│ /usr/local/lib/python3.10/site-packages/zenml/entrypoints/entrypoint.py:58 │
│ in <module> │
│ │
│ 55 │
│ 56 │
│ 57 if __name__ == "__main__": │
│ ❱ 58 │ main() │
│ 59 │
│ │
│ /usr/local/lib/python3.10/site-packages/zenml/entrypoints/entrypoint.py:54 │
│ in main │
│ │
│ 51 │ ) │
│ 52 │ entrypoint_config = entrypoint_config_class(arguments=remaining_arg │
│ 53 │ │
│ ❱ 54 │ entrypoint_config.run() │
│ 55 │
│ 56 │
│ 57 if __name__ == "__main__": │
│ │
│ /usr/local/lib/python3.10/site-packages/zenml/entrypoints/step_entrypoint_co │
│ nfiguration.py:149 in run │
│ │
│ 146 │ │
│ 147 │ def run(self) -> None: │
│ 148 │ │ """Prepares the environment and runs the configured step.""" │
│ ❱ 149 │ │ deployment = self.load_deployment() │
│ 150 │ │ │
│ 151 │ │ # Activate all the integrations. This makes sure that all mate │
│ 152 │ │ # and stack component flavors are registered. │
│ │
│ /usr/local/lib/python3.10/site-packages/zenml/entrypoints/base_entrypoint_co │
│ nfiguration.py:191 in load_deployment │
│ │
│ 188 │ │ │ The deployment. │
│ 189 │ │ """ │
│ 190 │ │ deployment_id = UUID(self.entrypoint_args[DEPLOYMENT_ID_OPTION │
│ ❱ 191 │ │ return Client().zen_store.get_deployment(deployment_id=deploym │
│ 192 │ │
│ 193 │ def download_code_if_necessary( │
│ 194 │ │ self, deployment: "PipelineDeploymentResponseModel" │
│ │
│ /usr/local/lib/python3.10/site-packages/zenml/zen_stores/sql_zen_store.py:30 │
│ 82 in get_deployment │
│ │
│ 3079 │ │ │ │ ) │
│ 3080 │ │ │ ).first() │
│ 3081 │ │ │ if deployment is None: │
│ ❱ 3082 │ │ │ │ raise KeyError( │
│ 3083 │ │ │ │ │ f"Unable to get deployment with ID '{deployment_i │
│ 3084 │ │ │ │ │ "No deployment with this ID found." │
│ 3085 │ │ │ │ ) │
╰──────────────────────────────────────────────────────────────────────────────╯
KeyError: "Unable to get deployment with ID
'aed10737-29a1-41b0-990f-f47dcd03a099': No deployment with this ID found."
Error: exit status 1
The problem here is that one ZenML server instance, deployed with the default settings, can accept up to 20 requests for database connections. Depending on what exactly you are running, this limit gets exceeded, and Kubeflow therefore cannot "reconnect" to ZenML to pass all the data through.
You can simply increase the number of ZenML server replicas or adjust the environment variables:
zenml.environment.ZENML_STORE_POOL_SIZE = xxx
zenml.environment.ZENML_STORE_MAX_OVERFLOW = xxx
Both default to 20.
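As a rough illustration of those two settings, the sketch below reads them from the environment and falls back to the defaults of 20 mentioned above. `read_pool_settings` is a hypothetical helper for inspection only, not a ZenML API, and it assumes the variables hold plain integers.

```python
import os
from typing import Mapping, Optional, Tuple


def read_pool_settings(
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[int, int]:
    """Return (pool_size, max_overflow) from an environment mapping.

    Hypothetical helper: the variable names come from the comment above,
    and the fallback of 20 for each is the default stated there.
    """
    # Use the real process environment unless a mapping is passed in.
    env = os.environ if env is None else env
    pool_size = int(env.get("ZENML_STORE_POOL_SIZE", "20"))
    max_overflow = int(env.get("ZENML_STORE_MAX_OVERFLOW", "20"))
    return pool_size, max_overflow
```

Calling `read_pool_settings()` inside the server container would show whether the values you set were actually picked up.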
@ngohoanganh96 hey, I am also learning about ZenML. Are you still working with ZenML?