Comments (8)

avishniakov commented on June 10, 2024

Thanks for reporting @ngohoanganh96 ! We will need to have a closer look.

from zenml.

wjayesh commented on June 10, 2024

Hey @ngohoanganh96, thanks for the issue. Please help me with the following so I can understand your problem better:

  • What is the output when you run zenml status?
  • Can you try disconnecting from the server (zenml disconnect) and then connecting again?
  • What code, specifically, have you deleted and why? If you have it in a public repo, that'd be super helpful too or you can just post relevant snippets.

ngohoanganh96 commented on June 10, 2024

Dear @wjayesh, sorry for the late reply. My answers to each of your questions are below:

  • What is the output when you run zenml status?
    -----ZenML Server Status-----
    Connected to a ZenML server: 'http://localhost:8238'
    The active user is: 'default'
    The active workspace is: 'default' (global)
    The active stack is: 'default' (global)
    Using configuration from: 'C:\Users\anhnth16\AppData\Roaming\zenml'
    Local store files are located at: 'C:\Users\anhnth16\AppData\Roaming\zenml\local_stores'
    The status of the local dashboard:
    ZenML server 'local'
    ┌────────────────┬─────────────────────────────────┐
    │ URL            │                                 │
    ├────────────────┼─────────────────────────────────┤
    │ STATUS         │ ⏸                               │
    ├────────────────┼─────────────────────────────────┤
    │ STATUS_MESSAGE │ Docker container is not present │
    ├────────────────┼─────────────────────────────────┤
    │ CONNECTED      │                                 │
    └────────────────┴─────────────────────────────────┘

  • Can you try disconnecting from the server (zenml disconnect) and then connecting again?
    Yes, I tried disconnecting from the server and reconnecting. It connected fine, but the same error still occurred when I ran the code.

  • What code, specifically, have you deleted and why? If you have it in a public repo, that'd be super helpful too or you can just post relevant snippets.

    import os
    import tensorflow as tf

    # `context` is the step context available inside the ZenML step
    log_dir = os.path.join(context.get_output_artifact_uri(), "logs")
    tensorboard_callback = tf.keras.callbacks.TensorBoard(
        log_dir=log_dir, histogram_freq=1
    )

I deleted this part from trainers.py because it could not write logs to the log_dir path.

P.S.:
Environment: I ran both Kubeflow and ZenML in Docker Desktop.
Debug test: I created a pod in Kubernetes to connect to the ZenML server served at localhost port 8238, but there seems to be no connection between them. I think the error occurs because Kubernetes cannot establish a connection to the ZenML server running in Docker Desktop.
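
The debug test described above can be reproduced with a small TCP reachability check run from inside a pod. This is a minimal sketch using only the Python standard library; `can_connect` is a hypothetical helper, not part of ZenML. One assumption worth checking: inside a pod, `localhost` refers to the pod itself rather than the machine running Docker Desktop, and Docker Desktop may expose the host as `host.docker.internal` instead.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, DNS failures and timeouts
        # (socket.timeout is a subclass of OSError).
        return False
```

For example, running `can_connect("localhost", 8238)` and `can_connect("host.docker.internal", 8238)` from inside the pod shows which address, if any, actually reaches the server on port 8238 from the `zenml status` output.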

safoinme commented on June 10, 2024

Hey @ngohoanganh96, I think the connectivity problem comes from the fact that you are running ZenML in Docker. Both ZenML and Kubeflow are deployed in isolated environments within Docker, so they cannot reach each other unless there is some extra configuration to allow connectivity between the two running containers.
The easiest fix is to use a local version of ZenML rather than Docker. When you do, the local recipe will mount the path where the ZenML files are stored into the Kubernetes containers, so those files will be directly accessible.

ngohoanganh96 commented on June 10, 2024

Dear @safoinme,
When I first set up the environment, it ran normally, so I assumed Kubernetes (running the Kubeflow orchestrator) could connect to the ZenML server. The error above only appeared after I restarted my computer (or Docker Desktop) and ran the pipeline again. Something that happens during the restart seems to leave the Kubernetes pod unable to connect to the ZenML server.
I also created a container in Docker Desktop and connected it to the ZenML server, and that worked fine.
The error only appears when connecting to ZenML from a Kubernetes pod (which I assume is what ZenML does when it starts a step's pod during a pipeline run).
If you have any information about this error, I would really appreciate it.
I will also try the local ZenML setup and report back.
Thank you.

ngohoanganh96 commented on June 10, 2024

Update on trying local ZenML: I disconnected from the ZenML server and ran Kubernetes (with the Kubeflow orchestrator) against local ZenML, but it shows the error below. I do not understand why the deployment cannot be found by its ID, since the deployment ID is generated automatically.

ERROR:
time="2023-08-22T01:27:38.421Z" level=info msg="capturing logs" argo=true
Creating default workspace 'default' ...
Creating default user 'default' ...
Creating default stack for user 'default' in workspace default...
The current global active stack is no longer available. Resetting the active stack to default.
The current repo active workspace is no longer available. Resetting the active workspace to 'default'.
The current repo active stack is no longer available. Resetting the active stack to default.
The current global active stack is no longer available. Resetting the active stack to default.
Reloading configuration file /app/.zen/config.yaml
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /usr/local/lib/python3.10/runpy.py:196 in _run_module_as_main │
│ │
│ 193 │ main_globals = sys.modules["__main__"].__dict__ │
│ 194 │ if alter_argv: │
│ 195 │ │ sys.argv[0] = mod_spec.origin │
│ ❱ 196 │ return _run_code(code, main_globals, None, │
│ 197 │ │ │ │ │ "__main__", mod_spec) │
│ 198 │
│ 199 def run_module(mod_name, init_globals=None, │
│ │
│ /usr/local/lib/python3.10/runpy.py:86 in _run_code │
│ │
│ 83 │ │ │ │ │ loader = loader, │
│ 84 │ │ │ │ │ package = pkg_name, │
│ 85 │ │ │ │ │ spec = mod_spec) │
│ ❱ 86 │ exec(code, run_globals) │
│ 87 │ return run_globals │
│ 88 │
│ 89 def _run_module_code(code, init_globals=None, │
│ │
│ /usr/local/lib/python3.10/site-packages/zenml/entrypoints/entrypoint.py:58 │
│ in <module> │
│ │
│ 55 │
│ 56 │
│ 57 if __name__ == "__main__": │
│ ❱ 58 │ main() │
│ 59 │
│ │
│ /usr/local/lib/python3.10/site-packages/zenml/entrypoints/entrypoint.py:54 │
│ in main │
│ │
│ 51 │ ) │
│ 52 │ entrypoint_config = entrypoint_config_class(arguments=remaining_arg │
│ 53 │ │
│ ❱ 54 │ entrypoint_config.run() │
│ 55 │
│ 56 │
│ 57 if __name__ == "__main__": │
│ │
│ /usr/local/lib/python3.10/site-packages/zenml/entrypoints/step_entrypoint_co │
│ nfiguration.py:149 in run │
│ │
│ 146 │ │
│ 147 │ def run(self) -> None: │
│ 148 │ │ """Prepares the environment and runs the configured step.""" │
│ ❱ 149 │ │ deployment = self.load_deployment() │
│ 150 │ │ │
│ 151 │ │ # Activate all the integrations. This makes sure that all mate │
│ 152 │ │ # and stack component flavors are registered. │
│ │
│ /usr/local/lib/python3.10/site-packages/zenml/entrypoints/base_entrypoint_co │
│ nfiguration.py:191 in load_deployment │
│ │
│ 188 │ │ │ The deployment. │
│ 189 │ │ """ │
│ 190 │ │ deployment_id = UUID(self.entrypoint_args[DEPLOYMENT_ID_OPTION │
│ ❱ 191 │ │ return Client().zen_store.get_deployment(deployment_id=deploym │
│ 192 │ │
│ 193 │ def download_code_if_necessary( │
│ 194 │ │ self, deployment: "PipelineDeploymentResponseModel" │
│ │
│ /usr/local/lib/python3.10/site-packages/zenml/zen_stores/sql_zen_store.py:30 │
│ 82 in get_deployment │
│ │
│ 3079 │ │ │ │ ) │
│ 3080 │ │ │ ).first() │
│ 3081 │ │ │ if deployment is None: │
│ ❱ 3082 │ │ │ │ raise KeyError( │
│ 3083 │ │ │ │ │ f"Unable to get deployment with ID '{deployment_i │
│ 3084 │ │ │ │ │ "No deployment with this ID found." │
│ 3085 │ │ │ │ ) │
╰──────────────────────────────────────────────────────────────────────────────╯
KeyError: "Unable to get deployment with ID
'aed10737-29a1-41b0-990f-f47dcd03a099': No deployment with this ID found."
Error: exit status 1
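
One plausible reading of this log, offered as an interpretation rather than something confirmed in the thread: the step pod prints "Creating default workspace 'default' ..." and reloads /app/.zen/config.yaml, which suggests it started from its own fresh, empty local store, so a deployment ID registered in the host's store cannot be found there. The toy sketch below models that situation with two independent in-memory SQLite databases; the table layout and the `get_deployment` helper are hypothetical and only mimic the error message above, not ZenML's actual implementation.

```python
import sqlite3
import uuid


def make_store() -> sqlite3.Connection:
    # Each call creates an independent, empty in-memory database,
    # standing in for a separate local store (hypothetical schema).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE deployment (id TEXT PRIMARY KEY)")
    return conn


def get_deployment(store: sqlite3.Connection, deployment_id: uuid.UUID) -> str:
    # Look up the deployment; raise KeyError like the traceback if it is missing.
    row = store.execute(
        "SELECT id FROM deployment WHERE id = ?", (str(deployment_id),)
    ).fetchone()
    if row is None:
        raise KeyError(
            f"Unable to get deployment with ID '{deployment_id}': "
            "No deployment with this ID found."
        )
    return row[0]


host_store = make_store()  # store on the machine that submitted the pipeline
pod_store = make_store()   # fresh store created inside the step pod

deployment_id = uuid.uuid4()
host_store.execute("INSERT INTO deployment (id) VALUES (?)", (str(deployment_id),))

print(get_deployment(host_store, deployment_id))  # found in the host store
try:
    get_deployment(pod_store, deployment_id)
except KeyError as exc:
    print(exc)  # the pod's store has never seen this ID
```

Under this reading, the ID itself is fine; the step simply queries a different store than the one the pipeline run was registered in.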

matthiasValuecloud commented on June 10, 2024

The problem here is that a single zenml-server instance, deployed with the default settings, can accept only up to 20 requests for database connections. Depending on what exactly you are running, this limit gets exceeded, and therefore Kubeflow cannot "reconnect" to ZenML to pass through all the data.
You can simply increase the number of ZenML server replicas or set the following environment variables:
zenml.environment.ZENML_STORE_POOL_SIZE = xxx
zenml.environment.ZENML_STORE_MAX_OVERFLOW = xxx
Both default to 20.
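
The pool limits above can be illustrated with a toy model, assuming these settings behave like SQLAlchemy's `QueuePool` `pool_size` and `max_overflow` options: up to `pool_size + max_overflow` connections can be checked out at once, and further requests wait up to a timeout before failing. `ToyPool` below is hypothetical and built only on `threading`; it is not ZenML's or SQLAlchemy's code, and it only demonstrates the capacity rule.

```python
import threading


class ToyPool:
    """Toy model of a connection pool with pool_size + max_overflow capacity."""

    def __init__(self, pool_size: int = 20, max_overflow: int = 20, timeout: float = 30.0):
        self.capacity = pool_size + max_overflow
        self.timeout = timeout
        self._slots = threading.BoundedSemaphore(self.capacity)

    def checkout(self) -> None:
        # Wait for a free slot; give up after `timeout` seconds.
        if not self._slots.acquire(timeout=self.timeout):
            raise TimeoutError(
                f"pool exhausted: {self.capacity} connections already checked out"
            )

    def checkin(self) -> None:
        # Return a slot so a waiting request can proceed.
        self._slots.release()


pool = ToyPool(pool_size=2, max_overflow=1, timeout=0.1)
for _ in range(3):
    pool.checkout()
try:
    pool.checkout()  # a fourth concurrent request exceeds the capacity of 3
except TimeoutError as exc:
    print(exc)  # pool exhausted: 3 connections already checked out
```

In this model, raising either value (or adding server replicas, each with its own pool) increases how many concurrent step requests can be served before checkouts start timing out.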

david101-hunter commented on June 10, 2024

@ngohoanganh96 hey, I'm also learning about ZenML. Are you still looking into ZenML?
