Comments (7)
Update 2: After scaling the VM SSD again to ~4 TB and restarting the cluster, it has kept running successfully since then. However, the cluster seems to have taken an extra ~700 GB so far, so the suggested 500 GB is clearly not enough for the given example.
from farmvibes-ai.
Update 3: The spaceeye.spaceeye.spaceeye task has finally finished, yet upon completion it threw this error and the workflow failed:
"reason": "KeyError: 'b806ef61-0164-4ef9-9cec-ccda9682ac9c'\n File \"/opt/conda/lib/python3.8/site-packages/vibe_common/messaging.py\", line 491, in accept_or_fail_event\n return success_callback(message)\n\n File \"/opt/conda/lib/python3.8/site-packages/vibe_server/orchestrator.py\", line 422, in success_callback\n self.inqueues[str(message.run_id)].put(message)\n",
After rerunning the same workflow, the task spaceeye.spaceeye.spaceeye was done, and spaceeye.spaceeye.split stayed queued with nothing running. So I restarted the cluster, but after the restart it now responds with HTTPError: 500 Server Error for both the API and the Python client. Stopping and starting the cluster also gets stuck here.
The `KeyError` you saw was due to the restart of the cluster, in which the orchestrator lost the context about the existing run.
Can you please share the log files stored in `~/.cache/farmvibes-ai/logs`? Also, if possible, please share the output of `docker logs k3d-farmvibes-ai-server-0` and `df -h`.
Thanks!
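To make the explanation concrete: the traceback in Update 3 fails at `self.inqueues[str(message.run_id)].put(message)`, a lookup of a per-run queue keyed by run id. The toy sketch below uses a hypothetical `ToyOrchestrator` class (not the real farmvibes-ai code) and assumes those queues exist only in process memory, to show how a message for a run registered before a restart ends up raising `KeyError`.

```python
import queue
import uuid


class ToyOrchestrator:
    """Hypothetical stand-in that keeps one queue per run id in memory only."""

    def __init__(self):
        self.inqueues = {}  # run id (str) -> queue.Queue; lost when the process exits

    def start_run(self, run_id):
        self.inqueues[str(run_id)] = queue.Queue()

    def success_callback(self, run_id, message):
        # Same lookup shape as in the traceback: a missing key raises KeyError.
        self.inqueues[str(run_id)].put(message)


run_id = uuid.uuid4()

before = ToyOrchestrator()
before.start_run(run_id)
before.success_callback(run_id, "op finished")  # fine: the run is registered

after_restart = ToyOrchestrator()  # new process, empty inqueues
try:
    after_restart.success_callback(run_id, "op finished")
except KeyError as err:
    print("KeyError:", err)  # the run id this new instance never registered
```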
That makes sense, because eventually everything I ran stayed queued with nothing running. Which file exactly?
I will assume "terravibes-orchestrator.log". It's a very large file, so if you're looking for a specific error, it will be easier to search for it.
{"app_id": "terravibes-orchestrator", "instance": "terravibes-orchestrator-656cbf6fff-kjb9b", "level": "INFO", "msg": "Changed op spaceeye.spaceeye.group_s2 status to pending. (run id: 76fd498a-c476-492b-9735-d5e9bab49f2c)", "scope": "vibe_server.orchestrator.WorkflowStateUpdate", "time": "2023-02-28 09:21:15,480", "type": "log", "ver": "dev"}
{"app_id": "terravibes-orchestrator", "instance": "terravibes-orchestrator-656cbf6fff-kjb9b", "level": "ERROR", "msg": "Marking op spaceeye.spaceeye.group_s2 as failed, but it didn't have a start time set. (run id: 76fd498a-c476-492b-9735-d5e9bab49f2c)", "scope": "vibe_server.orchestrator.WorkflowStateUpdate", "time": "2023-02-28 09:21:15,480", "type": "log", "ver": "dev"}
{"app_id": "terravibes-orchestrator", "instance": "terravibes-orchestrator-656cbf6fff-kjb9b", "level": "INFO", "msg": "Workflow 76fd498a-c476-492b-9735-d5e9bab49f2c changed status to failed\nReason: WORKFLOW_FAILED status propagated from workflow level", "scope": "vibe_server.orchestrator.WorkflowStateUpdate", "time": "2023-02-28 09:21:15,480", "type": "log", "ver": "dev"}
{"app_id": "terravibes-orchestrator", "instance": "terravibes-orchestrator-656cbf6fff-kjb9b", "level": "DEBUG", "msg": "Starting new HTTP connection (1): 127.0.0.1:3500", "scope": "urllib3.connectionpool", "time": "2023-02-28 09:21:15,481", "type": "log", "ver": "dev"}
{"app_id": "terravibes-orchestrator", "instance": "terravibes-orchestrator-656cbf6fff-kjb9b", "level": "DEBUG", "msg": "http://127.0.0.1:3500 \"GET /v1.0/state/statestore/76fd498a-c476-492b-9735-d5e9bab49f2c?metadata.partitionKey=eywa HTTP/1.1\" 200 2967", "scope": "urllib3.connectionpool", "time": "2023-02-28 09:21:15,482", "type": "log", "ver": "dev"}
{"app_id": "terravibes-orchestrator", "instance": "terravibes-orchestrator-656cbf6fff-kjb9b", "level": "DEBUG", "msg": "Starting new HTTP connection (1): 127.0.0.1:3500", "scope": "urllib3.connectionpool", "time": "2023-02-28 09:21:15,493", "type": "log", "ver": "dev"}
{"app_id": "terravibes-orchestrator", "instance": "terravibes-orchestrator-656cbf6fff-kjb9b", "level": "DEBUG", "msg": "http://127.0.0.1:3500 \"POST /v1.0/state/statestore HTTP/1.1\" 204 0", "scope": "urllib3.connectionpool", "time": "2023-02-28 09:21:15,494", "type": "log", "ver": "dev"}
{"app_id": "terravibes-orchestrator", "instance": "terravibes-orchestrator-656cbf6fff-kjb9b", "level": "INFO", "msg": "Changed op spaceeye.preprocess.cloud.merge status to pending. (run id: 76fd498a-c476-492b-9735-d5e9bab49f2c)", "scope": "vibe_server.orchestrator.WorkflowStateUpdate", "time": "2023-02-28 09:21:15,496", "type": "log", "ver": "dev"}
{"app_id": "terravibes-orchestrator", "instance": "terravibes-orchestrator-656cbf6fff-kjb9b", "level": "ERROR", "msg": "Marking op spaceeye.preprocess.cloud.merge as failed, but it didn't have a start time set. (run id: 76fd498a-c476-492b-9735-d5e9bab49f2c)", "scope": "vibe_server.orchestrator.WorkflowStateUpdate", "time": "2023-02-28 09:21:15,496", "type": "log", "ver": "dev"}
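The orchestrator log is JSON lines, one record per line with `level`, `time` and `msg` fields as in the excerpt above, so the errors can be pulled out without opening the whole file. A minimal Python sketch, assuming every record follows that shape and that the log sits in the `~/.cache/farmvibes-ai/logs` directory mentioned earlier in the thread:

```python
import json
from pathlib import Path


def error_entries(log_path, levels=("ERROR", "CRITICAL")):
    """Yield (time, msg) for each JSON-lines record whose "level" is in `levels`.

    Blank lines, lines that are not valid JSON, and JSON values that are
    not objects are skipped.
    """
    with open(log_path, encoding="utf-8", errors="replace") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue
            if isinstance(record, dict) and record.get("level") in levels:
                yield record.get("time"), record.get("msg")


if __name__ == "__main__":
    # Assumed location, based on the logs directory mentioned in the thread.
    log = Path.home() / ".cache" / "farmvibes-ai" / "logs" / "terravibes-orchestrator.log"
    if log.exists():
        for when, msg in error_entries(log):
            print(when, msg)
```

Applied to the excerpt above, it would return only the two "Marking op ... as failed, but it didn't have a start time set" records.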
And for `df -h`:
Filesystem Size Used Avail Use% Mounted on
/dev/root 3.9T 2.7T 1.3T 70% /
tmpfs 32G 4.0K 32G 1% /dev/shm
tmpfs 13G 1.9M 13G 1% /run
tmpfs 5.0M 0 5.0M 0% /run/lock
/dev/sda15 105M 6.1M 99M 6% /boot/efi
/dev/sdb1 126G 28K 120G 1% /mnt
tmpfs 6.3G 128K 6.3G 1% /run/user/124
tmpfs 6.3G 140K 6.3G 1% /run/user/1000
This is after I destroyed the cluster and renamed the old cache file (the only way new workflows can work).
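Given that the cluster kept growing past the suggested 500 GB, it may help to check free space before launching a large workflow. A small sketch using Python's `shutil.disk_usage`; the 1 TiB threshold is an arbitrary assumption, and so is the idea that the cluster's data shares a filesystem with `~/.cache/farmvibes-ai` (the directory holding the logs mentioned above):

```python
import shutil
from pathlib import Path

# Arbitrary example threshold (1 TiB); the thread found 500 GB too small
# for the spaceeye example, so size this for your own workloads.
MIN_FREE_BYTES = 1024**4


def enough_free_space(path, min_free_bytes=MIN_FREE_BYTES):
    """Return (ok, free_bytes) for the filesystem that contains `path`."""
    free = shutil.disk_usage(path).free
    return free >= min_free_bytes, free


if __name__ == "__main__":
    # Assumption: cluster data shares a filesystem with ~/.cache/farmvibes-ai.
    cache = Path.home() / ".cache" / "farmvibes-ai"
    target = cache if cache.exists() else Path.home()
    ok, free = enough_free_space(target)
    print(f"{free / 1024**3:.0f} GiB free under {target} (enough: {ok})")
```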
And for `docker logs k3d-farmvibes-ai-server-0`:
Couldn't collect info from any of the files in "/etc/machine-id,/var/lib/dbus/machine-id"
I0228 22:41:04.152242 7 trace.go:205] Trace[81119723]: "GuaranteedUpdate etcd3" type:*core.ConfigMap (28-Feb-2023 22:41:03.618) (total time: 533ms):
Trace[81119723]: ---"Transaction committed" 533ms (22:41:04.152)
Trace[81119723]: [533.657546ms] [533.657546ms] END
I0228 22:41:04.152421 7 trace.go:205] Trace[956170375]: "Update" url:/api/v1/namespaces/dapr-system/configmaps/operator.dapr.io,user-agent:operator/v0.0.0 (linux/amd64) kubernetes/$Format/leader-election,audit-id:2e418883-fb2b-443c-a6c2-6fcd08f30186,client:10.42.0.62,accept:application/json, */*,protocol:HTTP/2.0 (28-Feb-2023 22:41:03.618) (total time: 534ms):
Trace[956170375]: ---"Object stored in database" 533ms (22:41:04.152)
Trace[956170375]: [534.096151ms] [534.096151ms] END
W0228 22:41:07.369204 7 info.go:53] Couldn't collect info from any of the files in "/etc/machine-id,/var/lib/dbus/machine-id"
Update 4: As a last try, I reran the workflow with a much smaller input region inside the original one (only 20 km). Luckily, it worked and completed successfully in 7 hours.
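Shrinking the area is what made the run succeed here. If the full region is still needed, one option is to cut its bounding box into tiles of roughly 20 km and run each one separately. A rough sketch in Python; `split_bbox` is a hypothetical helper using a spherical km-per-degree approximation, and submitting each tile to farmvibes-ai is left out rather than guessed:

```python
import math


def split_bbox(min_lon, min_lat, max_lon, max_lat, tile_km=20.0):
    """Split a lon/lat bounding box into a grid of tiles about `tile_km` on a side.

    Uses ~111.32 km per degree of latitude and scales longitude by
    cos(mid-latitude); good enough for sizing jobs, not for precise geometry.
    Returns a list of (min_lon, min_lat, max_lon, max_lat) tuples.
    """
    km_per_deg_lat = 111.32
    mid_lat = math.radians((min_lat + max_lat) / 2)
    km_per_deg_lon = km_per_deg_lat * math.cos(mid_lat)

    # At least one tile per axis; round up so no tile exceeds tile_km.
    n_rows = max(1, math.ceil((max_lat - min_lat) * km_per_deg_lat / tile_km))
    n_cols = max(1, math.ceil((max_lon - min_lon) * km_per_deg_lon / tile_km))
    d_lat = (max_lat - min_lat) / n_rows
    d_lon = (max_lon - min_lon) / n_cols

    tiles = []
    for r in range(n_rows):
        for c in range(n_cols):
            tiles.append((
                min_lon + c * d_lon,
                min_lat + r * d_lat,
                min_lon + (c + 1) * d_lon,
                min_lat + (r + 1) * d_lat,
            ))
    return tiles
```

For a 1 by 1 degree box near the equator this yields a 6 by 6 grid (36 tiles), each a little under 20 km on a side.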
I think you can attach the files if you drag and drop them into the text area.
For the log files, at least `terravibes-orchestrator.log`, but ideally all of them in the directory.
For the output of `docker logs`, you can save it to a file with `docker logs k3d-farmvibes-ai-server-0 > docker-logs.txt`.
I already tried that: `terravibes-orchestrator.log` is a 15.3 MB file, and GitHub refused it even after I changed it to .txt. Also, `terravibes-restapi.log` is around 700 MB. If these files are still important, I can try uploading them to a public cloud service.
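Text logs like these usually compress well, so gzipping them may bring them under the attachment size limit; this assumes the tracker accepts .gz archives. A minimal sketch with Python's standard `gzip` module; `gzip_file` is a hypothetical helper name:

```python
import gzip
import shutil
from pathlib import Path


def gzip_file(src, dst=None):
    """Write a gzip-compressed copy of `src` and return its path.

    The default destination is the source name with ".gz" appended,
    e.g. terravibes-orchestrator.log -> terravibes-orchestrator.log.gz.
    """
    src = Path(src)
    dst = Path(dst) if dst is not None else src.with_name(src.name + ".gz")
    # Stream in chunks so a ~700 MB log never has to fit in memory.
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst


if __name__ == "__main__":
    # Assumed location, based on the logs directory mentioned in the thread.
    logs = Path.home() / ".cache" / "farmvibes-ai" / "logs"
    if logs.is_dir():
        for log in logs.glob("*.log"):
            print(gzip_file(log))
```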
Update 5: As explained above, I was able to work around the error, especially by working with smaller areas. However, I still get the following error when the resources already exist, which means the notebook has to be run multiple times (on the next run, the task completes immediately). This makes it harder to integrate with as an API:
"reason": "RuntimeError: Received unsupported message header=MessageHeader(type=<MessageType.error: 'error'>, run_id=UUID('18e9dbdc-1f9c-4ffa-bd85-4455e2a0518c'), id='00-18e9dbdc1f9c4ffabd854455e2a0518c-140e43094eee796e-01', parent_id='00-18e9dbdc1f9c4ffabd854455e2a0518c-b87c84a62262a97c-01', version='1.0', created_at=datetime.datetime(2023, 3, 8, 23, 38, 16, 110840)) content=ErrorContent(status=<OpStatusType.failed: 'failed'>, ename=\"<class 'RuntimeError'>\", evalue='Traceback (most recent call last):\\n File \"/opt/conda/lib/python3.8/site-packages/vibe_agent/worker.py\", line 123, in run_op\\n return factory.build(spec).run(input, cache_info)\\n File \"/opt/conda/lib/python3.8/site-packages/vibe_agent/ops.py\", line 99, in run\\n items_out = self.storage.store(run_id, stac_results, cache_info)\\n File \"/opt/conda/lib/python3.8/site-packages/vibe_agent/storage/local_storage.py\", line 135, in store\\n raise LocalResourceExistsError(\\nvibe_agent.storage.local_storage.LocalResourceExistsError: Op output already exists in storage for download_sentinel2_from_pc with id 23bdca2b-99ce-498e-bd19-ced731cc545e.\\n', traceback=[' File \"/opt/conda/lib/python3.8/site-packages/vibe_agent/worker.py\", line 309, in run_op_from_message\\n out = self.run_op_with_retry(content, message.run_id)\\n', ' File \"/opt/conda/lib/python3.8/site-packages/vibe_agent/worker.py\", line 402, in run_op_with_retry\\n raise RuntimeError(\"\".join(ret.format()))\\n']). Aborting execution.", "status": "failed"
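Until the `LocalResourceExistsError` itself is resolved, a client-side workaround is to automate the manual rerun described in Update 5. This only helps because, per the report, the second run finishes immediately. A sketch in Python; `run_once` is a hypothetical callable that submits the workflow, waits for it and returns its final status string, and treating "done" as the success value is an assumption to adjust for your client:

```python
import time


def run_with_retries(run_once, max_attempts=3, delay_s=0.0, success="done"):
    """Call `run_once()` until it returns `success` or attempts run out.

    Returns (last_status, attempts_used).
    """
    status = None
    for attempt in range(1, max_attempts + 1):
        status = run_once()
        if status == success:
            return status, attempt
        if attempt < max_attempts:
            time.sleep(delay_s)  # optional pause before resubmitting
    return status, max_attempts


if __name__ == "__main__":
    # Simulated outcomes: the first run fails, the rerun succeeds.
    outcomes = iter(["failed", "done"])
    print(run_with_retries(lambda: next(outcomes)))  # ('done', 2)
```

Keeping `max_attempts` small avoids masking failures that are not caused by already-stored outputs.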
Related Issues (20)
- geopandas.dataset deprecated on sentinel_spaceeye notebook
- Error setting up farmvibes-ai
- Discrepancy in Units between Input Sensor Data and Heatmap Output for Humidity and Salinity Properties.
- Error deleting a Workflow
- Error installing a Remote Cluster in Azure
- Download always fails in sentinel/spectral_indices notebook
- Install ERROR during pip install ./src/vibe_core
- Error with weed_detection workflow. Are there requirements for input raster and boundary shapefile?
- Error installing env.yaml file via Conda
- How to choose? Local setup on VM vs Remote AKS cluster?
- Error during weed_detection_env.yaml setup
- Error running basemap_segmentation workflow - sam_inference failed
- remote add-secret doesn't work correctly
- How to deploy Farm vibes on Existing on-premises OpenShift or Kubernetes cluster
- What-if scenario evaluation for carbon sequestration notebook
- Unable to export SAM models to cluster due to missing file `scripts/export_sam_models.py`
- Facing errors in Land degradation notebook run with different geometry.
- Unable to run the irrigation_classification notebook
- farmvibes-ai local setup
- PyTorch "Undefined symbol" error when importing SAM ONNX models to cluster