microsoft / farmvibes-ai

FarmVibes.AI: Multi-Modal GeoSpatial ML Models for Agriculture and Sustainability

Home Page: https://microsoft.github.io/farmvibes-ai/

License: MIT License

Languages: Jupyter Notebook 97.85%, Python 1.85%, HCL 0.27%, Bicep 0.02%, Shell 0.01%
Topics: agriculture, ai, geospatial, geospatial-analytics, stac, sustainability, multi-modal, remote-sensing, weather

farmvibes-ai's Introduction

FarmVibes.AI: Multi-Modal GeoSpatial ML Models for Agriculture and Sustainability

With FarmVibes.AI, you can develop rich geospatial insights for agriculture and sustainability.

Build models that fuse multiple geospatial and spatiotemporal datasets to obtain insights (e.g. estimate carbon footprint, understand growth rate, detect practices followed) that would be hard to obtain when these datasets are used in isolation. You can fuse together satellite imagery (RGB, SAR, multispectral), drone imagery, weather data, and more.

Fusing datasets this way helps generate more robust insights and unlocks new insights that are otherwise not possible without fusion. This repo contains several fusion workflows (published and shown to be key for agriculture-related problems) that help you easily build robust remote sensing, earth observation, and geospatial models, with a focus on agriculture and farming. Our main focus right now is agriculture and sustainability, which the models are optimized for. However, the framework itself is generic enough to help you build models for other domains.

FarmVibes.AI Primer

There are three main pieces to FarmVibes.AI. The first consists of data ingestion and pre-processing workflows that help prepare data for fusion models tailored towards agriculture. The second is a library of model training notebook examples that let you configure the pre-processing of data and tune existing models with ease. The third is a compute engine that supports data ingestion as well as adjusting existing workflows and creating novel workflows around the tuned model.

FarmVibes.AI Fusion-Ready Dataset Preparation

In this step, you select the datasets that you would like to fuse for building the insights. FarmVibes.AI comes with many dataset downloaders. These include satellite imagery from Sentinel-1 and Sentinel-2, US Cropland Data, USGS elevation maps, NAIP imagery, NOAA weather data, and private weather data from Ambient Weather. Additionally, you can bring in any rasterized datasets that you want to make fusion-ready for FarmVibes.AI (e.g., drone imagery or other satellite imagery) and, in the future, custom sensor data (such as weather sensors).

The key technique in FarmVibes.AI is to feed ML models input data that goes well beyond the modality, location, and time from which the labels were collected. For example, when detecting grain silos from satellite imagery (labeled only in optical imagery), it is better to rely on optical as well as elevation and radar bands. In this scenario, it is also important to combine multiple data modalities with other known agriculture infrastructure entities. Likewise, it is also important to use as input the images of a given silo across various times of the year to help generate a more robust model. Including information from many data streams, while also incorporating historical data from nearby or similar locations, has been shown to improve the robustness of geospatial models (especially for yield, growth, and crop classification problems). FarmVibes.AI generates such input data for models with ease, based on parameters that can be specified.
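To make this concrete, below is a minimal, illustrative sketch (plain NumPy) of assembling a multi-modal, multi-date input tensor of the kind described above. The function and variable names are hypothetical and not part of the FarmVibes.AI API, and the rasters are assumed to be already co-registered on a common grid.

import numpy as np

def stack_modalities(optical, radar, elevation):
    """Stack per-date modality rasters into a (time, channels, height, width) tensor.

    optical:   list of arrays of shape (opt_bands, H, W), one per acquisition date
    radar:     list of arrays of shape (sar_bands, H, W), one per acquisition date
    elevation: array of shape (1, H, W), static over time
    """
    frames = []
    for opt_t, sar_t in zip(optical, radar):
        # Concatenate all channels available at this date into one frame.
        frames.append(np.concatenate([opt_t, sar_t, elevation], axis=0))
    # Stack dates along a leading time axis.
    return np.stack(frames, axis=0)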

FarmVibes.AI enables a data scientist to massage and/or tune the datasets to their preferences. The tuning is enabled via a configurable workflow, which is specified as a directed acyclic graph of data downloading workflows and data preparation workflows. The preparation operators help create the inputs to the training and inference modules (e.g., fused arrays or tensors containing all raw data).
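Purely as an illustration of the DAG idea (the actual workflow definition format is documented in WORKFLOWS.md, and the task names below are hypothetical), a workflow can be pictured as a set of tasks plus dependency edges:

from graphlib import TopologicalSorter  # Python 3.9+

# Each task maps to the tasks it depends on (its predecessors in the DAG).
workflow_dag = {
    "download_s2": [],                    # data downloading workflow
    "cloud_mask": ["download_s2"],        # data preparation workflow
    "compute_ndvi": ["cloud_mask"],       # fused input generation
    "train_or_infer": ["compute_ndvi"],   # model training/inference step
}

# A valid execution order respects all dependencies.
print(list(TopologicalSorter(workflow_dag).static_order()))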

FarmVibes.AI Model Sample Notebook Library

The next step in FarmVibes.AI involves using the built-in notebooks to tune the models to achieve the desired level of accuracy for the parts of the world or seasons that you are focusing on. The library includes notebooks for detecting practices (e.g., harvest date detection), estimating climate impact (both seasonal carbon footprint and long-term sustainability), micro-climate prediction, and crop identification.

FarmVibes.AI comes with these notebooks to help you get started training fusion models that combine the geospatial datasets into robust insights tailored for your needs. Users can tune a model to the desired performance and publish it to FarmVibes.AI. The model then becomes available for later use in the inference engine, which can be employed for other parts of the world, other dates, and more.

FarmVibes.AI Inference Engine

The final stage in FarmVibes.AI is to combine the data connectors, pre-processing, and model pieces together into a robust inference workflow. The generated workflow can then be used to perform inference for an area of interest and time range that are passed as inputs to the workflow. FarmVibes.AI can be configured so that it runs the inference for the time range and updates the results whenever upstream data is updated (e.g., new satellite imagery or sensor data is added). You do this by creating a workflow that composes the data preparation and fusion model workflows.
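A minimal sketch of what submitting such an inference workflow might look like from the Python client follows. The workflow name, client helpers, and method signatures below are assumptions for illustration; see CLIENT.md and the workflow list documentation for the authoritative names.

from datetime import datetime
from shapely.geometry import Polygon
from vibe_core.client import get_default_vibe_client

client = get_default_vibe_client()

# Area of interest and time range are the workflow inputs.
aoi = Polygon([(-88.1, 41.6), (-88.0, 41.6), (-88.0, 41.7), (-88.1, 41.7)])
time_range = (datetime(2021, 1, 1), datetime(2021, 12, 31))

run = client.run(
    "farm_ai/agriculture/ndvi_summary",  # hypothetical workflow name
    "ndvi summary example",
    geometry=aoi,
    time_range=time_range,
)
run.monitor()  # blocks and displays task progress until the run finishes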

Operation Mode

Currently, we are open-sourcing the local FarmVibes.AI cluster, which uses pre-built operators and workflows and runs them locally on your data science machine. This means that any generated data is persisted locally on your machine. The actual workflows and their implementations are provided via Docker images, with their description available in the workflow list documentation.

The user can interact with the local FarmVibes.AI cluster via a REST API (on localhost) or a local Python client (inside a Jupyter Notebook, for example).
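A short sketch of both interaction paths is shown below. The client helper names and the REST address are assumptions (the address and port are the ones printed when the local cluster is set up); CLIENT.md documents the actual API.

import requests
from vibe_core.client import get_default_vibe_client

# Python client: resolves the local cluster address automatically.
client = get_default_vibe_client()
print(client.list_workflows())                 # names of all available workflows
print(client.describe_workflow("helloworld"))  # inputs, outputs, and parameters

# REST API: the same information over plain HTTP on the local cluster address.
print(requests.get("http://192.168.49.2:30000/v0/workflows").json())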

Installation

Please refer to the Quickstart guide for information on where to get started. If you prefer to set up a dedicated Azure Virtual Machine to run FarmVibes.AI, you can find detailed instructions in the VM setup documentation.

Notebook Examples

The notebooks folder contains several examples that serve as starting points and demonstrate how FarmVibes.AI can be used to create agriculture insights. Some of the available notebooks are:

  • helloworld: a simple example of how to use the client to run a workflow and visualize the response.
  • harvest_period: shows how an NDVI time series computed on top of Sentinel-2 data can be obtained for a single field and planting season and used to estimate emergence and harvest dates (see the NDVI sketch after this list).
  • carbon: illustrates how to simulate different soil carbon estimates based on different agriculture practices, leveraging the COMET-Farm API.
  • deepmc: shows how one can build micro-climate forecasts from weather station data using the DeepMC model.
  • crop_segmentation: shows how to train a crop identification model based on NDVI data computed on top of our SpaceEye cloud-free image generation model. You can then use the trained model in an inference workflow to obtain predictions in any area where we are able to generate SpaceEye imagery.
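For reference, the NDVI used in the harvest_period example above follows the standard formula NDVI = (NIR - Red) / (NIR + Red), computed per pixel (for Sentinel-2, typically bands B08 and B04). A minimal NumPy sketch, not the notebook's exact code, is:

import numpy as np

def ndvi(nir: np.ndarray, red: np.ndarray) -> np.ndarray:
    nir = nir.astype("float32")
    red = red.astype("float32")
    # Clip the denominator to avoid division by zero over no-data pixels.
    return (nir - red) / np.clip(nir + red, 1e-6, None)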

We provide a complete list of the notebooks available and their description in our documentation.

Documentation

More detailed information about the different components can be found in the FarmVibes.AI documentation. In this repository, this information is also accessible in:

  • FARMVIBES_AI.md describing how to set up and manage the local cluster.
  • WORKFLOWS.md describing how workflows can be written and how they function.
  • CLIENT.md documenting the FarmVibes.AI client, which is the preferred way to run workflows and interact with the results.
  • SECRETS.md describing how to manage and pass secrets to the cluster (such as API keys), so that they will be available when running workflows.
  • TROUBLESHOOTING.md in case you run into any issues.

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.

farmvibes-ai's People

Contributors

farmvibes-ai-cd, iamreechi, lonnes, rafaspadilha, renatolfc


farmvibes-ai's Issues

'Temporary failure in name resolution' when attempting to run workflows from an Azure ML compute instance

My organisation has set up a compute instance on Azure ML Studio, and the quickstart guide was followed to set up Farmvibes on it.

When attempting to run any workflows (including the example helloworld workflow), the run fails.
Upon inspecting the failure, the attached error appears.

The error persists after restarting the instance and after restarting the farmvibes cluster.

{'hello': RunDetails(start_time=datetime.datetime(2022, 11, 15, 4, 41, 35, 737941), end_time=datetime.datetime(2022, 11, 15, 4, 42, 35, 895460), reason='RuntimeError: status=<OpStatusType.failed: \'failed\'> ename="<class \'RuntimeError\'>" evalue=\'Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/urllib/request.py", line 1354, in do_open
    h.request(req.get_method(), req.selector, req.data, headers,
  File "/opt/conda/lib/python3.8/http/client.py", line 1256, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/opt/conda/lib/python3.8/http/client.py", line 1302, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/opt/conda/lib/python3.8/http/client.py", line 1251, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/opt/conda/lib/python3.8/http/client.py", line 1011, in _send_output
    self.send(msg)
  File "/opt/conda/lib/python3.8/http/client.py", line 951, in send
    self.connect()
  File "/opt/conda/lib/python3.8/http/client.py", line 1418, in connect
    super().connect()
  File "/opt/conda/lib/python3.8/http/client.py", line 922, in connect
    self.sock = self._create_connection(
  File "/opt/conda/lib/python3.8/socket.py", line 787, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
  File "/opt/conda/lib/python3.8/socket.py", line 918, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -3] Temporary failure in name resolution

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/vibe_agent/worker.py", line 129, in run_op
    return factory.build(spec).run(input)
  File "/opt/conda/lib/python3.8/site-packages/vibe_agent/ops.py", line 221, in run
    items_out = self.storage.store(self.name, run_id, stac_results, cache_info)
  File "/opt/conda/lib/python3.8/site-packages/vibe_agent/storage/local_storage.py", line 137, in store
    self._catalog_cleanup(catalog)
  File "/opt/conda/lib/python3.8/site-packages/vibe_agent/storage/local_storage.py", line 113, in _catalog_cleanup
    catalog.validate_all()
  File "/opt/conda/lib/python3.8/site-packages/pystac/catalog.py", line 874, in validate_all
    self.validate()
  File "/opt/conda/lib/python3.8/site-packages/pystac/stac_object.py", line 54, in validate
    return pystac.validation.validate(self)
  File "/opt/conda/lib/python3.8/site-packages/pystac/validation/__init__.py", line 31, in validate
    return validate_dict(
  File "/opt/conda/lib/python3.8/site-packages/pystac/validation/__init__.py", line 102, in validate_dict
    return RegisteredValidator.get_validator().validate(
  File "/opt/conda/lib/python3.8/site-packages/pystac/validation/stac_validator.py", line 101, in validate
    core_result = self.validate_core(
  File "/opt/conda/lib/python3.8/site-packages/pystac/validation/stac_validator.py", line 217, in validate_core
    self._validate_from_uri(stac_dict, schema_uri)
  File "/opt/conda/lib/python3.8/site-packages/pystac/validation/stac_validator.py", line 162, in _validate_from_uri
    schema, resolver = self.get_schema_from_uri(schema_uri)
  File "/opt/conda/lib/python3.8/site-packages/pystac/validation/stac_validator.py", line 150, in get_schema_from_uri
    s = json.loads(pystac.StacIO.default().read_text(schema_uri))
  File "/opt/conda/lib/python3.8/site-packages/pystac/stac_io.py", line 275, in read_text
    return self.read_text_from_href(href)
  File "/opt/conda/lib/python3.8/site-packages/pystac/stac_io.py", line 292, in read_text_from_href
    with urlopen(href) as f:
  File "/opt/conda/lib/python3.8/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/opt/conda/lib/python3.8/urllib/request.py", line 525, in open
    response = self._open(req, data)
  File "/opt/conda/lib/python3.8/urllib/request.py", line 542, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
  File "/opt/conda/lib/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/opt/conda/lib/python3.8/urllib/request.py", line 1397, in https_open
    return self.do_open(http.client.HTTPSConnection, req,
  File "/opt/conda/lib/python3.8/urllib/request.py", line 1357, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [Errno -3] Temporary failure in name resolution>
\' traceback=[\'  File "/opt/conda/lib/python3.8/site-packages/vibe_agent/worker.py", line 246, in run
    output = self.run_op_output_handler(content, run_id)
\', \'  File "/opt/conda/lib/python3.8/site-packages/vibe_agent/worker.py", line 278, in run_op_output_handler
    return self.run_op(content, run_id)
\', \'  File "/opt/conda/lib/python3.8/site-packages/vibe_agent/worker.py", line 273, in run_op
    raise RuntimeError("".join(ret.format()))
\']', status='failed')}

connection error: connection refused to port 30000

I have completely installed FarmVibes.AI on an Azure server and it was working correctly; I also ran 5 to 7 workflows. But now it gives a connection error. The error is: "HTTPConnectionPool(host='172.18.0.2', port=30000): Max retries exceeded with url: /v0/workflows (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f81f585fb20>: Failed to establish a new connection: [Errno 111] Connection refused'))"
I have also attached the details of the error (Error_details.txt).

It appears that there could be a problem with the method used for calculating the RMSE value in deepMC.

There seems to be an issue with calculating the RMSE value in cell 33 of the deepmc/mc_forecast.ipynb notebook. By setting the squared option of the mse function to False, the RMSE value is returned directly, eliminating the need for a separate math.sqrt operation. Nonetheless, the value currently being displayed appears to be the result of an additional sqrt operation performed on the RMSE value. Would you kindly verify and address this matter?
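A minimal sketch of the discrepancy being described, using scikit-learn's mean_squared_error (the notebook cell itself is not reproduced here):

import math
from sklearn.metrics import mean_squared_error

y_true, y_pred = [3.0, 5.0, 2.5], [2.5, 5.0, 4.0]

rmse = mean_squared_error(y_true, y_pred, squared=False)  # already the RMSE
wrong = math.sqrt(rmse)  # applying sqrt again yields sqrt(RMSE), not RMSE
print(rmse, wrong)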

Failure to generate SpaceEye and NDVI Timelapse of the farm for year 2022.

Dear Farmvibes Team,
While running "timelapse_visualization.ipynb" notebook the task "spaceeye.preprocess.s2.list"
did not succeed and encountered an error "Could not find orbit element when parsing manifest XML for item S2A_MSIL2A_20221210T053221_R105_T43QDA_20221210T222819"

I have attached the screenshot of task which was failed along with its log for better understanding.
I have also attached the text file containing polygonal co-ordinates of farm for which workflow execution fails.


Farm Co-ordinates.txt

Receiving Error: 'spaceeye.split' does not exist in the workflow graph when running visualize dataset notebook of "Crop Segmentation"

Hi,
I am running the visualize dataset notebook of Crop Segmentation on an Azure VM with the recommended specs. I did not make any changes and am running the default code you provided, but I receive an error when I run the workflow.
The error I receive is:
"Unable to run workflow with provided parameters. Tried to connect port rasters from op spaceeye.split to port raster of op ndvi.compute_index, but 'spaceeye.split' does not exist in the workflow graph."

I have attached the code and error for your reference. I am also using a reverse proxy, which is hidden in the screenshots below.

(Attached screenshots: crop segmentation code and error.)

Can't set up the worker after destroy

I have previously installed the farmvibes-ai and run some tasks successfully, but at some point it started to give connection refused or 500 errors:
HTTPConnectionPool(host='172.19.0.2', port=30000): Max retries exceeded with url: /v0/runs (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fb0b6cdada0>: Failed to establish a new connection: [Errno 111] Connection refused'

I have tried to stop/start the cluster, but it didn't help. After that I tried to destroy it and then install again, however I get the following error when trying to setup the cluster:

 Deploying the Dapr control plane to your cluster...
✅  Success! Dapr has been installed to namespace dapr-system. To verify, run `dapr status -k' in your terminal. To get started, go here: https://aka.ms/dapr-getting-started
Installing redis in the cluster...
statefulset.apps/redis-replicas scaled
Installing rabbitmq in the cluster...
Error: INSTALLATION FAILED: timed out waiting for the condition
Failed to install rabbitmq in k8s cluster ❌
Error from server (BadRequest): pod redis-master-0 does not have a host assigned
Error from server (BadRequest): pod redis-master-0 does not have a host assigned

I have also tried to clean up the .config/farmvibes-ai/, .kube and .k3d folders, but it didn't help.

BufferError while running ndvi_summary notebook

Following is the error we got while running the ndvi summary notebook, along with the specifications of our VM:
RAM: 32 GB
Disk size: 2 TB
Free space: 767.8 GB
vCPUs: 8

[I 13:31:13.568 NotebookApp] Saving file at /land_degradation/land_degradation.ipynb
[E 14:17:02.094 NotebookApp] Exception in callback <bound method WebSocketMixin.send_ping of ZMQChannelsHandler(87d1348c-b134-473b-ab6b-570bfb826b7d)>
Traceback (most recent call last):
File "/home/azureuser/.local/lib/python3.8/site-packages/tornado/ioloop.py", line 921, in _run
val = self.callback()
File "/home/azureuser/.local/lib/python3.8/site-packages/notebook/base/zmqhandlers.py", line 188, in send_ping
self.ping(b'')
File "/home/azureuser/.local/lib/python3.8/site-packages/tornado/websocket.py", line 445, in ping
self.ws_connection.write_ping(data)
File "/home/azureuser/.local/lib/python3.8/site-packages/tornado/websocket.py", line 1101, in write_ping
self._write_frame(True, 0x9, data)
File "/home/azureuser/.local/lib/python3.8/site-packages/tornado/websocket.py", line 1061, in _write_frame
return self.stream.write(frame)
File "/home/azureuser/.local/lib/python3.8/site-packages/tornado/iostream.py", line 540, in write
self._write_buffer.append(data)
File "/home/azureuser/.local/lib/python3.8/site-packages/tornado/iostream.py", line 157, in append
b += data # type: ignore
BufferError: Existing exports of data: object cannot be re-sized
(The same BufferError traceback repeats at 15:31:02, 16:12:32, and 17:24:02.)

Attached is a screenshot showing where the execution has been stuck; the values are not updating anymore.

Failure to execute 's2.download' workflow task with time_range in January 2022

The 's2.download' workflow task seems to fail consistently with time_range set to include at least January 2022. It seems to work if January is excluded. I have only confirmed this for a few locations in Western Australia, and can't say whether it happens in any other locations.

Failure details message is as follows:

's2.s2.download': RunDetails(start_time=datetime.datetime(2023, 2, 9, 9, 44, 4, 752908), submission_time=datetime.datetime(2023, 2, 9, 9, 44, 4, 705244), end_time=datetime.datetime(2023, 2, 9, 10, 24, 29, 832081), reason='RuntimeError: Received unsupported message header=MessageHeader(type=<MessageType.error: \'error\'>, run_id=UUID(\'29d110b7-0ee2-46e4-91ef-345753402cb7\'), id=\'00-29d110b70ee246e491ef345753402cb7-f04b837f0791e025-01\', parent_id=\'00-29d110b70ee246e491ef345753402cb7-1ac389aaff11fcb8-01\', version=\'1.0\', created_at=datetime.datetime(2023, 2, 9, 10, 24, 29, 824878)) content=ErrorContent(status=<OpStatusType.failed: \'failed\'>, ename="<class \'RuntimeError\'>", evalue=\'Traceback (most recent call last):\
  File "/opt/conda/lib/python3.8/site-packages/vibe_agent/worker.py", line 121, in run_op\
    return factory.build(spec).run(input)\
  File "/opt/conda/lib/python3.8/site-packages/vibe_agent/ops.py", line 111, in run\
    stac_results = self._call_validate_op(**{**items, **raw_items})\
  File "/opt/conda/lib/python3.8/site-packages/vibe_agent/ops.py", line 75, in _call_validate_op\
    results = self.callback(**kwargs)\
  File "/app/ops/download_sentinel2_from_pc/download_s2_pc.py", line 56, in download_product\
    asset_path = collection.download_asset(item.assets[k], self.tmp_dir.name)\
  File "/opt/conda/lib/python3.8/site-packages/vibe_lib/planetary_computer.py", line 109, in download_asset\
    raise RuntimeError(f"Failed asset {asset.href} after {MAX_RETRIES} retries.")\
RuntimeError: Failed asset https://sentinel2l2a01.blob.core.windows.net/sentinel2-l2/50/J/LM/2022/01/26/S2B_MSIL2A_20220126T021339_N0400_R060_T50JLM_20220211T070542.SAFE/GRANULE/L2A_T50JLM_A025540_20220126T021913/IMG_DATA/R10m/T50JLM_20220126T021339_B02_10m.tif after 5 retries.\
\', traceback=[\'  File "/opt/conda/lib/python3.8/site-packages/vibe_agent/worker.py", line 306, in run_op_from_message\
    out = self.run_op_with_retry(content, message.run_id)\
\', \'  File "/opt/conda/lib/python3.8/site-packages/vibe_agent/worker.py", line 404, in run_op_with_retry\
    raise RuntimeError("".join(ret.format()))\
\']). Aborting execution.', status='failed')

When attempting to visit the asset url in a browser, I get a 'ResourceNotFound' error.

Thanks in advance for the help!

Question: why is the Daubechies wavelet not used in DeepMC's WPD?

Thank you for your research and code.

The paper "Micro-climate Prediction - Multi Scale Encoder-decoder based Deep Learning Framework" uses the Daubechies wavelet function, but the code here uses bior3.5.
Please tell me why.

Thank you

VM create fails to complete using bicep - setup script not executed

This is the error I get when following the instructions on VM-SETUP.md

{"status":"Failed","error":{"code":"DeploymentFailed","message":"At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/arm-deployment-operations for usage details.","details":[{"code":"Conflict","message":"{\r\n \"status\": \"Failed\",\r\n \"error\": {\r\n \"code\": \"ResourceDeploymentFailure\",\r\n \"message\": \"The 'AzureAsyncOperationWaiting' resource operation completed with terminal provisioning state 'Failed'.\",\r\n \"details\": [\r\n {\r\n \"code\": \"VMExtensionProvisioningError\",\r\n \"message\": \"VM has reported a failure when processing extension 'farmvibes-ai_setup_script'. Error message: \\\"Enable failed: failed to get configuration: invalid configuration: 'commandToExecute' is not specified\\\"\\r\\n\\r\\nMore information on troubleshooting is available at https://aka.ms/VMExtensionCSELinuxTroubleshoot \"\r\n }\r\n ]\r\n }\r\n}"}]}}

Space requirements for Spaceeye Cloud Removal?

After setting up the sentinel-spaceye notebook environment & running the code, the process fails. Currently, I am using the recommended space of ~500GB. But this is clearly not enough. The cluster crashes at the preprocess_s2_improved_masks step. Current RAM: 16GB, Disk Space: ~500GB (100% consumed).

I need the SpaceEye pipeline for cloud removal on a large number of data points (500), regions (roughly 20 km² in area), and times (average period = 20 days). What are the suggested system requirements for this task?

Workflows not terminating in AML compute

This may be related to my other issue. If so, feel free to close this.

After following the quickstart guide to install the newest farmvibes version in an Azure ML compute, any workflow I try to run hangs indefinitely and with no error message. A screenshot is attached.


Received Input Validation Errors while running whatif.ipynb notebook for carbon sequestration.

Dear Farmvibes Team,
While running whatif.ipynb notebook I received input validation errors:
1) The scenario must contain at least 2 crop year
2) Input requires at least one simulation scenario (not named 'Current' or 'Baseline').The input parser found 0.

Because of this, the "farm_ai/carbon_local/carbon_whatif" workflow execution failed.

Alternative method I tried:
When I add one more year (2021) to scenario.json, the workflow executes successfully and I get the carbon sequestration value as an output.
I have attached the modified scenario.json that makes the workflow execute successfully.
I would appreciate your feedback on whether my approach is correct.

{
    "id": "9a69ef8b-2823-4e5a-b9a9-d3a1eea8dc32",
    "farmId": "d9ceb3b1-1461-4957-b3d1-b5b3af0fd5da",
    "name": "Grape_Table_2022",
    "history": {
        "2022": {
            "crops": [
                {
                    "name": "Alfalfa",
                    "plantedDate": "2022/03/12",
                    "type": "annual crop",
                    "fertilizer": [
                        {
                            "id": "8424a859-8326-4e6b-b0f1-6c482c6df6d8",
                            "date": "2022/03/03",
                            "eep": {
                                "id": 1902,
                                "name": "Slow Release"
                            },
                            "fertilizerType": {
                                "id": 1557,
                                "name": "Calcium Ammonium Nitrate"
                            },
                            "totalFertilizerApplied": "0",
                            "totalNitrogenApplied": "1.0"
                        }
                    ],
                    "harvest": [
                        {
                            "id": "4a40d027-717f-436b-9316-46a32516c798",
                            "gfsrt": "True",
                            "harvestDate": "2022/02/15",
                            "sshrr": "10",
                            "yield": "39"
                        }
                    ],
                    "tillage": [
                        {
                            "id": "63c9e567-2d4a-4d42-8b87-f9c30c40dff4",
                            "date": "2022/04/05",
                            "implement": {
                                "id": 1951,
                                "name": "Intensive Tillage"
                            }
                        }
                    ],
                    "omad": [
                        {
                            "id": "b53c3ba3-c409-4505-b226-2a62547e17c1",
                            "date": "2022/03/03",
                            "type": "Soybean Meal",
                            "amount": "100",
                            "percentN": "0",
                            "CNratio": "1.3"
                        }
                    ]
                }
            ],
            "selected": true
        },
        "2021": {
            "crops": [
                {
                    "name": "Alfalfa",
                    "plantedDate": "2021/06/30",
                    "type": "annual crop",
                    "fertilizer": [
                        {
                            "id": "8424a859-8326-4e6b-b0f1-6c482c6df6d8",
                            "date": "2021/06/30",
                            "eep": {
                                "id": 1902,
                                "name": "Slow Release"
                            },
                            "fertilizerType": {
                                "id": 1557,
                                "name": "Calcium Ammonium Nitrate"
                            },
                            "totalFertilizerApplied": "0",
                            "totalNitrogenApplied": "1.0"
                        }
                    ],
                    "harvest": [
                        {
                            "id": "4a40d027-717f-436b-9316-46a32516c798",
                            "gfsrt": "True",
                            "harvestDate": "2021/09/20",
                            "sshrr": "10",
                            "yield": "39"
                        }
                    ],
                    "tillage": [
                        {
                            "id": "63c9e567-2d4a-4d42-8b87-f9c30c40dff4",
                            "date": "2021/06/15",
                            "implement": {
                                "id": 1951,
                                "name": "Intensive Tillage"
                            }
                        }
                    ],
                    "omad": []
                }
            ],
            "selected": true
        }

    },
    "container": "farm"
}

Error setting up cluster - 'Could not download chart: no cached repo found'

When running 'farmvibes-ai.sh setup', I consistently receive the following error. This started happening within the last week.
I've checked the '/home/azureuser/.cache/helm/repository/' directory, and it does exist, but is empty.
I've also rerun bash ./resources/vm/setup_farmvibes_ai_vm.sh, but the error still hasn't resolved.

module.kubernetes.kubernetes_namespace.kubernetesdaprnamespace: Creating...
module.kubernetes.kubernetes_persistent_volume.user_storage_pv: Creating...
module.kubernetes.kubernetes_namespace.kubernetesdaprnamespace: Creation complete after 0s [id=dapr-system]
module.kubernetes.kubernetes_persistent_volume.user_storage_pv: Creation complete after 1s [id=user-storage-pv]
module.kubernetes.kubernetes_persistent_volume_claim.user_storage_pvc: Creating...
module.kubernetes.kubernetes_persistent_volume_claim.user_storage_pvc: Creation complete after 0s [id=default/user-storage-pvc]
module.kubernetes.helm_release.redis: Creating...
module.kubernetes.helm_release.dapr: Creating...
module.kubernetes.helm_release.rabbitmq: Creating...
╷
│ Error: could not download chart: no cached repo found. (try 'helm repo update'): open /home/azureuser/.cache/helm/repository/bitnami-index.yaml: no such file or directory
│ 
│   with module.kubernetes.helm_release.dapr,
│   on modules/kubernetes/dapr.tf line 7, in resource "helm_release" "dapr":
│    7: resource "helm_release" "dapr" {
│ 
╵
╷
│ Error: could not download chart: no cached repo found. (try 'helm repo update'): open /home/azureuser/.cache/helm/repository/bitnami-index.yaml: no such file or directory
│ 
│   with module.kubernetes.helm_release.rabbitmq,
│   on modules/kubernetes/rabbitmq.tf line 1, in resource "helm_release" "rabbitmq":
│    1: resource "helm_release" "rabbitmq" {
│ 
╵
╷
│ Error: could not download chart: no cached repo found. (try 'helm repo update'): open /home/azureuser/.cache/helm/repository/bitnami-index.yaml: no such file or directory
│ 
│   with module.kubernetes.helm_release.redis,
│   on modules/kubernetes/redis.tf line 1, in resource "helm_release" "redis":
│    1: resource "helm_release" "redis" {
│ 
╵
Error from server (NotFound): deployments.apps "terravibes-rest-api" not found
Error from server (NotFound): deployments.apps "terravibes-rest-api" not found
Error from server (NotFound): deployments.apps "terravibes-orchestrator" not found
Error from server (NotFound): deployments.apps "terravibes-orchestrator" not found
Error from server (NotFound): deployments.apps "terravibes-cache" not found
Error from server (NotFound): deployments.apps "terravibes-cache" not found
Error from server (NotFound): deployments.apps "terravibes-worker" not found
Error from server (NotFound): deployments.apps "terravibes-worker" not found

running `bash farmvibes-ai.sh update` gives me this error


Plan: 16 to add, 0 to change, 0 to destroy.
module.kubernetes.kubernetes_namespace.kubernetesdaprnamespace: Creating...
module.kubernetes.kubernetes_persistent_volume.user_storage_pv: Creating...
module.kubernetes.helm_release.redis: Creating...
module.kubernetes.helm_release.rabbitmq: Creating...

│ Error: namespaces "dapr-system" already exists

│ with module.kubernetes.kubernetes_namespace.kubernetesdaprnamespace,
│ on modules/kubernetes/dapr.tf line 1, in resource "kubernetes_namespace" "kubernetesdaprnamespace":
│ 1: resource "kubernetes_namespace" "kubernetesdaprnamespace" {



│ Error: persistentvolumes "user-storage-pv" already exists

│ with module.kubernetes.kubernetes_persistent_volume.user_storage_pv,
│ on modules/kubernetes/persistentvolume.tf line 1, in resource "kubernetes_persistent_volume" "user_storage_pv":
│ 1: resource "kubernetes_persistent_volume" "user_storage_pv" {



│ Error: cannot re-use a name that is still in use

│ with module.kubernetes.helm_release.rabbitmq,
│ on modules/kubernetes/rabbitmq.tf line 1, in resource "helm_release" "rabbitmq":
│ 1: resource "helm_release" "rabbitmq" {



│ Error: cannot re-use a name that is still in use

│ with module.kubernetes.helm_release.redis,
│ on modules/kubernetes/redis.tf line 1, in resource "helm_release" "redis":
│ 1: resource "helm_release" "redis" {

Originally posted by @richstep in #83 (comment)

mc_forecast doesn't work because historical observations depend on AGWeatherNet, which doesn't work

After trying a lot with the deep_mc notebook: it depends on a backend call to retrieve AgWeatherNet data, which doesn't work and limits the workflow's scope. It would be great to show an example file of how the data is downloaded and pre-processed, so it can be replaced with a more generalized historical weather API (e.g., the Microsoft Azure Weather REST API or OpenWeather historical weather).
The error appears after running historical_dataset = utils.get_csv_data(path=file_path).
Result:

Cell In[11], line 1
----> 1 historical_dataset = utils.get_csv_data(path=file_path)

File ~/farmvibes-ai/notebooks/deepmc/notebook_lib/utils.py:20, in get_csv_data(path, date_attribute, columns_rename, frequency)
     10 def get_csv_data(
     11     path: str,
     12     date_attribute: str = "date",
     13     columns_rename: Dict[str, str] = {},
     14     frequency: str = "60min",
     15 ):
     16     """
     17     Read data from CSV file using Pandas python package.
     18     """
---> 20     data_df = pd.read_csv(path)
     21     data_df[date_attribute] = pd.to_datetime(data_df[date_attribute])
     23     if columns_rename:

File ~/anaconda3/envs/deepmc-pytorch/lib/python3.8/site-packages/pandas/util/_decorators.py:211, in deprecate_kwarg.<locals>._deprecate_kwarg.<locals>.wrapper(*args, **kwargs)
    209     else:
    210         kwargs[new_arg_name] = new_arg_value
--> 211 return func(*args, **kwargs)

File ~/anaconda3/envs/deepmc-pytorch/lib/python3.8/site-packages/pandas/util/_decorators.py:331, in deprecate_nonkeyword_arguments.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
    325 if len(args) > num_allow_args:
    326     warnings.warn(
    327         msg.format(arguments=_format_argument_list(allow_args)),
    328         FutureWarning,
    329         stacklevel=find_stack_level(),
    330     )
--> 331 return func(*args, **kwargs)

File ~/anaconda3/envs/deepmc-pytorch/lib/python3.8/site-packages/pandas/io/parsers/readers.py:950, in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, encoding_errors, dialect, error_bad_lines, warn_bad_lines, on_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options)
    935 kwds_defaults = _refine_defaults_read(
    936     dialect,
    937     delimiter,
   (...)
    946     defaults={"delimiter": ","},
    947 )
    948 kwds.update(kwds_defaults)
--> 950 return _read(filepath_or_buffer, kwds)

File ~/anaconda3/envs/deepmc-pytorch/lib/python3.8/site-packages/pandas/io/parsers/readers.py:605, in _read(filepath_or_buffer, kwds)
    602 _validate_names(kwds.get("names", None))
    604 # Create the parser.
--> 605 parser = TextFileReader(filepath_or_buffer, **kwds)
    607 if chunksize or iterator:
    608     return parser

File ~/anaconda3/envs/deepmc-pytorch/lib/python3.8/site-packages/pandas/io/parsers/readers.py:1442, in TextFileReader.__init__(self, f, engine, **kwds)
   1439     self.options["has_index_names"] = kwds["has_index_names"]
   1441 self.handles: IOHandles | None = None
-> 1442 self._engine = self._make_engine(f, self.engine)

File ~/anaconda3/envs/deepmc-pytorch/lib/python3.8/site-packages/pandas/io/parsers/readers.py:1735, in TextFileReader._make_engine(self, f, engine)
   1733     if "b" not in mode:
   1734         mode += "b"
-> 1735 self.handles = get_handle(
   1736     f,
   1737     mode,
   1738     encoding=self.options.get("encoding", None),
   1739     compression=self.options.get("compression", None),
   1740     memory_map=self.options.get("memory_map", False),
   1741     is_text=is_text,
   1742     errors=self.options.get("encoding_errors", "strict"),
   1743     storage_options=self.options.get("storage_options", None),
   1744 )
   1745 assert self.handles is not None
   1746 f = self.handles.handle

File ~/anaconda3/envs/deepmc-pytorch/lib/python3.8/site-packages/pandas/io/common.py:856, in get_handle(path_or_buf, mode, encoding, compression, memory_map, is_text, errors, storage_options)
    851 elif isinstance(handle, str):
    852     # Check whether the filename is to be opened in binary mode.
    853     # Binary mode does not support 'encoding' and 'newline'.
    854     if ioargs.encoding and "b" not in ioargs.mode:
    855         # Encoding
--> 856         handle = open(
    857             handle,
    858             ioargs.mode,
    859             encoding=ioargs.encoding,
    860             errors=errors,
    861             newline="",
    862         )
    863     else:
    864         # Binary mode
    865         handle = open(handle, ioargs.mode)

FileNotFoundError: [Errno 2] No such file or directory: './data/Palouse/training.csv'

Also, the default environment seems to be missing the unfoldN package.

Please help with DeepMC settings

Could you please confirm if setting chunk_size = 2620, ts_lookback = 240, and total_models = 130 would be appropriate for predicting 130 hours using DeepMC, with 240 data points in the lookback window and a 60-minute frequency?
Additionally, I would like to know how the value of 524 is calculated for predicting 24 hours using a frequency of 60 minutes.

thank you!

Connection issue while trying to run hello world workflow

Hi,

I have set up a cluster on WSL on my Windows machine (Ubuntu 20.04 WSL, 16 cores, 32 GB RAM).

When I try to run the hello world workflow, I get the following error:

yaswanth@GAVLMUMLT-071:~/Devprojects/farmvibes-ai$ python -m vibe_core.farmvibes_ai_hello_world
INFO:__main__:Successfully obtained a FarmVibes.AI client (addr=http://192.168.49.2:30000)
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 159, in _new_conn
    conn = connection.create_connection(
  File "/usr/lib/python3/dist-packages/urllib3/util/connection.py", line 84, in create_connection
    raise err
  File "/usr/lib/python3/dist-packages/urllib3/util/connection.py", line 74, in create_connection
    sock.connect(sa)
TimeoutError: [Errno 110] Connection timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 665, in urlopen
    httplib_response = self._make_request(
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 387, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/lib/python3.8/http/client.py", line 1256, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.8/http/client.py", line 1302, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.8/http/client.py", line 1251, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.8/http/client.py", line 1011, in _send_output
    self.send(msg)
  File "/usr/lib/python3.8/http/client.py", line 951, in send
    self.connect()
  File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 187, in connect
    conn = self._new_conn()
  File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 171, in _new_conn
    raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7ff1ffa2c550>: Failed to establish a new connection: [Errno 110] Connection timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/yaswanth/.local/lib/python3.8/site-packages/requests/adapters.py", line 489, in send
    resp = conn.urlopen(
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 719, in urlopen
    retries = retries.increment(
  File "/usr/lib/python3/dist-packages/urllib3/util/retry.py", line 436, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='192.168.49.2', port=30000): Max retries exceeded with url: /v0/workflows (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ff1ffa2c550>: Failed to establish a new connection: [Errno 110] Connection timed out'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/yaswanth/.local/lib/python3.8/site-packages/vibe_core/farmvibes_ai_hello_world.py", line 45, in <module>
    main()
  File "/home/yaswanth/.local/lib/python3.8/site-packages/vibe_core/farmvibes_ai_hello_world.py", line 29, in main
    LOGGER.info(f"available workflows: {client.list_workflows()}")
  File "/home/yaswanth/.local/lib/python3.8/site-packages/vibe_core/client.py", line 115, in list_workflows
    return self._request("GET", "v0/workflows")
  File "/home/yaswanth/.local/lib/python3.8/site-packages/vibe_core/client.py", line 79, in _request
    response = self.session.request(method, urljoin(self.baseurl, endpoint), *args, **kwargs)
  File "/home/yaswanth/.local/lib/python3.8/site-packages/requests/sessions.py", line 587, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/yaswanth/.local/lib/python3.8/site-packages/requests/sessions.py", line 701, in send
    r = adapter.send(request, **kwargs)
  File "/home/yaswanth/.local/lib/python3.8/site-packages/requests/adapters.py", line 565, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='192.168.49.2', port=30000): Max retries exceeded with url: /v0/workflows (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ff1ffa2c550>: Failed to establish a new connection: [Errno 110] Connection timed out'))

I understand it is a networking issue. I tried to ping/check the IP address 192.168.49.2 manually, but I cannot ping it. I have also checked ipconfig and cannot find that range in my list of virtual adapters.

I have tried restarting the cluster, deleting the cluster, a fresh installation, etc., but the issue remains the same.

Please point me in the right direction to resolve this.

Question: Interpretation of outputs in crop_segmentation.ipynb notebook

I cannot understand the purpose of the crop_segmentation.ipynb notebook. I can understand it shows some differences of the lands, as is visible in the last cell. However, I do not understand how the heat map in the last cell output can be interpreted... 🤔 What does the yellow colour mean? And what does the blue colour mean? Can you help me with that, please?

Thank you!

EDIT

I am sorry, I am not able to add any further info to the issue (assignees, label, project, ...).

Error while setting up farmvibes on Azure VM

I am getting the following error while setting up FarmVibes.AI on an Azure VM. When I run the command "bash farmvibes-ai.sh setup", I get:
"error: deployment "terravibes-worker" exceeded its progress deadline"

Here is the log:
Status: Downloaded newer image for mcr.microsoft.com/farmai/terravibes/cache:prod
mcr.microsoft.com/farmai/terravibes/cache:prod
Installing redis in the cluster...
statefulset.apps/redis-replicas scaled
Installing rabbitmq in the cluster...
secret/rabbitmq-connection-string created
component.dapr.io/control-pubsub created
component.dapr.io/statestore created
service/terravibes-rest-api created
deployment.apps/terravibes-rest-api created
deployment.apps/terravibes-orchestrator created
deployment.apps/terravibes-worker created
deployment.apps/terravibes-cache created
component.dapr.io/control-pubsub unchanged
component.dapr.io/statestore unchanged
service/terravibes-rest-api unchanged
deployment.apps/terravibes-rest-api unchanged
deployment.apps/terravibes-orchestrator unchanged
deployment.apps/terravibes-worker unchanged
deployment.apps/terravibes-cache unchanged
deployment.apps/terravibes-rest-api condition met
deployment "terravibes-rest-api" successfully rolled out
deployment.apps/terravibes-orchestrator condition met
deployment "terravibes-orchestrator" successfully rolled out
error: timed out waiting for the condition on deployments/terravibes-worker
Waiting for deployment "terravibes-worker" rollout to finish: 0 of 3 updated replicas are available...
error: deployment "terravibes-worker" exceeded its progress deadline
deployment.apps/terravibes-cache condition met
deployment "terravibes-cache" successfully rolled out

Success!

FarmVibes.AI REST API is running at http://192.168.49.2:30000

Continuously running Herbie data ingestion workflow

Hello,
I have been attempting to run the Deep Micro Climate notebook in FarmVibes (https://github.com/microsoft/farmvibes-ai/tree/main/notebooks/deepmc). We have set up a VM instance in Azure by following the VM setup instructions in the documentation. When we attempted to run the notebook, it created 4 VibeWorkflowRuns for data ingestion from the Herbie forecast (temperature, humidity, u, v). I let this run for a long time (almost 24 hours) and it keeps saying the workflow is running. I have also tried restarting the REST API with the farmvibes-ai.sh script and also reinstalled everything with the farmvibes-ai.sh script. However, despite this, it still is not finishing the run. I suspect it might be a VM configuration issue with its networking, or it might be an issue with the REST API setup on this VM. Any help is appreciated. Thanks!

spaceeye.spaceeye.spaceeye in crop segmentation notebook crashes the cluster

After setting up the Crop Segmentation environment and running the Crop Segmentation notebook multiple times, it now causes the farmvibes-ai (kube) cluster to fail: it becomes unresponsive on port 30000, yet farmvibes-ai.sh status shows it running, and restart doesn't work (the terminal hangs unresponsively). Using farmvibes-ai.sh stop and then start solves the issue, yet after a while the same issue reproduces itself.
This is being tried on an Azure VM (Standard D16s v3: 16 cores, 64 GB RAM, 2 TB SSD).
Steps to reproduce:
1. $ conda env create -f ./crop_env.yaml
2. $ conda activate crop-seg
3. $ jupyter notebook --allow-root -ip 0.0.0.0
4. Then, from the Jupyter notebook, run 01_dataset_generation.
This has been repeated multiple times using $ farmvibes-ai.sh stop and then $ farmvibes-ai.sh start; the current free space at this point is 133 GB.
(Screenshot attached: cropsegerror.)
It would be a great help to have an idea of how long this task (and similar data generation tasks) takes to complete on average on a Standard D16s v3.

Error during cluster setup

Discussed in #33

Originally posted by cedrichelewaut January 28, 2023
Hi,

I am trying to set up my cluster on Ubuntu and I get a message telling me it is created successfully.
However, I am unable to get the cluster info and everything that follows also fails. I installed all dependencies using the provided file.

My output is attached as a screenshot.

Error in running notebook land_degradation.ipynb

Dear Farmvibes Team,
While running the "farm_ai/land_degradation/landsat_ndvi_trend" workflow in the land_degradation.ipynb notebook, the task "trend.chunked_linear_trend.linear_trend" did not succeed and encountered the error "missing 1 required positional argument".
I have attached a screenshot of the failed task along with its log for better understanding.


Workflow for Training Models

It seems like the default VM created in the VM setup does not have the ability to train models (i.e., it has no GPUs). Any recommendations on how to set up a VM to train models on Azure and how to have it interact with the FarmVibes cluster created with the default VM?

Possibility of Utilizing Empirical Mode Decomposition for pre-processing the data

Hi,
I have been going through the DeepMC part of the farmvibes.ai project and found that Wavelet Packet Decomposition is used as a pre-processor to allow models to understand and fit the underlying patterns in the time-series data.

Similar to the Wavelet Transform, Empirical Mode Decomposition (EMD) is an empirical, iterative, and adaptive algorithm which decomposes a signal into components called Intrinsic Mode Functions (IMFs). EMD is also better suited for climate predictions on variables like temperature, rainfall, etc., as it doesn't assume linearity of the signal and has no prior assumptions of a basis function.

I was wondering if EMD would be a good additional option as a pre-processor along with the Wavelet Decomposition (a minimal sketch follows below). Do let me know your thoughts. If you feel it would be useful for the project, I will be happy to contribute the code to add that functionality.
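A minimal sketch of the suggested pre-processing step, assuming the PyEMD package (installed via pip install EMD-signal); this is only an illustration and not part of FarmVibes.AI:

import numpy as np
from PyEMD import EMD

t = np.linspace(0, 1, 500)
signal = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 40 * t)

imfs = EMD()(signal)  # decompose the signal into Intrinsic Mode Functions
print(imfs.shape)     # (number_of_imfs, len(signal))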
References:

  • Huang, N. E., Shen, Z., Long, S. R., Wu, M. C., Shih, H. H., Zheng, Q., Yen, N. C., Tung, C. C., & Liu, H. H. (1998). The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proceedings of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences, 454(1971), 903–995. https://doi.org/10.1098/rspa.1998.0193

Thank you.

Farmvibes Deployment to external service

Hi FarmVibes Team! Thanks again for your work.

I was wondering if there is an option to deploy the server on one machine and connect to it with a client from another?

Based on my understanding, it should be straightforward since the FarmVibes client already uses the API to communicate with the server. However, at the moment the local deployment uses a Docker bridge network and is only accessible from the localhost environment. Is it possible to create a port mapping for the REST API?
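
To make the question concrete, here is the kind of check I have in mind, assuming the REST API port (30000 in the local deployment) were exposed or tunnelled to the client machine; the address below is a placeholder:

import requests

# Placeholder: replace with the server VM's address once port 30000 is mapped or tunnelled.
REMOTE_BASE = "http://127.0.0.1:30000"

# The same endpoint the local client uses; a 200 here would mean a remote client
# could be pointed at this base URL instead of the local bridge-network address.
resp = requests.get(f"{REMOTE_BASE}/v0/docs", timeout=10)
print(resp.status_code)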

Another question is regarding .raster_asset.path_or_url function in the results of the run. Will it automatically return the url if remote server is detected?

Connection error

After updating to the latest code from the FarmVibes.AI GitHub repository and recreating the complete environment, we have been facing connection errors while running the notebooks.

E.g., below is the error for a Hello World workflow execution, while the TerraVibes REST APIs are working fine at http://172.18.0.3:30000/v0/docs.

HTTPError: 400 Client Error: Bad Request for url: http://172.18.0.3:30000/v0/runs. Unable to run workflow with provided parameters. HTTPConnectionPool(host='127.0.0.1', port=3500): Max retries exceeded with url: /v1.0/state/statestore/runs?metadata.partitionKey=eywa (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fa5188af250>: Failed to establish a new connection: [Errno 111] Connection refused'))

A few other observations:

  1. bash farmvibes-ai.sh setup reports success after running the script, but it consistently prints a few errors towards the end of the script execution, and we get a connection error while running any notebook, including Hello World.
  2. bash farmvibes-ai.sh start takes a long time to complete, and we usually need to stop it (please refer to the attached screenshot).
  3. Even earlier (with the old code) we were able to run the other notebooks, but it used to take a long time to download the data (e.g., harvest prediction, weed management, land degradation, etc.).

System specs:
Memory : 16GB
Processors : 4
Hard disk (SCSI) : 512GB
Free space : 261 GB

Ability to view and edit workflows

Is there an option to view the actual source code of the workflows? And is there a document describing the available parameters and the output types?

It would be very beneficial to understand the inner workings of the workflows and to be able to modify them for a particular use case. As an example, at the moment the SpaceEye pipeline takes up a huge amount of space, and optimising asset handling (deleting processed tiles, saving only the ROI instead of the complete assets, etc.) should help to reduce that.
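
For what it is worth, here is a sketch of the kind of introspection I am after; the helper names are guesses at what the Python client might expose, not confirmed API:

from vibe_core.client import get_default_vibe_client

client = get_default_vibe_client()

# Hypothetical helpers (exact method names are assumptions on my part):
# one to enumerate the available workflows, and one to print a workflow's
# documented sources, sinks, and parameters.
print(client.list_workflows())
client.document_workflow("data_ingestion/spaceeye/spaceeye")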

Having Problem in VM Setup.

I am trying to set up a VM for farmvibes.ai with Docker on Microsoft Azure but don't understand the setup guide.
What does this point mean? (see attached Screenshot (173))
My terminal shows this error (see attached Screenshot (174)).

Can you help me find out why my deployment does not work?

***@***.***:~/farmvibes-ai$ az deployment group create \
    --resource-group contoso \
    --name farmvibes-ai-vm-demo \
    --template-file resources/vm/farmvibes_ai_vm.bicep \
    --parameters \
        ssh_public_key="$(cat ~/.ssh/id_rsa.pub)" \
        vm_suffix_name=demo-vm \
        encoded_script="$(cat resources/vm/setup_farmvibes_ai_vm.sh | gzip -9 | base64 -w0)"

cat: ~/.ssh/id_rsa.pub: No such file or directory
{'code': 'InvalidTemplateDeployment', 'message': "The template deployment 'farmvibes-ai-vm-demo' is not valid according to the validation procedure. The tracking id is '81fd1e88-4cc2-4471-ba45-446425c8ce60'. See inner errors for details."}

Inner Errors:
{'code': 'QuotaExceeded', 'message': 'Operation could not be completed as it results in exceeding approved Total Regional Cores quota. Additional details - Deployment Model: Resource Manager, Location: eastus2, Current Limit: 4, Current Usage: 1, Additional Required: 8, (Minimum) New Limit Required: 9. Submit a request for Quota increase at https://aka.ms/ProdportalCRP/#blade/Microsoft_Azure_Capacity/UsageAndQuota.ReactView/Parameters/%7B%22subscriptionId%22:%22d530b09b-5368-4991-945e-bab6ffc61884%22,%22command%22:%22openQuotaApprovalBlade%22,%22quotas%22:[%7B%22location%22:%22eastus2%22,%22providerId%22:%22Microsoft.Compute%22,%22resourceName%22:%22cores%22,%22quotaRequest%22:%7B%22properties%22:%7B%22limit%22:9,%22unit%22:%22Count%22,%22name%22:%7B%22value%22:%22cores%22%7D%7D%7D%7D]%7D by specifying parameters listed in the \'Details\' section for deployment to succeed. Please read more about quota limits at https://docs.microsoft.com/en-us/azure/azure-supportability/regional-quota-requests'}


Originally posted by @arcesoftware in #45 (comment)

preprocess.s1.download fails with "Found multiple prefixes matching '{base_pref}'"

Hi,

For a wide range of dates and areas (around Denmark), the preprocess.s1.download task (e.g., in the spaceeye workflow) fails with a "Found multiple prefixes matching '{base_pref}'" error:

File "/opt/conda/lib/python3.8/site-packages/vibe_lib/planetary_computer.py", line 421, in get_complete_s1_prefix\n raise RuntimeError(f"Found multiple prefixes matching \'{base_pref}\': {prefixes}")\nRuntimeError: Found multiple prefixes matching \'GRD/2021/5/9/IW/DV/S1A_IW_GRDH_1SDV_20210509T170937_20210509T171002_037814_04768F\': {\'GRD/2021/5/9/IW/DV/S1A_IW_GRDH_1SDV_20210509T170937_20210509T171002_037814_04768F_555A\', \'GRD/2021/5/9/IW/DV/S1A_IW_GRDH_1SDV_20210509T170937_20210509T171002_037814_04768F_CF1A\'}\n'

MRE:

from datetime import datetime
from shapely import geometry as shpg
from vibe_core.client import get_default_vibe_client

client = get_default_vibe_client()
geom = shpg.Point(10.3, 56.3).buffer(0.02, cap_style=3)
time_range = (datetime(2021, 5, 1), datetime(2021, 6, 1))

se_run = client.run(
    "data_ingestion/spaceeye/spaceeye",
    "Test",
    geometry=geom,
    time_range=time_range)

Sentinel-2 Preprocess error - LocalResourceExists

When running a Sentinel-2 workflow for:

  • Coordinates: 121.3853 , -33.27323 , 121.5615 , -33.10911
  • Date range: 2022-1-1 to 2022-12-30

I receive the following error:

 's2.s2.filter': RunDetails(start_time=datetime.datetime(2023, 2, 22, 6, 1, 21, 353780), submission_time=datetime.datetime(2023, 2, 22, 6, 1, 21, 185469), end_time=datetime.datetime(2023, 2, 22, 6, 1, 21, 390884), reason=None, status='done'),
 's2.s2.download': RunDetails(start_time=datetime.datetime(2023, 2, 22, 6, 1, 21, 451478), submission_time=datetime.datetime(2023, 2, 22, 6, 1, 21, 409055), end_time=datetime.datetime(2023, 2, 22, 6, 1, 23, 447684), reason=None, status='done'),
 's2.s2.preprocess': RunDetails(start_time=datetime.datetime(2023, 2, 22, 6, 22, 30, 897563), submission_time=datetime.datetime(2023, 2, 22, 6, 1, 23, 464084), end_time=datetime.datetime(2023, 2, 22, 6, 39, 53, 205683), reason='RuntimeError: Received unsupported message header=MessageHeader(type=<MessageType.error: \'error\'>, run_id=UUID(\'2dcb7a30-af57-4950-8b95-9fd474f95ab2\'), id=\'00-2dcb7a30af5749508b959fd474f95ab2-3c83c6aaf8670721-01\', parent_id=\'00-2dcb7a30af5749508b959fd474f95ab2-1d50c297f4140322-01\', version=\'1.0\', created_at=datetime.datetime(2023, 2, 22, 6, 39, 53, 195758)) content=ErrorContent(status=<OpStatusType.failed: \'failed\'>, ename="<class \'RuntimeError\'>", evalue=\'Traceback (most recent call last):\
  File "/opt/conda/lib/python3.8/site-packages/vibe_agent/worker.py", line 123, in run_op\
    return factory.build(spec).run(input, cache_info)\
  File "/opt/conda/lib/python3.8/site-packages/vibe_agent/ops.py", line 99, in run\
    items_out = self.storage.store(run_id, stac_results, cache_info)\
  File "/opt/conda/lib/python3.8/site-packages/vibe_agent/storage/local_storage.py", line 135, in store\
    raise LocalResourceExistsError(\
vibe_agent.storage.local_storage.LocalResourceExistsError: Op output already exists in storage for stack_sentinel2_bands with id 4d21a03a-4bfe-42ce-9fe2-ab43a7fda690.\
\', traceback=[\'  File "/opt/conda/lib/python3.8/site-packages/vibe_agent/worker.py", line 309, in run_op_from_message\
    out = self.run_op_with_retry(content, message.run_id)\
\', \'  File "/opt/conda/lib/python3.8/site-packages/vibe_agent/worker.py", line 402, in run_op_with_retry\
    raise RuntimeError("".join(ret.format()))\
\']). Aborting execution.', status='failed')}

Unable to visualize the raster

Dear FarmVibes-Ai team,

I am unable to visualize the output while running the "Hello World" notebook via the Python client.
Moreover, I am unable to display the output when running the code directly on the server.
Could anyone assist me in this regard?

Error while running crop_cycles notebook

Hi Farmvibes Team,

I am trying to run the crop cycles notebook but I keep getting the following error:

eam, timeout, verify, cert, proxies)
    561     if isinstance(e.reason, _SSLError):
    562         # This branch is for urllib3 v1.22 and later.
    563         raise SSLError(e, request=request)
--> 565     raise ConnectionError(e, request=request)
    567 except ClosedPoolError as e:
    568     raise ConnectionError(e, request=request)

ConnectionError: HTTPConnectionPool(host='172.18.0.3', port=32259): Max retries exceeded with url: /v0/system-metrics (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fcb48d233a0>: Failed to establish a new connection: [Errno 111] Connection refused'))

I have taken the latest code from your GitHub repository and installed the cluster as per the instructions in the quickstart.md document.
I have done the reinstallation twice already. The workflow starts sometimes but eventually gives this error.
The disk space on my VM shows the following:

Filesystem      Size  Used Avail Use% Mounted on
/dev/root       2.0T  743G  1.3T  38% /
devtmpfs        7.9G  4.0K  7.9G   1% /dev
tmpfs           7.9G  4.0K  7.9G   1% /dev/shm
tmpfs           1.6G  1.5M  1.6G   1% /run
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           7.9G     0  7.9G   0% /sys/fs/cgroup
/dev/loop0      128K  128K     0 100% /snap/bare/5
/dev/loop1       56M   56M     0 100% /snap/core18/2708

Could you please let me know where I am going wrong?

Error in Running the notebook 03_aml_training.ipynb

Dear FarmVibes Team,

I successfully ran the notebooks 01_dataset_generation.ipynb and 02_visualize_dataset.ipynb, but I am facing an error when running 03_aml_training.ipynb. Could anyone assist me in this regard?

Error Log:

Converting CDLMask CRS from EPSG:5070 to EPSG:32611
Converting CDLMask resolution from 30.0 to 10.0
Traceback (most recent call last):
  File "03_aml_training.py", line 210, in <module>
    save_chips_locally(data.train_dataloader(), os.path.join(AML_DATASET_DIR, "train"))
  File "/home/azureuser/farmvibes-ai/notebooks/crop_segmentation/notebook_lib/modules.py", line 31, in save_chips_locally
    batch = next(iter(dataloader))
  File "/home/azureuser/.local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 628, in __next__
    data = self._next_data()
  File "/home/azureuser/.local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1333, in _next_data
    return self._process_data(data)
  File "/home/azureuser/.local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1359, in _process_data
    data.reraise()
  File "/home/azureuser/.local/lib/python3.8/site-packages/torch/_utils.py", line 543, in reraise
    raise exception
AttributeError: Caught AttributeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/azureuser/.local/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/azureuser/.local/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 58, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/azureuser/.local/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 58, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/azureuser/.local/lib/python3.8/site-packages/torchgeo/datasets/geo.py", line 876, in __getitem__
    samples = [ds[query] for ds in self.datasets]
  File "/home/azureuser/.local/lib/python3.8/site-packages/torchgeo/datasets/geo.py", line 876, in <listcomp>
    samples = [ds[query] for ds in self.datasets]
  File "/home/azureuser/farmvibes-ai/notebooks/crop_segmentation/notebook_lib/datasets.py", line 293, in __getitem__
    sample = super().__getitem__(query)
  File "/home/azureuser/.local/lib/python3.8/site-packages/torchgeo/datasets/geo.py", line 427, in __getitem__
    data = self._merge_files(filepaths, query, self.band_indexes)
AttributeError: 'CDLMask' object has no attribute 'band_indexes'

mc_forecast.ipynb herbie download error

OS : Ubuntu 22.04.1 LTS

In mc_forecast.ipynb, the Herbie documentation link https://blaylockbk.github.io/Herbie/_build/html/ returns a GitHub Pages 404 ("File not found - The site configured at this address does not contain the requested file."),

and

forecast_ = Forecast(
    workflow_name=HERBIE_DOWNLOAD_WORKFLOW,
    geometry=STATION_GEOMETRY,
    time_range=time_range,
    parameters=parameters,
)
run_list = forecast_.submit_download_request()

ConnectionError: HTTPConnectionPool(host='192.168.49.2', port=30000): Max retries exceeded with url: /v0/runs (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ff1eea52c70>: Failed to establish a new connection: [Errno 110] Connection timed out'))

How can I fix this?

The farm coordinates obtained from the workflow output do not match those provided as input for the same farm.

Dear Farmvibes Team,
In "crop_cycles.ipynb" notebook the workflow executed successfully but the coordinates generated by the workflow output differ from those provided as input for the same farm.
Also the shape of output graph "Number of growing cycles" generated is not similar to actual geometrical shape of the farm because of which it becomes difficult to interpret which part of farm belongs to which type of crop cycle.
I have attached the text file containing polygonal co-ordinates of farm which is given as input to the workflow and also attached screenshot of coordinates for same farm which we got from workflow output.
I have also attached the screenshot comparing actual farm geometry and shape of output graph "Number of growing cycles" generated.

Input Farm Co-ordinates.txt


Incompatible version name "2023-02-16" in vibe_core/setup.py makes environment setup fail

In the file farmvibes-ai/src/vibe_core/setup.py, the version name "2023-02-16" is incompatible with the older versions of pip used by the notebook environments, so environment setup fails with a PEP 440 error. A quick fix was to locally change the version name to "2023.2.16", which solved the problem for most environments. It is also apparent that different notebooks use different package versions in their environments, which makes them incompatible with each other and makes API integration harder. I hope wide compatibility gets more focus in upcoming releases.
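
For reference, the local workaround amounts to something like this (a sketch only; the real setup.py has more arguments than shown here):

from setuptools import setup

setup(
    name="vibe_core",        # remaining arguments omitted -- see src/vibe_core/setup.py
    version="2023.2.16",     # PEP 440-compliant form of the original "2023-02-16"
)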

Are you using FarmVibes.AI?

Are you using FarmVibes.AI?

If you are using FarmVibes.AI, we would first like to thank you. Our goal is to grow the community and empower researchers and data scientists to innovate and build their own geospatial AI models for agriculture and sustainability, helping each other.

The purpose of this issue

We are always interested in finding out who is using FarmVibes.AI, what attracted you to it, and how we can listen to your needs and, if you are interested, help promote your organization.

  • We have people reaching out to us asking, who is using FarmVibes.AI.
  • We’d like to listen to what you would like to see in FarmVibes.AI and your scenarios.
  • We'd like to help promote your organization and work with you.

What we would like from you

Submit a comment on this issue with the following information, which will then also be included in an ADOPTERS.md file added to this repo for others to see:

  • Your organization or company
  • Link to your website
  • Your country
  • Your contact info to reach out to you: blog, email or Twitter (at least one).
  • What is your scenario for using FarmVibes.AI? Please give as much information as possible that you feel comfortable with. This is about enabling the community to see solutions using FarmVibes.AI.
  • Name of the project (if it has one)
  • Are you running your workflows in testing or production?
  • Link to the project website
  • Can we use your Company Logo on the website or communication (Y/N)
  • Link to your transparent logo image.

Example:

Website: https://mycompanywebsite/
Country: USA
Contact: [email protected]
Usage scenario: Using FarmVibes.Ai to build my own workflow for detecting Harvest dates
Status: Inference Model Deployed/Development/Testing 
Project Name and URL: Project Alpha. Website at http://mycompanywebsite.com/projectalpha 
Can we use your logo on the website or communication? Yes. Logo image available at http://mycompanywebsite.com/projectalpha.png

Unable to Execute Helloworld workflow

I have set up a virtual machine on Azure using the official documentation and video provided by FarmVibes-AI on GitHub and Microsoft Research. The installation process was successful, but when I attempted to run the "hello world" workflow, I encountered a timeout error. Despite increasing the timeout to 300 seconds, the error persists. I have included screenshots of the issue for your reference.


[Feature request] Out of the box support for NDRE, ReCI indices

Hi, for those who wish to use FarmVibes to monitor crop health in plantations/orchards, out-of-the-box support for a few more indices would be helpful.

Normalized Difference Red-Edge Index

Formula ==> NDRE = (NIR – RedEdge)/(NIR + RedEdge)

Red-Edge Chlorophyll Vegetation Index

Formula ==> ReCI = (NIR / RedEdge) – 1

Refer to this link for more technical details:

https://www.hiphen-plant.com/vegetation-indices-chlorophyll/3612/#:~:text=Chlorophyll%20Index%20(CI)&text=The%20red%2Dedge%20band%20is,bands%20and%20the%20NIR%20band.
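
Both indices are only a couple of lines on top of the band rasters; here is a NumPy sketch (band choice is an assumption: for Sentinel-2 this would typically be B05 for red edge and B08 for NIR):

import numpy as np

def ndre(nir: np.ndarray, red_edge: np.ndarray) -> np.ndarray:
    # Normalized Difference Red-Edge Index
    return (nir - red_edge) / (nir + red_edge)

def reci(nir: np.ndarray, red_edge: np.ndarray) -> np.ndarray:
    # Red-Edge Chlorophyll Index
    return nir / red_edge - 1.0

# Toy reflectance-like values, just to show the call pattern
nir = np.array([0.45, 0.50, 0.40])
red_edge = np.array([0.20, 0.25, 0.30])
print(ndre(nir, red_edge))
print(reci(nir, red_edge))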

VM Setup Issue

I am getting the following error when I try to set up the VM. I have cloned the repo to my local machine and am running the command as per the guide:

{"status":"Failed","error":{"code":"DeploymentFailed","message":"At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/DeployOperations for usage details.","details":[{"code":"Conflict","message":"{\r\n \"status\": \"Failed\",\r\n \"error\": {\r\n \"code\": \"ResourceDeploymentFailure\",\r\n \"message\": \"The resource operation completed with terminal provisioning state 'Failed'.\",\r\n \"details\": [\r\n {\r\n \"code\": \"VMExtensionProvisioningError\",\r\n \"message\": \"VM has reported a failure when processing extension 'farmvibes-ai_setup_script'. Error message: \\\"Enable failed: failed to get configuration: invalid configuration: 'commandToExecute' is not specified\\\"\\r\\n\\r\\nMore information on troubleshooting is available at https://aka.ms/VMExtensionCSELinuxTroubleshoot \"\r\n }\r\n ]\r\n }\r\n}"}]}}

Error in running notebook crop_cycles.ipynb

Dear Farmvibes Team,
While running the crop_cycles.ipynb notebook, the task "chunk_onnx.chunk_raster" failed and raised a ValueError: "dim size cannot be smaller than chip size".
I have attached a screenshot of the failed task along with its log for better understanding. How can I resolve this issue?


Sentinel-1 list task error for first half of 2022 in some locations

When trying to run the SpaceEye workflow for a location near Esperance in Western Australia (bounding box centre point: 121.4734, -33.19117), I encountered the following error, which terminated the workflow.

This error did not occur when providing a date range of 2022-3-1 to 2022-12-31, though that run only returned SpaceEye outputs from 2022-8-1 onwards.
The error seems to occur only when a date range fully contained between March and August of 2022 is specified for this tile.

'preprocess.s1.list': RunDetails(start_time=datetime.datetime(2023, 2, 15, 3, 27, 13, 915573), submission_time=datetime.datetime(2023, 2, 15, 3, 27, 13, 770448), end_time=datetime.datetime(2023, 2, 15, 3, 27, 26, 732284), reason='RuntimeError: Received unsupported message header=MessageHeader(type=<MessageType.error: \'error\'>, run_id=UUID(\'9dd85457-e49e-44c5-b9bf-7fa995c25d9c\'), id=\'00-9dd85457e49e44c5b9bf7fa995c25d9c-f1305ea3e004f272-01\', parent_id=\'00-9dd85457e49e44c5b9bf7fa995c25d9c-bba3b24cf8250e46-01\', version=\'1.0\', created_at=datetime.datetime(2023, 2, 15, 3, 27, 26, 702399)) content=ErrorContent(status=<OpStatusType.failed: \'failed\'>, ename="<class \'RuntimeError\'>", evalue=\'Traceback (most recent call last):\
  File "/opt/conda/lib/python3.8/site-packages/vibe_agent/worker.py", line 121, in run_op\
    return factory.build(spec).run(input)\
  File "/opt/conda/lib/python3.8/site-packages/vibe_agent/ops.py", line 111, in run\
    stac_results = self._call_validate_op(**{**items, **raw_items})\
  File "/opt/conda/lib/python3.8/site-packages/vibe_agent/ops.py", line 75, in _call_validate_op\
    results = self.callback(**kwargs)\
  File "/app/ops/list_sentinel1_products/list_sentinel1_products_pc.py", line 28, in list_sentinel1_products\
    raise RuntimeError(\
RuntimeError: No product found for time range (datetime.datetime(2022, 4, 1, 0, 0, tzinfo=datetime.timezone.utc), datetime.datetime(2022, 6, 30, 0, 0, tzinfo=datetime.timezone.utc)) and 35 geometries\
\', traceback=[\'  File "/opt/conda/lib/python3.8/site-packages/vibe_agent/worker.py", line 306, in run_op_from_message\
    out = self.run_op_with_retry(content, message.run_id)\
\', \'  File "/opt/conda/lib/python3.8/site-packages/vibe_agent/worker.py", line 404, in run_op_with_retry\
    raise RuntimeError("".join(ret.format()))\
\']). Aborting execution.', status='failed'),
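
As a side check, querying the Planetary Computer STAC directly over the same area and the failing window should show whether this is a genuine Sentinel-1 GRD gap in the catalogue or a listing bug (a sketch, assuming pystac-client and planetary-computer are installed; the bounding box is an approximation of the same area):

import planetary_computer as pc
from pystac_client import Client

catalog = Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1",
    modifier=pc.sign_inplace,
)

# Bounding box around the Esperance tile and the window that fails (April-June 2022)
search = catalog.search(
    collections=["sentinel-1-grd"],
    bbox=[121.3853, -33.27323, 121.5615, -33.10911],
    datetime="2022-04-01/2022-06-30",
)
items = list(search.items())
print(len(items))  # 0 would mean the catalogue itself has no GRD products for this window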

Crop Segmentation local training error & AML training error

I've been working my way through the crop segmentation notebook for a while, and I'm finally at the training stage; however, I get errors for both local and AML training. For local training, this is what I get in [20] after running trainer.fit(model, data):

AssertionError                            Traceback (most recent call last)
Cell In[20], line 7
      3     model = SegmentationModel.load_from_checkpoint(CHPT_PATH)
      4 else:
      5     # Train it now
      6     #nn.Module
----> 7     trainer.fit(model,data)

File ~/anaconda3/envs/crop-seg/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py:696, in Trainer.fit(self, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path)
    677 r"""
    678 Runs the full optimization routine.
    679 
   (...)
    693     datamodule: An instance of :class:`~pytorch_lightning.core.datamodule.LightningDataModule`.
    694 """
    695 self.strategy.model = model
--> 696 self._call_and_handle_interrupt(
    697     self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path
    698 )

File ~/anaconda3/envs/crop-seg/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py:650, in Trainer._call_and_handle_interrupt(self, trainer_fn, *args, **kwargs)
    648         return self.strategy.launcher.launch(trainer_fn, *args, trainer=self, **kwargs)
    649     else:
--> 650         return trainer_fn(*args, **kwargs)
    651 # TODO(awaelchli): Unify both exceptions below, where `KeyboardError` doesn't re-raise
    652 except KeyboardInterrupt as exception:

File ~/anaconda3/envs/crop-seg/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py:735, in Trainer._fit_impl(self, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path)
    731 ckpt_path = ckpt_path or self.resume_from_checkpoint
    732 self._ckpt_path = self.__set_ckpt_path(
    733     ckpt_path, model_provided=True, model_connected=self.lightning_module is not None
    734 )
--> 735 results = self._run(model, ckpt_path=self.ckpt_path)
    737 assert self.state.stopped
    738 self.training = False

File ~/anaconda3/envs/crop-seg/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py:1166, in Trainer._run(self, model, ckpt_path)
   1162 self._checkpoint_connector.restore_training_state()
   1164 self._checkpoint_connector.resume_end()
-> 1166 results = self._run_stage()
   1168 log.detail(f"{self.__class__.__name__}: trainer tearing down")
   1169 self._teardown()

File ~/anaconda3/envs/crop-seg/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py:1252, in Trainer._run_stage(self)
   1250 if self.predicting:
   1251     return self._run_predict()
-> 1252 return self._run_train()

File ~/anaconda3/envs/crop-seg/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py:1283, in Trainer._run_train(self)
   1280 self.fit_loop.trainer = self
   1282 with torch.autograd.set_detect_anomaly(self._detect_anomaly):
-> 1283     self.fit_loop.run()

File ~/anaconda3/envs/crop-seg/lib/python3.8/site-packages/pytorch_lightning/loops/loop.py:200, in Loop.run(self, *args, **kwargs)
    198 try:
    199     self.on_advance_start(*args, **kwargs)
--> 200     self.advance(*args, **kwargs)
    201     self.on_advance_end()
    202     self._restarting = False

File ~/anaconda3/envs/crop-seg/lib/python3.8/site-packages/pytorch_lightning/loops/fit_loop.py:271, in FitLoop.advance(self)
    267 self._data_fetcher.setup(
    268     dataloader, batch_to_device=partial(self.trainer._call_strategy_hook, "batch_to_device", dataloader_idx=0)
    269 )
    270 with self.trainer.profiler.profile("run_training_epoch"):
--> 271     self._outputs = self.epoch_loop.run(self._data_fetcher)

File ~/anaconda3/envs/crop-seg/lib/python3.8/site-packages/pytorch_lightning/loops/loop.py:200, in Loop.run(self, *args, **kwargs)
    198 try:
    199     self.on_advance_start(*args, **kwargs)
--> 200     self.advance(*args, **kwargs)
    201     self.on_advance_end()
    202     self._restarting = False

File ~/anaconda3/envs/crop-seg/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py:203, in TrainingEpochLoop.advance(self, data_fetcher)
    200     self.batch_progress.increment_started()
    202     with self.trainer.profiler.profile("run_training_batch"):
--> 203         batch_output = self.batch_loop.run(kwargs)
    205 self.batch_progress.increment_processed()
    207 # update non-plateau LR schedulers
    208 # update epoch-interval ones only when we are at the end of training epoch

File ~/anaconda3/envs/crop-seg/lib/python3.8/site-packages/pytorch_lightning/loops/loop.py:200, in Loop.run(self, *args, **kwargs)
    198 try:
    199     self.on_advance_start(*args, **kwargs)
--> 200     self.advance(*args, **kwargs)
    201     self.on_advance_end()
    202     self._restarting = False

File ~/anaconda3/envs/crop-seg/lib/python3.8/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py:87, in TrainingBatchLoop.advance(self, kwargs)
     83 if self.trainer.lightning_module.automatic_optimization:
     84     optimizers = _get_active_optimizers(
     85         self.trainer.optimizers, self.trainer.optimizer_frequencies, kwargs.get("batch_idx", 0)
     86     )
---> 87     outputs = self.optimizer_loop.run(optimizers, kwargs)
     88 else:
     89     outputs = self.manual_loop.run(kwargs)

File ~/anaconda3/envs/crop-seg/lib/python3.8/site-packages/pytorch_lightning/loops/loop.py:200, in Loop.run(self, *args, **kwargs)
    198 try:
    199     self.on_advance_start(*args, **kwargs)
--> 200     self.advance(*args, **kwargs)
    201     self.on_advance_end()
    202     self._restarting = False

File ~/anaconda3/envs/crop-seg/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py:201, in OptimizerLoop.advance(self, optimizers, kwargs)
    198 def advance(self, optimizers: List[Tuple[int, Optimizer]], kwargs: OrderedDict) -> None:  # type: ignore[override]
    199     kwargs = self._build_kwargs(kwargs, self.optimizer_idx, self._hiddens)
--> 201     result = self._run_optimization(kwargs, self._optimizers[self.optim_progress.optimizer_position])
    202     if result.loss is not None:
    203         # automatic optimization assumes a loss needs to be returned for extras to be considered as the batch
    204         # would be skipped otherwise
    205         self._outputs[self.optimizer_idx] = result.asdict()

File ~/anaconda3/envs/crop-seg/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py:248, in OptimizerLoop._run_optimization(self, kwargs, optimizer)
    240         closure()
    242 # ------------------------------
    243 # BACKWARD PASS
    244 # ------------------------------
    245 # gradient update with accumulated gradients
    246 else:
    247     # the `batch_idx` is optional with inter-batch parallelism
--> 248     self._optimizer_step(optimizer, opt_idx, kwargs.get("batch_idx", 0), closure)
    250 result = closure.consume_result()
    252 if result.loss is not None:
    253     # if no result, user decided to skip optimization
    254     # otherwise update running loss + reset accumulated loss
    255     # TODO: find proper way to handle updating running loss

File ~/anaconda3/envs/crop-seg/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py:358, in OptimizerLoop._optimizer_step(self, optimizer, opt_idx, batch_idx, train_step_and_backward_closure)
    355     self.optim_progress.optimizer.step.increment_ready()
    357 # model hook
--> 358 self.trainer._call_lightning_module_hook(
    359     "optimizer_step",
    360     self.trainer.current_epoch,
    361     batch_idx,
    362     optimizer,
    363     opt_idx,
    364     train_step_and_backward_closure,
    365     on_tpu=isinstance(self.trainer.accelerator, TPUAccelerator),
    366     using_native_amp=(self.trainer.amp_backend == AMPType.NATIVE),
    367     using_lbfgs=is_lbfgs,
    368 )
    370 if not should_accumulate:
    371     self.optim_progress.optimizer.step.increment_completed()

File ~/anaconda3/envs/crop-seg/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py:1550, in Trainer._call_lightning_module_hook(self, hook_name, pl_module, *args, **kwargs)
   1547 pl_module._current_fx_name = hook_name
   1549 with self.profiler.profile(f"[LightningModule]{pl_module.__class__.__name__}.{hook_name}"):
-> 1550     output = fn(*args, **kwargs)
   1552 # restore current_fx when nested context
   1553 pl_module._current_fx_name = prev_fx_name

File ~/anaconda3/envs/crop-seg/lib/python3.8/site-packages/pytorch_lightning/core/module.py:1674, in LightningModule.optimizer_step(self, epoch, batch_idx, optimizer, optimizer_idx, optimizer_closure, on_tpu, using_native_amp, using_lbfgs)
   1592 def optimizer_step(
   1593     self,
   1594     epoch: int,
   (...)
   1601     using_lbfgs: bool = False,
   1602 ) -> None:
   1603     r"""
   1604     Override this method to adjust the default way the :class:`~pytorch_lightning.trainer.trainer.Trainer` calls
   1605     each optimizer.
   (...)
   1672 
   1673     """
-> 1674     optimizer.step(closure=optimizer_closure)

File ~/anaconda3/envs/crop-seg/lib/python3.8/site-packages/pytorch_lightning/core/optimizer.py:168, in LightningOptimizer.step(self, closure, **kwargs)
    165     raise MisconfigurationException("When `optimizer.step(closure)` is called, the closure should be callable")
    167 assert self._strategy is not None
--> 168 step_output = self._strategy.optimizer_step(self._optimizer, self._optimizer_idx, closure, **kwargs)
    170 self._on_after_step()
    172 return step_output

File ~/anaconda3/envs/crop-seg/lib/python3.8/site-packages/pytorch_lightning/strategies/strategy.py:216, in Strategy.optimizer_step(self, optimizer, opt_idx, closure, model, **kwargs)
    206 """Performs the actual optimizer step.
    207 
    208 Args:
   (...)
    213     **kwargs: Any extra arguments to ``optimizer.step``
    214 """
    215 model = model or self.lightning_module
--> 216 return self.precision_plugin.optimizer_step(model, optimizer, opt_idx, closure, **kwargs)

File ~/anaconda3/envs/crop-seg/lib/python3.8/site-packages/pytorch_lightning/plugins/precision/precision_plugin.py:153, in PrecisionPlugin.optimizer_step(self, model, optimizer, optimizer_idx, closure, **kwargs)
    151 if isinstance(model, pl.LightningModule):
    152     closure = partial(self._wrap_closure, model, optimizer, optimizer_idx, closure)
--> 153 return optimizer.step(closure=closure, **kwargs)

File ~/anaconda3/envs/crop-seg/lib/python3.8/site-packages/torch/optim/lr_scheduler.py:65, in _LRScheduler.__init__.<locals>.with_counter.<locals>.wrapper(*args, **kwargs)
     63 instance._step_count += 1
     64 wrapped = func.__get__(instance, cls)
---> 65 return wrapped(*args, **kwargs)

File ~/anaconda3/envs/crop-seg/lib/python3.8/site-packages/torch/optim/optimizer.py:113, in Optimizer._hook_for_profile.<locals>.profile_hook_step.<locals>.wrapper(*args, **kwargs)
    111 profile_name = "Optimizer.step#{}.step".format(obj.__class__.__name__)
    112 with torch.autograd.profiler.record_function(profile_name):
--> 113     return func(*args, **kwargs)

File ~/anaconda3/envs/crop-seg/lib/python3.8/site-packages/torch/autograd/grad_mode.py:27, in _DecoratorContextManager.__call__.<locals>.decorate_context(*args, **kwargs)
     24 @functools.wraps(func)
     25 def decorate_context(*args, **kwargs):
     26     with self.clone():
---> 27         return func(*args, **kwargs)

File ~/anaconda3/envs/crop-seg/lib/python3.8/site-packages/torch/optim/adam.py:118, in Adam.step(self, closure)
    116 if closure is not None:
    117     with torch.enable_grad():
--> 118         loss = closure()
    120 for group in self.param_groups:
    121     params_with_grad = []

File ~/anaconda3/envs/crop-seg/lib/python3.8/site-packages/pytorch_lightning/plugins/precision/precision_plugin.py:138, in PrecisionPlugin._wrap_closure(self, model, optimizer, optimizer_idx, closure)
    125 def _wrap_closure(
    126     self,
    127     model: "pl.LightningModule",
   (...)
    130     closure: Callable[[], Any],
    131 ) -> Any:
    132     """This double-closure allows makes sure the ``closure`` is executed before the
    133     ``on_before_optimizer_step`` hook is called.
    134 
    135     The closure (generally) runs ``backward`` so this allows inspecting gradients in this hook. This structure is
    136     consistent with the ``PrecisionPlugin`` subclasses that cannot pass ``optimizer.step(closure)`` directly.
    137     """
--> 138     closure_result = closure()
    139     self._after_closure(model, optimizer, optimizer_idx)
    140     return closure_result

File ~/anaconda3/envs/crop-seg/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py:146, in Closure.__call__(self, *args, **kwargs)
    145 def __call__(self, *args: Any, **kwargs: Any) -> Optional[Tensor]:
--> 146     self._result = self.closure(*args, **kwargs)
    147     return self._result.loss

File ~/anaconda3/envs/crop-seg/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py:132, in Closure.closure(self, *args, **kwargs)
    131 def closure(self, *args: Any, **kwargs: Any) -> ClosureResult:
--> 132     step_output = self._step_fn()
    134     if step_output.closure_loss is None:
    135         self.warning_cache.warn("`training_step` returned `None`. If this was on purpose, ignore this warning...")

File ~/anaconda3/envs/crop-seg/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py:407, in OptimizerLoop._training_step(self, kwargs)
    398 """Performs the actual train step with the tied hooks.
    399 
    400 Args:
   (...)
    404     A ``ClosureResult`` containing the training step output.
    405 """
    406 # manually capture logged metrics
--> 407 training_step_output = self.trainer._call_strategy_hook("training_step", *kwargs.values())
    408 self.trainer.strategy.post_training_step()
    410 model_output = self.trainer._call_lightning_module_hook("training_step_end", training_step_output)

File ~/anaconda3/envs/crop-seg/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py:1704, in Trainer._call_strategy_hook(self, hook_name, *args, **kwargs)
   1701     return
   1703 with self.profiler.profile(f"[Strategy]{self.strategy.__class__.__name__}.{hook_name}"):
-> 1704     output = fn(*args, **kwargs)
   1706 # restore current_fx when nested context
   1707 pl_module._current_fx_name = prev_fx_name

File ~/anaconda3/envs/crop-seg/lib/python3.8/site-packages/pytorch_lightning/strategies/strategy.py:358, in Strategy.training_step(self, *args, **kwargs)
    356 with self.precision_plugin.train_step_context():
    357     assert isinstance(self.model, TrainingStep)
--> 358     return self.model.training_step(*args, **kwargs)

File ~/farmvibes-ai-main/notebooks/crop_segmentation/notebook_lib/models.py:91, in SegmentationModel.training_step(self, batch, batch_idx)
     90 def training_step(self, batch: Dict[str, Any], batch_idx: int) -> Dict[str, Any]:
---> 91     return self._shared_step(batch, batch_idx)

File ~/farmvibes-ai-main/notebooks/crop_segmentation/notebook_lib/models.py:78, in SegmentationModel._shared_step(self, batch, batch_idx)
     76 pred = self(batch["image"])
     77 for t in pred, batch["mask"]:
---> 78     assert torch.all(torch.isfinite(t))
     79 loss = self.loss(pred, batch["mask"])
     81 return {"loss": loss, "preds": pred.detach(), "target": batch["mask"]}

AssertionError: 
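
Since the assertion that fails is the torch.isfinite check in _shared_step, one thing worth doing before retrying trainer.fit is pulling a raw batch from the datamodule and checking the chips for NaN/inf values (a debugging sketch, assuming the same data module as in the notebook):

import torch

# Grab one batch directly from the training dataloader used by trainer.fit above
batch = next(iter(data.train_dataloader()))
for key in ("image", "mask"):
    t = batch[key].float()
    print(
        key,
        "all finite:", bool(torch.isfinite(t).all()),
        "min:", float(t.min()),
        "max:", float(t.max()),
    )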

And for the AML training, the job fails after submitting it to a compute instance; again, it appears to have something to do with inter-package compatibility. The error message from AML:

Execution failed. User process 'python' exited with status code 1. Please check log file 'user_logs/std_log.txt' for error details. Error: Traceback (most recent call last):
  File "/mnt/azureml/cr/j/85b9fae7d3cd4fcf936315a8324fde05/exe/wd/aml_train_script.py", line 4, in <module>
    import torch
  File "/azureml-envs/azureml_a197fc75079e2e0b8afc1441914b3e27/lib/python3.10/site-packages/torch/__init__.py", line 217, in <module>
    _load_global_deps()
  File "/azureml-envs/azureml_a197fc75079e2e0b8afc1441914b3e27/lib/python3.10/site-packages/torch/__init__.py", line 178, in _load_global_deps
    _preload_cuda_deps()
  File "/azureml-envs/azureml_a197fc75079e2e0b8afc1441914b3e27/lib/python3.10/site-packages/torch/__init__.py", line 158, in _preload_cuda_deps
    ctypes.CDLL(cublas_path)
  File "/azureml-envs/azureml_a197fc75079e2e0b8afc1441914b3e27/lib/python3.10/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /azureml-envs/azureml_a197fc75079e2e0b8afc1441914b3e27/lib/python3.10/site-packages/nvidia/cublas/lib/libcublas.so.11: symbol cublasLtGetStatusString version libcublasLt.so.11 not defined in file libcublasLt.so.11 with link time reference

I could really use help moving forward from this.
