Comments (11)

boraozgen commented on September 13, 2024

Thanks for the replies. I appreciate that some of the points are already addressed and others are being addressed in PRs.

It has been some time since I wrote the original feedback, and in the meantime I made some design changes to the application. I switched to an always-connected design with an always-listening OTA client, which makes most of the points in this post not applicable to my case. I still believe that there could be use cases where the application needs more control over the OTA client, so the points remain valid.

As most of the points are addressed, I am closing this thread. Any unresolved points can be tracked in individual issues.

from ota-for-aws-iot-embedded-sdk.

pvyawaha commented on September 13, 2024

Hello,

1. Not possible to defer the download after receiving an update job to a future time, e.g. using a StartDownload command.

When a job document is received, the application callback (otaAppCallback in the demo) is invoked with the event OtaJobEventReceivedJob. The data associated with this event is of type OtaJobDocument_t, which has the following fields:

`pJobDocJson` - Pointer to the entire job document (JSON).
`jobDocLength` - Size of the job document.
`pJobId` - Pointer to the job ID.
`jobIdLength` - Job ID length.
`fileTypeId` - File type ID.

The callback is invoked with this event when the received job document JSON is valid and has an execution key. When the event is received, the application can simply wait for an event of its own if the wait is short, for example while winding down an application-specific task. For longer waits, it can suspend the OTA Agent and resume it when the application is ready, or even shut it down and restart it, using the OTA Agent APIs.

I recommend suspending and resuming the OTA Agent, or even shutting it down and restarting it, when the update must be deferred for a longer period, because the job might have been cancelled, timed out, or aborted in the service in the meantime and may no longer be valid. The job is validated again when the OTA Agent is resumed or restarted.
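
As a rough illustration of the deferral flow described above, the sketch below models the decision as a small state machine on the application side. Everything in it (`DeferState_t`, `DeferAction_t`, `deferOnJobReceived`, `deferNextAction`) is a hypothetical stand-in and not part of the OTA library; in a real integration the returned action would presumably be mapped to the library's suspend and resume calls.

```c
#include <stdbool.h>

/* Hypothetical application-side types; none of these come from the OTA library. */
typedef enum { DeferIdle, DeferJobPending, DeferSuspended } DeferState_t;
typedef enum { DeferActionNone, DeferActionSuspend, DeferActionResume } DeferAction_t;

/* Record that the app callback reported a received job document. */
void deferOnJobReceived( DeferState_t * pState )
{
    if( *pState == DeferIdle )
    {
        *pState = DeferJobPending;
    }
}

/* Decide what the application task should ask the OTA agent to do next.
 * appReady is true once the application has wound down its own work. */
DeferAction_t deferNextAction( DeferState_t * pState, bool appReady )
{
    DeferAction_t action = DeferActionNone;

    if( ( *pState == DeferJobPending ) && !appReady )
    {
        /* Not ready yet: request a suspend and remember it. */
        *pState = DeferSuspended;
        action = DeferActionSuspend;
    }
    else if( ( *pState == DeferJobPending ) && appReady )
    {
        /* Ready right away: let the download proceed. */
        *pState = DeferIdle;
    }
    else if( ( *pState == DeferSuspended ) && appReady )
    {
        /* Ready now: request a resume so the job can be validated again. */
        *pState = DeferIdle;
        action = DeferActionResume;
    }

    return action;
}
```

The helper only returns a decision and performs no I/O, so it can be called from whichever task already talks to the agent.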

2. Not possible to get the result of the job document after a job document is received. Does an update job exist or is the job document empty? Can we move on with the application or is the download in progress? This is critical for applications which would like to suspend other operations while the download is in progress.

OTA_GetState returns the agent state as a value of type OtaState_t, which includes four states for file and download operations:

`OtaAgentStateCreatingFile` - The file that will receive the update is being created. On embedded platforms without a file system, this can mean erasing a flash partition.
`OtaAgentStateRequestingFileBlock` - The OTA Agent is sending requests for file blocks to the service.
`OtaAgentStateWaitingForFileBlock` - The OTA Agent is waiting for a response from the service for file blocks.
`OtaAgentStateClosingFile` - The download is complete, so the file is being closed and its crypto signature validated.
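
A minimal sketch of how an application might use these four states to decide whether a download is in progress. The enum below is a local stand-in that only mirrors state names mentioned in this thread; a real build would use OtaState_t from the library header together with the value returned by OTA_GetState. The `Demo` names and `demoIsDownloadInProgress` are hypothetical.

```c
#include <stdbool.h>

/* Local stand-in for illustration; not the library's OtaState_t. */
typedef enum
{
    DemoStateWaitingForJob,
    DemoStateCreatingFile,
    DemoStateRequestingFileBlock,
    DemoStateWaitingForFileBlock,
    DemoStateClosingFile,
    DemoStateSuspended
} DemoOtaState_t;

/* True while the agent is in one of the four file/download states. */
bool demoIsDownloadInProgress( DemoOtaState_t state )
{
    switch( state )
    {
        case DemoStateCreatingFile:
        case DemoStateRequestingFileBlock:
        case DemoStateWaitingForFileBlock:
        case DemoStateClosingFile:
            return true;

        default:
            return false;
    }
}
```

An application could poll such a helper before starting work that must not overlap with a download.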

Calling OTA_GetStatistics retrieves the OtaAgentStatistics_t, which has information about the current download:

`otaPacketsReceived` - Number of packets received from the service.
`otaPacketsQueued` - Number of packets queued for processing but not yet written to file/flash.
`otaPacketsProcessed` - Number of packets processed (CBOR decoded), written to file/flash, and marked in the received bitmap.
`otaPacketsDropped` - Number of packets dropped.
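
One way to use these counters is to compare two snapshots taken some time apart. The sketch below assumes a stand-in struct, `DemoOtaStats_t`, that only mirrors the four counters listed above; it is not the library's own type, and both helper functions are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in mirroring the four counters listed above; not the library's struct. */
typedef struct
{
    uint32_t packetsReceived;
    uint32_t packetsQueued;
    uint32_t packetsProcessed;
    uint32_t packetsDropped;
} DemoOtaStats_t;

/* True if the processed counter changed since the previous snapshot. */
bool demoDownloadAdvanced( const DemoOtaStats_t * pPrev,
                           const DemoOtaStats_t * pNow )
{
    return pNow->packetsProcessed != pPrev->packetsProcessed;
}

/* Packets dropped between two snapshots. Unsigned subtraction keeps the
 * difference correct even if the counter wrapped once in between. */
uint32_t demoDroppedSince( const DemoOtaStats_t * pPrev,
                           const DemoOtaStats_t * pNow )
{
    return pNow->packetsDropped - pPrev->packetsDropped;
}
```

If the processed counter stays unchanged over several polls, the application could treat the download as stalled and decide how to react.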

3. How to figure out when an external MQTT_ProcessLoop is required to continue with the download? For systems with long socket receive timeouts, calling ProcessLoop (i.e. socket recv) repeatedly causes unnecessary delays and would be nice to avoid if the OTA agent does not expect a response at the moment.

MQTT_ProcessLoop is called outside the OTA library, in the OTA demo, so it can be used as the application requires. When the OTA Agent is not processing a job, it is only waiting for a job notification from the service. The application can keep the OTA Agent running, or periodically start it and shut it down. Could you share whether you are running this on Linux or FreeRTOS? We have released the MQTT Agent for FreeRTOS, which helps with thread safety and connection sharing: https://freertos.org/mqtt-agent/index.html

boraozgen commented on September 13, 2024

Thank you for the quick and detailed reply. OtaJobEventReceivedJob seems to be useful to notify about the update job, however it is not possible to distinguish it from a job in self-test state. Furthermore it would be nice to get a notification about an empty job document too. In that case I am receiving an OtaJobEventProcessed but I suppose it does not uniquely identify an empty job document. Therefore I have to wait some time and check if the state is OtaAgentStateWaitingForJob to conclude that there are no updates, which is not ideal. IMHO, some kind of "get job state" API would be helpful.
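
The wait-and-check workaround described above might look roughly like the following sketch. `NoUpdateProbe_t` and `noUpdateProbeStep` are hypothetical application-side helpers, not library APIs; the caller is assumed to supply whether the agent currently reports its waiting-for-job state and whether any message arrived since the last call.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper state: how long the agent has been quietly waiting. */
typedef struct
{
    uint32_t quietMs;      /* accumulated quiet time */
    uint32_t quietLimitMs; /* quiet time required before assuming "no update" */
} NoUpdateProbe_t;

/* Feed one observation; returns true once "no update" can be assumed.
 * waitingForJob: the agent reports its waiting-for-job state.
 * messageSeen:   a message arrived since the previous call. */
bool noUpdateProbeStep( NoUpdateProbe_t * pProbe,
                        bool waitingForJob,
                        bool messageSeen,
                        uint32_t elapsedMs )
{
    if( !waitingForJob || messageSeen )
    {
        /* Activity or a different state: start counting again. */
        pProbe->quietMs = 0u;
    }
    else if( pProbe->quietMs < pProbe->quietLimitMs )
    {
        /* Accumulate quiet time, saturating at the limit. */
        uint32_t remaining = pProbe->quietLimitMs - pProbe->quietMs;
        pProbe->quietMs += ( elapsedMs < remaining ) ? elapsedMs : remaining;
    }

    return waitingForJob && !messageSeen &&
           ( pProbe->quietMs >= pProbe->quietLimitMs );
}
```

This remains a heuristic: it cannot tell an empty job document apart from a slow response, which is why a dedicated notification would be preferable.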

In short, what is currently the best way to check if there are no updates?

Deferring the download using suspend/resume makes sense. I will look into this.

I am using Mbed OS and a custom MQTT agent based on coreMQTT.

boraozgen commented on September 13, 2024

Regarding suspending after the job document is received: The OTA agent still goes on with creating the download file and subscribing to the stream topic immediately after the job document is received, even though a suspend command is sent. I think these steps should be done after another user-triggered event. Furthermore, even if the agent is suspended before a file block is requested, it causes an "unexpected event" error:

[ERR ][OTA]: Received unexpected event: Current state=[Suspended], Event received=[RequestFileBlock]

boraozgen commented on September 13, 2024

I would like to report on my progress with integrating the library. My goal is to provide feedback and maybe receive some advice on the usage of the library. After struggling for some time to find out how every state and state transition works, I managed to get it running together with my application. Please understand the negative connotation as a compliment to the rest of the library, as the low-level porting layer is very well designed and the porting process was very smooth. I cannot say the same for the high-level integration. I appreciate that the API is very flexible, but it is pretty hard to use and requires deep knowledge of the inner workings of the library.

Firstly, let me describe my use case: I have an application on a Cortex-M target running Mbed OS where I publish some data and check my shadows periodically. I would like to check for updates at a certain time of the day and perform the update. I don't want to receive any update information in between.

Therefore I imagined an API like this:

  • Initialize on startup
  • Start the OTA agent after the MQTT connection (to handle self-test)
  • Check for updates and download.

Initialization is accomplished by OTA_Init, so that is settled.

For the startup after the connection, I signal an OtaAgentEventStart event. This requests a job document (makes sense for self-test), but in case a new update exists, it starts downloading the update immediately. To stop the update, the agent must then be suspended explicitly, and the point to do this is unclear. In case of a self-test, we would like to receive the document and keep working (multiple ProcessLoops are required to finish self-test) until no new messages are received and we are still in OtaAgentStateWaitingForJob state. Therefore I cannot suspend the agent immediately after starting it. I had to build a relatively complicated logic to handle this. I think adding additional states like "update available" and "performing self-test" would be better.

Another issue that I had was to decide when to call MQTT ProcessLoop. I don't have an architecture where I have a background task which calls ProcessLoop repeatedly, therefore I have to call it on the same task where I issue the events to the agent. Probably it would be easier to integrate the library with such a separate looping task.

In a related issue, it would also be nice not to require a separate task/thread for the OTA agent, i.e. a synchronous API. This is solved pretty well with the new IoT SDK, whereas somehow the OTA agent is designed with concurrency requirement.

Some other issues I encountered:

  • Suspending the agent does not unsubscribe from the jobs topic, while the resume event subscribes and requests a job document. This leads to the receival of two documents after resuming, causing an "unexpected event" error. The operation is not affected by this event though. Still, I think an option/API to unsubscribe from the topics is required, if not done automatically on suspend. This also causes receival of the job documents during suspended state, again causing unexpected events. Shutdown event triggers unsubscribe, but it also causes a larger cleanup, requiring reinitialization, which is IMO not appropriate.
  • The events of the app callback are poorly documented. I had to go through the source to find out which event is triggered when. Furthermore, many times the events do not provide helpful information. E.g. there is an event triggered after each processed block, which is an obscure OtaJobEventProcessed, with the block data attached to it without any context. Also mentioned above, OtaJobEventReceivedJob is emitted both on self-test and new update documents. For an empty document, again OtaJobEventProcessed is emitted. These events should be more distinguishable.
  • I noticed that some state transitions are not printed or printed incorrectly to the logs. I can point to the exact locations if required.

Finally a design question: The self-test is executed after the connection is established to the broker. What happens if a connection cannot be established? I assume the application is expected to detect this and revert to the previous image. How is the job document handled in this case? Does the agent recognise this situation and report a failed update?

boraozgen commented on September 13, 2024

Another issue that I encountered today as I am testing the request momentum condition: The agent stops itself when the momentum is reached. This causes a situation where we must reinitialize the agent and restart the event processing thread, which is not ideal for RTOS-based systems. Couldn't we go for a suspend instead?

pvyawaha commented on September 13, 2024

Hello,

We are trying to reproduce both issues and are looking into adding a "No Active Job" event. I will post an update here soon.

fhars commented on September 13, 2024

Not unsubscribing when suspending is sort of consistent. If I remember what I've read correctly, the suspend/resume logic was originally intended to handle loss of network connectivity, so there would be no place to send the unsubscribe requests anyway.

boraozgen commented on September 13, 2024

There should be a way to disable the agent (and the related communication) and enable it again afterwards, for the use case where the updates are only checked periodically. If this is not intended by suspend/resume, another API should be added.

boraozgen commented on September 13, 2024

Is there an ETA for possible fixes/additions?

ActoryOu commented on September 13, 2024

Hello,
Firstly, thanks for bringing in use cases.

> For the startup after the connection, I signal an OtaAgentEventStart event. This requests a job document (makes sense for self-test), but in case a new update exists, it starts downloading the update immediately. To stop the update, the agent must then be suspended explicitly, and the point to do this is unclear. In case of a self-test, we would like to receive the document and keep working (multiple ProcessLoops are required to finish self-test) until no new messages are received and we are still in OtaAgentStateWaitingForJob state. Therefore I cannot suspend the agent immediately after starting it. I had to build a relatively complicated logic to handle this. I think adding additional states like "update available" and "performing self-test" would be better.

I think we can invoke a callback to the user layer when we start to perform a "new update" or a "self-test" after receiving the job document. I'll post an update on this later.

> Another issue that I had was to decide when to call MQTT ProcessLoop. I don't have an architecture where I have a background task which calls ProcessLoop repeatedly, therefore I have to call it on the same task where I issue the events to the agent. Probably it would be easier to integrate the library with such a separate looping task.
>
> In a related issue, it would also be nice not to require a separate task/thread for the OTA agent, i.e. a synchronous API. This is solved pretty well with the new IoT SDK, whereas somehow the OTA agent is designed with concurrency requirement.

I think #441 might help. It provides a single loop for the user to call to process OTA events.

> Suspending the agent does not unsubscribe from the jobs topic, while the resume event subscribes and requests a job document. This leads to the receival of two documents after resuming, causing an "unexpected event" error. The operation is not affected by this event though. Still, I think an option/API to unsubscribe from the topics is required, if not done automatically on suspend. This also causes receival of the job documents during suspended state, again causing unexpected events. Shutdown event triggers unsubscribe, but it also causes a larger cleanup, requiring reinitialization, which is IMO not appropriate.

In this scenario, I suggest using OTA_Shutdown with the unsubscribe flag to stop OTA entirely, and then re-initializing and restarting it when you want to check whether an update is available from the cloud side. Currently we don't support unsubscribing on suspend.

> The events of the app callback are poorly documented. I had to go through the source to find out which event is triggered when. Furthermore, many times the events do not provide helpful information. E.g. there is an event triggered after each processed block, which is an obscure OtaJobEventProcessed, with the block data attached to it without any context. Also mentioned above, OtaJobEventReceivedJob is emitted both on self-test and new update documents. For an empty document, again OtaJobEventProcessed is emitted. These events should be more distinguishable.

#443 added more detailed descriptions of the events. Please take a look.

> I noticed that some state transitions are not printed or printed incorrectly to the logs. I can point to the exact locations if required.

I'm not sure about this. Do you mean that OTA changes the state to "OtaAgentStateReady" in OTA_EventProcessingTask?

> Finally a design question: The self-test is executed after the connection is established to the broker. What happens if a connection cannot be established? I assume the application is expected to detect this and revert to the previous image. How is the job document handled in this case? Does the agent recognise this situation and report a failed update?

The user should start the OTA library even if there is no connection. You can take the stm32u5 demo as a reference. OTA starts a timer in the start handler. If no job arrives from the cloud before the timeout during self-test, OTA reverts the image to the previous version.
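
The timeout behaviour described above can be pictured with the following sketch. It is only an illustration of the decision logic, not the library's implementation; `SelfTestDecision_t` and `selfTestDecide` are hypothetical names, and the timeout value would come from the application's or library's configuration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical outcome of one self-test check. */
typedef enum
{
    SelfTestKeepWaiting, /* still inside the self-test window */
    SelfTestProceed,     /* a job document arrived, self-test can continue */
    SelfTestRevert       /* window expired without a job: roll back the image */
} SelfTestDecision_t;

/* Decide what to do while the new image is in self-test.
 * jobReceived: a job document arrived from the cloud.
 * elapsedMs:   time since the timer was started.
 * timeoutMs:   allowed self-test window. */
SelfTestDecision_t selfTestDecide( bool jobReceived,
                                   uint32_t elapsedMs,
                                   uint32_t timeoutMs )
{
    if( jobReceived )
    {
        return SelfTestProceed;
    }

    if( elapsedMs >= timeoutMs )
    {
        return SelfTestRevert;
    }

    return SelfTestKeepWaiting;
}
```

Because the timer runs regardless of connectivity, a device that never reaches the broker ends up in the revert branch.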

Thanks.
