
ota-for-aws-iot-embedded-sdk's Introduction

AWS IoT Over-the-air Update Library

Getting Started With OTA

As mentioned in the previous 'Upcoming changes' section, the new "core" OTA libraries have been released. These modular and composable libraries can be utilized to implement an OTA 'orchestrator' which sequences the libraries to achieve Over-The-Air Update functionality. The composable nature of the OTA orchestrator will allow for new backing services, both supported by AWS and not.

This library will remain available; however, it will not be developed further. Support will instead be focused on the new composable libraries and example orchestrators.

For more information, see the following:

  1. FreeRTOS.org webpage explaining the modular OTA concept
  2. Example: Simple OTA Orchestrator
  3. Example: OTA Agent Orchestrator
    1. This one is written to ease the transition of applications using this SDK.

And for the composable libraries, see:

  1. Jobs Library which also contains additional support for AWS IoT OTA jobs
  2. MQTT Streaming Library for file block downloads over MQTT
  3. coreHTTP for file block downloads over HTTP

Description

API Documentation Pages for current and previous releases of this library can be found here

The OTA library enables you to manage the notification of a newly available update, download the update, and perform cryptographic verification of the firmware update. Using the library, you can logically separate firmware updates from the application running on your devices. The OTA library can share a network connection with the application, saving memory in resource-constrained devices. In addition, the OTA library lets you define application-specific logic for testing, committing, or rolling back a firmware update. The library supports different application protocols like Message Queuing Telemetry Transport (MQTT) and Hypertext Transfer Protocol (HTTP), and provides various configuration options you can fine tune depending on network type and conditions. This library is distributed under the MIT Open Source License.

This library has gone through code quality checks including verification that no function has a GNU Complexity score over 10. This library has also undergone static code analysis from Coverity static analysis.

See memory requirements for this library here.

AWS IoT Over-the-air Update Library v3.4.0 source code is part of the FreeRTOS 202210.00 LTS release.

AWS IoT Over-the-air Update Library v3.3.0 source code is part of the FreeRTOS 202012.01 LTS release.

Upcoming Changes

This library will be deprecated in 2024. Please see the Getting Started With OTA section.

AWS IoT Over-the-air Updates Config File

The AWS IoT Over-the-air Updates library exposes configuration macros that are required for building the library. A list of all the configurations and their default values is defined in ota_config_defaults.h. To provide custom values for the configuration macros, the user application can supply a custom config file named ota_config.h to the library.

By default, an ota_config.h custom config file is required to build the library. To disable this requirement and build the library with default configuration values, provide OTA_DO_NOT_USE_CUSTOM_CONFIG as a compile-time preprocessor macro.

Building the Library

The otaFilePaths.cmake file contains the information of all source files and the header include paths required to build the AWS IoT Over-the-air Updates library.

As mentioned in the previous section, either a custom config file (i.e. ota_config.h) OR the OTA_DO_NOT_USE_CUSTOM_CONFIG macro needs to be provided to build the AWS IoT Over-the-air Updates library.

For a CMake example of building the AWS IoT Over-the-air Updates library with the otaFilePaths.cmake file, refer to the coverity_analysis library target in the test/CMakeLists.txt file.

Building Unit Tests

Checkout CMock Submodule

By default, the submodules in this repository are configured with update=none in .gitmodules to avoid increasing clone time and disk space usage of other repositories (like AWS IoT Device SDK for Embedded C that submodules this repository).

To build unit tests, the submodule dependency of CMock is required. Use the following command to clone the submodule:

git submodule update --checkout --init --recursive test/unit-test/CMock source/dependency/coreJSON source/dependency/3rdparty/tinycbor

Platform Prerequisites

  • Linux
  • For building the library, CMake 3.13.0 or later and a C90 compiler.
  • For running unit tests, Ruby 2.0.0 or later is additionally required for the CMock test framework (that we use).
  • For running the coverage target, gcov and lcov are additionally required.

Steps to build unit tests

  1. Go to the root directory of this repository. (Make sure that the CMock submodule is cloned as described above.)

  2. Run the cmake command: cmake -S test -B build

  3. Run this command to build the library and unit tests: make -C build all

  4. The generated test executables will be present in build/bin/tests folder.

  5. Run cd build && ctest to execute all tests and view the test run summary.

Migration Guide

How to migrate from v2.0.0 (Release Candidate) to v3.4.0

The following table lists equivalent API function signatures in v2.0.0 (Release Candidate) and v3.4.0 declared in ota.h

v2.0.0 (Release Candidate) | v3.4.0 | Notes
OtaState_t OTA_Shutdown( uint32_t ticksToWait ); | OtaState_t OTA_Shutdown( uint32_t ticksToWait, uint8_t unsubscribeFlag ); | unsubscribeFlag indicates whether unsubscribe operations should be performed on the job topics when shutdown is called. Set it to 1 to unsubscribe, 0 otherwise.

How to migrate from version 1.0.0 to version 3.4.0 for OTA applications

Refer to OTA Migration document for the summary of updates to the API. Migration document for OTA PAL also provides a summary of updates required for upgrading the OTA-PAL to work with v3.4.0 of the library.

Porting

In order to support AWS IoT Over-the-air Updates on your device, it is necessary to provide the following components:

  1. Port for the OTA Portable Abstraction Layer (PAL).

  2. OS Interface

  3. MQTT Interface

For enabling data transfer over HTTP dataplane the following component should also be provided:

  1. HTTP Interface

NOTE When using OTA over HTTP dataplane, MQTT is required for control plane operations and should also be provided.

CBMC

To learn more about CBMC and proofs specifically, review the training material here.

The test/cbmc/proofs directory contains CBMC proofs.

In order to run these proofs you will need to install CBMC and other tools by following the instructions here.

CBMC Locally

To run a single CBMC proof locally, you can run make in the directory of any of the CBMC proofs. The Makefile is located in the test/cbmc/proofs/<the proof you want>/ directory.

Running make will produce an HTML-based report nearly identical to the one produced by the CI step.

A couple notes about CBMC Proofs

  • macOS doesn't implement POSIX message queues (mqueue.h).
  • macOS may fail to recognize your loop unwinding identifiers for functions from the C standard library. In that case, use the __builtin___<function>_chk identifier, e.g., instead of memcpy use __builtin___memcpy_chk.
    • For example, the requestJob_Mqtt proof fails on macOS with the following error:
Loop unwinding failures
[trace] __builtin___strncpy_chk.unwind.0 in line 36 in file <builtin-library-__builtin___strncpy_chk>

To solve this issue, replace strncpy with __builtin___strncpy_chk on this line.

Reference examples

Please refer to the demos of the AWS IoT Over-the-air Updates library in the following location for reference examples on POSIX and FreeRTOS:

Platform | Location
POSIX | AWS IoT Device SDK for Embedded C
FreeRTOS | FreeRTOS/FreeRTOS
FreeRTOS | FreeRTOS AWS Reference Integrations

Documentation

Existing Documentation

For pre-generated documentation, please see the documentation linked in the locations below:

Location
AWS IoT Device SDK for Embedded C
FreeRTOS.org

Note that the latest included version of coreMQTT may differ across repositories.

Generating documentation

The Doxygen references were created using Doxygen version 1.9.2. To generate the Doxygen pages, please run the following command from the root of this repository:

doxygen docs/doxygen/config.doxyfile

Contributing

See CONTRIBUTING.md for information on contributing.

ota-for-aws-iot-embedded-sdk's Issues

[DOC] What is the current state of this library?

Following the link from the aws-iot-device-sdk-embedded-C repository, I found the notice here that this library will be deprecated in 2024, which signals to me that basing a new project on it is probably not a good idea.

  • Why is there still a link to this library from the aws-iot-device-sdk-embedded-C repository if this library will be deprecated soon?
  • It looks like there is active development on this library, yet no new version has been released in the last 2 years?

Assuming I want to use the "AWS IoT Over-the-air Update" feature, how should I start a new development?

Read outside of Json buffer

The printf format to LogDebug expects pJson to be null-terminated, but its length is given by messageLength. Thus it will read more than expected.

LogDebug( ( "JSON received: %s", pJson ) );

Something like this should fix the issue.
LogDebug( ( "JSON received: %.*s", ( int ) messageLength, pJson ) );
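
The suggested fix relies on the precision field of %s: with ".*", printf-style functions read at most that many bytes and do not need a terminating null. The precision argument must be an int, so a length of another type needs a cast. A minimal sketch of the idea, assuming the length fits in an int; format_json_for_log is a hypothetical helper and not part of the library:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not library code): formats a length-delimited buffer
 * that is NOT null-terminated into `out`. */
static int format_json_for_log( char * out,
                                size_t outSize,
                                const char * pJson,
                                size_t messageLength )
{
    assert( ( out != NULL ) && ( pJson != NULL ) );

    /* ".*" consumes an int argument as the maximum number of bytes to print,
     * hence the cast. Assumes messageLength <= INT_MAX. */
    return snprintf( out, outSize, "JSON received: %.*s",
                     ( int ) messageLength, pJson );
}
```

Called with a 10-byte buffer and a messageLength of 7, only the first 7 bytes are printed, no matter what follows them in memory.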

[BUG] OTA API and OTA_EventProcessingTask is not task/thread safe when it comes to accessing common state.

Describe the bug
The OTA API and the task that is expected to be used use common data values without synchronization between tasks/threads. The OTA implementation is NOT Thread/Task safe.

There is a gross error in the way portions of the otaAgent internal state are being read/modified/written. They are assumed to be atomic across all tasks/threads, but there are no guarantees that this is the case.

There are 4 potential tasks/threads where actions can be performed; the first three are currently in contention:

  • Application task - executing the OTA_* api - eg: OTA_Shutdown(), OTA_GetState(), OTA_SignalEvent(), OTA_ActivateNewImage(), etc.
  • OTA_EventProcessingTask()
  • Network task (mqtt or http) executing the callbacks
  • Timer task - for timers - This one is okay because all of the timers used are sending Events via a queue to the EventProcessingTask.

For the state and or callbacks there is no synchronization barrier (eg a semaphore or mutex) of the otaAgent information when any of these three tasks are accessing the otaAgent common control block.

These values MUST be either specified as atomic OR consumed within a semaphore/mutex lock so that actions performed upon them by either a task calling the OTA_*() API functions or the task running OTA_EventProcessingTask() will not inadvertently overwrite the values - especially within code portions that have - read - decision - write

I'm only providing the examples pertaining to the API (App -> OTA_EventProcessingTask()) but there are most likely others between the Network registered callbacks and the OTA_EventProcessingTask() as well.

Eg: the OTA_Init()

This should have something along the lines of:

    if (otaAgent.lock == NULL)
    {
        otaAgent.lock = xSemaphoreCreateMutex();
        assert(otaAgent.lock != NULL);
    }
    BaseType_t semRet = xSemaphoreTake(otaAgent.lock, portMAX_DELAY);
    assert(pdTRUE == semRet);
    (void)semRet;
    // All reads and/or modifications of otaAgent and its associated values.
    //  Lines - https://github.com/aws/ota-for-aws-iot-embedded-sdk/blob/c3bd5840979cadfe1f9505e13e49cccb87333650/source/ota.c#L3264-L3347

    semRet = xSemaphoreGive(otaAgent.lock);
    assert(pdTRUE == semRet);

Other API's that require this type of change are:

  • OTA_Shutdown - requires local copy of state and then return outside of semaphore/mutex lock.
  • OTA_GetState - requires local copy of state and then return outside of semaphore/mutex lock.
  • OTA_GetStatistics - otherwise portions of the stats may not be correct relative to each other. - might suggest a separate lock for this.
  • OTA_ActivateNewImage - requires creating a local copy of the ?? activateFn = otaAgent.pOtaInterface->pal.activate and then using that if not null.
  • OTA_SetImageState - required when setImageStateWithReason() is used.
  • OTA_GetImageState - requires creating a local copy of the imageState within a lock.
  • OTA_Suspend - should move that code into the action performed by the OtaAgentEventSuspend message being received by the OTA_EventProcessingTask
  • OTA_Resume - stopped here - you get the idea...
  • OTA_SignalEvent - for the statistics and read of state - the stats should probably have their own lock
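
The "local copy of state, then return outside of the lock" pattern named in this list can be sketched as follows. This is an illustration only: it uses a POSIX mutex so that it runs on a host, and Agent_t, AgentState_t and both functions are simplified stand-ins, not the library's definitions.

```c
#include <assert.h>
#include <pthread.h>

/* Simplified stand-ins for the agent control block; not the real types. */
typedef enum { StateInit, StateReady, StateStopped } AgentState_t;

typedef struct
{
    pthread_mutex_t lock;
    AgentState_t state;
} Agent_t;

static Agent_t agent = { PTHREAD_MUTEX_INITIALIZER, StateInit };

/* Reader: copy the shared value while holding the lock and return the copy
 * only after the lock has been released. */
static AgentState_t agent_get_state( void )
{
    AgentState_t copy;
    int rc = pthread_mutex_lock( &agent.lock );

    assert( rc == 0 );
    copy = agent.state;
    rc = pthread_mutex_unlock( &agent.lock );
    assert( rc == 0 );
    ( void ) rc;

    return copy;
}

/* Writer: the whole read - decision - write sequence runs inside the lock.
 * Returns the state that is current when the lock is released. */
static AgentState_t agent_set_state_if( AgentState_t expected,
                                        AgentState_t next )
{
    AgentState_t result;
    int rc = pthread_mutex_lock( &agent.lock );

    assert( rc == 0 );

    if( agent.state == expected )
    {
        agent.state = next;
    }

    result = agent.state;
    rc = pthread_mutex_unlock( &agent.lock );
    assert( rc == 0 );
    ( void ) rc;

    return result;
}
```

On FreeRTOS the same shape would use xSemaphoreTake and xSemaphoreGive, as in the OTA_Init suggestion above.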

API that looks to be okay:

  • OTA_CheckForUpdate()
  • OTA_Err_strerror()
  • OTA_JobParse_strerror
  • OTA_PalStatus_strerror
  • OTA_OsStatus_strerror

As mentioned, I did not check any of the handlers that are registered with the network, but I assume they most likely have the same kind of issue.

Host

  • Host OS: Linux - but this is ANY OS including FreeRTOS
  • Version: Ubuntu 18.04

To Reproduce

  • N/A - done by inspection, but it could be reproduced by running the code through Thread Sanitizer (clang) and examining the reported errors.

Expected behavior

See above - for all API calls that use or modify the otaAgent.* internal structure, which is shared with other tasks, access to those fields should be protected by a semaphore and/or mutex.

Screenshots

N/A

Wireshark logs

N/A

Additional context

N/A

OTA agent should not update non-OTA jobs as failed

The OTA agent currently updates the non-OTA jobs as failed:

/* If job parsing failed AND there's a job ID, update the job state to FAILED with
 * a reason code. Without a job ID, we can't update the status in the job service. */
LogError( ( "Failed to parse the job document after parsing the job name: "
            "OtaJobParseErr_t=%s, Job name=%s",
            OTA_JobParse_strerror( err ), ( const char * ) pFileContext->pJobName ) );
if( strlen( ( const char * ) otaAgent.pActiveJobName ) > 0u )
{
    /* Assume control of the job name from the context. */
    ( void ) memcpy( otaAgent.pActiveJobName, pFileContext->pJobName, OTA_JOB_ID_MAX_SIZE );
    otaErr = otaControlInterface.updateJobStatus( &otaAgent,
                                                  JobStatusFailedWithVal,
                                                  ( int32_t ) OtaErrJobParserError,
                                                  ( int32_t ) err );

This interferes with the usage of non-OTA jobs and is IMO not desired. Is there a reason for this logic?

[BUG] Memory corruption if OTA_MAX_BLOCK_BITMAP_SIZE is too small for the file

Describe the bug
As per issue on re:Post: https://repost.aws/questions/QU4tQPeyESRUqKmMXC__TcYw/large-mqtt-ota-file-transfer-fails-at-block-1024

The customer appeared to suffer a memory corruption when OTA_MAX_BLOCK_BITMAP_SIZE was too small for their file size. Problem was resolved when OTA_MAX_BLOCK_BITMAP_SIZE was set to sufficient size.

Host
ESP32 with FreeRTOS.

To Reproduce
Do an OTA with OTA_MAX_BLOCK_BITMAP_SIZE set too small for your file size.

Expected behavior
OTA library should check the file size before beginning the file transfer, and fail gracefully if the file is too big. No memory corruption should occur.

Additional context
I haven't personally taken the time to reproduce the problem. From perusing the code, I couldn't see the library protecting itself in this situation. Discussed the problem with Soren before raising this issue.
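
A sketch of the kind of pre-transfer check described under "Expected behavior". It assumes one bitmap bit per file block and uses the values that appear elsewhere on this page (4096-byte blocks, a 128-byte bitmap) as example constants. All names are hypothetical and this is not the library's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BITS_PER_BYTE                    8U
#define EXAMPLE_LOG2_FILE_BLOCK_SIZE     12U   /* 4096-byte blocks (example) */
#define EXAMPLE_MAX_BLOCK_BITMAP_SIZE    128U  /* bitmap size in bytes (example) */

/* Number of blocks needed for a file, rounding up without overflowing. */
static uint32_t blocks_needed( uint32_t fileSize )
{
    uint32_t blockSize;

    assert( EXAMPLE_LOG2_FILE_BLOCK_SIZE < 32U );
    blockSize = ( uint32_t ) 1U << EXAMPLE_LOG2_FILE_BLOCK_SIZE;

    return ( fileSize / blockSize ) + ( ( ( fileSize % blockSize ) != 0U ) ? 1U : 0U );
}

/* Bitmap bytes needed to track every block with one bit each. */
static uint32_t bitmap_bytes_needed( uint32_t fileSize )
{
    return ( blocks_needed( fileSize ) + BITS_PER_BYTE - 1U ) / BITS_PER_BYTE;
}

/* True if the configured bitmap can track the whole file; a caller would
 * reject the job before the transfer starts when this returns false. */
static bool file_fits_bitmap( uint32_t fileSize )
{
    return bitmap_bytes_needed( fileSize ) <= EXAMPLE_MAX_BLOCK_BITMAP_SIZE;
}
```

With these example values the bitmap covers 128 * 8 = 1024 blocks, i.e. files up to 4 MiB, which would be consistent with the transfer in the linked question failing at block 1024.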

Is it possible to use OTA library to download file via custom job?

It's understood that the library is built for firmware update, nevertheless, is it possible to use it to perform file download via coreHTTP? As otherwise we would have to implement the HTTP download mechanism again just for that, though most of the coreHTTP parts are already implemented in the OTA library... so we are hoping to make use of it if possible.

Protocol implementation details are insufficiently abstracted over

The library expects that users provide some generic MQTT subscribe and unsubscribe functions, but doesn't really tell what to do with packets received in response to these subscriptions, which is a bit unfortunate as the application has to pass the data received using different events (OtaAgentEventReceivedJobDocument or OtaAgentEventReceivedFileBlock). So one has to rifle through the source of the library to find out which topics it may subscribe to and then dispatch in the application code. And this is quite error prone as you have to be very specific about which data you pass to the library; if you for example pass everything that matches "$aws/things/+/jobs/#" as an OtaAgentEventReceivedJobDocument event, you invariably run into

1454 39489 [Cloud] [DEBUG][Cloud][39489] Packet on $aws/things/MwAcABNQUk5VODkg/jobs/AFR_OTA-fw-dev-20210526-124903/update/accepted{"timestamp":1622034320}
...
1474 39658 [OTA] [DEBUG][OTA][39658] Found valid event handler for state transition: State=[WaitingForFileBlock], Event=[ReceivedJobDocument]

which then aborts the current transfer, re-requests the next job and starts the transfer again, which then gets about as far as the attempt before. Also, the AppCallback is never called with OtaJobEventProcessed for these messages, so the buffers used for these messages are never freed (which can be seen as a saving grace in this case, as it prevents the code from downloading the first few blocks of the update in an endless loop until the included data volume on the SIM card is exhausted).
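
A narrower dispatch in application code might look like the sketch below. The topic suffixes for "$next/get/accepted" and "data/cbor" are taken from logs quoted on this page; the "notify-next" suffix is an assumption based on the AWS IoT Jobs topic layout and should be checked against the library source. TopicKind_t and classify_topic are hypothetical names, and the topic is assumed to be a null-terminated string, which an MQTT callback does not necessarily provide.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical classification used by the application to pick an OTA event. */
typedef enum { TopicIgnored, TopicJobDocument, TopicFileBlock } TopicKind_t;

static bool ends_with( const char * s, const char * suffix )
{
    const size_t n = strlen( s );
    const size_t m = strlen( suffix );

    return ( n >= m ) && ( strcmp( s + ( n - m ), suffix ) == 0 );
}

static TopicKind_t classify_topic( const char * topic )
{
    TopicKind_t kind = TopicIgnored;

    assert( topic != NULL );

    if( ends_with( topic, "/jobs/$next/get/accepted" ) ||
        ends_with( topic, "/jobs/notify-next" ) ) /* assumed suffix */
    {
        /* Would be forwarded as OtaAgentEventReceivedJobDocument. */
        kind = TopicJobDocument;
    }
    else if( ( strstr( topic, "/streams/" ) != NULL ) &&
             ends_with( topic, "/data/cbor" ) )
    {
        /* Would be forwarded as OtaAgentEventReceivedFileBlock. */
        kind = TopicFileBlock;
    }

    /* Everything else, e.g. ".../update/accepted", is not handed to the agent. */
    return kind;
}
```

With this filter, the ".../update/accepted" packet from the log above is classified as TopicIgnored and never reaches the agent as a job document.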

Race condition and undefined behaviour in `OTA_Shutdown`

OTA_Shutdown contains this check at the top of the function:

    if( otaAgent.state == OtaAgentStateInit )
    {
        /* When in init state, the OTA state machine is not running yet. So directly set state to
         * stopped. */
        otaAgent.state = OtaAgentStateStopped;
    }

Since OTA_Shutdown is part of the public API, it may be called from any task, so there is no guarantee at all that the OTA_EventProcessingTask doesn't start executing somewhere else right after this check, setting otaAgent.state to OtaAgentStateReady.

Additionally, since otaAgent.state is defined as a plain, non-atomic enum value, there is no guarantee at all that any changes to otaAgent.state made by the task executing OTA_EventProcessingTask are ever seen by code calling OTA_Shutdown from another task. A sufficiently smart compiler that understood FreeRTOS tasks and saw e.g. a device state management task that only ever called OTA_Init, OTA_Suspend, OTA_Resume and OTA_Shutdown would be perfectly free to conclude that the only values otaAgent.state could ever have are OtaAgentStateStopped and OtaAgentStateInit and that the logic to actually send the shutdown event is dead code and inline an optimized version (which in that scenario without log messages would just be otaAgent.state = OtaAgentStateStopped; return otaAgent.state;) for that task.
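
One way to close the check-then-set window is an atomic compare-and-exchange, sketched here with C11 atomics. This is an illustration under assumptions: the library targets C90, so it could not use <stdatomic.h> as-is, and the enum and function names below are simplified stand-ins rather than the library's definitions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-ins for OtaAgentStateInit / Ready / Stopped. */
enum { AgentInit, AgentReady, AgentStopped };

static _Atomic int agentState = AgentInit;

/* What the event processing task would do when it starts running. */
static bool start_if_init( void )
{
    int expected = AgentInit;

    return atomic_compare_exchange_strong( &agentState, &expected, AgentReady );
}

/* What the shutdown path would do: move Init -> Stopped in one indivisible
 * step. Returns false if another task changed the state first, in which case
 * the caller has to send the shutdown event instead. */
static bool stop_if_not_started( void )
{
    int expected = AgentInit;

    return atomic_compare_exchange_strong( &agentState, &expected, AgentStopped );
}

/* Atomic read, visible across tasks. */
static int current_state( void )
{
    const int s = atomic_load( &agentState );

    assert( ( s >= AgentInit ) && ( s <= AgentStopped ) );
    return s;
}
```

Only one of start_if_init and stop_if_not_started can succeed for a given Init state, so the shutdown path can no longer overwrite a state that the processing task has just set.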

What's the maximum OTA file block size?

We are trying to increase the OTA file block size to 64KB (otaconfigLOG2_FILE_BLOCK_SIZE = 16U) but found that it overflows the size of int16_t when returning the size in function otaPal_WriteBlock

Function Prototype

int16_t otaPal_WriteBlock( OtaFileContext_t * const C,
                           uint32_t ulOffset,
                           uint8_t * const pcData,
                           uint32_t ulBlockSize )

Log captured: (added additional debug info)
Scenario: received 48273 bytes, but the value became -17263 when it was cast to ( int16_t )
Running in ubuntu, using ota_pal_posix.c

4183945521:[t:134563][DEBUG] [vc] [ota_os_posix.c:173] OTA Event received.
4183955481:[t:134563][DEBUG] [vc] [ota.c:2898] Found valid event handler for state transition: State=[WaitingForFileBlock], Event=[ReceivedFileBlock]
4183975771:[t:134563][DEBUG] [vc] [ota_os_posix.c:312] OTA Timer started.
4184170219:[t:134563][INFO] [vc] [ota.c:2477] Received valid file block: Block index=0, Size=48273
4184253146:[t:134563][DEBUG] [vc] [ota_pal_posix.c:547] otaPal_WriteBlock: entering...C=0x5575389334a8
4184437339:[t:134563][DEBUG] [vc] [ota_pal_posix.c:573] writeSize=48273
4184460287:[t:134563][INFO] [vc] [ota_pal_posix.c:587] : otaPal_WriteBlock: leaving...[filerc=48273], ( int16_t ) filerc=-17263
4184471476:[t:134563][ERROR] [vc] [ota.c:2537] Failed to ingest received block: IngestResult_t=-9, iBytesWritten=-17263

Do you have any suggestion about this? Thanks.
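
The numbers in the log follow from the range of int16_t, whose maximum is 32767. The sketch below works out the largest power-of-two block size that still fits in that return type and reproduces the wrapped value with explicit arithmetic. Both helper names are hypothetical, and the sketch only concerns the int16_t return value; any other limits on the block size are outside its scope.

```c
#include <assert.h>
#include <stdint.h>

/* Largest log2 block size whose byte count still fits in an int16_t. */
static uint32_t max_log2_block_size_for_int16( void )
{
    uint32_t log2Size = 0U;

    while( ( ( uint32_t ) 1U << ( log2Size + 1U ) ) <= ( uint32_t ) INT16_MAX )
    {
        log2Size++;
    }

    assert( ( ( uint32_t ) 1U << log2Size ) <= ( uint32_t ) INT16_MAX );
    return log2Size;
}

/* The value a two's-complement int16_t ends up holding when a byte count is
 * narrowed into it, computed explicitly instead of through a cast. */
static int32_t narrowed_to_int16( uint32_t bytesWritten )
{
    const uint32_t low16 = bytesWritten & 0xFFFFU;

    return ( low16 > ( uint32_t ) INT16_MAX ) ? ( ( int32_t ) low16 - 65536 )
                                              : ( int32_t ) low16;
}
```

max_log2_block_size_for_int16() evaluates to 14 (16384 bytes), and narrowed_to_int16( 48273 ) gives -17263, the value seen in the log.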

[BUG] Submodule tinycbor not synced with v0.5.4 tag.

Describe the bug
tinycbor commit:
https://github.com/intel/tinycbor/tree/9924cfed3b95ad6de299ae675064430fdb886216

referenced in
https://github.com/aws/ota-for-aws-iot-embedded-sdk/tree/v3.4.0/source/dependency/3rdparty

is not pointing to tinycbor v0.5.4 tag as described in:

https://github.com/aws/ota-for-aws-iot-embedded-sdk/blob/v3.4.0/sbom.spdx
PackageName: tinycbor
SPDXID: SPDXRef-Package-tinycbor
PackageVersion: v0.5.4
PackageDownloadLocation: https://github.com/intel/tinycbor.git
PackageLicenseConcluded: MIT
FilesAnalyzed: True
PackageVerificationCode: da39a3ee5e6b4b0d3255bfef95601890afd80709
PackageCopyrightText: NOASSERTION
PackageSummary: NOASSERTION
PackageDescription: NOASSERTION

v0.5.4 tag is
intel/tinycbor@11590e4

Expected behavior
Submodule synced with official tinycbor tag (v0.5.4).

event.deinit is never called on shutdown

The event.deinit callback is not called on shutdown, and event.init will be called again if you call OTA_Init after the agent has shut down.

The two examples in os/portable happen to use methods (xQueueCreateStatic and mq_open) where this does not create issues, but in general it may lead to resource leaks, for example if someone happens to compile FreeRTOS without
#define configSUPPORT_STATIC_ALLOCATION 1
and then replaces that call with xQueueCreate.

[DOC] Clarify use of handleCustomJob Function

Describe the issue
I am attempting to get an older ESP-IDF V4.3. application up to date. I have pulled the demos. The basic pub/sub examples work nicely. But I am struggling with the OTA.

The V4.3. code used custom jobs in AWS, where we pulled the ota_url and fed it to the http ota updater. We did not use code signing.

I am currently struggling to get the OTA to work here. The example provided is OTA over MQTT. I like this concept as it offers significant savings in terms of memory, so I would like to use it. I've already set up the IAM role and IoT policy for the certificate so the Thing is ready for MQTT jobs and streams.

Would it be possible to provide some guidance on the flow, with examples, as to how we can get this to work with an unsigned binary? The AWS Console insists on code signing (as far as I can tell), so it's likely we'll need to use the SDK (or the CLI as a quick place to test / prove).

Appreciate that this could go into the FreeRTOS forum, but I think it would also be a great addition to the documentation on this repo.

Remove dynamic memory allocation from ota lib

This is a low-priority feature enhancement request.

It would be very nice in environments where dynamic memory allocation is forbidden (coming from #412 (comment)).

There do not seem to be many places that would need to change: one is in ota_os_freertos.c for the timer creation, and then there are only three os.mem.malloc calls in ota.c that also appear to have only limited use and look like they could be made static as well without wasting too much RAM. This could also be just a transparent define/compile option for the ota_os_freertos port.

Error printing size_t value in log function.

I am building a library for ESP32S2.
I get the following errors during build:
ota.c:1615:25: error: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'size_t' {aka 'unsigned int'} [-Werror=format=]
As I understood, size_t for ESP32S2 is unsigned int size. Whereas the log function uses unsigned long int.

A good solution would be to use casting for log functions, as is done in the coreMQTT library.
https://github.com/FreeRTOS/coreMQTT/blob/main/source/core_mqtt.c

An example of a log function from coreMQTT:

LogDebug( ( "BytesSent=%ld, BytesRemaining=%lu",
            ( long int ) bytesSent,
            ( unsigned long ) bytesRemaining ) );
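
The same cast can be tried on a host with a small stand-alone function. This is a sketch: format_block_size is a hypothetical helper, and the format string is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: formats a size_t value portably. */
static int format_block_size( char * out,
                              size_t outSize,
                              size_t blockSize )
{
    assert( out != NULL );

    /* size_t may be unsigned int or unsigned long depending on the target,
     * so the argument is cast to the exact type that "%lu" names. "%zu" is
     * the C99 alternative, but C90 toolchains and some embedded printf
     * implementations do not support it. */
    return snprintf( out, outSize, "Block size=%lu",
                     ( unsigned long ) blockSize );
}
```

Because the argument type now matches the conversion specifier on every target, the -Werror=format= diagnostic goes away regardless of how size_t is defined.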

[BUG] OTA_SetImageState() fails but still activates the image

Describe the bug
In OTA_SetImageState(), the image is activated first and the job status is updated afterwards. If the job update fails, the image is still active: the device will reboot with the new image, but the job will be reported as FAILED.

To Reproduce

  • Add delay in self test
  • Disconnect from internet whilst in the self test

Expected behavior
The device reverts to the previous version or retries to update the job state.

jobDoc.pJobId is memset to zero before dispatching to OtaAppCallback

Could the job ID be memset after calling OtaAppCallback instead? As it stands, the user application is unable to retrieve the job ID.

static OtaErr_t processDataHandler( const OtaEventData_t * pEventData )
{
    // ...

    /* Set the job id and length from OTA context. */
    jobDoc.pJobId = otaAgent.pActiveJobName;
    jobDoc.jobIdLength = strlen( ( const char * ) otaAgent.pActiveJobName ) + 1U;
    jobDoc.fileTypeId = otaAgent.fileContext.fileType;

    if( result == IngestResultFileComplete )
    {
        /* Check if this is firmware update. */
        if( otaAgent.fileContext.fileType == configOTA_FIRMWARE_UPDATE_FILE_TYPE_ID )
        {
            jobDoc.status = JobStatusInProgress;
            jobDoc.reason = JobReasonSigCheckPassed;

            otaJobEvent = OtaJobEventActivate;
        }
        else
        {
            jobDoc.status = JobStatusSucceeded;
            jobDoc.reason = JobReasonAccepted;
            jobDoc.subReason = ( int32_t ) otaAgent.fileContext.fileType;

            otaJobEvent = OtaJobEventUpdateComplete;
        }

        /* File receive is complete and authenticated. Update the job status. */
        err = otaControlInterface.updateJobStatus( &otaAgent, jobDoc.status, jobDoc.reason, jobDoc.subReason );

        // otaAgent.pActiveJobName set to 0 before dispatch to OtaAppCallback
        dataHandlerCleanup();

        if( otaAgent.statistics.otaPacketsProcessed < UINT32_MAX )
        {
            /* Last file block processed, increment the statistics. */
            otaAgent.statistics.otaPacketsProcessed++;
        }

        /* Let main application know that update is complete */
        otaAgent.OtaAppCallback( otaJobEvent, &jobDoc );

[BUG] OTA errors on startup if there are no outstanding OTA jobs

Describe the bug
On startup, when the OTA agent receives an empty OTA job message indicating that there are no OTA jobs outstanding, a couple of error messages are logged, hinting at a bad OTA agent state.

E (7716) AWS_OTA: Failed to execute state transition handler: Handler returned error: OtaErr_t=OtaErrJobParserError
E (7746) AWS_OTA: Current State=[WaitingForJob], Event=[ReceivedJobDocument], New state=[CreatingFile]

Host
ESP32 with FreeRTOS/ESP-IDF

Additional context
In source/ota.c, the function handleCustomJob() correctly detects that the job document is empty, but the structure of the code makes it difficult to treat this as an expected and valid condition.

Build error in C ++ projects

I am building a library for ESP32S2. The main project for ESP32S2 is written in C++.
While compiling the library, I get the error:
ota_os_interface.h:272:22: error: expected unqualified-id before 'delete'
As I understand it, this is because delete is a keyword in C++ and therefore cannot be used as an identifier.

OtaDeleteTimer_t delete; /*!< @brief Delete timer. */

After renaming the member, the library built successfully and it causes no problems.

[BUG] An MQTT OTA request made by the library to AWS using CBOR encoding results in the Invalid response on the /rejected topic

Describe the bug
An MQTT OTA request made by the library to AWS using CBOR encoding results in the Invalid response on the /rejected topic

Host

  • Host OS: esp32
  • Version: latest library code as is on github

To Reproduce

  • Initiate OTA update
  • Run on esp32 and observe all CBOR requests failing

Expected behavior
I was expecting to start receiving OTA file chunks on MQTT topics.

Additional context
Using the latest tinycbor version 0.6.0 from intel.

Initial request is made successfully

00:00:00:05.252 T: prvMqttPublish()
{"clientToken":":840d8ee419b8"}
7b 22 63 6c 69 65 6e 74 54 6f 6b 65 6e 22 3a 22 3a 38 34 30 64 38 65 65 34 31 39 62 38 22 7d 

AWS IOT responds with the job document as expected and a few ota methods are invoked as expected:

00:00:00:05.345 V: subscribeCB: Data
	topic: $aws/things/840d8ee419b8/jobs/$next/get/accepted
	payload: {"clientToken":":840d8ee419b8","timestamp":1685747465,"execution":{"jobId":"AFR_OTA-OTA_TEST-230602-1828","status":"QUEUED","queuedAt":1685747355,"lastUpdatedAt":1685747355,"versionNumber":1,"executionNumber":1,"jobDocument":{"afr_ota":{"protocols":["MQTT"],"streamname":"AFR_OTA-93e1bb19-0dcd-4aeb-9946-5df0260dc063","files":[{"filepath":"/ota/file","filesize":127762,"fileid":0,"certfile":"/pem","fileType":0,"sig-sha1-rsa":"na"}]}}}}
00:00:00:05.397 V: subscribeCB: handling JOB_ACCEPTED_RESPONSE_TOPIC
00:00:00:05.407 T: otaPal_GetPlatformImageState()
00:00:00:05.409 T: otaPal_CreateFileForRx()
00:00:01:26.993 T: otaPal_GetPlatformImageState()
00:00:01:27.000 V: prvMqttSubscribe: subscribed to topic $aws/things/840d8ee419b8/streams/AFR_OTA-93e1bb19-0dcd-4aeb-9946-5df0260dc063/data/cbor

then a request for the first chunk is published:

00:00:01:27.005 T: prvMqttPublish()
ยฆaccrdyaf al๏ฟฝ๏ฟฝ ao abDรฟรฟรฟรฟan๏ฟฝ
a6 61 63 63 72 64 79 61 66 00 61 6c 19 10 00 61 6f 00 61 62 44 ff ff ff ff 61 6e 01 
00:00:01:27.028 V: publishMessage: topic   : $aws/things/840d8ee419b8/streams/AFR_OTA-93e1bb19-0dcd-4aeb-9946-5df0260dc063/get/cbor

this message decoded is:
{"c": "rdy", "f": 0, "l": 4096, "o": 0, "b": h'FFFFFFFF', "n": 1}
according to this: https://cbor.me/

A6             # map(6)
   61          # text(1)
      63       # "c"
   63          # text(3)
      726479   # "rdy"
   61          # text(1)
      66       # "f"
   00          # unsigned(0)
   61          # text(1)
      6C       # "l"
   19 1000     # unsigned(4096)
   61          # text(1)
      6F       # "o"
   00          # unsigned(0)
   61          # text(1)
      62       # "b"
   44          # bytes(4)
      FFFFFFFF # "\xFF\xFF\xFF\xFF"
   61          # text(1)
      6E       # "n"
   01          # unsigned(1)
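
As a cross-check that these bytes are well-formed definite-length CBOR, the sketch below rebuilds the request with a hand-written encoder. It is not tinycbor and not the library's encoder; the function and parameter names are hypothetical, and the parameter names reflect my reading of the keys (f: file id, l: block size, o: offset, b: bitmap, n: number of blocks).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Writes a CBOR head (3-bit major type plus argument); returns bytes written. */
static size_t put_head( uint8_t * p, uint8_t major, uint32_t value )
{
    const uint8_t mt = ( uint8_t ) ( major << 5 );
    size_t n;

    if( value < 24U )
    {
        p[ 0 ] = ( uint8_t ) ( mt | value );
        n = 1U;
    }
    else if( value <= 0xFFU )
    {
        p[ 0 ] = ( uint8_t ) ( mt | 24U );
        p[ 1 ] = ( uint8_t ) value;
        n = 2U;
    }
    else if( value <= 0xFFFFU )
    {
        p[ 0 ] = ( uint8_t ) ( mt | 25U );
        p[ 1 ] = ( uint8_t ) ( value >> 8 );
        p[ 2 ] = ( uint8_t ) ( value & 0xFFU );
        n = 3U;
    }
    else
    {
        p[ 0 ] = ( uint8_t ) ( mt | 26U );
        p[ 1 ] = ( uint8_t ) ( value >> 24 );
        p[ 2 ] = ( uint8_t ) ( ( value >> 16 ) & 0xFFU );
        p[ 3 ] = ( uint8_t ) ( ( value >> 8 ) & 0xFFU );
        p[ 4 ] = ( uint8_t ) ( value & 0xFFU );
        n = 5U;
    }

    return n;
}

/* Major type 3: text string. */
static size_t put_text( uint8_t * p, const char * s )
{
    const size_t len = strlen( s );
    const size_t n = put_head( p, 3U, ( uint32_t ) len );

    memcpy( p + n, s, len );
    return n + len;
}

/* Major type 2: byte string. */
static size_t put_bytes( uint8_t * p, const uint8_t * data, size_t len )
{
    const size_t n = put_head( p, 2U, ( uint32_t ) len );

    memcpy( p + n, data, len );
    return n + len;
}

/* Encodes {"c":"rdy","f":..,"l":..,"o":..,"b":h'..',"n":..}; returns length. */
static size_t encode_block_request( uint8_t * out, size_t outSize,
                                    uint32_t fileId, uint32_t blockSize,
                                    uint32_t offset,
                                    const uint8_t * bitmap, size_t bitmapLen,
                                    uint32_t numBlocks )
{
    size_t n = 0U;

    assert( ( out != NULL ) && ( bitmap != NULL ) );
    assert( outSize >= ( 64U + bitmapLen ) ); /* generous bound for this layout */
    ( void ) outSize;

    n += put_head( out + n, 5U, 6U ); /* map(6) */
    n += put_text( out + n, "c" );
    n += put_text( out + n, "rdy" );
    n += put_text( out + n, "f" );
    n += put_head( out + n, 0U, fileId );
    n += put_text( out + n, "l" );
    n += put_head( out + n, 0U, blockSize );
    n += put_text( out + n, "o" );
    n += put_head( out + n, 0U, offset );
    n += put_text( out + n, "b" );
    n += put_bytes( out + n, bitmap, bitmapLen );
    n += put_text( out + n, "n" );
    n += put_head( out + n, 0U, numBlocks );

    return n;
}
```

Encoding file id 0, block size 4096, offset 0, a 4-byte bitmap of 0xFF and 1 block yields the same 28 bytes as the hex dump of the published request.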

the response is:

00:00:01:27.149 V: subscribeCB: Message arrived
00:00:01:27.150 V: subscribeCB: Data
	topic: $aws/things/840d8ee419b8/streams/AFR_OTA-93e1bb19-0dcd-4aeb-9946-5df0260dc063/rejected/cbor
	payload: ¿dcodekInvalidCboraokInvalidCborgmessagetInvalid CBOR messageamtInvalid CBOR messageÿ

or 

{"code": "InvalidCbor", "o": "InvalidCbor", "message": "Invalid CBOR message", "m": "Invalid CBOR message"}

BF                                      # map(*)
   64                                   # text(4)
      636F6465                          # "code"
   6B                                   # text(11)
      496E76616C696443626F72            # "InvalidCbor"
   61                                   # text(1)
      6F                                # "o"
   6B                                   # text(11)
      496E76616C696443626F72            # "InvalidCbor"
   67                                   # text(7)
      6D657373616765                    # "message"
   74                                   # text(20)
      496E76616C69642043424F52206D657373616765 # "Invalid CBOR message"
   61                                   # text(1)
      6D                                # "m"
   74                                   # text(20)
      496E76616C69642043424F52206D657373616765 # "Invalid CBOR message"
   FF                                   # primitive(*)

This is consistent with the failure I reported for the MQTT download agent library here:
aws-samples/aws-iot-mqtt-download-agent#3

Can someone please look into this? The CBOR message seems to produce a bitmap that the backend rejects!

[BUG] Invalid topic used to publish request for outstanding OTA jobs

Describe the bug
This looks like a regression in commit e43672a: the thing name is not added to the topic parts in the function requestJob_Mqtt() in source/ota_mqtt.c.

Host
ESP32 with FreeRTOS/ESP-IDF

To Reproduce
Initialize the OTA agent, causing it to request outstanding OTA jobs, which will fail with AWS dropping the connection.

[BUG] OTA Block size check wrong

If a network disconnect occurs and the OTA task is suspended and resumed, then when the job description comes back the library checks pFileContext->blocksRemaining against OTA_MAX_BLOCK_BITMAP_SIZE (128) and silently stops the OTA without updating the job. The same issue does not happen if the device reboots instead of resuming the OTA task, and it never manifests if the OTA job is not suspended in the middle of the job. So is the number of blocks only checked when a new job document arrives while one is already held?
The issue is twofold. First, the check is wrong: blocksRemaining counts blocks, one bit each, whereas OTA_MAX_BLOCK_BITMAP_SIZE appears to be a size in bytes. It should be (ota.c:2404):

    else if( pFileContext->blocksRemaining > (OTA_MAX_BLOCK_BITMAP_SIZE * BITS_PER_BYTE) )

Applying the change above fixes the issue. I can create a PR with the change if you can confirm that my understanding of the issue is correct.
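
To illustrate the unit mismatch, here is a minimal sketch. It assumes the bitmap size is expressed in bytes and that each block takes one bit, as the proposed fix implies. blocksFitInBitmap is a hypothetical helper, not library code, and the two macros are redefined locally with the values from this report.

```c
#include <stdbool.h>
#include <stdint.h>

#define BITS_PER_BYTE                8U
#define OTA_MAX_BLOCK_BITMAP_SIZE    128U /* bitmap size in bytes, value taken from the log */

/* Hypothetical helper: true when every remaining block can be tracked
 * with one bit in a bitmap of OTA_MAX_BLOCK_BITMAP_SIZE bytes. */
bool blocksFitInBitmap( uint32_t blocksRemaining )
{
    return blocksRemaining <= ( OTA_MAX_BLOCK_BITMAP_SIZE * BITS_PER_BYTE );
}
```

With these values the limit is 128 * 8 = 1024 blocks, so the 266-block image from the log would be accepted, while a comparison against 128 alone rejects it.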

Second, the OTA library fails silently and the job does not resume until the device reboots.

This is the log after a network reconnection:

I (105814) hota: Received job document: {...}
W (105074) awsota: Index: 3. OTA event id: 3
W (105080) awsota: OTA size (266 blocks) greater than can be tracked. Increase `OTA_MAX_BLOCK_BITMAP_SIZE`(128)
I (105090) awsota: Unable to initialize Job Parsing: OtaJobParseErr_t=OtaJobParseErrBadModelInitParams
I (105100) awsota: otaPal_GetPlatformImageState
E (105106) awsota: Failed to execute state transition handler: Handler returned error: OtaErr_t=OtaErrJobParserError
E (105117) awsota: Current State=[WaitingForJob], Event=[ReceivedJobDocument], New state=[CreatingFile]

This is all on the main branches of ota-for-aws-iot-embedded-sdk, coreMQTT, jobs etc.

Enhancement: the pOtaInterface pointer should be a pointer to const

OtaInterfaces_t * pOtaInterface; /*!< Collection of all interfaces used by the agent. */

OtaErr_t OTA_Init( OtaAppBuffer_t * pOtaBuffer,
                   OtaInterfaces_t * pOtaInterfaces,
                   const uint8_t * pThingName,
                   OtaAppCallback_t OtaAppCallback )

The OtaInterfaces_t * pOtaInterfaces pointer is used to supply the library with user callbacks implementing the platform abstractions. The contents of the OtaInterfaces_t structure are never modified within the library. It is natural in the user code to have a static constant instance of that structure and pass it to the library. However, the library requires a pointer to a non-constant instance. The application has to cast away const to pass a constant instance to the library.
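
A minimal sketch of the requested pattern follows. All names here (DemoInterfaces_t, demoInit, demoSend, demoSendImpl) are hypothetical stand-ins rather than the library's types; the point is only that an init function taking a pointer to const accepts a static const table without a cast.

```c
#include <stddef.h>

/* Hypothetical, cut-down stand-in for OtaInterfaces_t. */
typedef struct DemoInterfaces
{
    int ( * send )( const void * pMsg );
} DemoInterfaces_t;

/* The library side only ever reads the table, so it stores a pointer to const. */
static const DemoInterfaces_t * pStoredInterfaces = NULL;

/* Example platform callback supplied by the application. */
static int demoSendImpl( const void * pMsg )
{
    ( void ) pMsg;
    return 42;
}

/* The application can keep its callback table in read-only memory. */
static const DemoInterfaces_t demoInterfaces = { demoSendImpl };

/* Taking a pointer to const lets callers pass a const instance without a cast. */
int demoInit( const DemoInterfaces_t * pInterfaces )
{
    if( ( pInterfaces == NULL ) || ( pInterfaces->send == NULL ) )
    {
        return -1;
    }

    pStoredInterfaces = pInterfaces;
    return 0;
}

/* Dispatch through the stored table; fails if demoInit has not succeeded yet. */
int demoSend( const void * pMsg )
{
    return ( pStoredInterfaces != NULL ) ? pStoredInterfaces->send( pMsg ) : -1;
}
```

An application would call demoInit( &demoInterfaces ) once at startup; no const is cast away anywhere.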

[BUG] Issues when trying to mock these functions using googlemock.

Describe the bug
When trying to build mocks for the SDK's API using GoogleTest (i.e. integrating it into a C++ framework), there are several issues:

  1. Headers do not wrap their declarations in extern "C", so the names are mangled when they are included from C++. The usual guard is:
     #if defined(__cplusplus)
     extern "C" {
     #endif // defined(__cplusplus)
     ...
     #if defined(__cplusplus)
     }
     #endif // defined(__cplusplus)
  2. ota_os_interface.h uses delete as a member name. delete is a reserved keyword in C++; please rename it.

Host

  • Host OS: Linux
  • Version: Ubuntu 18.04

To Reproduce

  • Include any of the headers into a C++ file.
  • Try to compile the file. Watch the errors fly.

Expected behavior

  • ability to use the API in a C++ environment as well. This is primarily for test.

Screenshots

 included from ../esp_aws_iot-src/libraries/ota-for-aws-iot-embedded-sdk/ota-for-aws-iot-embedded-sdk/source/include/ota.h:38:
../esp_aws_iot-src/libraries/ota-for-aws-iot-embedded-sdk/ota-for-aws-iot-embedded-sdk/source/include/ota_os_interface.h:272:22: error: expected member name or ';' after declaration specifiers
    OtaDeleteTimer_t delete; /*!< @brief Delete timer. */

Wireshark logs
N/A

Additional context
None

OS interface: Inconsistency in the receive timeout parameter

There seems to be an inconsistency in the OtaReceiveEvent_t documentation and its usage. Although the signature defines a timeout parameter, this seems to be called with 0 timeout all the time by OTA_EventProcessingTask, effectively spamming the thread and not letting it block. This was implemented with mq_receive() without using the timeout parameter in the demo, which blocks indefinitely. Please clarify the interface with appropriate documentation.
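
If a POSIX port wanted to honor the timeout, one option would be mq_timedreceive(), which takes an absolute deadline measured against CLOCK_REALTIME rather than a relative duration. The sketch below covers only the deadline arithmetic. It assumes the timeout is given in milliseconds, and deadlineFrom is a hypothetical helper, not part of the library or the demo.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: convert a relative timeout in milliseconds into the
 * absolute deadline that mq_timedreceive() expects. 'now' would normally come
 * from clock_gettime( CLOCK_REALTIME, ... ). */
struct timespec deadlineFrom( struct timespec now, uint32_t timeoutMs )
{
    struct timespec deadline = now;

    deadline.tv_sec += ( time_t ) ( timeoutMs / 1000U );
    deadline.tv_nsec += ( long ) ( timeoutMs % 1000U ) * 1000000L;

    /* Carry nanosecond overflow into the seconds field. */
    if( deadline.tv_nsec >= 1000000000L )
    {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    return deadline;
}
```

The resulting timespec would be passed as the last argument of mq_timedreceive(); a timeout of 0 then yields a deadline equal to the current time, which makes the call return immediately when the queue is empty.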

OTA_Shutdown does not clear timers (then leading to null pointer dereference)

The docs for OTA_Shutdown state "and clear all resources.", but the function does not seem to take care of any running timers.

This has been observed and reproduced when the MQTT operation for requesting a new job fails after OTA agent startup (e.g. due to a disconnection or a shutdown of the connection in the meantime), which makes the OTA library start the OtaRequestTimer.
If OTA_Shutdown is then called, the timer is left running after the OTA agent has shut down.

This is especially bad because OTA_Shutdown clears the whole otaAgent struct, including the pOtaInterface pointer.
So when the timer fires, it goes via otaTimerCallback into OTA_SignalEvent, which then calls otaAgent.pOtaInterface->os.event.send( NULL, pEventMsg, 0 );
(Note: the potential for a null pointer dereference in OTA_SignalEvent has already been pointed out in the issue "OTA_SignalEvent has undefined behaviour".)

Besides this bad crash, this will leak the memory of the created timers. (The crash itself is kind of sad and funny: os.event.send is the second function in the whole struct, so it sits at offset 0x4 from pOtaInterface. If pOtaInterface is NULL, the call actually jumps to the after-reset entry point on our Kinetis ARM M4s, so this did not even look like the usual rare HardFault but was disguised as a very strange, rare, unexpected reset :( )

Note that OTA_Suspend does call otaAgent.pOtaInterface->os.timer.stop( OtaRequestTimer );, but OTA_Shutdown does not. However, glancing over the code, it seems the same scenario applies to the second timer, OtaSelfTestTimer, which is not even stopped by OTA_Suspend. If this is revisited, the whole timer logic needs to be checked.

(Further, I'd appreciate it if ota_os_freertos.c used static timers, not only to avoid dynamic memory leaks on another level, but also because this is one of the last things preventing the MQTT version from running without a heap.)

(This has been observed and confirmed on 3.1.0 from July. So far I see nothing in the latest develop that touches this.)

AppCallback API usage in all examples is not const correct

The second parameter of OtaAppCallback_t is defined as const void * pData , but then the only documented uses of that parameter are to use it for memory management purposes and cast the constness away:

https://github.com/FreeRTOS/FreeRTOS/blob/c134a581153c1b137af2af2e93564c495977872f/FreeRTOS-Plus/Demo/AWS/Ota_Windows_Simulator/Ota_Over_Mqtt_Demo/DemoTasks/OtaOverMqttDemoExample.c#L836

https://github.com/aws/aws-iot-device-sdk-embedded-C/blob/5eec3f03f420b904e1c041e5bda3eeebb6f5744b/demos/ota/ota_demo_core_mqtt/ota_demo_core_mqtt.c#L664

Also, if some messages happen to be in the queue at the moment the agent decides to go into shutdown, the callback is never called for these messages and the buffers leak.

Meet invalid path ':' on git-checkout on Windows

Hi,

I get the error invalid path ':' when I use git to clone and check out this repository on Windows (git version 2.33.0.windows.2):

$ git clone https://github.com/aws/ota-for-aws-iot-embedded-sdk
Cloning into 'ota-for-aws-iot-embedded-sdk'...
remote: Enumerating objects: 2489, done.
remote: Counting objects: 100% (878/878), done.
remote: Compressing objects: 100% (403/403), done.
Reremote: Total 2489 (delta 672), reused 571 (delta 474), pack-reused 1611
Receiving objects: 100% (2489/2489), 1.62 MiB | 3.03 MiB/s, done.
Resolving deltas: 100% (1653/1653), done.
error: invalid path ':'
fatal: unable to checkout working tree
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry with 'git restore --source=HEAD :/'

It seems that files with ':' characters in their names have been added to the main branch; such names are invalid on Windows NTFS. Would you rename these files so that this repository becomes usable for Windows users?

[Enhancement] Unnecessary const char data. Bloated memory.

Is your feature request related to a problem? Please describe.
ota_mqtt.c#L111 is a waste of space. You can go back and forth between numeric and ASCII values with the relation

    value_numeric = value_ascii - '0'
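
As a minimal sketch of that relation in both directions, assuming the data at that line is a digit lookup table: the C standard guarantees that '0' through '9' have consecutive values, so no table is needed. digitToAscii and asciiToDigit are hypothetical helpers, valid only for single digits 0 to 9.

```c
#include <stdint.h>

/* Hypothetical helper: numeric digit (0..9) to its character. */
char digitToAscii( uint8_t value )
{
    return ( char ) ( '0' + value );
}

/* Hypothetical helper: digit character ('0'..'9') to its numeric value. */
uint8_t asciiToDigit( char c )
{
    return ( uint8_t ) ( c - '0' );
}
```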

Describe the solution you'd like
Less memory usage.

Describe alternatives you've considered
n/a

Additional context
n/a

[Feature Request] Misleading default config for otaconfigMAX_THINGNAME_LEN

Is your feature request related to a problem? Please describe.
AWS thing names can be up to 128 characters long, as documented here. Yet by default the library makes itself incompatible with every thing whose name is longer than 64 characters.

#ifndef otaconfigMAX_THINGNAME_LEN
    #define otaconfigMAX_THINGNAME_LEN    64U
#endif

Describe the solution you'd like
Correspondence with the AWS service.

Describe alternatives you've considered
Users can override this in their ota_config.h, but since this is an AWS-provided library, it should spare users the need to research other AWS docs in order to correctly override this library's default config.

Additional context
n/a

ESP32 internet disconnect

Hi, the framework provided by AWS for ESP32-DevKitC-based OTA over MQTT crashes when the internet connection drops. During the crash, the serial log shows "6 out of 5" TLS retry attempts. Please suggest how to overcome this. If any other information is required, please let me know.

Feature request: Store download state in persistent memory

Downloads usually take a long time and data costs are high with low-bandwidth networks, such as NB-IoT. It would be nice to have a possibility to save the current download state to be able to continue later without needing to repeat the already downloaded blocks, e.g. in case of a reset/power loss. Is such a feature planned?

File ID plays undocumented role in versioning

if( ( otaAgent.serverFileID == 0U ) && ( otaAgent.fileContext.fileType == configOTA_FIRMWARE_UPDATE_FILE_TYPE_ID ) )

The version check is disabled whenever the "fileid" (otaAgent.serverFileID in the above code snippet) is not zero. It is not clear why this check is performed. The documentation for creating OTA jobs describes this field as "An arbitrary integer between 0–255 that identifies your firmware image", not to be confused with "fileType" (otaAgent.fileContext.fileType in the snippet), which is "An integer value you can include in the job document to allow your devices to identify the type of file received from the cloud".

This behavior caught me by surprise as I expected all OTA jobs would incorporate the version check unless explicitly overridden by the application. As it seems redundant to me, I would like it if the check for otaAgent.serverFileID == 0U was entirely removed. Alternately it would also help if this functionality was documented somewhere.
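
The condition quoted above can be read as a small predicate. This is a sketch for discussion only: versionCheckApplies is a hypothetical helper, and DEMO_FIRMWARE_FILE_TYPE_ID is an assumed stand-in value for configOTA_FIRMWARE_UPDATE_FILE_TYPE_ID, not a value confirmed by this report.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed stand-in for configOTA_FIRMWARE_UPDATE_FILE_TYPE_ID. */
#define DEMO_FIRMWARE_FILE_TYPE_ID    0U

/* Hypothetical helper mirroring the quoted condition: the version check
 * only runs for file ID 0 combined with the firmware file type. */
bool versionCheckApplies( uint32_t serverFileId, uint32_t fileType )
{
    return ( serverFileId == 0U ) && ( fileType == DEMO_FIRMWARE_FILE_TYPE_ID );
}
```

Under this reading, a job with fileid 1 and the firmware file type skips the version check entirely, which is the surprising case described above.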

[BUG] Issue with longer thing names

Apart from the thing name length issue described in this other issue: [Feature Request] Misleading default config for otaconfigMAX_THINGNAME_LEN, there is another limit to thing name length in this code.

In ota-for-aws-iot-embedded-sdk/source/ota_mqtt.c, the thing name is added into string pPayloadParts[ 3 ], which is part of a message published to AWS topic $aws/things/[thing_name]/jobs/notify-next that ends up something like {"clientToken":"[decimal_value_of_reqCounter_in_a_string]:[thing_name]"}. Since the decimal value of reqCounter can be up to 10 characters, the thing name can only be 64 - 10 -1 = 53 chars before there is (potentially) a problem with this request, as AWS complains (and does not send back the requested data) if a client token longer than 64 characters is used.

If the intention is to place a unique value in clientToken (to tie the request and response together), then surely the value of reqCounter is sufficient, without also including the thing name?

As a side note, the value of reqCounter is turned into a decimal string using function stringBuilderUInt32Decimal(), which returns an empty string when it is sent a value of 0, rather than "0", as might be expected.

This behavior may also cause issues for other uses of the function, such as with versionMajorString, versionMinorString and versionBuildString.
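
For comparison, a conversion that emits "0" for a zero input could look like the sketch below. u32ToDecimal is a hypothetical function written for this illustration; it is not the library's stringBuilderUInt32Decimal() and does not share its signature. The do/while loop runs at least once, which is what guarantees a digit for 0.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: writes 'value' as a NUL-terminated decimal string into
 * 'buf' and returns the number of digits, or 0 if the buffer is missing or
 * too small. A value of 0 yields "0". */
size_t u32ToDecimal( char * buf, size_t bufSize, uint32_t value )
{
    char tmp[ 10 ]; /* UINT32_MAX has 10 decimal digits */
    size_t count = 0;
    size_t i;

    /* Collect digits least significant first; runs once even for 0. */
    do
    {
        tmp[ count++ ] = ( char ) ( '0' + ( value % 10U ) );
        value /= 10U;
    } while( value != 0U );

    if( ( buf == NULL ) || ( bufSize < ( count + 1U ) ) )
    {
        return 0;
    }

    /* Reverse into the output buffer. */
    for( i = 0; i < count; i++ )
    {
        buf[ i ] = tmp[ count - 1U - i ];
    }

    buf[ count ] = '\0';
    return count;
}
```

The 10-digit worst case is also where the 64 - 10 - 1 = 53 character limit on the thing name comes from.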

Tested on an ESP32.

FreeRTOS timer interface: deleted timers started after shutdown lead to null pointer dereference

As in issue #412, timers that are started before the OTA agent runs the shutdownHandler (which zeros pOtaInterface) will enter OTA_SignalEvent from otaTimerCallback and then attempt to dereference otaAgent.pOtaInterface->os.event.send( NULL, pEventMsg, 0 );

This issue was marked as resolved following the merging of #413, however the issue still persists in the special case where timers are started after OTA_Shutdown is called.

I was able to reproduce this like so:

  • update to patch: #413
  • request and receive a bunch of file blocks
  • before all blocks are digested, call OTA_Shutdown
    • note that all timers are stopped and "deleted" here
  • on processing the last requested block, processDataHandler calls otaAgent.pOtaInterface->os.timer.start
    • os.timer.start calls OtaStartTimer_FreeRTOS which will re-create the timer that was deleted in OTA_Shutdown
  • the OtaAgentEventShutdown event gets processed and the ota agent is zeroed
  • timer callback expires and dereferences a null pointer.

This may be a problem with the way the FreeRTOS os interface is set up, but also hints that there is some ambiguity with how OTA timer interface APIs should be implemented.

MQTT OTA problem returning FAILED to OTA JOB after starting with new FW.

I tried running the MQTT OTA sample using the Cellular interface Library.
When I executed the job, block transfer started using MQTT communication.
After receiving all the blocks, the device restarted, and I confirmed that the app version had been updated to a new one. However, for the OTA job, it published status: FAILED, and it also showed as failed on the AWS console.

I suspect that the cause of this issue is that the values of aflag and pflag read from _esp_get_otadata_partition are both 0xffffffff.
However, I am just running the sample as is.
Do I need any additional implementation?

I (311627) [MQTT OTA]: Sent PUBLISH message: {"status":"IN_PROGRESS","statusDetails":{"self_test":"ready","updatedBy":"0x00010002"}} 
I (311657) [MQTT OTA]: Received OtaJobEventActivate callback from OTA Agent.
I (311667) esp_image: segment 0: paddr=00110020 vaddr=3f400020 size=27388h (160648) map
I (311717) esp_image: segment 1: paddr=001373b0 vaddr=3ffb0000 size=021e4h (  8676) 
I (311717) esp_image: segment 2: paddr=0013959c vaddr=40080000 size=06a7ch ( 27260) 
I (311737) esp_image: segment 3: paddr=00140020 vaddr=400d0020 size=4ba44h (309828) map
I (311827) esp_image: segment 4: paddr=0018ba6c vaddr=40086a7c size=07840h ( 30784) 
I (311837) esp_image: segment 0: paddr=00110020 vaddr=3f400020 size=27388h (160648) map
I (311887) esp_image: segment 1: paddr=001373b0 vaddr=3ffb0000 size=021e4h (  8676) 
I (311887) esp_image: segment 2: paddr=0013959c vaddr=40080000 size=06a7ch ( 27260) 
I (311897) esp_image: segment 3: paddr=00140020 vaddr=400d0020 size=4ba44h (309828) map
I (311987) esp_image: segment 4: paddr=0018ba6c vaddr=40086a7c size=07840h ( 30784) 
.
.
.
I (312187) [MQTT OTA]:  Received: 132   Queued: 132   Processed: 132   Dropped: 0
.
.
.
I (312447) [MQTT Sub Manager]: Invoking subscription callback of matching topic filter: TopicFilter=$aws/things/+/jobs/#, TopicName=$aws/things/test-things/jobs/AFR_OTA-ota_test/update/accepted
.
.
.
rst:0xc (SW_CPU_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:2
load:0x3fff0030,len:7176
load:0x40078000,len:15500
ho 0 tail 12 room 4
load:0x40080400,len:4072
0x40080400: _init at ??:?

entry 0x40080670
I (29) boot: ESP-IDF v5.0.1-dirty 2nd stage bootloader
I (29) boot: compile time 17:05:26
I (29) boot: chip revision: v3.0
I (31) boot.esp32: SPI Speed      : 40MHz
I (36) boot.esp32: SPI Mode       : DIO
I (40) boot.esp32: SPI Flash Size : 4MB
I (45) boot: Enabling RNG early entropy source...
I (50) boot: Partition Table:
I (54) boot: ## Label            Usage          Type ST Offset   Length
I (61) boot:  0 nvs              WiFi data        01 02 00009000 00004000
I (69) boot:  1 otadata          OTA data         01 00 0000d000 00002000
I (76) boot:  2 phy_init         RF data          01 01 0000f000 00001000
I (84) boot:  3 factory          factory app      00 00 00010000 00100000
I (91) boot:  4 ota_0            OTA app          00 10 00110000 00100000
I (99) boot:  5 ota_1            OTA app          00 11 00210000 00100000
I (106) boot:  6 storage          Unknown data     01 82 00310000 000f0000
I (114) boot: End of partition table
I (118) esp_image: segment 0: paddr=00110020 vaddr=3f400020 size=27388h (160648) map
I (185) esp_image: segment 1: paddr=001373b0 vaddr=3ffb0000 size=021e4h (  8676) load
I (188) esp_image: segment 2: paddr=0013959c vaddr=40080000 size=06a7ch ( 27260) load
I (202) esp_image: segment 3: paddr=00140020 vaddr=400d0020 size=4ba44h (309828) map
I (314) esp_image: segment 4: paddr=0018ba6c vaddr=40086a7c size=07840h ( 30784) load
I (334) boot: Loaded app from partition at offset 0x110000
I (334) boot: Disabling RNG early entropy source...
I (346) cpu_start: Pro cpu up.
I (346) cpu_start: Starting app cpu, entry point is 0x400813f4
0x400813f4: call_start_cpu1 at /Users/keigo.imaizumi/esp/esp-idf/components/esp_system/port/cpu_start.c:142

I (332) cpu_start: App cpu up.
I (362) cpu_start: Pro cpu start user code
I (362) cpu_start: cpu freq: 160000000 Hz
I (362) cpu_start: Application information:
I (367) cpu_start: Project name:     iot-device-controller-fw-v3
I (374) cpu_start: App version:      b05b2c9-dirty
I (379) cpu_start: Compile time:     Apr  9 2023 17:05:16
I (385) cpu_start: ELF file SHA256:  848a185055b0e343...
Warning: checksum mismatch between flashed and built applications. Checksum of built application is bb55c49194b79103681f8c1953a217b1edb67426cf56b398453cbefe101398b4
I (391) cpu_start: ESP-IDF:          v5.0.1-dirty
I (397) cpu_start: Min chip rev:     v0.0
I (401) cpu_start: Max chip rev:     v3.99 
I (406) cpu_start: Chip rev:         v3.0
I (411) heap_init: Initializing. RAM available for dynamic allocation:
I (418) heap_init: At 3FFAE6E0 len 00001920 (6 KiB): DRAM
I (424) heap_init: At 3FFB7A28 len 000285D8 (161 KiB): DRAM
I (430) heap_init: At 3FFE0440 len 00003AE0 (14 KiB): D/IRAM
I (437) heap_init: At 3FFE4350 len 0001BCB0 (111 KiB): D/IRAM
I (443) heap_init: At 4008E2BC len 00011D44 (71 KiB): IRAM
I (451) spi_flash: detected chip: generic
I (454) spi_flash: flash io: dio
W (458) spi_flash: Detected size(16384k) larger than the size in the binary image header(4096k). Using the size in the binary image header.
I (472) cpu_start: Starting scheduler on PRO CPU.
I (0) cpu_start: Starting scheduler on APP CPU.
I (482) [Initialize Task]: ESP-IDF Version:v5.0.1-dirty
.
.
.
I (23882) [MQTT OTA]: Establishing a TLS session to hogehoge-ats.iot.ap-northeast-1.amazonaws.com:8883.
W (23892) [MbedtlsTransport]: TLS_FreeRTOS_Connect
I (23902) [TcpSocketWrapper]: Created CELLULAR Socket 0x3ffc4478.
I (23902) [TcpSocketWrapper]: Ip address hogehoge-ats.iot.ap-northeast-1.amazonaws.com port 8883
.
.
.
I (31082) [MbedtlsTransport]: (Network connection 0x3ffb40a4) TLS handshake successful!!!!
I (31092) [MbedtlsTransport]: (Network connection 0x3ffb40a4) Connection to hogehoge-ats.iot.ap-northeast-1.amazonaws.com established.
I (31102) [MQTT OTA]: Creating an MQTT connection to hogehoge-ats.iot.ap-northeast-1.amazonaws.com.
.
.
.
I (32992) [MQTT OTA]: Success creating MQTT connection to hogehoge-ats.iot.ap-northeast-1.amazonaws.com.
I (33002) ota_pal: otaPal_GetPlatformImageState
I (33002) esp_ota_ops: aws_esp_ota_get_boot_flags: 1
I (33072) esp_ota_ops: [0] aflags/seq:0xffffffff/0x1, pflags/seq:0xffffffff/0x0
I (33082) AWS_OTA: Current State=[RequestingJob], Event=[Start], New state=[RequestingJob]
I (39272) [MQTT OTA]: SUBSCRIBE topic $aws/things/test-things/jobs/notify-next to broker.
.
.
I (39282) [MQTT Sub Manager]: Added callback to registry: TopicFilter=$aws/things/+/jobs/#
I (39302) AWS_OTA: Subscribed to MQTT topic: $aws/things/test-things/jobs/notify-next
I (40912) [MQTT OTA]: Sent PUBLISH packet to broker $aws/things/test-things/jobs/$next/get to broker.
I (40922) [MQTT OTA]: Sent PUBLISH message: {"clientToken":":test-things"} 
I (41192) [MQTT OTA]: PUBACK received for packet id 2.
I (41192) [MQTT OTA]:  Received: 0   Queued: 0   Processed: 0   Dropped: 0

I (41502) [MQTT Sub Manager]: Invoking subscription callback of matching topic filter: TopicFilter=$aws/things/+/jobs/#, TopicName=$aws/things/test-things/jobs/$next/get/accepted
E (41512) [MQTT OTA]: prvMqttJobCallback
I (41522) [MQTT OTA]:  Received: 0   Queued: 0   Processed: 0   Dropped: 0
I (41522) AWS_OTA: Extracted parameter: [key: value]=[execution.jobId: AFR_OTA-ota_test]
I (41542) AWS_OTA: Extracted parameter: [key: value]=[execution.statusDetails.updatedBy: 65538]
I (41562) AWS_OTA: Extracted parameter: [key: value]=[execution.jobDocument.afr_ota.streamname: AFR_OTA-614f87e8-696d-42dd-a209-e702a3be8e50]
I (41572) AWS_OTA: Extracted parameter: [key: value]=[execution.jobDocument.afr_ota.protocols: ["MQTT"]]
I (41582) AWS_OTA: Extracted parameter: [key: value]=[filepath: /update]
I (41592) AWS_OTA: Extracted parameter: [key: value]=[filesize: 537296]
I (41602) AWS_OTA: Extracted parameter: [key: value]=[fileid: 0]
I (41602) AWS_OTA: Extracted parameter: [key: value]=[certfile: /OTA_Cert/auth.pem]
I (41612) AWS_OTA: Extracted parameter [ sig-sha256-ecdsa: MEUCIQCf7l1+6yfwEsSr+6sNJkSencLL... ]
I (41622) AWS_OTA: In self test mode.
I (41632) AWS_OTA: New image has a higher version number than the current image: New image version=4.1.2, Previous image version=0.1.2
I (41642) AWS_OTA: Image version is valid: Begin testing file: File ID=0
I (41652) ota_pal: otaPal_SetPlatformImageState, 1
W (41662) ota_pal: Set image as testing!
I (48152) [MQTT OTA]: Sent PUBLISH packet to broker $aws/things/test-things/jobs/AFR_OTA-ota_test/update to broker.
I (48162) [MQTT OTA]: Sent PUBLISH message: {"status":"IN_PROGRESS","statusDetails":{"self_test":"active","updatedBy":"0x04010002"}} 
.
.
I (48192) AWS_OTA: Job parsing success: OtaJobParseErr_t=OtaJobParseErrNone, Job name=AFR_OTA-ota_test
I (48202) [MQTT OTA]: Received OtaJobEventReceivedJob callback from OTA Agent.
I (48202) ota_pal: otaPal_GetPlatformImageState
I (48212) esp_ota_ops: aws_esp_ota_get_boot_flags: 1
I (48272) esp_ota_ops: [0] aflags/seq:0xffffffff/0x1, pflags/seq:0xffffffff/0x0
I (48272) [MQTT OTA]: Received OtaJobEventProcessed callback from OTA Agent.
I (48282) AWS_OTA: Current State=[CreatingFile], Event=[ReceivedJobDocument], New state=[CreatingFile]
I (48292) AWS_OTA: Beginning self-test.
I (48302) ota_pal: otaPal_GetPlatformImageState
I (48302) esp_ota_ops: aws_esp_ota_get_boot_flags: 1
I (48362) esp_ota_ops: [0] aflags/seq:0xffffffff/0x1, pflags/seq:0xffffffff/0x0
W (48372) AWS_OTA: Rejecting new image and rebooting:The job is in the self-test state while the platform is not.
I (48382) ota_pal: otaPal_SetPlatformImageState, 3
W (48382) ota_pal: Set image as invalid!
I (48392) esp_ota_ops: aws_esp_ota_get_boot_flags: 1
I (48432) esp_ota_ops: gen_0_seq:1, gen_1_seq:0
I (48442) esp_ota_ops: find_partition->address:d000
I (48442) esp_ota_ops: [0] aflags/seq:0xffffffff/0x1, pflags/seq:0xffffffff/0x0
W (48452) ota_pal: Image not in self test mode 4294967295
I (48462) esp_ota_ops: aws_esp_ota_get_boot_flags: 1
I (48522) esp_ota_ops: [0] aflags/seq:0xffffffff/0x1, pflags/seq:0xffffffff/0x0
E (48522) AWS_OTA: Job Status Other:1

I (49942) [MQTT Sub Manager]: Invoking subscription callback of matching topic filter: TopicFilter=$aws/things/+/jobs/#, TopicName=$aws/things/test-things/jobs/AFR_OTA-ota_test/update/accepted
E (49962) [MQTT OTA]: prvMqttJobCallback
W (49962) [MQTT OTA]: Received job message $aws/things/test-things/jobs/AFR_OTA-ota_test/update/accepted{"timestamp":1681131347}estamp":1681131340,"execution":{"jobId":"AFR_OTA-ota_test","status":"IN_PROGRESS","statusDetails":{"self_test":"ready","updatedBy":"0x00010002"},"queuedAt":1681130575,"startedAt":1681130676,"lastUpdatedAt":1681131298,"versionNumber":9,"executionNumber":1,"jobDocument":{"afr_ota":{"protocols":["MQTT"],"streamname":"AFR_OTA-123456-7890-1234-5678-e702a3be8e50","files":[{"filepath":"/update","filesize":537296,"fileid":0,"certfile":"/OTA_Cert/auth.pem","sig-sha256-ecdsa":"aaaaaaaaaaa+bbbbbbbbb+cccccccccccccccccccc/ddddddddddddddddd="}]}}}} size 24.
I (50022) [MQTT OTA]: Sent PUBLISH packet to broker $aws/things/test-things/jobs/AFR_OTA-ota_test/update to broker.
I (50032) [MQTT OTA]: Sent PUBLISH message: {"status":"FAILED","statusDetails":{"reason":"rejected: 0x00000011"}} 

[Feature Request] Use hardware accelerators for base64 operations

The OTA agent uses base64 decoding when downloading a file. This function is a pure software implementation, whereas many microcontrollers have hardware accelerators that offer this functionality.

Possible Solution
The structure OtaInterfaces_t can be extended with an interface for base64 operations. To support microcontrollers that do not have an accelerator, the structure member could be set to NULL.
In the function where base64Decode() is called, the structure member is checked. If it is not NULL, the hardware accelerator API is used; otherwise the software implementation in the OTA SDK is used.
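
The proposal could be sketched roughly as follows. Everything here is hypothetical: DemoBase64Decode_t, DemoCodecInterfaces_t and demoBase64Decode are illustrative names, the decoder signature is an assumption, and both decoders are stubs that only count calls so that the dispatch logic is visible.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoder signature; returns 0 on success. */
typedef int ( * DemoBase64Decode_t )( uint8_t * pDest, size_t destLen, size_t * pResultLen,
                                      const uint8_t * pSrc, size_t srcLen );

/* Hypothetical, cut-down stand-in for the interface structure. */
typedef struct DemoCodecInterfaces
{
    DemoBase64Decode_t base64Decode; /* NULL selects the software fallback */
} DemoCodecInterfaces_t;

static int softwareCalls = 0;
static int hardwareCalls = 0;

/* Stub standing in for the SDK's software decoder; it only records the call. */
static int softwareBase64Decode( uint8_t * pDest, size_t destLen, size_t * pResultLen,
                                 const uint8_t * pSrc, size_t srcLen )
{
    ( void ) pDest; ( void ) destLen; ( void ) pSrc; ( void ) srcLen;
    *pResultLen = 0U;
    softwareCalls++;
    return 0;
}

/* Stub standing in for a vendor accelerator driver; it only records the call. */
static int hardwareBase64Decode( uint8_t * pDest, size_t destLen, size_t * pResultLen,
                                 const uint8_t * pSrc, size_t srcLen )
{
    ( void ) pDest; ( void ) destLen; ( void ) pSrc; ( void ) srcLen;
    *pResultLen = 0U;
    hardwareCalls++;
    return 0;
}

/* Use the application-supplied decoder when present, else fall back to software. */
int demoBase64Decode( const DemoCodecInterfaces_t * pInterfaces,
                      uint8_t * pDest, size_t destLen, size_t * pResultLen,
                      const uint8_t * pSrc, size_t srcLen )
{
    if( ( pInterfaces != NULL ) && ( pInterfaces->base64Decode != NULL ) )
    {
        return pInterfaces->base64Decode( pDest, destLen, pResultLen, pSrc, srcLen );
    }

    return softwareBase64Decode( pDest, destLen, pResultLen, pSrc, srcLen );
}
```

Applications without an accelerator would leave the member NULL and see no behavior change; the cost is one pointer per interface table and one branch per decode call.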

Alternatives
The base64_decode function in the SDK could be configured as a WEAK function, offering the possibility to the application to implement a strong function (using hardware accelerators).

Change the default signature size to support RSA3072 or RSA4096

As of May 2021, Digicert has stopped issuing 2048 bit Code Signing Certificates and has moved to 3072 bit Certificates. Their notification page indicates this is an industry wide standards change so the AWS OTA library should follow the change.

The default setting of the OTA library is 256 bytes, which only supports RSA-2048. I pulled the latest main to verify that this is still the default setting. Unless it is a breaking issue somewhere else, should kOTA_MaxSignatureSize be defined for RSA-4096 to remain flexible for future changes? (Who knows how long RSA-3072 will remain the standard.)

The change is simple.
In ota_private.h,
    #define kOTA_MaxSignatureSize    512 /* Max bytes supported for a file signature (4096 bit RSA is 512 bytes). */
or
    #define kOTA_MaxSignatureSize    384 /* Max bytes supported for a file signature (3072 bit RSA is 384 bytes). */

https://knowledge.digicert.com/alerts/code-signing-new-minimum-rsa-keysize.html
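
The byte counts in the comments above follow from the fact that an RSA signature is as long as the key's modulus. As a small sketch of that arithmetic (rsaSignatureBytes is a hypothetical helper, not part of the library):

```c
#include <stddef.h>

/* Hypothetical helper: an RSA signature occupies as many bytes as the modulus. */
size_t rsaSignatureBytes( size_t keyBits )
{
    return keyBits / 8U;
}
```

So 2048-bit keys need 256 bytes, 3072-bit keys need 384 bytes, and 4096-bit keys need 512 bytes.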

random Block ID

Hi there!
When I was watching the OTA update logs, I saw that AWS sends blocks in a seemingly random manner. The block IDs are not sequential, and several numbers in the middle are even skipped. Are the payloads sent randomly, as the block IDs suggest, or is there some algorithm behind it?
Regards

OTA_SignalEvent has undefined behaviour

This line:

err = otaAgent.pOtaInterface->os.event.send( NULL, pEventMsg, 0 );

may dereference a null pointer, especially after an OTA_Shutdown(ticks, false). (Note that checking that pOtaInterface is valid before the call is not sufficient here, as the shutdownHandler might memset otaAgent to zero between the check and the call). Another possible race condition is in the called send method, as (in the FreeRTOS case) the queue itself might be deinitialized by the time execution reaches the configAssert( pxQueue ); in xQueueGenericSend, although this error path is currently masked by #217.

OTA in multiple device

OTA for a single device is working. Please suggest how I can program multiple devices with one OTA job. A single binary file should be programmed into a selected number of devices through OTA. Is this possible using a thing group? Please suggest.

Thanks

Issues with integration into an application

I am trying to integrate the OTA agent into an application. While the demo works fairly well on its own, I believe that the integration into an application is not yet ideal. The missing features that occurred to me:

  1. Not possible to defer the download after receiving an update job to a future time, e.g. using a StartDownload command.
  2. Not possible to get the result of the job document after a job document is received. Does an update job exist or is the job document empty? Can we move on with the application or is the download in progress? This is critical for applications which would like to suspend other operations while the download is in progress.
  3. How to figure out when an external MQTT_ProcessLoop is required to continue with the download? For systems with long socket receive timeouts, calling ProcessLoop (i.e. socket recv) repeatedly causes unnecessary delays and would be nice to avoid if the OTA agent does not expect a response at the moment.

BUG: Misleading documentation about OtaAppCallback_t

* The user may register a callback function when initializing the OTA Agent. This
* callback is used to notify the main application when the OTA update job is complete.
* Typically, it is used to reset the device after a successful update by calling
* @ref OTA_ActivateNewImage and may also be used to kick off user specified self tests
* during the Self Test phase. If the user does not supply a custom callback function,
* a default callback handler is used that automatically calls @ref OTA_ActivateNewImage
* after a successful update.

The documentation states that the application callback is optional, while it is not. The application must provide a valid function pointer to the OTA_Init function. The callback pointer is stored in the agent context without any check and called everywhere without checks. There is no default callback.
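
One way to make the requirement explicit would be to reject a NULL callback at initialization. The sketch below uses hypothetical names throughout (DemoAppCallback_t, demoAgentInit, demoNotify, demoCallback) and does not reflect what OTA_Init currently does.

```c
#include <stddef.h>

/* Hypothetical, simplified callback type. */
typedef void ( * DemoAppCallback_t )( int event, const void * pData );

static DemoAppCallback_t storedCallback = NULL;
static int lastEvent = -1;

/* Example application callback that records the last event it saw. */
static void demoCallback( int event, const void * pData )
{
    ( void ) pData;
    lastEvent = event;
}

/* Refuse to initialize without a callback instead of storing NULL. */
int demoAgentInit( DemoAppCallback_t callback )
{
    if( callback == NULL )
    {
        return -1;
    }

    storedCallback = callback;
    return 0;
}

/* Notify the application; does nothing if initialization never succeeded. */
void demoNotify( int event )
{
    if( storedCallback != NULL )
    {
        storedCallback( event, NULL );
    }
}
```

Either this kind of check or a real default handler would bring the code and the documentation back into agreement.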
