tegra-boot-tools's People

Contributors

danielfullmer avatar kekiefer avatar madisongh avatar pseyfert-sevensense avatar

tegra-boot-tools's Issues

tegra-bootloader-update should update BCT

While there is only one BCT partition, it is subdivided into multiple entries. The bootloader update code in cboot follows a specific update sequence for the BCT, and we should follow the same sequence.

When testing with nv_update_engine, it looks like BCT updates are only applied when the BCT in the BUP payload does not match the current BCT.
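
The "only update when different" behavior observed above could be sketched as follows. This is a minimal illustration, not the actual nv_update_engine or cboot logic: the function name `bct_needs_update` is hypothetical, and comparing only the first `payload_len` bytes of the stored copy is an assumption about how the stored entry relates to the payload.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: returns true when the BCT carried in the payload
 * differs from the copy currently stored on the device, i.e. when a BCT
 * update would be needed. Assumption: only the first payload_len bytes of
 * the stored copy are significant.
 */
bool bct_needs_update(const uint8_t *stored, size_t stored_len,
		      const uint8_t *payload, size_t payload_len)
{
	assert(stored != NULL && payload != NULL);
	/* A payload larger than the stored copy cannot be identical to it. */
	if (payload_len > stored_len)
		return true;
	return memcmp(stored, payload, payload_len) != 0;
}
```

A caller would skip the BCT write sequence entirely when this returns false, which avoids rewriting the BCT on every update.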

tegra-bootloader-update should pre-validate TNSPECs

For payload entries that have specs to be matched against the system's TNSPEC, it should verify that there is at least one matching entry for the current TNSPEC, and report a failure if there are no matches.
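
A minimal sketch of such a pre-validation pass is shown below. The `struct payload_entry` layout and the function name are hypothetical, and exact string comparison is an assumption; the real TNSPEC matching rules may be looser.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of one payload entry. */
struct payload_entry {
	const char *partname;	/* target partition name */
	const char *spec;	/* TNSPEC to match; NULL or "" means "applies to all" */
};

/*
 * Returns true if at least one spec-bearing entry matches our_spec, or if
 * no entry carries a spec at all. Returns false when specs are present but
 * none of them match, which is the case that should be reported as a failure.
 */
bool payload_has_matching_spec(const struct payload_entry *entries,
			       size_t count, const char *our_spec)
{
	bool any_spec = false;
	size_t i;

	assert(our_spec != NULL);
	for (i = 0; i < count; i++) {
		if (entries[i].spec == NULL || entries[i].spec[0] == '\0')
			continue;	/* entry is not spec-specific */
		any_spec = true;
		if (strcmp(entries[i].spec, our_spec) == 0)
			return true;
	}
	return !any_spec;
}
```

Running this check before any partition is written would let the tool fail early instead of partway through an update.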

Many upgrade failures with "ERR: cannot perform bootloader update"

Hi guys,

We're seeing a lot of failures when we upgrade our devices with the following message:
ERR: cannot perform bootloader update

We then created a build with a patched Mender script. Here's the patch for reference:

--- meta-mender-community/meta-mender-tegra/recipes-mender/tegra-state-scripts/files/redundant-boot-install-script-uboot        2023-06-29 16:33:18.398067637 +0200
+++ meta-nobi/meta-nobi-tegra/recipes-mender/tegra-state-scripts/files/redundant-boot-install-script-uboot      2023-07-02 16:21:23.247559416 +0200
@@ -60,7 +60,7 @@
     # If the tool reports that the version partitions are corrupted, this is an update on a tegra210
     # device with the old partition layout where the U-Boot environment overwrote the version partition(s),
     # in which case we recover via complete initialization.
-    if chroot "${mnt}" /usr/bin/tegra-bootloader-update --dry-run /opt/ota_package/bl_update_payload 2>&1 | grep -q 'version partitions are corrupted'; then
+    if chroot "${mnt}" /usr/bin/tegra-bootloader-update --dry-run /opt/ota_package/bl_update_payload 2>&1 | tee /tmp/bl_update_output | grep -q 'version partitions are corrupted'; then
        # For the recoverable case, we will have also detected a change the U-Boot environment change
        if [ -n "$install_fwenv" ]; then
            echo "Detected bootloader partition upgrade, reinitializing" >&2
@@ -76,11 +76,15 @@
        fi
     else
        echo "ERR: cannot perform bootloader update" >&2
+       echo "tegra-bootloader-update output:" >&2
+       cat /tmp/bl_update_output >&2
        cleanup
        exit 1
     fi
-elif ! chroot "${mnt}" /usr/bin/tegra-bootloader-update /opt/ota_package/bl_update_payload; then
+elif ! chroot "${mnt}" /usr/bin/tegra-bootloader-update /opt/ota_package/bl_update_payload > /tmp/bl_update_output; then
     echo "ERR: bootloader update failed" >&2
+    echo "tegra-bootloader-update output:" >&2
+    cat /tmp/bl_update_output >&2
     cleanup
     exit 1
 fi

We then get the following output in Mender:

ERR: cannot perform bootloader update
tegra-bootloader-update output:
/opt/ota_package/bl_update_payload: Cannot allocate memory

This leads me to the following lines in the tegra-bootloader-update.c file. It's the perror below that triggers the printed error:

	bupctx = bup_init(argv[optind]);
	if (bupctx == NULL) {
		perror(argv[optind]);
		return 1;
	}

Digging into bup_init I found this:

#define BUFFERSIZE (1024 * 1024 * 1024)
[...]
ctx->buffer = malloc(BUFFERSIZE);

I've tried to work out what this buffer is used for, and it seems to be used only to load the BUP payload file into memory. Why is a 1 GiB buffer allocated for that? Can we safely reduce the size?

Thanks!
Niels
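
If the buffer really is only used to hold the payload file (an assumption here, not confirmed against the bup_init source), one option is to size the allocation from the file itself instead of using a fixed 1 GiB. The sketch below shows the idea; `load_file` is a hypothetical helper, not a function from tegra-boot-tools.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read a regular file into a buffer sized to match it.
 * Returns a malloc()ed buffer and stores the number of bytes read in
 * *len_out, or returns NULL with errno set on failure.
 */
void *load_file(const char *path, size_t *len_out)
{
	struct stat st;
	uint8_t *buf = NULL;
	size_t total = 0;
	int fd, saved;

	assert(path != NULL && len_out != NULL);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return NULL;
	if (fstat(fd, &st) < 0)
		goto fail;
	if (!S_ISREG(st.st_mode) || st.st_size <= 0) {
		errno = EINVAL;
		goto fail;
	}
	/* Allocate exactly what the file needs rather than a fixed maximum. */
	buf = malloc((size_t) st.st_size);
	if (buf == NULL)
		goto fail;
	while (total < (size_t) st.st_size) {
		ssize_t n = read(fd, buf + total, (size_t) st.st_size - total);
		if (n < 0) {
			if (errno == EINTR)
				continue;
			goto fail;
		}
		if (n == 0)
			break;	/* file is shorter than fstat() reported */
		total += (size_t) n;
	}
	close(fd);
	*len_out = total;
	return buf;
fail:
	saved = errno;	/* keep the original error across cleanup */
	free(buf);
	close(fd);
	errno = saved;
	return NULL;
}
```

With this approach the allocation scales with the payload, so a failure with "Cannot allocate memory" would only occur if the payload itself does not fit in memory.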

Corrupted partition table on sudden power-loss

Issue

We have investigated an issue on our devices that got stuck in the bootloader after a sudden power-loss during boot. We could trace it back to the bootcheckcount script. The partition table on mmcblk0 seems to be corrupted when power is lost during its execution.

Before starting to look into the code in more detail, I wanted to ask here if you might already be aware of the problem or have some advice on how to approach it?

How to reproduce

Power off the system while the following command is running. It may take several attempts to hit the critical section.

tegra-bootinfo -b

This was tested on a Xavier NX with an SD card, on the devkit.

Bootloader logs during fault:

Endless loop of the following:

[0000.024] W> RATCHET: MB1 binary ratchet value 4 is too large than ratchet level 2 from HW fuses.
[0000.033] I> MB1 (prd-version: 1.5.1.7-t194-41334769-98030a79)
[0000.038] I> Boot-mode: Coldboot
[0000.041] I> Chip revision : A02P
[0000.044] I> Bootrom patch version : 15 (correctly patched)
[0000.049] I> ATE fuse revision : 0x200
[0000.053] I> Ram repair fuse : 0x0
[0000.056] I> Ram Code : 0x0
[0000.058] I> rst_source : 0xb
[0000.061] I> rst_level : 0x1
[0000.065] I> Boot-device: QSPI
[0000.067] I> Qspi flash params source = brbct
[0000.071] I> Qspi using bpmp-dma
[0000.074] I> Qspi clock source : pllp
[0000.078] I> QSPI Flash Size = 32 MB
[0000.081] I> Qspi initialized successfully
[0000.085] W> No valid slot number is found in scratch register
[0000.091] W> Return default slot: _a
[0000.094] I> Active Boot chain : 0
[0000.097] I> Boot-device: QSPI
[0000.100] I> Qspi flash params source = brbct
[0000.106] W> MB1_PLATFORM_CONFIG: device prod data is empty in MB1 BCT.
[0000.112] I> Temperature = 27500
[0000.115] W> Skipping boost for clk: BPMP_CPU_NIC
[0000.119] W> Skipping boost for clk: BPMP_APB
[0000.123] W> Skipping boost for clk: AXI_CBB
[0000.127] W> Skipping boost for clk: AON_CPU_NIC
[0000.132] W> Skipping boost for clk: CAN1
[0000.135] W> Skipping boost for clk: CAN2
[0000.140] I> Boot-device: QSPI
[0000.142] I> Boot-device: QSPI
[0000.145] I> Qspi flash params source = mb1bct
[0000.149] I> Qspi using bpmp-dma
[0000.152] I> Qspi clock source : pllc_out0
[0000.156] I> Qspi reinitialized
[0000.159] I> Qspi flash params source = mb1bct
[0000.164] I> ECC region[0]: Start:0x0, End:0x0
[0000.169] I> ECC region[1]: Start:0x0, End:0x0
[0000.173] I> ECC region[2]: Start:0x0, End:0x0
[0000.177] I> ECC region[3]: Start:0x0, End:0x0
[0000.181] I> ECC region[4]: Start:0x0, End:0x0
[0000.185] I> Non-ECC region[0]: Start:0x80000000, End:0x100000000
[0000.191] I> Non-ECC region[1]: Start:0x0, End:0x0
[0000.195] I> Non-ECC region[2]: Start:0x0, End:0x0
[0000.200] I> Non-ECC region[3]: Start:0x0, End:0x0
[0000.204] I> Non-ECC region[4]: Start:0x0, End:0x0
[0000.210] E> FAILED: Thermal config
[0000.217] E> FAILED: MEMIO rail config
[0000.227] I> Boot-device: QSPI
[0000.230] I> Qspi flash params source = mb1bct
[0000.239] I> Qspi flash params source = mb1bct
[0000.250] I> Qspi flash params source = mb1bct
[0000.317] I> Qspi flash params source = mb1bct
[0000.326] I> Qspi flash params source = mb1bct
[0000.355] I> Qspi flash params source = mb1bct
[0000.367] I> MB1 done

main enter
SPE VERSION #: R01.00.14 Created: Sep 19 2018 @ 11:03:21
HW Function test
Start Scheduler.
in late init

[0000.375] I> Welcome to MB2(TBoot-BPMP) (version: 00.00.2018.32-mobile-938a8cd8)
[0000.375] I> DMA Heap @ [0x526fa000 - 0x52ffa000]
[0000.376] I> Default Heap @ [0xd486400 - 0xd48a400]
[0000.376] E> DEVICE_PROD: Invalid value data = 70020000, size = 0.
[0000.382] W> device prod register failed
[0000.386] I> Boot-device: QSPI
[0000.389] I> Boot_device: QSPI_FLASH instance: 0
[0000.394] I> QSPI Flash Size = 32 MB
[0000.400] I> Qspi initialized successfully
[0000.401] I> qspi flash-0 params source = boot args
[0000.407] W> Cannot find any partition table for 00030000
[0000.411] W> No valid slot number is found in scratch register
[0000.417] W> Return default slot: _a
[0000.420] I> Active Boot chain : 0
[0000.423] E> Cannot find partition bpmp-fw
[0000.427] E> Partition bpmp-fw not found
[0000.431] I> load/auth: execution failed
[0000.435] E> Top caller module: LOADER, error module: PARTITION_MANAGER, reason: 0x0d, aux_info: 0x00
[0000.444] I> AB warm reset

Need better handling of missing partitions

Some of the partitions packed into a BUP payload, such as BMP and EKS, are optional and may not be present on some platforms. tegra-bootloader-update reports errors when a BUP payload contains entries for partitions that don't exist.

Since the BUP generator scripts hard-code all of the entries they include, we need some way to allow for some partitions to be missing. For example, extending the configuration file that enumerates the boot device partition entries to list all of the partitions, and skipping any BUP entries for partitions not in the full list, should take care of this.

Either that, or the BUP generator could be improved to align the included partitions with the flash layout XML file, so only those partitions present get included. That would probably be harder to implement, though.
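
The first option, skipping payload entries whose partitions are not in the platform's full partition list, could look roughly like this. All names here are hypothetical, and the sketch ignores any A/B slot suffix handling that the real tool would need.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: true if name appears in the platform's full partition list. */
bool partition_is_known(const char *const *known, size_t known_count,
			const char *name)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < known_count; i++) {
		if (strcmp(known[i], name) == 0)
			return true;
	}
	return false;
}

/*
 * Copies into out[] only those payload partition names that exist on this
 * platform and returns how many were kept. out[] must have room for
 * entry_count pointers. Entries that are dropped are skipped silently
 * instead of being treated as errors.
 */
size_t filter_payload_entries(const char *const *entries, size_t entry_count,
			      const char *const *known, size_t known_count,
			      const char **out)
{
	size_t i, kept = 0;

	for (i = 0; i < entry_count; i++) {
		if (partition_is_known(known, known_count, entries[i]))
			out[kept++] = entries[i];
	}
	return kept;
}
```

The known-partition list would come from the extended configuration file described above, so a platform without BMP or EKS simply omits them.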

Implement locking in tegra-bootinfo

Add an advisory lock between users of tegra-bootinfo. With the addition of boot variables, it's more likely that multiple processes will try to read or write the boot info simultaneously. The lock should guard against uncoordinated writes.
