Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

esp_ota_write unexpectedly slow on encrypted flash (IDFGH-11415) #12553

Closed
3 tasks done
boborjan2 opened this issue Nov 9, 2023 · 5 comments
Closed
3 tasks done

esp_ota_write unexpectedly slow on encrypted flash (IDFGH-11415) #12553

boborjan2 opened this issue Nov 9, 2023 · 5 comments
Assignees
Labels
Resolution: NA Issue resolution is unavailable Status: Done Issue is done internally Type: Bug bugs in IDF

Comments

@boborjan2
Copy link

boborjan2 commented Nov 9, 2023

Answers checklist.

  • I have read the documentation ESP-IDF Programming Guide and the issue is not addressed there.
  • I have updated my IDF branch (master or release) to the latest version and checked that the issue is present there.
  • I have searched the issue tracker for a similar issue and not found a similar issue.

IDF version.

v4.4.6-150-g2bab3b36bc

Espressif SoC revision.

Chip is ESP32-D0WD-V3 (revision v3.1)

Operating System used.

Linux

How did you build your project?

Command line with Make

If you are using Windows, please specify command line type.

None

Development Kit.

Custom board using ESP32-WROOM-32E

Power Supply used.

External 3.3V

What is the expected behavior?

We have a working product that uses OTA via http server running on ESP32. Client is PUTting firmware image to the server that writes the image to flash via esp_ota_write(). This works just fine.
However now we decided to turn on flash encryption and secure boot (V2). No issues there either, but: when flash encryption is enabled, esp_ota_write() is plainfully slow: it performs a write of a 8kbyte block in 400ms. Interestingly, the flash write times during serial update does not change much. Here are the numbers:
plaintext

  • serial: 17.8s (esptool, 921k, 1.2M image)
  • http: 11.7s (same image via the http server. Curl as client)
  • 8k: 25ms (time for writing a 8k block by esp_ota_write(), using esp_timer_get_time())

encrypted:

  • serial: 22.2s (seems alright)
  • http: 67s (!!!)
  • 8k: 400ms (!!!)

Note: the measurement is done using the same binary running on two identical modules of the same batch, one of them plaintext, the other encrypted.

What is the actual behavior?

esp_ota_write() speed should be much higher.

Steps to reproduce.

See the comment below where I included a simple test.

Debug Logs.

bootup of the encrypted module:
I (51) boot: compile time 21:38:27
I (51) boot: Multicore bootloader
I (53) boot: chip revision: v3.1
I (55) qio_mode: Enabling default flash chip QIO
I (58) boot.esp32: SPI Speed      : 80MHz
I (60) boot.esp32: SPI Mode       : QIO
I (62) boot.esp32: SPI Flash Size : 4MB
I (64) boot: Enabling RNG early entropy source...
I (67) boot: Partition Table:
I (69) boot: ## Label            Usage          Type ST Offset   Length
I (73) boot:  0 nvs              WiFi data        01 02 0000e000 00004000
I (76) boot:  1 otadata          OTA data         01 00 00012000 00002000
I (80) boot:  2 phy_init         RF data          01 01 00014000 00001000
I (84) boot:  3 ota_0            OTA app          00 10 00040000 001e0000
I (88) boot:  4 ota_1            OTA app          00 11 00220000 001e0000
I (91) boot: End of partition table
I (93) esp_image: segment 0: paddr=00040020 vaddr=3f400020 size=37d1ch (228636) map
I (163) esp_image: segment 1: paddr=00077d44 vaddr=3ff80063 size=00008h (     8) load
I (164) esp_image: segment 2: paddr=00077d54 vaddr=3ffbdb60 size=056a0h ( 22176) load
I (173) esp_image: segment 3: paddr=0007d3fc vaddr=40080000 size=02c1ch ( 11292) load
I (177) esp_image: segment 4: paddr=00080020 vaddr=400d0020 size=cfcach (851116) map
I (422) esp_image: segment 5: paddr=0014fcd4 vaddr=40082c1c size=1c250h (115280) load
I (462) esp_image: segment 6: paddr=0016bf2c vaddr=400c0000 size=00064h (   100) load
I (462) esp_image: segment 7: paddr=0016bf98 vaddr=00000000 size=04038h ( 16440)
I (469) esp_image: Verifying image signature...
I (470) secure_boot_v2: Verifying with RSA-PSS...
I (474) secure_boot_v2: Signature verified successfully!
I (489) boot: Loaded app from partition at offset 0x40000
I (489) secure_boot_v2: enabling secure boot v2...
I (489) secure_boot_v2: secure boot v2 is already enabled, continuing..
I (493) boot: Checking flash encryption...
I (495) flash_encrypt: flash encryption is enabled (2 plaintext flashes left)
I (499) boot: Disabling RNG early entropy source...
I (508) cpu_start: Multicore app
I (508) cpu_start: Pro cpu up.
I (508) cpu_start: Starting app cpu, entry point is 0x40081630
I (485) cpu_start: App cpu up.
I (518) cpu_start: Pro cpu start user code
I (518) cpu_start: cpu freq: 80000000
I (518) cpu_start: Application information:
I (520) cpu_start: Project name:     btchanger
I (523) cpu_start: App version:      btchanger.1.8.0-dirty
I (526) cpu_start: Compile time:     Nov  9 2023 09:56:21
I (529) cpu_start: ELF file SHA256:  c28c0438fb4e1afc...
I (532) cpu_start: ESP-IDF:          v4.4.6-156-g87a39d9689-dirty
I (535) cpu_start: Min chip rev:     v3.0
I (538) cpu_start: Max chip rev:     v3.99
I (540) cpu_start: Chip rev:         v3.1
I (543) heap_init: Initializing. RAM available for dynamic allocation:
I (546) heap_init: At 3FFAFF10 len 000000F0 (0 KiB): DRAM
I (549) heap_init: At 3FFB6BF8 len 00001408 (5 KiB): DRAM
I (552) heap_init: At 3FFB9A20 len 00004108 (16 KiB): DRAM
I (555) heap_init: At 3FFD2250 len 0000DDB0 (55 KiB): DRAM
I (558) heap_init: At 3FFE0440 len 00003AE0 (14 KiB): D/IRAM
I (561) heap_init: At 3FFE4350 len 0001BCB0 (111 KiB): D/IRAM
I (565) heap_init: At 4009EE6C len 00001194 (4 KiB): IRAM
I (569) spi_flash: detected chip: generic
I (570) spi_flash: flash io: qio
W (572) flash_encrypt: Flash encryption mode is DEVELOPMENT (not secure)
I (577) sleep: Configure to isolate all GPIO pins in sleep state
I (579) sleep: Enable automatic switching of GPIO sleep configuration
I (583) cpu_start: Starting scheduler on PRO CPU.
I (0) cpu_start: Starting scheduler on APP CPU.

More Information.

No response

@boborjan2 boborjan2 added the Type: Bug bugs in IDF label Nov 9, 2023
@boborjan2
Copy link
Author

This is the sdkconfig of the encrypted module:
sdkconfig.txt

@espressif-bot espressif-bot added the Status: Opened Issue is new label Nov 9, 2023
@github-actions github-actions bot changed the title esp_ota_write unexpectedly slow on encrypted flash esp_ota_write unexpectedly slow on encrypted flash (IDFGH-11415) Nov 9, 2023
@boborjan2
Copy link
Author

boborjan2 commented Nov 10, 2023

I checked the tasks on the system during OTA update. There are some strange phenomena:

  1. httpd task core (xCoreID in TaskStatus_t) is odd
  2. ipc1 task cpu usage is unexpectedly high

(cpu usage %, @coreIDm b:baseprio, c:currentprio, s:stack high watermark)
(only tasks with >0% cpu usage are shown)

httpd                 75%  (@2147483647,b:24,c:24,s:3196)
ipc1                  62%  (@1,b:24,c:24,s:1500)
IDLE0                 28%  (@0,b:0,c:0,s:992)
main                  9%  (@0,b:1,c:1,s:6116)
CAN                   7%  (@1,b:22,c:22,s:3316)
IDLE1                 5%  (@1,b:0,c:0,s:1004)
ipc0                  4%  (@0,b:24,c:24,s:1544)
wifi                  3%  (@0,b:23,c:23,s:1272)
tiT                   2%  (@2147483647,b:18,c:18,s:1448)

When I pin http to core 0 or 1, I strangely get somewhat better flash write performance. Pinned to core0, it looks like this:

httpd                 89%  (@0,b:5,c:5,s:3284)
ipc1                  74%  (@1,b:24,c:24,s:1500)
CAN                   12%  (@1,b:22,c:22,s:3316)
IDLE1                 10%  (@1,b:0,c:0,s:1004)
wifi                  4%  (@0,b:23,c:23,s:1272)
tiT                   3%  (@2147483647,b:18,c:18,s:1448)
main                  3%  (@0,b:1,c:1,s:6116)

pinned to core 1:

httpd                 83%  (@1,b:5,c:5,s:3188)
ipc0                  65%  (@0,b:24,c:24,s:1496)
main                  15%  (@0,b:1,c:1,s:6052)
CAN                   13%  (@1,b:22,c:22,s:3316)
IDLE0                 12%  (@0,b:0,c:0,s:992)
wifi                  4%  (@0,b:23,c:23,s:1272)
tiT                   2%  (@2147483647,b:18,c:18,s:1448)
IDLE1                 1%  (@1,b:0,c:0,s:1004)

What are ipc tasks doing during OTA update? Where shall OTA update run? (now it runs from httpd server context).

@boborjan2
Copy link
Author

boborjan2 commented Nov 10, 2023

I made a small app_main-only test: (run on the board with flash encryption enabled)

    esp_partition_t* partition = esp_partition_find_first(
        ESP_PARTITION_TYPE_APP, ESP_PARTITION_SUBTYPE_ANY, "ota_0");

    if(partition) {
        int64_t beginT, endT;
        //const int RUNS = 1000;
        static uint8_t data[8192];
        esp_fill_random(data, sizeof(data));

        for(;;) {
            ESP_ERROR_CHECK(esp_partition_erase_range(partition, 0, sizeof(data)));//sizeof(data)));

            beginT = esp_timer_get_time();
            partition->encrypted = 1;
            ESP_ERROR_CHECK(esp_partition_write(partition, 0, data, sizeof(data)));
            endT = esp_timer_get_time();
            printf("esp_partition_write: %dms\n", (uint32_t)(endT - beginT)/1000);

            ESP_ERROR_CHECK(esp_partition_erase_range(partition, 0, sizeof(data)));//sizeof(data)));

            beginT = esp_timer_get_time();
            partition->encrypted = 0;
            ESP_ERROR_CHECK(esp_partition_write(partition, 0, data, sizeof(data)));
            endT = esp_timer_get_time();
            printf("esp_partition_write: %dms\n", (uint32_t)(endT - beginT)/1000);

            vTaskDelay(1000);
        }
    }

Results (write times of a 8kbyte block):
25ms (plaintext), 105ms (encrypted).

Interestingly legacy implementation (SPI_FLASH_USE_LEGACY_IMPL) improves flash write speed but the difference is there:
18ms (plaintext), 88ms (encrypted).

This difference is also an issue but the main problem is that why does encrypted flash speed decay so much when serial write via esptool.py does not show such degradation?

@boborjan2
Copy link
Author

I made some measurements to find where the time is spent during flash writes. I did this by accumulating the ccounts needed by various parts of the code.
e.g.:

_tick_prev = xthal_get_ccount();
        err = rom_spiflash_api_funcs->start(chip);
_tick_sum += xthal_get_ccount() - _tick_prev;

From the ~100ms that is needed for writing 8192 bytes to encrypted flash, 47ms is spent in rom_spiflash_api_funcs->start(chip) in esp_flash_api.c, functionesp_flash_write_encrypted(). This number is 0ms for the non-encrypted write. The same is true with spi_flash_guard_start() in the legacy code.

The reason for the increased time needed for flash writes when flash encryption is enabled:
Non-encrypted flash writes are performed in chunks of CONFIG_SPI_FLASH_WRITE_CHUNK_SIZE that is set to 8192 by default.
Before every write, spi_flash_guard_start() is called in the legacy implementation and rom_spiflash_api_funcs->start(chip) is called in the non-legacy case. This prevents all flash based code from running during flash writes. The chunk size is set to avoid starvation.

When encryption is enabled, CONFIG_SPI_FLASH_WRITE_CHUNK_SIZE does not come into play: the writes are always done in small chunks of max 64 bytes. These single writes are ALL protected by the guards that takes most of the time.

Imho this is a bug, spi guards shall only be called in larger chunks, the same way as it is done in the non-encrypted case.
Please advise.

@boborjan2
Copy link
Author

In the legacy implementation, this modified spi_flash_write_encrypted_chip() (in flash_ops_esp32.c) speeds up encrypted writes from 100ms to 20ms:

esp_rom_spiflash_result_t IRAM_ATTR spi_flash_write_encrypted_chip(size_t dest_addr, const void *src, size_t size)
{
    const uint8_t *ssrc = (const uint8_t *)src;
    esp_rom_spiflash_result_t rc = ESP_ROM_SPIFLASH_RESULT_OK;

    assert((dest_addr % 16) == 0);
    assert((size % 16) == 0);

    /* esp_rom_spiflash_write_encrypted encrypts data in RAM as it writes,
       so copy to a temporary buffer - 32 bytes at a time.

       Each call to esp_rom_spiflash_write_encrypted takes a 32 byte "row" of
       data to encrypt, and each row is two 16 byte AES blocks
       that share a key (as derived from flash address).
    */
    uint8_t encrypt_buf[32] __attribute__((aligned(4)));
    uint32_t row_size;
    uint8_t pre_buf[16];
    uint8_t post_buf[16];

    if((dest_addr % 32) != 0) {
        spi_flash_read_encrypted(dest_addr - 16, pre_buf, 16);
    }
    if(((dest_addr + size) % 32) != 0) {
        spi_flash_read_encrypted(dest_addr + size, post_buf, 16);
    }

    spi_flash_guard_start();
    for (size_t i = 0; i < size; i += row_size) {
        uint32_t row_addr = dest_addr + i;
        if (i == 0 && (row_addr % 32) != 0) {
            /* writing to second block of a 32 byte row */
            row_size = 16;
            row_addr -= 16;
            /* copy to second block in buffer */
            memcpy(encrypt_buf + 16, ssrc + i, 16);
            /* decrypt the first block from flash, will reencrypt to same bytes */
            memcpy(encrypt_buf, pre_buf, 16);
        } else if (size - i == 16) {
            /* 16 bytes left, is first block of a 32 byte row */
            row_size = 16;
            /* copy to first block in buffer */
            memcpy(encrypt_buf, ssrc + i, 16);
            /* decrypt the second block from flash, will reencrypt to same bytes */
            memcpy(encrypt_buf + 16, post_buf, 16);
        } else {
            /* Writing a full 32 byte row (2 blocks) */
            row_size = 32;
            memcpy(encrypt_buf, ssrc + i, 32);
        }

        flash_rom_init();
        rc = esp_rom_spiflash_write_encrypted(row_addr, (uint32_t *)encrypt_buf, 32);
        if (rc != ESP_ROM_SPIFLASH_RESULT_OK) {
            break;
        }
    }
    spi_flash_guard_end();
    bzero(encrypt_buf, sizeof(encrypt_buf));

    return rc;
}

The same can be applied to the non-legacy solution as well. However, even with the mods similar to the one above, the non-legacy is still slower than the legacy one. (~20ms compared to ~38ms). Please elaborate.

@espressif-bot espressif-bot added Status: Selected for Development Issue is selected for development and removed Status: Opened Issue is new labels Nov 14, 2023
@espressif-bot espressif-bot added Status: Done Issue is done internally Resolution: NA Issue resolution is unavailable and removed Status: Selected for Development Issue is selected for development labels Nov 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Resolution: NA Issue resolution is unavailable Status: Done Issue is done internally Type: Bug bugs in IDF
Projects
None yet
Development

No branches or pull requests

3 participants