Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nvs_storage implementation uses asserts and crashes system instead of returning an error (IDFGH-6629) #8271

Closed
mastrogippo opened this issue Jan 21, 2022 · 7 comments
Assignees
Labels
Resolution: Done Issue is done internally Status: Done Issue is done internally

Comments

@mastrogippo
Copy link

mastrogippo commented Jan 21, 2022

Environment

  • Development Kit: Any chip supporting NVS library
  • IDF version v4.4-beta1
  • Build System: idf.py
  • Compiler version xtensa-esp32-elf-gcc (crosstool-NG esp-2021r2) 8.4.0
  • Operating System: Linux

Problem Description

NVS blob read function uses asserts in its implementation; this causes a CPU reset in case of error.
In my specific case, the problematic line is:

assert(dataSize == item.blobIndex.dataSize);

But there are asserts sprinkled all around the code.

In my use case, I'm reading a blob using:

   uint16_t tmp_reg[0x10];
    esp_err_t err = settings_handle->get_blob(key, tmp_reg, (len+1) * 2);
    if(err == ESP_ERR_NVS_NOT_FOUND)
    {
        SYS_ERR_code |= err_NVS;
        ESP_LOGE("NVS", "VAL not initialized 0x%02X", addr);
        return err;
    }
    else if(err != ESP_OK)
    {
        SYS_ERR_code |= err_NVS;
        ESP_LOGE("NVS", "Error reading 0x%02X", addr);
        return err;
    }

If len is wrong (e.g. bigger than it's stored value), the CPU crashes

Are those asserts there for a specific reason? Would you merge a fix that returns an error if I spend the time to write it, or should I just implement a workaround for my own code?

Expected Behavior

I would expect a returned error (ESP_ERR_NVS_INVALID_LENGTH) from "get_blob" that I can check and act upon (in my specific case, I would delete the relevant config section and rewrite it with the correct size).

Actual Behavior

System crashes due to assert

Steps to reproduce

  1. read a blob from NVS specifying a different size than what is saved

Code to reproduce this issue

uint8_t reg[20] = "testTESTtest";
settings_handle->set_blob("hello", reg, 8);
esp_err_t err = settings_handle->get_blob("hello", reg, 9); //Crashed CPU instead of returning err

Debug Logs

I (464) NVS: Reading data

assert failed: esp_err_t nvs::Storage::readMultiPageBlob(uint8_t, const char*, void*, size_t) nvs_storage.cpp:460 (dataSize == item.blobIndex.dataSize)


Backtrace:0x40081be2:0x3ffb60000x40087c61:0x3ffb6020 0x4008def1:0x3ffb6040 0x400e3320:0x3ffb6160 0x400e3413:0x3ffb61e0 0x400e3a95:0x3ffb6240 0x400e3c81:0x3ffb6260 0x400d6430:0x3ffb6280 0x400d6594:0x3ffb62d0 0x400d6f4e:0x3ffb62f0 0x400d70c6:0x3ffb6330 0x400f60e4:0x3ffb6360 0x4008af99:0x3ffb6380 
0x40081be2: panic_abort at /home/esp-env/esp/esp-idf/components/esp_system/panic.c:402

0x40087c61: esp_system_abort at /home/esp-env/esp/esp-idf/components/esp_system/esp_system.c:121

0x4008def1: __assert_func at /home/esp-env/esp/esp-idf/components/newlib/assert.c:85

0x400e3320: nvs::Storage::readMultiPageBlob(unsigned char, char const*, void*, unsigned int) at /home/esp-env/esp/esp-idf/components/nvs_flash/src/nvs_storage.cpp:460 (discriminator 1)

0x400e3413: nvs::Storage::readItem(unsigned char, nvs::ItemType, char const*, void*, unsigned int) at /home/esp-env/esp/esp-idf/components/nvs_flash/src/nvs_storage.cpp:538

0x400e3a95: nvs::NVSHandleSimple::get_blob(char const*, void*, unsigned int) at /home/esp-env/esp/esp-idf/components/nvs_flash/src/nvs_handle_simple.cpp:66

0x400e3c81: nvs::NVSHandleLocked::get_blob(char const*, void*, unsigned int) at /home/esp-env/esp/esp-idf/components/nvs_flash/src/nvs_handle_locked.cpp:42

0x400d6430: load_ATM_config(char const*, unsigned char, unsigned char) at /home/esp-env/esp/SmartMeterMB/build/../main/sm_settings.hpp:117

0x400d6594: load_ATM_settings() at /home/esp-env/esp/SmartMeterMB/build/../main/sm_settings.hpp:172

0x400d6f4e: load_settings() at /home/esp-env/esp/SmartMeterMB/build/../main/sm_settings.hpp:221

0x400d70c6: app_main at /home/esp-env/esp/SmartMeterMB/build/../main/main.cpp:148

0x400f60e4: main_task at /home/esp-env/esp/esp-idf/components/freertos/port/port_common.c:129 (discriminator 2)

0x4008af99: vPortTaskWrapper at /home/esp-env/esp/esp-idf/components/freertos/port/xtensa/port.c:130
@espressif-bot espressif-bot added the Status: Opened Issue is new label Jan 21, 2022
@github-actions github-actions bot changed the title nvs_storage implementation uses asserts and crashes system instead of returning an error nvs_storage implementation uses asserts and crashes system instead of returning an error (IDFGH-6629) Jan 21, 2022
@igrr
Copy link
Member

igrr commented Jan 21, 2022

Hi @mastrogippo, thanks for pointing out this issue. That specific assert indeed looks incorrect, it should have been a check returning an error, instead.

Are those asserts there for a specific reason?

Generally we try to report errors instead of asserting/aborting on what could be user input, in this case we made a mistake.

There are also asserts in NVS which check the internal consistency of the NVS state. The motivation for those asserts is that if one of those fails, then it means the NVS implementation is incorrect or logically inconsistent. Our standard approach for this case is to fail at the earliest point possible to detect the error, and assert does that.

However over time we found that occasionally users report assertion failures in NVS where NVS asserts on each start-up (i.e. call to nvs_flash_init) and is never able to recover. Usually in these cases we also see that NVS partition is somehow corrupted. So realistically we should try to give the application a chance to recover from such situation in production, even if in debug mode and during testing we would prefer to assert. There is a task open in our internal issue tracker to replace some/most NVS assertions with recoverable error checks, but it's not completed yet.

You are welcome to send a PR for fixing the specific assert which you found! However please note that we would still need to add tests to the NVS test suite to cover this situation, so it's not going to be an automatic merge. Likely we'll have to amend you commit and add the tests.

@igrr
Copy link
Member

igrr commented Jan 21, 2022

I found that there is another issue open which seems related to the same problem, #7837.

@igrr
Copy link
Member

igrr commented Jan 21, 2022

I think the temporary workaround is to first call nvs_get_blob to query the correct size and then call it again with the correct size. Copying the code from the docs,

size_t required_size;
nvs_get_str(my_handle, "server_name", NULL, &required_size);
char* server_name = malloc(required_size);
nvs_get_str(my_handle, "server_name", server_name, &required_size);

(this is about nvs_get_str, but nvs_get_blob would work in a similar way)

@mastrogippo
Copy link
Author

Thanks @igrr , since you're already working on it and it's a more widespread issue than my single assert, I'll just implement the workaround for the time being.

@0xjakob 0xjakob self-assigned this Jan 24, 2022
@espressif-bot espressif-bot added Resolution: Duplicate This issue or pull request already exists Status: Done Issue is done internally Status: In Progress Work is in progress and removed Status: Opened Issue is new Resolution: Duplicate This issue or pull request already exists Status: Done Issue is done internally labels May 4, 2022
@pacucha42
Copy link
Collaborator

Reopen until the main fix gets merged

@espressif-bot espressif-bot assigned Mliron and unassigned 0xjakob May 18, 2022
@espressif-bot espressif-bot assigned pacucha42 and unassigned Mliron Jun 30, 2022
@espressif-bot espressif-bot added Resolution: Done Issue is done internally Status: Done Issue is done internally and removed Status: In Progress Work is in progress labels Jul 25, 2022
@Alvin1Zhang
Copy link
Collaborator

Thanks for reporting, sorry for slow turnaround, fix on master branch is available d1ec558, and on release/4.4 branch is available 582f954. Feel free to reopen.

@guilhermep-git
Copy link

Hello.
I’m facing the same problem with nvs_get_blob and nvs_get_str.
I’m using the workaround proposed by Igrr, that is used in API Application Example also: https://docs.espressif.com/projects/esp-idf/en/v5.0.6/esp32/api-reference/storage/nvs_flash.html#application-example

Environment:

Target: ESP32
IDF version v5.0.6
Build System: Eclipse IDE
Operating System: Windows 10
Python 3.11.1

Debug Logs

Backtrace:

0x4008b3eb:0x3fff2410 0x4008c227:0x3fff2440 0x4008c22f:0x3fff2460 0x4008fcc9:0x3fff2480 0x4008bdfd:0x3fff24a0 0x4010c8d9:0x3fff24e0 0x40110926:0x3fff2510 0x40194141:0x3fff2530 0x401114fe:0x3fff2550 0x4010fbbd:0x3fff25b0 0x4010ff86:0x3fff2600 0x401100a5:0x3fff2680 0x40110799:0x3fff26f0 0x4010f509:0x3fff2710 0x4010f8d1:0x3fff2750 0x400d74ee:0x3fff2770 0x400df6e0:0x3fff2830

Files:

0x4008b3eb: spi_flash_disable_interrupts_caches_and_other_cpu at C:/Users/guilhermepiedade/esp-idf-v5.0.6/esp-idf/components/spi_flash/cache_utils.c:173 (discriminator 1)
0x4008c227: cache_disable at C:/Users/guilhermepiedade/esp-idf-v5.0.6/esp-idf/components/spi_flash/spi_flash_os_func_app.c:74
0x4008c22f: spi1_start at C:/Users/guilhermepiedade/esp-idf-v5.0.6/esp-idf/components/spi_flash/spi_flash_os_func_app.c:114
0x4008fcc9: spiflash_start_default at C:/Users/guilhermepiedade/esp-idf-v5.0.6/esp-idf/components/spi_flash/esp_flash_api.c:123
0x4008bdfd: esp_flash_read at C:/Users/guilhermepiedade/esp-idf-v5.0.6/esp-idf/components/spi_flash/esp_flash_api.c:831
0x4010c8d9: esp_partition_read at C:/Users/guilhermepiedade/esp-idf-v5.0.6/esp-idf/components/esp_partition/partition_target.c:38
0x40110926: nvs::NVSPartition::read(unsigned int, void*, unsigned int) at C:/Users/guilhermepiedade/esp-idf-v5.0.6/esp-idf/components/nvs_flash/src/nvs_partition.cpp:45
(inlined by) nvs::NVSPartition::read(unsigned int, void*, unsigned int) at C:/Users/guilhermepiedade/esp-idf-v5.0.6/esp-idf/components/nvs_flash/src/nvs_partition.cpp:39
0x40194141: nvs::Page::readEntry(unsigned int, nvs::Item&) const at C:/Users/guilhermepiedade/esp-idf-v5.0.6/esp-idf/components/nvs_flash/src/nvs_page.cpp:879
0x401114fe: nvs::Page::findItem(unsigned char, nvs::ItemType, char const*, unsigned int&, nvs::Item&, unsigned char, nvs::VerOffset) at C:/Users/guilhermepiedade/esp-idf-v5.0.6/esp-idf/components/nvs_flash/src/nvs_page.cpp:929
(inlined by) nvs::Page::findItem(unsigned char, nvs::ItemType, char const*, unsigned int&, nvs::Item&, unsigned char, nvs::VerOffset) at C:/Users/guilhermepiedade/esp-idf-v5.0.6/esp-idf/components/nvs_flash/src/nvs_page.cpp:886
0x4010fbbd: nvs::Storage::findItem(unsigned char, nvs::ItemType, char const*, nvs::Page*&, nvs::Item&, unsigned char, nvs::VerOffset) at C:/Users/guilhermepiedade/esp-idf-v5.0.6/esp-idf/components/nvs_flash/src/nvs_storage.cpp:163
0x4010ff86: nvs::Storage::readMultiPageBlob(unsigned char, char const*, void*, unsigned int) at C:/Users/guilhermepiedade/esp-idf-v5.0.6/esp-idf/components/nvs_flash/src/nvs_storage.cpp:463
0x401100a5: nvs::Storage::readItem(unsigned char, nvs::ItemType, char const*, void*, unsigned int) at C:/Users/guilhermepiedade/esp-idf-v5.0.6/esp-idf/components/nvs_flash/src/nvs_storage.cpp:550
0x40110799: nvs::NVSHandleSimple::get_typed_item(nvs::ItemType, char const*, void*, unsigned int) at C:/Users/guilhermepiedade/esp-idf-v5.0.6/esp-idf/components/nvs_flash/src/nvs_handle_simple.cpp:36
(inlined by) nvs::NVSHandleSimple::get_typed_item(nvs::ItemType, char const*, void*, unsigned int) at C:/Users/guilhermepiedade/esp-idf-v5.0.6/esp-idf/components/nvs_flash/src/nvs_handle_simple.cpp:32
0x4010f509: nvs_get_str_or_blob(unsigned long, nvs::ItemType, char const*, void*, unsigned int*) at C:/Users/guilhermepiedade/esp-idf-v5.0.6/esp-idf/components/nvs_flash/src/nvs_api.cpp:510
0x4010f8d1: nvs_get_blob at C:/Users/guilhermepiedade/esp-idf-v5.0.6/esp-idf/components/nvs_flash/src/nvs_api.cpp:520
0x400d74ee: getPushIdStorageNVS at D:/Projetos/.../main/storageNVS.c:313
0x400df6e0: taskNotification at D:/Projetos/.../main/notification.c:118

The last two files refer to my application.

This problem happens randomically.
Is there some solution?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Resolution: Done Issue is done internally Status: Done Issue is done internally
Projects
None yet
Development

No branches or pull requests

8 participants