Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage-kvstore-tdbstore-tests-TESTS-tdbstore-whitebox test fails on Toshiba Target #14128

Closed
deepak-shreshti opened this issue Jan 8, 2021 · 22 comments · Fixed by #14483
Closed
Assignees

Comments

@deepak-shreshti
Copy link
Contributor

Description of defect

We are trying to port new Toshiba MCU to mbed-os.
One of the greentea tests(storage-kvstore-tdbstore-tests-TESTS-tdbstore-whitebox) fails on this target.
The following FAIL log gets generated:

mbed-os-master-FAIL.log

Same test passes when run on target for previously released versions(mbed-os-6.6.0 or below).
In those versions, it skips the test telling "SKIP: Not enough heap to run test".
Below is the PASS log that gets generated using mbed-os-6.6.0:

mbed-os-6.6.0-PASS.log

In the current master package, flash simulator BD support has been removed and heap BD is used.
We are suspecting whether less ram is causing this failure or what.
This Toshiba Target has 24KB ram and 256KB flash storage.

Please help us in this regard.

Target(s) affected by this defect ?

New TOSHIBA MCU of which still porting to mbed-os is ongoing.

Toolchain(s) (name and version) displaying this defect ?

GCC_ARM 9 2019-q4-major

What version of Mbed-os are you using (tag or sha) ?

mbed-os-99.99.99

What version(s) of tools are you using. List all that apply (E.g. mbed-cli)

mbed cli 1.8.2

How is this defect reproduced ?

Using the following command on mbed-os-master package:
mbed test -v -n storage-kvstore-tdbstore-tests-tests-tdbstore-whitebox

@ciarmcom
Copy link
Member

ciarmcom commented Jan 8, 2021

Thank you for raising this detailed GitHub issue. I am now notifying our internal issue triagers.
Internal Jira reference: https://jira.arm.com/browse/IOTOSM-3183

@0xc0170
Copy link
Contributor

0xc0170 commented Jan 11, 2021

cc @ARMmbed/mbed-os-core

@deepak-shreshti
Copy link
Contributor Author

Hi @hugueskamba . Could you please help us in resolving this issue?

@hugueskamba
Copy link
Collaborator

Hi @hugueskamba . Could you please help us in resolving this issue?

Are you also testing with the ARM toolchain? If so, please share the scatter file. If not, please provide the content of the GCC linker file.

@deepak-shreshti
Copy link
Contributor Author

We have tested with ARM Toolchain. The results are slightly different when we use ARM Toolchain.
In "storage-kvstore-tdbstore-tests-TESTS-tdbstore-whitebox" test, there are two test cases. With ARM, "White box test" test case is passing and "Multiple set test" test case is failing.
Please find the test logs with ARM toolchain below:
mbed-os-master_ARM_FAIL.log

Please find the ARM scatter file below:
tmpm4knfyafg.zip

Please find the GCC linker script below:
tmpm4knfyafg.zip

@hugueskamba
Copy link
Collaborator

hugueskamba commented Jan 15, 2021

We have tested with ARM Toolchain. The results are slightly different when we use ARM Toolchain.
In "storage-kvstore-tdbstore-tests-TESTS-tdbstore-whitebox" test, there are two test cases. With ARM, "White box test" test case is passing and "Multiple set test" test case is failing.
Please find the test logs with ARM toolchain below:
mbed-os-master_ARM_FAIL.log

Please find the ARM scatter file below:
tmpm4knfyafg.zip

Please find the GCC linker script below:
tmpm4knfyafg.zip

I had a look at the scatter file and here are my observations:

  1. The shebang should be for Arm Compiler 6 (armclang) instead of
    previous versions (armcc)
    It should therefore be:
    #! armclang -E --target=arm-arm-none-eabi -x c -mcpu=cortex-m4. However, this would not affect the test resut.

  2. There was no ARM_LIB_HEAP section defined. This could be the cause of the failure. See https://www.keil.com/support/man/docs/ARMLINK/ARMLINK_pge1362065977713.htm

I amend the scatter file as follows. Can you please try it?

#! armclang -E --target=arm-arm-none-eabi -x c -mcpu=cortex-m4

#if !defined(MBED_APP_START)
  #define MBED_APP_START 0x00000000
#endif

#if !defined(MBED_APP_SIZE)
  #define MBED_APP_SIZE 0x00040000
#endif

#if !defined(MBED_RAM_START)
  #define MBED_RAM_START 0x20000000
  #define MBED_RAM_SIZE 0x6000
#endif

#if !defined(MBED_CONF_TARGET_BOOT_STACK_SIZE)
#  if defined(MBED_BOOT_STACK_SIZE)
#    define MBED_CONF_TARGET_BOOT_STACK_SIZE MBED_BOOT_STACK_SIZE
#  else
#    define MBED_CONF_TARGET_BOOT_STACK_SIZE 0x400
#  endif
#endif

#define Stack_Size MBED_CONF_TARGET_BOOT_STACK_SIZE

#define MBED_RAM0_START MBED_RAM_START
#define MBED_RAM0_SIZE 0x238

#define MBED_IRAM1_START (MBED_RAM0_START + MBED_RAM0_SIZE)
#define MBED_RAM1_SIZE (MBED_RAM_SIZE - MBED_RAM0_SIZE)

#define RAM_FIXED_SIZE   (MBED_CONF_TARGET_BOOT_STACK_SIZE + MBED_RAM0_SIZE)

; TMPM4KNA: 256 KB FLASH (0x40000) + 24 KB SRAM (0x6000)

LR_IROM1 MBED_APP_START MBED_APP_SIZE ; load region size_region {
  ER_IROM1 MBED_APP_START MBED_APP_SIZE {
   *.o (RESET, +First)
   *(InRoot$$Sections)
   .ANY (+RO)
  }

  RW_IRAM1 MBED_IRAM1_START MBED_RAM1_SIZE {
    .ANY (+RW, +ZI)
  }

  ARM_LIB_STACK (MBED_RAM_START + MBED_RAM_SIZE) EMPTY -Stack_Size { ; Stack region grows down
  }

  ARM_LIB_HEAP AlignExpr(+0, 16) EMPTY (MBED_RAM_SIZE - RAM_FIXED_SIZE - (AlignExpr(ImageLimit(RW_IRAM1), 16)) - MBED_IRAM1_START) { ; Heap region grows up
  }
}

@LDong-Arm
Copy link
Contributor

LDong-Arm commented Jan 15, 2021

It looks like allocation failures in storage are not checked sometimes. For example, uses of new in TDBStore.cpp and HeapBlockDevice.cpp.

I've created #14157 - I'm not sure if it's related to this issue or not. So it would be good to try @hugueskamba's suggestion first.

@deepak-shreshti
Copy link
Contributor Author

Hi @hugueskamba ,we tried with the updated scatter file provided by you, having two region model.
Results did not change. Same error log generated.
mbed-os-master_ARM_FAIL.log

@deepak-shreshti
Copy link
Contributor Author

Hi @hugueskamba, did you find anything that must be causing this failure?
Please do let us know if you anything related to this failure.
Thanks.

@hugueskamba
Copy link
Collaborator

Hi @hugueskamba, did you find anything that must be causing this failure?

Please do let us know if you anything related to this failure.

Thanks.

Hi,
I haven't found anything as it does not fail on any Mbed target currently supported.
Do all you need to add support for that board and raise a PR so we can see what you are working with and give ourselves a better chance at spotting what the problem with the board you are working with is.
Thank you
Hugues

@hugueskamba
Copy link
Collaborator

hugueskamba commented Jan 19, 2021

@deepak-shreshti
Looking at the failure when the ARM toolchain is used, it points at a write failure here.
You may want to look at how your hardware handles target specific calls made from the mbed::TDBStore::reset() function.

@LDong-Arm
Copy link
Contributor

LDong-Arm commented Jan 19, 2021

@deepak-shreshti
Could you please try https://github.com/LDong-Arm/mbed-os/pull/2/files and see if the test skips correctly? I think there might be some issues with allocation checks. Hmm, C++ allocation failure should lead to a crash, so it's probably not the problem...

From your log

[1610724298.48][CONN][RXD] >>> Running case #2: 'TDBStore: Multiple set test'...
[1610724298.55][CONN][INF] found KV pair in stream: {{__testcase_start;TDBStore: Multiple set test}}, queued...
[1610724298.61][CONN][RXD] Elapsed time for initial init is 28 ms
mbedgt: :387::FAIL: Expected 0 Was 284
[1610724298.67][CONN][RXD] :387::FAIL: Expected 0 Was 284

So what this means is, the assertion that failed is line 387 of the test (i.e. storage/kvstore/tdbstore/tests/TESTS/tdbstore/whitebox/main.cpp). Just as @hugueskamba suggested, if would be good if you could debug into the reason reset() failed, by printing some messages for example.

P.S.
It's preferable to run applications and tests with the bare metal profile on memory-constrained targets. You can try to add --app-config TESTS/configs/baremetal.json to your regular mbed test ... command.

@LDong-Arm
Copy link
Contributor

LDong-Arm commented Jan 19, 2021

In the current master package, flash simulator BD support has been removed and heap BD is used.
We are suspecting whether less ram is causing this failure or what.

The removal of flash simulator frees up some RAM, which is likely why the test now runs (and fails) instead of getting skipped like before. (The heap BD is always there, and the flash simulator used to be a thin layer on top.)

@evedon
Copy link
Contributor

evedon commented Jan 20, 2021

I hope that the comments were useful in narrowing down your issue.
Given that your board is not supported, we are unable to devote more time on this issue.

As an open source project, we welcome fixes to Mbed OS, so we have opened this issue up for community contribution.

@deepak-shreshti
Copy link
Contributor Author

We tried debugging this issue but could not arrive at some concrete conclusions.

In mbed::TDBStore::reset() function call, it calls check_erase_before_write()(line 1199).
For variable area = 0, this call returns success.
But when variable area = 1, check_erase_before_write() returns a non-zero value and reset() call fails.

After following the program flow for area = 1, we found check_erase_before_write() was calling erase_erase_unit() which gave call to _buff_bd->program()(program() function call of BufferedBlockDevice class).
In _buff_bd->program() function, it calls _bd->program()(line 252 of BufferedBlockDevice.cpp), which was returning error.
_bd->program() seems to belong to BlockDevice class.

We feel memory issue is restricting things because of which test case is failing.
Our Target combines 24K ram and 256K rom.
Other Toshiba targets which supports mbed os 6 have greater than 64K memory on which this test is passing with similar linker and scatter files.

We could not debug the real reason why it is failing in this 24K target.
Could you tell us what amount of total ram memory is this test using or any method to find the same during runtime?
Also if possible, could you tell us some ways to reduce the memory this test is consuming(maybe some allocation data size or something) and then check if it is passing or not!!

@hugueskamba
Copy link
Collaborator

We tried debugging this issue but could not arrive at some concrete conclusions.

In mbed::TDBStore::reset() function call, it calls check_erase_before_write()(line 1199).
For variable area = 0, this call returns success.
But when variable area = 1, check_erase_before_write() returns a non-zero value and reset() call fails.

After following the program flow for area = 1, we found check_erase_before_write() was calling erase_erase_unit() which gave call to _buff_bd->program()(program() function call of BufferedBlockDevice class).
In _buff_bd->program() function, it calls _bd->program()(line 252 of BufferedBlockDevice.cpp), which was returning error.
_bd->program() seems to belong to BlockDevice class.

We feel memory issue is restricting things because of which test case is failing.
Our Target combines 24K ram and 256K rom.
Other Toshiba targets which supports mbed os 6 have greater than 64K memory on which this test is passing with similar linker and scatter files.

We could not debug the real reason why it is failing in this 24K target.
Could you tell us what amount of total ram memory is this test using or any method to find the same during runtime?
Also if possible, could you tell us some ways to reduce the memory this test is consuming(maybe some allocation data size or something) and then check if it is passing or not!!

Hi @deepak-shreshti
Have you tried running the test with the baremetal profile as suggested above by Lingkai? If so, does the test pass or fail?

@deepak-shreshti
Copy link
Contributor Author

deepak-shreshti commented Feb 6, 2021

Hi Team,

Have you tried running the test with the baremetal profile as suggested above by Lingkai? If so, does the test pass or fail?

Yes we tried with baremetal and received same error.

We analyzed further and could PASS the testcase after reducing memory allocation for test case execution.
We found below details on further analysis:
We pointed out in previous note that , API _buff_bd->program() function, invokes _bd->program()(line 252 of BufferedBlockDevice.cpp), which is returning error.
_bd->program() calls HeapBlockDevice::program in storage/blockdevice/source/HeapBlockDevice.cpp
In function HeapBlockDevice::program malloc fails and it returns BD_ERROR_DEVICE_ERROR. (Line 159, _blocks[hi] = (uint8_t *)malloc(_erase_size);)

  1. So we tried running test case “White box test" by reducing erase_size from 4096 to 2048 in bd_params[] table in storage/kvstore/tdbstore/tests/TESTS/tdbstore/whitebox/main.cpp.
    Test case passed with below changes
@@ -89,8 +89,8 @@ static const char *const res_val2  = "This should surely not be saved as the res
static void white_box_test()
{
     bd_params_t bd_params[] = {
  
 -        {8192,     1,  16, 4096}, // Standard
-         {4096 * 4, 1,   1, 4096}, // K82F like
+        {8192,     1,  16, 2048}, // Standard
+        {4096 * 4, 1,   1, 2048}, // K82F like
          {8192,    64, 128, 2048}, // Awkward read and program sizes, both larger than header size
          {2048,     1,   4,  128}, // Small sector and total sizes
          {0,        0,   0,    0}, // Place holder for real non volatile device (if defined)
  1. In similar way, we reduced erase size and data size for “Multiple set test” test case. Test case is passed. Changes are below
@@ -57,7 +57,7 @@ FlashIAPBlockDevice flash_bd(0xF0000, 0x10000);
#else
#define USE_HEAP_BD 1
  -HeapBlockDevice flash_bd(8 * 4096, 1, 1, 4096);
  +HeapBlockDevice flash_bd(8 * 4096, 1, 1, 2048);
#endif

@@ -328,9 +328,9 @@ static void multi_set_test()
     char *key;
     uint8_t *get_buf, *set_buf;
     size_t key_size = 32;
-    size_t data_size = 512;
+    size_t data_size = 128; //512;
     size_t num_keys = 16;
-    size_t set_iters = 3;
+    size_t set_iters = 1;//3;
     size_t actual_data_size;
     int result;
     mbed::Timer timer;

Test cases are passed with above changes for both GCC and ARM.
Since our target support only 24KB RAM we will have reduce memory required for test case.

Please let us know your feedback.

Best Regards
Deepak

@hugueskamba
Copy link
Collaborator

Test cases are passed with above changes for both GCC and ARM.
Since our target support only 24KB RAM we will have reduce memory required for test case.

Please let us know your feedback.

Best Regards
Deepak

Hi Deepak,
Thank you.
@LDong-Arm, there is a function to get the erase size (mbed::BlockDevice::get_erase_size()). Is there a reason it is hardcoding instead?

@evedon
Copy link
Contributor

evedon commented Feb 8, 2021

@LDong-Arm, there is a function to get the erase size (mbed::BlockDevice::get_erase_size()). Is there a reason it is hardcoding instead?

A block device can have a multitude of different storage types underneath. The erase size of the actual storage cannot be known; it has to be configured or passed in the constructor.

@deepak-shreshti Your analysis was correct. Since this commit the test is no longer skipped as previously.
This comment suggests that an erase failure of the flash simulator bd would indicate that there is not enough memory on the target. Could you revert this commit and let us know if the test is correctly skipped?

Now, it would be useful to run this test on targets with less RAM, like yours. We could defiine a configuration parameter to run only a subset of the configurations defined in bd_params for constrained targets.

@deepak-shreshti
Copy link
Contributor Author

Dear Evelyne Donnaes ,

We reverted the commit you suggested and could PASS the test cases in GCC and ARM.
The test case is skipped when we revert this commit.

Best Regards
Deepak

@evedon
Copy link
Contributor

evedon commented Feb 9, 2021

@deepak-shreshti Thanks for confirming. Since you can validate the changes on your target, can I suggest that you raise a PR with the commit mostly reverted? This will ensure that the test is skipped for constrained targets.

Thank you for the contribution!

@LDong-Arm
Copy link
Contributor

Hi @deepak-shreshti,

I've created #14483 to fix this test. It now uses HeapBlockDevice::program() (instead of erase()) to allocate each block from the heap on the fly. Previously, FlashSimBlockDevice::erase() used the same HeapBlockDevice::program() so the idea is the same. But we avoid bringing back the flash simulation layer which is unnecessary.

Please let us know if there's any problem. Thanks again for raising this issue!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants