Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add CThunk for CM7 #2522

Merged
merged 1 commit into from
Sep 14, 2016
Merged

Add CThunk for CM7 #2522

merged 1 commit into from
Sep 14, 2016

Conversation

svastm
Copy link
Contributor

@svastm svastm commented Aug 23, 2016

The asynchronous versions of the communication interfaces (serial, i2c and spi) need the CThunk class which is not yet implemented for cortex-M7.

My first idea was to use the same OP code than CM3 and CM4, ldm.w pc,{r0,r1,r2,pc}, because they use ARMv7-M and thumb2 like the CM7. But this OP code give me an hard fault and I don't understand why.

I poorly used the code from CM0, CM0+ which looks cleaner but a bit longer. Then I was able to run the following test code on ARM and IAR but it still fail on GCC. I tried with a NUCLEO_F746ZG and a DISCO_F746NG.

#include "mbed.h"
#include "CThunk.h"

class DummyClass {
  public: 
    void method() {
        printf("method called => attribute: %d\n", attribute);
    }

  public:
    int attribute;
};

int main() {
    printf("Begin\n");

    DummyClass dm;
    CThunk<DummyClass> ct = CThunk<DummyClass>(&dm, &DummyClass::method);

    dm.attribute = 0;
    ct.call();

    dm.attribute += 1;
    ct.call();

    printf("End\n");

    while (1) {

    }
}

I'm a bit stuck there and i can't find something in the ARMv7-M manual which say to not use the ldm.w pc,{r0,r1,r2,pc}.

Someone has already looked at this ?

@0xc0170
Copy link
Contributor

0xc0170 commented Aug 23, 2016

My first idea was to use the same OP code than CM3 and CM4, ldm.w pc,{r0,r1,r2,pc}, because they use ARMv7-M and thumb2 like the CM7. But this OP code give me an hard fault and I don't understand why.

cc @c1728p9

@svastm
Copy link
Contributor Author

svastm commented Aug 29, 2016

Then in this manual p 3-35, LDM PC, xxx is forbidden on CM7. So we can't use the short code.

@svastm
Copy link
Contributor Author

svastm commented Aug 29, 2016

Rebased with a fully working version. GCC was failing because this toolchain define the __thumb2__ macro. So it was still using the short trampoline.

@bogdanm
Copy link
Contributor

bogdanm commented Sep 9, 2016

Cc @meriac for any suggestions. According to the manual for Cortex-M4, Cortex-M3 and Cortex-M7, LDM can't accept pc as the destination, and yet this seems to work on M4 and M3. I don't know what happens in M7.

@c1728p9
Copy link
Contributor

c1728p9 commented Sep 9, 2016

Since the documentation for M3 and M4 don't allow the PC to be used to as the address to load from in a LDM instruction should all the targets be using the M0 implementation? @meriac, what do you think?

@0xc0170
Copy link
Contributor

0xc0170 commented Sep 12, 2016

Since the documentation for M3 and M4 don't allow the PC to be used to as the address to load from in a LDM instruction should all the targets be using the M0 implementation? @meriac, what do you think?

Looks like that restriction was put into practise for m7 cores (hardfaults). If we don't find a better alternative (like not touching anything just scratch registers), we might just use thumb instruction for all cores. I can't think of anything better at the moment. You guys?

@geky
Copy link
Contributor

geky commented Sep 12, 2016

Was toying around with this a while back, here's a small thumb trampoline that avoids undefined instructions:

0xa002 // add r0, pc, #8
0xc803 // ldm r0, {r0, r1}
0x4708 // bx r1
0x0000 // make sure context and callback are aligned
context
callback

EDIT: I realized the above was a 1-argument thunk. The CThunk class needs a 3-argument thunk. An alternative is to modify the CThunk class to emit a single-argument static function (example), which may make writing efficient thunks easier in the future.

0xa002 // add r0, pc, #8
0xc80f // ldm r0, {r0, r1, r2, r3}
0x4718 // bx r3
0x0000 // make sure context and callback are aligned
context
instance
callback
trampoline

It's kinda sad to see the ldm.w pc,{r0,r1,r2,pc} go (it was a very neat trick), but it is probably for the better if it was undefined behaviour.

@svastm
Copy link
Contributor Author

svastm commented Sep 13, 2016

This solution looks great to me. I don't think we can do shorter. Because it use only thumb instruction set, we could also use it for CM0/CM0+.
Should i rebase it ?

@bogdanm
Copy link
Contributor

bogdanm commented Sep 13, 2016

@meriac @AlessandroA : if you have something against this, speak now :)

@AlessandroA
Copy link
Contributor

The only reason why I could imagine a different behavior between the cores is because of this condition not passing (from ARM DDI 0403E.b (ARMv7M ARM) page A7-248):

n = UInt(Rn); registers = P:M:'0':register_list; wback = (W == '1');
if n == 15 || BitCount(registers) < 2 || (P == '1' && M == '1') then UNPREDICTABLE;
if registers<15> == '1' && InITBlock() && !LastInITBlock() then UNPREDICTABLE;

Where n == 15 means pc. The unpredictable behavior is probably different between M3/M4 and M7.

The following instructions would be an alternative to what @geky proposed:

add r0, pc, #8
ldm r0, {r0, r1, r2, pc}

You don't need to take pc out of the register list, that is allowed and implicitly turns into a branch. The footprint is the same though, and you still need alignment.

@0xc0170
Copy link
Contributor

0xc0170 commented Sep 13, 2016

The above code snippet looks good, saves 4 bytes :) @AlessandroA thanks for looking at this. Same footprint, but we might want unified version.

@svastm Can you test locally please and rebase?

@bogdanm
Copy link
Contributor

bogdanm commented Sep 13, 2016

Thanks @AlessandroA! LGTM.

- Add support of cortex-M7 for cthunk.
- Change the cthunk trampoline implementation to safer and quicker
solutions:
 * thumb2, the behaviour was undefined. new implementation use now 2
instructions
 * thumb, The new implementation use 3 instructions instead of 6.
@svastm
Copy link
Contributor Author

svastm commented Sep 13, 2016

I rebase with the new implementations thanks to @geky and @AlessandroA.

I kept the geky solution for thumb implementation because the ldm thumb instruction can use only low register.
Also, i remove the __thumb2__ macro check because it was only defined by GCC_ARM.
I tested with a NUCLEO_F091RC, NUCLEO_F303RE and NUCLEO_F746ZG. For ARM, GCC_ARM and IAR it's ok.

For Cortex-A9 i switched to thumb2 solution but I don't have a Cortex-A9 so I can't test it.

@geky
Copy link
Contributor

geky commented Sep 13, 2016

Tested locally with the Cortex-A9 RZ_A1H, looks good to me: 👍

Begin
method called => attribute: 0
method called => attribute: 1
End

@0xc0170
Copy link
Contributor

0xc0170 commented Sep 13, 2016

/morph test

@mbed-bot
Copy link

Result: SUCCESS

Your command has finished executing! Here's what you wrote!

/morph test

Output

mbed Build Number: 850

All builds and test passed!

@0xc0170 0xc0170 merged commit e014b0f into ARMmbed:master Sep 14, 2016
@svastm svastm deleted the cthunk_cm7 branch September 16, 2016 09:27
aisair pushed a commit to aisair/mbed that referenced this pull request Apr 30, 2024
Ports for Upcoming Targets

2669: Added u-blox C029 target ARMmbed/mbed-os#2669
2707: [EFM32] Add IAR support for remaining Silicon Labs targets ARMmbed/mbed-os#2707
2819: MultiTech xDot platform support - 09.26.2016 ARMmbed/mbed-os#2819
2827: include MultiTech xDot in mbed 5 releases ARMmbed/mbed-os#2827

Fixes and Changes

2522: Add CThunk for CM7 ARMmbed/mbed-os#2522
2518: Enable uvisor on Beetle ARMmbed/mbed-os#2518
2571: STM32F7 - Add asynchronous serial ARMmbed/mbed-os#2571
2616: STM32F3xx - Add Serial Flow Control pins + enable it ARMmbed/mbed-os#2616
2619: NUCLEO_L152RE - Add Serial Flow Control ARMmbed/mbed-os#2619
2620: NUCLEO_F429ZI - Add SERIAL_FC macro ARMmbed/mbed-os#2620
2666: [EFM32] Microsecond ticker optimization ARMmbed/mbed-os#2666
2681: STM32F0xx - Add support of ADC internal channels ARMmbed/mbed-os#2681
2687: [NRF5] Add fs_data symbol in data secton for gcc ARMmbed/mbed-os#2687
2696: Add device_has to all nrf51 devices ARMmbed/mbed-os#2696
2703: TARGET_NRF5: Changed 'serial_baud' implementation to support special baud rates. ARMmbed/mbed-os#2703
2704: DISCO_L476VG: add SPI nicknames ARMmbed/mbed-os#2704
2723: KSDK serial_api.c: Fix assertion error for ParityEven ARMmbed/mbed-os#2723
2463: [STM32L0] Add asynchronous serial ARMmbed/mbed-os#2463
2572: Fix STM32F407VG target name and LPC11U6X linker errors ARMmbed/mbed-os#2572
2698: DELTA_DFBM_NQ620 target ARMmbed/mbed-os#2698
2542: Dev spi asynch stm32f4 ARMmbed/mbed-os#2542
2650: STM32F3 - Add low power timer ARMmbed/mbed-os#2650
2415: [STM32F0] Add asynchronous serial ARMmbed/mbed-os#2415
2585: Added support for ADC only pins in LPC43xx ARMmbed/mbed-os#2585
2622: [STM32F4] Add asynchronous I2C ARMmbed/mbed-os#2622
2719: Updated ARM linker scripts for Kinetis platforms that use SDK 2.0 ARMmbed/mbed-os#2719
2728: Added ethernet and enabled IPV4 feature for the EVK-ODIN-W2/C029 target ARMmbed/mbed-os#2728
2747: [LPC11U68] Fix pin interrupt select offset ARMmbed/mbed-os#2747
2751: STM32L0xx - Add Serial Flow Control ARMmbed/mbed-os#2751
2753: [NUCLEO_F767ZI] Add CAN capability ARMmbed/mbed-os#2753
2759: STM32F0 - Add low power timer ARMmbed/mbed-os#2759
2763: STM32L1 - Add low power timer ARMmbed/mbed-os#2763
2764: STM32L4 - Add low power timer ARMmbed/mbed-os#2764
2771: STM32L4 - Update deepsleep implementation ARMmbed/mbed-os#2771
2775: Update KSDK SDHC driver for K64F & K66F ARMmbed/mbed-os#2775
2792: [NUCLEO_F303ZE] MBED-OS5 capability ARMmbed/mbed-os#2792
2762: STM32L0 - Add low power timer ARMmbed/mbed-os#2762
2761: STM32F7 - Add low power timer ARMmbed/mbed-os#2761
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants