kernel: thread: k_thread_foreach_unlocked: Implementation of unlocked version of k_thread_foreach #20937

rakons · 2019-11-22T11:34:07Z

The k_thread_foreach function is used in the code for debugging.
It is used to print thread names and analyze used stacks size.
Stacks analysis takes too much time for spin locks and this
may lead to bluetooth stack instability.
In second hand - it seems not to make any sense to
create or destroy threads from the interrupt - so scheduller lock
is enough here.

I had a problems with bt connection when using stack analysis.
This fixes the problem.
See: nrfconnect/sdk-nrf#1529

andrewboie · 2019-11-22T18:08:29Z

kernel/thread.c

@@ -47,11 +46,11 @@ void k_thread_foreach(k_thread_user_cb_t user_cb, void *user_data)
 	 * The indirect ways are through calling k_thread_create and
 	 * k_thread_abort from user_cb.
 	 */
-	key = k_spin_lock(&lock);
+	k_sched_lock();


This seems like it would break SMP unfortunately if another CPU is working on _kernel.threads at the same time.

This function is only used to read and print some debug information. It is lesser evil if we print non complete debug information than if we break bt communication during this printing.

Another way to solve it may be by the usage of the mutex, but then it has to be placed inside functions for creating and destroying the thread also.

@andrewboie I can see from the code, that threads may not be created in ISR, but thread aborting seems to consider a situation when thread is removed from the interrupt (why?).

Any suggestion how to solve this situation without locking the whole system for the time long enough to break the bt communication?

andyross

Yeah, this isn't going to work as it stands. I agree k_thread_foreach() is a really problematic API, but the global locking is required for correctness. Also note that k_sched_lock() protects against preemption by other cooperative threads, but not by metairqs (which are threads, not interrupt contexts, and can legally create/abort threads). Can't you just modify the debugging framework for your app specifically so that it skips the loop here? Or just arrange things such that it never needs to happen during bluetooth use?

Basically, I think the answer has to be "you can't use k_thread_foreach() in latency-critical environments". What's the particular code that's causing you problems?

rakons · 2019-11-26T09:10:26Z

@andyross

What's the particular code that's causing you problems?

It is stack analysis that took to much time. But note that this is basically the same code that is used inside shell. So calling the stack analysis from shell a few times should also break bt communication.

Maybe we should rethink the queue implementation - I can leave without correctness here. What is the risk:

The task is added when we are analysing the head in threads field - then we would just not print the data about the new thread.
The task that is actually analysed is removed during that process - we would hit NULL in next field and just finish the analysis - I can live also with that - we would just get incomplete analysis.

Maybe I can just implement something like k_thread_foreach_nocorrect, or k_thread_foreach_nolock and note that the correctness is not guaranteed if thread is added or removed during the process. Most important thing is "do not harm", especially when just printing some statistics.

rakons · 2019-11-27T13:22:08Z

New try - new version - I did not changed the original function - just added a function with limited locking.

zephyrbot · 2019-11-27T13:36:41Z

All checks are passing now.

Tip: The bot edits this comment instead of posting a new one, so you can check the comment's history to see earlier messages.

rakons · 2019-11-28T10:23:01Z

@andyross @andrewboie Could you revisit and check if this (new) approach may be implemented?

rakons · 2019-12-04T14:26:13Z

Should I create another PR? Anyone there?

andyross

Sorry, I missed this update. I don't see anything awful about this. It's simple enough and solves a real problem.

andyross · 2019-12-11T16:58:06Z

include/kernel.h

+ * all the threads. If currently processed thread is removed in the same time,
+ * this @c foreach processing would end too early.
+ */
+extern void k_thread_foreach_nolock(


Naming nitpick: to my eyes, "unlocked" or "unsynchronized" are more informative than "nolock". It does, in fact, do some locking internally.

agree, we have been using _unlocked in the kernel (internally), would be great to keep this consistent.

andyross · 2019-12-11T17:02:39Z

include/kernel.h

+ * queue elements. It unlocks it during user callback function processing.
+ * If thread is created or terminated this function may not process
+ * all the threads. If currently processed thread is removed in the same time,
+ * this @c foreach processing would end too early.


Probably worth a stronger disclaimer: any creation or aborting of threads during execution of this function may result in threads being "missed" during execution, even ones not themselves created or destroyed.

Only deletion may lead to miss threads after the one deleted.
I will see if creation follows following scheme:

In the new thread structure, set next handler to the current head

Replace the head with new thread

If it is that way - (first set next, then set head), then there is only a possibility that the newly created thread would be missed.

This is how it is created:

#ifdef CONFIG_THREAD_MONITOR new_thread->entry.pEntry = entry; new_thread->entry.parameter1 = p1; new_thread->entry.parameter2 = p2; new_thread->entry.parameter3 = p3; k_spinlock_key_t key = k_spin_lock(&lock); new_thread->next_thread = _kernel.threads; _kernel.threads = new_thread; k_spin_unlock(&lock, key); #endif

But going into z_thread_monitor_exit shows that next_thread is not set to NULL:

void z_thread_monitor_exit(struct k_thread *thread) { k_spinlock_key_t key = k_spin_lock(&lock); if (thread == _kernel.threads) { _kernel.threads = _kernel.threads->next_thread; } else { struct k_thread *prev_thread; prev_thread = _kernel.threads; while ((prev_thread != NULL) && (thread != prev_thread->next_thread)) { prev_thread = prev_thread->next_thread; } if (prev_thread != NULL) { prev_thread->next_thread = thread->next_thread; } } k_spin_unlock(&lock, key); }

Well - this seems not safe - if we delete an element that we are currently processing, and the next one - we may finish analysing some data that was removed (if thread structure was placed on the heap). Do you thing we can safetly modify the z_monitor_exit with adding following line just before k_spin_unlock:

thread->next_thread = NULL;

On second thought - we still would have an issue if we delete a task structure that we are analysing. So no problem if it is already removed from task list. There may be an issue if the RAM space where it is was reused.

Just as a reference, I have looked how it is managed in FreeRTOS. When the task is deleted, they are not doing it technically in the place of call, they are adding it into a special list for "to be deleted" tasks. It is destroyed in the idle thread. Not bad idea.

@andyross can you take a look at the comments above from @rakons?

It was just a suggestion. I'm not rejecting the docs as written, I'd just like to see a clearer warning that this API is going to misbehave if the user isn't very careful.

@andyross @carlescufi What I believe we are missing here is the fact, that it is quite safe as long as TCB is not placed in heap and deleted after aborted - then we will be in trouble.
The problem is that the API assumes that right after we exit k_thread_abort the thread is already disconnected from the system and we can release TCB. I am just trying to discuss if we should not protect thread aborting somehow - it is destructible part of the system. If we add mutex to protect k_thread_abort function and then use it wherever we need the TCB to exist for a little longer we would be on safe side.

nashif

rename API to k_thread_foreach_unlocked

Implement thread foreach processing with limited locking to allow threads processing that may take more time but allows missing some threads processing when the thread list is modified. Signed-off-by: Radoslaw Koppel <radoslaw.koppel@nordicsemi.no>

rakons · 2019-12-19T14:07:23Z

@nashif

rename API to k_thread_foreach_unlocked

Done

rakons · 2019-12-20T09:07:41Z

@andyross

Documentation changed.

rakons requested review from andrewboie and andyross as code owners November 22, 2019 11:34

zephyrbot added the area: Kernel label Nov 22, 2019

rakons added the Trivial Changes that can be reviewed by anyone, i.e. doc changes, minor build system tweaks, etc. label Nov 22, 2019

rakons requested review from carlescufi and kapi-no November 22, 2019 11:59

kapi-no approved these changes Nov 22, 2019

View reviewed changes

ioannisg added this to the v2.2.0 milestone Nov 22, 2019

andrewboie reviewed Nov 22, 2019

View reviewed changes

andyross requested changes Nov 25, 2019

View reviewed changes

rakons force-pushed the k_thread_foreach_nospinlock branch from 9549479 to 9b1826d Compare November 27, 2019 13:21

rakons removed the Trivial Changes that can be reviewed by anyone, i.e. doc changes, minor build system tweaks, etc. label Nov 27, 2019

zephyrbot added the area: API Changes to public APIs label Nov 27, 2019

rakons force-pushed the k_thread_foreach_nospinlock branch from 9b1826d to f9d0f5c Compare November 27, 2019 13:58

joerchan requested review from andrewboie and andyross December 11, 2019 10:59

andyross approved these changes Dec 11, 2019

View reviewed changes

nashif requested changes Dec 11, 2019

View reviewed changes

rakons mentioned this pull request Dec 18, 2019

Option for safe k_thread_abort #21484

Closed

rakons force-pushed the k_thread_foreach_nospinlock branch from f9d0f5c to 56ab584 Compare December 19, 2019 14:05

rakons changed the title ~~kernel: thread: k_thread_foreach: Replace spin_lock by sched_lock~~ kernel: thread: k_thread_foreach_unlocked: Implementation of unlocked version of k_thread_foreach Dec 19, 2019

nashif approved these changes Dec 21, 2019

View reviewed changes

nashif merged commit 2c529ce into zephyrproject-rtos:master Dec 21, 2019

rakons mentioned this pull request Dec 23, 2019

Stack analyzer nrfconnect/sdk-nrf#1529

Closed

jharris-intel mentioned this pull request Feb 9, 2021

kernel threads and kernel stacks deadlock in many scenarios #32145

Closed

peter-mitsis mentioned this pull request Mar 8, 2022

k_thread_foreach api breaks real time semantics #13318

Closed

rakons deleted the k_thread_foreach_nospinlock branch July 28, 2023 09:52

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

kernel: thread: k_thread_foreach_unlocked: Implementation of unlocked version of k_thread_foreach #20937

kernel: thread: k_thread_foreach_unlocked: Implementation of unlocked version of k_thread_foreach #20937

rakons commented Nov 22, 2019

andrewboie Nov 22, 2019

rakons Nov 25, 2019

rakons Nov 25, 2019

andyross left a comment

rakons commented Nov 26, 2019

rakons commented Nov 27, 2019

zephyrbot commented Nov 27, 2019 •

edited

Loading

rakons commented Nov 28, 2019

rakons commented Dec 4, 2019

andyross left a comment

andyross Dec 11, 2019

nashif Dec 11, 2019

andyross Dec 11, 2019

rakons Dec 12, 2019

rakons Dec 12, 2019

rakons Dec 12, 2019 •

edited

Loading

rakons Dec 12, 2019

carlescufi Dec 16, 2019

andyross Dec 16, 2019

rakons Dec 17, 2019

nashif left a comment •

edited

Loading

rakons commented Dec 19, 2019

rakons commented Dec 20, 2019

kernel: thread: k_thread_foreach_unlocked: Implementation of unlocked version of k_thread_foreach #20937

kernel: thread: k_thread_foreach_unlocked: Implementation of unlocked version of k_thread_foreach #20937

Conversation

rakons commented Nov 22, 2019

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

andyross left a comment

Choose a reason for hiding this comment

rakons commented Nov 26, 2019

rakons commented Nov 27, 2019

zephyrbot commented Nov 27, 2019 • edited Loading

rakons commented Nov 28, 2019

rakons commented Dec 4, 2019

andyross left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rakons Dec 12, 2019 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

nashif left a comment • edited Loading

Choose a reason for hiding this comment

rakons commented Dec 19, 2019

rakons commented Dec 20, 2019

zephyrbot commented Nov 27, 2019 •

edited

Loading

rakons Dec 12, 2019 •

edited

Loading

nashif left a comment •

edited

Loading