-
Notifications
You must be signed in to change notification settings - Fork 6.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kernel: thread: k_thread_foreach_unlocked: Implementation of unlocked version of k_thread_foreach #20937
kernel: thread: k_thread_foreach_unlocked: Implementation of unlocked version of k_thread_foreach #20937
Conversation
kernel/thread.c
Outdated
@@ -47,11 +46,11 @@ void k_thread_foreach(k_thread_user_cb_t user_cb, void *user_data) | |||
* The indirect ways are through calling k_thread_create and | |||
* k_thread_abort from user_cb. | |||
*/ | |||
key = k_spin_lock(&lock); | |||
k_sched_lock(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like it would break SMP unfortunately if another CPU is working on _kernel.threads at the same time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is only used to read and print some debug information. It is lesser evil if we print non complete debug information than if we break bt communication during this printing.
Another way to solve it may be by the usage of the mutex, but then it has to be placed inside functions for creating and destroying the thread also.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@andrewboie I can see from the code, that threads may not be created in ISR, but thread aborting seems to consider a situation when thread is removed from the interrupt (why?).
Any suggestion how to solve this situation without locking the whole system for the time long enough to break the bt communication?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this isn't going to work as it stands. I agree k_thread_foreach() is a really problematic API, but the global locking is required for correctness. Also note that k_sched_lock() protects against preemption by other cooperative threads, but not by metairqs (which are threads, not interrupt contexts, and can legally create/abort threads). Can't you just modify the debugging framework for your app specifically so that it skips the loop here? Or just arrange things such that it never needs to happen during bluetooth use?
Basically, I think the answer has to be "you can't use k_thread_foreach() in latency-critical environments". What's the particular code that's causing you problems?
It is stack analysis that took to much time. But note that this is basically the same code that is used inside shell. So calling the stack analysis from shell a few times should also break bt communication. Maybe we should rethink the queue implementation - I can leave without correctness here. What is the risk:
Maybe I can just implement something like |
9549479
to
9b1826d
Compare
New try - new version - I did not changed the original function - just added a function with limited locking. |
All checks are passing now. Tip: The bot edits this comment instead of posting a new one, so you can check the comment's history to see earlier messages. |
9b1826d
to
f9d0f5c
Compare
@andyross @andrewboie Could you revisit and check if this (new) approach may be implemented? |
Should I create another PR? Anyone there? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I missed this update. I don't see anything awful about this. It's simple enough and solves a real problem.
include/kernel.h
Outdated
* all the threads. If currently processed thread is removed in the same time, | ||
* this @c foreach processing would end too early. | ||
*/ | ||
extern void k_thread_foreach_nolock( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Naming nitpick: to my eyes, "unlocked" or "unsynchronized" are more informative than "nolock". It does, in fact, do some locking internally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree, we have been using _unlocked in the kernel (internally), would be great to keep this consistent.
include/kernel.h
Outdated
* queue elements. It unlocks it during user callback function processing. | ||
* If thread is created or terminated this function may not process | ||
* all the threads. If currently processed thread is removed in the same time, | ||
* this @c foreach processing would end too early. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably worth a stronger disclaimer: any creation or aborting of threads during execution of this function may result in threads being "missed" during execution, even ones not themselves created or destroyed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only deletion may lead to miss threads after the one deleted.
I will see if creation follows following scheme:
- In the new thread structure, set next handler to the current head
- Replace the head with new thread
If it is that way - (first set next, then set head), then there is only a possibility that the newly created thread would be missed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is how it is created:
#ifdef CONFIG_THREAD_MONITOR
new_thread->entry.pEntry = entry;
new_thread->entry.parameter1 = p1;
new_thread->entry.parameter2 = p2;
new_thread->entry.parameter3 = p3;
k_spinlock_key_t key = k_spin_lock(&lock);
new_thread->next_thread = _kernel.threads;
_kernel.threads = new_thread;
k_spin_unlock(&lock, key);
#endif
But going into z_thread_monitor_exit shows that next_thread is not set to NULL:
void z_thread_monitor_exit(struct k_thread *thread)
{
k_spinlock_key_t key = k_spin_lock(&lock);
if (thread == _kernel.threads) {
_kernel.threads = _kernel.threads->next_thread;
} else {
struct k_thread *prev_thread;
prev_thread = _kernel.threads;
while ((prev_thread != NULL) &&
(thread != prev_thread->next_thread)) {
prev_thread = prev_thread->next_thread;
}
if (prev_thread != NULL) {
prev_thread->next_thread = thread->next_thread;
}
}
k_spin_unlock(&lock, key);
}
Well - this seems not safe - if we delete an element that we are currently processing, and the next one - we may finish analysing some data that was removed (if thread structure was placed on the heap). Do you thing we can safetly modify the z_monitor_exit with adding following line just before k_spin_unlock:
thread->next_thread = NULL;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On second thought - we still would have an issue if we delete a task structure that we are analysing. So no problem if it is already removed from task list. There may be an issue if the RAM space where it is was reused.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just as a reference, I have looked how it is managed in FreeRTOS. When the task is deleted, they are not doing it technically in the place of call, they are adding it into a special list for "to be deleted" tasks. It is destroyed in the idle thread. Not bad idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was just a suggestion. I'm not rejecting the docs as written, I'd just like to see a clearer warning that this API is going to misbehave if the user isn't very careful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@andyross @carlescufi What I believe we are missing here is the fact, that it is quite safe as long as TCB is not placed in heap and deleted after aborted - then we will be in trouble.
The problem is that the API assumes that right after we exit k_thread_abort the thread is already disconnected from the system and we can release TCB. I am just trying to discuss if we should not protect thread aborting somehow - it is destructible part of the system. If we add mutex to protect k_thread_abort function and then use it wherever we need the TCB to exist for a little longer we would be on safe side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rename API to k_thread_foreach_unlocked
Implement thread foreach processing with limited locking to allow threads processing that may take more time but allows missing some threads processing when the thread list is modified. Signed-off-by: Radoslaw Koppel <radoslaw.koppel@nordicsemi.no>
f9d0f5c
to
56ab584
Compare
Done |
Documentation changed. |
The k_thread_foreach function is used in the code for debugging.
It is used to print thread names and analyze used stacks size.
Stacks analysis takes too much time for spin locks and this
may lead to bluetooth stack instability.
In second hand - it seems not to make any sense to
create or destroy threads from the interrupt - so scheduller lock
is enough here.
I had a problems with bt connection when using stack analysis.
This fixes the problem.
See: nrfconnect/sdk-nrf#1529