Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kernel: thread: k_thread_foreach_unlocked: Implementation of unlocked version of k_thread_foreach #20937

Merged

Conversation

rakons
Copy link
Collaborator

@rakons rakons commented Nov 22, 2019

The k_thread_foreach function is used in the code for debugging.
It is used to print thread names and analyze used stacks size.
Stacks analysis takes too much time for spin locks and this
may lead to bluetooth stack instability.
In second hand - it seems not to make any sense to
create or destroy threads from the interrupt - so scheduller lock
is enough here.

I had a problems with bt connection when using stack analysis.
This fixes the problem.
See: nrfconnect/sdk-nrf#1529

@rakons rakons added the Trivial Changes that can be reviewed by anyone, i.e. doc changes, minor build system tweaks, etc. label Nov 22, 2019
@ioannisg ioannisg added this to the v2.2.0 milestone Nov 22, 2019
kernel/thread.c Outdated
@@ -47,11 +46,11 @@ void k_thread_foreach(k_thread_user_cb_t user_cb, void *user_data)
* The indirect ways are through calling k_thread_create and
* k_thread_abort from user_cb.
*/
key = k_spin_lock(&lock);
k_sched_lock();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like it would break SMP unfortunately if another CPU is working on _kernel.threads at the same time.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is only used to read and print some debug information. It is lesser evil if we print non complete debug information than if we break bt communication during this printing.

Another way to solve it may be by the usage of the mutex, but then it has to be placed inside functions for creating and destroying the thread also.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andrewboie I can see from the code, that threads may not be created in ISR, but thread aborting seems to consider a situation when thread is removed from the interrupt (why?).

Any suggestion how to solve this situation without locking the whole system for the time long enough to break the bt communication?

Copy link
Contributor

@andyross andyross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this isn't going to work as it stands. I agree k_thread_foreach() is a really problematic API, but the global locking is required for correctness. Also note that k_sched_lock() protects against preemption by other cooperative threads, but not by metairqs (which are threads, not interrupt contexts, and can legally create/abort threads). Can't you just modify the debugging framework for your app specifically so that it skips the loop here? Or just arrange things such that it never needs to happen during bluetooth use?

Basically, I think the answer has to be "you can't use k_thread_foreach() in latency-critical environments". What's the particular code that's causing you problems?

@rakons
Copy link
Collaborator Author

rakons commented Nov 26, 2019

@andyross

What's the particular code that's causing you problems?

It is stack analysis that took to much time. But note that this is basically the same code that is used inside shell. So calling the stack analysis from shell a few times should also break bt communication.

Maybe we should rethink the queue implementation - I can leave without correctness here. What is the risk:

  1. The task is added when we are analysing the head in threads field - then we would just not print the data about the new thread.
  2. The task that is actually analysed is removed during that process - we would hit NULL in next field and just finish the analysis - I can live also with that - we would just get incomplete analysis.

Maybe I can just implement something like k_thread_foreach_nocorrect, or k_thread_foreach_nolock and note that the correctness is not guaranteed if thread is added or removed during the process. Most important thing is "do not harm", especially when just printing some statistics.

@rakons rakons removed the Trivial Changes that can be reviewed by anyone, i.e. doc changes, minor build system tweaks, etc. label Nov 27, 2019
@rakons
Copy link
Collaborator Author

rakons commented Nov 27, 2019

New try - new version - I did not changed the original function - just added a function with limited locking.

@zephyrbot zephyrbot added the area: API Changes to public APIs label Nov 27, 2019
@zephyrbot
Copy link
Collaborator

zephyrbot commented Nov 27, 2019

All checks are passing now.

Tip: The bot edits this comment instead of posting a new one, so you can check the comment's history to see earlier messages.

@rakons
Copy link
Collaborator Author

rakons commented Nov 28, 2019

@andyross @andrewboie Could you revisit and check if this (new) approach may be implemented?

@rakons
Copy link
Collaborator Author

rakons commented Dec 4, 2019

Should I create another PR? Anyone there?

Copy link
Contributor

@andyross andyross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I missed this update. I don't see anything awful about this. It's simple enough and solves a real problem.

include/kernel.h Outdated
* all the threads. If currently processed thread is removed in the same time,
* this @c foreach processing would end too early.
*/
extern void k_thread_foreach_nolock(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming nitpick: to my eyes, "unlocked" or "unsynchronized" are more informative than "nolock". It does, in fact, do some locking internally.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree, we have been using _unlocked in the kernel (internally), would be great to keep this consistent.

include/kernel.h Outdated
* queue elements. It unlocks it during user callback function processing.
* If thread is created or terminated this function may not process
* all the threads. If currently processed thread is removed in the same time,
* this @c foreach processing would end too early.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably worth a stronger disclaimer: any creation or aborting of threads during execution of this function may result in threads being "missed" during execution, even ones not themselves created or destroyed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only deletion may lead to miss threads after the one deleted.
I will see if creation follows following scheme:

  • In the new thread structure, set next handler to the current head
  • Replace the head with new thread

If it is that way - (first set next, then set head), then there is only a possibility that the newly created thread would be missed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is how it is created:

#ifdef CONFIG_THREAD_MONITOR
	new_thread->entry.pEntry = entry;
	new_thread->entry.parameter1 = p1;
	new_thread->entry.parameter2 = p2;
	new_thread->entry.parameter3 = p3;

	k_spinlock_key_t key = k_spin_lock(&lock);

	new_thread->next_thread = _kernel.threads;
	_kernel.threads = new_thread;
	k_spin_unlock(&lock, key);
#endif

But going into z_thread_monitor_exit shows that next_thread is not set to NULL:

void z_thread_monitor_exit(struct k_thread *thread)
{
	k_spinlock_key_t key = k_spin_lock(&lock);

	if (thread == _kernel.threads) {
		_kernel.threads = _kernel.threads->next_thread;
	} else {
		struct k_thread *prev_thread;

		prev_thread = _kernel.threads;
		while ((prev_thread != NULL) &&
			(thread != prev_thread->next_thread)) {
			prev_thread = prev_thread->next_thread;
		}
		if (prev_thread != NULL) {
			prev_thread->next_thread = thread->next_thread;
		}
	}

	k_spin_unlock(&lock, key);
}

Well - this seems not safe - if we delete an element that we are currently processing, and the next one - we may finish analysing some data that was removed (if thread structure was placed on the heap). Do you thing we can safetly modify the z_monitor_exit with adding following line just before k_spin_unlock:

thread->next_thread = NULL;

Copy link
Collaborator Author

@rakons rakons Dec 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thought - we still would have an issue if we delete a task structure that we are analysing. So no problem if it is already removed from task list. There may be an issue if the RAM space where it is was reused.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just as a reference, I have looked how it is managed in FreeRTOS. When the task is deleted, they are not doing it technically in the place of call, they are adding it into a special list for "to be deleted" tasks. It is destroyed in the idle thread. Not bad idea.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andyross can you take a look at the comments above from @rakons?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was just a suggestion. I'm not rejecting the docs as written, I'd just like to see a clearer warning that this API is going to misbehave if the user isn't very careful.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andyross @carlescufi What I believe we are missing here is the fact, that it is quite safe as long as TCB is not placed in heap and deleted after aborted - then we will be in trouble.
The problem is that the API assumes that right after we exit k_thread_abort the thread is already disconnected from the system and we can release TCB. I am just trying to discuss if we should not protect thread aborting somehow - it is destructible part of the system. If we add mutex to protect k_thread_abort function and then use it wherever we need the TCB to exist for a little longer we would be on safe side.

Copy link
Member

@nashif nashif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename API to k_thread_foreach_unlocked

Implement thread foreach processing with limited locking
to allow threads processing that may take more time but allows
missing some threads processing when the thread list is modified.

Signed-off-by: Radoslaw Koppel <radoslaw.koppel@nordicsemi.no>
@rakons rakons changed the title kernel: thread: k_thread_foreach: Replace spin_lock by sched_lock kernel: thread: k_thread_foreach_unlocked: Implementation of unlocked version of k_thread_foreach Dec 19, 2019
@rakons
Copy link
Collaborator Author

rakons commented Dec 19, 2019

@nashif

rename API to k_thread_foreach_unlocked

Done

@rakons
Copy link
Collaborator Author

rakons commented Dec 20, 2019

@andyross

Documentation changed.

@nashif nashif merged commit 2c529ce into zephyrproject-rtos:master Dec 21, 2019
@rakons rakons deleted the k_thread_foreach_nospinlock branch July 28, 2023 09:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: API Changes to public APIs area: Kernel
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants